Wednesday, March 28, 2012

Long running Query

Hi,

I'm trying to optimize a long running (several hours) query. This query is a cross join on two tables. Table 1 has 3 fields - ROWID, LAt and Long. Table 2 has Name, Addr1,Addr2,City,State,Zip,Lat,Long.

Both tables has LatRad - Lat in radians, LonRad- Lon in Radians. Sin and Cos values of Lat and Lon are calulated and stored to be used in the distance formula.

What I'm trying to do here is find the nearest dealer (Table 2) for each of Table 1 rows. The Select statement takes a long time to execute as there are about 19 million recs on table 1 and 1250 rows in table 2. I ran into Log issues- filling the transaction log, so I'm currently using table variables and split up the process into 100000 recs at a time. I cross join and calculate the distance (@.DistValues) and then find the minimum distance (tablevar2) for each rowid and then the result is inserted into another Table (ResultTable).

My Code looks like this:

Declare @.DistValues table (DataSeqno varchar(10),Lat2 numeric(20,6),Lon2 numeric(20,6),StoreNo varchar(60), Lat1 numeric(20,6),Long1 numeric(20,6),Distance numeric(20,15))

Declare @.MinDistance table (DataSeqno varchar(10) primary key,distance numeric(20,15))

Insert into @.DistValues

Select DataSeqno,T.Lat Lat2,T.Lon Lon2,S.StoreNo,S.Lat Lat1,S.Long Long1,

distance=3963.1*Case when cast(S.SinLat * T.SinLat + S.CosLat * T.cosLat * cos(T.Lonrad - s.Lonrad) as numeric(20,15)) not between -1.0 and 1.0 then 0.0 else acos(cast(S.SinLat * T.SinLat + S.CosLat * T.cosLat * cos(T.Lonrad - s.Lonrad) as numeric(20,15))) end

from dbo.TopNForProcess T , dbo.Table2 S where Isnull(T.Lat,0) <> 0 and Isnull(T.Lon,0)<> 0

Insert into @.MinDistance

Select DataSeqno,Min(distance) From @.DistValues Group by DataSeqno

Insert into ResultTable (DataSeqno,Lat2,Lon2,StoreNo,LAt1,Long1,distance)

Select D.DataSeqno, D.Lat2, D.Lon2, D.StoreNo, D.LAt1, D.Long1, M.distance from @.DistValues D Inner Join @.MinDistance M on D.DataSeqno = M.DataSeqno and D.Distance = M.Distance

I created a View called TopNForProcess which looks like this. This cut down the processing time compared to when I had this as the Subquery.

SELECT TOP (100000) DataSeqno, lat, Lon, LatRad, LonRad, SinLat, cosLat, SinLon, CosLon FROM Table1 WHERE (DataSeqno NOT IN (SELECT DataSeqno FROM dbo.ResultTable)) AND (ISNULL(lat, 0) <> 0) AND (ISNULL(Lon, 0) <> 0)

I have indexes on table table1 - Rowid and another one with Lat and lon. Table2 - Lat and Long.

Is there any way this can be optimized/improved? This is already in a stored procedure.

Thanks in advance.

Hello

I just wrote a longer post but i deleted it by pressing the wrong mouse key :(

Anyway.. Tips to optimize - short edition now...

1. Avoid a FULL cross join... If someone is 180° from your store, then i BET he wont be a best match. Limit the stores to the max range you expect + some room for error (depending on how good you can guess) Example:

where Store.Lang between cust.lang - 10 and cust.lang + 10

This is simplified, since i dont take the rollovers at 180° into account (Lang gues from -180 to + 180 right? I allways mix them up... Even in my native language)

If you really want to make it optimal, then you could analyse the store table 1st and determine the "closest" lang and long for each store... So you dont use the "worst case" range for all stores... I am sure in NYC there will be a shorter range then in "Nowhere",Tx

2. Avoid a function on the LEFT side of the where clause! So NO whatever(store.lang) = Bla Bla... This will prohibit the use of an Index...

3. Do you need ALL data NOW or only some data on request? (Like for 1 Custumer?) If its only some on request. Consider using a calculated collumn... But this collumn cant be indexed since the function would point to an external resource. Another thing is how often do stores open/close? If its often then the function becomes more attractive...

|||Use the Sql server 2005 CLR integration|||

Thanks. I do have a parameter that would limit the range and I used only 1 deg and even then it took hours. All the customers had to be calculated at one time and we're planning to schedule this as a job.

|||

joeydj wrote:

Use the Sql server 2005 CLR integration

We are yet to try this. Will sure try this one.Thanks.

|||

Hello

Try to precalculate a "cover Area" for each store. Define this area by 4 Numbers. Max and min long/lat. Store this information in each store record.

This cover area should only take other stores to calculate. Then define 2 new Indexes on those fields. Each consisting max and min of the cover area (for each long and lat)

Now you will be able to reduce the amount of hits to a few per custumer. From those you can calculate the exact distance and extract the closest one. Thats still a lot of processing of geometry functions, but it should be a lot faster. You might also think about simplifying the formulat you use to a^2+b^2=c^2 (There is also no need to calculate the root from c in order get the closest value of c)

Yes... Its kinda cheating but the error you would get here ~0... Its at least so small that the custumer in the same region wont be able to tell the difference ;)

|||Thanks. We decided to follow this route as we just found out now that our current legacy program was doing this. thanks for your inputs.

No comments:

Post a Comment