Wednesday, March 28, 2012

long running query

Using SS2000 SP4. I have a query that has been running for 17 hours now and
I'm wondering if it is doing anything. The transaction log for the database
has the same time stamp on it that it had yesterday before I started running
the query and the file size is the same. Shouldn't something have been
written to the transaction logs by now?
There is nothing else running on the server of any consequence. There is
disk activity (reads but no writes ) on the disk where the database is. There
are no reads or writes on the disk where the transaction logs are located.
The processor is at 2%. There is no blocking going on.
Here's the query. I have indexes on all columns in the join and where
clause. There are about 11M records in the table. There is no problem with
free disk space. The tempdb and logs also have a time stamp before I started
the query.
UPDATE tblwf3
SET BRANCH = a.branch_id
from dbo.tblBranchZip a, tblwf3 b
WHERE a.zip = b.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
--and b.branch is null
go
Any idea what might be going on? Thanks.
--
Dan D.How about some nasty cartesian product? joining on a zip field (assuming
zip code) would likely result in billions or possibly trillions of joins
between the tables.
Do this. Take the update statement and paste it into another QA window.
Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
over the connecting lines in the plan and note the estimated row count.
Also check the left most icon for total estimated rows and the estimated
query cost. I think you may be surprised by what you see.
Also, note that your join is not ANSI compliant and will not be supported by
sql server at some point in the future. Start working to rewrite stuff like
that into this format:
UPDATE tblwf3
SET BRANCH = a.branch_id
from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
WHERE a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
TheSQLGuru
President
Indicium Resources, Inc.
"Dan D." <DanD@.discussions.microsoft.com> wrote in message
news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> Using SS2000 SP4. I have a query that has been running for 17 hours now
> and
> I'm wondering if it is doing anything. The transaction log for the
> database
> has the same time stamp on it that it had yesterday before I started
> running
> the query and the file size is the same. Shouldn't something have been
> written to the transaction logs by now?
> There is nothing else running on the server of any consequence. There is
> disk activity (reads but no writes ) on the disk where the database is.
> There
> are no reads or writes on the disk where the transaction logs are located.
> The processor is at 2%. There is no blocking going on.
> Here's the query. I have indexes on all columns in the join and where
> clause. There are about 11M records in the table. There is no problem with
> free disk space. The tempdb and logs also have a time stamp before I
> started
> the query.
> UPDATE tblwf3
> SET BRANCH = a.branch_id
> from dbo.tblBranchZip a, tblwf3 b
> WHERE a.zip = b.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> --and b.branch is null
> go
> Any idea what might be going on? Thanks.
> --
> Dan D.|||As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
(s)he posted is actually standard.
The version below is, and because it (also) eliminates the implicit
cross join it should run in a reasonable amount of time.
UPDATE tblwf3
SET BRANCH = (
SELECT a.branch_id
from dbo.tblBranchZip a
WHERE a.zip = tblwf3.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
)
--where b.branch is null
HTH,
Gert-Jan
Dan D. wrote:
> Using SS2000 SP4. I have a query that has been running for 17 hours now and
> I'm wondering if it is doing anything. The transaction log for the database
> has the same time stamp on it that it had yesterday before I started running
> the query and the file size is the same. Shouldn't something have been
> written to the transaction logs by now?
> There is nothing else running on the server of any consequence. There is
> disk activity (reads but no writes ) on the disk where the database is. There
> are no reads or writes on the disk where the transaction logs are located.
> The processor is at 2%. There is no blocking going on.
> Here's the query. I have indexes on all columns in the join and where
> clause. There are about 11M records in the table. There is no problem with
> free disk space. The tempdb and logs also have a time stamp before I started
> the query.
> UPDATE tblwf3
> SET BRANCH = a.branch_id
> from dbo.tblBranchZip a, tblwf3 b
> WHERE a.zip = b.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> --and b.branch is null
> go
> Any idea what might be going on? Thanks.
> --
> Dan D.|||I had checked the execution plan and didn't get any large numbers that seemed
out of the ordinary. On the lower level the first step uses a clustered index
and gives a row count 0f 504. The step on the level above it has another
clustered index and gives a row count of 24,000+. Then there is a nested loop
and parallelism steps that give row counts of 11M but since I know that it's
updating every row in the table I think that is to be expected. The left most
icon shows 11M rows.
Thanks,
--
Dan D.
"TheSQLGuru" wrote:
> How about some nasty cartesian product? joining on a zip field (assuming
> zip code) would likely result in billions or possibly trillions of joins
> between the tables.
> Do this. Take the update statement and paste it into another QA window.
> Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
> over the connecting lines in the plan and note the estimated row count.
> Also check the left most icon for total estimated rows and the estimated
> query cost. I think you may be surprised by what you see.
> Also, note that your join is not ANSI compliant and will not be supported by
> sql server at some point in the future. Start working to rewrite stuff like
> that into this format:
> UPDATE tblwf3
> SET BRANCH = a.branch_id
> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
> WHERE a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
>
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> > Using SS2000 SP4. I have a query that has been running for 17 hours now
> > and
> > I'm wondering if it is doing anything. The transaction log for the
> > database
> > has the same time stamp on it that it had yesterday before I started
> > running
> > the query and the file size is the same. Shouldn't something have been
> > written to the transaction logs by now?
> >
> > There is nothing else running on the server of any consequence. There is
> > disk activity (reads but no writes ) on the disk where the database is.
> > There
> > are no reads or writes on the disk where the transaction logs are located.
> > The processor is at 2%. There is no blocking going on.
> >
> > Here's the query. I have indexes on all columns in the join and where
> > clause. There are about 11M records in the table. There is no problem with
> > free disk space. The tempdb and logs also have a time stamp before I
> > started
> > the query.
> >
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id
> > from dbo.tblBranchZip a, tblwf3 b
> > WHERE a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > --and b.branch is null
> >
> > go
> >
> > Any idea what might be going on? Thanks.
> > --
> > Dan D.
>
>|||I didn't respond to your suggested rewrite of the query. That is normally how
I write queries and it is what I tried first but there was no difference in
performance. I was grasping at straws. I can update the with either query in
< 10 secs if I have 100K rows. I just didn't expect 11M rows to take so long.
I have 1.74T of free space on RAID 0. We wanted to do RAID 10 but found out
the controller wouldn't support it so I decided to test with RAID 0.
Thanks,
--
Dan D.
"TheSQLGuru" wrote:
> How about some nasty cartesian product? joining on a zip field (assuming
> zip code) would likely result in billions or possibly trillions of joins
> between the tables.
> Do this. Take the update statement and paste it into another QA window.
> Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
> over the connecting lines in the plan and note the estimated row count.
> Also check the left most icon for total estimated rows and the estimated
> query cost. I think you may be surprised by what you see.
> Also, note that your join is not ANSI compliant and will not be supported by
> sql server at some point in the future. Start working to rewrite stuff like
> that into this format:
> UPDATE tblwf3
> SET BRANCH = a.branch_id
> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
> WHERE a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
>
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> > Using SS2000 SP4. I have a query that has been running for 17 hours now
> > and
> > I'm wondering if it is doing anything. The transaction log for the
> > database
> > has the same time stamp on it that it had yesterday before I started
> > running
> > the query and the file size is the same. Shouldn't something have been
> > written to the transaction logs by now?
> >
> > There is nothing else running on the server of any consequence. There is
> > disk activity (reads but no writes ) on the disk where the database is.
> > There
> > are no reads or writes on the disk where the transaction logs are located.
> > The processor is at 2%. There is no blocking going on.
> >
> > Here's the query. I have indexes on all columns in the join and where
> > clause. There are about 11M records in the table. There is no problem with
> > free disk space. The tempdb and logs also have a time stamp before I
> > started
> > the query.
> >
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id
> > from dbo.tblBranchZip a, tblwf3 b
> > WHERE a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > --and b.branch is null
> >
> > go
> >
> > Any idea what might be going on? Thanks.
> > --
> > Dan D.
>
>|||I should have added that I can also run a query and update 5M rows in about
10 minis. It's just when I try 11M rows something happens and it takes
forever.
--
Dan D.
"TheSQLGuru" wrote:
> How about some nasty cartesian product? joining on a zip field (assuming
> zip code) would likely result in billions or possibly trillions of joins
> between the tables.
> Do this. Take the update statement and paste it into another QA window.
> Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
> over the connecting lines in the plan and note the estimated row count.
> Also check the left most icon for total estimated rows and the estimated
> query cost. I think you may be surprised by what you see.
> Also, note that your join is not ANSI compliant and will not be supported by
> sql server at some point in the future. Start working to rewrite stuff like
> that into this format:
> UPDATE tblwf3
> SET BRANCH = a.branch_id
> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
> WHERE a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
>
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> > Using SS2000 SP4. I have a query that has been running for 17 hours now
> > and
> > I'm wondering if it is doing anything. The transaction log for the
> > database
> > has the same time stamp on it that it had yesterday before I started
> > running
> > the query and the file size is the same. Shouldn't something have been
> > written to the transaction logs by now?
> >
> > There is nothing else running on the server of any consequence. There is
> > disk activity (reads but no writes ) on the disk where the database is.
> > There
> > are no reads or writes on the disk where the transaction logs are located.
> > The processor is at 2%. There is no blocking going on.
> >
> > Here's the query. I have indexes on all columns in the join and where
> > clause. There are about 11M records in the table. There is no problem with
> > free disk space. The tempdb and logs also have a time stamp before I
> > started
> > the query.
> >
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id
> > from dbo.tblBranchZip a, tblwf3 b
> > WHERE a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > --and b.branch is null
> >
> > go
> >
> > Any idea what might be going on? Thanks.
> > --
> > Dan D.
>
>|||I can see that the execution plan is quite different from the one I was
using. I tried your query on 5M rows and it finished in 11mins which is the
same as the query I was using. However, when I ran it on the 22M row table it
only took 2mins. The indexes are the same so I'm not sure what is going on.
I'll test some more.
Thanks,
--
Dan D.
"Gert-Jan Strik" wrote:
> As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> (s)he posted is actually standard.
> The version below is, and because it (also) eliminates the implicit
> cross join it should run in a reasonable amount of time.
> UPDATE tblwf3
> SET BRANCH = (
> SELECT a.branch_id
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> )
> --where b.branch is null
> HTH,
> Gert-Jan
> Dan D. wrote:
> >
> > Using SS2000 SP4. I have a query that has been running for 17 hours now and
> > I'm wondering if it is doing anything. The transaction log for the database
> > has the same time stamp on it that it had yesterday before I started running
> > the query and the file size is the same. Shouldn't something have been
> > written to the transaction logs by now?
> >
> > There is nothing else running on the server of any consequence. There is
> > disk activity (reads but no writes ) on the disk where the database is. There
> > are no reads or writes on the disk where the transaction logs are located.
> > The processor is at 2%. There is no blocking going on.
> >
> > Here's the query. I have indexes on all columns in the join and where
> > clause. There are about 11M records in the table. There is no problem with
> > free disk space. The tempdb and logs also have a time stamp before I started
> > the query.
> >
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id
> > from dbo.tblBranchZip a, tblwf3 b
> > WHERE a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > --and b.branch is null
> >
> > go
> >
> > Any idea what might be going on? Thanks.
> > --
> > Dan D.
>|||I don't doubt that you are updating 11M rows. The question is how many
TIMES are you updating those 11M rows. 24000+ times perhaps? Not sure
since I don't have access to the data.
However, I can state with almost 100% certainty that this update should NOT
be using a nested loop!! Try using OPTION (HASH JOIN) or MERGE JOIN (prolly
not right if not in sorted order already). See BOL for details.
--
TheSQLGuru
President
Indicium Resources, Inc.
"Dan D." <DanD@.discussions.microsoft.com> wrote in message
news:E77D9053-A6FE-4E46-9A2F-A42CA967B0B2@.microsoft.com...
>I had checked the execution plan and didn't get any large numbers that
>seemed
> out of the ordinary. On the lower level the first step uses a clustered
> index
> and gives a row count 0f 504. The step on the level above it has another
> clustered index and gives a row count of 24,000+. Then there is a nested
> loop
> and parallelism steps that give row counts of 11M but since I know that
> it's
> updating every row in the table I think that is to be expected. The left
> most
> icon shows 11M rows.
> Thanks,
> --
> Dan D.
>
> "TheSQLGuru" wrote:
>> How about some nasty cartesian product? joining on a zip field (assuming
>> zip code) would likely result in billions or possibly trillions of joins
>> between the tables.
>> Do this. Take the update statement and paste it into another QA window.
>> Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
>> over the connecting lines in the plan and note the estimated row count.
>> Also check the left most icon for total estimated rows and the estimated
>> query cost. I think you may be surprised by what you see.
>> Also, note that your join is not ANSI compliant and will not be supported
>> by
>> sql server at some point in the future. Start working to rewrite stuff
>> like
>> that into this format:
>> UPDATE tblwf3
>> SET BRANCH = a.branch_id
>> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
>> WHERE a.branch_id < '2900'
>> and a.Franchise is null
>> and a.closed is null
>> and a.commercial is null
>> and a.fume is null
>>
>> --
>> TheSQLGuru
>> President
>> Indicium Resources, Inc.
>> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
>> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
>> > Using SS2000 SP4. I have a query that has been running for 17 hours now
>> > and
>> > I'm wondering if it is doing anything. The transaction log for the
>> > database
>> > has the same time stamp on it that it had yesterday before I started
>> > running
>> > the query and the file size is the same. Shouldn't something have been
>> > written to the transaction logs by now?
>> >
>> > There is nothing else running on the server of any consequence. There
>> > is
>> > disk activity (reads but no writes ) on the disk where the database is.
>> > There
>> > are no reads or writes on the disk where the transaction logs are
>> > located.
>> > The processor is at 2%. There is no blocking going on.
>> >
>> > Here's the query. I have indexes on all columns in the join and where
>> > clause. There are about 11M records in the table. There is no problem
>> > with
>> > free disk space. The tempdb and logs also have a time stamp before I
>> > started
>> > the query.
>> >
>> > UPDATE tblwf3
>> > SET BRANCH = a.branch_id
>> > from dbo.tblBranchZip a, tblwf3 b
>> > WHERE a.zip = b.zip5
>> > and a.branch_id < '2900'
>> > and a.Franchise is null
>> > and a.closed is null
>> > and a.commercial is null
>> > and a.fume is null
>> > --and b.branch is null
>> >
>> > go
>> >
>> > Any idea what might be going on? Thanks.
>> > --
>> > Dan D.
>>|||1) From SQL 2005 BOL,
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/40e63302-0c68-4593-af3e-6d190181fee7.htm,
about 3/4 down the page, you will find this example:
USE AdventureWorks;
GO
UPDATE Sales.SalesPerson
SET SalesYTD = SalesYTD + SubTotal
FROM Sales.SalesPerson AS sp
JOIN Sales.SalesOrderHeader AS so
ON sp.SalesPersonID = so.SalesPersonID
AND so.OrderDate = (SELECT MAX(OrderDate)
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = sp.SalesPersonID);
GO2) "As usual, TheSQLGuru is spot on..." - thanks very much for the
compliment!!
3) "(s)he" - it is he actually. Kevin. Glad to meet you! :-)
--
TheSQLGuru
President
Indicium Resources, Inc.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:46324FBF.204D08B8@.toomuchspamalready.nl...
> As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> (s)he posted is actually standard.
> The version below is, and because it (also) eliminates the implicit
> cross join it should run in a reasonable amount of time.
> UPDATE tblwf3
> SET BRANCH = (
> SELECT a.branch_id
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> )
> --where b.branch is null
> HTH,
> Gert-Jan
> Dan D. wrote:
>> Using SS2000 SP4. I have a query that has been running for 17 hours now
>> and
>> I'm wondering if it is doing anything. The transaction log for the
>> database
>> has the same time stamp on it that it had yesterday before I started
>> running
>> the query and the file size is the same. Shouldn't something have been
>> written to the transaction logs by now?
>> There is nothing else running on the server of any consequence. There is
>> disk activity (reads but no writes ) on the disk where the database is.
>> There
>> are no reads or writes on the disk where the transaction logs are
>> located.
>> The processor is at 2%. There is no blocking going on.
>> Here's the query. I have indexes on all columns in the join and where
>> clause. There are about 11M records in the table. There is no problem
>> with
>> free disk space. The tempdb and logs also have a time stamp before I
>> started
>> the query.
>> UPDATE tblwf3
>> SET BRANCH = a.branch_id
>> from dbo.tblBranchZip a, tblwf3 b
>> WHERE a.zip = b.zip5
>> and a.branch_id < '2900'
>> and a.Franchise is null
>> and a.closed is null
>> and a.commercial is null
>> and a.fume is null
>> --and b.branch is null
>> go
>> Any idea what might be going on? Thanks.
>> --
>> Dan D.|||How can I tell how many times I'm updating the rows? The cost for the nested
loop is 2%. And if I write the query as you suggested the execution plan is
exactly the same.
I tried the way Gert-Jan suggested and I get a much different execution plan
and the query runs in a little over 2 minutes.
Thanks,
--
Dan D.
"TheSQLGuru" wrote:
> I don't doubt that you are updating 11M rows. The question is how many
> TIMES are you updating those 11M rows. 24000+ times perhaps? Not sure
> since I don't have access to the data.
> However, I can state with almost 100% certainty that this update should NOT
> be using a nested loop!! Try using OPTION (HASH JOIN) or MERGE JOIN (prolly
> not right if not in sorted order already). See BOL for details.
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> news:E77D9053-A6FE-4E46-9A2F-A42CA967B0B2@.microsoft.com...
> >I had checked the execution plan and didn't get any large numbers that
> >seemed
> > out of the ordinary. On the lower level the first step uses a clustered
> > index
> > and gives a row count 0f 504. The step on the level above it has another
> > clustered index and gives a row count of 24,000+. Then there is a nested
> > loop
> > and parallelism steps that give row counts of 11M but since I know that
> > it's
> > updating every row in the table I think that is to be expected. The left
> > most
> > icon shows 11M rows.
> >
> > Thanks,
> > --
> > Dan D.
> >
> >
> > "TheSQLGuru" wrote:
> >
> >> How about some nasty cartesian product? joining on a zip field (assuming
> >> zip code) would likely result in billions or possibly trillions of joins
> >> between the tables.
> >>
> >> Do this. Take the update statement and paste it into another QA window.
> >> Hit CTRL-L to have the Estimated Query Plan displayed. Place the cursor
> >> over the connecting lines in the plan and note the estimated row count.
> >> Also check the left most icon for total estimated rows and the estimated
> >> query cost. I think you may be surprised by what you see.
> >>
> >> Also, note that your join is not ANSI compliant and will not be supported
> >> by
> >> sql server at some point in the future. Start working to rewrite stuff
> >> like
> >> that into this format:
> >>
> >> UPDATE tblwf3
> >> SET BRANCH = a.branch_id
> >> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
> >> WHERE a.branch_id < '2900'
> >> and a.Franchise is null
> >> and a.closed is null
> >> and a.commercial is null
> >> and a.fume is null
> >>
> >>
> >> --
> >> TheSQLGuru
> >> President
> >> Indicium Resources, Inc.
> >>
> >> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> >> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> >> > Using SS2000 SP4. I have a query that has been running for 17 hours now
> >> > and
> >> > I'm wondering if it is doing anything. The transaction log for the
> >> > database
> >> > has the same time stamp on it that it had yesterday before I started
> >> > running
> >> > the query and the file size is the same. Shouldn't something have been
> >> > written to the transaction logs by now?
> >> >
> >> > There is nothing else running on the server of any consequence. There
> >> > is
> >> > disk activity (reads but no writes ) on the disk where the database is.
> >> > There
> >> > are no reads or writes on the disk where the transaction logs are
> >> > located.
> >> > The processor is at 2%. There is no blocking going on.
> >> >
> >> > Here's the query. I have indexes on all columns in the join and where
> >> > clause. There are about 11M records in the table. There is no problem
> >> > with
> >> > free disk space. The tempdb and logs also have a time stamp before I
> >> > started
> >> > the query.
> >> >
> >> > UPDATE tblwf3
> >> > SET BRANCH = a.branch_id
> >> > from dbo.tblBranchZip a, tblwf3 b
> >> > WHERE a.zip = b.zip5
> >> > and a.branch_id < '2900'
> >> > and a.Franchise is null
> >> > and a.closed is null
> >> > and a.commercial is null
> >> > and a.fume is null
> >> > --and b.branch is null
> >> >
> >> > go
> >> >
> >> > Any idea what might be going on? Thanks.
> >> > --
> >> > Dan D.
> >>
> >>
> >>
>
>|||I've tried your method on a couple of other queries and it is definitely
faster. How would you write the query if I wanted to update several rows. For
instance,
UPDATE tblwf3
SET BRANCH = a.branch_id, term=a.term, hhseg=a.hhseg
from dbo.tblBranchZip a inner join tblwf3 b
ON a.zip = b.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
Thanks,
--
Dan D.
"Gert-Jan Strik" wrote:
> As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> (s)he posted is actually standard.
> The version below is, and because it (also) eliminates the implicit
> cross join it should run in a reasonable amount of time.
> UPDATE tblwf3
> SET BRANCH = (
> SELECT a.branch_id
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> )
> --where b.branch is null
> HTH,
> Gert-Jan
> Dan D. wrote:
> >
> > Using SS2000 SP4. I have a query that has been running for 17 hours now and
> > I'm wondering if it is doing anything. The transaction log for the database
> > has the same time stamp on it that it had yesterday before I started running
> > the query and the file size is the same. Shouldn't something have been
> > written to the transaction logs by now?
> >
> > There is nothing else running on the server of any consequence. There is
> > disk activity (reads but no writes ) on the disk where the database is. There
> > are no reads or writes on the disk where the transaction logs are located.
> > The processor is at 2%. There is no blocking going on.
> >
> > Here's the query. I have indexes on all columns in the join and where
> > clause. There are about 11M records in the table. There is no problem with
> > free disk space. The tempdb and logs also have a time stamp before I started
> > the query.
> >
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id
> > from dbo.tblBranchZip a, tblwf3 b
> > WHERE a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > --and b.branch is null
> >
> > go
> >
> > Any idea what might be going on? Thanks.
> > --
> > Dan D.
>|||The syntax that ANSI has for that is not supported by SQL Server. It
would be something along the lines of
UPDATE ...
SET (BRANCH, term, hhseg) = (
SELECT branch_id, term, hhseg
FROM ...
WHERE ...zip = ...zip5
)
So then you have to make a choice. If you feel strongly about using the
ANSI SQL standard syntax, you would get something like this:
UPDATE tblwf3
SET BRANCH = (
SELECT a.branch_id
from dbo.tblBranchZip a
WHERE a.zip = tblwf3.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
), term=(
SELECT a.term
from dbo.tblBranchZip a
WHERE a.zip = tblwf3.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
), hhseg=(
SELECT a.hhseg
from dbo.tblBranchZip a
WHERE a.zip = tblwf3.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
)
However, chances are that this syntax (above) will not run very fast,
because it it probably not optimized to do only one index seek/scan per
row in tblwf3.
So the other choice is to use the UPDATE ... FROM syntax as described
before. See untested version below:
UPDATE tblwf3
SET BRANCH = a.branch_id, term=a.term, hhseg=a.hhseg
from dbo.tblBranchZip a
WHERE a.zip = tblwf3.zip5
and a.branch_id < '2900'
and a.Franchise is null
and a.closed is null
and a.commercial is null
and a.fume is null
Note that this version is potentially different, because it will only
update rows in tblwf3 where there is a matching row in tblBranchZip. The
earlier syntax will update all rows in tblwf3, unless you add a WHERE
clause.
Gert-Jan
Dan D. wrote:
> I've tried your method on a couple of other queries and it is definitely
> faster. How would you write the query if I wanted to update several rows. For
> instance,
> UPDATE tblwf3
> SET BRANCH = a.branch_id, term=a.term, hhseg=a.hhseg
> from dbo.tblBranchZip a inner join tblwf3 b
> ON a.zip = b.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> Thanks,
> --
> Dan D.
> "Gert-Jan Strik" wrote:
> > As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> > (s)he posted is actually standard.
> >
> > The version below is, and because it (also) eliminates the implicit
> > cross join it should run in a reasonable amount of time.
> >
> > UPDATE tblwf3
> > SET BRANCH = (
> > SELECT a.branch_id
> > from dbo.tblBranchZip a
> > WHERE a.zip = tblwf3.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > )
> > --where b.branch is null
> >
> > HTH,
> > Gert-Jan
> >
> > Dan D. wrote:
> > >
> > > Using SS2000 SP4. I have a query that has been running for 17 hours now and
> > > I'm wondering if it is doing anything. The transaction log for the database
> > > has the same time stamp on it that it had yesterday before I started running
> > > the query and the file size is the same. Shouldn't something have been
> > > written to the transaction logs by now?
> > >
> > > There is nothing else running on the server of any consequence. There is
> > > disk activity (reads but no writes ) on the disk where the database is. There
> > > are no reads or writes on the disk where the transaction logs are located.
> > > The processor is at 2%. There is no blocking going on.
> > >
> > > Here's the query. I have indexes on all columns in the join and where
> > > clause. There are about 11M records in the table. There is no problem with
> > > free disk space. The tempdb and logs also have a time stamp before I started
> > > the query.
> > >
> > > UPDATE tblwf3
> > > SET BRANCH = a.branch_id
> > > from dbo.tblBranchZip a, tblwf3 b
> > > WHERE a.zip = b.zip5
> > > and a.branch_id < '2900'
> > > and a.Franchise is null
> > > and a.closed is null
> > > and a.commercial is null
> > > and a.fume is null
> > > --and b.branch is null
> > >
> > > go
> > >
> > > Any idea what might be going on? Thanks.
> > > --
> > > Dan D.
> >|||Nice knowing you Kevin.
Usually, when people talk about "standard" SQL, they refer to queries
that conform to the ANSI SQL-92 standard, or ANSI SQL-99. AFAIK, those
standards to not support UPDATE ... FROM.
Gert-Jan
TheSQLGuru wrote:
> 1) From SQL 2005 BOL,
> ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/40e63302-0c68-4593-af3e-6d190181fee7.htm,
> about 3/4 down the page, you will find this example:
> USE AdventureWorks;
> GO
> UPDATE Sales.SalesPerson
> SET SalesYTD = SalesYTD + SubTotal
> FROM Sales.SalesPerson AS sp
> JOIN Sales.SalesOrderHeader AS so
> ON sp.SalesPersonID = so.SalesPersonID
> AND so.OrderDate = (SELECT MAX(OrderDate)
> FROM Sales.SalesOrderHeader
> WHERE SalesPersonID => sp.SalesPersonID);
> GO2) "As usual, TheSQLGuru is spot on..." - thanks very much for the
> compliment!!
> 3) "(s)he" - it is he actually. Kevin. Glad to meet you! :-)
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:46324FBF.204D08B8@.toomuchspamalready.nl...
> > As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> > (s)he posted is actually standard.
> >
> > The version below is, and because it (also) eliminates the implicit
> > cross join it should run in a reasonable amount of time.
> >
> > UPDATE tblwf3
> > SET BRANCH = (
> > SELECT a.branch_id
> > from dbo.tblBranchZip a
> > WHERE a.zip = tblwf3.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> > )
> > --where b.branch is null
> >
> > HTH,
> > Gert-Jan
> >
> > Dan D. wrote:
> >>
> >> Using SS2000 SP4. I have a query that has been running for 17 hours now
> >> and
> >> I'm wondering if it is doing anything. The transaction log for the
> >> database
> >> has the same time stamp on it that it had yesterday before I started
> >> running
> >> the query and the file size is the same. Shouldn't something have been
> >> written to the transaction logs by now?
> >>
> >> There is nothing else running on the server of any consequence. There is
> >> disk activity (reads but no writes ) on the disk where the database is.
> >> There
> >> are no reads or writes on the disk where the transaction logs are
> >> located.
> >> The processor is at 2%. There is no blocking going on.
> >>
> >> Here's the query. I have indexes on all columns in the join and where
> >> clause. There are about 11M records in the table. There is no problem
> >> with
> >> free disk space. The tempdb and logs also have a time stamp before I
> >> started
> >> the query.
> >>
> >> UPDATE tblwf3
> >> SET BRANCH = a.branch_id
> >> from dbo.tblBranchZip a, tblwf3 b
> >> WHERE a.zip = b.zip5
> >> and a.branch_id < '2900'
> >> and a.Franchise is null
> >> and a.closed is null
> >> and a.commercial is null
> >> and a.fume is null
> >> --and b.branch is null
> >>
> >> go
> >>
> >> Any idea what might be going on? Thanks.
> >> --
> >> Dan D.|||You can tell by doing an aggregate on zip on each table. Any value for zip
that comes up with more than one row on each table will be doing multiple
rows.
select zip, count(*) from tblBranchZip (nolock) group by zip
select zip5, count(*) from tblwf3 (nolock) group by zip5
Even worse is when you have multiple entries for a given zip in BOTH tables
say tblBranchZip has 147 rows with zip 30338 and tblwf3 has 9137 rows with
zip 30338. When you join on these and do the update as you had it, you will
have 147*9137 join hits. That means more than 1.3MILLION updates for a
SINGLE zip value.
--
TheSQLGuru
President
Indicium Resources, Inc.
"Dan D." <DanD@.discussions.microsoft.com> wrote in message
news:1DD9F77F-9AD2-4729-8149-FA5D694F4FB6@.microsoft.com...
> How can I tell how many times I'm updating the rows? The cost for the
> nested
> loop is 2%. And if I write the query as you suggested the execution plan
> is
> exactly the same.
> I tried the way Gert-Jan suggested and I get a much different execution
> plan
> and the query runs in a little over 2 minutes.
> Thanks,
> --
> Dan D.
>
> "TheSQLGuru" wrote:
>> I don't doubt that you are updating 11M rows. The question is how many
>> TIMES are you updating those 11M rows. 24000+ times perhaps? Not sure
>> since I don't have access to the data.
>> However, I can state with almost 100% certainty that this update should
>> NOT
>> be using a nested loop!! Try using OPTION (HASH JOIN) or MERGE JOIN
>> (prolly
>> not right if not in sorted order already). See BOL for details.
>> --
>> TheSQLGuru
>> President
>> Indicium Resources, Inc.
>> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
>> news:E77D9053-A6FE-4E46-9A2F-A42CA967B0B2@.microsoft.com...
>> >I had checked the execution plan and didn't get any large numbers that
>> >seemed
>> > out of the ordinary. On the lower level the first step uses a clustered
>> > index
>> > and gives a row count 0f 504. The step on the level above it has
>> > another
>> > clustered index and gives a row count of 24,000+. Then there is a
>> > nested
>> > loop
>> > and parallelism steps that give row counts of 11M but since I know that
>> > it's
>> > updating every row in the table I think that is to be expected. The
>> > left
>> > most
>> > icon shows 11M rows.
>> >
>> > Thanks,
>> > --
>> > Dan D.
>> >
>> >
>> > "TheSQLGuru" wrote:
>> >
>> >> How about some nasty cartesian product? joining on a zip field
>> >> (assuming
>> >> zip code) would likely result in billions or possibly trillions of
>> >> joins
>> >> between the tables.
>> >>
>> >> Do this. Take the update statement and paste it into another QA
>> >> window.
>> >> Hit CTRL-L to have the Estimated Query Plan displayed. Place the
>> >> cursor
>> >> over the connecting lines in the plan and note the estimated row
>> >> count.
>> >> Also check the left most icon for total estimated rows and the
>> >> estimated
>> >> query cost. I think you may be surprised by what you see.
>> >>
>> >> Also, note that your join is not ANSI compliant and will not be
>> >> supported
>> >> by
>> >> sql server at some point in the future. Start working to rewrite
>> >> stuff
>> >> like
>> >> that into this format:
>> >>
>> >> UPDATE tblwf3
>> >> SET BRANCH = a.branch_id
>> >> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
>> >> WHERE a.branch_id < '2900'
>> >> and a.Franchise is null
>> >> and a.closed is null
>> >> and a.commercial is null
>> >> and a.fume is null
>> >>
>> >>
>> >> --
>> >> TheSQLGuru
>> >> President
>> >> Indicium Resources, Inc.
>> >>
>> >> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
>> >> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
>> >> > Using SS2000 SP4. I have a query that has been running for 17 hours
>> >> > now
>> >> > and
>> >> > I'm wondering if it is doing anything. The transaction log for the
>> >> > database
>> >> > has the same time stamp on it that it had yesterday before I started
>> >> > running
>> >> > the query and the file size is the same. Shouldn't something have
>> >> > been
>> >> > written to the transaction logs by now?
>> >> >
>> >> > There is nothing else running on the server of any consequence.
>> >> > There
>> >> > is
>> >> > disk activity (reads but no writes ) on the disk where the database
>> >> > is.
>> >> > There
>> >> > are no reads or writes on the disk where the transaction logs are
>> >> > located.
>> >> > The processor is at 2%. There is no blocking going on.
>> >> >
>> >> > Here's the query. I have indexes on all columns in the join and
>> >> > where
>> >> > clause. There are about 11M records in the table. There is no
>> >> > problem
>> >> > with
>> >> > free disk space. The tempdb and logs also have a time stamp before I
>> >> > started
>> >> > the query.
>> >> >
>> >> > UPDATE tblwf3
>> >> > SET BRANCH = a.branch_id
>> >> > from dbo.tblBranchZip a, tblwf3 b
>> >> > WHERE a.zip = b.zip5
>> >> > and a.branch_id < '2900'
>> >> > and a.Franchise is null
>> >> > and a.closed is null
>> >> > and a.commercial is null
>> >> > and a.fume is null
>> >> > --and b.branch is null
>> >> >
>> >> > go
>> >> >
>> >> > Any idea what might be going on? Thanks.
>> >> > --
>> >> > Dan D.
>> >>
>> >>
>> >>
>>|||It is certainly the case that there are multiple zips in both tables. But
since the cost of the nested loop was only 2% wouldn't you think that it
isn't having much impact on the overall process?
The execution plan for the query of Gert-Jan didn't have the nested loop. In
this case after several tests, that query seems to be the one to use.
Thanks,
--
Dan D.
"TheSQLGuru" wrote:
> You can tell by doing an aggregate on zip on each table. Any value for zip
> that comes up with more than one row on each table will be doing multiple
> rows.
> select zip, count(*) from tblBranchZip (nolock) group by zip
> select zip5, count(*) from tblwf3 (nolock) group by zip5
> Even worse is when you have multiple entries for a given zip in BOTH tables
> say tblBranchZip has 147 rows with zip 30338 and tblwf3 has 9137 rows with
> zip 30338. When you join on these and do the update as you had it, you will
> have 147*9137 join hits. That means more than 1.3MILLION updates for a
> SINGLE zip value.
> --
> TheSQLGuru
> President
> Indicium Resources, Inc.
> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> news:1DD9F77F-9AD2-4729-8149-FA5D694F4FB6@.microsoft.com...
> > How can I tell how many times I'm updating the rows? The cost for the
> > nested
> > loop is 2%. And if I write the query as you suggested the execution plan
> > is
> > exactly the same.
> >
> > I tried the way Gert-Jan suggested and I get a much different execution
> > plan
> > and the query runs in a little over 2 minutes.
> >
> > Thanks,
> > --
> > Dan D.
> >
> >
> > "TheSQLGuru" wrote:
> >
> >> I don't doubt that you are updating 11M rows. The question is how many
> >> TIMES are you updating those 11M rows. 24000+ times perhaps? Not sure
> >> since I don't have access to the data.
> >>
> >> However, I can state with almost 100% certainty that this update should
> >> NOT
> >> be using a nested loop!! Try using OPTION (HASH JOIN) or MERGE JOIN
> >> (prolly
> >> not right if not in sorted order already). See BOL for details.
> >>
> >> --
> >> TheSQLGuru
> >> President
> >> Indicium Resources, Inc.
> >>
> >> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> >> news:E77D9053-A6FE-4E46-9A2F-A42CA967B0B2@.microsoft.com...
> >> >I had checked the execution plan and didn't get any large numbers that
> >> >seemed
> >> > out of the ordinary. On the lower level the first step uses a clustered
> >> > index
> >> > and gives a row count 0f 504. The step on the level above it has
> >> > another
> >> > clustered index and gives a row count of 24,000+. Then there is a
> >> > nested
> >> > loop
> >> > and parallelism steps that give row counts of 11M but since I know that
> >> > it's
> >> > updating every row in the table I think that is to be expected. The
> >> > left
> >> > most
> >> > icon shows 11M rows.
> >> >
> >> > Thanks,
> >> > --
> >> > Dan D.
> >> >
> >> >
> >> > "TheSQLGuru" wrote:
> >> >
> >> >> How about some nasty cartesian product? joining on a zip field
> >> >> (assuming
> >> >> zip code) would likely result in billions or possibly trillions of
> >> >> joins
> >> >> between the tables.
> >> >>
> >> >> Do this. Take the update statement and paste it into another QA
> >> >> window.
> >> >> Hit CTRL-L to have the Estimated Query Plan displayed. Place the
> >> >> cursor
> >> >> over the connecting lines in the plan and note the estimated row
> >> >> count.
> >> >> Also check the left most icon for total estimated rows and the
> >> >> estimated
> >> >> query cost. I think you may be surprised by what you see.
> >> >>
> >> >> Also, note that your join is not ANSI compliant and will not be
> >> >> supported
> >> >> by
> >> >> sql server at some point in the future. Start working to rewrite
> >> >> stuff
> >> >> like
> >> >> that into this format:
> >> >>
> >> >> UPDATE tblwf3
> >> >> SET BRANCH = a.branch_id
> >> >> from dbo.tblBranchZip a inner join tblwf3 b ON a.zip = b.zip5
> >> >> WHERE a.branch_id < '2900'
> >> >> and a.Franchise is null
> >> >> and a.closed is null
> >> >> and a.commercial is null
> >> >> and a.fume is null
> >> >>
> >> >>
> >> >> --
> >> >> TheSQLGuru
> >> >> President
> >> >> Indicium Resources, Inc.
> >> >>
> >> >> "Dan D." <DanD@.discussions.microsoft.com> wrote in message
> >> >> news:EAFCE68D-33C7-48BC-A16A-AA42821FBB17@.microsoft.com...
> >> >> > Using SS2000 SP4. I have a query that has been running for 17 hours
> >> >> > now
> >> >> > and
> >> >> > I'm wondering if it is doing anything. The transaction log for the
> >> >> > database
> >> >> > has the same time stamp on it that it had yesterday before I started
> >> >> > running
> >> >> > the query and the file size is the same. Shouldn't something have
> >> >> > been
> >> >> > written to the transaction logs by now?
> >> >> >
> >> >> > There is nothing else running on the server of any consequence.
> >> >> > There
> >> >> > is
> >> >> > disk activity (reads but no writes ) on the disk where the database
> >> >> > is.
> >> >> > There
> >> >> > are no reads or writes on the disk where the transaction logs are
> >> >> > located.
> >> >> > The processor is at 2%. There is no blocking going on.
> >> >> >
> >> >> > Here's the query. I have indexes on all columns in the join and
> >> >> > where
> >> >> > clause. There are about 11M records in the table. There is no
> >> >> > problem
> >> >> > with
> >> >> > free disk space. The tempdb and logs also have a time stamp before I
> >> >> > started
> >> >> > the query.
> >> >> >
> >> >> > UPDATE tblwf3
> >> >> > SET BRANCH = a.branch_id
> >> >> > from dbo.tblBranchZip a, tblwf3 b
> >> >> > WHERE a.zip = b.zip5
> >> >> > and a.branch_id < '2900'
> >> >> > and a.Franchise is null
> >> >> > and a.closed is null
> >> >> > and a.commercial is null
> >> >> > and a.fume is null
> >> >> > --and b.branch is null
> >> >> >
> >> >> > go
> >> >> >
> >> >> > Any idea what might be going on? Thanks.
> >> >> > --
> >> >> > Dan D.
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
>
>|||When you say that "this version is potentially different, because it will only
update rows in tblwf3 where there is a matching row in tblBranchZip. The
earlier syntax will update all rows in tblwf3, unless you add a WHERE
clause", I don't understand because each query has a join of "a.zip =tblwf3.zip5". Could you explain that a little more?
Thanks,
--
Dan D.
"Gert-Jan Strik" wrote:
> The syntax that ANSI has for that is not supported by SQL Server. It
> would be something along the lines of
> UPDATE ...
> SET (BRANCH, term, hhseg) = (
> SELECT branch_id, term, hhseg
> FROM ...
> WHERE ...zip = ...zip5
> )
> So then you have to make a choice. If you feel strongly about using the
> ANSI SQL standard syntax, you would get something like this:
> UPDATE tblwf3
> SET BRANCH = (
> SELECT a.branch_id
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> ), term=(
> SELECT a.term
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> ), hhseg=(
> SELECT a.hhseg
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> )
> However, chances are that this syntax (above) will not run very fast,
> because it it probably not optimized to do only one index seek/scan per
> row in tblwf3.
> So the other choice is to use the UPDATE ... FROM syntax as described
> before. See untested version below:
> UPDATE tblwf3
> SET BRANCH = a.branch_id, term=a.term, hhseg=a.hhseg
> from dbo.tblBranchZip a
> WHERE a.zip = tblwf3.zip5
> and a.branch_id < '2900'
> and a.Franchise is null
> and a.closed is null
> and a.commercial is null
> and a.fume is null
> Note that this version is potentially different, because it will only
> update rows in tblwf3 where there is a matching row in tblBranchZip. The
> earlier syntax will update all rows in tblwf3, unless you add a WHERE
> clause.
> Gert-Jan
>
> Dan D. wrote:
> >
> > I've tried your method on a couple of other queries and it is definitely
> > faster. How would you write the query if I wanted to update several rows. For
> > instance,
> > UPDATE tblwf3
> > SET BRANCH = a.branch_id, term=a.term, hhseg=a.hhseg
> > from dbo.tblBranchZip a inner join tblwf3 b
> > ON a.zip = b.zip5
> > and a.branch_id < '2900'
> > and a.Franchise is null
> > and a.closed is null
> > and a.commercial is null
> > and a.fume is null
> >
> > Thanks,
> >
> > --
> > Dan D.
> >
> > "Gert-Jan Strik" wrote:
> >
> > > As usual, TheSQLGuru is spot on, although I am not sure whether the SQL
> > > (s)he posted is actually standard.
> > >
> > > The version below is, and because it (also) eliminates the implicit
> > > cross join it should run in a reasonable amount of time.
> > >
> > > UPDATE tblwf3
> > > SET BRANCH = (
> > > SELECT a.branch_id
> > > from dbo.tblBranchZip a
> > > WHERE a.zip = tblwf3.zip5
> > > and a.branch_id < '2900'
> > > and a.Franchise is null
> > > and a.closed is null
> > > and a.commercial is null
> > > and a.fume is null
> > > )
> > > --where b.branch is null
> > >
> > > HTH,
> > > Gert-Jan
> > >
> > > Dan D. wrote:
> > > >
> > > > Using SS2000 SP4. I have a query that has been running for 17 hours now and
> > > > I'm wondering if it is doing anything. The transaction log for the database
> > > > has the same time stamp on it that it had yesterday before I started running
> > > > the query and the file size is the same. Shouldn't something have been
> > > > written to the transaction logs by now?
> > > >
> > > > There is nothing else running on the server of any consequence. There is
> > > > disk activity (reads but no writes ) on the disk where the database is. There
> > > > are no reads or writes on the disk where the transaction logs are located.
> > > > The processor is at 2%. There is no blocking going on.
> > > >
> > > > Here's the query. I have indexes on all columns in the join and where
> > > > clause. There are about 11M records in the table. There is no problem with
> > > > free disk space. The tempdb and logs also have a time stamp before I started
> > > > the query.
> > > >
> > > > UPDATE tblwf3
> > > > SET BRANCH = a.branch_id
> > > > from dbo.tblBranchZip a, tblwf3 b
> > > > WHERE a.zip = b.zip5
> > > > and a.branch_id < '2900'
> > > > and a.Franchise is null
> > > > and a.closed is null
> > > > and a.commercial is null
> > > > and a.fume is null
> > > > --and b.branch is null
> > > >
> > > > go
> > > >
> > > > Any idea what might be going on? Thanks.
> > > > --
> > > > Dan D.
> > >
>|||Dan,
When you use the UPDATE ... FROM syntax, what the engine basically does
is execute the query as specified with the FROM clause and predicates in
the WHERE clause and create a resultset. Only matching rows of this
resultset to the UPDATE table are updated.
So in your particular case, if there is no matching row for a zip5 in
tblwf3, then this row in tblwf3 will not be updated if you use the
UPDATE ... FROM syntax.
What TheSQLGuru was suggesting, was that you have even more serious
problems, because he suspects that you might have several rows for one
zip in tblBranchZip. And you basically said that that was true. This
fact will have its effect on the way the UPDATE statement is executed.
Let's say there is a row in tblwf3 with zip5 of '90210', and let's
assume that you have 5 rows for this zip in tblBranchZip, like this:
zip | branch_id
--+--
90210 | 1200
90210 | 1201
90210 | 2000
90210 | 2001
90210 | 2002
If you use the UPDATE ... FROM syntax, SQL Server will try to update the
row in tblwf3 for zip5 90210 at least 5 times. I am saying at least
here, because it depends on the number of times tblwf3 has a zip5 of
90210. If there are two rows in tblwf3 with a zip5 of 90210, then each
row will be updated 10 times.
If you use the UPDATE with a correlated query, then the correlated query
should return one scalar for each row in tblwf3. So if there are two
rows in tblwf3 with a zip5 of 90210, then for each of these 2 rows SQL
Server will find the branch_id and will update each row only once.
But your real problem is not that of performance. It is that of
function. Because look at the example above. Which branch_id should SQL
Server pick for zip 90210? There shouldn't be 5 possible answers. There
should be one (or zero) answer. That is why TheSQLGuru posted the
example using the MAX aggregate.
My advice would be: first figure out how to write the correct query
without the UPDATE, and make sure that there will only be one row for
each zip code. When that is all working well, then change it to an
UPDATE query.
Gert-Jan
Dan D. wrote:
> When you say that "this version is potentially different, because it will only
> update rows in tblwf3 where there is a matching row in tblBranchZip. The
> earlier syntax will update all rows in tblwf3, unless you add a WHERE
> clause", I don't understand because each query has a join of "a.zip => tblwf3.zip5". Could you explain that a little more?
> Thanks,
> --
> Dan D.
[snip]|||Gotcha on the "standard" thing. I figured since this was a Microsoft SQL
Server-only forum (and also since I only work with clients that are SQL
Server only) we were talking about Transact SQL as the "standard". :-D
--
TheSQLGuru
President
Indicium Resources, Inc.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:463388E3.EDE740BE@.toomuchspamalready.nl...
> Nice knowing you Kevin.
> Usually, when people talk about "standard" SQL, they refer to queries
> that conform to the ANSI SQL-92 standard, or ANSI SQL-99. AFAIK, those
> standards to not support UPDATE ... FROM.
> Gert-Jan|||Sorry I didn't have a chance to get back to this sooner. I appreciate your
help. The object of this is to match potential customers (in tblwf3) with
company branches (tblBranchZip). There can be more than one branch that
covers a particular zip code. It is usually only two branches at most but
sometimes more.
There is no data (like using zip+4) that will allow us to match a particular
customer better to a branch on the first pass. So what we've done in the past
is do a rough match like the one I'm asking about and then use other data in
the file like demographic data to more evenly distribute the customers
between two or more branches.
I'm still trying to understand the technical part of how each of these two
queries are processed. In the update ... from in your example, I'm not sure I
understand why a row could be processed 10 times. I thought that there would
be a result set of all of the rows in tblwf3 and that the program would loop
through that result set and for each row it would match to the tblbranchzip
table based on the join tblwf3.zip5=tblbranchzip.zip. In your example if
there are 5 rows in tblbranchzip for a particular zip code wouldn't it only
process 5 times rather than 10?
In the second example of your code with the correlated query, I don't
understand why it wouldn't process the same way - for each row of the tblwf3
result set, it joins to the subquery using the zip for the particular row of
tblwf3 that it's on and find a branch for that zip code. I know that must not
be how it works because if it was the subquery could return 2 rows or more
for some zip and the subquery would fail. I can also tell from the execution
plans that the two queries work differently (the first one has a nested loop
but not the second one) but I'm still not clear why. How does the subquery
not return more than row for some zips since there are two or more branches
in some zips?
I have what I think is a very good book on SS2005 T-SQL Querying by Ben-Gan
that I'm reading to help understand this better. We process mailfiles and the
faster and more accurate we can make the process, the better.
Thanks again,
--
Dan D.
"Gert-Jan Strik" wrote:
> Dan,
> When you use the UPDATE ... FROM syntax, what the engine basically does
> is execute the query as specified with the FROM clause and predicates in
> the WHERE clause and create a resultset. Only matching rows of this
> resultset to the UPDATE table are updated.
> So in your particular case, if there is no matching row for a zip5 in
> tblwf3, then this row in tblwf3 will not be updated if you use the
> UPDATE ... FROM syntax.
>
> What TheSQLGuru was suggesting, was that you have even more serious
> problems, because he suspects that you might have several rows for one
> zip in tblBranchZip. And you basically said that that was true. This
> fact will have its effect on the way the UPDATE statement is executed.
> Let's say there is a row in tblwf3 with zip5 of '90210', and let's
> assume that you have 5 rows for this zip in tblBranchZip, like this:
> zip | branch_id
> --+--
> 90210 | 1200
> 90210 | 1201
> 90210 | 2000
> 90210 | 2001
> 90210 | 2002
> If you use the UPDATE ... FROM syntax, SQL Server will try to update the
> row in tblwf3 for zip5 90210 at least 5 times. I am saying at least
> here, because it depends on the number of times tblwf3 has a zip5 of
> 90210. If there are two rows in tblwf3 with a zip5 of 90210, then each
> row will be updated 10 times.
> If you use the UPDATE with a correlated query, then the correlated query
> should return one scalar for each row in tblwf3. So if there are two
> rows in tblwf3 with a zip5 of 90210, then for each of these 2 rows SQL
> Server will find the branch_id and will update each row only once.
>
> But your real problem is not that of performance. It is that of
> function. Because look at the example above. Which branch_id should SQL
> Server pick for zip 90210? There shouldn't be 5 possible answers. There
> should be one (or zero) answer. That is why TheSQLGuru posted the
> example using the MAX aggregate.
> My advice would be: first figure out how to write the correct query
> without the UPDATE, and make sure that there will only be one row for
> each zip code. When that is all working well, then change it to an
> UPDATE query.
> Gert-Jan
>
> Dan D. wrote:
> >
> > When you say that "this version is potentially different, because it will only
> > update rows in tblwf3 where there is a matching row in tblBranchZip. The
> > earlier syntax will update all rows in tblwf3, unless you add a WHERE
> > clause", I don't understand because each query has a join of "a.zip => > tblwf3.zip5". Could you explain that a little more?
> >
> > Thanks,
> >
> > --
> > Dan D.
> [snip]
>|||Dan D. wrote:
[snip]
> I'm still trying to understand the technical part of how each of these two
> queries are processed. In the update ... from in your example, I'm not sure I
> understand why a row could be processed 10 times. I thought that there would
> be a result set of all of the rows in tblwf3 and that the program would loop
> through that result set and for each row it would match to the tblbranchzip
> table based on the join tblwf3.zip5=tblbranchzip.zip. In your example if
> there are 5 rows in tblbranchzip for a particular zip code wouldn't it only
> process 5 times rather than 10?
It all depends how the optimizer happens to generate the query plan. But
you think of it like this: the query will be performed first. A join
between
zip5
--
90210
90210
and
zip | branch_id
--+--
90210 | 1200
90210 | 1201
90210 | 2000
will result in 6 rows, which all happen to have the same zip code. Then,
this resultset is processed. For each row in the resultset, the UPDATE
table is looked up and updated. So it would do 6 scans and therefore
update both rows 6 times.
Please note that this is not necessarily how the UPDATE ... FROM query
is executed, but sometimes it is. You would have to check your query
plan to see what happens in your case.
> In the second example of your code with the correlated query, I don't
> understand why it wouldn't process the same way - for each row of the tblwf3
> result set, it joins to the subquery using the zip for the particular row of
> tblwf3 that it's on and find a branch for that zip code. I know that must not
> be how it works because if it was the subquery could return 2 rows or more
> for some zip and the subquery would fail.
Yes, in that case it should fail.
> I can also tell from the execution
> plans that the two queries work differently (the first one has a nested loop
> but not the second one) but I'm still not clear why.
The optimizer does not need a reason to execute two logically equivalent
queries with two different query plans. However, in this case I would
say that the query plan must be different, since the two queries are
logically not equivalent. One query updates all rows, the other query
updates at least 0 and at most all rows.
> How does the subquery not return more than row for some zips since there are two or more branches
> in some zips?
I don't know. You have the query and the data and therefore the means to
check it out. I don't even have the query plan(s).
[snip]
> > My advice would be: first figure out how to write the correct query
> > without the UPDATE, and make sure that there will only be one row for
> > each zip code. When that is all working well, then change it to an
> > UPDATE query.
Gert-Jan|||I now understand the difference now of why the update..from processes so many
more rows than the correlated subquery. I still don't know why the subquery
doesn't give an error because in some cases it should return more than one
row but I'll work on that.
Thanks for your patience and help Gert-Jan.
--
Dan D.
"Gert-Jan Strik" wrote:
> Dan D. wrote:
> [snip]
> > I'm still trying to understand the technical part of how each of these two
> > queries are processed. In the update ... from in your example, I'm not sure I
> > understand why a row could be processed 10 times. I thought that there would
> > be a result set of all of the rows in tblwf3 and that the program would loop
> > through that result set and for each row it would match to the tblbranchzip
> > table based on the join tblwf3.zip5=tblbranchzip.zip. In your example if
> > there are 5 rows in tblbranchzip for a particular zip code wouldn't it only
> > process 5 times rather than 10?
> It all depends how the optimizer happens to generate the query plan. But
> you think of it like this: the query will be performed first. A join
> between
> zip5
> --
> 90210
> 90210
> and
> zip | branch_id
> --+--
> 90210 | 1200
> 90210 | 1201
> 90210 | 2000
> will result in 6 rows, which all happen to have the same zip code. Then,
> this resultset is processed. For each row in the resultset, the UPDATE
> table is looked up and updated. So it would do 6 scans and therefore
> update both rows 6 times.
> Please note that this is not necessarily how the UPDATE ... FROM query
> is executed, but sometimes it is. You would have to check your query
> plan to see what happens in your case.
> > In the second example of your code with the correlated query, I don't
> > understand why it wouldn't process the same way - for each row of the tblwf3
> > result set, it joins to the subquery using the zip for the particular row of
> > tblwf3 that it's on and find a branch for that zip code. I know that must not
> > be how it works because if it was the subquery could return 2 rows or more
> > for some zip and the subquery would fail.
> Yes, in that case it should fail.
> > I can also tell from the execution
> > plans that the two queries work differently (the first one has a nested loop
> > but not the second one) but I'm still not clear why.
> The optimizer does not need a reason to execute two logically equivalent
> queries with two different query plans. However, in this case I would
> say that the query plan must be different, since the two queries are
> logically not equivalent. One query updates all rows, the other query
> updates at least 0 and at most all rows.
> > How does the subquery not return more than row for some zips since there are two or more branches
> > in some zips?
> I don't know. You have the query and the data and therefore the means to
> check it out. I don't even have the query plan(s).
> [snip]
> > > My advice would be: first figure out how to write the correct query
> > > without the UPDATE, and make sure that there will only be one row for
> > > each zip code. When that is all working well, then change it to an
> > > UPDATE query.
> Gert-Jan
>

No comments:

Post a Comment