Need to sum columns: Nested Loops vs. Hash Match

I have two tables, Users and UserFile. Users.UserID is a PK, and
UserFile.UserID is a FK to Users. The relationship between Users and
UserFile is one-to-many.
I have the following two queries:
--Query 1
select count(*)
from Users
join UserFile
on UserFile.UserID = Users.UserID
where Users.UserLastActive > getdate() - 14
--Query 2
select count(*)
from Users
join UserFile
on UserFile.UserID = Users.UserID
where Users.UserLastActive > getdate() - 14
and UserFile.UserFileBlockTrades = 0
The only difference is that the last line of Query 2 is omitted in Query 1.
The selectivity of the Users filter is quite high -- about 6%. The
selectivity of the UserFile filter is quite low -- about 96%.
Here's the problem. Query 1 results in Index Seeks on both tables, with a
Nested Loops join joining the two tables. However, Query 2 results in an
Index Seek on Users, but an Index Scan on UserFile, as well as the more
costly Hash Match join joining the two tables. The result is, Query 2 has
about 4x CPU cost of Query 1.
Here's my theory.
Query 1 is saying (in English), take all Users (55,100), filter by
UserLastActive (3,226), and join the remaining records on UserFile (8,307).
Query 2 is saying, take all Users (55,100), filter by UserLastActive
(3,226), then take all UserFiles (68,617), filter by UserFileBlockTrades
(65,814), and join the two results (7,520).
Running a filter (UserFileBlockTrades) with such a low selectivity on such a
large set of records is obviously going to be costly. So, my question is,
is there a way to restructure my query so that SQL will do this instead:
Take all Users (55,100), filter by UserLastActive (3,226), join the
remaining records on UserFile (8,307), and filter by UserFileBlockTrades
(7,520).
The selectivity of the UserFileBlockTrades filter will still be low, but it
will be dealing with a much smaller set of data. I don't know if this is
possible, but I would think it would be, considering the fact that this
seems like something that would happen quite a bit.
I tried to use a derived table to force the low selectivity filter after
everything else has been done:
select count(*)
from (
select UserFile.*
from Users
join UserFile
on UserFile.UserID = Users.UserID
where Users.UserLastActive > getdate() - 14
) UserFile
where UserFile.UserFileBlockTrades = 0
But the execution plan was exactly the same as the one from Query 2, so this
had zero effect.
Surely this can be done.
Any ideas or suggestions? Let me know if you need any other info from me.
Thanks in advance for your help.
Jerad
I believe I posted this in the wrong group, so I will repost in
microsoft.public.sqlserver.programming. Please respond on that board, to
prevent duplicate posts. Sorry for the cross posting.
Thanks.
Jerad
"Jerad Rose" <no@.spam.com> wrote in message
news:utuFQ9o4FHA.276@.TK2MSFTNGP09.phx.gbl...
>I have two tables, Users and UserFile. Users.UserID is a PK, and
>UserFile.UserID is a FK to Users. The relationship between Users and
>UserFile is one-to-many.
> I have the following two queries:
> --Query 1
> select count(*)
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> --Query 2
> select count(*)
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> and UserFile.UserFileBlockTrades = 0
> The only difference is that the last line of Query 2 is omitted in Query
> 1. The selectivity of the Users filter is quite high -- about 6%. The
> selectivity of the UserFile filter is quite low -- about 96%.
> Here's the problem. Query 1 results in Index Seeks on both tables, with a
> Nested Loops join joining the two tables. However, Query 2 results in an
> Index Seek on Users, but an Index Scan on UserFile, as well as the more
> costly Hash Match join joining the two tables. The result is, Query 2 has
> about 4x CPU cost of Query 1.
> Here's my theory.
> Query 1 is saying (in English), take all Users (55,100), filter by
> UserLastActive (3,226), and join the remaining records on UserFile
> (8,307).
> Query 2 is saying, take all Users (55,100), filter by UserLastActive
> (3,226), then take all UserFiles (68,617), filter by UserFileBlockTrades
> (65,814), and join the two results (7,520).
> Running a filter (UserFileBlockTrades) with such a low selectivity on such
> a large set of records is obviously going to be costly. So, my question
> is, is there a way to restructure my query so that SQL will do this
> instead:
> Take all Users (55,100), filter by UserLastActive (3,226), join the
> remaining records on UserFile (8,307), and filter by UserFileBlockTrades
> (7,520).
> The selectivity of the UserFileBlockTrades filter will still be low, but
> it will be dealing with a much smaller set of data. I don't know if this
> is possible, but I would think it would be, considering the fact that this
> seems like something that would happen quite a bit.
> I tried to use a derived table to force the low selectivity filter after
> everything else has been done:
> select count(*)
> from (
> select UserFile.*
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> ) UserFile
> where UserFile.UserFileBlockTrades = 0
> But the execution plan was exactly the same as the one from Query 2, so
> this had zero effect.
> Surely this can be done.
> Any ideas or suggestions? Let me know if you need any other info from me.
> Thanks in advance for your help.
> Jerad
>
|||Hi Jerad
The answer to this most likely lies in indexing, not re-structuring the
query. What indexes are available on these tables? Running sp_helpindex
[tablename] against both table would give us more of a full picture.
As a started, I'd suggest that the following two indexes should be there,
but the recommendation could change depending on whether either table has
clustered indexes & on which columns..
Users (UserLastActive, UserID)
UserFile (UserID, UserFileBlockTrades)
HTH
Regards,
Greg Linwood
SQL Server MVP
"Jerad Rose" <no@.spam.com> wrote in message
news:utuFQ9o4FHA.276@.TK2MSFTNGP09.phx.gbl...
>I have two tables, Users and UserFile. Users.UserID is a PK, and
>UserFile.UserID is a FK to Users. The relationship between Users and
>UserFile is one-to-many.
> I have the following two queries:
> --Query 1
> select count(*)
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> --Query 2
> select count(*)
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> and UserFile.UserFileBlockTrades = 0
> The only difference is that the last line of Query 2 is omitted in Query
> 1. The selectivity of the Users filter is quite high -- about 6%. The
> selectivity of the UserFile filter is quite low -- about 96%.
> Here's the problem. Query 1 results in Index Seeks on both tables, with a
> Nested Loops join joining the two tables. However, Query 2 results in an
> Index Seek on Users, but an Index Scan on UserFile, as well as the more
> costly Hash Match join joining the two tables. The result is, Query 2 has
> about 4x CPU cost of Query 1.
> Here's my theory.
> Query 1 is saying (in English), take all Users (55,100), filter by
> UserLastActive (3,226), and join the remaining records on UserFile
> (8,307).
> Query 2 is saying, take all Users (55,100), filter by UserLastActive
> (3,226), then take all UserFiles (68,617), filter by UserFileBlockTrades
> (65,814), and join the two results (7,520).
> Running a filter (UserFileBlockTrades) with such a low selectivity on such
> a large set of records is obviously going to be costly. So, my question
> is, is there a way to restructure my query so that SQL will do this
> instead:
> Take all Users (55,100), filter by UserLastActive (3,226), join the
> remaining records on UserFile (8,307), and filter by UserFileBlockTrades
> (7,520).
> The selectivity of the UserFileBlockTrades filter will still be low, but
> it will be dealing with a much smaller set of data. I don't know if this
> is possible, but I would think it would be, considering the fact that this
> seems like something that would happen quite a bit.
> I tried to use a derived table to force the low selectivity filter after
> everything else has been done:
> select count(*)
> from (
> select UserFile.*
> from Users
> join UserFile
> on UserFile.UserID = Users.UserID
> where Users.UserLastActive > getdate() - 14
> ) UserFile
> where UserFile.UserFileBlockTrades = 0
> But the execution plan was exactly the same as the one from Query 2, so
> this had zero effect.
> Surely this can be done.
> Any ideas or suggestions? Let me know if you need any other info from me.
> Thanks in advance for your help.
> Jerad
>

Monday, March 19, 2012

Nested Loops vs. Hash Match

No comments:

Post a Comment

Need to sum columns

Blog Archive

About Me