Date comparison slow for large number of joined rows - SQL

The following query groups Snippets by ChannelId and returns an UnreadSnippetCount.
To determine the UnreadSnippetCount, Channels is joined onto ChannelUsers to fetch the date the user last read the channel, and this LastReadDate is used to limit the count to rows where the snippet was created after that point.
SELECT c.Id, COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
WHERE cu.UserId = @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
AND c.Id IN (SELECT value FROM STRING_SPLIT(@ChannelIds, ','))
GROUP BY c.Id
The query works well logically, but for Channels that have a large number of Snippets (97,691 in one case), it can take 10 minutes or more to return.
The following index is created:
CREATE NONCLUSTERED INDEX [IX_Snippets_CreatedDate] ON [dbo].[Snippets]
(
[CreatedDate] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
Update:
Query execution plan (original query):
https://www.brentozar.com/pastetheplan/?id=B19sI105F
Update 2
Moving the where clause into the join as suggested:
SELECT c.Id, COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND s.CreatedDate > cu.LastReadDate
WHERE cu.UserId = @UserId
AND c.Id IN (SELECT value FROM STRING_SPLIT(@ChannelIds, ','))
Produces this execution plan:
https://www.brentozar.com/pastetheplan/?id=HkqwFk0ct
Is there a better date comparison method I can use?
Update 3 - Solution
Index
CREATE NONCLUSTERED INDEX [IX_Snippet_Created] ON [dbo].[Snippets]
(ChannelId ASC, CreatedDate ASC) INCLUDE (CreatedByUserId);
Stored Proc
ALTER PROCEDURE [dbo].[GetUnreadSnippetCounts2]
(
@ChannelIds ChannelIdsType READONLY,
@UserId nvarchar(36)
)
AS
SET NOCOUNT ON
SELECT
c.Id,
COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
JOIN @ChannelIds cid
ON cid.Id = c.Id
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
AND cu.UserId = @UserId
JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
GROUP BY c.Id;
This gives the correct results logically and returns quickly.
Resulting execution plan:
https://www.brentozar.com/pastetheplan/?id=S1GwRCCcK
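The stored procedure above takes a table-valued parameter of a user-defined type named ChannelIdsType, whose definition isn't shown in the question. A minimal sketch of such a type might look like this; the single int Id column is an assumption inferred from the cid.Id = c.Id join:

-- Hypothetical definition; column name and type inferred from the join on cid.Id
CREATE TYPE dbo.ChannelIdsType AS TABLE
(
    Id int NOT NULL PRIMARY KEY  -- PRIMARY KEY also tells the optimizer the values are unique
);

Passing the channel IDs this way sidesteps both the string parsing and the cardinality-estimation problems that come with STRING_SPLIT.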

There are a number of inefficiencies I can see in the query plan.
Using STRING_SPLIT means the compiler does not know how many values are being returned or that they are unique, and the returned data type does not match the column it is compared against. Ideally you would pass in a table-valued parameter; if you cannot, another solution is to dump the values into a table variable first:
DECLARE @tmp TABLE (Id int PRIMARY KEY);
INSERT @tmp (Id)
SELECT value
FROM STRING_SPLIT(@ChannelIds, ',');
You need better indexing on Snippets. I would suggest the following
CREATE NONCLUSTERED INDEX [IX_Snippet_Created] ON [dbo].[Snippets]
(ChannelId ASC, CreatedDate ASC) INCLUDE (CreatedByUserId);
It doesn't make sense to place CreatedByUserId in the key, because it's an inequality predicate. Keep it in the INCLUDE.
As you have already been told, it's better to move the conditions on left-joined tables into the ON clause. I don't know if you then still need the cu.LastReadDate IS NULL check; I've left it in.
I must say, I'm unclear on your schema, but INNER JOIN ChannelUsers cu feels wrong here; perhaps it should be a LEFT JOIN? I cannot say more without seeing your full setup and required output.
SELECT
c.Id,
COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
JOIN @tmp t
ON t.Id = c.Id
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
AND cu.UserId = @UserId
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
GROUP BY c.Id;

Related

Get Help Tuning This Query - Microsoft SQL Server

It is really slow when using the website but when I try to run the exact same query directly in SQL Management Studio, it is quite fast.
Actual Execution Plan: https://www.brentozar.com/pastetheplan/?id=HkKs2Ad8q
It seems like the issue went away once I set the maximum rows to a static value. Perhaps the query wasn't being optimized well before.
SELECT
b.ID,
a.ID AS 'AuctionId',
au.UserName,
a.Name,
b.MaxBidAmount,
b.CurrentBidAmount,
b.BidDateTime,
b.Info
FROM Bids b
JOIN Auctions a on a.ID = b.AuctionID
JOIN AuctionGroups ag on ag.ID = a.AuctionGroupID
JOIN AspNetUsers au on au.Id = b.UserId
JOIN Parties p on p.Id = ag.PartyId
WHERE (p.DomainId = @domainId OR @domainId IS NULL)
AND (b.AuctionID = @auctionId OR @auctionId IS NULL)
ORDER BY b.ID DESC
OFFSET @startRowIndex ROWS
FETCH NEXT @maximumRows ROWS ONLY
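Slow from the application but fast in SSMS, combined with catch-all predicates of the form (column = @param OR @param IS NULL), is the classic parameter-sniffing symptom. One standard mitigation - an alternative to hard-coding the row count, and not something from the original post - is to append OPTION (RECOMPILE), which lets the optimizer fold the NULL checks away using the actual parameter values on each execution:

SELECT b.ID, a.ID AS 'AuctionId', au.UserName, a.Name,
       b.MaxBidAmount, b.CurrentBidAmount, b.BidDateTime, b.Info
FROM Bids b
JOIN Auctions a ON a.ID = b.AuctionID
JOIN AuctionGroups ag ON ag.ID = a.AuctionGroupID
JOIN AspNetUsers au ON au.Id = b.UserId
JOIN Parties p ON p.Id = ag.PartyId
WHERE (p.DomainId = @domainId OR @domainId IS NULL)
AND (b.AuctionID = @auctionId OR @auctionId IS NULL)
ORDER BY b.ID DESC
OFFSET @startRowIndex ROWS
FETCH NEXT @maximumRows ROWS ONLY
OPTION (RECOMPILE);  -- plan is rebuilt for the actual parameter values each run

The trade-off is a compile on every execution, which is usually acceptable for an infrequently run reporting query but not for a hot path.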

How to diagnose slow/inconsistent SQL Server query?

Running Windows Server 2012, Hyper-V, SQL Server 2012 Active/Passive failover cluster w/two 8-processor, 60GB nodes, single instance, 300 databases. This query produces inconsistent results, running anywhere between 10 and 30 seconds.
DECLARE @OrgID BigInt = 780246
DECLARE @ActiveOnly Bit = 0
DECLARE @RestrictToOrgID Bit = 0;
WITH og (OrgID, GroupID) AS
(
SELECT ID, ID FROM Common.com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, og.GroupID FROM Common.com.Organizations o JOIN og ON og.OrgID = o.ParentID
)
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
Replacing the LEFT JOIN Addresses with JOIN Addresses
JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
or reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
reduces the execution time to about .5 seconds.
In addition, using SELECT DISTINCT and joining books.Entities to Vendors
SELECT DISTINCT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN Vendors v
JOIN books.Entities e ON v.ID = e.ID
ON e.OrgID = bo.ID OR (e.OrgID = po.ID AND v.DistrictWide = 1)
Reduces the time to about .75 seconds.
Summary
These conditions indicate there is some kind of resource limitation in the SQL Server instance that is causing these erratic results and I don't know how to go about diagnosing it. If I copy the offending database to my laptop running SQL Server 2012, the problem does not present. I can continue to change the SQL around and hope for the best but I would prefer to find a more definitive solution.
Any suggestions are appreciated.
Update 2/27/18
The execution plan for the unmodified query shows a Clustered Index Seek against the Addresses table as the problem.
Reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
replaced the Clustered Index Seek with a Clustered Index Scan to retrieve a.Fax and a Hash Match to join this value to the results.
The Addresses table primary key is created as follows:
ALTER TABLE dbo.Addresses
ADD CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED (ID ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON PRIMARY
This index is defragged and optimized, as needed, every day.
So far, I can find nothing helpful as to why the Clustered Index Seek adds so much time to the query.
Ok, as is so often the case, there was not one problem, but two problems. This is an example of where complex problem analysis can lead to the wrong conclusions.
The primary problem turned out to be the recursive CTE og which returns a pivot table giving the parent/child relationships between organizations. However, analysis of the execution plans appeared to indicate the problem was some kind of glitch in the optimizer related to the amount of data being returned from a left-joined table. This may be entirely the result of my inability to properly analyze an execution plan but there does appear to be some issue in how SQL Server 2012 SP4 creates an execution plan under these circumstances.
While far more significant on our production server, the problem with SQL Server's optimization of recursive CTE was apparent on both my localhost, running 2012 SP4, and staging server, running SP2. But it took further analysis and some guesswork to see it.
The Solution
I replaced the recursive CTE with a pivot table and added a trigger to the Organizations table to maintain it.
USE Common
GO
CREATE VIEW com.OrganizationGroupsCTE
AS
WITH cte (OrgID, GroupID) AS
(
SELECT ID, ID FROM com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, cte.GroupID FROM com.Organizations o JOIN cte ON cte.OrgID = o.ParentID
)
SELECT OrgID, GroupID FROM cte
GO
CREATE TABLE com.OrganizationGroups
(
OrgID BIGINT,
GroupID BIGINT
)
INSERT com.OrganizationGroups
SELECT OrgID, GroupID
FROM com.OrganizationGroupsCTE
GO
CREATE TRIGGER TR_OrganizationGroups ON com.Organizations AFTER INSERT,UPDATE,DELETE
AS
DELETE og
FROM com.OrganizationGroups og
JOIN deleted d ON d.ID IN (og.groupID, og.orgID);
INSERT com.OrganizationGroups
SELECT orgID, groupID
FROM inserted i
JOIN OrganizationGroupsCTE cte ON i.ID IN (cte.orgID, cte.groupID)
GO
After modifying the query to use the pivot table,
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM Common.com.OrganizationGroups og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
SQL Server performance was improved, and consistent, in all three environments. Problems on the production server have now been eliminated.

Why do I get this unexpected SQL performance gain?

This is more a quiz question rather than me panicking over a deadline, however understanding how/why would no doubt let me scratch my head a little less!
So I have this UPDATE statement:
/*** @Table is a TABLE Variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is approximately 12 seconds.
Now below I have added in a small WHERE statement which does not have any impact on how many rows of data are returned to the user:
/*** @Table is a TABLE Variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
WHERE O.OSID >= 0 /*** Somehow fixes performance slow down! ***/
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is now less than a second. If I run both SELECT statements individually, they execute in the same time and return the same data.
Why do I get such a performance gain?
After reviewing the code, I noticed that adding a primary key and/or an index to the table variable did the trick! One for me to remember!
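For reference, the fix described - giving the table variable a primary key - might look like the sketch below. The real column list isn't shown in the question, so the columns and types here are assumptions based on the names used in the query (OSID, STID, PPTime):

-- Hypothetical definition; columns and types are assumed from the query above
DECLARE @Table TABLE
(
    OSID int NOT NULL,        -- assumed type
    STID int NOT NULL,        -- assumed type
    PPTime int NULL,          -- seconds, per the DATEDIFF(SECOND, ...) that populates it
    PRIMARY KEY (OSID, STID)  -- assumed key; gives the optimizer an index on the join column
);

Without any index, every probe into the table variable during the join is a full scan; the primary key gives the optimizer a usable index on OSID, which is what the UPDATE joins on.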

Deadlock on Bulk Delete, need better performance

I am doing a bulk delete on a set of IDs sent to a stored procedure as a comma-separated string. I have a function that splits these into a table so I can compare against them. I sometimes get a deadlock on this SP even though I have SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;. Is there a better way to do a bulk delete than this with an SP, for performance and no deadlocks?
DELETE FROM Game
WHERE Id IN (
SELECT g.Id
FROM Game g
INNER JOIN [EventGame] eg ON g.Id = eg.Id
INNER JOIN MemberEvent me ON me.EventId = eg.EventId
WHERE
eg.EventId = @EventId AND
g.Id IN (SELECT * FROM dbo.Split(@DeletedGameIds, ',')) AND
(g.[Type] = 1 OR g.[Type] IS NULL) AND
me.MemberId = @MemberId
)
A better way to delete is to first identify the list of IDs to be deleted, then delete through the clustered index using only those IDs.
CREATE Table #temp (Id Int)
Insert into #temp (Id)
SELECT Id FROM Game
WHERE Id IN (
SELECT g.Id
FROM Game g
INNER JOIN [EventGame] eg ON g.Id = eg.Id
INNER JOIN MemberEvent me ON me.EventId = eg.EventId
WHERE
eg.EventId = @EventId AND
g.Id IN (SELECT * FROM dbo.Split(@DeletedGameIds, ',')) AND
(g.[Type] = 1 OR g.[Type] IS NULL) AND
me.MemberId = @MemberId
)
-- you can run this delete in a loop with a batch size of 50000 rows, issuing a CHECKPOINT between batches
DELETE G
FROM Game G
Inner Join #temp t
ON G.ID = t.Id
--checkpoint
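The checkpoint comment above can be expanded into an explicit batching loop. This is a standard pattern rather than code from the original answer; the 50000 batch size comes from the comment, and CHECKPOINT only helps contain log growth under the SIMPLE recovery model:

-- Standard batched-delete pattern (sketch); batch size taken from the comment above
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) G
    FROM Game G
    INNER JOIN #temp t ON G.ID = t.Id;

    IF @@ROWCOUNT = 0 BREAK;  -- stop once a batch deletes nothing

    CHECKPOINT;  -- flush dirty pages so the log can be truncated (SIMPLE recovery only)
END

Smaller batches hold locks for a shorter time, which also shrinks the window in which a deadlock can occur.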

LINQ thinks I need an extra INNER JOIN, but why?

I have a LINQ query, which for some reason is generating an extra/duplicate INNER JOIN. This is causing the query to not return the expected output. If I manually comment that extra JOIN from the generated SQL, then I get seemingly correct output.
Can you detect what I might have done in this LINQ to have caused this extra JOIN?
Thanks.
Here is my approx LINQ
predicate=predicate.And(condition1);
predicate1=predicate1.And(condition2);
predicate1=predicate1.And(condition3);
predicate2=predicate2.Or(predicate1);
predicate=predicate.And(predicate2);
var ids = context.Code.Where(predicate);
var rs = from r in ids
group r by r.PersonID into g
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
where matchcount ==2
select new
{
personid = g.Key
};
and here is the generated SQL (the duplicate join is [t7])
Declare @p1 VarChar(10)='Home'
Declare @p2 VarChar(10)='111'
Declare @p3 VarChar(10)='Office'
Declare @p4 VarChar(10)='222'
Declare @p5 int=2
SELECT [t9].[PersonID] AS [pid]
FROM (
SELECT [t3].[PersonID], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneValue]
FROM [dbo].[Person] AS [t4]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t5] ON [t5].[PersonID] = [t4].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t6] ON [t6].[Code] = [t5].[PhoneType]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t7] ON [t7].[PersonID] = [t4].[PersonID]
WHERE ([t3].[PersonID] = [t4].[PersonID]) AND ([t6].[Enumeration] = @p0) AND ((([t6].[CodeDescription] = @p1) AND ([t5].[PhoneValue] = @p2)) OR (([t6].[CodeDescription] = @p3) AND ([t5].[PhoneValue] = @p4)))
) AS [t8]
) AS [value]
FROM (
SELECT [t0].[PersonID]
FROM [dbo].[Person] AS [t0]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t1] ON [t1].[PersonID] = [t0].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t2] ON [t2].[Code] = [t1].[PhoneType]
WHERE ([t2].[Enumeration] = @p0) AND ((([t2].[CodeDescription] = @p1) AND ([t1].[PhoneValue] = @p2)) OR (([t2].[CodeDescription] = @p3) AND ([t1].[PhoneValue] = @p4)))
GROUP BY [t0].[PersonID]
) AS [t3]
) AS [t9]
WHERE [t9].[value] = @p5
They aren't being duplicated. You are asking for two different values from the data source.
let matchcount=g.Select(p => p.phonenumbers.PhoneNum).Distinct().Count()
is causing
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t7].[PhoneValue]
FROM [dbo].[Person] AS [t4]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t5] ON [t5].[PersonID] = [t4].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t6] ON [t6].[Code] = [t5].[PhoneType]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t7] ON [t7].[PersonID] = [t4].[PersonID]
WHERE ([t3].[PersonID] = [t4].[PersonID]) AND ([t6].[Enumeration] = @p0) AND ((([t6].[CodeDescription] = @p1) AND ([t5].[PhoneValue] = @p2)) OR (([t6].[CodeDescription] = @p3) AND ([t5].[PhoneValue] = @p4)))
) AS [t8]
and
from r in ids
group r by r.PersonID into g
is causing
SELECT [t0].[PersonID]
FROM [dbo].[Person] AS [t0]
INNER JOIN [dbo].[PersonPhoneNumber] AS [t1] ON [t1].[PersonID] = [t0].[PersonID]
INNER JOIN [dbo].[CodeMaster] AS [t2] ON [t2].[Code] = [t1].[PhoneType]
WHERE ([t2].[Enumeration] = @p0) AND ((([t2].[CodeDescription] = @p1) AND ([t1].[PhoneValue] = @p2)) OR (([t2].[CodeDescription] = @p3) AND ([t1].[PhoneValue] = @p4)))
GROUP BY [t0].[PersonID]
) AS [t3]
As for the INNER JOINs, the reason you are getting them is the relationship between those tables. For instance, Person is 1..1 (or 1..*) with PersonPhoneNumber. In either case I assume PersonID on PersonPhoneNumber is both an FK and part of the PK. The data source has to go out to that table to see whether the value for the PersonPhoneNumber navigation property actually exists, and it does this by performing an INNER JOIN on that table.
My gut feeling is that the .Distinct().Count() is treated separately by the LINQ-to-SQL translation.
I'd also wager that the execution plan on SQL just threw out the dupe.
Try to rewrite with an explicit condition instead of that abstract "predicate" construction. From what I see in the SQL, that composition might look weird to a parser in isolation, and the join [t5], which you just called a dupe :-), is there to serve that condition.
Also, try to tell us what it is you really want to find with that query, and try to write normal SQL that does it. I'm supposed to be human :-) and it looks weird to me as well :-))
Technically speaking, you forced the double join by putting a condition on it in two separate queries (every var assignment is technically a separate query).
Also, doing a GROUP BY on a column without any aggregation is not always equivalent to SELECT DISTINCT. In particular, SELECT DISTINCT over a join is allowed to take precedence over the join - queries are declarative (they can undergo reordering), and you were trying to force them to be procedural. So LINQ gave you the exact procedural form :-) and then SQL Server reordered it according to SQL rules :-))
So, just write normal SQL first, and if you can't LINQ-ize it, put it into a sproc - that will make it faster anyway :-)