Any Alternatives to Unions in SQL Server?


So, I'm in the process of creating a query that will bring back all the data I need.
Here's my query:
Declare @StartDate datetime
Set @StartDate = '2/1/2018'
Declare @EndDate datetime
Set @EndDate = '4/5/2018'
Declare @UserID int
Set @UserId = '153056'
;with EntData as
(
select
distinct (Entity_ID), a.user_ID, c.User_OrganizationalUnitID
from
ViewMgmt as a
inner join ViewConsole b on a.Role_ID = b.RoleID
inner join ViewUsers c on a.User_ID = c.UserID
where
b.RoleID in ( 53354666, 5363960) and
a.User_ID = @UserID and
a.Entity_ID <> 6912276036227
)
select a.User_ID, a.User_Name, a.UOName,
b.C_OID, c.OName,
d.CID, e.Affected
from view.a
inner join view_Cool.a1 on a.User_ID = a1.UserID and a.CI_D = a1.CID
inner join view_New.b on a.CI_D = b.C_ID
left join view_Large.c on b.C_OID = c.OID
left join view_Small.d on a.CI_D = d.CID
left join view_Old.e on d.Cert_ID = e.CI_D and a.User_ID = e.User_ID
inner join EntData on b.C_OID = EntData.Entity_ID
where ((a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate = a.ExpirationDate)
or (a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate is null)) and (a.UI <> 6912276036227 or a.UI <> 1414)
and a1.IsHidden = 0 and a.UCS <> 13
UNION
select a.User_ID, a.User_Name, a.UOName,
b.C_OID, c.OName,
d.CID, e.Affected
from view.a
inner join view_Cool.a1 on a.User_ID = a1.UserID and a.CI_D = a1.CID
inner join view_New.b on a.CI_D = b.C_ID
left join view_Large.c on b.C_OID = c.OID
left join view_Small.d on a.CI_D = d.CID
left join view_Old.e on d.Cert_ID = e.CI_D and a.User_ID = e.User_ID
inner join EntData g on a.UI = g.OID
where ((a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate = a.ExpirationDate)
or (a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate is null)) and a.UI <> 6912276036227
and a.UI = 1414
and (c.OName like '%VXA%' or c.OName like '%Amazon%' or a.CI_D in (414,4561))
and a1.IsHidden = 0 and a.UCS <> 13
Originally, the first part of the union was all that I had, but someone wanted to see extra data SPECIFICALLY for one ID (a.UI = 1414). I didn't want to bring back more data for all the UIs in the system, so I used a UNION to bring back the extra data for that one UI only. The data I want is coming back; however, instead of loading within a minute, the query now takes upwards of 4 minutes (versus 30-40 seconds for the first SELECT statement alone). I've been wrestling with this code for a while now and I'm ready to get it working efficiently.
I was wondering whether there is a way to do that last join (inner join EntData g on a.UI = g.OID, in the second part of the union) only when the UI equals 1414, rather than tacking on a completely separate SELECT statement, but I don't think that's possible. I tried moving that join into the first SELECT, but it did not run. I'm still a novice with SQL, so any help would be greatly appreciated.
Thank You.

Your two queries are almost the same. Try to merge them into one query. I think this query should return the same result, but it will include duplicate rows if there are any (UNION removes duplicates; this single query does not).
select a.User_ID, a.User_Name, a.UOName,
b.C_OID, c.OName,
d.CID, e.Affected
from view.a
inner join view_Cool.a1 on a.User_ID = a1.UserID and a.CI_D = a1.CID
inner join view_New.b on a.CI_D = b.C_ID
left join view_Large.c on b.C_OID = c.OID
left join view_Small.d on a.CI_D = d.CID
left join view_Old.e on d.Cert_ID = e.CI_D and a.User_ID = e.User_ID
inner join EntData on b.C_OID = EntData.Entity_ID
where ((a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate = a.ExpirationDate)
or (a.ExpirationDate between @StartDate and @EndDate and a.ExpirationDate is null)) and (a.UI <> 6912276036227)
and a1.IsHidden = 0 and a.UCS <> 13
and 1 = case
when a.UI = 1414 then
case
when c.OName like '%VXA%' or c.OName like '%Amazon%' or a.CI_D in (414,4561) then 1
else 0
end
else 1
end
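To check that a CASE filter like this really matches the UNION, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. It makes three assumptions. First, both branches read the same joined rows, modelled here by one hypothetical table `t`. Second, the first branch is meant to exclude UI 1414 (the AND reading of the two `<>` conditions). Third, there are no duplicate rows: UNION would remove them, and the CASE version would not.

```python
import sqlite3

# Hypothetical stand-in for the rows both UNION branches read: (ui, oname, ci_d).
ROWS = [
    (100, "Acme", 1),
    (1414, "VXA Corp", 2),
    (1414, "Other", 3),
    (1414, "Other", 414),
    (1414, "Amazon", 5),
    (6912276036227, "Excluded", 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ui INTEGER, oname TEXT, ci_d INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

EXTRA_FILTER = "(oname LIKE '%VXA%' OR oname LIKE '%Amazon%' OR ci_d IN (414, 4561))"

# Two branches: every UI except the excluded ones, plus filtered rows for UI 1414.
union_sql = f"""
SELECT ui, oname, ci_d FROM t WHERE ui <> 6912276036227 AND ui <> 1414
UNION
SELECT ui, oname, ci_d FROM t WHERE ui = 1414 AND {EXTRA_FILTER}
"""

# One pass: the CASE applies the extra filter only when ui = 1414.
case_sql = f"""
SELECT ui, oname, ci_d FROM t
WHERE ui <> 6912276036227
  AND 1 = CASE WHEN ui = 1414 THEN CASE WHEN {EXTRA_FILTER} THEN 1 ELSE 0 END
               ELSE 1 END
"""

union_rows = sorted(conn.execute(union_sql).fetchall())
case_rows = sorted(conn.execute(case_sql).fetchall())
print(union_rows == case_rows, len(case_rows))
```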

I refactored and added some notes.
DECLARE @StartDate datetime
SET @StartDate = '2018-02-01' --- Use ISO 8601 dates. YYYY-MM-DD
DECLARE @EndDate datetime
SET @EndDate = '2018-04-05' --- Use ISO 8601 dates. YYYY-MM-DD
DECLARE @UserID int
SET @UserId = 153056 --- Remove single quotes. You're assigning a string to an int.
; WITH EntData AS (
SELECT DISTINCT Entity_ID , OID /* This needs to be included for later JOIN. Remove the other columns you don't use. */
FROM ViewMgmt a --- USE ANSI-92 syntax. Pretty please.
INNER JOIN ViewConsole b on a.Role_ID = b.RoleID
AND b.RoleID in ( 53354666, 5363960 )
INNER JOIN ViewUsers c on a.User_ID = c.UserID
WHERE
a.User_ID = @UserID
AND a.Entity_ID <> 6912276036227
)
, UnionedQueries AS ( -- Combine the common parts of the UNIONed queries into a second CTE for reuse.
SELECT a.User_ID, a.User_Name, a.UOName
, b.C_OID
, c.OName
, d.CID
, e.Affected
, a.UI -- Added for UNION
, a.CI_D -- Added for UNION
FROM view.a a
INNER JOIN view_Cool.a1 a1 ON a.User_ID = a1.UserID
AND a.CI_D = a1.CID
AND a1.IsHidden = 0 --- Move this filter into the INNER JOIN. It will reduce the JOINed resultset.
INNER JOIN view_New.b b ON a.CI_D = b.C_ID
LEFT JOIN view_Large.c c ON b.C_OID = c.OID
/* These JOINs are connecting across multiple tables. Make sure they return what you expect. */
LEFT JOIN view_Small.d d ON a.CI_D = d.CID
LEFT JOIN view_Old.e e ON d.Cert_ID = e.CI_D
AND a.User_ID = e.User_ID
WHERE (
( a.ExpirationDate BETWEEN @StartDate AND @EndDate ) /* ??? and a.ExpirationDate = a.ExpirationDate ??? Typo? What was this supposed to do? */
OR
( a.ExpirationDate IS NULL ) --If this is checking for NULL, it won't be BETWEEN @StartDate and @EndDate
--- These two conditions could be combined as ISNULL(a.ExpirationDate, @StartDate) BETWEEN @StartDate AND @EndDate, but that would be a micro-optimization (and wrapping the column in ISNULL can stop an index on ExpirationDate from being used).
)
and a.UI <> 6912276036227 -- Keep UI 1414 here: the second SELECT filters on it. (The original "a.UI <> 6912276036227 or a.UI <> 1414" is true for every non-NULL UI, so it never excluded 1414.)
and a.UCS <> 13
)
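/* Both halves of the final UNION reuse the common query. Note that SQL Server inlines CTEs instead of caching their results, so the common query may still be evaluated once per reference; check the execution plan. */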
SELECT u1.User_ID, u1.User_Name, u1.UOName, u1.C_OID, u1.OName, u1.CID, u1.Affected
FROM UnionedQueries u1
INNER JOIN EntData ON u1.C_OID = EntData.Entity_ID -- This JOIN seems to be the only significant difference between the two queries.
UNION
SELECT u2.User_ID, u2.User_Name, u2.UOName, u2.C_OID, u2.OName, u2.CID, u2.Affected
FROM UnionedQueries u2
INNER JOIN EntData g on u2.UI = g.OID -- This JOIN seems to be the only significant difference between the two queries.
WHERE u2.UI = 1414
AND (
u2.OName LIKE '%VXA%'
OR u2.OName LIKE '%Amazon%'
OR u2.CI_D IN ( 414,4561 )
)
;
Note: This needs to be tested. I don't know how much the EntData CTE filters the data, so moving that join to the end may produce a much larger intermediate result set in the main queries.

Related

using CTE while declaring variables in SQL

I have two declared variables, and I am trying to set their values from the results of my CTEs.
declare @Total_new_claims_received int
declare @Total_Claims_Processed int
The end result I'm looking for is being able to set both declared variables from the CTE results:
select @Total_new_claims_received = count(id)
from cte
where benefit_code_id not in ('739')
select @Total_Claims_Processed = count(id)
from cte2
where benefit_code_id not in ('739')
Current Code:
declare @Total_new_claims_received int
declare @Total_Claims_Processed int
with cte (ID, Date)
as (
select c.id, c.date
from axiscore.dbo.claim c with (nolock)
inner join claim_line cl on c.Claim_ID = cl.Claim_ID and cl.Linenum = 1
left join axiscore.dbo.member_policy mp with (nolock) on c.Member_Policy_ID = mp.Member_Policy_ID
left join axiscore.dbo.policy p with (nolock) on mp.policy_id = p.policy_id
inner join axiscore.dbo.Claim_Status s with (nolock) on c.Claim_Status_ID = s.Claim_Status_ID
left outer join axiscore.dbo.Claim_Reason_Type r with (nolock) on c.Claim_Reason_Type_ID = r.Claim_Reason_Type_ID
where
c.Updated_Date between '10-1-2019' and '10-31-2019'
and p.Payor_ID = 8
and (c.Claim_Status_ID <> 8)),
cte2 (ID, Date)
as
(
select c.id, c.date
from axiscore.dbo.claim c with (nolock)
inner join claim_line cl on c.Claim_ID = cl.Claim_ID and cl.Linenum = 1
left join axiscore.dbo.member_policy mp with (nolock) on c.Member_Policy_ID = mp.Member_Policy_ID
left join axiscore.dbo.policy p with (nolock) on mp.policy_id = p.policy_id
inner join axiscore.dbo.Claim_Status s with (nolock) on c.Claim_Status_ID = s.Claim_Status_ID
left outer join axiscore.dbo.Claim_Reason_Type r with (nolock) on c.Claim_Reason_Type_ID = r.Claim_Reason_Type_ID
where
c.Updated_Date between '10-1-2019' and '10-31-2019'
and p.Payor_ID = 8
and (c.Claim_Status_ID in (7,6))
and (c.Claim_Reason_Type_ID not in (136,137)))
select @Total_new_claims_received = count(id)
from cte
where benefit_code_id not in ('739')
select @Total_Claims_Processed = count(id)
from cte2
where benefit_code_id not in ('739')
Currently, it's only setting the value for @Total_new_claims_received. It errors out on the second SELECT, where I set @Total_Claims_Processed; the error is "Invalid object name 'cte2'".
I'm using CTEs instead of temp tables because I'm calling this proc from an SSIS package, and SSIS packages don't handle temp tables well. Any other ideas are welcome as well.
Thanks for your time!

From WITH common_table_expression (Transact-SQL):
"A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all the CTE columns."
So define each of your CTEs immediately before the statement that uses it. Also, when a CTE is used in a batch, the statement before it must be terminated with a semicolon:
declare @Total_new_claims_received int;
declare @Total_Claims_Processed int;
with cte (ID, Date) as (
select c.id, c.date
from axiscore.dbo.claim c with (nolock)
inner join claim_line cl on c.Claim_ID = cl.Claim_ID and cl.Linenum = 1
left join axiscore.dbo.member_policy mp with (nolock) on c.Member_Policy_ID = mp.Member_Policy_ID
left join axiscore.dbo.policy p with (nolock) on mp.policy_id = p.policy_id
inner join axiscore.dbo.Claim_Status s with (nolock) on c.Claim_Status_ID = s.Claim_Status_ID
left outer join axiscore.dbo.Claim_Reason_Type r with (nolock) on c.Claim_Reason_Type_ID = r.Claim_Reason_Type_ID
where
c.Updated_Date between '10-1-2019' and '10-31-2019'
and p.Payor_ID = 8
and (c.Claim_Status_ID <> 8)
)
select @Total_new_claims_received = count(id)
from cte
where benefit_code_id not in ('739');
with cte2 (ID, Date) as (
select c.id, c.date
from axiscore.dbo.claim c with (nolock)
inner join claim_line cl on c.Claim_ID = cl.Claim_ID and cl.Linenum = 1
left join axiscore.dbo.member_policy mp with (nolock) on c.Member_Policy_ID = mp.Member_Policy_ID
left join axiscore.dbo.policy p with (nolock) on mp.policy_id = p.policy_id
inner join axiscore.dbo.Claim_Status s with (nolock) on c.Claim_Status_ID = s.Claim_Status_ID
left outer join axiscore.dbo.Claim_Reason_Type r with (nolock) on c.Claim_Reason_Type_ID = r.Claim_Reason_Type_ID
where
c.Updated_Date between '10-1-2019' and '10-31-2019'
and p.Payor_ID = 8
and (c.Claim_Status_ID in (7,6))
and (c.Claim_Reason_Type_ID not in (136,137))
)
select @Total_Claims_Processed = count(id)
from cte2
where benefit_code_id not in ('739');
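The scoping rule can be seen in a small sketch that uses Python's sqlite3 module in place of SQL Server. The `claim` table and its status values are simplified, hypothetical stand-ins. The sketch also shows one possible alternative to two CTEs: a single statement with conditional aggregation. In T-SQL that statement could assign both variables in one SELECT, assuming both counts can share the same joins and differ only in their status and reason filters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO claim VALUES (?, ?)", [(1, 6), (2, 7), (3, 8), (4, 1)]
)

# Each statement carries its own WITH clause; the CTE lives only for that statement.
received = conn.execute(
    "WITH cte AS (SELECT id FROM claim WHERE status <> 8) "
    "SELECT COUNT(id) FROM cte"
).fetchone()[0]
processed = conn.execute(
    "WITH cte2 AS (SELECT id FROM claim WHERE status IN (7, 6)) "
    "SELECT COUNT(id) FROM cte2"
).fetchone()[0]

# A later statement cannot see cte2 (SQL Server reports "Invalid object name").
try:
    conn.execute("SELECT COUNT(id) FROM cte2")
    scope_error = None
except sqlite3.OperationalError as exc:
    scope_error = str(exc)

# Alternative: one statement computes both counts with conditional aggregation.
both = conn.execute(
    "SELECT COUNT(CASE WHEN status <> 8 THEN id END), "
    "       COUNT(CASE WHEN status IN (7, 6) THEN id END) "
    "FROM claim"
).fetchone()

print(received, processed, scope_error, both)
```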

Query is working fine in second execution but taking too much time in first execution

I am writing a query that accepts a comma-separated string and calculates the sum of the transactions. The results are correct, but it takes too much time to execute on the first attempt. I understand it needs tuning, but I couldn't find the exact reason. Can anyone point out what's wrong with my query?
Declare @IDs nvarchar(max)='1,4,5,6,8,9,43,183'
SELECT isnull(isnull(SUM(FT.PaidAmt),0) - isnull(SUM(CT.PaidAmt),0),0) [Amount], convert(char(10),FT.TranDate,126) [Date]
from FeeTransaction FT
Inner Join (
Select max(P.Id) [Id], P.TranMainId, isnull(SUM(P.AmtToPay),0) [Amt]
From Patient_Account P
Group By P.TranMainId
) PA ON FT.Id = PA.TranMainId
Inner Join Patient_Account XP ON PA.Id = XP.Id
Inner Join Master_Fee MF ON XP.FeeId = MF.Id
INNER Join Master_Patient MP ON FT.PID = MP.Id
Inner Join Master_FeeType TY ON MF.FeeTypeId = TY.Id
Left JOIN FeeTransaction CT on FT.TransactionId = CT.TransactionId AND CT.TranDate between '2019'+'08'+'01' and '2019'+'08'+'31' and CT.[Status] <> 'A' AND isnull(CT.IsCancel,0) = 1
Where convert(nvarchar,FT.TranDate,112) between '2019'+'08'+'01' and '2019'+'08'+'31' AND FT.[Status] = 'A' AND XP.FeeId in (SELECT val FROM dbo.f_split(@IDs, ','))
AND isnull(FT.IsCancel,0) = 0 AND FT.EntryBy = 'rajan'
Group By convert(char(10),FT.TranDate,126)

I would rephrase the query a bit:
select coalesce(SUM(FT.PaidAmt), 0) - coalesce(SUM(CT.PaidAmt), 0) as [Amount],
convert(char(10),FT.TranDate,126) [Date]
from FeeTransaction FT join
(select xp.*,
coalesce(sum(xp.AmtToPay) over (partition by xp.TranMainId), 0) as amt,
max(xp.Id) over (partition by xp.TranMainId) as max_id
from Patient_Account xp
) xp
on FT.Id = xp.TranMainId and
xp.Id = xp.max_id join
Master_Fee MF
on XP.FeeId = MF.Id join
Master_Patient MP
on FT.PID = MP.Id join
Master_FeeType TY
on MF.FeeTypeId = TY.Id left join
FeeTransaction CT
on FT.TransactionId = CT.TransactionId and
CT.TranDate >= '20190801' and CT.TranDate < '20190901' and
CT.[Status] <> 'A' and
CT.IsCancel = 1
where FT.TranDate >= '20190801' and
FT.TranDate < '20190901' and
FT.[Status] = 'A' AND
XP.FeeId in (SELECT val FROM dbo.f_split(@IDs, ',')) and
(FT.IsCancel = 0 or FT.IsCancel IS NULL) and
FT.EntryBy = 'rajan'
Group By convert(char(10), FT.TranDate, 126)
Then, for this version, you specifically want an index on FeeTransaction(EntryBy, Status, TranDate, IsCancel).
Note the following changes:
You do not need to aggregate Patient_Account as a subquery. Window functions are quite convenient.
Your date comparisons preclude the use of indexes. Converting dates to strings is a bad practice in general.
You have over-used isnull().
I assume that the appropriate indexes are in place for the joins.
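Why the column order in that index matters can be sketched with SQLite's EXPLAIN QUERY PLAN, run through Python's sqlite3 as a stand-in for SQL Server. The two tables are hypothetical copies of FeeTransaction. With the equality columns (EntryBy, Status) first, the index can seek on all three predicates. With TranDate first, the seek stops at the date range. SQL Server's optimizer is different, so treat this as an illustration of the general B-tree rule rather than a prediction of its plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table, index_cols in [
    ("fee_eq_first", "EntryBy, Status, TranDate, IsCancel"),  # equality columns first
    ("fee_date_first", "TranDate, EntryBy, Status, IsCancel"),  # range column first
]:
    conn.execute(
        f"CREATE TABLE {table} (Id INTEGER PRIMARY KEY, EntryBy TEXT, Status TEXT, "
        "TranDate TEXT, IsCancel INTEGER, PaidAmt REAL)"
    )
    conn.execute(f"CREATE INDEX ix_{table} ON {table} ({index_cols})")


def plan(table):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for the answer's filter."""
    sql = (
        f"SELECT SUM(PaidAmt) FROM {table} "
        "WHERE EntryBy = 'rajan' AND Status = 'A' "
        "AND TranDate >= '2019-08-01' AND TranDate < '2019-09-01'"
    )
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


eq_first_plan = plan("fee_eq_first")
date_first_plan = plan("fee_date_first")
print(eq_first_plan)
print(date_first_plan)
```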

I would use STRING_SPLIT (available from SQL Server 2016, with database compatibility level 130 or higher) and common table expressions, and do away with the date conversions:
Declare @IDs nvarchar(max)='1,4,5,6,8,9,43,183'
;WITH CTE_ID AS
(
SELECT value AS ID FROM STRING_SPLIT(@IDs, ',')
),
MaxPatient
AS
(
SELECT MAX(P.Id) [Id], P.TranMainId, isnull(SUM(P.AmtToPay),0) [Amt]
From Patient_Account P
Group By P.TranMainId
)
SELECT isnull(isnull(SUM(FT.PaidAmt),0) - isnull(SUM(CT.PaidAmt),0),0) As [Amount],
CAST(FT.TranDate AS date) AS [Date]
FROM FeeTransaction FT
INNER JOIN MaxPatient PA
ON FT.Id = PA.TranMainId
INNER JOIN Patient_Account XP
ON PA.Id = XP.Id
INNER JOIN Master_Fee MF
ON XP.FeeId = MF.Id
INNER Join Master_Patient MP
ON FT.PID = MP.Id
INNER JOIN Master_FeeType TY
ON MF.FeeTypeId = TY.Id
INNER JOIN CTE_ID
ON XP.FeeId = CTE_ID.ID
LEFT JOIN FeeTransaction CT
ON FT.TransactionId = CT.TransactionId AND
CT.TranDate >= '20190801' AND CT.TranDate < '20190901' AND
CT.[Status] <> 'A' AND isnull(CT.IsCancel,0) = 1
WHERE FT.TranDate >= '20190801' and FT.TranDate < '20190901' AND
FT.[Status] = 'A' AND
ISNULL(FT.IsCancel,0) = 0 AND
FT.EntryBy = 'rajan'
GROUP BY CAST(FT.TranDate AS Date)
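For a datetime column, the safe month filter is a half-open range that ends at midnight on the first day of the next month. The minimal sketch below uses Python's sqlite3 as a stand-in for SQL Server and stores datetimes as ISO-8601 text with hypothetical sample rows. It compares three filters: BETWEEN up to '2019-08-31', an exclusive bound of '2019-08-31', and an exclusive bound of '2019-09-01'. Only the last keeps every transaction on 31 August.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FeeTransaction (Id INTEGER, TranDate TEXT)")
conn.executemany(
    "INSERT INTO FeeTransaction VALUES (?, ?)",
    [
        (1, "2019-08-01 00:00:00"),
        (2, "2019-08-15 09:30:00"),
        (3, "2019-08-31 00:00:00"),
        (4, "2019-08-31 17:45:00"),  # a time on the last day of the month
        (5, "2019-09-01 00:00:00"),
    ],
)


def ids(condition):
    """Ids matching a WHERE condition, in Id order."""
    sql = f"SELECT Id FROM FeeTransaction WHERE {condition} ORDER BY Id"
    return [row[0] for row in conn.execute(sql)]


# Bounds carry an explicit midnight, mimicking how SQL Server converts '20190831' to datetime.
between_ids = ids("TranDate BETWEEN '2019-08-01 00:00:00' AND '2019-08-31 00:00:00'")
exclusive_31st_ids = ids("TranDate >= '2019-08-01 00:00:00' AND TranDate < '2019-08-31 00:00:00'")
half_open_ids = ids("TranDate >= '2019-08-01 00:00:00' AND TranDate < '2019-09-01 00:00:00'")
print(between_ids, exclusive_31st_ids, half_open_ids)
```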

Not only is your query slow, but it also appears to give incorrect output.
i) If you are not using any column of Patient_Account in your result set, why are you writing this subquery?
Select max(P.Id) [Id], P.TranMainId, isnull(SUM(P.AmtToPay),0) [Amt]
From Patient_Account P
Group By P.TranMainId
ii) Avoid using <>. If Status can only be 'A' or 'I', write CT.[Status] = 'I' instead.
iii) What is the actual data type of TranDate? Don't apply functions to columns in the WHERE condition.
iv) There is no need for isnull(CT.IsCancel,0) = 1; write CT.IsCancel = 1 instead.
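Points ii and iv both rest on how comparisons treat NULL. Here is a minimal sketch with Python's sqlite3, standing in for SQL Server. The `ct` table is hypothetical, and Status is assumed to hold only 'A', 'I' or NULL, as point ii supposes. It shows that ISNULL(IsCancel, 0) = 1 and IsCancel = 1 select the same rows. It also shows that Status <> 'A' already excludes NULL statuses, so under that assumption it matches Status = 'I'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ct (Id INTEGER, Status TEXT, IsCancel INTEGER)")
conn.executemany(
    "INSERT INTO ct VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "I", 1), (3, "I", None), (4, None, 0), (5, "I", 0)],
)


def ids(condition):
    """Ids matching a WHERE condition, in Id order."""
    return [r[0] for r in conn.execute(f"SELECT Id FROM ct WHERE {condition} ORDER BY Id")]


# IFNULL is SQLite's two-argument counterpart of T-SQL ISNULL.
isnull_ids = ids("IFNULL(IsCancel, 0) = 1")
plain_ids = ids("IsCancel = 1")        # NULL = 1 is unknown, so row 3 drops out either way

not_a_ids = ids("Status <> 'A'")       # NULL <> 'A' is also unknown, so row 4 is excluded
equals_i_ids = ids("Status = 'I'")
print(isnull_ids, plain_ids, not_a_ids, equals_i_ids)
```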
My script is just an outline, but it is easy to understand.
Declare @IDs nvarchar(max)='1,4,5,6,8,9,43,183'
create table #temp(id int)
insert into #temp(id)
SELECT val FROM dbo.f_split(@IDs, ',')
declare @FromDate Datetime='2019-08-01'
declare @toDate Datetime='2019-09-01' -- exclusive upper bound, so all of 2019-08-31 is included
-- In practice, list only the FeeTransaction columns this query needs, with their correct data types.
-- TranDay stores the date-only value that convert(char(10),FT.TranDate,126) produced in the original query.
select FT.*, cast(FT.TranDate as date) as TranDay
into #Transaction
from FeeTransaction FT
where FT.TranDate>=@FromDate and FT.TranDate<@toDate
-- Likewise, list only the Patient_Account columns you need.
Select max(P.Id) [Id], P.TranMainId, isnull(SUM(P.AmtToPay),0) [Amt]
into #Patient_Account
From Patient_Account P
where exists(select 1 from #Transaction T where T.Id=P.TranMainId)
Group By P.TranMainId
SELECT isnull(isnull(SUM(FT.PaidAmt),0) - isnull(SUM(CT.PaidAmt),0),0) [Amount], FT.TranDay [Date]
from #Transaction FT
Inner Join #Patient_Account PA ON FT.Id = PA.TranMainId
Inner Join Patient_Account XP ON PA.Id = XP.Id
Inner Join Master_Fee MF ON XP.FeeId = MF.Id
INNER Join Master_Patient MP ON FT.PID = MP.Id
Inner Join Master_FeeType TY ON MF.FeeTypeId = TY.Id
Left JOIN #Transaction CT on FT.TransactionId = CT.TransactionId AND CT.[Status] = 'I' AND CT.IsCancel = 1
Where FT.[Status] = 'A' AND XP.FeeId in (SELECT id FROM #temp t)
AND FT.IsCancel = 0 AND FT.EntryBy = 'rajan'
Group By FT.TranDay

Must declare the scalar variable SQL Server?

SELECT CD.CartId
,PR.Name
,PR.SKU
,CD.Quantity
,CD.Price
,CD.Total
,CD.IsAddedFromWidget
,CD.WidgetSlotLabel
,CD.AddToCartDate
,CO.UpdatedDate AS [CheckoutDate]
,CD.PurchaseDate {from [Tracking].[CartDetail] CD
INNER JOIN [Tracking].[Cart] C ON CD.CartId = C.Id
INNER JOIN [Tracking].[Product] PR ON CD.ProductId = PR.Id
INNER JOIN [Tracking].[Checkout] CO ON C.$NODE_ID = CO.$TO_ID
WHERE C.WebsiteId = @Websited
AND C.STATUS = 20
AND CD.PurchaseDate >= @FromDate
AND CD.PurchaseDate <= @ToDate
ORDER BY CD.PurchaseDate DESC
,CD.CartId DESC}
However, I am getting the error:
Must declare the scalar variable "@WebsiteID".

Here is an example of what your declarations might look like.
DECLARE @WebsiteID uniqueidentifier = NEWID()
,@FromDate DATETIME = GETDATE() - 1
,@ToDate DATETIME = GETDATE()
SELECT CD.CartId
,PR.Name
,PR.SKU
,CD.Quantity
,CD.Price
,CD.Total
,CD.IsAddedFromWidget
,CD.WidgetSlotLabel
,CD.AddToCartDate
,CO.UpdatedDate AS [CheckoutDate]
,CD.PurchaseDate
FROM [Tracking].[CartDetail] CD
INNER JOIN [Tracking].[Cart] C ON CD.CartId = C.Id
INNER JOIN [Tracking].[Product] PR ON CD.ProductId = PR.Id
INNER JOIN [Tracking].[Checkout] CO ON C.$NODE_ID = CO.$TO_ID
WHERE C.WebsiteId = @WebsiteID
AND C.STATUS = 20
AND CD.PurchaseDate >= @FromDate
AND CD.PurchaseDate <= @ToDate
ORDER BY CD.PurchaseDate DESC
,CD.CartId DESC
Also, it appears to me that @Websited is a typo, so I've used @WebsiteID.
I've also removed the invalid { } braces from your code.
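Additionally, the column names $NODE_ID and $TO_ID look a little unusual (the $ prefix is what I mean). They are valid only as SQL Server graph pseudo-columns (SQL Server 2017 and later): $NODE_ID exists on node tables and $TO_ID on edge tables, so this join only works if [Tracking].[Cart] is a node table and [Tracking].[Checkout] is an edge table.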

Slow query when using NOT EXIST in Query

I would like to seek some help regarding the query below.
Running this script causes the system to time out. The query is so slow that it took 5 minutes to run for just 22 records. I believe this has something to do with the NOT IN clause. I have already looked for answers here on Stack Overflow, and some suggest using LEFT OUTER JOIN or WHERE NOT EXISTS, but I can't work out how to incorporate them into this query.
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.ID
NOT IN (
SELECT DISTINCT(COALESCE(a.activitylogid, 0))
FROM [CustomerNoteInteractions] a WITH(NOLOCK)
WHERE a.reason IN ('20', '36') AND CAST(a.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.UserId IN (SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
)
AND a.UserId IN (
SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
GROUP BY a.UserId

Here is what should be an equivalent query using EXISTS and NOT EXISTS:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND CAST(b.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' )
GROUP BY a.UserId
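Besides performance, NOT EXISTS sidesteps a correctness trap that the question's COALESCE(a.activitylogid, 0) is guarding against. If the NOT IN subquery returns a NULL, NOT IN matches no rows at all. Here is a small sketch with Python's sqlite3 standing in for SQL Server, which follows the same standard NULL semantics under the default ANSI_NULLS ON. The tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (ID INTEGER)")
conn.execute("CREATE TABLE notes (activitylogid INTEGER)")
conn.executemany("INSERT INTO activity VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO notes VALUES (?)", [(2,), (None,)])  # one note has no activity log id


def ids(sql):
    return [row[0] for row in conn.execute(sql)]


# NOT IN against a list containing NULL is never true, so nothing comes back.
not_in_ids = ids(
    "SELECT ID FROM activity WHERE ID NOT IN (SELECT activitylogid FROM notes) ORDER BY ID"
)
# The question's COALESCE(..., 0) works around that, assuming 0 is never a real ID.
coalesce_ids = ids(
    "SELECT ID FROM activity "
    "WHERE ID NOT IN (SELECT COALESCE(activitylogid, 0) FROM notes) ORDER BY ID"
)
# NOT EXISTS only asks whether a matching row exists, so NULLs need no special handling.
not_exists_ids = ids(
    "SELECT a.ID FROM activity a WHERE NOT EXISTS "
    "(SELECT 1 FROM notes n WHERE n.activitylogid = a.ID) ORDER BY a.ID"
)
print(not_in_ids, coalesce_ids, not_exists_ids)
```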
Obviously, it's hard to understand what will truly help your performance without understanding your data. But here is what I expect:
I think the EXISTS/NOT EXISTS version of the query will help.
I think your conditions on UserActivityLog.ActivityDateTime and CustomerNoteInteractions.datecreated are a problem. Why are you casting? Is it not a date type? If not, why not? You would probably get big gains if you could take advantage of an index on those columns. But with the cast, I don't think you can use an index there. Can you do something about it?
You'll also probably benefit from indexes on User.Id (probably the PK anyway) and CustomerNoteInteractions.ActivityLogId.
Also, I'm not a big fan of using WITH (NOLOCK) to improve performance (see "Bad habits: Putting NOLOCK everywhere").
EDIT
If your date columns are of type DateTime as you mention in the comments, and so you are using the CAST to eliminate the time portion, a much better alternative for performance is to not cast, but instead modify the way you filter the column. Doing this will allow you to take advantage of any index on the date column. It could make a very big difference.
The query could then be further improved like this:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE a.ActivityDatetime >= '2015-09-28'
AND a.ActivityDatetime < dateadd(day, 1, '2015-09-30')
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND b.datecreated >= '2015-09-28'
AND b.datecreated < dateadd(day, 1, '2015-09-30'))
GROUP BY a.UserId
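The effect of casting the column can be sketched in SQLite via Python's sqlite3, used as a stand-in for SQL Server. The table holds hypothetical rows, with datetimes stored as ISO-8601 text. Both filters return the same rows, but only the bare-column range lets the planner seek on the index. SQL Server may handle some conversions, such as CAST to date, more cleverly than SQLite does, so confirm with the actual execution plan there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserActivityLog (ID INTEGER PRIMARY KEY, ActivityDatetime TEXT)"
)
conn.execute("CREATE INDEX ix_activity_dt ON UserActivityLog (ActivityDatetime)")
conn.executemany(
    "INSERT INTO UserActivityLog VALUES (?, ?)",
    [
        (1, "2015-09-27 23:59:59"),
        (2, "2015-09-28 00:00:00"),
        (3, "2015-09-30 23:59:59"),
        (4, "2015-10-01 00:00:00"),
    ],
)

# date() strips the time portion, like CAST(... AS DATE) in the question.
CAST_STYLE = "date(ActivityDatetime) BETWEEN '2015-09-28' AND '2015-09-30'"
# Half-open range on the bare column: from the first day up to, not including, the day after the last.
RANGE_STYLE = (
    "ActivityDatetime >= '2015-09-28' "
    "AND ActivityDatetime < date('2015-09-30', '+1 day')"
)


def ids(condition):
    sql = f"SELECT ID FROM UserActivityLog WHERE {condition} ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


def plan(condition):
    sql = f"EXPLAIN QUERY PLAN SELECT ID FROM UserActivityLog WHERE {condition}"
    return " | ".join(str(row[-1]) for row in conn.execute(sql))


cast_ids, range_ids = ids(CAST_STYLE), ids(RANGE_STYLE)
cast_plan, range_plan = plan(CAST_STYLE), plan(RANGE_STYLE)
print(cast_ids, range_ids)
print(cast_plan)   # a SCAN: wrapping the column in date() rules out an index seek
print(range_plan)  # a SEARCH on ix_activity_dt using both range bounds
```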

This should get you pretty close, or work exactly as is:
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
inner join [User] b with (Nolock) on a.userid = b.id
and b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1
left outer join ([CustomerNoteInteractions] c with (nolock)
inner join [User] d with (nolock) on c.userid = d.id
and d.UserType = 'EpicUser' AND d.IsEpicEmployee = 1 AND d.IsActive = 1)
on a.id = c.activitylogid
and c.reason IN ('20', '36') AND CAST(c.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
and c.activitylogid is null
GROUP BY a.UserId
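A LEFT JOIN ... IS NULL anti-join only matches NOT EXISTS when every condition on the excluded rows is applied inside the joined side. That includes the [User] checks on the note's author. The sketch below uses Python's sqlite3 as a stand-in for SQL Server, with hypothetical simplified tables; `users.active` represents the three Epic-user flags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity (ID INTEGER);
    CREATE TABLE notes (activitylogid INTEGER, userid INTEGER);
    CREATE TABLE users (id INTEGER, active INTEGER);
    INSERT INTO activity VALUES (1), (2), (3);
    INSERT INTO users VALUES (10, 1), (20, 0);
    -- activity 1 has a note by an active user, activity 2 only by an inactive one
    INSERT INTO notes VALUES (1, 10), (2, 20);
    """
)


def ids(sql):
    return [row[0] for row in conn.execute(sql)]


# Two independent LEFT JOINs: the user filter never removes the note, so activity 2 is lost.
naive_ids = ids(
    """
    SELECT a.ID FROM activity a
    LEFT JOIN notes n ON n.activitylogid = a.ID
    LEFT JOIN users u ON u.id = n.userid AND u.active = 1
    WHERE n.activitylogid IS NULL ORDER BY a.ID
    """
)
# Filter the notes by author first, then anti-join against that set.
anti_join_ids = ids(
    """
    SELECT a.ID FROM activity a
    LEFT JOIN (SELECT n.activitylogid FROM notes n
               JOIN users u ON u.id = n.userid AND u.active = 1) x
      ON x.activitylogid = a.ID
    WHERE x.activitylogid IS NULL ORDER BY a.ID
    """
)
not_exists_ids = ids(
    """
    SELECT a.ID FROM activity a
    WHERE NOT EXISTS (SELECT 1 FROM notes n
                      JOIN users u ON u.id = n.userid AND u.active = 1
                      WHERE n.activitylogid = a.ID)
    ORDER BY a.ID
    """
)
print(naive_ids, anti_join_ids, not_exists_ids)
```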

Query takes forever to output info - tips on optimization

I have a query that works awesomely, but it takes about 10 minutes to load, which is insane. I would like it to run faster than it currently does.
I was wondering if there were any tips I could take to optimize my query to make it run faster?
select DISTINCT
c.PaperID,
cdd.CodesF,
c.PageCount,
prr.projectname,
u.firstname + ' ' + u.lastname as Name,
ett.EventName,
cast(c.AssignedDate as DATE) [AssignedDate],
cast(ev.EventCompletionDate as DATE) [CompletionDate],
ar.ResultDescription,
a.Editor
from tbl_Papers c
left outer join (select cd.PaperId, count(*) as CodesF
from tbl_PaperCodes cd group by cd.PaperId) cdd
on cdd.PaperId = c.PaperId
left outer join
(SELECT
wfce.PaperEventActionNum,
c.PaperId,
CONVERT(varchar,wfce.ActionDate,101) CompletionDate,
pr.ProjectName,
wfce.ActionUserId,
u.firstname+' '+u.lastname [Editor]
FROM
dbo.tbl_WFPaperEventActions wfce
INNER JOIN dbo.tbl_Papers c ON wfce.PaperId = c.PaperId
INNER JOIN tbl_Providers p ON p.ProviderID = c.ProviderID
INNER JOIN tbl_Sites s ON s.SiteID = p.SiteID
INNER JOIN tbl_Projects pr ON s.ProjectId=pr.ProjectId
INNER JOIN tbl_Users u ON wfce.ActionUserId=u.UserId
WHERE
wfce.EventId = 204
AND c.Papersource =0
GROUP BY
wfce.PaperEventActionNum,
c.PaperId,
CONVERT(varchar,wfce.ActionDate,101),
pr.ProjectName,
wfce.ActionUserId,
u.firstname+' '+u.lastname
)a ON a.PaperId=c.PaperId,
tbl_Providers p, tbl_Sites s,
tbl_Projects prr, tbl_WFPaperEvents ev,
tbl_Users u, tbl_WFPaperEventTypes ett,
tbl_WFPaperEventActions arr, tbl_WFPaperEventActionResults ar
where s.SiteId = p.SiteId
and p.ProviderId = c.ProviderId
and s.ProjectId = prr.ProjectId
and ev.PaperId = c.PaperId
and ev.EventCreateUserId = u.UserId
and ev.EventCompletionDate >= dateadd(day,datediff(day,1,GETDATE()),0)
and ev.EventCompletionDate < dateadd(day,datediff(day,0,GETDATE()),0)
and ev.EventStatusId = 3
and ev.EventId in (201, 203)
and c.Papersource =0--Offshore
and ev.EventId=ett.EventID
and arr.PaperId=c.PaperId
and arr.EventId=ev.EventId
and arr.EventId=ar.EventID
and arr.ActionResultId=ar.ResultID
and arr.ActionResultId in (1,2,3,4)
order by paperid, u.FirstName + ' ' + u.LastName

You need to re-look carefully at every piece of this query and ask yourself, is that needed?
Take the subquery with alias a.
It joins 6 tables, but if you trace it up to your final SELECT clause, only [Editor] is supplied from that alias. So do you need 6 tables to arrive at the editor? No, you don't; in fact you only need 2: tbl_WFPaperEventActions and tbl_Users. Furthermore, this subquery groups by 6 items, including a date, but 3 of those items are not used anywhere else in the overall query, so why include them in the grouping? Removing them allows us to drop 3 of the joined tables.
Of the remaining 3 grouping items, 1 more can be substituted to avoid the join between tbl_WFPaperEventActions and tbl_Papers: because the join condition is "wfce.PaperId = c.PaperId", all we need is to group by wfce.PaperId instead of c.PaperId.
Finally, consider the field wfce.PaperEventActionNum: the subquery supplies it, but the larger query never uses it. Why provide a field that isn't used? It turns out that it should be used to complete a join: the subquery aliased as a needs to be joined into the outer query on both PaperEventActionNum and PaperId. This, by the way, also requires moving the whole subquery further down the join structure (after arr is joined) to comply with ANSI join syntax rules.
Never "mix" ANSI join syntax with joins done "the old-fashioned way" (tables listed with commas in the FROM clause and join conditions in the WHERE clause).
This really is a recipe for disaster.
Below I have "started" some amendments to your query, but I cannot really complete it as I have no way to test any part of it; and I don't know your data model at all.
Personally, I would re-start this query from scratch, starting lean and adding item by item to ensure it remains lean.
SELECT DISTINCT /* distinct isn't a good solution here */
c.PaperID
, cdd.CodesF
, c.PageCount
, prr.projectname
, u.firstname + ' ' + u.lastname AS Name
, ett.EventName
, CAST(c.AssignedDate AS date) [AssignedDate]
, CAST(ev.EventCompletionDate AS date) [CompletionDate]
, ar.ResultDescription
, a.Editor
FROM tbl_Papers c
LEFT OUTER JOIN ( -- can this be an inner join instead?
SELECT
cd.PaperId
, COUNT(*) AS CodesF
FROM tbl_PaperCodes cd
GROUP BY
cd.PaperId
) cdd
ON cdd.PaperId = c.PaperId
INNER JOIN tbl_Providers p ON c.ProviderId = p.ProviderId
INNER JOIN tbl_Sites s ON p.SiteId = s.SiteId
INNER JOIN tbl_Projects prr ON s.ProjectId = prr.ProjectId
INNER JOIN tbl_WFPaperEvents ev ON c.PaperId = ev.PaperId
INNER JOIN tbl_Users u ON ev.EventCreateUserId = u.UserId
INNER JOIN tbl_WFPaperEventTypes ett ON ev.EventId = ett.EventID
INNER JOIN tbl_WFPaperEventActions arr ON c.PaperId = arr.PaperId
AND ev.EventId = arr.EventId
INNER JOIN tbl_WFPaperEventActionResults ar ON arr.EventId = ar.EventID
AND arr.ActionResultId = ar.ResultID
AND arr.ActionResultId IN (1, 2, 3, 4)
LEFT OUTER JOIN (
SELECT
wfce.PaperEventActionNum
, wfce.PaperId
--, c.PaperId
--, CONVERT(varchar, wfce.ActionDate, 101) CompletionDate -- cast to date here
--, pr.ProjectName
--, wfce.ActionUserId
, u.firstname + ' ' + u.lastname [Editor]
FROM dbo.tbl_WFPaperEventActions wfce
--INNER JOIN dbo.tbl_Papers c ON wfce.PaperId = c.PaperId
--INNER JOIN tbl_Providers p ON p.ProviderID = c.ProviderID
--INNER JOIN tbl_Sites s ON s.SiteID = p.SiteID
--INNER JOIN tbl_Projects pr ON s.ProjectId = pr.ProjectId
INNER JOIN tbl_Users u ON wfce.ActionUserId = u.UserId
WHERE wfce.EventId = 204
--AND c.Papersource = 0 -- c is no longer joined here; the outer query already filters on it
GROUP BY
wfce.PaperEventActionNum
, wfce.PaperId
--, c.PaperId
--, CONVERT(varchar, wfce.ActionDate, 101)
--, pr.ProjectName
--, wfce.ActionUserId
, u.firstname + ' ' + u.lastname
) a
ON c.PaperId = a.PaperId AND arr.PaperEventActionNum = a.PaperEventActionNum
WHERE ev.EventCompletionDate >= DATEADD(DAY, DATEDIFF(DAY, 1, GETDATE()), 0)
AND ev.EventCompletionDate < DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)
AND ev.EventStatusId = 3
AND ev.EventId IN (201, 203)
AND c.Papersource = 0--Offshore
ORDER BY
paperid, u.FirstName + ' ' + u.LastName
I really do hate DISTINCT. It is nasty. It does not solve problems; it just hides them, and slows everything down to do the hiding.
Use DISTINCT in inverse proportion to query complexity:
If a query is really simple, you can use DISTINCT.
If a query is complex, do not use DISTINCT.
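To illustrate how DISTINCT can hide a problem rather than fix it, here is a sketch with Python's sqlite3 (standing in for SQL Server). It uses two hypothetical tables modelled loosely on tbl_Papers and tbl_PaperCodes. The join fans each paper out once per code. DISTINCT tidies the listing, but a total computed over the same join is still inflated. Pre-aggregating the detail table, as the cdd subquery does, keeps one row per paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE papers (PaperId INTEGER, PageCount INTEGER);
    CREATE TABLE codes (PaperId INTEGER, Code TEXT);
    INSERT INTO papers VALUES (1, 10), (2, 5);
    INSERT INTO codes VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A');
    """
)

# Joining the detail table directly repeats each paper once per code.
fanned = conn.execute(
    "SELECT p.PaperId, p.PageCount FROM papers p JOIN codes c ON c.PaperId = p.PaperId"
).fetchall()
# DISTINCT makes the listing look right again...
distinct_rows = conn.execute(
    "SELECT DISTINCT p.PaperId, p.PageCount FROM papers p "
    "JOIN codes c ON c.PaperId = p.PaperId ORDER BY p.PaperId"
).fetchall()
# ...but anything aggregated over the fanned-out rows is still inflated.
fanned_total = conn.execute(
    "SELECT SUM(p.PageCount) FROM papers p JOIN codes c ON c.PaperId = p.PaperId"
).fetchone()[0]
# Pre-aggregating the detail table (like the cdd subquery) keeps one row per paper.
pre_aggregated = conn.execute(
    "SELECT p.PaperId, p.PageCount, cdd.CodesF FROM papers p "
    "LEFT JOIN (SELECT PaperId, COUNT(*) AS CodesF FROM codes GROUP BY PaperId) cdd "
    "ON cdd.PaperId = p.PaperId ORDER BY p.PaperId"
).fetchall()
print(len(fanned), distinct_rows, fanned_total, pre_aggregated)
```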

Check how many of the columns used in your JOIN, WHERE, and GROUP BY clauses are indexed. Every non-indexed column can hurt performance.
Calculated expressions in GROUP BY are probably a pain, as is DISTINCT (especially if the underlying columns are not indexed). For example, grouping on something like u.ID instead of u.firstname+' '+u.lastname, or on pr.ProjectId instead of pr.ProjectName, should make things faster (you can still sort the output by the other columns if needed).
Do you really need a LEFT JOIN everywhere you use one? That is, do you want to keep rows from one side of the join even when there is no match on the other side? If not, replace it with an INNER JOIN.
Various small improvements are possible here, e.g.
(assuming Papersource and EventId are indexed):
FROM
(SELECT * FROM dbo.tbl_WFPaperEventActions WHERE EventId = 204) wfce
INNER JOIN
(SELECT * FROM dbo.tbl_Papers WHERE Papersource = 0) c
ON wfce.PaperId = c.PaperId
INNER JOIN tbl_Providers p ON p.ProviderID = c.ProviderID
INNER JOIN tbl_Sites s ON s.SiteID = p.SiteID
INNER JOIN tbl_Projects pr ON s.ProjectId=pr.ProjectId
INNER JOIN tbl_Users u ON wfce.ActionUserId=u.UserId
instead of
FROM
dbo.tbl_WFPaperEventActions wfce
INNER JOIN dbo.tbl_Papers c ON wfce.PaperId = c.PaperId
INNER JOIN tbl_Providers p ON p.ProviderID = c.ProviderID
INNER JOIN tbl_Sites s ON s.SiteID = p.SiteID
INNER JOIN tbl_Projects pr ON s.ProjectId=pr.ProjectId
INNER JOIN tbl_Users u ON wfce.ActionUserId=u.UserId
WHERE
wfce.EventId = 204
AND c.Papersource =0
or (if I understood the idea correctly, and what you want is the last 24 hours rather than the whole previous calendar day):
and ev.EventCompletionDate BETWEEN dateadd(day, -1, GETDATE()) and GETDATE()
instead of:
and ev.EventCompletionDate >= dateadd(day,datediff(day,1,GETDATE()),0)
and ev.EventCompletionDate < dateadd(day,datediff(day,0,GETDATE()),0)
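The two original conditions select exactly the previous calendar day, while a window anchored on GETDATE() selects the last 24 hours, so the two are not interchangeable. Here is a small Python sketch in which datetime arithmetic stands in for the T-SQL DATEADD/DATEDIFF expressions; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def previous_calendar_day(now: datetime) -> Tuple[datetime, datetime]:
    """Half-open [start, end) for yesterday, like the DATEADD/DATEDIFF pair.

    start mirrors DATEADD(DAY, DATEDIFF(DAY, 1, GETDATE()), 0), i.e. yesterday 00:00
    end   mirrors DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0), i.e. today 00:00
    """
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=1), today_midnight


def last_24_hours(now: datetime) -> Tuple[datetime, datetime]:
    """The window that BETWEEN DATEADD(DAY, -1, GETDATE()) AND GETDATE() describes."""
    return now - timedelta(days=1), now


now = datetime(2019, 8, 15, 9, 30)   # hypothetical "current" time
event = datetime(2019, 8, 14, 8, 0)  # completed yesterday morning

start, end = previous_calendar_day(now)
in_calendar_day = start <= event < end
s24, e24 = last_24_hours(now)
in_last_24_hours = s24 <= event <= e24
print(start, end, in_calendar_day, in_last_24_hours)
```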
In general, ask yourself what exactly you want to achieve with this query, which parts of the data are relevant to it, and how many of your source tables can be replaced with filtered subsets of them (this can make JOINs faster); and try to be consistent about when you use JOIN and when you use WHERE clauses.

We Keep Coding
