An application which we have built has undergone a large change in its database schema, particularly in the way financial data is stored. We have functions that calculate the total amount of billing, based on various scenarios; and the change is causing huge performance problems when the functions must be run many times in a row.
I'll include an explanation, the function and the relevant schema, and I hope someone sees a much better way to write the function. This is SQL Server 2008.
First, the business basis: think of a medical Procedure. The healthcare Provider performing the Procedure sends one or more Bills, each of which may have one or more line items (BillItems).
That Procedure is then re-billed to another party. The amount billed to the third party may be:
The total of the Provider's billing
The total of the Provider's billing plus a Copay amount, or
A completely separate amount (a Rebill amount)
The current function for calculating the billing for a Procedure looks at all three scenarios:
CREATE FUNCTION [dbo].[fnProcTotalBilled] (@PROCEDUREID INT)
RETURNS MONEY AS
BEGIN
DECLARE @billed MONEY
SELECT @billed = (SELECT COALESCE((SELECT COALESCE(sum(bi.Amount),0)
FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
WHERE b.ProcedureID=@PROCEDUREID
AND p.StatusID=3
AND b.HasCopay=0
AND b.Rebill=0),0))
-- the total of the provider's billing, with no copay and not rebilled
+
(SELECT COALESCE((SELECT sum(bi.Amount) + COALESCE(b.CopayAmt,0)
FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
WHERE b.ProcedureID=@PROCEDUREID
AND p.StatusID=3
AND b.HasCopay=1
GROUP BY b.billid,b.CopayAmt),0))
-- the total of the provider's billing, plus a Copay amount
+
(SELECT COALESCE((SELECT sum(COALESCE(b.RebillAmt,0))
FROM Bills b
INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
WHERE b.ProcedureID=@PROCEDUREID
AND p.StatusID=3
AND b.Rebill=1),0))
-- the Rebill amount, instead of the provider's billing
RETURN @billed
END
I'll omit the DDL for the Procedure. Suffice to say, it must have a certain status (shown in the function as p.StatusID= 3).
Here are the DDLs for Bills and related BillItems:
CREATE TABLE dbo.Bills (
BillID int IDENTITY(1,1) NOT NULL,
InvoiceID int DEFAULT ((0)),
CaseID int NOT NULL,
ProcedureID int NOT NULL,
TherapyGroupID int DEFAULT ((0)) NOT NULL,
ProviderID int NOT NULL,
Description varchar(1000),
ServiceDescription varchar(255),
BillReferenceNumber varchar(100),
TreatmentDate datetime,
DateBilled datetime,
DateBillReceived datetime,
DateBillApproved datetime,
HasCopay bit DEFAULT ((0)) NOT NULL,
CopayAmt money,
Rebill bit DEFAULT ((0)) NOT NULL,
RebillAmt money,
IncludeInDemand bit DEFAULT ((1)) NOT NULL,
CreateDate datetime DEFAULT (getdate()) NOT NULL,
CreatedByID int,
ChangeDate datetime,
ChangeUserID int,
PRIMARY KEY (BillID)
);
CREATE TABLE dbo.BillItems (
BillItemID int IDENTITY(1,1) NOT NULL,
BillID int NOT NULL,
ItemDescription varchar(1000),
Amount money,
WillNotBePaid bit DEFAULT ((0)) NOT NULL,
CreateDate datetime DEFAULT (getdate()),
CreatedByID int,
ChangeDate datetime,
ChangeUserID varchar(25),
PRIMARY KEY (BillItemID)
);
I fully realize how complex the function is; but I couldn't find another way to account for all the scenarios.
I'm hoping that a far better SQL programmer or DBA will see a more performant solution.
Any help will be greatly appreciated.
Thanks,
Tom
UPDATE:
Thanks to everyone for their replies. I tried to add a little clarification in comments, but I'll do so here, too.
First, a definition: a Procedure is a medical service from a Provider on a single Date of Service. We only concern ourselves with the total amount billed for a procedure; multiple persons do not receive bills.
A "Case" can have many Procedures.
Generally, a single Procedure will have a single Bill - but not always. A Bill may have one or more BillItems. The Copay (if one exists) is added to the sum of the BillItems. A Rebill Amount trumps everything.
The performance issue comes into play at a higher level, when calculating the totals for an entire Case (many Procedures) and when needing to display grid data that shows hundreds of Cases at once.
My query was at the Procedure level, because it was simpler to describe the problem.
As to sample data, the data in @Serpiton's SQL Fiddle is an excellent, concise example. Thank you very much for it.
In reviewing the answers, it seems to me that the CTE approach of @Serpiton and the view approach of @GarethD are both strong improvements on my original. For the moment, I'm going to work with the CTE approach, simply to avoid the necessity of dealing with the multiple results from the SELECT.
I have modified @Serpiton's CTE to work at the Case level. It's working well in my testing, but if he or others would take a look at it, I'd appreciate other eyes on it.
It goes like this:
WITH Normal As (
SELECT b.BillID
, b.CaseID
, sum(coalesce(n.Amount * (1 - b.Rebill), 0)) Amount
FROM Procedures p
INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID
LEFT JOIN BillItems n ON b.BillID = n.BillID
WHERE b.CaseID = 3444
AND p.StatusID = 3
GROUP BY b.CaseID,b.BillID, b.HasCopay
)
SELECT Amount = Sum(b.Amount)
+ Sum(Coalesce(c.CopayAmt, 0))
+ Sum(Coalesce(r.RebillAmt, 0))
FROM Normal b
LEFT JOIN Bills c ON b.BillID = c.BillID And c.HasCopay = 1
LEFT JOIN Bills r ON b.BillID = r.BillID And r.Rebill = 1
GROUP BY b.caseid
A very quick win is to use a (TABLE VALUED) (INLINE) FUNCTION instead of a (SCALAR) (MULTI-STATEMENT) FUNCTION.
CREATE FUNCTION [dbo].[fnProcTotalBilled] (@PROCEDUREID INT)
RETURNS TABLE
AS
RETURN (
SELECT
(sub-query1)
+
(sub-query2)
+
(sub-query3) AS amount
);
This can then be used as follows:
SELECT
something.*,
totalBilled.*
FROM
something
CROSS APPLY -- Or OUTER APPLY
[dbo].[fnProcTotalBilled](something.procedureID) AS totalBilled
Over larger data-sets this is significantly faster than using scalar functions.
- It must be INLINE (Not Multi-Statement)
- It must be TABLE-VALUED (Not Scalar)
If you work out better business logic for the calculation, you'll get even more performance benefits again.
EDIT :
This may be functionally the same as you have described, but it's hard to tell. Please add comments to my question to investigate further.
SELECT
SUM(
CASE WHEN b.HasCopay = 0 AND b.Rebill = 0 THEN COALESCE(bi.TotalAmount, 0)
WHEN b.HasCopay = 1 THEN b.CopayAmt + COALESCE(bi.TotalAmount, 0)
WHEN b.Rebill = 1 THEN b.RebillAmt
ELSE 0
END
) AS Amount
FROM
Procedures p
INNER JOIN
Bills b
ON b.ProcedureID = p.ProcedureID
LEFT JOIN
(
SELECT BillID, SUM(Amount) AS TotalAmount
FROM BillItems
GROUP BY BillID
)
AS bi
ON bi.BillID = b.BillID
WHERE
p.ProcedureID=@PROCEDUREID
AND p.StatusID=3
The 'trick' that makes this simpler is the sub-query that aggregates all the BillItems into one record per BillID. The optimiser won't actually do that for the whole table, but only for the relevant records based on your JOINs and WHERE clause.
This then means that Bill:BillItem is 1:0..1, and everything simplifies. I believe ;)
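Combining the two suggestions, a minimal sketch of an inline table-valued version of the function might look like the following. The name fnProcTotalBilledInline is hypothetical, and the CASE order assumes the rule stated in the question's update (a Rebill amount trumps everything; otherwise the Copay is added to the item total). Adjust it if a bill can legitimately carry both amounts.

```sql
-- Hypothetical name; a sketch, not a drop-in replacement for the original function.
CREATE FUNCTION dbo.fnProcTotalBilledInline (@ProcedureID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT Amount = COALESCE(SUM(
               CASE WHEN b.Rebill = 1
                    THEN COALESCE(b.RebillAmt, 0)            -- assumed: rebill replaces everything
                    ELSE COALESCE(bi.TotalAmount, 0)         -- sum of the bill's items
                         + CASE WHEN b.HasCopay = 1
                                THEN COALESCE(b.CopayAmt, 0)
                                ELSE 0
                           END
               END), 0)
    FROM dbo.Procedures AS p
    INNER JOIN dbo.Bills AS b
            ON b.ProcedureID = p.ProcedureID
    LEFT JOIN (SELECT BillID, TotalAmount = SUM(Amount)
               FROM dbo.BillItems
               GROUP BY BillID) AS bi
            ON bi.BillID = b.BillID
    WHERE p.ProcedureID = @ProcedureID
      AND p.StatusID = 3
);
```

Because the SELECT is a scalar aggregate with no GROUP BY, it returns exactly one row per call, so it can be used with CROSS APPLY as shown earlier.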
Answer to the update
To increase the performance you can create a view with the same definition of the CTE, so that the query plan will be stored and reused.
If you have to calculate more than one total, don't fetch them individually; a better plan is to get all of them with a single query, using a condition like
WHERE b.CaseID IN (list of cases)
or some other condition that fits your needs, and adding some more information to the main query's SELECT list, at least the CaseID.
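As a rough sketch of that idea, the query below returns one total per Case for a whole set of cases in a single pass. The table variable @Cases and the second CaseID are hypothetical, and the CASE order again assumes that a Rebill amount replaces the item total and Copay.

```sql
-- Hypothetical list of cases to total; in practice this could be the grid's result set.
DECLARE @Cases TABLE (CaseID INT PRIMARY KEY);
INSERT INTO @Cases (CaseID) VALUES (3444), (3445);

SELECT b.CaseID,
       Amount = SUM(
           CASE WHEN b.Rebill = 1
                THEN COALESCE(b.RebillAmt, 0)
                ELSE COALESCE(bi.TotalAmount, 0)
                     + CASE WHEN b.HasCopay = 1 THEN COALESCE(b.CopayAmt, 0) ELSE 0 END
           END)
FROM @Cases AS c
INNER JOIN dbo.Bills AS b
        ON b.CaseID = c.CaseID
INNER JOIN dbo.Procedures AS p
        ON p.ProcedureID = b.ProcedureID
LEFT JOIN (SELECT BillID, TotalAmount = SUM(Amount)
           FROM dbo.BillItems
           GROUP BY BillID) AS bi
        ON bi.BillID = b.BillID
WHERE p.StatusID = 3
GROUP BY b.CaseID;   -- one row per case that has qualifying bills
```

Cases with no qualifying bills produce no row here; LEFT JOIN from the case list if a zero total is needed for them.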
Update
@DRapp pointed out a problem with my previous solution (which I wrote without testing, sorry). To fix it I have removed BillItems from the main query, which now works only with Bills.
WITH Normal As (
SELECT b.BillID
, b.ProcedureID
, sum(coalesce(n.Amount * (1 - b.Rebill), 0)) Amount
FROM Procedures p
INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID
LEFT JOIN BillItems n ON b.BillID = n.BillID
WHERE p.ProcedureID = @PROCEDUREID
AND p.StatusID = 3
GROUP BY b.ProcedureID, b.BillID, b.HasCopay
)
SELECT @Billed = Sum(b.Amount)
+ Sum(Coalesce(c.CopayAmt, 0))
+ Sum(Coalesce(r.RebillAmt, 0))
FROM Normal b
LEFT JOIN Bills c ON b.BillID = c.BillID And c.HasCopay = 1
LEFT JOIN Bills r ON b.BillID = r.BillID And r.Rebill = 1
GROUP BY b.ProcedureID
How it works
The Normal CTE gets all the bills related to the ProcedureID and calculates each bill's total; the expression Amount * (1 - Rebill) sets the Amount to 0 if the bill is rebilled.
In the main query the Normal CTE is joined to the special types of bill (copay and rebill). Because Normal already contains all the Bills for the selected ProcedureID, the Procedures table is not needed there.
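To illustrate the Amount * (1 - Rebill) expression in isolation, here is a small sketch with two made-up rows that differ only in the Rebill flag. The values are hypothetical and not taken from the demo data.

```sql
-- Two hypothetical bills with the same item amount; only the Rebill flag differs.
SELECT t.Rebill,
       t.Amount,
       KeptAmount = t.Amount * (1 - t.Rebill)   -- the bit value is implicitly converted to int
FROM (VALUES (CAST(0 AS bit), CAST(100 AS money)),
             (CAST(1 AS bit), CAST(100 AS money))) AS t (Rebill, Amount);
```

For the row with Rebill = 1 the factor is 0, so the item amount drops out of the sum; for Rebill = 0 the factor is 1 and the amount is kept unchanged.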
Demo with random data.
Old Query
Without data to test the query, this is flying blind.
SELECT @billed = Sum(Coalesce(n.Amount, 0))
+ Sum(Coalesce(c.CopayAmt, 0))
+ Sum(Coalesce(r.RebillAmt, 0))
FROM Procedures p
INNER JOIN Bills b ON p.ProcedureID = b.ProcedureID And b.Rebill = 0
INNER JOIN BillItems n ON b.BillID = n.BillID
INNER JOIN Bills c ON p.ProcedureID = c.ProcedureID And c.HasCopay = 1
INNER JOIN Bills r ON p.ProcedureID = r.ProcedureID And r.Rebill = 1
Where p.ProcedureID = @PROCEDUREID
AND p.StatusID = 3
Here b is the alias for the "normal" bill (with n for its bill items), c for the bill with a copay and r for the rebilled bill.
The JOIN condition on b checks only b.Rebill = 0, to get the bill items for both the "normal" bills and the copay ones.
I assume that no bill can have both HasCopay and Rebill set to 1.
The first thing I have noticed is that your query could fail if there is more than one billID for a procedureID (I don't know if this is possible in your design though). If it is and it happens then this part will fail:
(SELECT COALESCE((SELECT sum(bi.Amount) + COALESCE(b.CopayAmt,0)
FROM BillItems bi INNER JOIN Bills b ON b.BillID=bi.BillID
INNER JOIN Procedures p on p.ProcedureID=b.ProcedureID
WHERE b.ProcedureID=@PROCEDUREID
AND p.StatusID=3
AND b.HasCopay=1
GROUP BY b.billid,b.CopayAmt),0))
Due to the grouping, you will get more than one result returned in the subquery which is not allowed. I don't think this would affect my overall decision on how to alter your schema though.
I would consider turning this into a view. When you run this as a scalar UDF it is executed once per row; when you use a view, the definition is expanded into the outer query and can be optimised accordingly.
You can also turn this into a single select, the first step would be to get the components common to all three subqueries:
SELECT p.ProcedureID,
bi.Amount,
b.HasCopay,
b.CopayAmt,
b.Rebill,
b.RebillAmt
FROM ( SELECT BillID, Amount = SUM(Amount)
FROM Billitems
GROUP BY BillID
) bi
INNER JOIN Bills b
ON b.BillID = bi.BillID
INNER JOIN Procedures p
ON p.ProcedureID = b.ProcedureID
WHERE p.StatusID = 3;
You can now combine the logic of the 3 subqueries to get the same total:
SELECT p.ProcedureID,
Amount = CASE WHEN b.Rebill = 0 THEN bi.Amount ELSE 0 END,
CopayAmt = CASE WHEN b.HasCopay = 1 THEN b.CopayAmt ELSE 0 END,
RebillAmt = CASE WHEN b.Rebill = 1 THEN b.RebillAmt ELSE 0 END
FROM ( SELECT BillID, Amount = SUM(Amount)
FROM Billitems
GROUP BY BillID
) bi
INNER JOIN Bills b
ON b.BillID = bi.BillID
INNER JOIN Procedures p
ON p.ProcedureID = b.ProcedureID
WHERE p.StatusID = 3;
You can now aggregate this and move it to a view for reusability (I have moved the case statements above to an APPLY simply to avoid repeating the case statement in the Total column):
CREATE VIEW dbo.ProcTotalBilled
AS
SELECT p.ProcedureID,
Amount = SUM(calc.Amount),
CopayAmt = SUM(calc.CopayAmt),
Rebill = SUM(calc.RebillAmt),
Total = SUM(calc.Amount + calc.CopayAmt + calc.RebillAmt)
FROM ( SELECT BillID, Amount = SUM(Amount)
FROM Billitems
GROUP BY BillID
) bi
INNER JOIN Bills b
ON b.BillID = bi.BillID
INNER JOIN Procedures p
ON p.ProcedureID = b.ProcedureID
CROSS APPLY
( SELECT Amount = CASE WHEN b.Rebill = 0 THEN bi.Amount ELSE 0 END,
CopayAmt = CASE WHEN b.HasCopay = 1 THEN b.CopayAmt ELSE 0 END,
RebillAmt = CASE WHEN b.Rebill = 1 THEN b.RebillAmt ELSE 0 END
) calc
WHERE p.StatusID = 3
GROUP BY p.ProcedureID;
Then instead of using something like:
SELECT Total = dbo.fnProcTotalBilled(p.ProcedureID)
FROM dbo.Procedures p;
You would use
SELECT Total = ISNULL(ptb.Total, 0)
FROM dbo.Procedures p
LEFT JOIN dbo.ProcTotalBilled ptb
ON ptb.ProcedureID = p.ProcedureID;
Slightly more verbose, but I would be surprised if it didn't outperform your scalar UDF considerably.
Can you show some sample data that covers the variety of samples? Also, the procedure I would expect is more a lookup table, and many people could be billed for the same procedure, thus the BillID would be critical to the function. What has been billed to a given person for a given procedure. The function would then have TWO parameters, one for the procedure you were interested in, and second for the patient's actual Bill.
Then, the inner queries would be restricted down to the one person's bill... Unless the procedure is unique per person having the procedure done, but that is unclear since DDL for procedure is not provided.
I have other thoughts on the querying, but would need clarification from above context so I do not throw crud here just to show a query.
After all, you need some values from bills along with the sum of bill items. You could simplify the query thus:
select sum
(
coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 )
) as value
from procedures p
inner join bills b on b.procedureid = p.procedureid
where p.ProcedureID = @PROCEDUREID
and p.StatusID = 3;
but T-SQL does not allow this and complains with "Cannot perform an aggregate function on an expression containing an aggregate or a subquery". So you will have to use an inner and outer select instead.
select sum(value) as total
from
(
select
coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
from procedures p
inner join bills b on b.procedureid = p.procedureid
where p.ProcedureID = @PROCEDUREID
and p.StatusID = 3
) allvalues;
You wouldn't even have to join table procedures with the bills table, but get the procedure id in an inner select. But I tried it with Serpiton's SQL fiddle (thanks to Serpiton for this) and T-SQL processes this slower than the join. You can try it anyhow. Maybe it is faster in your SQL Server version with your tables:
select sum(value) as total
from
(
select
coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
from bills b
where b.procedureid =
(
select p.procedureid
from procedures p
where p.ProcedureID = @PROCEDUREID
and p.StatusID = 3
)
) allvalues;
EDIT: Here is one more option. Provided the given procedure id always exists and you only want to check if the status id is 3, then you can write the statement so that the bill select is only executed in the case of status id = 3. That doesn't have to be faster; it can even turn out to be slower. It's just one more option you can try.
select
case when p.StatusID = 3 then
(
select sum(value)
from
(
select
coalesce( case when b.rebill = 1 then b.rebillamt end , 0 ) +
coalesce( case when b.rebill = 0 then (select sum(bi.amount) from billitems bi where bi.billid = b.billid) end , 0 ) +
coalesce( case when b.rebill = 0 and b.hascopay = 1 then b.copayamt end , 0 ) as value
from bills b
where b.procedureid = p.procedureid
) allvalues
)
else
0
end as value
from procedures p
where p.ProcedureID = @PROCEDUREID;
I have several cases where my complex CTEs (Common Table Expressions) are ten times slower than the same queries using temporary tables in SQL Server.
My question is about how SQL Server processes CTE queries: it looks like it tries to join all the separate queries instead of storing the results of each one and then running the following ones. That might be the reason it is so much faster when using temporary tables.
For example:
Query 1: using Common Table Expression:
;WITH Orders AS
(
SELECT
ma.MasterAccountId,
IIF(r.FinalisedDate IS NULL, 1, 0) [Status]
FROM
MasterAccount ma
INNER JOIN
task.tblAccounts a ON a.AccountNumber = ma.TaskAccountId
AND a.IsActive = 1
LEFT OUTER JOIN
task.tblRequisitions r ON r.AccountNumber = a.AccountNumber
WHERE
ma.IsActive = 1
AND CAST(r.BatchDateTime AS DATE) BETWEEN @fromDate AND @toDate
AND r.BatchNumber > 0
),
StockAvailability AS
(
SELECT sa.AccountNumber,
sa.RequisitionNumber,
sa.RequisitionDate,
sa.Lines,
sa.HasStock,
sa.NoStock,
CASE WHEN sa.Lines = 0 THEN 'Empty'
WHEN sa.HasStock = 0 THEN 'None'
WHEN (sa.Lines > 0 AND sa.Lines > sa.HasStock) THEN 'Partial'
WHEN (sa.Lines > 0 AND sa.Lines <= sa.HasStock) THEN 'Full'
END AS [Status]
FROM
(
SELECT
r.AccountNumber,
r.RequisitionNumber,
r.RequisitionDate,
COUNT(rl.ProductNumber) Lines,
SUM(IIF(ISNULL(psoh.AvailableStock, 0) >= ISNULL(rl.Quantity, 0), 1, 0)) AS HasStock,
SUM(IIF(ISNULL(psoh.AvailableStock, 0) < ISNULL(rl.Quantity, 0), 1, 0)) AS NoStock
FROM task.tblrequisitions r
INNER JOIN task.tblRequisitionLines rl ON rl.RequisitionNumber = r.RequisitionNumber
LEFT JOIN ProductStockOnHandSummary psoh ON psoh.ProductNumber = rl.ProductNumber
WHERE dbo.fn_RemoveUnitPrefix(r.BatchNumber) = 0
AND r.UnitId = 1
AND r.FinalisedDate IS NULL
AND r.RequisitionStatus = 1
AND r.TransactionTypeNumber = 301
GROUP BY r.AccountNumber, r.RequisitionNumber, r.RequisitionDate
) AS sa
),
Available AS
(
SELECT ma.MasterAccountId,
SUM(IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END)) AS AvailableStock,
SUM(IIF(sa.[Status] IN ('Full', 'Partial', 'None'), 1, 0)) AS OrdersAnyStock,
SUM(IIF(sa.RequisitionDate < dbo.TicksToTime(ma.DailyOrderCutOffTime, @toDate),
IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END), 0)) AS AvailableBeforeCutOff
FROM MasterAccount ma
INNER JOIN StockAvailability sa ON sa.AccountNumber = ma.TaskAccountId
GROUP BY ma.MasterAccountId, ma.IsPartialStock
),
Totals AS
(
SELECT
o.MasterAccountId,
COUNT(o.MasterAccountId) AS BatchedOrders,
SUM(IIF(o.[Status] IN (0,1,2), 1, 0)) AS PendingOrders
FROM Orders o
GROUP BY o.MasterAccountId
)
SELECT a.MasterAccountId,
ISNULL(t.BatchedOrders, 0) BatchedOrders,
ISNULL(t.PendingOrders, 0) PendingOrders,
ISNULL(av.AvailableStock, 0) AvailableOrders,
ISNULL(av.AvailableBeforeCutOff, 0) AvailableCutOff,
ISNULL(av.OrdersAnyStock, 0) AllOrders
FROM MasterAccount a
LEFT OUTER JOIN Available av ON av.MasterAccountId = a.MasterAccountId
LEFT OUTER JOIN Totals t ON t.MasterAccountId = a.MasterAccountId
WHERE a.IsActive = 1
Query 2: using temporary tables:
DROP TABLE IF EXISTS #Orders
CREATE TABLE #Orders (MasterAccountId int, [Status] int);
INSERT INTO #Orders
SELECT
ma.MasterAccountId,
dbo.fn_GetBatchPickingStatus(ma.BatchPickingOnHold,
iif(r.GroupNumber > 0, 1, 0),
iif(r.FinalisedDate is null, 1, 0)) [Status]
FROM MasterAccount ma (nolock)
INNER JOIN wh3.dbo.tblAccounts a (nolock) on a.AccountNumber = dbo.fn_RemoveUnitPrefix(ma.TaskAccountId) and a.IsActive = 1
LEFT OUTER JOIN wh3.dbo.tblRequisitions r (nolock) on r.AccountNumber = a.AccountNumber
WHERE cast(r.BatchDateTime as date) between @fromDate and @toDate
AND r.BatchNumber > 0
AND ma.IsActive = 1
DROP TABLE IF EXISTS #StockAvailability
Create Table #StockAvailability (AccountNumber int, RequisitionNumber int, RequisitionDate datetime, Lines int, HasStock int, NoStock int);
Insert Into #StockAvailability
SELECT
r.AccountNumber,
r.RequisitionNumber,
r.RequisitionDate,
COUNT(rl.ProductNumber) Lines,
SUM(IIF(ISNULL(psoh.AvailableStock, 0) >= ISNULL(rl.Quantity, 0), 1, 0)) AS HasStock,
SUM(IIF(ISNULL(psoh.AvailableStock, 0) < ISNULL(rl.Quantity, 0), 1, 0)) AS NoStock
FROM WH3.dbo.tblrequisitions r (nolock)
INNER JOIN WH3.dbo.tblRequisitionLines rl (nolock) ON rl.RequisitionNumber = r.RequisitionNumber
LEFT JOIN ProductStockOnHandSummary psoh (nolock) ON psoh.ProductNumber = rl.ProductNumber -- Joined with View
WHERE r.BatchNumber = 0
AND r.FinalisedDate is null
AND r.RequisitionStatus = 1
AND r.TransactionTypeNumber = 301
GROUP BY r.AccountNumber, r.RequisitionNumber, r.RequisitionDate
DROP TABLE IF EXISTS #StockAvailability2
Create Table #StockAvailability2 (AccountNumber int, RequisitionNumber int, RequisitionDate datetime, Lines int, HasStock int, NoStock int, [Status] nvarchar(7));
Insert Into #StockAvailability2
SELECT sa.AccountNumber,
sa.RequisitionNumber,
sa.RequisitionDate,
sa.Lines,
sa.HasStock,
sa.NoStock,
CASE WHEN sa.Lines = 0 THEN 'Empty'
WHEN sa.HasStock = 0 THEN 'None'
WHEN (sa.Lines > 0 AND sa.Lines > sa.HasStock) THEN 'Partial'
WHEN (sa.Lines > 0 AND sa.Lines <= sa.HasStock) THEN 'Full'
END AS [Status]
FROM #StockAvailability sa
DROP TABLE IF EXISTS #Available
Create Table #Available (MasterAccountId int, AvailableStock int, OrdersAnyStock int, AvailableBeforeCutOff int);
INSERT INTO #Available
SELECT ma.MasterAccountId,
SUM(IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END)) AS AvailableStock,
SUM(IIF(sa.[Status] IN ('Full', 'Partial', 'None'), 1, 0)) AS OrdersAnyStock,
SUM(IIF(sa.RequisitionDate < dbo.TicksToTime(ma.DailyOrderCutOffTime, @toDate),
IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END), 0)) AS AvailableBeforeCutOff
FROM MasterAccount ma (NOLOCK)
INNER JOIN #StockAvailability2 sa ON sa.AccountNumber = dbo.fn_RemoveUnitPrefix(ma.TaskAccountId)
GROUP BY ma.MasterAccountId, ma.IsPartialStock
;WITH Totals AS
(
SELECT
o.MasterAccountId,
COUNT(o.MasterAccountId) AS BatchedOrders,
SUM(IIF(o.[Status] IN (0,1,2), 1, 0)) PendingOrders
FROM #Orders o (NOLOCK)
GROUP BY o.MasterAccountId
)
SELECT a.MasterAccountId,
ISNULL(t.BatchedOrders, 0) BatchedOrders,
ISNULL(t.PendingOrders, 0) PendingOrders,
ISNULL(av.AvailableStock, 0) AvailableOrders,
ISNULL(av.AvailableBeforeCutOff, 0) AvailableCutOff,
ISNULL(av.OrdersAnyStock, 0) AllOrders
FROM MasterAccount a (NOLOCK)
LEFT OUTER JOIN #Available av (NOLOCK) ON av.MasterAccountId = a.MasterAccountId
LEFT OUTER JOIN Totals t (NOLOCK) ON t.MasterAccountId = a.MasterAccountId
WHERE a.IsActive = 1
The answer is simple.
SQL Server doesn't materialise CTEs. It inlines them, as you can see from the execution plans.
Other DBMSs may implement this differently. A well-known example is Postgres, which before version 12 always materialised CTEs (it essentially creates temporary tables for CTEs under the hood).
Whether explicitly materialising intermediate results in temporary tables is faster depends on the query.
In complex queries the overhead of writing and reading intermediate data in temporary tables can be offset by the simpler, more efficient execution plans that the optimiser is able to generate.
On the other hand, a materialised CTE in Postgres is an "optimisation fence": the engine can't push predicates across the CTE boundary.
Sometimes one way is better, sometimes the other. Once the query complexity grows beyond a certain threshold, an optimiser can't analyse all possible ways to process the data and has to settle on something. Take, for example, the order in which to join the tables: the number of permutations grows exponentially with the number of tables to choose from. The optimiser has limited time to generate a plan, so it may make a poor choice when all CTEs are inlined. When you manually break a complex query into smaller, simpler ones you need to understand what you are doing, but the optimiser has a better chance of generating a good plan for each simple query.
There are different use cases for the two, and different advantages/disadvantages.
Common Table Expressions
Common Table Expressions should be viewed as expressions, not tables. As expressions, the CTE does not need to be instantiated, so the query optimizer can fold it into the rest of the query, and optimize the combination of the CTE and the rest of the query.
Temporary Tables
With temporary tables, the results of the query are stored in a real live table, in the temp database. The query results can then be reused in multiple queries, unlike CTEs, where the CTE, if used in multiple separate queries, would have to be a part of the work plan in each of those separate queries.
Also, a temporary table can have an index, keys, etc. Adding these to a temp table can be a great assistance in optimizing some queries, and is unavailable in the CTE, though the CTE can utilize the indexes and keys in the tables underlying the CTE.
If the underlying tables to a CTE don't support the type of optimizations you need, a temp table may be better.
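As a sketch of the indexing point, the temp table from the question could be given an index on the column used in the later join. The index name is hypothetical, and whether it actually helps depends on the data and on the join predicate, so it is worth comparing execution plans before and after.

```sql
-- Same shape as the #StockAvailability2 table in the question.
CREATE TABLE #StockAvailability2
(
    AccountNumber     int,
    RequisitionNumber int,
    RequisitionDate   datetime,
    Lines             int,
    HasStock          int,
    NoStock           int,
    [Status]          nvarchar(7)
);

-- ... populate with the INSERT INTO #StockAvailability2 statement from the question ...

-- Hypothetical index name; keyed on the column joined to MasterAccount later on.
CREATE CLUSTERED INDEX IX_StockAvailability2_AccountNumber
    ON #StockAvailability2 (AccountNumber);
```

Creating the index after the table is populated avoids maintaining it row by row during the insert.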
There can be several reasons for a temp table performing better than a CTE, and vice versa, depending on the specific query and requirements.
IMO in your case neither query is optimised.
A CTE is evaluated every time it is referenced, so in your case
SELECT a.MasterAccountId,
ISNULL(t.BatchedOrders, 0) BatchedOrders,
ISNULL(t.PendingOrders, 0) PendingOrders,
ISNULL(av.AvailableStock, 0) AvailableOrders,
ISNULL(av.AvailableBeforeCutOff, 0) AvailableCutOff,
ISNULL(av.OrdersAnyStock, 0) AllOrders
FROM MasterAccount a
LEFT OUTER JOIN Available av ON av.MasterAccountId = a.MasterAccountId
LEFT OUTER JOIN Totals t ON t.MasterAccountId = a.MasterAccountId
WHERE a.IsActive = 1
This query shows a high cardinality estimate. The MasterAccount table is evaluated multiple times, which is why it is slow.
In the case of the temp table,
SELECT a.MasterAccountId,
ISNULL(t.BatchedOrders, 0) BatchedOrders,
ISNULL(t.PendingOrders, 0) PendingOrders,
ISNULL(av.AvailableStock, 0) AvailableOrders,
ISNULL(av.AvailableBeforeCutOff, 0) AvailableCutOff,
ISNULL(av.OrdersAnyStock, 0) AllOrders
FROM MasterAccount a (NOLOCK)
LEFT OUTER JOIN #Available av (NOLOCK) ON av.MasterAccountId = a.MasterAccountId
LEFT OUTER JOIN Totals t (NOLOCK) ON t.MasterAccountId = a.MasterAccountId
WHERE a.IsActive = 1
Here #Available has already been evaluated and the result is stored in a temp table, so the MasterAccount table is joined with a smaller result set and the cardinality estimate is lower.
The same applies to the #Orders table.
Both the CTE and the temp table queries can be optimised in your case to improve performance.
So #Orders should be your base temp table and you should not use MasterAccount again later; use #Orders instead.
INSERT INTO #Available
SELECT ma.MasterAccountId,
SUM(IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END)) AS AvailableStock,
SUM(IIF(sa.[Status] IN ('Full', 'Partial', 'None'), 1, 0)) AS OrdersAnyStock,
SUM(IIF(sa.RequisitionDate < dbo.TicksToTime(ma.DailyOrderCutOffTime, @toDate),
IIF(ma.IsPartialStock = 1, CASE WHEN sa.[Status] IN ('Full', 'Partial') THEN 1 ELSE 0 END,
CASE WHEN sa.[Status] = 'Full' THEN 1 ELSE 0 END), 0)) AS AvailableBeforeCutOff
FROM #Orders ma (NOLOCK)
INNER JOIN #StockAvailability2 sa ON sa.AccountNumber = dbo.fn_RemoveUnitPrefix(ma.TaskAccountId)
GROUP BY ma.MasterAccountId, ma.IsPartialStock
Here the required columns from the MasterAccount table, such as ma.IsPartialStock, should be included in the #Orders table itself if possible. I hope the idea is clear.
There is no need for the MasterAccount table in the last query:
SELECT a.MasterAccountId,
ISNULL(t.BatchedOrders, 0) BatchedOrders,
ISNULL(t.PendingOrders, 0) PendingOrders,
ISNULL(av.AvailableStock, 0) AvailableOrders,
ISNULL(av.AvailableBeforeCutOff, 0) AvailableCutOff,
ISNULL(av.OrdersAnyStock, 0) AllOrders
FROM #Available av
LEFT OUTER JOIN Totals t ON t.MasterAccountId = av.MasterAccountId
--WHERE a.IsActive = 1
I don't think the NOLOCK hint is needed on temp tables.
I have this QA logic that looks for errors in every AuditID within a RoomID, to see whether the AuditType was never marked Complete or whether it has two Complete statuses. Finally, it picks only the maximum AuditDate of the RoomIDs with errors to avoid showing multiple instances of the same RoomID, since there are many audits per room.
The issue is that the AUDIT table is very large and the query takes a long time to run. I was wondering if there is any way to reach the same result faster.
Thank you in advance !
IF object_ID('tempdb..#AUDIT') is not null drop table #AUDIT
IF object_ID('tempdb..#ROOMS') is not null drop table #ROOMS
IF object_ID('tempdb..#COMPLETE') is not null drop table #COMPLETE
IF object_ID('tempdb..#FINALE') is not null drop table #FINALE
SELECT distinct
oc.HotelID, o.RoomID
INTO #ROOMS
FROM dbo.[rooms] o
LEFT OUTER JOIN dbo.[hotels] oc on o.HotelID = oc.HotelID
WHERE
o.[status] = '2'
AND o.orderType = '2'
SELECT
t.AuditID, t.RoomID, t.AuditDate, t.AuditType
INTO
#AUDIT
FROM
[dbo].[AUDIT] t
WHERE
t.RoomID IN (SELECT RoomID FROM #ROOMS)
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
SELECT
o.HotelID, o.RoomID,
a.AuditID, a.RoomID, a.AuditDate, a.AuditType, a.CompleteStatus,
c.ClientNum
INTO
#FINALE
FROM
#ROOMS O
LEFT OUTER JOIN
#COMPLETE a on o.RoomID = a.RoomID
LEFT OUTER JOIN
[dbo].[clients] c on o.clientNum = c.clientNum
SELECT
t.*,
Complete_Error_Status = CASE WHEN t.CompleteStatus = 0
THEN 'Not Complete'
WHEN t.CompleteStatus > 1
THEN 'Complete More Than Once'
END
FROM
#FINALE t
INNER JOIN
(SELECT
RoomID, MAX(AuditDate) AS MaxDate
FROM
#FINALE
GROUP BY
RoomID) tm ON t.RoomID = tm.RoomID AND t.AuditDate = tm.MaxDate
One section you could improve would be this one. See the inline comments.
SELECT
t1.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
INTO
#COMPLETE
FROM
(SELECT
RoomID,
COUNT(1) AS CompleteStatus
-- Use the above along with the WHERE clause below
-- so that you are aggregating fewer records and
-- avoiding a CASE statement. Remove this next line.
--SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) AS CompleteStatus
FROM
#AUDIT
WHERE
AuditType = 'Complete'
GROUP BY
RoomID) t1
INNER JOIN
#AUDIT t3 ON t1.RoomID = t3.RoomID
WHERE
t1.CompleteStatus = 0
OR t1.CompleteStatus > 1
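One caveat with filtering on AuditType = 'Complete' in the WHERE clause: rooms with no 'Complete' audit rows drop out of the derived table entirely, so the CompleteStatus = 0 case can no longer be found. A sketch that keeps both error cases while still filtering inside the aggregate is to move the test into a HAVING clause. It assumes the #AUDIT temp table built in the question.

```sql
SELECT t3.RoomID, t3.AuditType, t3.AuditDate, t3.AuditID, t1.CompleteStatus
FROM (SELECT RoomID,
             CompleteStatus = SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END)
      FROM #AUDIT
      GROUP BY RoomID
      -- keep only rooms with zero or more than one 'Complete' audit
      HAVING SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END) <> 1) AS t1
INNER JOIN #AUDIT AS t3
        ON t3.RoomID = t1.RoomID;
```

This still scans #AUDIT twice, so it is a correctness fix rather than a performance gain.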
Just a thought: streamline your code and your solution. You are not effectively filtering your datasets down, so you continue to query the entire tables, which takes a lot of resources, and your temp tables become full copies of those columns without the indexes (PK, FK, etc.) of the original tables to take advantage of. This is by no means a perfect solution, but it is an idea of how you can consolidate your logic and reduce your overall data set. Give it a try and see if it performs better for you.
Note this will return the last audit record for any room that has either not had an audit completed or completed more than once.
;WITH cte AS (
SELECT
o.RoomId
,o.HotelId
,o.clientNum
,a.AuditId
,a.AuditDate
,a.AuditType
,NumOfAuditsComplete = SUM(CASE WHEN a.AuditType = 'Complete' THEN 1 ELSE 0 END) OVER (PARTITION BY o.RoomId)
,RowNum = ROW_NUMBER() OVER (PARTITION BY o.RoomId ORDER BY a.AuditDate DESC)
FROM
dbo.Rooms o
LEFT JOIN dbo.Audit a
ON o.RoomId = a.RoomId
WHERE
o.[Status] = 2
AND o.OrderType = 2
)
SELECT
oc.HotelId
,cte.RoomId
,cte.AuditId
,cte.AuditDate
,cte.AuditType
,cte.NumOfAuditsComplete
,cte.clientNum
,Complete_Error_Status = CASE WHEN cte.NumOfAuditsComplete > 1 THEN 'Complete More Than Once' ELSE 'Not Complete' END
FROM
cte
LEFT JOIN dbo.Hotels oc
ON cte.HotelId = oc.HotelId
LEFT JOIN dbo.clients c
ON cte.clientNum = c.clientNum
WHERE
cte.RowNum = 1
AND cte.NumOfAuditsComplete != 1
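To see what the two window functions in the CTE do, here is a minimal, self-contained sketch. The table variable and its sample rows are made up for illustration and are not part of the original schema; multi-row VALUES needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data: room 100 completed once, room 200 never,
-- room 300 completed twice.
DECLARE @Audit TABLE (AuditId INT, RoomId INT, AuditType VARCHAR(20), AuditDate DATETIME);

INSERT INTO @Audit (AuditId, RoomId, AuditType, AuditDate) VALUES
(1, 100, 'Started',  '20200101'),
(2, 100, 'Complete', '20200102'),
(3, 200, 'Started',  '20200101'),
(4, 300, 'Complete', '20200101'),
(5, 300, 'Complete', '20200103');

WITH ranked AS (
    SELECT
        AuditId, RoomId, AuditType, AuditDate,
        -- per-room count of 'Complete' audits, repeated on every row of the room
        NumOfAuditsComplete = SUM(CASE WHEN AuditType = 'Complete' THEN 1 ELSE 0 END)
                              OVER (PARTITION BY RoomId),
        -- 1 = most recent audit row for the room
        RowNum = ROW_NUMBER() OVER (PARTITION BY RoomId ORDER BY AuditDate DESC)
    FROM @Audit
)
SELECT RoomId, AuditId, NumOfAuditsComplete
FROM ranked
WHERE RowNum = 1
  AND NumOfAuditsComplete <> 1
ORDER BY RoomId;
```

With this sample data, room 100 is filtered out because it has exactly one completed audit. Rooms 200 and 300 come back with their latest audit row, which is the behaviour the note above describes.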
Also note I changed your
WHERE
o.[status] = '2'
AND o.orderType = '2'
to
WHERE
o.[status] = 2
AND o.orderType = 2
to be numeric, without the single quotes. If the data type is truly varchar, add them back; but when you query a numeric column as a varchar, SQL Server has to do a data type conversion and may not take advantage of indexes that you have built on the table.
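Before deciding whether the quotes belong there, it may help to check the declared types of the two columns. This is a small sketch; it assumes the table is dbo.Rooms with columns Status and OrderType, as used in the query above.

```sql
-- Look up the declared data types of the filtered columns.
-- Assumes the table lives in the dbo schema, as in the query above.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME   = 'Rooms'
  AND COLUMN_NAME IN ('Status', 'OrderType');
```

If DATA_TYPE comes back as int (or another numeric type), compare against unquoted literals. If it comes back as varchar, keep the quotes so the comparison stays on matching types.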
I have a query that runs in 4 seconds without an is null in the WHERE, but takes almost a minute with an is null. I've read up on the performance impact of the null check, but in this case, I can't modify the query being run.
select
view_scores.*
from
view_scores
inner join licenses AS l on view_scores.studentId = l.account_id
where view_scores.archived_date is null
and l.school_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'
and l.is_current = 1
and l.expiration_date >= SYSDATETIME()
view_scores is a view that aggregates other views of data in other tables, one of which ultimately holds the archived_date field. A null value in that field means it hasn't been archived. Again, the data structure is outside of my control. All I can currently change is the internals of the views involved and indexes on the tables. Do I have any hope of dramatically improving the null check on archived_date without changing the query or schema?
view_scores is created with this SQL
SELECT
ueh.user_id AS studentId,
vu.first_name + ' ' + vu.last_name AS studentName,
ueh.archived_date as archived_date,
MIN([ueh].[date_taken]) AS [started_date],
MAX(ueh.date_taken) AS last_date,
SUM(CAST([ueh].[actual_time] AS FLOAT) / 600000000) AS [total_time_minutes],
SUM([exercise_scores].[earned_score]) AS [earned_score],
SUM([exercise_scores].[possible_score]) AS [possible_score],
AVG([exercise_scores].[percent_score]) AS [percent_score],
COUNT(ueh.exercise_id) AS total_exercises
FROM [user_exercise_history] AS [ueh]
LEFT JOIN
(
SELECT
coding_exercise_score.exercise_id AS exercise_id,
coding_exercise_score.assessment_id AS assessment_id,
coding_exercise_score.user_id AS user_id,
coding_exercise_score.archived_date AS archived_date,
score.earned AS earned_score,
score.possible AS possible_score,
CASE score.possible
WHEN 0 THEN 0
WHEN score.earned THEN 100
ELSE 9.5 * POWER(CAST(score.earned AS DECIMAL) / score.possible * 100, 0.511)
END AS percent_score
FROM coding_exercise_score
INNER JOIN
coding_exercise_score_detail AS score_detail
ON coding_exercise_score.id = score_detail.exercise_score_id
INNER JOIN
score
ON score.id = score_detail.score_id
WHERE score_detail.is_best_score = 'True'
UNION
SELECT
mc_score.exercise_id AS exercise_id,
mc_score.assessment_id AS assessment_id,
mc_score.user_id AS user_id,
mc_score.archived_date AS archived_date,
score.earned AS earned_score,
score.possible AS possible_score,
CASE score.possible
WHEN 0 THEN 0
WHEN score.earned THEN 100
ELSE 9.5 * POWER(CAST(score.earned AS DECIMAL) / score.possible * 100, 0.511)
END AS percent_score
FROM
multiple_choice_exercise_score AS mc_score
INNER JOIN score
ON score.id = mc_score.score_id
) AS [exercise_scores]
ON
(
(ueh.exercise_id = [exercise_scores].exercise_id
AND ueh.user_id = [exercise_scores].user_id
AND (
(ueh.assessment_id IS NULL AND [exercise_scores].assessment_id IS NULL)
OR ueh.assessment_id = [exercise_scores].assessment_id
)
AND (ueh.archived_date IS NULL)
)
)
INNER JOIN entity_account AS vu ON ((ueh.user_id = vu.account_id))
INNER JOIN (
select
g.group_id,
g.entity_name,
g.entity_description,
g.created_on_date,
g.modified_date,
g.created_by,
g.modified_by,
agj.account_id
from entity_group as g
inner join
account_group_join as agj
on agj.group_id = g.group_id
where g.entity_name <> 'Administrators'
and g.entity_name <> 'Group 1'
and g.entity_name <> 'Group 2'
and g.entity_name <> 'Group 3'
and g.entity_name <> 'Group 4'
and g.entity_name <> 'Group 5'
) AS g ON ueh.user_id = g.account_id
WHERE ueh.status = 'Completed'
GROUP BY ueh.user_id, vu.first_name, vu.last_name, ueh.archived_date
ueh.archived_date (that is, user_exercise_history.archived_date) is the field that the null check is ultimately executed against. I can modify the view in any way I want and index in any way I want, but that's about it.
The execution plan with the null check in it includes a pretty crazy set of sorting and Hash Matches that pertain to the score and coding_exercise_score_detail.
You can put an index on a view; see "Create Indexed Views" in the SQL Server documentation.
Try an index on view_scores.archived_date.
Generally, all the columns involved in a JOIN ... ON condition, or in WHERE or ORDER BY, should be indexed for better performance. Since you said view_scores is a view, check whether the archived_date column in the underlying table is indexed. If it is not, consider creating an index on that column.
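As one possible shape for such an index, here is a sketch of a filtered index on the base table. The index name is made up, the dbo schema is an assumption, and the key and included columns are only a guess based on the columns the view reads from user_exercise_history. Filtered indexes need SQL Server 2008 or later, and whether the optimizer actually uses this one can only be confirmed from the execution plan.

```sql
-- Hypothetical index: covers only non-archived rows, keyed on the join columns.
CREATE NONCLUSTERED INDEX IX_user_exercise_history_not_archived
ON dbo.user_exercise_history (user_id, exercise_id)
INCLUDE (assessment_id, date_taken, actual_time, archived_date)
WHERE archived_date IS NULL;
```

Because the index stores only rows where archived_date IS NULL, it stays small if most history rows are archived. If most rows are not archived, a plain index on archived_date is the simpler thing to try first.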
You may also consider adding that condition to the view creation logic itself.
view_scores.archived_date is null
ON ueh.exercise_id = [exercise_scores].exercise_id
AND ueh.user_id = [exercise_scores].user_id
AND ueh.archived_date IS NULL
AND ( ( ueh.assessment_id IS NULL
AND [exercise_scores].assessment_id IS NULL
)
OR ueh.assessment_id = [exercise_scores].assessment_id
)
I would look at this.
An OR in a join is typically slow.
Pick an ID that will not be used:
ON ueh.exercise_id = [exercise_scores].exercise_id
AND ueh.user_id = [exercise_scores].user_id
AND ueh.archived_date IS NULL
AND isnull(ueh.assessment_id, -1) = isnull([exercise_scores].assessment_id, -1)
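The following self-contained sketch shows that the two join conditions match the same rows. The table variables and values are invented for illustration, and it assumes -1 never occurs as a real assessment_id.

```sql
DECLARE @h TABLE (id INT, assessment_id INT NULL);  -- stands in for ueh
DECLARE @s TABLE (id INT, assessment_id INT NULL);  -- stands in for exercise_scores

INSERT INTO @h (id, assessment_id) VALUES (1, NULL), (2, 10), (3, 20);
INSERT INTO @s (id, assessment_id) VALUES (1, NULL), (2, 10), (3, 99);

-- Original form: OR with an explicit NULL-to-NULL match
SELECT h.id
FROM @h AS h
INNER JOIN @s AS s
    ON h.id = s.id
   AND ((h.assessment_id IS NULL AND s.assessment_id IS NULL)
        OR h.assessment_id = s.assessment_id)
ORDER BY h.id;

-- Sentinel form: NULL on both sides becomes -1, so the rows compare equal
SELECT h.id
FROM @h AS h
INNER JOIN @s AS s
    ON h.id = s.id
   AND ISNULL(h.assessment_id, -1) = ISNULL(s.assessment_id, -1)
ORDER BY h.id;
```

Both queries return ids 1 and 2. Wrapping a column in ISNULL can stop SQL Server from seeking an index on that column, so compare the execution plans of both forms before settling on one.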
First, I will explain what is being captured. Users have a member level associated with their accounts (Bronze, Gold, Diamond, etc.). A nightly job needs to run to total each user's orders from today back one year. If the order total for a given user goes over or under a certain amount, their level is upgraded or downgraded. The table where the level information is stored will not change much, but the minimum and maximum amount thresholds may change over time. This is what the table looks like:
CREATE TABLE [dbo].[MemberAdvantageLevels] (
[Id] int NOT NULL IDENTITY(1,1) ,
[Name] varchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[MinAmount] int NOT NULL ,
[MaxAmount] int NOT NULL ,
CONSTRAINT [PK__MemberAd__3214EC070D9DF1C7] PRIMARY KEY ([Id])
)
ON [PRIMARY]
GO
I wrote a query that will group the orders by user for the year to date. The query includes their current member level.
SELECT
Sum(dbo.tbh_Orders.SubTotal) AS OrderTotals,
Count(dbo.UserProfile.UserId) AS UserOrders,
dbo.UserProfile.UserId,
dbo.UserProfile.UserName,
dbo.UserProfile.Email,
dbo.MemberAdvantageLevels.Name,
dbo.MemberAdvantageLevels.MinAmount,
dbo.MemberAdvantageLevels.MaxAmount,
dbo.UserMemberAdvantageLevels.LevelAchievmentDate,
dbo.UserMemberAdvantageLevels.LevelAchiementAmount,
dbo.UserMemberAdvantageLevels.IsCurrent as IsCurrentLevel,
dbo.MemberAdvantageLevels.Id as MemberLevelId
FROM
dbo.tbh_Orders
INNER JOIN dbo.tbh_OrderStatuses ON dbo.tbh_Orders.StatusID = dbo.tbh_OrderStatuses.OrderStatusID
INNER JOIN dbo.UserProfile ON dbo.tbh_Orders.CustomerID = dbo.UserProfile.UserId
INNER JOIN dbo.UserMemberAdvantageLevels ON dbo.UserProfile.UserId = dbo.UserMemberAdvantageLevels.UserId
INNER JOIN dbo.MemberAdvantageLevels ON dbo.UserMemberAdvantageLevels.MemberAdvantageLevelId = dbo.MemberAdvantageLevels.Id
WHERE
dbo.tbh_OrderStatuses.OrderStatusID = 4 AND
(dbo.tbh_Orders.AddedDate BETWEEN dateadd(year,-1,getdate()) AND GETDATE()) and IsCurrent = 1
GROUP BY
dbo.UserProfile.UserId,
dbo.UserProfile.UserName,
dbo.UserProfile.Email,
dbo.MemberAdvantageLevels.Name,
dbo.MemberAdvantageLevels.MinAmount,
dbo.MemberAdvantageLevels.MaxAmount,
dbo.UserMemberAdvantageLevels.LevelAchievmentDate,
dbo.UserMemberAdvantageLevels.LevelAchiementAmount,
dbo.UserMemberAdvantageLevels.IsCurrent,
dbo.MemberAdvantageLevels.Id
So, I need to check OrderTotals, and if it exceeds the current level's threshold, I then need to find the level that fits the user's current order total and create a new record with the new level.
So, for example, let's say jon#doe.com is currently at Bronze. The MinAmount for Bronze is 0 and the MaxAmount is 999. Currently his orders for the year are at $2500. I need to find the level that $2500 fits within and upgrade his account. I also need to check the LevelAchievmentDate, and if it is outside of the current year we may need to demote the user if there has been no activity.
I was thinking I could create a temp table that holds all the levels and then somehow use a CASE expression in the query above to determine the new level. I don't know if that is possible. Or is it better to iterate over my order results and perform additional queries? If I use the iteration pattern, I know I can use the When statement to iterate over the rows.
Update
I updated my query a bit and so far came up with this, but I may need more information than just the Id from the subquery:
Select * into #memLevels from MemberAdvantageLevels
SELECT
Sum(dbo.tbh_Orders.SubTotal) AS OrderTotals,
Count(dbo.AZProfile.UserId) AS UserOrders,
dbo.AZProfile.UserId,
dbo.AZProfile.UserName,
dbo.AZProfile.Email,
dbo.MemberAdvantageLevels.Name,
dbo.MemberAdvantageLevels.MinAmount,
dbo.MemberAdvantageLevels.MaxAmount,
dbo.UserMemberAdvantageLevels.LevelAchievmentDate,
dbo.UserMemberAdvantageLevels.LevelAchiementAmount,
dbo.UserMemberAdvantageLevels.IsCurrent as IsCurrentLevel,
dbo.MemberAdvantageLevels.Id as MemberLevelId,
(Select Id from #memLevels where Sum(dbo.tbh_Orders.SubTotal) >= #memLevels.MinAmount and Sum(dbo.tbh_Orders.SubTotal) <= #memLevels.MaxAmount) as NewLevelId
FROM
dbo.tbh_Orders
INNER JOIN dbo.tbh_OrderStatuses ON dbo.tbh_Orders.StatusID = dbo.tbh_OrderStatuses.OrderStatusID
INNER JOIN dbo.AZProfile ON dbo.tbh_Orders.CustomerID = dbo.AZProfile.UserId
INNER JOIN dbo.UserMemberAdvantageLevels ON dbo.AZProfile.UserId = dbo.UserMemberAdvantageLevels.UserId
INNER JOIN dbo.MemberAdvantageLevels ON dbo.UserMemberAdvantageLevels.MemberAdvantageLevelId = dbo.MemberAdvantageLevels.Id
WHERE
dbo.tbh_OrderStatuses.OrderStatusID = 4 AND
(dbo.tbh_Orders.AddedDate BETWEEN dateadd(year,-1,getdate()) AND GETDATE()) and IsCurrent = 1
GROUP BY
dbo.AZProfile.UserId,
dbo.AZProfile.UserName,
dbo.AzProfile.Email,
dbo.MemberAdvantageLevels.Name,
dbo.MemberAdvantageLevels.MinAmount,
dbo.MemberAdvantageLevels.MaxAmount,
dbo.UserMemberAdvantageLevels.LevelAchievmentDate,
dbo.UserMemberAdvantageLevels.LevelAchiementAmount,
dbo.UserMemberAdvantageLevels.IsCurrent,
dbo.MemberAdvantageLevels.Id
This hasn't been syntax checked or tested, but it should handle the inserts and updates you describe. The insert can be done as a single statement using a derived (virtual) table that contains the grouped order calculation. Note that both the insert and the update statement should be run within the same transaction, to ensure that no two records for the same user can end up with IsCurrent = 1.
INSERT UserMemberAdvantageLevels (UserId, MemberAdvantageLevelId, IsCurrent,
LevelAchiementAmount, LevelAchievmentDate)
SELECT t.UserId, mal.Id, 1, t.OrderTotals, GETDATE()
FROM
(SELECT ulp.UserId, SUM(ord.SubTotal) OrderTotals, COUNT(ulp.UserId) UserOrders
FROM UserLevelProfile ulp
INNER JOIN tbh_Orders ord ON (ord.CustomerId = ulp.UserId)
WHERE ord.StatusID = 4
AND ord.AddedDate BETWEEN DATEADD(year,-1,GETDATE()) AND GETDATE()
GROUP BY ulp.UserId) AS t
INNER JOIN MemberAdvantageLevels mal
ON (t.OrderTotals BETWEEN mal.MinAmount AND mal.MaxAmount)
-- Left join needed on next line in case user doesn't currently have a level
LEFT JOIN UserMemberAdvantageLevels umal ON (umal.UserId = t.UserId)
WHERE umal.MemberAdvantageLevelId IS NULL -- First time user has been awarded a level
OR (mal.Id <> umal.MemberAdvantageLevelId -- Level has changed
AND (t.OrderTotals > umal.LevelAchiementAmount -- Achievement has increased (promotion)
OR t.UserOrders = 0)) -- No. of orders placed is zero (demotion)
/* Reset IsCurrent flag where new record has been added */
UPDATE umal1
SET IsCurrent = 0
FROM UserMemberAdvantageLevels umal1
INNER JOIN UserMemberAdvantageLevels umal2 ON (umal2.UserId = umal1.UserId)
WHERE umal1.IsCurrent = 1
AND umal2.IsCurrent = 1 -- the newly inserted row is also flagged as current
AND umal1.LevelAchievmentDate < umal2.LevelAchievmentDate
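To run the INSERT and UPDATE above as one unit, a wrapper along these lines could be used. It is only a skeleton: the two comments mark where the statements would go, and the error handling uses RAISERROR so that it also works on versions without THROW.

```sql
SET XACT_ABORT ON;  -- roll back automatically on most run-time errors

BEGIN TRY
    BEGIN TRANSACTION;

    -- 1) INSERT the new level rows here
    -- 2) UPDATE the superseded rows to IsCurrent = 0 here

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo both statements if either one failed
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);  -- re-raise so the nightly job sees the failure
END CATCH;
```

Either both statements take effect or neither does, so a failure between them cannot leave a user with two rows flagged IsCurrent = 1.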
One approach:
with cte as
(SELECT Sum(o.SubTotal) AS OrderTotals,
Count(p.UserId) AS UserOrders,
p.UserId,
p.UserName,
p.Email,
l.Name,
l.MinAmount,
l.MaxAmount,
ul.LevelAchievmentDate,
ul.LevelAchiementAmount,
ul.IsCurrent as IsCurrentLevel,
l.Id as MemberLevelId
FROM dbo.tbh_Orders o
INNER JOIN dbo.UserProfile p ON o.CustomerID = p.UserId
INNER JOIN dbo.UserMemberAdvantageLevels ul ON p.UserId = ul.UserId
INNER JOIN dbo.MemberAdvantageLevels l ON ul.MemberAdvantageLevelId = l.Id
WHERE o.StatusID = 4 AND
o.AddedDate BETWEEN dateadd(year,-1,getdate()) AND GETDATE() and
IsCurrent = 1
GROUP BY
p.UserId, p.UserName, p.Email, l.Name, l.MinAmount, l.MaxAmount,
ul.LevelAchievmentDate, ul.LevelAchiementAmount, ul.IsCurrent, l.Id)
select cte.*, ml.*
from cte
join #memLevels ml
on cte.OrderTotals >= ml.MinAmount and cte.OrderTotals <= ml.MaxAmount
I have a fairly complex SQL query that returns the ids of 2158 rows from a table with ~14M rows. I'm using CTEs for simplification.
The WHERE clause consists of two conditions. If I comment out either one of them, the other runs in ~2 seconds. If I leave them both in (separated by OR), the query takes ~100 seconds. The first condition alone needs 1-2 seconds and returns 19 rows; the second condition alone needs 0 seconds and returns 2139 rows.
What can be the reason?
This is the complete SQL:
WITH fpcRepairs AS
(
SELECT FPC_Row = ROW_NUMBER()OVER(PARTITION BY t.SSN_Number ORDER BY t.Received_Date, t.Claim_Creation_Date, t.Repair_Completion_Date, t.Claim_Submitted_Date)
, idData, Repair_Completion_Date, Received_Date, Work_Order, SSN_number, fiMaxActionCode, idModel,ModelName
, SP=(SELECT TOP 1 Reused_Indicator FROM tabDataDetail td INNER JOIN tabSparePart sp ON td.fiSparePart=sp.idSparePart
WHERE td.fiData=t.idData
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName = '1254-3751'))
FROM tabData AS t INNER JOIN
modModel AS m ON t.fiModel = m.idModel
WHERE (m.ModelName = 'LT26i')
AND EXISTS(
SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON td.fiSparePart = sp.idSparePart
WHERE (td.fiData = t.idData)
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName = '1254-3751')
)
), needToChange AS
(
SELECT idData FROM tabData AS t INNER JOIN
modModel AS m ON t.fiModel = m.idModel
WHERE (m.ModelName = 'LT26i')
AND EXISTS(
SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON td.fiSparePart = sp.idSparePart
WHERE (td.fiData = t.idData)
AND (td.Material_Quantity <> 0)
AND (sp.SparePartName IN ('1257-2741','1257-2742','1248-2338','1254-7035','1248-2345','1254-7042'))
)
)
SELECT t.idData
FROM tabData AS t INNER JOIN modModel AS m ON t.fiModel = m.idModel
INNER JOIN needToChange ON t.idData = needToChange.idData -- needs to change FpcAssy
LEFT OUTER JOIN fpcRepairs rep ON t.idData = rep.idData
WHERE
rep.idData IS NOT NULL -- FpcAssy replaced, check if reused was claimed correctly
AND rep.FPC_Row > 1 -- other FpcAssy repair before
AND (
SELECT SP FROM fpcRepairs lastRep
WHERE lastRep.SSN_Number = rep.SSN_Number
AND lastRep.FPC_Row = rep.FPC_Row - 1
) = rep.SP -- same SP, must be rejected(reused+reused or new+new)
OR
rep.idData IS NOT NULL -- FpcAssy replaced, check if reused was claimed correctly
AND rep.FPC_Row = 1 -- no other FpcAssy repair before
AND rep.SP = 0 -- not reused, must be rejected
order by t.idData
Here's the execution plan:
Download: http://www.filedropper.com/exeplanfpc
Try a UNION ALL of two separate queries instead of the OR condition.
I've tried it many times and it really helped. I read about this issue in The Art of SQL.
Read it; you can find a lot of useful information about performance issues there.
UPDATE:
Check related questions
UNION ALL vs OR condition in sql server query
http://www.sql-server-performance.com/2011/union-or-sql-server-queries/
Can UNION ALL be faster than JOINs or do my JOINs just suck?
Check Wes's answer
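As a rough illustration of the rewrite, the sketch below uses a made-up table variable in place of the fpcRepairs CTE, with a hypothetical PrevSP column standing in for the correlated lookup of the previous repair's SP. The two branches test FPC_Row > 1 and FPC_Row = 1, so they cannot both match the same row, and UNION ALL does not introduce duplicates.

```sql
-- Hypothetical stand-in for the fpcRepairs rows; PrevSP plays the role of
-- the previous repair's SP value.
DECLARE @rep TABLE (idData INT PRIMARY KEY, FPC_Row INT, SP BIT, PrevSP BIT NULL);

INSERT INTO @rep (idData, FPC_Row, SP, PrevSP) VALUES
(1, 1, 0, NULL),  -- first repair, not reused    -> rejected
(2, 1, 1, NULL),  -- first repair, reused        -> fine
(3, 2, 1, 1),     -- same SP as previous repair  -> rejected
(4, 2, 0, 1);     -- different SP from previous  -> fine

-- OR form, shaped like the original WHERE clause
SELECT idData
FROM @rep
WHERE (FPC_Row > 1 AND PrevSP = SP)
   OR (FPC_Row = 1 AND SP = 0)
ORDER BY idData;

-- UNION ALL form: each branch can be optimized on its own
SELECT idData FROM @rep WHERE FPC_Row > 1 AND PrevSP = SP
UNION ALL
SELECT idData FROM @rep WHERE FPC_Row = 1 AND SP = 0
ORDER BY idData;
```

Both queries return idData 1 and 3. In the real query, each branch would keep the joins to needToChange and fpcRepairs, and the single ORDER BY at the end applies to the combined result.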
The usage of the OR is probably causing the query optimizer to no longer use an index in the second query.