How to speed up query with aggregates? - sql

I have the following query that seems to run pretty slow if more than one aggregate is used in the select part. Is there some way to optimize this?
The query returns a 168 rows and takes 1 second to complete, but this bogs down when a couple of users load the page at once and the original query had more aggregates which also add seconds to the query.
***** Update here's a simplier query**
Select
gocm.CustomerID,
sum(DISTINCT o.OrderTotal) as TotalOfOrders
from GroupOrder_Customer_Mapping gocm
Left Join [Order] o on o.CreatedForCustomerID = gocm.customerid and o.grouporderid = 8254
where gocm.grouporderid = 8254
group by gocm.CustomerID, invitePath
order by invitepath
Execution Plan
returns following data (sample results)

Possible this be helpful for you -
SELECT
gocm.CustomerID
, o.TotalOfOrders
FROM (
SELECT DISTINCT gocm.CustomerID, invitePath
FROM dbo.GroupOrder_Customer_Mapping gocm
WHERE gocm.grouporderid = 8254
) gocm
LEFT JOIN (
SELECT
o.CreatedForCustomerID
, TotalOfOrders = SUM(DISTINCT o.OrderTotal)
FROM dbo.[Order] o
WHERE o.grouporderid = 8254
GROUP BY o.CreatedForCustomerID
) o ON o.CreatedForCustomerID = gocm.customerid
ORDER BY invitepath

If data is not updated frequently you might consider an indexed view.

Related

How to convert inline SQL queries to JOINS in SQL SERVER to reduce load time

I need help in optimizing this SQL query.
In the main SELECT statement there are three columns which is dependent on the outer query result. This is why my query is taking a long time to return data. I have tried making left joins but this is not working properly.
Can anyone help me to resolve this issue?
SELECT
DISTINCT ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
(
SELECT
STRING_AGG(
(ug.UG_Name),
','
)
FROM
Groups ug
INNER JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
WHERE
augm.AUGM_OrganizationUserID = ou.OrganizationUserID
AND ug.UG_IsDeleted = 0
AND augm.AUGM_IsDeleted = 0
) AS UserGroups,
order1.OrderNumber AS OrderId -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttribute),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributes -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttributeID),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributeID
FROM
ApplicantData acd WITH (NOLOCK)
INNER JOIN ClientPackage ps WITH (NOLOCK) ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
INNER JOIN [ClientOrder] order1 WITH (NOLOCK) ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
INNER JOIN OUser ou WITH (NOLOCK) ON ou.OrganizationUserID = ps.OrganizationUserID
It looks like this query can be simplified, and the dependent subqueries in your SELECT clause removed, Consider your second and third dependent subqueries. You can refactor them into one nondependent subquery with a LEFT JOIN. Using nondependent subqueries is more efficient because the query planner can run them just once, rather than once for each row.
You want two STRING_AGG() results from the same table. This subquery gives those two outputs for every possible combination of HierarchyNodeID and OrganizationUserID values. STRING_AGG() is an aggregate function like SUM() and so works nicely with GROUP BY.
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes -- UAT-2455,
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
You can run this subquery itself to convince yourself it works.
Now, we can LEFT JOIN that into your query. Like this. (For readability I took out the NOLOCKs and used JOIN: it means the same thing as INNER JOIN.)
SELECT DISTINCT
ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
'tempvalue' AS UserGroups, -- shortened for testing
order1.OrderNumber AS OrderId, -- UAT-2455
uat2455.CustomAttributes, -- UAT-2455
uat2455.CustomAttributeIDs -- UAT-2455
FROM ApplicantData acd
JOIN ClientPackage ps
ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
JOIN ClientOrder order1
ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
JOIN OUser ou
ON ou.OrganizationUserID = ps.OrganizationUserID
LEFT JOIN (
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes -- UAT-2455,
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
) uat2455
ON uat2455.HierarchyNodeID = dpm.DPM_ID
AND uat2455.OrganizationUserId = ps.OrganizationUserID
See how we collapsed your second and third dependent subqueries to just one, then used it as a virtual table with LEFT JOIN? We transformed the WHERE clauses from the dependent subqueries into an ON clause.
You can test this: run it with TOP(50) and eyeball the results.
When you're happy, the next step is to transform your first dependent subquery the same way.
Pro tip Don't use WITH (NOLOCK), ever, unless a database administration expert tells you to after looking at your specific query. If your query's purpose is a historical report and you don't care whether the most recent transactions in your database are represented exactly right, you can precede your query with this statement. It also allows the query to run while avoiding locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Pro tip Be obsessive about formatting your queries for readability. You, your colleagues, and yourself a year from now must be able to read and reason about queries like this.

Query performance: CTE using ROW_NUMBER() to select first row

We have three environments and when I run my SQL query in two of them just takes 30 or 38 seconds to run but in the other environment running is not completed and I should cancel it. Query is based on two parts, a CTE and a very simple select from a table, in both CTE and select I'm using the same table.
Could you please tell me why it takes so long time? how can I improve the query?
ALTER VIEW [fact].[vPurchase]
AS
WITH VKPL AS
(
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
)
SELECT
-- .... here are some columns
Delivery_FK,
Product_FK,
CAST(column2 AS VARCHAR) AS column2,
f.[UpdateDate] AS [Update date]
FROM
[fact].[KRMFact] f
LEFT JOIN
VKPL v ON f.Delivery_FK = v.Delivery_FK
This is guesswork.
I guess the environment where this query is slow is the one with lots of production data in it.
I guess some index on your KRMFact table will, maybe, help you. Here's how to figure out what you need: SQL Server Management Studio (SSMS) has a feature to show you a query's execution plan. Put your query (not simplified, please, the actual query) into SSMS, right click and choose "Include Actual Execution Plan." Then run the query. The execution plan display may recommend an index for you to create to get this query to run faster.
I guess you're trying to find rows with the earliest values of UpdateDate.
Your subquery
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
looks like it picks out the row with the earliest KRMFact.UpdateDate for each value of KRMFact.Delivery_FK. That's what the ROW_NUMBER() OVER... WHERE rk=1 language does.
If my guess about that is correct you can do that a different way, which may be more efficient.
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
1 AS rk
FROM
[fact].[KRMFact] iv
JOIN ( SELECT Delivery_FK, MIN(UpdateDate) first_update
FROM [fact].[KRMFact]
GROUP BY Delivery_FK
) first_update ON iv.UpdateDate = first_update.first_update
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
You should probably try out the old and new versions of the subquery to determine whether they will yield the same results.
If you use this subquery query I suggest, this index will help make it run faster by optimizing the new sub-sub-query's MIN() ... GROUP BY operation.
CREATE INDEX x_KRMFact_Product_Update
ON [fact].[KRMFact]
([Product_FK],[UpdateDate])
By the way, WHERE pr.Product_Key = '740' turns your LEFT JOIN [dimension].[Product] operation into an ordinary inner JOIN.

Need help in optimizing sql query

I am new to sql and have created the below sql to fetch the required results.However the query seems to take ages in running and is quite slow. It will be great if any help in optimization is provided.
Below is the sql query i am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, or/in in the on clause is usually a bad sign -- but it is unclear what your intention really is.

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean if I comment out Statement 1, statement 2 works and vice versa. When I put them together the query doesn't complete, there is no error but it times out which is unexpected because each statement only takes a few seconds.
I understand there is likely a better way to do this but before I do, I would like to know why I cannot seem to do multiple exists statements like this? Are there not meant to be multiple EXISTS conditions in the WHERE clause?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long to comment.
Your query as written seems correct. The timeout will only be able to be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from.
Parameter sniffing on #Date. Try hard-coding this value and see if you still get the same slowness
No covering index on P.OTHER_ID or P.DATE or P.ID or SA.ID which would cause a table scan for these predicates
Indexes for the above columns which aren't optimal (including too many columns, etc)
Your query being serial when it may benefit from parallelism.
Using the LOWER function on a database which doesn't have a case sensitive collation (most don't, though this function doesn't slow things down that much)
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also done when comparing the speed of two queries to ensure they aren't using cached plans, or one isn't when another is which would skew the results.
Since your query is timing out, try including the estimated execution plan and post it for us at past the plan
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)

SQL - Subtraction within a Common Table Expression,

------ SOLVED --------
Instead of trying to perform the subtraction within the scope of the CTE, I just had to place it into the sub query which was using this particular CTE, in the main select list.
The Problem was, `InnerOQLI.FreePlaceCount is not being recognized, since it hasn't been defined within the scope of the CTE and is only being used in the Exists statement.
------ PROBLEM ----------
This is the first time I've used CTE's, I've joined multiple CTE's together so that I can retrieve an overall total in a single column.
I need to perform a subtraction within one of the CTE's
I first wrote it like this
MyCount2
AS
(
SELECT DISTINCT
O.ID AS OrderID,
(
(
(SELECT SUM(InnerOC.[Count])
FROM Order InnerO
INNER JOIN SubOrder InnerSO ON InnerO.ID = InnerSO.OrderID
INNER JOIN OrderComponent InnerOC ON SO.ID = OC.SubOrderID
WHERE OC.OrderComponentTypeID IN (1,2,4,5)
AND EXISTS (SELECT * FROM OrderQuoteLineItem InnerOQLI
WHERE InnerOQLI.OrderQuoteLineItemTypeID = 9 AND Order.ID = InnerO.ID)
AND Inner0.ID = ).ID)
)
- --< Minus Here
OQLI.FreePlaceCount
) AS [SHPCommExpression2]
FROM Order O
INNER JOIN SubOrder SO ON O.ID = SO.OrderID
INNER JOIN OrderComponent OC ON SO.ID = OC.SubOrderID
INNER JOIN OrderQuoteLineItem OQLI ON SO.ID = 0QLI.SubOrderID
),
Without going into to much detail, this brings back incorrect data because of repeated rows in the main query. (I believe it cos of the same joins within the main query)
So I then wrote this
MyCount2
AS
(SELECT InnerO.ID AS OrderID
SUM(InnerOC.[Count]
- InnerOQLI.FreePlaceCount) --- Tried to place subtraction here ----
AS [SHPCommExpression12])
FROM Order InnerO
INNER JOIN SubOrder InnerSO ON InnerO.ID = InnerSO.OrderID
INNER JOIN OrderComponent InnerOC ON SO.ID = OC.SubOrderID
WHERE OC.OrderComponentTypeID IN (1,2,4,5)
AND EXISTS (SELECT * FROM OrderQuoteLineItem InnerOQLI
WHERE InnerOQLI.OrderQuoteLineItemTypeID = 9 AND Order.ID = InnerO.ID)
GROUP BY InnerO.ID)
),
You can see where I've attempted to perform the subtraction, but It doesn't recognize InnerOQLI, where I've tried to add it to perform the subtraction. I can't work out how to correct this, I realize that it cant fully recognize the InnerOQLI since it's in the Exists statement, Is there away around this? If anyone could help I'd appreciate it
Thanks
Instead of trying to perform the subtraction within the scope of the CTE, I just had to place it into the sub query which was using this particular CTE, in the main select list.
The Problem was, `InnerOQLI.FreePlaceCount is not being recognized, since it hasn't been defined within the scope of the CTE and is only being used in the Exists statement.