SQL not efficient enough, tuning assistance required

SQL not efficient enough, tuning assistance required - sql

We have some SQL that is ok on smaller data volumes but poor once we scale up to selecting from larger volumes. Is there a faster alternative style to achieve the same output as below? The idea is to pull back a single unique row to get latest version of the data... The SQL does reference another view but this view runs very fast - so we expect the issue is here below and want to try a different approach
SELECT *
FROM
(SELECT (select CustomerId from PremiseProviderVersionsToday
where PremiseProviderId = b.PremiseProviderId) as CustomerId,
c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
ORDER BY a.effectiveDate DESC) AS rowNumber
FROM PremiseMeterProviderVersions a, PremiseProviders b,
PremiseMeterProviders c
WHERE (a.TransactionDateTimeEnd IS NULL
AND a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND b.PremiseProviderId = c.PremiseProviderId)
) data
WHERE data.rowNumber = 1

As Bilal Ayub stated above, the correlated subquery can result in performance issues. See here for more detail. Below are my suggestions:
Change all to explicit joins (ANSI standard)
Use aliases that are more descriptive than single characters (this is mostly to help readers understand what each table does)
Convert data subquery to a temp table or cte (temp tables and ctes usually perform better than subqueries)
Note: normally, you should explicitly create and insert into your temp table but I chose not to do that here as I do not know the data types of your columns.
SELECT d.CustomerId
, c.D3001_MeterId
, b.CoreSPID
, a.EnteredBy
, rowNumber = ROW_NUMBER() OVER(PARTITION BY b.PremiseProviderId ORDER BY a.effectiveDate DESC)
INTO #tmp_RowNum
FROM PremiseMeterProviderVersions a
JOIN PremiseMeterProviders c ON c.PremiseMeterProviderId = a.PremiseMeterProviderId
JOIN PremiseProviders b ON b.PremiseProviderId = c.PremiseProviderId
JOIN PremiseProviderVersionsToday d ON d.PremiseProviderId = b.PremiseProviderId
WHERE a.TransactionDateTimeEnd IS NULL
SELECT *
FROM #tmp_RowNum
WHERE rowNumber = 1

You are running a correlated query that will run in loop, if size of table is small it will be faster, i would suggest to change it and try to join the table in order to get customerid.
(select CustomerId from PremiseProviderVersionsToday where PremiseProviderId = b.PremiseProviderId) as CustomerId

Consider derived tables including an aggregate query that calculates maximum EffectoveDate by PremiseProviderId and unit level query, each using explicit joins (current ANSI SQL standard) and not implicit as you currently use:
SELECT data.*
FROM
(SELECT t.CustomerId, c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
b.PremiseProviderId, a.EffectiveDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
INNER JOIN PremiseProviderVersionsToday t
ON t.PremiseProviderId = b.PremiseProviderId
) data
INNER JOIN
(SELECT b.PremiseProviderId, MAX(a.EffectiveDate) As MaxEffDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
GROUP BY b.PremiseProviderId
) agg
ON data.PremiseProviderId = agg.PremiseProviderId
AND data.EffectiveDate = agg.MaxEffDate

Related

Query performance: CTE using ROW_NUMBER() to select first row

We have three environments and when I run my SQL query in two of them just takes 30 or 38 seconds to run but in the other environment running is not completed and I should cancel it. Query is based on two parts, a CTE and a very simple select from a table, in both CTE and select I'm using the same table.
Could you please tell me why it takes so long time? how can I improve the query?
ALTER VIEW [fact].[vPurchase]
AS
WITH VKPL AS
(
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
)
SELECT
-- .... here are some columns
Delivery_FK,
Product_FK,
CAST(column2 AS VARCHAR) AS column2,
f.[UpdateDate] AS [Update date]
FROM
[fact].[KRMFact] f
LEFT JOIN
VKPL v ON f.Delivery_FK = v.Delivery_FK

This is guesswork.
I guess the environment where this query is slow is the one with lots of production data in it.
I guess some index on your KRMFact table will, maybe, help you. Here's how to figure out what you need: SQL Server Management Studio (SSMS) has a feature to show you a query's execution plan. Put your query (not simplified, please, the actual query) into SSMS, right click and choose "Include Actual Execution Plan." Then run the query. The execution plan display may recommend an index for you to create to get this query to run faster.
I guess you're trying to find rows with the earliest values of UpdateDate.
Your subquery
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
looks like it picks out the row with the earliest KRMFact.UpdateDate for each value of KRMFact.Delivery_FK. That's what the ROW_NUMBER() OVER... WHERE rk=1 language does.
If my guess about that is correct you can do that a different way, which may be more efficient.
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
1 AS rk
FROM
[fact].[KRMFact] iv
JOIN ( SELECT Delivery_FK, MIN(UpdateDate) first_update
FROM [fact].[KRMFact]
GROUP BY Delivery_FK
) first_update ON iv.UpdateDate = first_update.first_update
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
You should probably try out the old and new versions of the subquery to determine whether they will yield the same results.
If you use this subquery query I suggest, this index will help make it run faster by optimizing the new sub-sub-query's MIN() ... GROUP BY operation.
CREATE INDEX x_KRMFact_Product_Update
ON [fact].[KRMFact]
([Product_FK],[UpdateDate])
By the way, WHERE pr.Product_Key = '740' turns your LEFT JOIN [dimension].[Product] operation into an ordinary inner JOIN.

SELECT statement where rows are omitted based on another table

Table with orders has another table with positions. I want the orders table to show but then only have the most up to-date position on it. Below is a picture of the 3 rows I want showing. Omit the rest.
SELECT DispatchTable.ordernumber, DispatchTable.truck,
DispatchTable.driver, DispatchTable.actualpickup,
DispatchTable.actualdropoff, orders.pickupdateandtime,
orders.dropoffdateandtime, Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')

Assuming that you want the row having the latest lastdateandtime for the truck name, this should work:
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
TruckLatest.lastposition,
TruckLatest.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN (SELECT name,
lastposition,
lastdateandtime
FROM Truck002 Truck1
WHERE lastdateandtime =
(SELECT MAX(lastdateandtime)
FROM Truck002 Truck2
WHERE Truck2.name = Truck1.name)) TruckLatest
ON DispatchTable.truck = TruckLatest.name
WHERE (orders.status = 'onRoute')

If I understand correctly, you can get the most recent record for a truck using ROW_NUMBER():
SELECT dt.ordernumber, dt.truck,
dt.driver, dt.actualpickup,
dt.actualdropoff, o.pickupdateandtime,
o.dropoffdateandtime, t.lastposition,
t.lastdateandtime
FROM DispatchTable dt INNER JOIN
orders o
ON dt.ordernumber = o.id INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.name ORDER BY t.lastdateandtime DESC) as seqnum
FROM Truck002 t
) t
ON dt.truck = t.name
WHERE o.status = 'onRoute' AND seqnum = 1;

Firstly, why are you using Truck002's name field rather than its id field as the link to DispacthTable? This is considered a less efficient way of doing it than using id (which is either a numerical field or a shorter string than name).
Secondly, you should mention in your Question that each Order can have many DispatchTable's and that each DispacthTable can have many Truck002's, otherwise many people will start by assuming that it is the other way round between DispatchTable and Truck002.
Thirdly, please try...
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')
GROUP BY ordernumber
HAVING lastdateandtime = MAX( lastdateandtime )
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://msdn.microsoft.com/en-us/library/bb177906(v=office.12).aspx (on HAVING)
https://www.w3schools.com/sql/sql_having.asp (on HAVING)
https://msdn.microsoft.com/en-us/library/bb177905(v=office.12).aspx (on GROUP BY)
https://www.w3schools.com/sql/sql_groupby.asp (on GROUP BY)

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far and i am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,> db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = > > db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry I am a noob so this may be really simple, but I just learn this stuff as I go on my own..

SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY create_date DESC
) ff
WHERE pf.profileID IN ('19456')
And it looks like all of your joins are actually INNER. Unless there may be profile without files (that's why OUTER apply instead of CROSS).

What about an obvious:
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;

Ms access get row number from a subquery not table

I have the following Ms Access query that retrieves data successfully:
SELECT stockInventory.purchaseId, stockInventory.itemId, item.itemName, stockInventory.unitId, unit.unitDesc, stockInventory.quantity, stockInventory.costPrice
FROM unit INNER JOIN (item INNER JOIN stockInventory ON item.itemId = stockInventory.itemId) ON unit.unitId = stockInventory.unitId
WHERE (((stockInventory.purchaseId)=1))
Now I want to retrieve these data with row number!
I tried the following:
SELECT A.*, ( SELECT COUNT(*) FROM A WHERE A.itemId>=itemId ) as rowNo
FROM
(
SELECT stockInventory.purchaseId, stockInventory.itemId, item.itemName, stockInventory.unitId, unit.unitDesc, stockInventory.quantity, stockInventory.costPrice
FROM unit INNER JOIN (item INNER JOIN stockInventory ON item.itemId = stockInventory.itemId) ON unit.unitId = stockInventory.unitId
WHERE (((stockInventory.purchaseId)=1))
) AS A;
But it says: The Microsoft access database engine cannot find the input table or query 'A' as the following picture:
How can I solve this problem?

The additional SELECT part
( SELECT COUNT(*) FROM A WHERE A.itemId>=itemId ) as rowNo
is a separate query that doesn't know about A.
I think you must save your original query (= the subquery) as new named query, then you can reference it in both SELECT parts.
SELECT A.*,
( SELECT COUNT(*) FROM mySubquery AS B WHERE B.itemId>=A.itemId ) as rowNo
FROM mySubquery AS A
Now it also gets clearer that you need two instances of the subquery (A and B).
I hope you don't have too many records, because performance will probably be bad. But that wasn't the focus here...

Consider directly adding rowNo subquery in original query:
SELECT (SELECT Count(*) FROM stockInventory AS sub
WHERE sub.itemId <= stockInventory.itemId) AS rowNo,
stockInventory.purchaseId, stockInventory.itemId, item.itemName,
stockInventory.unitId, unit.unitDesc, stockInventory.quantity,
stockInventory.costPrice
FROM unit
INNER JOIN (item
INNER JOIN stockInventory
ON item.itemId = stockInventory.itemId)
ON unit.unitId = stockInventory.unitId
WHERE (((stockInventory.purchaseId)=1))

Max date in view on left outer join

Thanks to a previous question, I found out how to pull the most recent data based on a linked table. BUT, now I have a related question.
The solution that I found used row_number() and PARTITION to pull the most recent set of data. But what if there's a possibility for zero or more rows in a linked table in the view? For example, the table FollowUpDate might have 0 rows, or 1, or more. I just want the most recent FollowUpDate:
SELECT
EFD.FormId
,EFD.StatusName
,MAX(EFD.ActionDate)
,EFT.Name AS FormType
,ECOA.Account AS ChargeOffAccount
,ERC.Name AS ReasonCode
,EAC.Description AS ApprovalCode
,MAX(EFU.FollowUpDate) AS FollowUpDate
FROM (
SELECT EF.FormId, EFD.ActionDate, EFS.Name AS StatusName, EF.FormTypeId, EF.ChargeOffId, EF.ReasonCodeId, EF.ApprovalCodeId,
row_number() OVER ( PARTITION BY EF.FormId ORDER BY EFD.ActionDate DESC ) DateSortKey
FROM Extension.FormDate EFD INNER JOIN Extension.Form EF ON EFD.FormId = EF.FormId INNER JOIN Extension.FormStatus EFS ON EFD.StatusId = EFS.StatusId
) EFD
INNER JOIN Extension.FormType EFT ON EFD.FormTypeId = EFT.FormTypeId
LEFT OUTER JOIN Extension.ChargeOffAccount ECOA ON EFD.ChargeOffId = ECOA.ChargeOffId
LEFT OUTER JOIN Extension.ReasonCode ERC ON EFD.ReasonCodeId = ERC.ReasonCodeId
LEFT OUTER JOIN Extension.ApprovalCode EAC ON EFD.ApprovalCodeId = EAC.ApprovalCodeId
LEFT OUTER JOIN (Select EFU.FormId, EFU.FollowUpDate, row_number() OVER (PARTITION BY EFU.FormId ORDER BY EFU.FollowUpDate DESC) FUDateSortKey FROM Extension.FormFollowUp EFU INNER JOIN Extension.Form EF ON EFU.FormId = EF.FormId) EFU ON EFD.FormId = EFU.FormId
WHERE EFD.DateSortKey = 1
GROUP BY
EFD.FormId, EFD.ActionDate, EFD.StatusName, EFT.Name, ECOA.Account, ERC.Name, EAC.Description, EFU.FollowUpDate
ORDER BY
EFD.FormId
If I do a similar pull using row_number() and PARTITION, I get the data only if there is at least one row in FollowUpDate. Kinda defeats the purpose of a LEFT OUTER JOIN. Can anyone help me get this working?

I rewrote your query - you had unnecessary subselects, and used row_number() for the FUDateSortKey but didn't use the column:
SELECT t.formid,
t.statusname,
MAX(t.actiondate) 'actiondate',
t.formtype,
t.chargeoffaccount,
t.reasoncode,
t.approvalcode,
MAX(t.followupdate) 'followupdate'
FROM (
SELECT t.formid,
fs.name 'StatusName',
t.actiondate,
ft.name 'formtype',
coa.account 'ChargeOffAccount',
rc.name 'ReasonCode',
ac.description 'ApprovalCode',
ffu.followupdate,
row_number() OVER (PARTITION BY ef.formid ORDER BY t.actiondate DESC) 'DateSortKey'
FROM EXTENSION.FORMDATE t
JOIN EXTENSION.FORM ef ON ef.formid = t.formid
JOIN EXTENSION.FORMSTATUS fs ON fs.statusid = t.statusid
JOIN EXTENSION.FORMTYPE ft ON ft.formtypeid = ef.formtypeid
LEFT JOIN EXTENSION.CHARGEOFFACCOUNT coa ON coa.chargeoffid = ef.chargeoffid
LEFT JOIN EXTENSION.REASONCODE rc ON rc.reasoncodeid = ef.reasoncodeid
LEFT JOIN EXTENSION.APPROVALCODE ac ON ac.approvalcodeid = ef.approvalcodeid
LEFT JOIN EXTENSION.FORMFOLLOWUP ffu ON ffu.formid = t.formid) t
WHERE t.datesortkey = 1
GROUP BY t.formid, t.statusname, t.formtype, t.chargeoffaccount, t.reasoncode, t.approvalcode
ORDER BY t.formid
The change I made to allow for FollowUpDate was to use a LEFT JOIN onto the FORMFOLLOWUP table - you were doing an INNER JOIN, so you'd only get rows with FORMFOLLOWUP records associated.

It's pretty hard to guess what's going on without table definitions and sample data.
Also, this is confusing: "the table FollowUpDate might have 0 rows" and you "want the most recent FollowUpDate." (especially when there is no table named FollowUpDate) There is no "most recent FollowUpDate" if there are zero FollowUpDates.
Maybe you want
WHERE <follow up date row number> in (1,NULL)

I figured it out. And as usual, I need a nap. I just needed to change my subselect to something I would swear I'd tried with no success:
SELECT field1, field2
FROM Table1 t1
LEFT JOIN (
SELECT field3, max(dateColumn)
FROM Table2
GROUP BY
field3
) t2
ON (t1.field1 = t2.field3)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL not efficient enough, tuning assistance required - sql

You are running a correlated query that will run in loop, if size of table is small it will be faster, i would suggest to change it and try to join the table in order to get customerid. (select CustomerId from PremiseProviderVersionsToday where PremiseProviderId = b.PremiseProviderId) as CustomerId

Related

Query performance: CTE using ROW_NUMBER() to select first row

SELECT statement where rows are omitted based on another table

SQL - select only newest record with WHERE clause

Ms access get row number from a subquery not table

Max date in view on left outer join

Categories

Resources