Improve Performance of SQL query joining 14 tables

I am trying to join 14 tables, a few of which I need to join using LEFT JOIN.
With the existing data, which is around 7,000 records, the query below takes around 10 seconds to execute. I am afraid of what will happen when there are more than a million records. Please help me improve the performance of the query below.
CREATE proc [dbo].[GetTodaysActualInvoiceItemSoldHistory]
@fromdate datetime,
@todate datetime
as
Begin
select SDID.InvoiceDate as [Sold Date],Cust.custCompanyName as [Sold To] ,
case SQBD.TransferNo when '0' then IVM.VendorName else SQBD.TransferNo end as [Purchase From],
SQBD.BatchSellQty as SoldQty,SQID.SellPrice,
SDID.InvoiceNo as [Sales Invoice No],INV.PRInvoiceNo as [PO Invoice No],INV.PRInvoiceDate as [PO Invoice Date],
SQID.ItemDesc as [Item Description],SQID.NetPrice,SDHM.DeliveryHeaderMasterName as DeliveryHeaderName,
SQID.ItemCode as [Item Code],
SQBD.BatchNo,SQBD.ExpiryDate,SQID.Amount,
SQID.Dept_ID as Dept_ID,
Dept_Name as [Department],SQID.Catg_ID as Catg_ID,
Category_Name as [Category],SQID.Brand_ID as Brand_ID,
BrandName as BrandName, SQID.Manf_Id as Manf_Id,
Manf.ManfName as [Manufacturer],
STM.TaxName, SQID.Tax_ID as Tax_ID,
INV.VendorID as VendorID,
SQBD.ItemID,SQM.Isdeleted,
SDHM.DeliveryHeaderMasterID,Cust.CustomerMasterID
from SD_QuotationMaster SQM
inner join SD_InvoiceDetails SDID on SQM.QuoteID = SDID.QuoteID
inner join SD_QuoteItemDetails SQID on SDID.QuoteID = SQID.QuoteID
inner join SD_QuoteBatchDetails SQBD on SDID.QuoteID = SQBD.QuoteID and SQID.ItemID=SQBD.ItemID
inner join INV_ProductInvoice INV on SQBD.InvoiceID=INV.ProductInvoiceID
inner join INV_VendorMaster IVM ON INV.VendorID = IVM.VendorID
inner join Sys_TaxMaster STM ON SQID.Tax_ID = STM.Tax_ID
inner join Cust_CustomerMaster Cust on SQM.CustomerMasterID = Cust.CustomerMasterID
left join INV_DeptartmentMaster Dept ON SQID.Dept_ID = Dept.Dept_ID
left join INV_BrandMaster BRD ON SQID.Brand_ID = BRD.Brand_ID
left join INV_ManufacturerMaster Manf ON SQID.Manf_Id = Manf.Manf_Id
left join INV_CategoryMaster CAT ON SQID.Catg_ID = CAT.Catg_ID
left join SLRB_DeliveryCustomerMaster SDCM on SQM.CustomerMasterID=SDCM.CustomerMasterID and SQM.DeliveryHeaderMasterID=SDCM.DeliveryHeaderMasterID
left join SLRB_DeliveryHeaderMaster SDHM on SDCM.DeliveryHeaderMasterID=SDHM.DeliveryHeaderMasterID
where (SQM.IsDeleted=0) and SQBD.BatchSellQty > 0
and SDID.InvoiceDate between @fromdate and @todate
order by ItemDesc
End
Only the tables below contain significant data; the other tables have fewer than 20 records each:
InvoiceDetails, QuoteMaster, QuoteItemDetails, QuoteBatchDetails, ProductInvoice
Below is a link to the execution plan:
http://jmp.sh/CSZc2x2
Thanks.

Let's start with an obvious error:
(isnull(SQBD.BatchSellQty,0) > 0)
That predicate is not indexable, so it should not be there. Seriously, BatchSellQty should not be unknown (nullable) in most cases, or you need to handle NULL properly. That field should be indexed, and wrapping it in ISNULL defeats the index - there are likely tons of batches. Also note that a filtered index (condition > 0) may work here.
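For example, something along these lines (the index name and INCLUDE list are just a guess from the columns the query touches):

-- Only rows with a positive sell quantity are indexed, which keeps the
-- index small and matches the procedure's WHERE clause.
CREATE NONCLUSTERED INDEX IX_SD_QuoteBatchDetails_SellQty
ON SD_QuoteBatchDetails (QuoteID, ItemID)
INCLUDE (BatchSellQty, BatchNo, ExpiryDate, TransferNo, InvoiceID)
WHERE BatchSellQty > 0;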
Second, check that you have all the proper indices and that the execution plan makes sense.
Third, you have to test with a ton of data. Index statistics may make a difference. Check where the time is spent - it may be tempdb, in which case you really need good tempdb IO speed... and that is not related to the input side.
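To check where the time is spent, you can run the procedure with statistics output enabled (standard SQL Server commands; the parameter values are placeholders):

-- Logical reads per table (IO) and CPU vs. elapsed time per statement
-- show whether the cost is in reads, tempdb spills, or raw CPU.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXEC dbo.GetTodaysActualInvoiceItemSoldHistory
    @fromdate = '2016-01-01', @todate = '2016-12-31';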

You can try to use query hints to help the SQL Server optimizer build an optimal query execution plan. For example, you can force the order in which tables are joined using the FORCE ORDER hint. If you order your tables so that each join step produces the minimum result size, the query will execute faster (maybe; you need to try). Example, with a sketch after the figures:
We need A join B join C.
If A join B = 2,000 records x 1,000 records = ~400 records (we expect this result),
and A join C = 2,000 records x 10 records = ~3 records,
and B join C = 1,000 records x 10 records = ~10,000 records,
then the optimal order will be
A join C join B = ~3 records x 1,000 records = ~3,000 records.

Related

SQL - faster to filter by large table or small table

I have the below query, which takes a while to run since ir_sales_summary is ~2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run since it is using a view that brings in data from a legacy service)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering from the big table (i) or be filtering from the much smaller weekending table (which gives us just the week ending dates)?
Try this. It might help: joining a static table as the first table in the query onto a fact/dynamic table will impact query performance, I believe.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we.Weekend
,supplierid
,chainid
,sales_units_cy
,sales_cost_cy
,sales_units_ret_cy
,sales_units_ly
,sales_cost_ly
,sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
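One further point, not raised in the answer above: year(i.saledate) = '2017' applies a function to the column, which blocks index or partition elimination on saledate. A hedged rewrite of that filter as a half-open date range (assuming saledate is a date/datetime column):

-- Sargable equivalent of year(i.saledate) = '2017': no function on the
-- column, so an index or partition on saledate can be used.
SELECT we.Weekend, i.supplierid, i.chainid
FROM ir_sales_summary i
INNER JOIN tmpWeekend we ON we.SaleDate = i.saledate
WHERE i.saledate >= '2017-01-01'
  AND i.saledate < '2018-01-01';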

How to prevent timeout in query?

SELECT C.CompanyName,
B.BranchName,
E.EmployerName,
FE.EmployeeUniqueID,
pcr.EmployerUniqueID,
Case when FE.Status_id= 1 then 1 else 0 end IsUnPaid,
Case when re.EmployeeUniqueID IS NULL OR re.EmployeeUniqueID= '' then 0 else 1 end AS 'EmployeeRegistration',
FE.IncomeFixedComponent,
FE.IncomeVariableComponent,
Convert(varchar(11), Fe.PayStartDate, 106) as PayStartDate,
Convert(varchar(11), Fe.PayEndDate, 106) as PayEndDate,
S.StatusDescription,
FE.IsRejected,
FE.ID 'EdrID',
Convert(varchar(20), tr.TransactionDateTime, 113) as TransactionDateTime,
tr.BatchNo,
tr.IsDIFCreated,
Convert(varchar(20),tr.DIFFileCreationDateTime,113) as DiffDateTime
From File_EdrEntries FE
JOIN PAFFiles pe ON pe.ID = FE.PAFFile_ID
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
inner join File_PCREntries pcr on pe.ID=pcr.PAFFile_ID
JOIN Employers E ON E.EmployerID = pcr.EmployerUniqueID
JOIN Branches B ON B.BranchID = E.Branch_ID
JOIN companies C ON C.COMPANYID = B.COMPANY_ID
JOIN Statuses S ON S.StatusID = FE.Status_ID
JOIN Transactions tr on tr.EDRRecord_ID= fe.ID
where E.Branch_id=3
AND FE.IsRejected=0 AND FE.Status_id= 3 and tr.BatchNo is not null
AND Re.Employer_ID= re.Employer_ID;
This query is supposed to return 10 million or more records, and it usually causes a timeout because of the large number of records. So how can I improve its performance? I have already done what I could in the WHERE condition.
First of all, you need to
optimize the query further, and
add the required indexes to the tables involved in the query.
Then you can use this to increase the lock wait timeout (how long a statement waits on a blocked resource):
SET LOCK_TIMEOUT 1800;
SELECT @@LOCK_TIMEOUT AS [Lock Timeout];
Find out which combination of tables filters the most data. For example, if the following query filters out the majority of the data, you could consider creating a temp table with the data needed, indexing it, and then using that in your bigger query.
SELECT fe.*,re.*
From File_EdrEntries FE
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
Breaking the query out into smaller chunks is likely the best way to go. Also, make sure you have proper indexes in place.
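A minimal sketch of that temp-table approach (T-SQL; the column list and index choice are illustrative, and you would add whatever File_EdrEntries columns the outer query needs):

-- Materialize the most selective join first, then index it.
SELECT fe.ID, fe.EmployeeUniqueID, fe.PAFFile_ID, fe.Status_id,
       fe.IncomeFixedComponent, fe.IncomeVariableComponent
INTO #FilteredEdr
FROM File_EdrEntries fe
INNER JOIN RegisteredEmployees re
        ON re.EmployeeUniqueID = fe.EmployeeUniqueID
WHERE fe.IsRejected = 0 AND fe.Status_id = 3;

CREATE CLUSTERED INDEX IX_FilteredEdr ON #FilteredEdr (EmployeeUniqueID);
-- Then reference #FilteredEdr in the big query in place of File_EdrEntries.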

Why does separate table perform significantly better than subquery?

I was trying to improve the performance of a SQL query and tried a few combinations.
Original Query
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
FROM db_A.table_A ALIAS_A
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
AND Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
The above query consumes nearly 400k impactCPU
Optimized Query 1
SELECT New_sub_table.id1,
New_sub_table.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM ( sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE ) New_sub_table -- created a subquery
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON New_sub_table.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON New_sub_table.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(New_sub_table.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
I thought to filter the data first and then do the joins. When I checked the performance stats, it was consuming nearly 390k CPU. Not much of a difference.
Optimized Query 2
SELECT ALIAS_A.id1,
ALIAS_A.id2,
ALIAS_B.columnA,
ALIAS_C.columnB,
ALIAS_B.columnC
--changed part start--
FROM INTERMEDIATE_DB.INTERMEDIATE_TABLE ALIAS_A --CREATED AN INTERMEDIATE TABLE
--changed part end--
LEFT OUTER JOIN db_A.table_B ALIAS_B
ON ALIAS_A.id2 = ALIAS_B.id2
LEFT OUTER JOIN db_B.table_C ALIAS_C
ON ALIAS_B.columnA = ALIAS_C.item_num
LEFT OUTER JOIN db_A.table_D ALIAS_D
ON ALIAS_A.id2 = ALIAS_D.id2
INNER JOIN db_C.table_E ALIAS_E
ON Cast(ALIAS_A.column_date AS DATE) BETWEEN
ALIAS_E.column_startdate AND ALIAS_E.column_enddate
WHERE ALIAS_E.fiscalyear >= 2016
MACRO for loading data into intermediate table
INSERT INTO INTERMEDIATE_DB.INTERMEDIATE_TABLE
sel * from db_A.table_A ALIAS_A WHERE Cast(ALIAS_A.columnD AS DATE) BETWEEN
CURRENT_DATE - 5 AND CURRENT_DATE
So what I did here was use an intermediate table instead of the subquery. The intermediate table gets loaded via the macro first, and then the SELECT query runs. It now consumes only 50k impactCPU (for the macro and SELECT query combined).
My question -
I am unable to see why this is happening, even though the logic behind both queries is the same (or so I think). What would be the best practice if this is the incorrect way?
Your main problem is the Cast(ALIAS_A.columnD AS DATE). When you check the Explains you will notice the optimizer has no confidence in this step, probably greatly overestimating the number of rows returned.
But when you materialize the Select, the number of rows is known exactly, and the order of joins changes.
You would probably get the same plan if you Collect Statistics on the Cast(ALIAS_A.columnD AS DATE); run DIAGNOSTIC HELPSTATS ON FOR SESSION; and Explain should show you this as recommended stats.
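A sketch of how to get those recommendations (Teradata; the diagnostic stays on for the session and the suggestions are appended to the plan text):

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT ALIAS_A.id1, ALIAS_A.id2
FROM db_A.table_A ALIAS_A
WHERE CAST(ALIAS_A.columnD AS DATE) BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE;
-- The end of the EXPLAIN output now lists recommended COLLECT STATISTICS
-- statements, which you can run as-is and then re-check the plan.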

SQL query that uses a GROUP BY and IN is too slow

I am struggling to speed this SQL query up. I have tried removing all the fields besides the two SUM() functions and the Id field, but it is still incredibly slow. It currently takes 15 seconds to run. Does anyone have any suggestions to speed this up? It is currently causing a timeout on a page in my web app. I need the fields shown, so I can't really remove them, but there surely has to be a way to improve this?
SELECT [Customer].[iCustomerID],
[Customer].[sCustomerSageCode],
[Customer].[sCustomerName],
[Customer].[sCustomerTelNo1],
SUM([InvoiceItem].[fQtyOrdered]) AS [Quantity],
SUM([InvoiceItem].[fNetAmount]) AS [Value]
FROM [dbo].[Customer]
LEFT JOIN [dbo].[CustomerAccountStatus] ON ([Customer].[iAccountStatusID] = [CustomerAccountStatus].[iAccountStatusID])
LEFT JOIN [dbo].[SalesOrder] ON ([SalesOrder].[iCustomerID] = [dbo].[Customer].[iCustomerID])
LEFT JOIN [Invoice] ON ([Invoice].[iCustomerID] = [Customer].[iCustomerID])
LEFT JOIN [dbo].[InvoiceItem] ON ([Invoice].[iInvoiceNumber] = [InvoiceItem].[iInvoiceNumber])
WHERE ([InvoiceItem].[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
AND( ([dbo].[SalesOrder].[dOrderDateTime] >= '2013-01-01')
OR ([dbo].[Customer].[dDateCreated] >= '2014-01-01'))
GROUP BY [Customer].[iCustomerID],[Customer].[sCustomerSageCode],[Customer].[sCustomerName], [Customer].[sCustomerTelNo1];
I don't think this query is doing what you want anyway. As written, there is no relationship between the Invoice table and the SalesOrder table. This leads me to believe that it is producing a Cartesian product between invoices and orders, so customers with lots of orders would generate lots of unnecessary intermediate rows.
You can test this by removing the SalesOrder table from the query:
SELECT c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1],
       SUM(it.[fQtyOrdered]) AS [Quantity], SUM(it.[fNetAmount]) AS [Value]
FROM [dbo].[Customer] c
LEFT JOIN [dbo].[CustomerAccountStatus] cas
       ON c.[iAccountStatusID] = cas.[iAccountStatusID]
LEFT JOIN [Invoice] i
       ON i.[iCustomerID] = c.[iCustomerID]
LEFT JOIN [dbo].[InvoiceItem] it
       ON i.[iInvoiceNumber] = it.[iInvoiceNumber]
WHERE it.[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003')
  AND c.[dDateCreated] >= '2014-01-01'
GROUP BY c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1];
If this works and you need the SalesOrder, then you will need to either pre-aggregate by SalesOrder or find better join keys.
The above query could benefit from an index on Customer(dDateCreated, CustomerId).
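A sketch of that index (the name is illustrative; the INCLUDE list covers the selected columns so the query can be served from the index alone):

CREATE NONCLUSTERED INDEX IX_Customer_dDateCreated
ON [dbo].[Customer] ([dDateCreated], [iCustomerID])
INCLUDE ([sCustomerSageCode], [sCustomerName], [sCustomerTelNo1]);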
You have a lot of LEFT JOINs.
I don't see any usage of CustomerAccountStatus; you can exclude it.
[InvoiceItem].[sNominalCode] can be NULL because of the LEFT JOIN, so write [InvoiceItem].[sNominalCode] IS NULL OR <THE IN CONDITION> to keep the outer-joined rows.
Also add IS NOT NULL checks to the other conditions.
It seems you are looking for customers that were either created this year or for which sales orders exist from last year or this year. So select from Customer, but use EXISTS on SalesOrder. Then you want to sum invoice items, so outer join them and make sure to put the criteria in the ON clause. (sNominalCode will be NULL for any outer-joined records; hence asking for certain sNominalCode values in the WHERE clause would turn your outer join into an inner join.)
SELECT
c.iCustomerID,
c.sCustomerSageCode,
c.sCustomerName,
c.sCustomerTelNo1,
SUM(ii.fQtyOrdered) AS Quantity,
SUM(ii.fNetAmount) AS Value
FROM dbo.Customer c
LEFT JOIN dbo.Invoice i ON (i.iCustomerID = c.iCustomerID)
LEFT JOIN dbo.InvoiceItem ii ON (ii.iInvoiceNumber = i.iInvoiceNumber AND ii.sNominalCode IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
WHERE c.dDateCreated >= '2014-01-01'
OR EXISTS
(
SELECT *
FROM dbo.SalesOrder
WHERE iCustomerID = c.iCustomerID
AND dOrderDateTime >= '2013-01-01'
)
GROUP BY c.iCustomerID, c.sCustomerSageCode, c.sCustomerName, c.sCustomerTelNo1;

Correct form for a long-executing query with MyODBC syntax

I'm trying to build a SQL query for Access that links tables over a MyODBC connection to retrieve the data from the internet, but the query takes too long to finish, about five minutes, so I think the problem is with the query:
SELECT COUNT([o].[orders_id]) AS howmany_orders,
(SELECT SUM([op1].[products_quantity]) FROM orders_total AS ot1, orders AS o1, orders_products AS op1
WHERE [o1].[date_purchased] >=date()-30 and [o1].[orders_id] = [op1].[orders_id] and [ot1].[orders_id] = [op1].[orders_id] and [ot1].[class]="ot_total" and [o1].[orders_status] = 1 and [op1].[products_id]=[op].[products_id]
GROUP BY [op1].[products_id]
) AS pendiente,
[op].[products_model],
Round((((7+1)*(howmany_orders/90))+1)-(p.stock_real- IIF(pendiente>0,pendiente,0)), 0) AS pedir,
p.ref_id
FROM orders_total AS ot, orders AS o, orders_products AS op INNER JOIN Productos AS p ON Mid([op].[products_model],4) LIKE p.ref_id
WHERE [o].[date_purchased] >=date()-90 and [o].[orders_id] = [op].[orders_id] and [ot].[orders_id] = [op].[orders_id] and [ot].[class]="ot_total" and [o].[orders_status] IN (7, 1) and ((p.fuera_de_stock)=False) and ((p.suspendido)=False) and ((p.quitar_de_la_web)=False)
GROUP BY [op].[products_model], p.ref_id, p.stock_real, [op].[products_id];
At a glance I see that the "LIKE" operator could be one of the problems here:
INNER JOIN Productos AS p ON Mid([op].[products_model],4) LIKE p.ref_id
but I see no way to substitute an = operator for it.
Thanks for your help!
EDIT:
I have reduced the query to the following, but it takes the same time:
SELECT COUNT(o.orders_id) AS howmany_orders, (
SELECT SUM(opz.products_quantity) FROM orders AS oz, orders_products AS opz WHERE oz.date_purchased >=date()-30 and oz.orders_id = opz.orders_id and oz.orders_status = 1 and opz.products_id=op.products_id GROUP BY opz.products_id
) AS pendiente, op.products_model, Round((((7+1)*(howmany_orders/90))+1)-(p.stock_real-IIf(pendiente>0,pendiente,0)),0) AS pedir, p.ref_id
FROM orders AS o, orders_products AS op INNER JOIN Productos AS p ON op.products_model=p.cod
WHERE o.date_purchased>=date()-90 And o.orders_id=op.orders_id And o.orders_status In (7,1) And ((p.suspendido)=False) And ((p.quitar_de_la_web)=False)
GROUP BY op.products_model, p.ref_id, p.stock_real, op.products_id;
Yep, there's your problem. The database cannot use any indexes for that join, so it has to do a table scan. Is there any way you could persist this data so you don't have to do the MID call and can just join on a plain column? I.e., store the Mid([products_model],4) value in an extra column on the orders_products table and join on that column.
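A hedged sketch of that idea, assuming the linked tables live in MySQL behind the MyODBC connection (the new column and index names are invented for illustration):

-- One-time change: persist the derived key and index it.
ALTER TABLE orders_products ADD COLUMN products_model_ref VARCHAR(50);
UPDATE orders_products SET products_model_ref = SUBSTRING(products_model, 4);
CREATE INDEX idx_op_model_ref ON orders_products (products_model_ref);

-- The join in the Access query then becomes index-friendly:
-- INNER JOIN Productos AS p ON op.products_model_ref = p.ref_id
-- Keep the column maintained on insert/update (e.g. with a trigger,
-- or a generated column on MySQL 5.7+).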