SQL - faster to filter by large table or small table

I have the below query which takes a while to run, since ir_sales_summary is ~ 2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run since it is using a view that brings in data from a legacy service)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering from the big table (i) or be filtering from the much smaller weekending table (which gives us just the week ending dates)?

Try this, it might help. I believe that joining a static table, as the first table in the query, onto a fact/dynamic table affects query performance.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we.Weekend
,supplierid
,chainid
,sales_units_cy
,sales_cost_cy
,sales_units_ret_cy
,sales_units_ly
,sales_cost_ly
,sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
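Separately, it is worth noting (a sketch, not from the original answer) that year(i.saledate) = '2017' has to be evaluated for every row and stops an index on saledate from being used. A plain date-range filter on the big table is usually the better option, assuming saledate is a date or datetime column:
WHERE i.saledate >= '2017-01-01'   -- sargable range filter: an index on saledate can now be seeked
  AND i.saledate <  '2018-01-01'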

Related

Converting a SQL subquery to a join for performance gains

I have a subquery with an inner join. The join is meant to cut the data down to a more manageable size before extracting data via an UNPIVOT, which has a further join to pull out only the relevant matches.
When I've looked at the execution plan, it seems like the outer select is being executed first, and is thus taking an inordinate amount of time to complete because it is processing the data for all gamers instead of the cut-down cohort.
This is the query:
SELECT
t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
FROM
(
SELECT
A.Gamer_ID
,[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10]
,CAST(Joining_date AS DATE) AS CODE_DATE
FROM Gamer_Characteristics A
INNER JOIN Gamer_Population P ON P.Gamer_ID = A.Gamer_ID --cuts down the number of gamers to the selected cohort
) s
unpivot (CODE for col in (
[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10])) as t2
INNER JOIN Gamer_feature_Code C ON C.CODE = LEFT(t2.CODE,C.CODE_LENGTH) --join to a dimension table to pull through characteristics based on code and code length
WHERE
T2.CODE_DATE <= '2020-03-31'
GROUP BY t2.Gamer_ID,
C.Feature_Code,
C.Feature_Name,
t2.CODE_DATE
I have two questions.
1: Can this be converted to use a join instead of a subquery?
2: Can I force the inner join in the subquery to take precedence over the inner join in the outer select?
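One way to make sure the cohort join happens first (a sketch, not from the original thread, reusing the table and column names above) is to stage the cut-down rows in a temp table and then run the UNPIVOT against that:
SELECT A.Gamer_ID
,[Identification_Code_Code_1]
,[Identification_Code_Code_2]
,[Identification_Code_Code_3]
,[Identification_Code_Code_4]
,[Identification_Code_Code_5]
,[Identification_Code_Code_6]
,[Identification_Code_Code_7]
,[Identification_Code_Code_8]
,[Identification_Code_Code_9]
,[Identification_Code_Code_10]
,CAST(Joining_date AS DATE) AS CODE_DATE
INTO #cohort                      -- materialise the cut-down cohort before the unpivot
FROM Gamer_Characteristics A
INNER JOIN Gamer_Population P ON P.Gamer_ID = A.Gamer_ID
-- the UNPIVOT and the join to Gamer_feature_Code then select FROM #cohort instead of the derived table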

combining queries runs longer

I have 2 queries. Individually, both run well, but when I combine them they run terribly slowly and time out on me. The second query returns a list of invoices that I want to use in the IN clause of the first query. Not sure if I'm doing something wrong. Thanks for your help.
select
ba.bill_acct_nbr,
ba.acct_name,
i.inv_id,
i.prnt_tmsp,
i.due_dt,
d.dvr_frst_name,
d.dvr_srnm,
r.ecr_ticket_no,
r.co_tmsp,
i.dr_note_amt ,
ht.amt HT,
vat.amt VAT
from rfs.rnt_agr_inv_notes i
inner join rfs.rnt_Agrs r on r.rnt_agr_nbr = i.rea_rnt_agr_nbr
inner join rfs_rv.dvr_rras d on r.rnt_agr_nbr = d.rdy_rnt_agr_nbr and d.main_dvr_flg = 'MR'
inner join rfs_rv.bus_acnts ba on ba.acct_id = i.bac_acc_id
inner join (select sum(chg_amt) amt, rain.inv_id
from rfs.rra_chgs rrc
inner join rfs.rnt_agr_inv_note_lns rainl on rrc.rrc_id=rainl.rrc_rrc_id
inner join rfs.rnt_agr_inv_notes rain on rainl.rai_inv_id=rain.inv_id
inner join rfs.rnt_agrs ra on rain.rea_rnt_agr_nbr=ra.rnt_agr_nbr
inner join rfs.rra_chg_typs rct on rrc.rct_chg_typ=rct.chg_typ
where rrc.rct_chg_typ not in ('TAX', 'SUR', 'VAT') and not ( ra.stn_system='ECR' and (rrc.rct_chg_typ ='02000' or rrc.rct_chg_typ ='02201'))
group by rain.inv_id) ht on ht.inv_id=i.inv_id
inner join (select sum(chg_amt) amt, rain.inv_id
from rfs.rra_chgs rrc
inner join rfs.rnt_agr_inv_note_lns rainl on rrc.rrc_id=rainl.rrc_rrc_id
inner join rfs.rnt_agr_inv_notes rain on rainl.rai_inv_id=rain.inv_id
inner join rfs.rnt_agrs ra on rain.rea_rnt_agr_nbr=ra.rnt_agr_nbr
inner join rfs.rra_chg_typs rct on rrc.rct_chg_typ=rct.chg_typ and not ( ra.stn_system='ECR' and rrc.rct_chg_typ ='02200')
where rrc.rct_chg_typ in ('TAX', 'SUR', 'VAT')
group by rain.inv_id) vat on vat.inv_id=i.inv_id
where
i.inv_id in (425001975206,550008226812,425002005105, 425002046396, 42500190929)
I commented out the last line above and replaced it with this code, which runs well by itself:
i.inv_id in (select
q.inv_id
from
rfs.rnt_agr_inv_notes q,
rfs_rv.bus_acnts ba
where
ba.acct_id = q.bac_acc_id
and ba.bill_acct_nbr IN ('16785616')
AND extract(MONTH from q.prnt_tmsp) = 5
AND extract(YEAR from q.prnt_tmsp) = 2015)
I can give you a solution for combining the two that has worked well for our product. It looks ugly, but generating the list of IDs up front in a temp table gave a huge performance improvement.
(As an aside, you really should take a look at the large number of INNER JOINs in your query, as they are not helping your performance.)
You'd want something like this:
CREATE TABLE #invIds (id BIGINT)   -- BIGINT, since the inv_id values exceed the INT range
INSERT INTO #invIds (id) SELECT
q.inv_id
from
rfs.rnt_agr_inv_notes q,
rfs_rv.bus_acnts ba
where
ba.acct_id = q.bac_acc_id
and ba.bill_acct_nbr IN ('16785616')
AND extract(MONTH from q.prnt_tmsp) = 5
AND extract(YEAR from q.prnt_tmsp) = 2015
Then the query would do this
d.dvr_frst_name,
d.dvr_srnm,
r.ecr_ticket_no,
r.co_tmsp,
i.dr_note_amt ,
ht.amt HT,
vat.amt VAT
from rfs.rnt_agr_inv_notes I .....
.....where
i.inv_id in (select id from #invIds)
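For what it's worth, an explicit join to the temp table usually performs at least as well as the IN and makes the restriction obvious (a sketch using the same table and alias names as above):
select i.inv_id, i.prnt_tmsp, i.due_dt
from rfs.rnt_agr_inv_notes i
inner join #invIds t on t.id = i.inv_id   -- only the pre-selected invoices survive the join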

SQL query that uses a GROUP BY and IN is too slow

I am struggling to speed this SQL query up. I have tried removing all the fields besides the two SUM() functions and the Id field, but it is still incredibly slow; it currently takes 15 seconds to run. Does anyone have any suggestions to speed this up, as it is currently causing a timeout on a page in my web app? I need the fields shown, so I can't really remove them, but there surely has to be a way to improve this.
SELECT [Customer].[iCustomerID],
[Customer].[sCustomerSageCode],
[Customer].[sCustomerName],
[Customer].[sCustomerTelNo1],
SUM([InvoiceItem].[fQtyOrdered]) AS [Quantity],
SUM([InvoiceItem].[fNetAmount]) AS [Value]
FROM [dbo].[Customer]
LEFT JOIN [dbo].[CustomerAccountStatus] ON ([Customer].[iAccountStatusID] = [CustomerAccountStatus].[iAccountStatusID])
LEFT JOIN [dbo].[SalesOrder] ON ([SalesOrder].[iCustomerID] = [dbo].[Customer].[iCustomerID])
LEFT JOIN [Invoice] ON ([Invoice].[iCustomerID] = [Customer].[iCustomerID])
LEFT JOIN [dbo].[InvoiceItem] ON ([Invoice].[iInvoiceNumber] = [InvoiceItem].[iInvoiceNumber])
WHERE ([InvoiceItem].[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
AND( ([dbo].[SalesOrder].[dOrderDateTime] >= '2013-01-01')
OR ([dbo].[Customer].[dDateCreated] >= '2014-01-01'))
GROUP BY [Customer].[iCustomerID],[Customer].[sCustomerSageCode],[Customer].[sCustomerName], [Customer].[sCustomerTelNo1];
I don't think this query is doing what you want anyway. As written, there is no relationship between the Invoice table and the SalesOrder table. This leads me to believe that it is producing a Cartesian product between invoices and orders, so customers with lots of orders would generate lots of unnecessary intermediate rows.
You can test this by removing the SalesOrder table from the query:
SELECT c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1],
SUM(it.[fQtyOrdered]) AS [Quantity], SUM(it.[fNetAmount]) AS [Value]
FROM [dbo].[Customer] c LEFT JOIN
[dbo].[CustomerAccountStatus] cas
ON c.[iAccountStatusID] = cas.[iAccountStatusID] LEFT JOIN
[Invoice] i
ON (i.[iCustomerID] = c.[iCustomerID]) LEFT JOIN
[dbo].[InvoiceItem] it
ON (i.[iInvoiceNumber] = it.[iInvoiceNumber])
WHERE it.[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003') AND
c.[dDateCreated] >= '2014-01-01'
GROUP BY c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1];
If this works and you need the SalesOrder, then you will need to either pre-aggregate by SalesOrder or find better join keys.
The above query could benefit from an index on Customer(dDateCreated, CustomerId).
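Something along these lines, for example (hypothetical index name; the key order puts the filtered column first):
CREATE INDEX IX_Customer_dDateCreated
    ON dbo.Customer (dDateCreated, iCustomerID);
-- lets the c.dDateCreated >= '2014-01-01' filter seek rather than scan Customer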
You have a lot of LEFT JOINs.
I don't see CustomerAccountStatus being used anywhere, so you can exclude it.
[InvoiceItem].[sNominalCode] can be NULL because of the LEFT JOIN, so add [InvoiceItem].[sNominalCode] IS NOT NULL OR <THE IN CONDITION>.
Also add IS NOT NULL checks to the other conditions.
It seems you are looking for customers that are either created this year or for which sales orders exist from last year or this year. So select from customers, but use EXISTS on SalesOrder. Then you want to count invoices. So outer join them and make sure to have the criteria in the ON clause. (sNominalCode will be NULL for any outer joined records. Hence asking for certain sNominalCode in the WHERE clause will turn your outer join into an inner join.)
SELECT
c.iCustomerID,
c.sCustomerSageCode,
c.sCustomerName,
c.sCustomerTelNo1,
SUM(ii.fQtyOrdered) AS Quantity,
SUM(ii.fNetAmount) AS Value
FROM dbo.Customer c
LEFT JOIN dbo.Invoice i ON (i.iCustomerID = c.iCustomerID)
LEFT JOIN dbo.InvoiceItem ii ON (ii.iInvoiceNumber = i.iInvoiceNumber AND ii.sNominalCode IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
WHERE c.dDateCreated >= '2014-01-01'
OR EXISTS
(
SELECT *
FROM dbo.SalesOrder
WHERE iCustomerID = c.iCustomerID
AND dOrderDateTime >= '2013-01-01'
)
GROUP BY c.iCustomerID, c.sCustomerSageCode, c.sCustomerName, c.sCustomerTelNo1;

Improve Performance of SQL query joining 14 tables

I am trying to join 14 tables, a few of which I need to join using a left join.
With the existing data, which is around 7,000 records, it takes around 10 seconds to execute the query below. I am afraid of what will happen if there are more than a million records. Please help me improve the performance of the query below.
CREATE proc [dbo].[GetTodaysActualInvoiceItemSoldHistory]
@fromdate datetime,
@todate datetime
as
Begin
select SDID.InvoiceDate as [Sold Date],Cust.custCompanyName as [Sold To] ,
case SQBD.TransferNo when '0' then IVM.VendorName else SQBD.TransferNo end as [Purchase From],
SQBD.BatchSellQty as SoldQty,SQID.SellPrice,
SDID.InvoiceNo as [Sales Invoice No],INV.PRInvoiceNo as [PO Invoice No],INV.PRInvoiceDate as [PO Invoice Date],
SQID.ItemDesc as [Item Description],SQID.NetPrice,SDHM.DeliveryHeaderMasterName as DeliveryHeaderName,
SQID.ItemCode as [Item Code],
SQBD.BatchNo,SQBD.ExpiryDate,SQID.Amount,
SQID.Dept_ID as Dept_ID,
Dept_Name as [Department],SQID.Catg_ID as Catg_ID,
Category_Name as [Category],SQID.Brand_ID as Brand_ID,
BrandName as BrandName, SQID.Manf_Id as Manf_Id,
Manf.ManfName as [Manufacturer],
STM.TaxName, SQID.Tax_ID as Tax_ID,
INV.VendorID as VendorID,
SQBD.ItemID,SQM.Isdeleted,
SDHM.DeliveryHeaderMasterID,Cust.CustomerMasterID
from SD_QuotationMaster SQM
inner join SD_InvoiceDetails SDID on SQM.QuoteID = SDID.QuoteID
inner join SD_QuoteItemDetails SQID on SDID.QuoteID = SQID.QuoteID
inner join SD_QuoteBatchDetails SQBD on SDID.QuoteID = SQBD.QuoteID and SQID.ItemID=SQBD.ItemID
inner join INV_ProductInvoice INV on SQBD.InvoiceID=INV.ProductInvoiceID
inner jOIN INV_VendorMaster IVM ON INV.VendorID = IVM.VendorID
inner jOIN Sys_TaxMaster STM ON SQID.Tax_ID = STM.Tax_ID
inner join Cust_CustomerMaster Cust on SQM.CustomerMasterID = Cust.CustomerMasterID
left jOIN INV_DeptartmentMaster Dept ON SQID.Dept_ID = Dept.Dept_ID
left jOIN INV_BrandMaster BRD ON SQID.Brand_ID = BRD.Brand_ID
left jOIN INV_ManufacturerMaster Manf ON SQID.Manf_Id = Manf.Manf_Id
left join INV_CategoryMaster CAT ON SQID.Catg_ID = CAT.Catg_ID
left join SLRB_DeliveryCustomerMaster SDCM on SQM.CustomerMasterID=SDCM.CustomerMasterID and SQM.DeliveryHeaderMasterID=SDCM.DeliveryHeaderMasterID
left join SLRB_DeliveryHeaderMaster SDHM on SDCM.DeliveryHeaderMasterID=SDHM.DeliveryHeaderMasterID
where (SQM.IsDeleted=0) and SQBD.BatchSellQty > 0
and SDID.InvoiceDate between @fromdate and @todate
order by ItemDesc
End
Only the tables below contain more data; the other tables have fewer than 20 records each:
InvoiceDetails, QuoteMaster, QuoteItemDetails, QuoteBatchDetails, ProductInvoice
Below is the link to the execution plan:
http://jmp.sh/CSZc2x2
Thanks.
Let's start with an obvious error:
(isnull(SQBD.BatchSellQty,0) > 0)
That predicate is not indexable, so it should not be there. Seriously, BatchSellQty should not be unknown (nullable) in most cases, or you need to handle NULL properly. That field should be indexed, and I am not sure I would want that wrapped in an ISNULL - there are likely tons of batches. Also note that a filtered index (with the condition > 0) may work here.
Second, check that you have all proper indices and the execution plan makes sense.
Third, you have to test with a ton of data. Index statistics may make a difference. Check where the time is spent - it may be tempdb, in which case you really need good tempdb IO speed, and that is not related to the input side.
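A filtered index along those lines might look like this (hypothetical name and key columns, assuming SQL Server):
CREATE NONCLUSTERED INDEX IX_SD_QuoteBatchDetails_SellQty
    ON SD_QuoteBatchDetails (QuoteID, ItemID, InvoiceID)
    INCLUDE (BatchSellQty, BatchNo, ExpiryDate, TransferNo)
    WHERE BatchSellQty > 0;   -- index only the rows the query can ever return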
You can try using query hints to help the SQL Server optimizer build an optimal query execution plan. For example, you can force the order in which the tables are joined using the FORCE ORDER hint. If you order your tables so that each step joins the smallest possible result sets, the query will execute faster (maybe - you need to try it). Example:
We need A join B join C.
If A join B = 2000 records x 1000 records = ~400 records (our estimate),
and A join C = 2000 records x 10 records = ~3 records,
and B join C = 1000 records x 10 records = ~10,000 records,
then the optimal order will be
A join C join B = ~3 records x 1000 records = ~3000 records.
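The hint itself goes at the end of the statement. A minimal sketch with hypothetical tables A, B and C:
SELECT a.id, c.val AS c_val, b.val AS b_val
FROM A a
INNER JOIN C c ON c.a_id = a.id   -- the small, selective join written first
INNER JOIN B b ON b.a_id = a.id
OPTION (FORCE ORDER);             -- join in exactly the order written above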

Joining SQL tables to compare revenue vs expense

Let me say first that I'm new to SQL and learning more every day. With that said, here is my problem. I have a view that is already created (it shows revenue generated on equipment), but I need one more table added to it (expenses against the equipment). When I try to add an inner join to that table, it creates a bunch of duplicate rows. Here is my original view (for the revenue portion of it):
SELECT
<removed, there are about 25 of them>
FROM
dbo.LRCON WITH (nolock)
INNER JOIN dbo.LRCONVIN WITH (nolock) ON dbo.LRCONVIN.ConId = dbo.LRCON.ConId
INNER JOIN dbo.LRBILCON WITH (nolock) ON dbo.LRBILCON.ConId = dbo.LRCONVIN.ConId AND dbo.LRBILCON.UntId = dbo.LRCONVIN.UntId
INNER JOIN dbo.LRBILITM WITH (nolock) ON dbo.LRBILITM.ParentItmId = dbo.LRBILCON.ItmId
INNER JOIN dbo.LRBIL WITH (nolock) ON dbo.LRBIL.BilId = dbo.LRBILCON.BilId
INNER JOIN dbo.LRCONTYP WITH (nolock) ON dbo.LRCONTYP.ConTypId = dbo.LRCON.ConTypId
INNER JOIN dbo.COLOOKUP AS C1 WITH (nolock) ON C1.Id = dbo.LRBILITM.ItmTyp
INNER JOIN dbo.COLOOKUP AS C2 WITH (nolock) ON C2.Id = dbo.LRCONTYP.ConTyp
INNER JOIN dbo.VHVIN WITH (nolock) ON dbo.VHVIN.UntId = dbo.LRCONVIN.UntId
WHERE
(dbo.LRBIL.Status = 647) AND (dbo.LRBILITM.ItmTyp <> 274)
I then try to add another join:
INNER JOIN dbo.SVSLS WITH (nolock) on dbo.SVSLS.UntId = dbo.LRCONVIN.UntId
with the select statement:
ROUND(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2, 2) AS SvcAmtSale
... but it produces many, many rows of duplicates because it adds the detail of each expense to each row of my original table.
Original table:
https://dl.dropboxusercontent.com/u/81145403/orginal_table.jpg
After I add my new join/select:
https://dl.dropboxusercontent.com/u/81145403/failed_table.jpg
How do I fix this? At the end of the day, I just want to compare my revenue vs expenses on equipment over a date range. I really don't care to have the individual detail of the expenses, just a grand total is fine with me.
Is the SvcSaleAmt the revenue that you are interested in? And are the multiple detail rows separate entries on the same item? If you do not necessarily care about the individual details, you can GROUP the items together. In order to do this, you will need to get rid of the SlsId from your SELECT list, and add
GROUP BY CusId, CusName, BillId, ConId, Prd, ConTypId, ....., AmtCos, AmtGpm
Using all of the columns you have in your SELECT statement. Replace the ROUND() AS SvcSaleAmt with:
ROUND(SUM(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2), 2) AS SvcSaleAmt
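An alternative sketch (not part of the original answer, assuming the SVSLS columns shown above): pre-aggregate the expenses per unit in a derived table, so the join can never multiply the revenue rows:
INNER JOIN (
    SELECT UntId,
           ROUND(SUM(AmtSubtotal + AmtSupplies + AmtDiagnostic + AmtTax1 + AmtTax2), 2) AS SvcAmtSale
    FROM dbo.SVSLS
    GROUP BY UntId                -- one row per unit, so no duplicate detail rows
) sv ON sv.UntId = dbo.LRCONVIN.UntId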