I have 2 queries. Individually, both run well. How when I combine them, they run terribly slow and time out on me. The second query is returning a list of invoices that I want to use in the IN clause in the first query. Not sure if I'm doing something wrong. Thanks for your help....
select
ba.bill_acct_nbr,
ba.acct_name,
i.inv_id,
i.prnt_tmsp,
i.due_dt,
d.dvr_frst_name,
d.dvr_srnm,
r.ecr_ticket_no,
r.co_tmsp,
i.dr_note_amt ,
ht.amt HT,
vat.amt VAT
from rfs.rnt_agr_inv_notes i
inner join rfs.rnt_Agrs r on r.rnt_agr_nbr = i.rea_rnt_agr_nbr
inner join rfs_rv.dvr_rras d on r.rnt_agr_nbr = d.rdy_rnt_agr_nbr and d.main_dvr_flg = 'MR'
inner join rfs_rv.bus_acnts ba on ba.acct_id = i.bac_acc_id
inner join (select sum(chg_amt) amt, rain.inv_id
from rfs.rra_chgs rrc
inner join rfs.rnt_agr_inv_note_lns rainl on rrc.rrc_id=rainl.rrc_rrc_id
inner join rfs.rnt_agr_inv_notes rain on rainl.rai_inv_id=rain.inv_id
inner join rfs.rnt_agrs ra on rain.rea_rnt_agr_nbr=ra.rnt_agr_nbr
inner join rfs.rra_chg_typs rct on rrc.rct_chg_typ=rct.chg_typ
where rrc.rct_chg_typ not in ('TAX', 'SUR', 'VAT') and not ( ra.stn_system='ECR' and (rrc.rct_chg_typ ='02000' or rrc.rct_chg_typ ='02201'))
group by rain.inv_id) ht on ht.inv_id=i.inv_id
inner join (select sum(chg_amt) amt, rain.inv_id
from rfs.rra_chgs rrc
inner join rfs.rnt_agr_inv_note_lns rainl on rrc.rrc_id=rainl.rrc_rrc_id
inner join rfs.rnt_agr_inv_notes rain on rainl.rai_inv_id=rain.inv_id
inner join rfs.rnt_agrs ra on rain.rea_rnt_agr_nbr=ra.rnt_agr_nbr
inner join rfs.rra_chg_typs rct on rrc.rct_chg_typ=rct.chg_typ and not ( ra.stn_system='ECR' and rrc.rct_chg_typ ='02200')
where rrc.rct_chg_typ in ('TAX', 'SUR', 'VAT')
group by rain.inv_id) vat on vat.inv_id=i.inv_id
where
i.inv_id in (425001975206,550008226812,425002005105, 425002046396, 42500190929)
I commented out the last line above and replaced it with this code; which runs well by itself.
i.inv_id in (select
q.inv_id
from
rfs.rnt_agr_inv_notes q,
rfs_rv.bus_acnts ba
where
ba.acct_id = q.bac_acc_id
and ba.bill_acct_nbr IN ('16785616')
AND extract(MONTH from q.prnt_tmsp) = 5
AND extract(YEAR from q.prnt_tmsp) = 2015)
I can give you a solution for combining the two which has worked well for our product. It looks horrible but had a vastly huge performance improvement by generating the list of Ids up front in a temp table.
(As an aside, you really need to take a look at the huge amount of INNER JOIN in your query as this is not helping your performance).
You'd want something like this:
CREATE TABLE #invIds (id INT)
INSERT INTO #invIds (id) SELECT
q.inv_id
from
rfs.rnt_agr_inv_notes q,
rfs_rv.bus_acnts ba
where
ba.acct_id = q.bac_acc_id
and ba.bill_acct_nbr IN ('16785616')
AND extract(MONTH from q.prnt_tmsp) = 5
AND extract(YEAR from q.prnt_tmsp) = 2015
Then the query would do this
d.dvr_frst_name,
d.dvr_srnm,
r.ecr_ticket_no,
r.co_tmsp,
i.dr_note_amt ,
ht.amt HT,
vat.amt VAT
from rfs.rnt_agr_inv_notes I .....
.....where
i.inv_id in #invIds
Related
I have the following SQL select statement that I use to get a subset of products, or wines:
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
The length of this table generated is 3,905. I want to get all the transactional data for these products.
At the moment I'm using this select statement
SELECT c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
LEFT JOIN Dim.Calendar AS c
ON f.SkDateId = c.SkDateId
LEFT JOIN (
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour,
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) AS s
ON s.id = f.SkProductVariantId
WHERE c.CalDate LIKE '%2019%'
The calendar dates are correct, but the number of unique products returned is 5,648, rather than the expected 3,905 from the select query.
Why does my LEFT JOIN on the first select query not work as I expect it to, please?
Thanks for any help!
If you want all the rows form your query, it needs to be the first reference in the LEFT JOIN. Then, I am guessing that you want transaction in 2019:
select . . .
from (SELECT pv.SkProdVariantId AS id, pa.Colour AS colour,
FROM Dim.ProductVariant pv JOIN
ProductAttributes_new pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) s LEFT JOIN
(fact.FTransactions f JOIN
Dim.Calendar c
ON f.SkDateId = c.SkDateId AND
c.CalDate >= '2019-01-01' AND
c.CalDate < '2020-01-01'
)
ON s.id = f.SkProductVariantId;
Note that this assumes that CalDate is really a date and not a string. LIKE should only be used on strings.
You misunderstand somehow how outer joins work. See Gordon's answer and my request comment on that.
As to the task: It seems you want to select transactions of 2019, but you want to restrict your results to wine products. We typically restrict query results in the WHERE clause. You can use IN or EXISTS for that.
SELECT
c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
INNER JOIN Dim.Calendar AS c ON f.SkDateId = c.SkDateId
WHERE DATEPART(YEAR, c.CalDate) = 2019
AND f.SkProductVariantId IN
(
SELECT pv.SkProdVariantId
FROM Dim.ProductVariant AS pv
WHERE pv.ProdTypeName = 'Wines'
);
(I've removed the join to ProductAttributes_new, because it doesn't seem to play any part in this query.)
I have the below query which takes a while to run, since ir_sales_summary is ~ 2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run since it is using a view that brings in data from a legacy service)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering from the big table (i) or be filtering from the much smaller weekending table (which gives us just the week ending dates)?
Try this. This might help, joining a static table as first table in query onto a fact/dynamic table will impact query performance i believe.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we
,weeekend
,supplierid
,chainid
,sales_units_cy
,sales_cost_cy
,sales_units_ret_cy
,sales_units_ly
,sales_cost_ly
,sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
-- WITH POD was causing the issue, removing this code reduced 2 year pull to 3 mins.
Will post new question to figure out best way to include POD data.
--Edit for clarity, I am a read only user to these tables.
I wrote the below query, but it takes a very long time to execute (20min).
It is currently limited to 1 month, but user wants at least 1 year preferably 2. I assume this would scale time to hours.
Can anyone take a look at let me know if there is a BKM I am not using to improve performance?
Or if there is a better method for a report of this size? At 2 years, it would return ~100K rows from 17 tables.
WITH
POD AS
(
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM
SHIPMENTS
LEFT JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
)
SELECT DISTINCT GI.GOODS_ISSUE_DOCUMENT_ID
,GI.SALES_ORDER_ID
,GI.SALES_ORDER_LINE_ID
,GI.SALES_ORDER_TYPE_CODE
,GI.DELIVERY_HEADER_ID
,GI.DELIVERY_ITEM_ID
,FD.FISCAL_MONTH_CODE
,GI.MATERIAL_NUMBER
,GI.SHIPPED_QTY
,SO.ORDERER_NAME
,SO.CREATED_BY
,SO.CONTACT_PERSON
,GI.SOLD_TO_CUSTOMER_ID
,GI.SHIP_TO_CUSTOMER_ID
,GI.ORIGINAL_COMMIT_DATE
,GI.SHIP_FROM_PLANT_ID
,GI.ACTUAL_PGI_DATE
,GI.CUSTOMER_PO_NUMBER
,GI.SHIPPED_PRICE
,(GI.SHIPPED_PRICE * GI.SHIPPED_QTY) AS EXT_SHIPPED_PRICE
,GI.SALES_ORGANIZATION_CODE
,GI.DELIVERY_NOTE_PRIORITY_CODE
,FD.FISCAL_WEEK_CODE
,DV.DIVISION_CODE
,DN.Delivery_Item_Creation_Date
,SOLD.CUSTOMER_SHORT_NAME AS SOLD_TO_CUSTOMER_SHORT_NAME
,SHIP.CUSTOMER_SHORT_NAME AS SHIP_TO_CUSTOMER_SHORT_NAME
,SHIP.Customer_Site_Name
,SHIP.REGION_NAME
,MATD.MATERIAL_DESCRIPTION
,MATD.STANDARD_COST
,(MATD.STANDARD_COST * GI.SHIPPED_QTY) AS EXT_STANDARD_COST
,MATD.GLOBAL_EVENT
,PLT.LEAD_TIME_FOR_ORIGINAL_COMMIT
,OPRM.BASE_PART_CODE
,MATD.PRODUCT_INSP_MEMO
,MATD.MATERIAL_PRICING_GROUP_CODE
,MATD.MATERIAL_STATUS AS MMPP
,PIM.PIM_PBG_GROUPING
,SOL.SHIPPING_CONDITION
,SVO.SERVICE_ORDER_NUM
,SO.CREATION_TIME AS SO_CREATION_TIME
,SOL.CREATED_TIME AS SO_LINE_CREATED_TIME
,SOL.SHIPPING_POINT
,SDT.SALES_DOCUMENT_TYPE_CODE AS SVO_DOCUMENT_TYPE_CODE
,EQU.EQUIPMENT_NUM
,EQU.SERIAL_NUMBER
,EQU.CUSTOMERTOOLID
,POD.Shipment_Number
,POD.Shipping_Carrier
,POD.Tracking_Number
,POD.Ship_Method
,POD.POD_Signature
,POD.POD_Date
,POD.POD_Time
,DATEDIFF(dd,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Days
,DATEDIFF(hh,SO.CREATION_TIME,GI.ACTUAL_PGI_DATE) AS Cycle_Time_to_PGI_Hours
FROM GOODS_ISSUE AS GI
INNER JOIN dbo.Delivery_Notes AS DN
ON GI.DELIVERY_HEADER_ID = DN.DELIVERY_HEADER_CODE AND GI.DELIVERY_ITEM_ID = DN.DELIVERY_ITEM_CODE
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
INNER JOIN dbo.Customer_View AS SHIP
ON GI.SOLD_TO_CUSTOMER_ID = SHIP.CUSTOMER_CODE
INNER JOIN dbo.MATERIAL_DETAILS AS MATD
ON GI.MATERIAL_NUMBER = MATD.MATERIAL_NUMBER
INNER JOIN dbo.OPR_MATERIAL_DIM AS OPRM
ON OPRM.MATERIAL_NUMBER = GI.MATERIAL_NUMBER
LEFT JOIN dbo.SM_DATE_DIM AS FD
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
LEFT JOIN dbo.DIM_PUBLISHED_LEAD_TIME_COMMIT AS PLT
ON PLT.MATERIAL_NUMBER = OPRM.BASE_PART_CODE
LEFT JOIN dbo.PRODUCT_INSP_MEMO_DIM AS PIM
ON PIM.PRODUCT_INSP_MEMO = MATD.PRODUCT_INSP_MEMO
INNER JOIN dbo.SM_SALES_ORDER_LINE_FACT AS SOL
ON SOL.SALES_ORDER_CODE = GI.SALES_ORDER_ID AND SOL.SALES_ORDER_LINE_CODE = GI.SALES_ORDER_LINE_ID
INNER JOIN dbo.SM_SALES_ORDER_FACT AS SO
ON SO.SALES_ORDER_CODE = GI.SALES_ORDER_ID
INNER JOIN dbo.SM_DIVISION_DIM AS DV
ON SO.DIVISION_SID = DV.DIVISION_SID
LEFT JOIN dbo.SERVICE_ORDER_FACT AS SVO
ON SVO.SERVICE_ORDER_NUM = SO.SERVICE_ORDER_NUMBER
LEFT JOIN dbo.SM_SALES_DOCUMENT_TYPE_DIM AS SDT
ON SDT.SALES_DOCUMENT_TYPE_SID = SVO.SALES_DOCUMENT_TYPE_SID
LEFT JOIN dbo.SM_EQUIPMENT_DIM AS EQU
ON EQU.EQUIPMENT_SID = SVO.EQUIPMENT_SID
LEFT JOIN POD
ON POD.Delivery = GI.DELIVERY_HEADER_ID
WHERE GI.ACTUAL_PGI_DATE > GETDATE()-32
AND SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)
A CTE is just syntax
I suspect that CTE is evaluated many times
Materialze the CTE to #temp with indexe(s) so it is run once
This cast will hurt it
Make those columns true dates and index them
ON CAST(FD.CALENDAR_DAY AS DATE) = CAST(GI.ACTUAL_PGI_DATE AS DATE)
That where negates the left so you can just do a join
Also that MAX(Load_Date) could match on another shipment
SELECT SHIPMENTS.Delivery
,SHIPMENTS.Shipment_Number
,PROOF_OF_DELIVERY.Shipping_Carrier
,PROOF_OF_DELIVERY.Tracking_Number
,PROOF_OF_DELIVERY.Ship_Method
,PROOF_OF_DELIVERY.POD_Signature
,PROOF_OF_DELIVERY.POD_Date
,PROOF_OF_DELIVERY.POD_Time
FROM SHIPMENTS
JOIN PROOF_OF_DELIVERY
ON SHIPMENTS.Shipment_Number = PROOF_OF_DELIVERY.Delivery_Or_Shipment
WHERE PROOF_OF_DELIVERY.Load_Date IN
(
SELECT MAX(Load_Date)
FROM PROOF_OF_DELIVERY
GROUP BY Delivery_Or_Shipment
)
Pull this up into the join
INNER JOIN dbo.Customer_View AS SOLD
ON GI.SOLD_TO_CUSTOMER_ID = SOLD.CUSTOMER_CODE
AND GI.SOLD_TO_CUSTOMER_ID IN (0010000252,0010000898,0010001121,0010001409,0010001842,0010001852,0010001879,0010001977,0010001978,0010002021,0010002202,0010002227,0010002982,0010003118,0010003176,0010003294,0010005492,0010006904,0010007048,0010007080,0010010381,0010010572,0010010905,0010011999,0010012014,0010012048,0010012571,0010013124,0010013711,0010013713,0010013824,0010014180,0010014188,0010014333,0010015059,0010015313,0010015414,0010015541,0010015544,0010015550)
Let me say first that I'm new to SQL, and learning much every day. With that said, here is my problem. I have a view that is already created (It shows revenue generated on equipment), but I need one more table added to it (Expenses against the equipment). When I try to add an inner join table, it create a bunch of duplicate views. Here is my original view (For the revenue portion of it):
SELECT
<removed, there are about 25 of them>
FROM
dbo.LRCON WITH (nolock)
INNER JOIN dbo.LRCONVIN WITH (nolock) ON dbo.LRCONVIN.ConId = dbo.LRCON.ConId
INNER JOIN dbo.LRBILCON WITH (nolock) ON dbo.LRBILCON.ConId = dbo.LRCONVIN.ConId AND dbo.LRBILCON.UntId = dbo.LRCONVIN.UntId
INNER JOIN dbo.LRBILITM WITH (nolock) ON dbo.LRBILITM.ParentItmId = dbo.LRBILCON.ItmId
INNER JOIN dbo.LRBIL WITH (nolock) ON dbo.LRBIL.BilId = dbo.LRBILCON.BilId
INNER JOIN dbo.LRCONTYP WITH (nolock) ON dbo.LRCONTYP.ConTypId = dbo.LRCON.ConTypId
INNER JOIN dbo.COLOOKUP AS C1 WITH (nolock) ON C1.Id = dbo.LRBILITM.ItmTyp
INNER JOIN dbo.COLOOKUP AS C2 WITH (nolock) ON C2.Id = dbo.LRCONTYP.ConTyp
INNER JOIN dbo.VHVIN WITH (nolock) ON dbo.VHVIN.UntId = dbo.LRCONVIN.UntId
WHERE
(dbo.LRBIL.Status = 647) AND (dbo.LRBILITM.ItmTyp <> 274)
I then try to add another join:
INNER JOIN dbo.SVSLS WITH (nolock) on dbo.SVSLS.UntId = dbo.LRCONVIN.UntId
with the select statement:
ROUND(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2, 2) AS SvcAmtSale
... but it produces many, many rows of duplicates because it adds the detail of each expense to each row of my original table.
Original table:
https://dl.dropboxusercontent.com/u/81145403/orginal_table.jpg
After I add my new join/select:
https://dl.dropboxusercontent.com/u/81145403/failed_table.jpg
How do I fix this? At the end of the day, I just want to compare my revenue vs expenses on equipment over a date range. I really don't care to have the individual detail of the expenses, just a grand total is fine with me.
Is the SvcSaleAmt the revenue that you are interested in? And are the multiple detail rows separate entries on the same item? If you do not necessarily care about the individual details, you can GROUP the items together. In order to do this, you will need to get rid of the SlsId from your SELECT list, and add
GROUP BY CusId, CusName, BillId, ConId, Prd, ConTypId, ....., AmtCos, AmtGpm
Using all of the columns you have in your SELECT statement. Replace the ROUND() AS SvcSaleAmt with:
ROUND(SUM(dbo.SVSLS.AmtSubtotal + dbo.SVSLS.AmtSupplies + dbo.SVSLS.AmtDiagnostic + dbo.SVSLS.AmtTax1 + dbo.SVSLS.AmtTax2), 2) AS SvcSaleAmt
I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.