SQL query that uses a GROUP BY and IN is too slow - sql

I am struggling to speed this SQL query up. I have tried removing all the fields besides the two SUM() functions and the Id field but it is still incredibly slow. It is currently taking 15 seconds to run. Does anyone have any suggestions to speed this up as it is currently causing a timeout on a page in my web app. I need the fields shown so I can't really remove them but there surely has to be a way to improve this?
SELECT [Customer].[iCustomerID],
[Customer].[sCustomerSageCode],
[Customer].[sCustomerName],
[Customer].[sCustomerTelNo1],
SUM([InvoiceItem].[fQtyOrdered]) AS [Quantity],
SUM([InvoiceItem].[fNetAmount]) AS [Value]
FROM [dbo].[Customer]
LEFT JOIN [dbo].[CustomerAccountStatus] ON ([Customer].[iAccountStatusID] = [CustomerAccountStatus].[iAccountStatusID])
LEFT JOIN [dbo].[SalesOrder] ON ([SalesOrder].[iCustomerID] = [dbo].[Customer].[iCustomerID])
LEFT JOIN [Invoice] ON ([Invoice].[iCustomerID] = [Customer].[iCustomerID])
LEFT JOIN [dbo].[InvoiceItem] ON ([Invoice].[iInvoiceNumber] = [InvoiceItem].[iInvoiceNumber])
WHERE ([InvoiceItem].[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
AND( ([dbo].[SalesOrder].[dOrderDateTime] >= '2013-01-01')
OR ([dbo].[Customer].[dDateCreated] >= '2014-01-01'))
GROUP BY [Customer].[iCustomerID],[Customer].[sCustomerSageCode],[Customer].[sCustomerName], [Customer].[sCustomerTelNo1];

I don't think this query is doing what you want anyway. As written, there are no relationships between the Invoice table and the SalesOrder table. This leads me to believe that it is producing a cartesian product between invoices and orders, so customers with lots of orders would be generating lots of unnecessary intermediate rows.
You can test this by removing the SalesOrder table from the query:
SELECT c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1],
SUM(it.[fQtyOrdered]) AS [Quantity], SUM(it.[fNetAmount]) AS [Value]
FROM [dbo].[Customer] c LEFT JOIN
[dbo].[CustomerAccountStatus] cas
ON c.[iAccountStatusID] = cas.[iAccountStatusID] LEFT JOIN
[Invoice] i
ON (i.[iCustomerID] = c.[iCustomerID]) LEFT JOIN
[dbo].[InvoiceItem] it
ON (i.[iInvoiceNumber] = it.[iInvoiceNumber])
WHERE it.[sNominalCode] IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003') AND
c.[dDateCreated] >= '2014-01-01'
GROUP BY c.[iCustomerID], c.[sCustomerSageCode], c.[sCustomerName], c.[sCustomerTelNo1];
If this works and you need the SalesOrder, then you will need to either pre-aggregate by SalesOrder or find better join keys.
The above query could benefit from an index on Customer(dDateCreated, CustomerId).

You have a lot of LEFT JOIN
I don't see CustomerAccountStatus usage. Ou can exclude it
The [InvoiceItem].[sNominalCode] could be null in case of LEFT JOIN so add [InvoiceItem].[sNominalCode] is not null or <THE IN CONDITION>
Also add the is not null checks to other conditions

It seems you are looking for customers that are either created this year or for which sales orders exist from last year or this year. So select from customers, but use EXISTS on SalesOrder. Then you want to count invoices. So outer join them and make sure to have the criteria in the ON clause. (sNominalCode will be NULL for any outer joined records. Hence asking for certain sNominalCode in the WHERE clause will turn your outer join into an inner join.)
SELECT
c.iCustomerID,
c.sCustomerSageCode,
c.sCustomerName,
c.sCustomerTelNo1,
SUM(ii.fQtyOrdered) AS Quantity,
SUM(ii.fNetAmount) AS Value
FROM dbo.Customer c
LEFT JOIN dbo.Invoice i ON (i.iCustomerID = c.iCustomerID)
LEFT JOIN dbo.InvoiceItem ii ON (ii.iInvoiceNumber = i.iInvoiceNumber AND ii.sNominalCode IN ('4000', '4001', '4002', '4004', '4005', '4006', '4007', '4010', '4015', '4016', '700000', '701001', '701002', '701003'))
WHERE c.dDateCreated >= '2014-01-01'
OR EXISTS
(
SELECT *
FROM dbo.SalesOrder
WHERE iCustomerID = c.iCustomerID
AND dOrderDateTime >= '2013-01-01'
)
GROUP BY c.iCustomerID, c.sCustomerSageCode, c.sCustomerName, c.sCustomerTelNo1;

Related

SQL - faster to filter by large table or small table

I have the below query which takes a while to run, since ir_sales_summary is ~ 2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run since it is using a view that brings in data from a legacy service)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering from the big table (i) or be filtering from the much smaller weekending table (which gives us just the week ending dates)?
Try this. This might help, joining a static table as first table in query onto a fact/dynamic table will impact query performance i believe.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we
,weeekend
,supplierid
,chainid
,sales_units_cy
,sales_cost_cy
,sales_units_ret_cy
,sales_units_ly
,sales_cost_ly
,sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend

Issue with SUM with two LEFT JOINs

In the query below I have trouble when I have two left joins. What happens is the sums are incorrect (elevated) with both left joins. When I remove the second left join, the query runs correctly. How can I run this query with the second left join?
SELECT o.IdNoLocation,
COUNT(DISTINCT o.EncodedId) AS "Frequency",
ISNULL(SUM(o.Subtotal), 0) AS "Spend",
ISNULL(SUM(os.Amount), 0) AS "Surcharges",
ISNULL(SUM(od.Amount), 0) AS "Discounts"
FROM ((tblOrder o
LEFT JOIN tblOrderSurcharge os ON o.OrderId=os.OrderId)
LEFT JOIN tblOrderDiscount od ON o.OrderId=od.OrderId)
WHERE
o.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND o.IdNoLocation <> 'X'
GROUP BY o.IdNoLocation
This is the expected result when there is more than one related row in either tblOrderSurcharge or tblOrderDiscount... you've got an implicit semi-cross join between the rows from those two tables.
There's a couple of approaches to "fixing" this, if you need the totals returned in a single statement.
One option is to use inline views to perform the totals, and then combine the rows. Essentially, run separate queries to get the totals from each table, and then combine the results on a unique key. For example:
SELECT o.IdNoLocation
, o.Frequency
, ISNULL(o.Spend, 0) AS Spend
, ISNULL(s.Surcharges), 0) AS Surcharges
, ISNULL(s.Discounts), 0) AS Discounts
FROM ( SELECT ooo.IdNoLocation
, COUNT(DISTINCT ooo.EncodedId) AS Frequency
, SUM(ooo.Subtotal) AS Spend
FROM tblOrder ooo
WHERE ooo.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND ooo.IdNoLocation <> 'X'
GROUP BY ooo.IdNoLocation
) o
LEFT
JOIN ( SELECT oso.IdNoLocation
, SUM(oss.Amount) AS Surcharges
FROM tblOrderSurcharge oss
JOIN tblOrder oso
ON oso.OrderId = oss.OrderId
WHERE oso.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND oso.IdNoLocation <> 'X'
GROUP BY oso.IdNoLocation
) s
ON s.IdNoLocation = o.IdNoLocation
LEFT
JOIN ( SELECT odo.IdNoLocation
, SUM(odd.Amount) AS Discounts
FROM tblOrderDiscount odd
JOIN tblOrder odo
ON odo.OrderId = odd.OrderId
WHERE odo.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND odo.IdNoLocation <> 'X'
GROUP BY odo.IdNoLocation
) d
ON d.IdNoLocation = o.IdNoLocation
(This isn't tested, just desk checked. Run each of the inline view queries (o, s, d) and verify the results from each of those queries is as you expect. Then, run the entire query, to combine the rows... in the outer query, we'll do the "outer" joins and deal with the "missing" rows and replacing NULLs with zeros)
Another option is to use correlated subqueries in the SELECT list.
To see why this doesn't work, run this query:
SELECT o.IdNoLocation,
o.EncodedId AS "Frequency",
o.Subtotal AS "Spend",
os.Amount AS "Surcharges",
od.Amount AS "Discounts"
FROM ((tblOrder o
LEFT JOIN tblOrderSurcharge os ON o.OrderId=os.OrderId)
LEFT JOIN tblOrderDiscount od ON o.OrderId=od.OrderId)
WHERE
o.BusinessDate BETWEEN '2014-01-01' AND '2014-12-31'
AND o.IdNoLocation <> 'X'
The 'correct' answer is to use one table for both surcharges and discounts. I would assume that's no longer an option, but you can fake it with a view that performs a UNION ALL between the two tables.

Using summed field in query twice with IIF statement - have I missed some syntax somewhere?

Having a bit of a problem with my code and can't figure out where I'm going wrong.
Essentially this query will return all employees for a given employer for a given year, along with the amount of their allowances, tax withheld, and gross payments they've received, and their Reportable Employer Superannuation Contributions (RESC).
RESC is any amounts (tblSuperPayments.PaymentAmount) paid over and above the superannuation guarantee, which is gross payments (sum of tblPayment.GrossPayment) * super rate (tblSuperRate.SuperRate). Otherwise, RESC is 0.
The data that I currently have in my tables is as follows
SUM(tblPayment.GrossPayment) = 1730
SUM(tblEmployee.TaxPayable) = 80
SUM(tblSuperPayments.PaymentAmount) = 500
tblSuperRate.SuperRate = 9.5%
Therefore my query should be returning an amount of RESC of 500-(1730*9.5%)= 335.65.
However, my query is currently returning $835.65 - meaning that (1730*9.5%) is returning -335.65.
I can't figure out where my logic is going wrong - it's probably something simple but I can't see it. I suspect that it might be summing tblPayment.GrossPayment twice (edited on request)
SELECT
tblEmployee.EmployeeID AS Id
SUM(tblPayment.Allowances) AS TotAllow,
SUM(tblPayment.TaxPayable) AS TotTax,
SUM(tblPayment.GrossPayment) AS TotGross,
(IIF
((SUM(tblSuperPayments.PaymentAmount)) <= (SUM(tblPayment.GrossPayment)*tblSuperRate.SuperRate),
0,
(SUM(tblSuperPayments.PaymentAmount) - (SUM(tblPayment.GrossPayment)*tblSuperRate.SuperRate))
)) As TotRESC
FROM
((tblEmployee
LEFT JOIN tblPayment // any reason for using left join over inner join
ON tblEmployee.EmployeeID = tblPayment.fk_EmployeeID)
LEFT JOIN tblSuperPayments // any reason for using left join over inner join
ON tblEmployee.EmployeeID = tblSuperPayments.fk_EmployeeID)
LEFT JOIN tblSuperRate // any reason for using left join over inner join
ON (tblPayment.PaymentDate <= tblSuperRate.TaxYearEnd) // these two conditions might be returning
AND (tblPayment.PaymentDate >= tblSuperRate.TaxYearStart) //two SuperRate rows because of using equals in both
WHERE
tblEmployee.fk_EmployerID = 1
GROUP BY
tblEmployee.EmployeeID,
tblSuperRate.SuperRate;
Looking at your query I recommend you to just group by primary key (EmployeeID) of tblEmployee and the use the result as a sub query and do a join later tham using many columns of tblEmployeein group by which might cause duplicate rows. I rewrote the query as I have mentioned above and added comments at places which might cause the error.
SELECT
tblEmployee.TFN,
tblEmployee.FirstName,
tblEmployee.MiddleName,
tblEmployee.LastName,
tblEmployee.DOB,
tblEmployee.MailingAddress,
tblEmployee.AddressLine2,
tblEmployee.City,
tblEmployee.fk_StateProvinceID,
tblEmployee.PostalCode,
temp.TotAllow,
temp.TotTax,
temp.TotGross,
temp.TotRESC
FROM
(SELECT
tblEmployee.EmployeeID AS Id
SUM(tblPayment.Allowances) AS TotAllow,
SUM(tblPayment.TaxPayable) AS TotTax,
SUM(tblPayment.GrossPayment) AS TotGross,
(IIF
((SUM(tblSuperPayments.PaymentAmount)) <= (SUM(tblPayment.GrossPayment)*tblSuperRate.SuperRate),
0,
(SUM(tblSuperPayments.PaymentAmount) - (SUM(tblPayment.GrossPayment)*tblSuperRate.SuperRate))
)) As TotRESC
FROM
((tblEmployee
LEFT JOIN tblPayment // any reason for using left join over inner join
ON tblEmployee.EmployeeID = tblPayment.fk_EmployeeID)
LEFT JOIN tblSuperPayments // any reason for using left join over inner join
ON tblEmployee.EmployeeID = tblSuperPayments.fk_EmployeeID)
LEFT JOIN tblSuperRate // any reason for using left join over inner join
ON (tblPayment.PaymentDate <= tblSuperRate.TaxYearEnd) // these two conditions might be returning
AND (tblPayment.PaymentDate >= tblSuperRate.TaxYearStart) //two SuperRate rows because of using equals in both
WHERE
tblEmployee.fk_EmployerID = 1
GROUP BY
tblEmployee.EmployeeID,
tblSuperRate.SuperRate) temp // Does a single employee have more than one superrate why grouping by it?
JOIN tblEmployee ON tblEmployee.EmployeeID=temp.Id;

Improve Performance of SQL query joining 14 tables

I am trying to join 14 tables in which few tables I need to join using left join.
With the existing data which is around 7000 records,its taking around 10 seconds to execute the below query.I am afraid what if the records are more than million.Please help me improve the performance of the below query.
CREATE proc [dbo].[GetTodaysActualInvoiceItemSoldHistory]
#fromdate datetime,
#todate datetime
as
Begin
select SDID.InvoiceDate as [Sold Date],Cust.custCompanyName as [Sold To] ,
case SQBD.TransferNo when '0' then IVM.VendorName else SQBD.TransferNo end as [Purchase From],
SQBD.BatchSellQty as SoldQty,SQID.SellPrice,
SDID.InvoiceNo as [Sales Invoice No],INV.PRInvoiceNo as [PO Invoice No],INV.PRInvoiceDate as [PO Invoice Date],
SQID.ItemDesc as [Item Description],SQID.NetPrice,SDHM.DeliveryHeaderMasterName as DeliveryHeaderName,
SQID.ItemCode as [Item Code],
SQBD.BatchNo,SQBD.ExpiryDate,SQID.Amount,
SQID.Dept_ID as Dept_ID,
Dept_Name as [Department],SQID.Catg_ID as Catg_ID,
Category_Name as [Category],SQID.Brand_ID as Brand_ID,
BrandName as BrandName, SQID.Manf_Id as Manf_Id,
Manf.ManfName as [Manufacturer],
STM.TaxName, SQID.Tax_ID as Tax_ID,
INV.VendorID as VendorID,
SQBD.ItemID,SQM.Isdeleted,
SDHM.DeliveryHeaderMasterID,Cust.CustomerMasterID
from SD_QuotationMaster SQM
inner join SD_InvoiceDetails SDID on SQM.QuoteID = SDID.QuoteID
inner join SD_QuoteItemDetails SQID on SDID.QuoteID = SQID.QuoteID
inner join SD_QuoteBatchDetails SQBD on SDID.QuoteID = SQBD.QuoteID and SQID.ItemID=SQBD.ItemID
inner join INV_ProductInvoice INV on SQBD.InvoiceID=INV.ProductInvoiceID
inner jOIN INV_VendorMaster IVM ON INV.VendorID = IVM.VendorID
inner jOIN Sys_TaxMaster STM ON SQID.Tax_ID = STM.Tax_ID
inner join Cust_CustomerMaster Cust on SQM.CustomerMasterID = Cust.CustomerMasterID
left jOIN INV_DeptartmentMaster Dept ON SQID.Dept_ID = Dept.Dept_ID
left jOIN INV_BrandMaster BRD ON SQID.Brand_ID = BRD.Brand_ID
left jOIN INV_ManufacturerMaster Manf ON SQID.Manf_Id = Manf.Manf_Id
left join INV_CategoryMaster CAT ON SQID.Catg_ID = CAT.Catg_ID
left join SLRB_DeliveryCustomerMaster SDCM on SQM.CustomerMasterID=SDCM.CustomerMasterID and SQM.DeliveryHeaderMasterID=SDCM.DeliveryHeaderMasterID
left join SLRB_DeliveryHeaderMaster SDHM on SDCM.DeliveryHeaderMasterID=SDHM.DeliveryHeaderMasterID
where (SQM.IsDeleted=0) and SQBD.BatchSellQty > 0
and SDID.InvoiceDate between #fromdate and #todate
order by ItemDesc
End
Only the below tables contain more data while other tables have records less than 20
InvoiceDetails, QuoteMaster, QuoteItemDetails, QuoteBatchDetails ProductInvoice
Below is the link for execution plan
http://jmp.sh/CSZc2x2
Thanks.
Let's start with an obvious error:
(isnull(SQBD.BatchSellQty,0) > 0)
That one is not indexable, so it should not happen. Seriously, BatchSellQty should not be unknown (nullable) in most cases, or you better handle null properly. That field should be indexed and I am not sure I would like that with an isNull - there are likely tons of batches. Also note that a filtered index (condition >0) may work here.
Second, check that you have all proper indices and the execution plan makes sense.
Thids, you have to test with a ton of data. Index statistics may make a difference. Check where the time is spent - it may be tempdb in which case you really need a good tempdb IO speed.... and it is not realted to the input side.
You can try to use query hints to help SQL Server optimizer build a optimal query execution plan. For example, you can force the order of tables will be joined, using FORCE ORDER statement. If you order your tables in order that joins with minimum result size at each step, query will execute faster (may be, needs to try). Example:
We need to A join B join C
If A join B = 2000 records x 1000 records = ~400 records (we suspect this result)
And A join C = 2000 records x 10 records = ~3 records (and this)
And B join C = 1000 records x 10 records = 10 000 records (and this)
In this case optimal order will be
A join C join B = ~3 records x 1000 records = ~3000 records

Super Slow Query - sped up, but not perfect... Please help

I posted a query yesterday (see here) that was horrible (took over a minute to run, resulting in 18,215 records):
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
ORDER BY EMAIL
With a lot of coaching and research, I've tuned it up to the following:
SELECT contacts.ContactID,
contacts.InstitutionID,
contacts.First,
contacts.Last,
institutionswithzipcodesadditional.CountyID,
institutionswithzipcodesadditional.StateID,
institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN contacts_link_emails ON
contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN institutionswithzipcodesadditional ON
contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
WHERE
(contacts.ContactID IN
(SELECT contacts_2.ContactID
FROM contacts AS contacts_2
INNER JOIN contacts_link_emails AS contacts_link_emails_2 ON
contacts_2.ContactID = contacts_link_emails_2.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions ON
contacts_2.JobTitle = contacts_def_jobfunctions.JobID
RIGHT OUTER JOIN newsletterremovelist ON
contacts_link_emails_2.Email = newsletterremovelist.EmailAddress
WHERE (contacts_def_jobfunctions.ParentJobID <> 1841)
GROUP BY contacts_2.ContactID
UNION
SELECT contacts_1.ContactID
FROM contacts_link_jobfunctions
INNER JOIN contacts_def_jobfunctions AS contacts_def_jobfunctions_1 ON
contacts_link_jobfunctions.JobID = contacts_def_jobfunctions_1.JobID
AND contacts_def_jobfunctions_1.ParentJobID <> 1841
INNER JOIN contacts AS contacts_1 ON
contacts_link_jobfunctions.ContactID = contacts_1.ContactID
INNER JOIN contacts_link_emails AS contacts_link_emails_1 ON
contacts_link_emails_1.ContactID = contacts_1.ContactID
LEFT OUTER JOIN newsletterremovelist AS newsletterremovelist_1 ON
contacts_link_emails_1.Email = newsletterremovelist_1.EmailAddress
GROUP BY contacts_1.ContactID))
While this query is now super fast (about 3 seconds), I've blown part of the logic somewhere - it only returns 14,863 rows (instead of the 18,215 rows that I believe is accurate).
The results seem near correct. I'm working to discover what data might be missing in the result set.
Can you please coach me through whatever I've done wrong here?
Thanks,
Russell Schutte
The main problem with your original query was that you had two extra joins just to introduce duplicates and then a DISTINCT to get rid of them.
Use this:
SELECT cle.Email,
c.ContactID,
c.First AS ContactFirstName,
c.Last AS ContactLastName,
c.InstitutionID,
izip.CountyID,
izip.StateID,
izip.DistrictID
FROM dbo.contacts c
INNER JOIN
dbo.institutionswithzipcodesadditional izip
ON izip.InstitutionID = c.InstitutionID
INNER JOIN
dbo.contacts_link_emails cle
ON cle.ContactID = c.ContactID
WHERE cle.Email NOT IN
(
SELECT EmailAddress
FROM dbo.newsletterremovelist
)
AND EXISTS
(
SELECT NULL
FROM dbo.contacts_def_jobfunctions cdj
WHERE cdj.JobId = c.JobTitle
AND cdj.ParentJobId <> '1841'
UNION ALL
SELECT NULL
FROM dbo.contacts_link_jobfunctions clj
JOIN dbo.contacts_def_jobfunctions cdj
ON cdj.JobID = clj.JobID
WHERE clj.ContactID = c.ContactID
AND cdj.ParentJobId <> '1841'
)
ORDER BY
email
Create the following indexes:
newsletterremovelist (EmailAddress)
contacts_link_jobfunctions (ContactID, JobID)
contacts_def_jobfunctions (JobID)
Do you get the same results when you do:
SELECT count(*)
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
SELECT COUNT(*)
FROM
contacts
INNER JOIN contacts_link_jobfunctions
ON contacts.ContactID = contacts_link_jobfunctions.ContactID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
If so keep adding each join conditon on until you don't get the same results and you will see where your mistake was. If all the joins are the same, then look at the where clauses. But I will be surprised if it isn't in the first join because the syntax you have orginally won't even work on SQL Server and it is pretty nonstandard SQL and may have been incorrect all along but no one knew.
Alternatively, pick a few of the records that are returned in the orginal but not the revised. Track them through the tables one at a time to see if you can find why the second query filters them out.
I'm not directly sure what is wrong, but when I run in to this situation, the first thing I do is start removing variables.
So, comment out the where clause. How many rows are returned?
If you get back the 11,604 rows then you've isolated the problems to the joins. Work though the joins, commenting each one out (remove the associated columns too) and figure out how many rows are eliminated.
As you do this, aim to find what is causing the desired rows to be eliminated. Once isolated, consider the join differences between the first query and the second query.
In looking at the first query, you could probably just modify that to eliminate any INs and instead do a EXISTS instead.
Consider your indexes as well. Any thing in the where or join clauses should probably be indexed.