SQL - Multiple LEFT JOINs with SUM don't give the expected result

Here is my query:
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
SUM(ws.worked_time) as hours_on_job,
SUM(iv.total) as total_price,
SUM(iv.hour_expected) as hours_planned,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID
LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
The problem is that the SUM results are wrong when I LEFT JOIN the other tables.
Row 53, for example, gives 105 for nb_followup instead of 1.
The query returns the right result if I simply remove the last two LEFT JOINs, LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID and LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID:
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
Also, removing only LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID gives 15 as the result for row 53.
To summarize:
The full query gives 105 -> wrong, should be 1
Removing the last join gives 15 -> wrong, should be 1
Removing the last two joins gives 1 -> correct

You need to calculate the SUM()s BEFORE you join; otherwise each join multiplies the rows, and the sums are then computed over the multiplied rows. For example:
SELECT
j.ID as ID
, pb.nb_followup
FROM $wpdb->posts j
LEFT JOIN (select pb.job_id, SUM(pb.require_followup) as nb_followup from ".Problem::$TABLE_NAME." pb GROUP BY pb.job_id) pb ON pb.job_id = j.ID
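The row multiplication can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module rather than the asker's MySQL/WordPress setup. The table names and the row counts (1 problem, 15 worksheets, 7 invoices for job 53) are assumptions chosen because 1 x 15 x 7 = 105 matches the numbers reported in the question; the real data may differ.

```python
import sqlite3

# Hypothetical, simplified schema: one job with 1 problem,
# 15 worksheets and 7 invoices (assumed counts, see lead-in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY);
    CREATE TABLE problem (job_id INTEGER, require_followup INTEGER);
    CREATE TABLE worksheet (job_id INTEGER, worked_time REAL);
    CREATE TABLE invoice (job_id INTEGER, total REAL);
    INSERT INTO job VALUES (53);
    INSERT INTO problem VALUES (53, 1);
""")
conn.executemany("INSERT INTO worksheet VALUES (53, ?)", [(1.0,)] * 15)
conn.executemany("INSERT INTO invoice VALUES (53, ?)", [(100.0,)] * 7)

# Naive version: every join multiplies the rows before SUM() runs,
# so the single problem row is counted 15 * 7 = 105 times.
naive = conn.execute("""
    SELECT j.id, SUM(pb.require_followup), SUM(ws.worked_time), SUM(iv.total)
    FROM job j
    LEFT JOIN problem pb ON pb.job_id = j.id
    LEFT JOIN worksheet ws ON ws.job_id = j.id
    LEFT JOIN invoice iv ON iv.job_id = j.id
    GROUP BY j.id
""").fetchone()

# Pre-aggregated version: each child table is collapsed to one row
# per job before it is joined, so nothing is multiplied.
fixed = conn.execute("""
    SELECT j.id, pb.nb_followup, ws.hours_on_job, iv.total_price
    FROM job j
    LEFT JOIN (SELECT job_id, SUM(require_followup) AS nb_followup
               FROM problem GROUP BY job_id) pb ON pb.job_id = j.id
    LEFT JOIN (SELECT job_id, SUM(worked_time) AS hours_on_job
               FROM worksheet GROUP BY job_id) ws ON ws.job_id = j.id
    LEFT JOIN (SELECT job_id, SUM(total) AS total_price
               FROM invoice GROUP BY job_id) iv ON iv.job_id = j.id
""").fetchone()

print(naive)  # nb_followup is inflated to 105
print(fixed)  # nb_followup is 1
```

Because every derived table returns at most one row per job_id, the outer query no longer needs a GROUP BY at all.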
The other problem you are facing is that MySQL permits a "lazy syntax" for GROUP BY. Don't use it, or you will get unexpected errors and bugs. It is very simple to avoid: REPEAT every column of the SELECT clause in the GROUP BY clause UNLESS the column uses an aggregate function such as SUM(), COUNT(), MIN(), MAX() and so on. For example:
select a.col1, b.col2, c.col3 , sum(d.col4)
from a
inner join b on a.id = b.aid
inner join c on b.id = c.bid
inner join d on c.id = d.cid
group by a.col1, b.col2, c.col3

Related

Problem with where clause and count transactions?

I want to count how many transactions I have per currency. When I count without a WHERE clause, currencies with no transactions (NULL values) show 0. But when I add a WHERE clause with an IN operator, I get a filtered result and the zero rows disappear. How can I show 0 in the transaction count?
SELECT
c.ShortName,
count(ad.AccountId) as No_of_transactions
FROM Currency c
LEFT JOIN Account a ON c.id = a.CurrencyId
LEFT JOIN AccountDetails ad ON a.id = ad.AccountId
LEFT JOIN [Location] l ON ad.LocationId = l.Id
LEFT JOIN LocationType lt ON l.LocationTypeId = lt.Id
WHERE lt.Name IN('Region Branch', 'City Branch')
GROUP BY c.ShortName
This is the result that I want to get:
EUR 31,
USD 0,
GBR 0
You have a LEFT JOIN. The filtering needs to go in the ON clause:
SELECT c.ShortName,
COUNT(lt.id) as No_of_transactions
FROM Currency c LEFT JOIN
Account a
ON c.id = a.CurrencyId LEFT JOIN
AccountDetails ad
ON a.id = ad.AccountId LEFT JOIN
[Location] l
ON ad.LocationId = l.Id LEFT JOIN
LocationType lt
ON l.LocationTypeId = lt.Id AND
lt.Name IN ('Region Branch', 'City Branch')
GROUP BY c.ShortName;
In your version, the non-matches turn into NULLs, which the WHERE conditions filter out.
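The difference between filtering in WHERE and filtering in ON can be seen on a toy data set. This is a minimal sketch using Python's sqlite3 module; the two-table schema (currency, txn) and its rows are hypothetical simplifications of the five-table chain in the question, not the asker's actual schema.

```python
import sqlite3

# Hypothetical simplified schema: transactions carry their branch type directly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE currency (id INTEGER PRIMARY KEY, short_name TEXT);
    CREATE TABLE txn (id INTEGER PRIMARY KEY, currency_id INTEGER, branch TEXT);
    INSERT INTO currency VALUES (1, 'EUR'), (2, 'USD'), (3, 'GBR');
    INSERT INTO txn VALUES (1, 1, 'City Branch'),
                           (2, 1, 'Region Branch'),
                           (3, 2, 'Head Office');
""")

# Filter in WHERE: rows whose txn columns are NULL (no match) or that
# fail the IN test are removed, so currencies with no hits vanish.
in_where = conn.execute("""
    SELECT c.short_name, COUNT(t.id)
    FROM currency c
    LEFT JOIN txn t ON t.currency_id = c.id
    WHERE t.branch IN ('Region Branch', 'City Branch')
    GROUP BY c.short_name
    ORDER BY c.short_name
""").fetchall()

# Filter in ON: the condition only decides which txn rows match;
# every currency row is kept, and COUNT(t.id) ignores the NULLs.
in_on = conn.execute("""
    SELECT c.short_name, COUNT(t.id)
    FROM currency c
    LEFT JOIN txn t ON t.currency_id = c.id
                   AND t.branch IN ('Region Branch', 'City Branch')
    GROUP BY c.short_name
    ORDER BY c.short_name
""").fetchall()

print(in_where)  # only EUR survives
print(in_on)     # EUR, GBR and USD, the latter two with 0
```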
You want to move the condition on the left joined table from the where clause to the on clause of the relevant join to avoid records being filtered out when they do not exist in that table.
Also, I think that you need COUNT(DISTINCT ad.id) (as per the comments, a transaction is uniquely represented by this column):
SELECT
c.ShortName,
COUNT(DISTINCT ad.id) as No_of_transactions
FROM Currency c
LEFT JOIN Account a ON c.id = a.CurrencyId
LEFT JOIN AccountDetails ad ON a.id = ad.AccountId
LEFT JOIN [Location] l ON ad.LocationId = l.Id
LEFT JOIN LocationType lt ON l.LocationTypeId = lt.Id AND lt.Name IN('Region Branch', 'City Branch')
GROUP BY c.ShortName
Alternatively, filter LocationType in a derived table before joining it:
SELECT
c.ShortName,
COUNT(t.Id) as No_of_transactions
FROM Currency c
LEFT JOIN Account a ON c.id = a.CurrencyId
LEFT JOIN AccountDetails ad ON a.id = ad.AccountId
LEFT JOIN [Location] l ON ad.LocationId = l.Id
LEFT JOIN (SELECT * FROM LocationType lt WHERE lt.Name IN('Region Branch', 'City Branch')) AS t ON l.LocationTypeId = t.Id
GROUP BY c.ShortName

CTE Missing Records

The first query listed below returns some logistical data associated with hires that have been made within a particular period of time. The query returns 478 records.
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
FROM candidate_jobs c
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC
However, when I attempted to add a CTE (see below) that displays each hire's final "post hire check in score", the query only returns 236 records. Ideally, I'd like the query to either return a score or null value for each of the initial 478 hire records.
WITH final_post_hire_score (candidate_id, final_score) AS
(SELECT c.candidate_id
,p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
,final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c ON c.candidate_id = f.candidate_id
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC
The missing records are due to the filters. Move the filters to the ON condition; otherwise your LEFT OUTER JOIN is implicitly converted to an INNER JOIN.
When you use a LEFT OUTER JOIN, filters on the right-hand table should be in the ON condition; otherwise the NULL values produced for non-matching records get filtered out.
WITH final_post_hire_score (candidate_id, final_score)
AS (SELECT c.candidate_id,
p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c
ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id,
o.NAME,
j.NAME AS job_title,
c.applied_from,
job_id AS job_id,
cjs.score AS smart_rank_score,
cjs.is_completed AS smartrank_completion_status,
c.hired_at,
final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c
ON c.candidate_id = f.candidate_id
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
LEFT JOIN organizations o
ON o.id = c.organization_id
AND o.NAME LIKE ANY ( '{"%Tutor Doctor%"}' )
LEFT JOIN candidate_job_surveys cjs
ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j
ON j.id = c.job_id
ORDER BY 8 DESC
I think there's an extra
WHERE p.check_in_number = 3
that isn't anywhere else.
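For comparison with the answers above, here is one possible way to get "a score or NULL for every hire". It is a sketch under an assumption the thread does not confirm: that the rows go missing because the query starts FROM the CTE, so only hires with a check-in 3 review can appear. The sketch instead starts from candidate_jobs and LEFT JOINs the CTE to it. It uses Python's sqlite3 module with a cut-down, hypothetical version of the schema (no organizations, surveys or jobs tables), and it joins on candidate_job_id rather than candidate_id, which is also an assumption.

```python
import sqlite3

# Hypothetical cut-down schema: three hires, only some with a check-in 3 review.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidate_jobs (
        id INTEGER PRIMARY KEY, candidate_id INTEGER, hired_at TEXT);
    CREATE TABLE post_hire_followup_reviews (
        candidate_job_id INTEGER, check_in_number INTEGER, score INTEGER);
    INSERT INTO candidate_jobs VALUES
        (1, 101, '2015-03-01'),
        (2, 102, '2015-06-15'),
        (3, 103, '2014-12-31');
    INSERT INTO post_hire_followup_reviews VALUES
        (1, 3, 8), (1, 1, 5), (3, 3, 9);
""")

# The hire table drives the query; the CTE is the optional (right) side,
# so a hire without a final score still appears, with NULL.
rows = conn.execute("""
    WITH final_post_hire_score (candidate_job_id, final_score) AS (
        SELECT p.candidate_job_id, p.score
        FROM post_hire_followup_reviews p
        WHERE p.check_in_number = 3
    )
    SELECT c.candidate_id, c.hired_at, f.final_score
    FROM candidate_jobs c
    LEFT JOIN final_post_hire_score f ON f.candidate_job_id = c.id
    WHERE c.hired_at BETWEEN '2015-01-01' AND '2016-02-22'
    ORDER BY c.hired_at DESC
""").fetchall()

print(rows)  # both 2015 hires are listed; candidate 102 has no score
```

Here the WHERE clause filters only the driving table, so it does not discard the NULL scores produced by the LEFT JOIN.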

Optimizing a T-SQL query with COUNT in the SELECT and HAVING statements

I am not sure if this has to do with the fact that I am dealing with very large tables (some have 900+ million rows), but I am having trouble optimizing my query. I have checked and used indexed fields wherever possible, and most of the fields used in my query are indexed. Even a SELECT TOP 100 statement takes roughly 10 minutes, and I would like to get all of the results back much faster. How would I go about optimizing this query and future queries like it? For security purposes I had to use alternative aliases below:
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Try this
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
and j.ExpiryDate > GETDATE()
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Sometimes it helps to reduce the data set in a derived table and then apply the function only to the data that meets the WHERE condition. Without seeing the execution plans for both, I don't know if this will work, but it is worth a try.
SELECT a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock, COUNT (*) as Numbers
FROM (
SELECT
x.ID as XID, j.ID as JID, j.FirstDate, j.ExpiryDate, x.Lock
FROM
j
INNER JOIN
jp ON j.ID = jp.ID
INNER JOIN
jd ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa ON fa.ID = j.ID
INNER JOIN
l ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c ON c.CID = fa.CID
INNER JOIN
x ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()) a
GROUP BY
a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock
HAVING
COUNT(*) <= 10

Where clause in left join using correlated query?

Here's the query:
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j
ON c.ID = j.CompanyID
JOIN JobApplications ja
ON j.ID = ja.JobID
LEFT JOIN JobContact jsc
ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js
ON jsc.JobSourceCompanyID = js.ID
LEFT JOIN (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
--WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ON j.ID = jsh.JobID
ORDER BY ja.ApplicationDate
I'm trying to get the most recent job status for a particular job. I can't figure out how to do the where clause (the commented WHERE) in the LEFT JOIN. I've done this in the past, but can't remember how.
I will be grateful for any pointers.
You need to use OUTER APPLY. A CROSS APPLY is like an INNER JOIN, where the applied table must return results, whereas an OUTER APPLY is like a [LEFT] OUTER JOIN, where the applied subquery may return no results.
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j ON c.ID = j.CompanyID
JOIN JobApplications ja ON j.ID = ja.JobID
LEFT JOIN JobContact jsc ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js ON jsc.JobSourceCompanyID = js.ID
OUTER APPLY (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ORDER BY ja.ApplicationDate
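OUTER APPLY is specific to SQL Server, so it cannot be run in most other engines. As a rough illustration of the same "latest row per job, or NULL" behaviour, the sketch below swaps in a different technique: a LEFT JOIN whose ON clause contains a correlated subquery with LIMIT 1, run through Python's sqlite3 module. The tables and rows are hypothetical, trimmed-down versions of those in the question.

```python
import sqlite3

# Hypothetical trimmed-down schema: job 1 has three statuses, job 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE job_status_history (
        id INTEGER PRIMARY KEY, job_id INTEGER, status_date TEXT, name TEXT);
    INSERT INTO jobs VALUES (1, 'DBA'), (2, 'Developer');
    INSERT INTO job_status_history VALUES
        (1, 1, '2020-01-01', 'Applied'),
        (2, 1, '2020-02-01', 'Interview'),
        (3, 1, '2020-03-01', 'Offer');
""")

# The correlated subquery picks the id of the newest history row for the
# current job; the LEFT JOIN keeps jobs for which no such row exists.
rows = conn.execute("""
    SELECT j.id, j.title, jsh.name AS last_status
    FROM jobs j
    LEFT JOIN job_status_history jsh
           ON jsh.id = (SELECT jh.id
                        FROM job_status_history jh
                        WHERE jh.job_id = j.id
                        ORDER BY jh.status_date DESC
                        LIMIT 1)
    ORDER BY j.id
""").fetchall()

print(rows)  # job 1 shows its latest status, job 2 shows None
```

As with OUTER APPLY, the inner query can reference the outer row (j.id), which a plain derived table in a LEFT JOIN cannot do.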

SQL use nested select in middle of inner join

Is it possible to use a select in the middle of joining...
I am trying to do the following:
FROM
tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN ((SELECT
,Min(n.date) as [MinDate]
from tblNotes n
Where n.test = 'test') te
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp)
INNER Join tblClient c ON o.ClientId = c.Id
Basically, the select in the middle of the query picks only the notes with the minimum date. The problem is that I need to do it here because tblOrders has to be the first table.
Suggestions?
The INNER JOIN failed because you have a leading comma here:
,Min(n.date) as [MinDate]
I think you are looking for something like this:
SELECT ...
FROM tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN (
SELECT id, Min(date) as [MinDate]
from tblNotes
Where test = 'test'
group by id
) te -- not sure what JOIN clause to use here, please post schema
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp
INNER Join tblClient c ON o.ClientId = c.Id
You are missing an alias and join condition:
FROM
tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN ((SELECT Min(n.date) as [MinDate]
from tblNotes n
Where n.test = 'test') te
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp)
-- missing
AS z
ON <join conditions here>
INNER Join tblClient c ON o.ClientId = c.Id
Yes, you can have a Select in a Join.
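A complete, runnable version of the pattern might look like the sketch below. Since the thread never shows the schema, the tables, columns (order_id, note_date, body) and rows here are hypothetical; the point is only that a derived table with its own alias and ON condition can sit in the middle of a join chain. It uses Python's sqlite3 module.

```python
import sqlite3

# Hypothetical schema: order 1 has two notes, order 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client TEXT);
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY, order_id INTEGER, note_date TEXT, body TEXT);
    INSERT INTO orders VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO notes VALUES
        (1, 1, '2021-05-02', 'second'),
        (2, 1, '2021-05-01', 'first'),
        (3, 2, '2021-06-10', 'only');
""")

# The derived table "te" finds the earliest note date per order; joining
# notes back on (order_id, date) then fetches that note's other columns.
rows = conn.execute("""
    SELECT o.id, o.client, n.body
    FROM orders o
    INNER JOIN (SELECT order_id, MIN(note_date) AS min_date
                FROM notes
                GROUP BY order_id) te ON te.order_id = o.id
    INNER JOIN notes n ON n.order_id = te.order_id
                      AND n.note_date = te.min_date
    ORDER BY o.id
""").fetchall()

print(rows)  # one row per order, carrying its earliest note
```

The orders table stays first in the FROM clause, and the derived table gets both an alias and a join condition, the two pieces the answers above point out as missing.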