Where clause in left join using correlated query? - sql

Here's the query:
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j
ON c.ID = j.CompanyID
JOIN JobApplications ja
ON j.ID = ja.JobID
LEFT JOIN JobContact jsc
ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js
ON jsc.JobSourceCompanyID = js.ID
LEFT JOIN (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
--WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ON j.ID = jsh.JobID
ORDER BY ja.ApplicationDate
I'm trying to get the most recent job status for each job. I can't figure out how to write the WHERE clause (the commented-out WHERE) inside the LEFT JOIN subquery so that it refers to the outer j.ID. I've done this before, but can't remember how.
I will be grateful for any pointers.

You need to use OUTER APPLY. A derived table in a JOIN cannot reference columns from the outer query, but an applied subquery can. CROSS APPLY is like an INNER JOIN where the applied subquery must return results, whereas OUTER APPLY is like a [LEFT] OUTER JOIN where the applied subquery may return no results.
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j ON c.ID = j.CompanyID
JOIN JobApplications ja ON j.ID = ja.JobID
LEFT JOIN JobContact jsc ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js ON jsc.JobSourceCompanyID = js.ID
OUTER APPLY (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ORDER BY ja.ApplicationDate
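For engines without APPLY, a similar "latest row per job" result can be sketched with a correlated scalar subquery. The example below is only an illustration, not the answer's method: it runs on SQLite through Python's sqlite3 module, uses LIMIT 1 in place of TOP 1, and the sample rows are invented. It also returns a single column, whereas OUTER APPLY can return several.

```python
import sqlite3

# In-memory database with invented sample rows (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Jobs (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE JobStatusTypes (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE JobStatusHistory (JobID INTEGER, JobStatusTypeID INTEGER, StatusDate TEXT);
INSERT INTO Jobs VALUES (1, 'DBA'), (2, 'Developer'), (3, 'Analyst');
INSERT INTO JobStatusTypes VALUES (1, 'Applied'), (2, 'Interview'), (3, 'Rejected');
INSERT INTO JobStatusHistory VALUES
  (1, 1, '2016-01-01'), (1, 2, '2016-01-15'),
  (2, 1, '2016-02-01'), (2, 2, '2016-02-10'), (2, 3, '2016-02-20');
""")

# The subquery is correlated on j.ID, so it is evaluated once per job.
# Job 3 has no history, so its LastStatus is NULL (like OUTER APPLY).
latest_status = conn.execute("""
SELECT j.ID, j.Title,
       (SELECT jt.Name
        FROM JobStatusHistory jh
        JOIN JobStatusTypes jt ON jh.JobStatusTypeID = jt.ID
        WHERE jh.JobID = j.ID
        ORDER BY jh.StatusDate DESC
        LIMIT 1) AS LastStatus
FROM Jobs j
ORDER BY j.ID
""").fetchall()

print(latest_status)
```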

Related

SQL - Multiple join left with sum doesn't give expected result

Here is my request
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
SUM(ws.worked_time) as hours_on_job,
SUM(iv.total) as total_price,
SUM(iv.hour_expected) as hours_planned,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID
LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
The problem is that the SUM results are wrong when I LEFT JOIN the other tables.
Row 53, for example, gives 105 for nb_followup instead of 1.
The request returns the right result if I simply remove the last 2 LEFT JOINs: LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID and
LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
Removing only LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID gives 15 as the result for row 53.
To summarize:
The full request gives 105 -> wrong, should be 1
Removing the last join gives 15 -> wrong, should be 1
Removing the last 2 joins gives 1 -> correct
You need to calculate the SUM()s BEFORE you join; otherwise each join multiplies the rows produced by the others, and the sums are taken over the multiplied rows. For example:
SELECT
j.ID as ID
, pb.nb_followup
FROM $wpdb->posts j
LEFT JOIN (select pb.job_id, SUM(pb.require_followup) as nb_followup from ".Problem::$TABLE_NAME." pb GROUP BY pb.job_id) pb ON pb.job_id = j.ID
The other problem you are facing is that MySQL permits a "lazy syntax" for GROUP BY (when the ONLY_FULL_GROUP_BY SQL mode is disabled). Don't use this lazy syntax or you will get unexpected errors/bugs. It is very simple to avoid: REPEAT every column of the select clause in the group by clause UNLESS the column is using an aggregate function such as SUM(), COUNT(), MIN(), MAX() and so on, e.g.
select a.col1, b.col2, c.col3 , sum(d.col4)
from a
inner join b on a.id = b.aid
inner join c on b.id = c.bid
inner join d on c.id = d.cid
group by a.col1, b.col2, c.col3
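The row multiplication is easy to reproduce on a toy data set. The sketch below uses SQLite through Python's sqlite3 module with simplified, invented table names; the 15 worksheet rows and 7 invoice rows are an assumption chosen only because they would account for the 15 and 105 reported in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY);
CREATE TABLE problems (job_id INTEGER, require_followup INTEGER);
CREATE TABLE worksheets (job_id INTEGER, worked_time REAL);
CREATE TABLE invoices (job_id INTEGER, total REAL);
INSERT INTO jobs VALUES (53);
INSERT INTO problems VALUES (53, 1);
""")
# Assumed data: one problem, 15 worksheets and 7 invoices for job 53.
conn.executemany("INSERT INTO worksheets VALUES (53, ?)", [(1.0,)] * 15)
conn.executemany("INSERT INTO invoices VALUES (53, ?)", [(100.0,)] * 7)

# Joining first: the single problem row is repeated 15 * 7 = 105 times.
naive = conn.execute("""
SELECT j.id, SUM(pb.require_followup)
FROM jobs j
LEFT JOIN problems pb ON pb.job_id = j.id
LEFT JOIN worksheets ws ON ws.job_id = j.id
LEFT JOIN invoices iv ON iv.job_id = j.id
GROUP BY j.id
""").fetchone()

# Aggregating first: each derived table has at most one row per job,
# so the joins cannot multiply anything.
pre_aggregated = conn.execute("""
SELECT j.id, pb.nb_followup, ws.hours_on_job, iv.total_price
FROM jobs j
LEFT JOIN (SELECT job_id, SUM(require_followup) AS nb_followup
           FROM problems GROUP BY job_id) pb ON pb.job_id = j.id
LEFT JOIN (SELECT job_id, SUM(worked_time) AS hours_on_job
           FROM worksheets GROUP BY job_id) ws ON ws.job_id = j.id
LEFT JOIN (SELECT job_id, SUM(total) AS total_price
           FROM invoices GROUP BY job_id) iv ON iv.job_id = j.id
""").fetchone()

print(naive, pre_aggregated)
```

The same pattern (one pre-aggregated derived table per child table) would apply to each SUM in the original request.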

CTE Missing Records

The first query listed below returns some logistical data associated with hires that have been made within a particular period of time. The query returns 478 records.
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
FROM candidate_jobs c
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC
However, when I attempted to add a CTE (see below) that displays each hire's final "post hire check in score", the query only returns 236 records. Ideally, I'd like the query to either return a score or null value for each of the initial 478 hire records.
WITH final_post_hire_score (candidate_id, final_score) AS
(SELECT c.candidate_id
,p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
,final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c ON c.candidate_id = f.candidate_id
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC
The missing records are due to the filters in the WHERE clause. Move the filters to the ON condition; otherwise your LEFT OUTER JOIN is implicitly converted to an INNER JOIN.
When you use a LEFT OUTER JOIN, filters on the right table should be in the ON condition; otherwise the NULL-extended rows for non-matching records get filtered out.
WITH final_post_hire_score (candidate_id, final_score)
AS (SELECT c.candidate_id,
p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c
ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id,
o.NAME,
j.NAME AS job_title,
c.applied_from,
job_id AS job_id,
cjs.score AS smart_rank_score,
cjs.is_completed AS smartrank_completion_status,
c.hired_at,
final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c
ON c.candidate_id = f.candidate_id
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
LEFT JOIN organizations o
ON o.id = c.organization_id
AND o.NAME LIKE ANY ( '{"%Tutor Doctor%"}' )
LEFT JOIN candidate_job_surveys cjs
ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j
ON j.id = c.job_id
ORDER BY 8 DESC
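The difference between filtering in WHERE and filtering in ON can be seen on a tiny data set. This is a minimal sketch using SQLite through Python's sqlite3 module; the hires and reviews tables and their rows are invented stand-ins, not the poster's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hires (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reviews (hire_id INTEGER, check_in_number INTEGER, score INTEGER);
INSERT INTO hires VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
-- Ann has a third check-in, Bob only a first one, Cy has no reviews.
INSERT INTO reviews VALUES (1, 3, 9), (2, 1, 5);
""")

# Filter in WHERE: rows where r.check_in_number is NULL or not 3 are
# discarded after the join, so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
SELECT h.name, r.score
FROM hires h
LEFT JOIN reviews r ON r.hire_id = h.id
WHERE r.check_in_number = 3
ORDER BY h.id
""").fetchall()

# Filter in ON: only the matching is restricted; every hire is kept.
filter_in_on = conn.execute("""
SELECT h.name, r.score
FROM hires h
LEFT JOIN reviews r ON r.hire_id = h.id AND r.check_in_number = 3
ORDER BY h.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Note that the preserved table (here hires) is the one on the left of the LEFT JOIN, so the table whose rows must all appear has to come first.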
I think there's an extra
WHERE p.check_in_number = 3
that isn't anywhere else.

Optimizing a T-SQL query with COUNT in the SELECT and HAVING statements

I am not sure if this has to do with the fact that I am dealing with very large tables (some have 900+ million rows), but I am having trouble optimizing my query. I have checked and used indexed fields wherever possible; most of the fields used in my query are indexed. Even a SELECT TOP 100 takes roughly 10 minutes, and I would like to get all of the results back much faster. How would I go about optimizing this query and future queries like it? For security purposes I had to use alternative aliases below:
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Try this
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
and j.ExpiryDate > GETDATE()
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Sometimes it helps to reduce the data set in a derived table and then apply the function only to the data that meets the WHERE condition. Without seeing the execution plans for both, I don't know if this will work, but it is worth a try.
SELECT a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock, COUNT (*) as Numbers
FROM (
SELECT
x.ID as XID, j.ID as JID, j.FirstDate, j.ExpiryDate, x.Lock
FROM
j
INNER JOIN
jp ON j.ID = jp.ID
INNER JOIN
jd ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa ON fa.ID = j.ID
INNER JOIN
l ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c ON c.CID = fa.CID
INNER JOIN
x ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()) a
GROUP BY
a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock
HAVING
COUNT(*) <= 10

How to retrieve count of records in SELECT statement

I am trying to retrieve the right count of records to mitigate an issue I am having. The below query returns 327 records from my database:
SELECT DISTINCT COUNT(at.someid) AS CountOfStudentsInTable FROM tblJobSkillAssessment AS at
INNER JOIN tblJobSkills j ON j.jobskillid = at.skillid
LEFT JOIN tblStudentPersonal sp ON sp.someid2 = at.someid
INNER JOIN tblStudentSchool ss ON ss.monsterid = at.someid
INNER JOIN tblSchools s ON s.schoolid = ss.schoolid
INNER JOIN tblSchoolDistricts sd ON sd.schoolid = s.schoolid
INNER JOIN tblDistricts d ON d.districtid = sd.districtid
INNER JOIN tblCountySchools cs ON cs.schoolid = s.schoolid
INNER JOIN tblCounties cty ON cty.countyid = cs.countyid
INNER JOIN tblRegionUserRegionGroups rurg ON rurg.districtid = d.districtid
INNER JOIN tblGroups g ON g.groupid = rurg.groupid
WHERE ss.graduationyear IN (SELECT Items FROM FN_Split(#gradyears, ',')) AND sp.optin = 'Yes' AND g.groupname = #groupname
Where I run into trouble is trying to reconcile that with the query below. One shows just a count of all the particular students; the other shows pertinent information for a set of students as needed. The totals need to be the same, and they are not. The query below returns 333 students because the school a student goes to can be in two separate counties, so that student is counted twice. I can't figure out how to fix this.
SELECT DISTINCT #TableName AS TableName, d.district AS LocationName, cty.county AS County, COUNT(DISTINCT cc.monsterid) AS CountOfStudents, d.IRN AS IRN FROM tblJobSkillAssessment AS cc
INNER JOIN tblJobSkills AS c ON c.jobskillid = cc.skillid
INNER JOIN tblStudentPersonal sp ON sp.monsterid = cc.monsterid
INNER JOIN tblStudentSchool ss ON ss.monsterid = cc.monsterid
INNER JOIN tblSchools s ON s.schoolid = ss.schoolid
INNER JOIN tblSchoolDistricts sd ON sd.schoolid = s.schoolid
INNER JOIN tblDistricts d ON d.districtid = sd.districtid
INNER JOIN tblCountySchools cs ON cs.schoolid = s.schoolid
INNER JOIN tblCounties cty ON cty.countyid = cs.countyid
INNER JOIN tblRegionUserRegionGroups rurg ON rurg.districtid = d.districtid
INNER JOIN tblGroups g ON g.groupid = rurg.groupid
WHERE ss.graduationyear IN (SELECT Items FROM FN_Split(#gradyears, ',')) AND sp.optin = 'Yes' AND g.groupname = #groupname
GROUP BY cty.county, d.IRN, d.district
ORDER BY LocationName ASC
If you just want the count, then perhaps count(distinct) will solve the problem:
select count(distinct at.someid)
I don't see what at.someid refers to, so perhaps:
select count(distinct cc.monsterid)
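To see why a school listed in two counties inflates a plain COUNT while COUNT(DISTINCT ...) does not, here is a small sketch on SQLite through Python's sqlite3 module. The table names and rows are invented for illustration and are much simpler than the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, school_id INTEGER);
CREATE TABLE county_schools (school_id INTEGER, county_id INTEGER);
INSERT INTO students VALUES (1, 10), (2, 10), (3, 20);
-- School 10 is listed in two counties, school 20 in one.
INSERT INTO county_schools VALUES (10, 100), (10, 200), (20, 100);
""")

# Students 1 and 2 each match two county rows, student 3 matches one,
# so the join yields 5 rows for 3 distinct students.
plain_count, distinct_count = conn.execute("""
SELECT COUNT(s.id), COUNT(DISTINCT s.id)
FROM students s
JOIN county_schools cs ON cs.school_id = s.school_id
""").fetchone()

print(plain_count, distinct_count)
```

Once the query is grouped by county, a student in a two-county school still appears in both groups, so the per-county counts can legitimately add up to more than the distinct total.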

Sql join 1 instance

I require some help with my very shaky sql skills.
Say I have the following select statement:
SELECT DISTINCT
p.ProjectId,
p.Title,
i.Name,
p.StartDate,
p.EndDate,
ped.ProjectEthicsDocumentId,
st.Description AS StatusText
FROM
dbo.Project p
inner join dbo.WorkflowHistory w ON p.ProjectId = w.ProjectId
left join dbo.ProjectInstitution pi ON pi.ProjectId = p.ProjectId
left join dbo.Institution i ON i.InstitutionId = pi.InstitutionId
left join dbo.ProjectEthicsDocument ped on p.ProjectId = ped.ProjectId
left join dbo.Status st ON p.StatusId = st.StatusId
This will return all the projects and other relevant details from the relevant tables. Now, say I have 2 institutions for 'Project A'. This statement will return 2 rows for 'Project A', one for each institution. How do I set it so that it only returns the first row of each project it finds? I want one instance of every project with say the first institution found.
The easiest way is probably with the row_number() function:
select *
from (SELECT DISTINCT p.ProjectId, p.Title, i.Name, p.StartDate,p.EndDate,
ped.ProjectEthicsDocumentId, st.Description AS StatusText,
row_number() over (partition by p.ProjectId order by i.InstitutionId) as seqnum
FROM dbo.Project p
inner join dbo.WorkflowHistory w ON p.ProjectId = w.ProjectId
left join dbo.ProjectInstitution pi ON pi.ProjectId = p.ProjectId
left join dbo.Institution i ON i.InstitutionId = pi.InstitutionId
left join dbo.ProjectEthicsDocument ped on p.ProjectId = ped.ProjectId
left join dbo.Status st ON p.StatusId = st.StatusId
) p
where seqnum = 1;
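The numbering idea can be tried on a toy data set. The sketch below runs on SQLite through Python's sqlite3 module and assumes SQLite 3.25 or newer, which is when window functions were added. The rows are invented and only three of the tables from the question are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ProjectId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Institution (InstitutionId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ProjectInstitution (ProjectId INTEGER, InstitutionId INTEGER);
INSERT INTO Project VALUES (1, 'Project A'), (2, 'Project B'), (3, 'Project C');
INSERT INTO Institution VALUES (10, 'Alpha'), (20, 'Beta');
-- Project A has two institutions, Project B one, Project C none.
INSERT INTO ProjectInstitution VALUES (1, 20), (1, 10), (2, 20);
""")

# Number the rows within each project, then keep only the first one.
# "First" here means lowest InstitutionId; choose the ORDER BY that
# matches whatever "first" should mean for the data.
one_row_per_project = conn.execute("""
SELECT ProjectId, Title, Name
FROM (SELECT p.ProjectId, p.Title, i.Name,
             ROW_NUMBER() OVER (PARTITION BY p.ProjectId
                                ORDER BY i.InstitutionId) AS seqnum
      FROM Project p
      LEFT JOIN ProjectInstitution pi ON pi.ProjectId = p.ProjectId
      LEFT JOIN Institution i ON i.InstitutionId = pi.InstitutionId) t
WHERE seqnum = 1
ORDER BY ProjectId
""").fetchall()

print(one_row_per_project)
```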
You can move the institution name lookup into a subquery. This way it doesn't affect how the other tables are joined.
SELECT DISTINCT
p.ProjectId,
p.Title,
(SELECT TOP 1 i.Name FROM dbo.Institution i
INNER JOIN dbo.ProjectInstitution pi ON i.InstitutionId = pi.InstitutionId
WHERE pi.ProjectId = p.ProjectId) AS Name,
p.StartDate,
p.EndDate,
ped.ProjectEthicsDocumentId,
st.Description AS StatusText
FROM
dbo.Project p
inner join dbo.WorkflowHistory w ON p.ProjectId = w.ProjectId
left join dbo.ProjectEthicsDocument ped on p.ProjectId = ped.ProjectId
left join dbo.Status st ON p.StatusId = st.StatusId
You could use a CTE:
;with cte as
(
<your select list>,
Row_number() over(partition by <column that has 2 records> order by ProjectId) as rn
<rest of your select statement (FROM and joins)>
)
--then do this
select * from cte where rn=1