CTE Missing Records - sql

The first query listed below returns some logistical data associated with hires that have been made within a particular period of time. The query returns 478 records.
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
FROM candidate_jobs c
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC
However, when I attempted to add a CTE (see below) that displays each hire's final "post hire check in score", the query only returns 236 records. Ideally, I'd like the query to either return a score or null value for each of the initial 478 hire records.
WITH final_post_hire_score (candidate_id, final_score) AS
(SELECT c.candidate_id
,p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id
,o.name
,j.name AS job_title
,c.applied_from
,job_id AS job_id
,cjs.score AS smart_rank_score
,cjs.is_completed AS smartrank_completion_status
,c.hired_at
,final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c ON c.candidate_id = f.candidate_id
LEFT JOIN organizations o ON o.id = c.organization_id
LEFT JOIN candidate_job_surveys cjs ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j ON j.id = c.job_id
WHERE o.name LIKE ANY ('{"%Tutor Doctor%"}')
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
ORDER BY 8 DESC

Missing records are due to the filter's, Move the filter's to ON condition else your LEFT OUTER JOIN will be implicitly converted to INNER JOIN
When you are using LEFT OUTER JOIN right table filter's should be present in ON condition else the NULL values for non matching records will get filtered
WITH final_post_hire_score (candidate_id, final_score)
AS (SELECT c.candidate_id,
p.score
FROM post_hire_followup_reviews p
LEFT JOIN candidate_jobs c
ON c.id = p.candidate_job_id
WHERE p.check_in_number = 3)
SELECT c.candidate_id AS candidate_id,
o.NAME,
j.NAME AS job_title,
c.applied_from,
job_id AS job_id,
cjs.score AS smart_rank_score,
cjs.is_completed AS smartrank_completion_status,
c.hired_at,
final_score
FROM final_post_hire_score f
LEFT JOIN candidate_jobs c
ON c.candidate_id = f.candidate_id
AND c.hired_at :: date BETWEEN '2015-01-01' AND '2016-02-22'
LEFT JOIN organizations o
ON o.id = c.organization_id
AND o.NAME LIKE ANY ( '{"%Tutor Doctor%"}' )
LEFT JOIN candidate_job_surveys cjs
ON cjs.candidate_job_id = c.id
LEFT JOIN jobs j
ON j.id = c.job_id
ORDER BY 8 DESC

I think there's an extra
WHERE p.check_in_number = 3
that isn't anywhere else.

Related

Oracle SQL How to Count Column Value Occurences and Group BY during joins

I'm working on another SQL query, trying to group a collection of records while doing a count and joining tables. See below for goal, current query, and attached scripts for building and populating tables.
Show all customers who have checked more books than DVDs. Display
customer name, total book checkouts and total DVD checkouts. Sort
results by customer first name and last name.
SELECT C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME, COUNT(T.TRANSACTION_ID)
FROM customer C
INNER JOIN library_card LC ON C.CUSTOMER_ID = LC.CUSTOMER_ID
INNER JOIN transaction T ON LC.LIBRARY_CARD_ID = T.LIBRARY_CARD_ID
INNER JOIN physical_item P ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
INNER JOIN catalog_item CT ON P.CATALOG_ITEM_ID = CT.CATALOG_ITEM_ID
GROUP BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME
ORDER BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME;
Run first: https://drive.google.com/open?id=1PYAZV4KIfZtxP4eQn35zsczySsxDM7ls
Run second: https://drive.google.com/open?id=1pAzWmJqvD3o3n6YJqVUM6TtxDafKGd3f
EDIT
With some help from Mr. Barbaros I've come up with the below query, which is closer. However, this query isn't returning any results for DVDs, which leads me to believe it's a join issue.
SELECT C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME, COUNT(CT1.TYPE) AS BOOK_COUNT, COUNT(CT2.TYPE) AS DVD_COUNT
FROM customer C
INNER JOIN library_card LC ON C.CUSTOMER_ID = LC.CUSTOMER_ID
INNER JOIN transaction T ON LC.LIBRARY_CARD_ID = T.LIBRARY_CARD_ID
INNER JOIN physical_item P ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
INNER JOIN catalog_item CT1 ON P.CATALOG_ITEM_ID = CT1.CATALOG_ITEM_ID AND CT1.TYPE = 'BOOK'
LEFT OUTER JOIN catalog_item CT2 ON P.CATALOG_ITEM_ID = CT2.CATALOG_ITEM_ID AND CT2.TYPE = 'DVD'
GROUP BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME, CT1.TYPE, CT2.TYPE
ORDER BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME;
Use "conditional aggregates" (use a case expression inside the aggregate function)
SELECT
C.CUSTOMER_FIRSTNAME
, C.CUSTOMER_LASTNAME
, COUNT( CASE WHEN CT.TYPE = 'BOOK' THEN T.TRANSACTION_ID END ) books
, COUNT( CASE WHEN CT.TYPE = 'DVD' THEN T.TRANSACTION_ID END ) dvds
FROM customer C
INNER JOIN library_card LC ON C.CUSTOMER_ID = LC.CUSTOMER_ID
INNER JOIN transaction T ON LC.LIBRARY_CARD_ID = T.LIBRARY_CARD_ID
INNER JOIN physical_item P ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
INNER JOIN catalog_item CT ON P.CATALOG_ITEM_ID = CT.CATALOG_ITEM_ID
GROUP BY
C.CUSTOMER_FIRSTNAME
, C.CUSTOMER_LASTNAME
HAVING
COUNT( CASE WHEN CT.TYPE = 'BOOK' THEN T.TRANSACTION_ID END )
> COUNT( CASE WHEN CT.TYPE = 'DVD' THEN T.TRANSACTION_ID END )
ORDER BY
C.CUSTOMER_FIRSTNAME
, C.CUSTOMER_LASTNAME
;
You can use catalog_item table twice( think of as seperate tables for books and dvds ), and compare by HAVING clause as :
SELECT C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME,
COUNT(CT1.CATALOG_ITEM_ID) as "Book Checkout",
COUNT(CT2.CATALOG_ITEM_ID) as "DVD Checkout"
FROM customer C
INNER JOIN library_card LC ON C.CUSTOMER_ID = LC.CUSTOMER_ID
INNER JOIN transaction T ON LC.LIBRARY_CARD_ID = T.LIBRARY_CARD_ID
INNER JOIN physical_item P ON T.PHYSICAL_ITEM_ID = P.PHYSICAL_ITEM_ID
LEFT JOIN catalog_item CT1 ON P.CATALOG_ITEM_ID = CT1.CATALOG_ITEM_ID AND CT1.TYPE = 'BOOK'
LEFT JOIN catalog_item CT2 ON P.CATALOG_ITEM_ID = CT2.CATALOG_ITEM_ID AND CT1.TYPE = 'DVD'
GROUP BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME
HAVING COUNT(CT1.CATALOG_ITEM_ID) > COUNT(CT2.CATALOG_ITEM_ID)
ORDER BY C.CUSTOMER_FIRSTNAME, C.CUSTOMER_LASTNAME;
CUSTOMER_FIRSTNAME CUSTOMER_LASTNAME Book Checkout DVD Checkout
------------------ ----------------- ------------- -------------
Deena Pilgrim 3 1
Emile Cross 5 2
Please try to remove ,CT1.TYPE, CT2.TYPE on your group by clause.

SQL - Multiple join left with sum doesn't give expected result

Here is my request
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
SUM(ws.worked_time) as hours_on_job,
SUM(iv.total) as total_price,
SUM(iv.hour_expected) as hours_planned,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID
LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
the Problem is that result for SUM is wrong when I LEFT JOIN other table.
The row 53 for example give 105 for nb_followup instead of 1
Where this request return the right result simply by removing the last 2 LEFT JOIN : LEFT JOIN ".Worksheet::$TABLE_NAME." ws ON ws.job_id = j.ID and
LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID
SELECT j.* ,
c.name as client_name ,
s.name as supplier_name,
s.ID as supplier_id ,
mt.* ,
SUM(pb.require_followup) as nb_followup,
j.ID as ID
FROM $wpdb->posts j
LEFT JOIN ".Job::$META_TABLE." mt ON mt.post_id = j.ID
LEFT JOIN ".Job::$LINK_TABLE_JOB_CONTACT." l1 ON l1.job_id = j.ID
LEFT JOIN ".Contact::$TABLE_NAME." c ON c.ID = l1.contact_id
LEFT JOIN ".Supplier::$TABLE_NAME." s ON s.ID = c.supplier_id
LEFT JOIN ".Problem::$TABLE_NAME." pb ON pb.job_id = j.ID
WHERE j.post_status = 'publish'
AND j.post_type = 'job'
".implode(' ',$where_condition)."
GROUP BY j.ID
ORDER BY j.post_date DESC
Also removing only LEFT JOIN ".Invoice::$TABLE_NAME." iv ON iv.job_id = j.ID will give 15 as result for the row 53
To resume
Full request give 105 -> wrong should be 1
removing the last join give 15 -> wrong should be 1
removing the last 2 join give 1 -> Correct
You need to calculate the SUM()s BEFORE you join, otherwise the rows multiply because of the joins and this in turn leads to errors in summation. e.g.
SELECT
j.ID as ID
, pb.nb_followup
FROM $wpdb->posts j
LEFT JOIN (select pb.job_id, SUM(pb.require_followup) as nb_followup from ".Problem::$TABLE_NAME." pb GROUP BY pb.job_id) pb ON pb.job_id = j.ID
The other problem you are facing is that MySQL permits "lazy syntax" for group by. Don't use this lazy syntax or you will get unexpected error/bugs. It is very simple to avoid, REPEAT every column of the select clause in the group by clause UNLESS the column is using an aggregate function such as SUM(), COUNT(), MIN(), MAX() and so on.e.g.
select a.col1, b.col2, c.col3 , sum(d.col4)
from a
inner join b on a.id = b.aid
inner join c on b.id = c.bid
inner join d on c.id = d.cid
group by a.col1, b.col2, c.col3

Optimizing a T-SQL query with COUNT in the SELECT and HAVING statements

I am not sure if this is to do with the fact that I am dealing with very large tables (some have 900+ million rows) but I am having trouble optimizing my query. I have also checked and used indexed fields wherever possible , with most of the fields being used on my query actually being indexed.Using a select top 100 statement takes roughly 10 minutes and i would like to get all of the results back, much more faster. How would I go about for optimizing this query and future queries like it? For security purposes I had to use alternative aliases below:
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Try this
SELECT TOP 100
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock, COUNT (*) as Numbers
FROM
j WITH(NOLOCK)
INNER JOIN
jp WITH(NOLOCK) ON j.ID = jp.ID
and j.ExpiryDate > GETDATE()
INNER JOIN
jd WITH(NOLOCK) ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa WITH(NOLOCK) ON fa.ID = j.ID
INNER JOIN
l WITH(NOLOCK) ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c WITH(NOLOCK) ON c.CID = fa.CID
INNER JOIN
x WITH(NOLOCK) ON c.CID = x.CID
GROUP BY
x.ID, j.ID, j.FirstDate, j.ExpiryDate, x.Lock
HAVING
COUNT(*) <= 10
Sometimes it helps to reduce the data set ina derived table and then apply the function only on the data that meets the where condition. Without seeing axecution plans for both, I don;t know itf this will work, but iti si worth a tr.
SELECT a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock, COUNT (*) as Numbers
FROM (
SELECT
x.ID as XID, j.ID as JID, j.FirstDate, j.ExpiryDate, x.Lock
FROM
j
INNER JOIN
jp ON j.ID = jp.ID
INNER JOIN
jd ON (jd.ID = jp.ID And jd.path = 3)
INNER JOIN
fa ON fa.ID = j.ID
INNER JOIN
l ON j.ID = l.ID AND l.CoID = 3
INNER JOIN
c ON c.CID = fa.CID
INNER JOIN
x ON c.CID = x.CID
WHERE
j.ExpiryDate > GETDATE()) a
GROUP BY
a.XID, a.JID, a.FirstDate, a.ExpiryDate, a.Lock
HAVING
COUNT(*) <= 10

Where clause in left join using correlated query?

Here's the query:
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j
ON c.ID = j.CompanyID
JOIN JobApplications ja
ON j.ID = ja.JobID
LEFT JOIN JobContact jsc
ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js
ON jsc.JobSourceCompanyID = js.ID
LEFT JOIN (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
--WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ON j.ID = jsh.JobID
ORDER BY ja.ApplicationDate
I'm trying to get the most recent job status for a particular job. I can't figure out how to do the where clause (the commented WHERE) in the LEFT JOIN. I've done this in the past, but can't remember how I did this in the past.
I will be grateful for any pointers.
You need to use OUTER APPLY. A CROSS Apply is like an INNER JOIN where the applied table must return results, whereas an OUTER Apply is like a [LEFT] OUTER JOIN where the applied subquery may return no results.
SELECT c.Name As CompanyName, j.ID as JobID, j.Title as JobTitle,
ja.ApplicationDate, DATEDIFF(MONTH,ja.ApplicationDate, GETDATE()) AS MonthsAgo,
jsc.Name As Recruiter, js.Name As RecruitingAgency, jsh.Name As LastStatus
FROM Companies c
JOIN Jobs j ON c.ID = j.CompanyID
JOIN JobApplications ja ON j.ID = ja.JobID
LEFT JOIN JobContact jsc ON jsc.ID = j.JobSourceContactID
LEFT JOIN JobContactCompany js ON jsc.JobSourceCompanyID = js.ID
OUTER APPLY (
SELECT TOP 1 jh.JobID, jh.StatusDate, jt.Name
FROM JobStatusHistory jh
JOIN JobStatusTypes jt
ON jh.JobStatusTypeID = jt.ID
WHERE jh.JobID = j.ID
ORDER BY jh.StatusDate DESC
) jsh
ORDER BY ja.ApplicationDate

SQL use nested select in middle of inner join

Is it possible to use a select in the middle of joining...
I am trying to do the following:
FROM
tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN ((SELECT
,Min(n.date) as [MinDate]
from tblNotes n
Where n.test = 'test') te
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp)
INNER Join tblClient c ON o.ClientId = c.Id
Basically in the select in the middle of the query it is selecting only the notes with min date. The problem is I need to do this here because I need from tblOrders to be the first table.......
Suggestions?
The INNER JOIN failed because you have a leading comma here:
,Min(n.date) as [MinDate]
I think you are looking for something like this:
SELECT ...
FROM tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN (
SELECT id, Min(date) as [MinDate]
from tblNotes
Where test = 'test'
group by id
) te <-- not sure what JOIN clause to use here, please post schema
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp
INNER Join tblClient c ON o.ClientId = c.Id
You are missing an alias and join condition:
FROM
tblorders o
INNER JOIN tblunits u on o.id = u.orderid
INNER JOIN ((SELECT Min(n.date) as [MinDate]
from tblNotes n
Where n.test = 'test') te
INNER JOIN tblnotes n on te.id = n.id
and te.[MinDate] = n.AuditinsertTimestamp)
-- missing
AS z
ON <join conditions haere>
INNER Join tblClient c ON o.ClientId = c.Id
Yes, you can have a Select in a Join.