SQL join subquery where condition

How can I effectively subquery a LEFT OUTER JOIN so that only rows that meet a specific condition in the join are included?
I'd like to count only the PPPD rows where converted_at IS NULL. However, when I add PPPD.converted_at IS NULL, the result is more limited than I'd like, because it only includes patient_profiles that do have a row with NULL in converted_at. Instead, I'd like a count of all PPPD records where converted_at IS NULL.
SELECT P.id, P.gender, P.dob,
count(distinct recommendations.id) AS recommendation_count,
count(distinct PPPD.id) AS community_submissions
FROM patient_profiles AS P
LEFT OUTER JOIN recommendations ON recommendations.patient_profile_id = P.id
LEFT OUTER JOIN patient_profile_potential_doctors AS PPPD ON PPPD.patient_profile_id = P.id
WHERE P.is_test = FALSE
GROUP BY P.id

You need to add the condition in the ON clause:
SELECT P.id, P.gender, P.dob,
count(distinct r.id) AS recommendation_count,
count(distinct PPPD.id) AS community_submissions
FROM patient_profiles P LEFT OUTER JOIN
recommendations r
ON r.patient_profile_id = P.id LEFT OUTER JOIN
patient_profile_potential_doctors PPPD
ON PPPD.patient_profile_id = P.id AND PPPD.converted_at IS NULL
WHERE P.is_test = FALSE
GROUP BY P.id;
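To see the difference between the two placements, here is a minimal sketch using Python's built-in sqlite3 module. The three patients and their rows are made up for illustration, the tables are cut down to the columns the query needs, and is_test is stored as 0/1 because SQLite has no boolean type. The same query is run twice, once with the filter in the ON clause and once with it in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_profiles (id INTEGER PRIMARY KEY, is_test INTEGER);
CREATE TABLE recommendations (id INTEGER PRIMARY KEY, patient_profile_id INTEGER);
CREATE TABLE patient_profile_potential_doctors (
    id INTEGER PRIMARY KEY, patient_profile_id INTEGER, converted_at TEXT);
INSERT INTO patient_profiles VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO recommendations VALUES (1, 1), (2, 1), (3, 3);
-- patient 1: one open and one converted submission
-- patient 2: no submissions; patient 3: only a converted submission
INSERT INTO patient_profile_potential_doctors VALUES
    (1, 1, NULL), (2, 1, '2017-01-01'), (3, 3, '2017-02-01');
""")

# Hypothetical template: the two placeholders let us move the same
# filter between the ON clause and the WHERE clause.
QUERY = """
SELECT P.id,
       count(DISTINCT r.id)    AS recommendation_count,
       count(DISTINCT PPPD.id) AS community_submissions
FROM patient_profiles P
LEFT OUTER JOIN recommendations r ON r.patient_profile_id = P.id
LEFT OUTER JOIN patient_profile_potential_doctors PPPD
       ON PPPD.patient_profile_id = P.id {on_extra}
WHERE P.is_test = 0 {where_extra}
GROUP BY P.id
ORDER BY P.id
"""

filter_in_on = conn.execute(
    QUERY.format(on_extra="AND PPPD.converted_at IS NULL", where_extra="")
).fetchall()
filter_in_where = conn.execute(
    QUERY.format(on_extra="", where_extra="AND PPPD.converted_at IS NULL")
).fetchall()

print(filter_in_on)
print(filter_in_where)
```

With this data, the ON version returns all three patients, and patient 3 gets a submission count of 0. The WHERE version loses patient 3 entirely, because its only joined row has a non-NULL converted_at and is filtered out after the join.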

Related

How to replace exist in Hive with two correlated subqueries

I have a query that looks like this
SELECT u.id, COUNT(*)
FROM users u, posts p
WHERE u.id = p.owneruserid
AND EXISTS (SELECT COUNT(*) as num
FROM postlinks pl
WHERE pl.postid = p.id
GROUP BY pl.id
HAVING num > 1) --correlated subquery 1
AND EXISTS (SELECT *
FROM comments c
WHERE c.postid = p.id) --correlated subquery 2
GROUP BY u.id;
I researched and read that IN and EXISTS are not supported in Hive, and that a workaround is to use a LEFT JOIN. I have tried this, but I am having trouble with the GROUP BY u.id. I read that it always needs to be paired with an aggregate function like COUNT(), but I'm not sure how to rewrite this query to get it to work. All the other examples I have seen online do not seem to be as complicated as this one.
Like you said, you can convert them to left joins, or even inner joins, since both subqueries only test for existence. Simply convert your subqueries to inline views and join them to the original tables.
SELECT u.id, COUNT(*)
FROM users u
inner join posts p on u.id = p.owneruserid
left outer join (SELECT COUNT(*) as num, pl.postid postid
FROM postlinks pl
GROUP BY pl.postid
HAVING num > 1) pl ON pl.postid = p.id --correlated subquery 1 with left join
left outer join (SELECT postid FROM comments c GROUP BY postid) c ON c.postid = p.id --correlated subquery 2 with left join
WHERE (c.postid is not null AND pl.postid is not null) -- this ensures data exists in both subqueries
GROUP BY u.id
A left join can introduce duplicate rows when a post has several matching rows in the joined table. Grouping by postid inside the second subquery, as above, avoids that.
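The rewrite can be checked on a small data set. The sketch below uses Python's sqlite3 module rather than Hive, purely because SQLite supports EXISTS and so lets both forms run side by side; the tables and rows are invented for illustration. It compares an EXISTS version, the left-join version with deduplicated inline views, and a naive version that joins comments directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER);
CREATE TABLE postlinks (id INTEGER PRIMARY KEY, postid INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, postid INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO posts VALUES (10, 1), (11, 1), (20, 2);
-- post 10 has two links, post 11 has one, post 20 has two
INSERT INTO postlinks VALUES (1, 10), (2, 10), (3, 11), (4, 20), (5, 20);
-- post 10 has three comments, post 11 has one, post 20 has none
INSERT INTO comments VALUES (1, 10), (2, 10), (3, 10), (4, 11);
""")

# Reference result: posts with more than one link and at least one comment.
exists_rows = conn.execute("""
SELECT u.id, COUNT(*)
FROM users u
INNER JOIN posts p ON u.id = p.owneruserid
WHERE EXISTS (SELECT 1 FROM postlinks pl
              WHERE pl.postid = p.id
              GROUP BY pl.postid
              HAVING COUNT(*) > 1)
  AND EXISTS (SELECT 1 FROM comments c WHERE c.postid = p.id)
GROUP BY u.id
""").fetchall()

# Join rewrite: each inline view returns at most one row per post.
join_rows = conn.execute("""
SELECT u.id, COUNT(*)
FROM users u
INNER JOIN posts p ON u.id = p.owneruserid
LEFT OUTER JOIN (SELECT postid FROM postlinks
                 GROUP BY postid HAVING COUNT(*) > 1) pl ON pl.postid = p.id
LEFT OUTER JOIN (SELECT postid FROM comments GROUP BY postid) c
                ON c.postid = p.id
WHERE c.postid IS NOT NULL AND pl.postid IS NOT NULL
GROUP BY u.id
""").fetchall()

# Naive rewrite: joining comments directly multiplies rows per comment.
naive_rows = conn.execute("""
SELECT u.id, COUNT(*)
FROM users u
INNER JOIN posts p ON u.id = p.owneruserid
LEFT OUTER JOIN (SELECT postid FROM postlinks
                 GROUP BY postid HAVING COUNT(*) > 1) pl ON pl.postid = p.id
LEFT OUTER JOIN comments c ON c.postid = p.id
WHERE c.postid IS NOT NULL AND pl.postid IS NOT NULL
GROUP BY u.id
""").fetchall()

print(exists_rows, join_rows, naive_rows)
```

Only post 10 qualifies, so the EXISTS and deduplicated join versions both count one post for user 1. The naive version counts three, one per comment on post 10, which is the duplication the GROUP BY in the inline view prevents.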

How to add condition on the left table to include “zero” / “0” results in COUNT aggregate?

I have an SQL-select:
SELECT
p.id,
COUNT(a.id)
FROM Person p
LEFT JOIN Account a
ON a.person_id = p.id
WHERE p.id = 1
GROUP BY p.id;
and it works fine. But if I add a condition on the left table, the query returns no rows instead of a zero count:
SELECT
p.id,
COUNT(a.id)
FROM Person p
LEFT JOIN Account a
ON a.person_id = p.id
WHERE p.id = 1 AND a.state = '0'
GROUP BY p.id;
How can I add the condition on the left table so that the query returns a 0 count when there are no results?
In a LEFT JOIN, conditions on the second table need to be in the ON clause:
SELECT p.id, COUNT(a.id)
FROM Person p LEFT JOIN
Account a
ON a.person_id = p.id AND a.state = '0'
WHERE p.id = 1
GROUP BY p.id;
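The rule is pretty simple to follow. A LEFT JOIN keeps all rows in the first table, even when there is no match in the second table. For those unmatched rows, the columns from the second table are NULL. A NULL value fails the condition a.state = '0' (the comparison evaluates to NULL, not true), so a WHERE clause that tests it removes exactly the rows the LEFT JOIN was meant to keep. A condition in the ON clause is applied while matching, before the unmatched rows are filled with NULLs, so the row from the first table survives.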
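The effect is easy to observe by looking at the joined rows before aggregation. This is a small sketch with Python's sqlite3 module and a single made-up person whose only account is not in state '0'; the table layout is reduced to the columns used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY);
CREATE TABLE Account (id INTEGER PRIMARY KEY, person_id INTEGER, state TEXT);
INSERT INTO Person VALUES (1);
-- person 1 has one account, but not in state '0'
INSERT INTO Account VALUES (100, 1, '1');
""")

# Comparing NULL with anything yields NULL, which WHERE treats as "not true".
null_comparison = conn.execute("SELECT NULL = '0'").fetchone()

# Rows produced by the join itself, before WHERE or GROUP BY.
joined = conn.execute("""
SELECT p.id, a.id, a.state
FROM Person p LEFT JOIN Account a
  ON a.person_id = p.id AND a.state = '0'
""").fetchall()

# Condition in ON: the person row is kept and COUNT(a.id) skips the NULL.
count_on = conn.execute("""
SELECT p.id, COUNT(a.id)
FROM Person p LEFT JOIN Account a
  ON a.person_id = p.id AND a.state = '0'
WHERE p.id = 1
GROUP BY p.id
""").fetchall()

# Condition in WHERE: the only joined row is filtered out, so no group exists.
count_where = conn.execute("""
SELECT p.id, COUNT(a.id)
FROM Person p LEFT JOIN Account a
  ON a.person_id = p.id
WHERE p.id = 1 AND a.state = '0'
GROUP BY p.id
""").fetchall()

print(null_comparison, joined, count_on, count_where)
```

Here the join yields one row for person 1 with NULL account columns, the ON version reports a count of 0, and the WHERE version returns no rows at all.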

SQL multiple table join subquery

How can I use a subquery for just the invites join? I'd like all records from patient_profiles, with the invites join using only records created after a specific date.
SELECT p.first_name,
COUNT(invites.invited_by_id)as invite_count
FROM patient_profiles AS p
LEFT OUTER JOIN patient_profiles AS invites ON invites.invited_by_id = p.id
WHERE p.is_test = false AND invites.created_at >= '2017-10-16'::date
GROUP BY p.first_name
You don't need a subquery. Just move the date condition to the ON clause:
SELECT p.first_name,
COUNT(invites.invited_by_id)as invite_count
FROM patient_profiles p LEFT OUTER JOIN
patient_profiles invites
ON invites.invited_by_id = p.id AND invites.created_at >= '2017-10-16'::date
WHERE p.is_test = false
GROUP BY p.first_name;
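You can also leave the join unfiltered and count conditionally. HAVING on its own does not work here, because it filters whole groups after aggregation rather than the rows being counted:
SELECT p.first_name,
COUNT(CASE WHEN invites.created_at >= '2017-10-16'::date THEN invites.invited_by_id END) AS invite_count
FROM patient_profiles p
LEFT OUTER JOIN patient_profiles invites ON invites.invited_by_id = p.id
WHERE p.is_test = false
GROUP BY p.first_name;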

Selecting single column multiple times based on different conditions

I have written a SQL query to retrieve required data and it looks like given below:
SELECT distinct p.person_id,p.birth_date,p.gender_code,
wm_concat(distinct r.race_code) as race_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,
wm_concat(distinct c.characteristic_code) as chara_codes,
p.prev_adopted_code,p.age_adopted,
FIRST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date),
count(pe.removed_date) as removal_count,
LAST_VALUE(pe.discharge_date) OVER (ORDER BY pe.discharge_date),
LAST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date) as latest_removal_date,pe.created_date,
pe.removal_circumstance_code,wm_concat(distinct rr.removal_reason_code) as removal_reasons,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code
FROM PERSON p left outer join RACE r on p.person_id = r.person_id
left outer join CHARACTERISTIC c on c.person_id = p.person_id
left outer join PLACEMENT_EPISODE pe on p.person_id = pe.child_id
left outer join PLACEMENT_SETTING ps on p.person_id = ps.child_id
left outer join REMOVAL_REASON rr on pe.placement_episode_id = rr.placement_episode_id
GROUP BY p.person_id,p.birth_date,p.gender_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,p.prev_adopted_code,p.age_adopted,pe.removed_date,
pe.discharge_date,pe.removed_date,pe.created_date,pe.removal_circumstance_code,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code
ORDER BY p.person_id
In the above mentioned query, I have already selected birth date for a person. Now again in select clause I want to select birth_date for persons with following condition:
condition 1: p.person_id = pe.primary_caretaker_id
condition 2: p.person_id = pe.secondary_caretaker_id
Can someone tell me the way to select these fields(birth_date based on two different conditions) in the existing query?
Birth_date has been already selected once for individual person. Now I want to retrieve birth_date for primary_caretaker and secondary_caretaker.
You will need to join to the PERSON table twice more:
SELECT distinct p.person_id,p.birth_date,p.gender_code,
wm_concat(distinct r.race_code) as race_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,
wm_concat(distinct c.characteristic_code) as chara_codes,
p.prev_adopted_code,p.age_adopted,
FIRST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date),
count(pe.removed_date) as removal_count,
LAST_VALUE(pe.discharge_date) OVER (ORDER BY pe.discharge_date),
LAST_VALUE(pe.removed_date) OVER (ORDER BY pe.removed_date) as latest_removal_date,
pe.created_date,
pe.removal_circumstance_code,wm_concat(distinct rr.removal_reason_code) as removal_reasons,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code,
primCare.birth_date as primary_carer_birth_date,
secCare.birth_date as secondary_carer_birth_date
FROM PERSON p left outer join RACE r on p.person_id = r.person_id
left outer join CHARACTERISTIC c on c.person_id = p.person_id
left outer join PLACEMENT_EPISODE pe on p.person_id = pe.child_id
left outer join PERSON primCare on primCare.person_id = pe.primary_caretaker_id
left outer join PERSON secCare on secCare.person_id = pe.secondary_caretaker_id
left outer join PLACEMENT_SETTING ps on p.person_id = ps.child_id
left outer join REMOVAL_REASON rr on pe.placement_episode_id = rr.placement_episode_id
GROUP BY p.person_id,p.birth_date,p.gender_code,p.hispanic_latino_code,
c.clinically_diagnosed_code,p.prev_adopted_code,p.age_adopted,pe.removed_date,
pe.discharge_date,pe.removed_date,pe.created_date,pe.removal_circumstance_code,
ps.placement_type_code,ps.icpc_placement_flag,pe.caretaker_structure_code, primCare.birth_date, secCare.birth_date
ORDER BY p.person_id
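Stripped of the other columns, the idea is just the same table joined under two extra aliases. The following is a cut-down sketch with Python's sqlite3 module; the rows are made up, only the columns needed for the caretaker lookup are kept, and an inner join to PLACEMENT_EPISODE is used so that only children appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSON (person_id INTEGER PRIMARY KEY, birth_date TEXT);
CREATE TABLE PLACEMENT_EPISODE (
    placement_episode_id INTEGER PRIMARY KEY,
    child_id INTEGER,
    primary_caretaker_id INTEGER,
    secondary_caretaker_id INTEGER);
INSERT INTO PERSON VALUES
    (1, '2010-05-01'), (2, '1980-01-15'), (3, '1978-09-30'), (4, '2012-03-03');
-- child 1 has both caretakers, child 4 has only a primary caretaker
INSERT INTO PLACEMENT_EPISODE VALUES (1, 1, 2, 3), (2, 4, 2, NULL);
""")

# PERSON appears three times: as the child (p) and once per caretaker role.
# The caretaker joins come after pe, because their ON clauses refer to it.
rows = conn.execute("""
SELECT p.person_id,
       p.birth_date,
       primCare.birth_date AS primary_carer_birth_date,
       secCare.birth_date  AS secondary_carer_birth_date
FROM PERSON p
INNER JOIN PLACEMENT_EPISODE pe ON p.person_id = pe.child_id
LEFT OUTER JOIN PERSON primCare ON primCare.person_id = pe.primary_caretaker_id
LEFT OUTER JOIN PERSON secCare  ON secCare.person_id  = pe.secondary_caretaker_id
ORDER BY p.person_id
""").fetchall()

for row in rows:
    print(row)
```

Each alias behaves like an independent copy of PERSON, so a missing secondary caretaker simply shows up as NULL in that column without affecting the other two birth dates.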

AVG of AVG, aggregate functions of subquery

This subquery produces the correct table. But now I want to get the average of the averages, and I'm getting an error "Missing FROM-clause entry for table "c"".
SELECT
c.name,
AVG(avgvalue)
FROM
(
SELECT
c.name,
p.name,
AVG(a."value") AS avgvalue
FROM answers a INNER JOIN survey_responses sr ON sr.id = a.survey_response_id AND a.question_id = 13
INNER JOIN answers category_answer ON category_answer.survey_response_id = sr.id AND category_answer.question_id = 264
INNER JOIN answers_categories ac ON category_answer.id = ac.answer_id
INNER JOIN categories c ON c.id = ac.category_id
INNER JOIN products p ON p.id = a.product_id
WHERE c.name IN ('Accounting')
GROUP BY c.name, p."name"
HAVING count(p.name)>10
) as ProductAverages
GROUP BY c.name;
You are naming the derived table ProductAverages, so the outer query has to reference that alias and its columns, not c, which is visible only inside the inner query. The inner query also returns two columns called name, so give them distinct aliases to keep the outer reference unambiguous:
SELECT
category_name, -- Here
AVG(avgvalue)
FROM
(
SELECT
c.name AS category_name,
p.name AS product_name,
AVG(a."value") AS avgvalue
FROM answers a INNER JOIN survey_responses sr ON sr.id = a.survey_response_id AND a.question_id = 13
INNER JOIN answers category_answer ON category_answer.survey_response_id = sr.id AND category_answer.question_id = 264
INNER JOIN answers_categories ac ON category_answer.id = ac.answer_id
INNER JOIN categories c ON c.id = ac.category_id
INNER JOIN products p ON p.id = a.product_id
WHERE c.name IN ('Accounting')
GROUP BY c.name, p."name"
HAVING count(p.name)>10
) as ProductAverages
GROUP BY category_name; -- and here
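The scoping rule can be reproduced on a toy table. The sketch below uses Python's sqlite3 module with a single hypothetical ratings table standing in for the joined survey tables; the column aliases category_name, product_name and the alias pa are illustrative choices, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (category TEXT, product TEXT, value REAL);
-- product A averages 3.0, product B averages 5.0
INSERT INTO ratings VALUES
    ('Accounting', 'A', 2.0), ('Accounting', 'A', 4.0),
    ('Accounting', 'B', 5.0);
""")

# The outer query sees only the derived table alias (pa) and its columns.
rows = conn.execute("""
SELECT pa.category_name, AVG(pa.avgvalue)
FROM (
    SELECT category AS category_name,
           product  AS product_name,
           AVG(value) AS avgvalue
    FROM ratings
    GROUP BY category, product
) AS pa
GROUP BY pa.category_name
""").fetchall()

# For contrast: the plain average over all rows weights product A twice.
overall_avg = conn.execute("SELECT AVG(value) FROM ratings").fetchone()[0]

# An alias defined inside the derived table (r) is not visible outside it.
try:
    conn.execute("SELECT r.category FROM (SELECT category FROM ratings r) AS pa")
    inner_alias_visible = True
except sqlite3.OperationalError:
    inner_alias_visible = False

print(rows, overall_avg, inner_alias_visible)
```

The average of the per-product averages is 4.0, while the plain average over all three rows is about 3.67, so the two-level query really does compute something different. The last statement fails in SQLite with a "no such column" style error, which is the same scoping problem PostgreSQL reports as a missing FROM-clause entry.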