SQL LEFT JOIN combined with regular joins

SQL LEFT JOIN combined with regular joins - sql

I have the following query that joins a bunch of tables.
I'd like to get every record from the INDUSTRY table that has consolidated_industry_id = 1 regardless of whether or not it matches the other tables. I believe this needs to be done with a LEFT JOIN?
SELECT attr.industry_id AS option_id,
attr.industry AS option_name,
uj.ft_job_industry_id,
Avg(CASE
WHEN s.salary > 0 THEN s.salary
END) AS average,
Count(CASE
WHEN s.salary > 0 THEN attr.industry
END) AS count_non_zero,
Count(attr.industry_id) AS count_total
FROM industry attr,
user_job_ft_job uj,
salary_ft_job s,
user_job_ft_job ut,
[user] u,
user_education_mba_school mba
WHERE u.user_id = uj.user_id
AND u.user_id = ut.user_id
AND u.user_id = mba.user_id
AND uj.ft_job_industry_id = attr.industry_id
AND uj.user_job_ft_job_id = s.user_job_id
AND u.include_in_student_site_results = 1
AND u.site_instance_id IN ( 1 )
AND uj.job_type_id = 1
AND attr.consolidated_industry_id = 1
AND mba.mba_graduation_year_id NOT IN ( 8, 9 )
AND uj.admin_approved = 1
GROUP BY attr.industry_id,
attr.industry,
uj.ft_job_industry_id
This returns only one row, but there are 8 matches in the industry table where consolidated_industry_id = 1.
--- EDIT: The real question here is, how do I combine the LEFT JOIN with the regular joins?

Use left join for tables that may miss a corresponding record. Put the conditions for each table in the on clause of the join, not in the where, as that would in effect make them inner joins anyway. Something like:
select
attr.industry_id AS option_id, attr.industry AS option_name,
uj.ft_job_industry_id, AVG(CASE WHEN s.salary > 0 THEN s.salary END) AS average,
COUNT(CASE WHEN s.salary > 0 THEN attr.industry END) as count_non_zero,
COUNT(attr.industry_id) as count_total
from
industry attr
left join user_job_ft_job uj on uj.ft_job_industry_id = attr.industry_id and uj.job_type_id = 1 and uj.admin_approved = 1
left join salary_ft_job s on uj.user_job_ft_job_id = s.user_job_id
left join [user] u on u.user_id = uj.user_id and u.include_in_student_site_results = 1 and u.site_instance_id IN (1)
left join user_job_ft_job ut on u.user_id = ut.user_id
left join user_education_mba_school mba on u.user_id = mba.user_id and mba.mba_graduation_year_id not in (8, 9)
where
attr.consolidated_industry_id = 1
group by
attr.industry_id, attr.industry, uj.ft_job_industry_id
If you have any tables that you know always have a corresponding record, just use innser join for that.

Related

SQL code in Access says error in FROM clause

SELECT
p.Name,
p.Age,
MAX(COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)))
FROM Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age;

I can suggest the following query:
SELECT TOP 1
p.Name,
p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID))
FROM (Players AS p
INNER JOIN Teams AS t
ON t.ID = p.Team_ID)
INNER JOIN Matches AS m
ON m.Team_ID = t.ID
GROUP BY
p.Name,
p.Age
ORDER BY
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)) DESC;
This fixes the syntax problem with your joins. It also interprets the MAX as meaning that you want the record with the maximum ratio of counts. In this case, we can use TOP 1 along with ORDER BY to identify this max record.

MS Access requires strange parentheses when you have more than one join. In addition, MAX(COUNT(m.winTeam_ID)) doesn't make sense. I don't know what you are trying to calculate in the SELECT. Perhaps this does what you want:
SELECT p.Name, p.Age,
COUNT(m.winTeam_ID) / (COUNT(m.winTeam_ID) + COUNT(m.lossTeam_ID)))
FROM (Players AS p INNER JOIN
Teams AS t
ON t.ID = p.Team_ID
) INNER JOIN
Matches AS m
ON m.Team_ID = t.ID
GROUP BY p.Name, p.Age;

I think your Matches table shouldn't have a Team_ID And instead you have winTeam_ID and lossTeam_ID!
And also you want to query players of a team with - something like - the best win-rate.
If so, Use a query like this - tested on SQL Server only -:
select
p.Age, p.Name, ts.rate
from
Players p
join
(select top(1) -- sub-query will return just first record
t.ID
, sum(case when (t.ID = winTeam_ID) then 1 else 0 end) as wins
, sum(case when (t.ID = lossTeam_ID) then 1 else 0 end) as losses
, sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) /
(sum(case when (t.ID = winTeam_ID) then 1.0 else 0.0 end) + sum(case when (t.ID = lossTeam_ID) then 1.0 else 0.0 end)) as rate
from Teams as t
left join Matches as m
on t.ID = m.winTeam_ID
or t.ID = lossTeam_ID
group by t.ID
order by rate desc -- This will make max rate as first
) as ts -- Team stats calculated in this sub-query
on p.Team_ID = ts.ID;

SQL correct query or not

given these relationships, how could you query the following:
The tourists (name and email) that booked at least a pension whose rating is greater than 9, but didn't book any 3 star hotel with a rating less than 9.
Is the following correct?
SELECT Tourists.name, Tourists.email
FROM Tourists
WHERE EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Pension' AND
AccomodationEstablishments.rating > 9
) AND NOT EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Hotel' AND
AccomodationEstablishments.noOfStars = 3 AND
AccomodationEstablishments.rating < 9
)

I would do this using aggregation and having:
SELECT t.name, t.email
FROM Bookings b INNER JOIN
Tourists t
ON b.touristId = t.id INNER JOIN
AccomodationEstablishments ae
ON b.accEstId = ae.id INNER JOIN
AccomodationTypes a
ON ae.accType = a.id
GROUP BY t.name, t.email
HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0 AND
SUM(a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9 THEN 1 ELSE 0 END)= 0;
Your method also works, but you probably need t.id in the subqueries.

COUNT Multiple tables with relationships and users

I've got a group of tables in SQL Server that I need to count on with specific counting criteria for each table, the problem I'm having that I need to group this by users, which are contained in one table and are mapped to the tables I need to count on via a relationship table.
This is the relationship table
Relationship User Work Item
Analyst 1 IR1
Analyst 2 IR2
Analyst 2 IR3
Analyst 1 IR4
User 3 IR1
Analyst 1 SR2
Analyst 1 SR3
Analyst 2 SR4
This is the IR table (the SR table is identical)
ID Status
IR1 Active
IR2 Active
IR3 Closed
IR4 Active
This is the user table
User Name
1 Dave
2 Jim
3 Karl
What I need is a table like below counting only the active items
Name IR Count SR Count
Dave 2 2
Jim 1 1
All I seem to be able to do currently is count all of the users regardless of status, I think this may be due to the left joins. I basically had:
Select u.name,
count (ir),
count (sr) from user u
Inner join relationship r on r.user=u.user and r.relationship = 'Analyst'
Left Join IR on r.workitem=ir.id and ir.Status = 'Active'
Left Join SR on r.workitem=sr.id and sr.Status = 'Active'
Group by u.name
I have simplified the above as much as possible. This is the actual query:
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM MTV_System$Domain$User u
INNER JOIN RelationshipView r ON r.TargetEntityId = u.BaseManagedEntityId AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722' AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i ON r.SourceEntityId = i.BaseManagedEntityId AND (i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != '2B8830B6-59F0-F574-9C2A-F4B4682F1681' AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != 'BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr ON r.SourceEntityId = SR.BaseManagedEntityId AND (sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '72B55E17-1C7D-B34C-53AE-F61F8732E425' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '59393F48-D85F-FA6D-2EBE-DCFF395D7ED1' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr on r.SourceEntityId = cr.BaseManagedEntityId AND (cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = '6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D' or cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = 'DD6B0870-BCEA-1520-993D-9F1337E39D4D')
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA on r.SourceEntityId = ma.BaseManagedEntityId AND (ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = '11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83' OR ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = 'D544258F-24DA-1CF3-C230-B057AAA66BED')
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName

your over simplification seems to have lost your what appears to be your issue from some of your comments. Using left joins would include any of the users even if they don't have a count of one of your other tables. However if you want your result set to only include usres that have at least 1 Incident and/or 1 Request and/or 1 Change Request. Etc. you can either filter the aggretation you are doing after the fact to remove when Incidents + Requests + ... = 0. Or you can filter them out by adding a WHERE statement that says WHEN NOT all of those other tables are null which is the same as OR IS NOT NULL...
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM
MTV_System$Domain$User u
INNER JOIN RelationshipView r
ON r.TargetEntityId = u.BaseManagedEntityId
AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722'
AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i
ON r.SourceEntityId = i.BaseManagedEntityId
AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED NOT IN ('2B8830B6-59F0-F574-9C2A-F4B4682F1681','BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr
ON r.SourceEntityId = SR.BaseManagedEntityId
AND sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F IN ('72B55E17-1C7D-B34C-53AE-F61F8732E425','59393F48-D85F-FA6D-2EBE-DCFF395D7ED1','05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr
ON r.SourceEntityId = cr.BaseManagedEntityId
AND cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 IN ('6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D','DD6B0870-BCEA-1520-993D-9F1337E39D4D')
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA
ON r.SourceEntityId = ma.BaseManagedEntityId
AND ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 IN ('11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83','D544258F-24DA-1CF3-C230-B057AAA66BED')
WHERE
i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
OR ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C IS NOT NULL
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName
Also note the user of IN and NOT IN in the join conditions instead of OR all of the time.

The issue is count(). Instead, you want count(distinct):
Select u.name,
count(distinct ir),
count(distinct sr)
from user u
. . .
This generates a Cartesian product between the ir and sr values. If you have largish numbers in each group, then aggregating before the join is a better approach.

;WITH T AS
(
SELECT
U.UserID,
U.Name,
CASE WHEN R.WorkItem LIKE 'IR%' THEN R.WorkItem ELSE '' END AS IRCode,
CASE WHEN R.WorkItem LIKE 'SR%' THEN R.WorkItem ELSE '' END AS SRCode
FROM #tblUser U
INNER JOIN #tblRelationship R ON U.UserId=R.UserId
)
SELECT
Name,
SUM(CASE IRCode WHEN '' THEN 0 ELSE 1 END) AS 'IR Count',
SUM(CASE SRCode WHEN '' THEN 0 ELSE 1 END) AS 'SR Count'
FROM T
WHERE
(
(T.IRCode=''
OR
T.SRCode='')
AND
(
T.IRCode IN (SELECT ID FROM #tblIR WHERE Status='Active')
OR
T.SRCode IN (SELECT ID FROM #tblSR WHERE Status='Active')
)
)
GROUP BY T.Name

Try with the below script .
SELECT u.name
,SUM(t1.[IR Count]) [IR Count]
,SUM(t2.[SR Count]) [SR Count]
FROM #user u
INNER JOIN #relationship r on r.[user]=u.[user]and r.relationship = 'Analyst'
CROSS APPLY (SELECT COUNT(DISTINCT ID) [IR Count]
FROM #IR ir WHERE r.workitem=ir.id and ir.Status = 'Active') t1
CROSS APPLY (SELECT COUNT(DISTINCT ID) [SR Count]
FROM #SR sr WHERE r.workitem=sr.id and sr.Status = 'Active')t2
GROUP BY u.name
OUTPUT :
SR Count in your sample output is wrong ,if the SR table is same as that of IR table (Status of the SR3 is closed for the user 'dave' so that will be ignored.) .

This is what I've come up with that seems to work based off the actual table
SELECT
u.DisplayName as Analyst,
u.BaseManagedEntityId as AUsername,
COUNT(distinct i.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) AS 'Active Incidents',
COUNT(distinct sr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Service Requests',
COUNT(distinct cr.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Change Requests',
COUNT(distinct ma.Id_9A505725_E2F2_447F_271B_9B9F4F0D190C) as 'Active Manual Activities'
FROM MTV_System$Domain$User u
LEFT JOIN RelationshipView r ON r.TargetEntityId = u.BaseManagedEntityId AND r.RelationshipTypeId = '15E577A3-6BF9-6713-4EAC-BA5A5B7C4722' AND r.IsDeleted ='0'
LEFT JOIN MTV_System$WorkItem$Incident i ON r.SourceEntityId = i.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$ServiceRequest sr ON r.SourceEntityId = SR.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$ChangeRequest cr on r.SourceEntityId = cr.BaseManagedEntityId
LEFT JOIN MTV_System$WorkItem$Activity$ManualActivity MA on r.SourceEntityId = ma.BaseManagedEntityId
Where (i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != '2B8830B6-59F0-F574-9C2A-F4B4682F1681' AND i.Status_785407A9_729D_3A74_A383_575DB0CD50ED != 'BD0AE7C4-3315-2EB3-7933-82DFC482DBAF')
OR (sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '72B55E17-1C7D-B34C-53AE-F61F8732E425' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '59393F48-D85F-FA6D-2EBE-DCFF395D7ED1' OR sr.Status_6DBB4A46_48F2_4D89_CBF6_215182E99E0F = '05306BF5-A6B9-B5AD-326B-BA4E9724BF37')
OR (cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = '6D6C64DD-07AC-AAF5-F812-6A7CCEB5154D' or cr.Status_72C1BC70_443C_C96F_A624_A94F1C857138 = 'DD6B0870-BCEA-1520-993D-9F1337E39D4D')
OR (ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = '11FC3CEF-15E5-BCA4-DEE0-9C1155EC8D83' OR ma.Status_8895EC8D_2CBF_0D9D_E8EC_524DEFA00014 = 'D544258F-24DA-1CF3-C230-B057AAA66BED')
GROUP BY u.DisplayName,u.BaseManagedEntityId
Order by u.DisplayName

How to optimize SQL Server query

I am copying data from one table to another table. While copying I am doing some calculation to modify one column.
SQL Server query:
INSERT INTO rat_proj_duration_map_2
SELECT
r.*,
r.hour_val / (CASE
WHEN week_val = 1 AND
(SELECT TOP 1
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE calwk = 2
AND r.uid = u.uid
AND yr = 2016)
> 0 THEN (SELECT TOP 1
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE calwk = 2
AND r.uid = u.uid
AND yr = 2016)
WHEN (SELECT
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.week_val = us.calwk
AND r.uid = u.uid
AND yr = 2016)
< 1 AND
(SELECT
MAX(hrswk)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
> 0 THEN (SELECT
MAX(hrswk)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
WHEN (SELECT
COUNT(*)
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.uid = u.uid
AND yr = 2016)
<= 0 THEN 1
ELSE (SELECT
hrswk
FROM UserProfileRATinterface_view us
INNER JOIN users u
ON u.username = us.username
WHERE r.week_val = us.calwk
AND r.uid = u.uid
AND yr = 2016)
END) * 100 AS percentage_val
FROM rat_proj_duration_map r
When I run this query I getting time out issue.
TCP Provider: Timeout error [258]
SQL Server is not in my hand to increase time out value.
Is it possible to optimize my SQL query?

Are you sure this query is logically correct? You have several TOP 1s without specific ORDER BY, scalar comparison of subselect without TOP (which, I assume, may return more than one row if you are using top in other subselects with same source).
And yes - this query can be optimized. You can obtain all the values you need with a single subselect statement and avoid multiple execution of same subselects for each row of rat_proj_duration_map which you are having now:
INSERT INTO rat_proj_duration_map_2
SELECT
r.*,
r.hour_val / (CASE
WHEN week_val = 1 AND us.min_hrswk_2 > 0
THEN us.min_hrswk_2
WHEN us.min_hrswk_week_val <1
AND max_hrswk > 0
THEN max_hrswk
WHEN us.cnt <= 0
THEN 1
ELSE min_hrswk_week_val
END) * 100 as percentage_val
FROM
rat_proj_duration_map r
OUTER APPLY
(
SELECT
count(*) as cnt,
MIN(CASE WHEN calcw = 2 THEN hrswk END) as min_hrswk_2,
MIN(CASE WHEN calcw = r.week_val THEN hrswk END) as min_hrswk_week_val,
MAX(hrswk) as max_hrswk
FROM UserProfileRATinterface_view us
inner join users u on u.username=us.username
WHERE r.uid=u.uid and yr=2016
) us
But I can't be sure if original logic is correct. And the idea of that case to me looks like this:
...
r.hour_val / COALESCE(NULLIF(us.min_hrswk_2, 0),
NULLIF(us.min_hrswk_week_val, 0), NULLIF(max_hrswk, 0), 1)
...

The subqueries in your case clause seem to be essentially the same. You could simplify the whole command by defining a grouped version (... where yr=2016 group by u.uid) of this subquery (preferrably as a common table expression) and then work with that. This could potentially save a lot of redundant operations.
The following might work (have not tested it):
;WITH usrall as (
SELECT u.uid ui, hrswk hw, r.week wk, us.calwk cw
FROM UserProfileRATinterface_view us
INNER JOIN users u on u.username=us.username
WHERE r.uid=u.uid and yr=2016
), usrgrp as (
SELECT ui gui, MAX(hrswk) ghw, count(*) gcnt FROM usrall group by ui
), denom as (
SELECT gui dui, COALESCE( MAX(w2.hw), MAX(wkwc.hw), MAX(gwh) ) dnm
FROM usrgrp
LEFT JOIN usrall w2 ON w2.ui=gui AND w2.cw=2 AND w2.hw>0
LEFT JOIN usrall wkcw ON wkcw.ui=gui AND wkcw.wk=wkcw.cw AND wkwc.hw<1
GROUP BY gui
)
SELECT r.*, r.hour_val / d.dnm
FROM rat_proj_duration_map r
INNER JOIN denom d ON d.dui=u.uid
Essentially I have tried (I hope it works :-/) to replace the case construct by a COALESCE() function that checks the three possible calculated values one after the other. The first non-null value is accepted.
As I said: I have not tested it. Good luck

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc

I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.

Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL LEFT JOIN combined with regular joins - sql

Related

SQL code in Access says error in FROM clause

SQL correct query or not

COUNT Multiple tables with relationships and users

How to optimize SQL Server query

Query for logistic regression, multiple where exists

Categories

Resources