Query for logistic regression, multiple where exists

Query for logistic regression, multiple where exists - sql

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc

I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.

Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA

Related

Pull a separate column that matches the (min) of an aggregate function

It works well so far but I am stumped from here as I am brand new to this. This query finds the closest distance match, pairing up every item in the "FAILED" folder against everything that isn't in the "FAILED" folder.
There is a column "RouteID" in the "table p" that I want to match up with the min() aggregate.
I cannot process how to make the SELECT query simply show the associated "RouteID" column from tbl p but ultimately, I want to turn this into an update query that will SET a.Route = p.Route that is associated with the min()
Any help would be appreciated.
SELECT a.name, a.Reference1,
MIN(round(ACOS(COS(RADIANS(90-a.lat))
*COS(RADIANS(90-p.latpoint))
+SIN(RADIANS(90-a.lat))
*SIN(RADIANS(90-p.latpoint))
*COS(RADIANS(a.lon-p.longpoint)))
*3958.756,2)) AS 'DISTANCE'
FROM tblOrder AS a WITH (NOLOCK)
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrder b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS p ON 1=1
WHERE a.CustID = 180016
AND a.RouteID = 'FAILED'
AND a.StopType = 1
AND P.RouteID <> 'FAILED'
GROUP BY
a.name, a.Reference1

You can select them separately and then join them
SELECT c.name, c.Reference1, q.RouteID
FROM
(
SELECT a.name, a.Reference1,
MIN(round(ACOS(COS(RADIANS(90-a.lat))
*COS(RADIANS(90-p.latpoint))
+SIN(RADIANS(90-a.lat))
*SIN(RADIANS(90-p.latpoint))
*COS(RADIANS(a.lon-p.longpoint)))
*3958.756,2)) AS 'DISTANCE'
FROM tblOrder AS a WITH (NOLOCK)
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrder b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS p ON 1=1
WHERE a.CustID = 180016
AND a.RouteID = 'FAILED'
AND a.StopType = 1
AND P.RouteID <> 'FAILED'
GROUP BY
a.name, a.Reference1
) c
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrderRouteStops b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS q
ON q.routeID = c.DISTANCE

SQL: Attribute matches two different conditions at the same time

I want to make a query where an attribute (same attribute) matches two different conditions at the same time. I have to check if a driver was found in both cities.
I tried to use intersect but I don't get any matches. But in my table I have one driver that matches this conditions.
SELECT s.NumeSofer
FROM Soferi s
INNER JOIN contraventii c ON s.idSofer=c.idSofer
INNER JOIN localitati l ON c.idLocContr=l.idLoc
WHERE l.DenLoc IN ('Iasi', 'Rosiori') AND l.Jud IN ('IS', 'NT');
INTERSECT
SELECT s.NumeSofer
FROM Soferi s
INNER JOIN contraventii c ON s.idSofer=c.idSofer
INNER JOIN localitati l ON c.idLocContr=l.idLoc
WHERE l.DenLoc='Rosiori' AND l.Jud='NT';

You can use aggregation and a HAVING clause, like:
SELECT s.NumeSofer
FROM Soferi s
INNER JOIN contraventii c ON s.idSofer=c.idSofer
INNER JOIN localitati l
ON c.idLocContr = l.idLoc
AND (l.DenLoc, l.Jud) IN ( ('Iasi', 'IS'), ('Rosiori', 'NT') )
GROUP BY s.NumeSofer
HAVING
MAX(CASE WHEN l.DenLoc = 'Iasi' AND l.Jud = 'IS' THEN 1 END) = 1
AND MAX(CASE WHEN l.DenLoc = 'Rosiori' AND l.Jud = 'NT' THEN 1 END) = 1
This will bring you all NumeSofer for which at least one record exists in localitati with DenLoc='Iasi' AND Jud='IS' and at least one record exists with DenLoc='Rosiori' AND Jud='NT'.
Note: the IN operator can be used with tuple values; this reduce the lenght of the query, while avoiding using OR, which is usually not good for general performance.

Do a GROUP BY instead. Use case expressions to do conditional aggregation:
SELECT s.NumeSofer, count(distinct l.DenLoc) as totcount,
count(case when l.DenLoc='Rosiori' then 1 end) as Rosioricount,
count(case when l.DenLoc='Iasi' then 1 end) as Iasicount
FROM Soferi s
INNER JOIN contraventii c ON s.idSofer=c.idSofer
INNER JOIN localitati l ON c.idLocContr=l.idLoc
WHERE (l.DenLoc='Rosiori' AND l.Jud='NT')
OR (l.DenLoc='Iasi' AND l.Jud='IS')
GROUP BY s.NumeSofer
ORDER BY totcount desc
Any rows with totcount = 2?
To get only drivers with both DenLoc's add a HAVING clause:
SELECT s.NumeSofer, count(distinct l.DenLoc) as totcount,
count(case when l.DenLoc='Rosiori' then 1 end) as Rosioricount,
count(case when l.DenLoc='Iasi' then 1 end) as Iasicount
FROM Soferi s
INNER JOIN contraventii c ON s.idSofer=c.idSofer
INNER JOIN localitati l ON c.idLocContr=l.idLoc
WHERE (l.DenLoc='Rosiori' AND l.Jud='NT')
OR (l.DenLoc='Iasi' AND l.Jud='IS')
GROUP BY s.NumeSofer
HAVING count(distinct l.DenLoc) > 1

I need to pull unique patients that meet certain criteria

I pulled person_nbrs that have never had an EventType1 before or after an EventType2. I need to pull person_nbrs that have not had an EventType1 prior to having an EventType2. If they had an EventType1 after an EventType2, than it is to be ignored. Here is my query that pulls person_nbrs that have never had an EventType1 before or after EventType2.
SELECT
person_nbr, enc_nbr, enc_timestamp
FROM
person p
JOIN
patient_encounter pe ON p.person_id = pe.person_id
JOIN
patient_procedure pp ON pe.enc_id = pp.enc_id
WHERE
enc_timestamp >= '20170101'
--EventType2
AND code_id LIKE '2'
-- EventType1
AND person_nbr NOT IN (SELECT person_nbr
FROM person p
JOIN patient_encounter pe ON p.person_id = pe.person_id
JOIN patient_procedure pp ON pe.enc_id = pp.enc_id
WHERE code_id LIKE '1')
GROUP BY
person_nbr, enc_nbr, enc_timestamp
ORDER BY
person_nbr ;

You can do this with aggregation and a HAVING clause:
SELECT p.person_nbr
FROM person p JOIN
patient_encounter pe
ON p.person_id = pe.person_id JOIN
patient_procedure pp
ON pe.enc_id = pp.enc_id
GROUP BY p.person_nbr
HAVING SUM(CASE WHEN pp.code_id = 2 THEN 1 ELSE 0 END) > 0 AND -- has code 2
(MAX(CASE WHEN pp.code_id = 1 THEN pe.timestamp END) IS NULL OR
MAX(CASE WHEN pp.code_id = 1 THEN pe.timestamp END) < MIN(CASE WHEN pp.code_id = 2 THEN pe.timestamp END)
) ;
The HAVING clause has two parts:
The first specifies that the person has a code = 2.
The second specifies one of two conditions. The first is that there is no code = 1. The second alternative is that the latest c = 1 timestamp is less than the earliest code = 2 timestamp.

SQL correct query or not

given these relationships, how could you query the following:
The tourists (name and email) that booked at least a pension whose rating is greater than 9, but didn't book any 3 star hotel with a rating less than 9.
Is the following correct?
SELECT Tourists.name, Tourists.email
FROM Tourists
WHERE EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Pension' AND
AccomodationEstablishments.rating > 9
) AND NOT EXISTS (
SELECT id FROM Bookings
INNER JOIN Tourists ON Bookings.touristId=Tourists.id
INNER JOIN AccomodationEstablishments ON Bookings.accEstId=AccomodationEstablishments.id
INNER JOIN AccomodationTypes ON AccomodationEstablishments.accType=AccomodationTypes.id
WHERE AccomodationTypes.name = 'Hotel' AND
AccomodationEstablishments.noOfStars = 3 AND
AccomodationEstablishments.rating < 9
)

I would do this using aggregation and having:
SELECT t.name, t.email
FROM Bookings b INNER JOIN
Tourists t
ON b.touristId = t.id INNER JOIN
AccomodationEstablishments ae
ON b.accEstId = ae.id INNER JOIN
AccomodationTypes a
ON ae.accType = a.id
GROUP BY t.name, t.email
HAVING SUM(CASE WHEN a.name = 'Pension' AND ae.rating > 9 THEN 1 ELSE 0 END) > 0 AND
SUM(a.name = 'Hotel' AND ae.noOfStars = 3 AND ae.rating < 9 THEN 1 ELSE 0 END)= 0;
Your method also works, but you probably need t.id in the subqueries.

Sybase SQL - Remove "semi-duplicates" from query results

I have a query that uses two SELECT statements that are combined using a UNION ALL. Both statements pull data from similar tables to populate the query results. I am attempting to remove the "semi-duplicate" rows from the query, but am having issues doing so.
My query is the following:
SELECT DISTINCT *
FROM
(
SELECT
TeamNum = CASE
WHEN T.TeamName = 'Alpha Team'
THEN '1'
WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
THEN '2'
WHEN T.TeamName = 'Delta Team'
THEN '3'
ELSE '<Undefined>'
END,
P.PatientLastName AS LastName,
P.PatientFirstName AS FirstName,
R.PrimaryCity AS City,
ReimbursorName = CASE
WHEN RE.ReimbursorDescription = 'Medicare'
Then 'R1'
WHEN RE.ReimbursorDescription = 'Medicaid'
Then 'R2'
ELSE 'R3'
END,
P.PatientID AS PatientID
FROM
PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
PatReferrals PR LEFT OUTER JOIN PatReimbursors PRE ON PR.PatientID = PRE.PatientID,
PatReimbursors PRE LEFT OUTER JOIN Reimbursors RE ON PRE.ReimbursorID = RE.ReimbursorID,
PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID,
WHERE
PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
AND PR.Status <> 'R'
AND PRE.CoveragePriority = '1'
AND PRE.ExpirationDate IS NULL
UNION ALL
SELECT
TeamNum = CASE
WHEN T.TeamName = 'Alpha Team'
THEN '1'
WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
THEN '2'
WHEN T.TeamName = 'Delta Team'
THEN '3'
ELSE '<Undefined>'
END,
P.PatientLastName AS LastName,
P.PatientFirstName AS FirstName,
R.PrimaryCity AS City,
ReimbursorName = CASE
WHEN RE.ReimbursorDescription = 'Medicare'
Then 'E1'
WHEN RE.ReimbursorDescription = 'Medicaid'
Then 'E2'
ELSE 'E3'
END,
P.PatientID AS PatientID
FROM
PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
PatReferrals PR LEFT OUTER JOIN PatEligibilities PE ON PR.PatientID = PE.PatientID,
PatEligibilities PE LEFT OUTER JOIN Reimbursors RE ON PE.ReimbursorID = RE.ReimbursorID,
PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID,
WHERE
PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
AND PR.Status <> 'R'
AND PE.Status <> 'V'
AND PE.ApplicationDate BETWEEN DATE(PR.ReferralDate)-5 AND DATE('2100/01/01')
)
AS DUMMYTBL
ORDER BY
DUMMYTBL.LastName ASC,
DUMMYTBL.FirstName ASC
The results that I receive when I run the query is the following:
3 Doe Jane Town R1 19874
1 Roe John City R3 50016
1 Roe John City E1 50016
2 Smith Jane Town E3 33975
The data that I am needing to remove is duplicate rows based on a certain criteria once the results are brought in from the original query. Each person can only be listed once and they must have a single pay source (R1, R2, R3, E1, E2, E3). If there is a R#, than there cannot be a E# listed for that person. If there are no R#'s than an E# must be listed. As shown in my example results, line 2 and 3 have the same person listed, but two pay sources (R3 and E1).
How can I go about making each person have only one row shown using the criteria that I have listed?
EDIT: Modifed the SQL query to show the original variables from the WHERE clauses in order to show further detail on the query. The PatReimbursors and the PatEligibilities tables have similar data, but the criteria is different in order to pull the correct data.

Your query does not make sense. I would start by eliminating the implicit cartesian product, generated by the , in the from clause.
My guess is that the from clause should be:
FROM
PatReferrals PR LEFT JOIN
Patient P
ON PR.PatientID = P.PatientID left outer join
Rolodex R
ON P.RolodexID = R.RolodexID left outer join
PatEligibilities PE
ON PR.PatientID = PE.PatientID left outer join
Reimbursors RE
ON PE.ReimbursorID = RE.ReimbursorID left outer join
Teams T ON PR.TeamID = T.TeamID
Once you do this, you may not need the union all or the select distinct. You may be able to put both the reimbursors and the eligibilities in the same query.

Use a subquery or subqueries.
The overall query should be written using the following pattern:
Select Distinct [Person Data]
From PersonTable
left Join to otherTable1 -- add outer join for each table you need data from
On [Conditions that ensure join can generate only one row per person,
... and specify which of possibly many rows to get...]
Make sure the conditions eliminate any possibility for the join to generate more than one row from the other [outer] table per person row in in the person table,. This may (and often does) require that the join condition be based on a subquery, as, for example...
Select Distinct [Person Data]
From PersonTable p
left Join to employments e -- add outer join for each table you need data from
On e.PersonId = p.PersonId
and e.HireDate = (Select Max(hiredate) from employments
where personId = p.PersonId)

After working with this for quite some time today, I found a solution to the problem that I was having. Here is the solution that works and pulls the correct information that I was needing:
SELECT DISTINCT
TeamNum,
LastName,
FirstName,
City,
ReimbursorName = CASE
WHEN max(ReimbursorName) IN ('R1', 'E1')
THEN '1'
WHEN max(ReimbursorName) IN ('R2', 'E2')
THEN '2'
ELSE '3'
END,
PatientID
FROM
(
SELECT
TeamNum = CASE
WHEN T.TeamName = 'Alpha Team'
THEN '1'
WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
THEN '2'
WHEN T.TeamName = 'Delta Team'
THEN '3'
ELSE '<Undefined>'
END,
P.PatientLastName AS LastName,
P.PatientFirstName AS FirstName,
R.PrimaryCity AS City,
ReimbursorName = CASE
WHEN RE.ReimbursorDescription = 'Medicare'
Then 'R1'
WHEN RE.ReimbursorDescription = 'Medicaid'
Then 'R2'
ELSE 'R3'
END,
P.PatientID AS PatientID
FROM
PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
PatReferrals PR LEFT OUTER JOIN PatReimbursors PRE ON PR.PatientID = PRE.PatientID,
PatReimbursors PRE LEFT OUTER JOIN Reimbursors RE ON PRE.ReimbursorID = RE.ReimbursorID,
PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID
WHERE
PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
AND PR.Status <> 'R'
AND PRE.CoveragePriority = '1'
AND PRE.ExpirationDate IS NULL
UNION ALL
SELECT
TeamNum = CASE
WHEN T.TeamName = 'Alpha Team'
THEN '1'
WHEN T.TeamName IN ('Bravo Team', 'Charlie Team')
THEN '2'
WHEN T.TeamName = 'Delta Team'
THEN '3'
ELSE '<Undefined>'
END,
P.PatientLastName AS LastName,
P.PatientFirstName AS FirstName,
R.PrimaryCity AS City,
ReimbursorName = CASE
WHEN RE.ReimbursorDescription = 'Medicare'
Then 'E1'
WHEN RE.ReimbursorDescription = 'Medicaid'
Then 'E2'
ELSE 'E3'
END,
P.PatientID AS PatientID
FROM
PatReferrals PR LEFT JOIN Patient P ON PR.PatientID = P.PatientID,
Patient P LEFT OUTER JOIN Rolodex R ON P.RolodexID = R.RolodexID,
PatReferrals PR LEFT OUTER JOIN PatEligibilities PE ON PR.PatientID = PE.PatientID,
PatEligibilities PE LEFT OUTER JOIN Reimbursors RE ON PE.ReimbursorID = RE.ReimbursorID,
PatReferrals PR FULL OUTER JOIN Teams T ON PR.TeamID = T.TeamID
WHERE
PR.ReferralDate BETWEEN GETDATE()-4 AND GETDATE()-1
AND PR.Status <> 'R'
AND PE.Status <> 'V'
AND PE.ApplicationDate BETWEEN DATE(PR.ReferralDate)-5 AND DATE('2100/01/01')
)
AS DUMMYTBL
GROUP BY
TeamNum,
LastName,
FirstName,
City,
PatientID
ORDER BY
DUMMYTBL.LastName ASC,
DUMMYTBL.FirstName ASC
Thanks for all the responses that were provided.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Query for logistic regression, multiple where exists - sql

Related

Pull a separate column that matches the (min) of an aggregate function

SQL: Attribute matches two different conditions at the same time

I need to pull unique patients that meet certain criteria

SQL correct query or not

Sybase SQL - Remove "semi-duplicates" from query results

Categories

Resources