Problem with SQL query involving XOR-like condition - sql

Let's say we have a table (EnsembleMembers) in an SQL database with the following data. It lists the musicians which are part of various ensembles, along with their instruments.
EnsembleID (FK) MusicianID (FK) Instrument
----------------------------------------------
'1' '1' 'Clarinet'
'1' '4' 'Clarinet'
'1' '100' 'Saxophone'
'2' '200' 'Saxophone'
'2' '300' 'Saxophone'
'2' '320' 'Flute'
'99' '300' 'Clarinet'
I want to select the ensemble IDs where the ensemble has one or more saxophone or one or more clarinet players, but not both. I have tried the following SQL statement, but it is returning 1,2,2,99, rather than the expected 2,99.
SELECT e1.EnsembleID
FROM ensemblemembers e1
WHERE e1.Instrument = 'Saxophone'
OR e1.Instrument = 'Clarinet'
AND NOT EXISTS (SELECT *
FROM ensemblemembers e2
WHERE ( e1.Instrument = 'Saxophone'
AND e2.Instrument = 'Clarinet'
AND e1.EnsembleID = e2.EnsembleID)
OR ( e1.Instrument = 'Clarinet'
AND e2.Instrument = 'Saxophone'
AND e1.EnsembleID = e2.EnsembleID));
What am I doing wrong?
PS - I don't want to use DISTINCT for performance reasons.

SELECT EnsembleID
FROM EnsembleMembers
WHERE Instrument IN ('Saxophone', 'Clarinet')
GROUP BY EnsembleID
HAVING COUNT(DISTINCT Instrument) = 1
You can also use a FULL OUTER JOIN for this, but this type of join is not supported by MySQL and a few other minor databases.
SELECT COALESCE(e1.EnsembleID, e2.EnsembleID) AS EnsembleID
FROM EnsembleMembers e1 FULL OUTER JOIN EnsembleMembers e2
ON e1.EnsembleID = e2.EnsembleID
AND e1.Instrument = 'Saxophone'
AND e2.Instrument = 'Clarinet'
WHERE e1.EnsembleID IS NULL OR e2.EnsembleID IS NULL
If you need this to work without FULL OUTER JOIN, try this:
SELECT e1.EnsembleID, e1.Instrument
FROM EnsembleMembers e1 LEFT OUTER JOIN EnsembleMembers e2
ON e1.EnsembleID = e2.EnsembleID
AND e2.Instrument = 'Clarinet'
WHERE e1.Instrument = 'Saxophone' AND e2.EnsembleID IS NULL
UNION
SELECT e1.EnsembleID, e1.Instrument, e2.EnsembleID, e2.Instrument
FROM EnsembleMembers e1 LEFT OUTER JOIN EnsembleMembers e2
ON e1.EnsembleID = e2.EnsembleID
AND e2.Instrument = 'Saxophone'
WHERE e1.Instrument = 'Clarinet' AND e2.EnsembleID IS NULL;
In the future, please tag your question with the brand of RDBMS you use.

I assume you have a table called ENSEMBLES:
select E.id from ensembles E where exists (
select 1 from ensemblemembers M where E.id = M.ensembleid
and M.instrument in ('clarinet', 'saxophone')
) and not (
exists (
select 1 from ensemblemembers M where E.id = M.ensembleid
and M.instrument = 'clarinet'
) and exists (
select 1 from ensemblemembers M where E.id = M.ensembleid
and M.instrument = 'saxophone'
)
)
You want to avoid using DISTINCT, so one way to do it is by using the main ENSEMBLES table. From there, pick ensemble rows that have 'clarinet' OR 'saxophone'. Then the third step is to remove all ensemble rows that have 'clarinet' AND 'saxophone'.

Here is the most stupid solution (but I like it somehow):
SELECT EnsembleID FROM EnsembleMembers
MINUS
(
SELECT EnsembleID
FROM EnsembleMembers
WHERE Instrument = 'Saxophone'
INTERSECT
SELECT EnsembleID
FROM EnsembleMembers
WHERE Instrument = 'Clarinet' );

Here's how you could do the query using XOR:
select a.EnsembleID from
(
select max(EnsembleID) as EnsembleID,
max(case when Instrument = 'Saxophone' then 1 else 0 end) as Saxophone,
max(case when Instrument = 'Clarinet' then 1 else 0 end) as Clarinet
from
EnsembleMembers
group by EnsembleID
) a
where
a.Saxophone ^ a.Clarinet = 1

Related

Select sql statement for Mysql

I have this bit of sql
SELECT pd.id, pd.title, pd.fname, pd.lname, pd.DOB, pd.phone_home, pd.phone_biz, pd.phone_contact, pd.phone_cell, pd.sex, pd.email, pd.pid, f.form_id, ldf1.field_value AS autoleaxis, ldf2.field_value AS autolecyl
FROM patient_data pd
LEFT JOIN forms f ON pd.pid = f.pid
LEFT JOIN `lbf_data` ldf1 ON ldf1.form_id = f.form_id
LEFT JOIN `lbf_data` ldf2 ON ldf1.form_id = ldf2.form_id
WHERE (
f.form_name = 'Opthalmology'
AND ldf1.field_id = 'autoleaxis'
AND ldf2.field_id = 'autolecyl'
)
OR (
f.form_id IS NULL
AND ldf1.field_id IS NULL
AND ldf2.field_id IS NULL
)
ORDER BY pd.pid
Which is for a MySQL Database. It works ok except it is returning more than one record per person in some cases as they have more than one form_id. How do I restrict it so that it only returns the records for highest form_id that the person has or form_id set to null if there isn't one. Thanks
You can use subquery in LEFT JOIN, as with:
SELECT *
FROM patient_data pd
LEFT JOIN
(
select max(form_id) as form_id, pid, form_name
from forms as f2
) as f
ON pd.pid = f.pid
LEFT JOIN `lbf_data` ldf1 ON ldf1.form_id = f.form_id
LEFT JOIN `lbf_data` ldf2 ON ldf1.form_id = ldf2.form_id
WHERE (
f.form_name = 'Opthalmology'
AND ldf1.field_id = 'autoleaxis'
AND ldf2.field_id = 'autolecyl'
)
OR (
f.form_id IS NULL
AND ldf1.field_id IS NULL
AND ldf2.field_id IS NULL
)
ORDER BY pd.pid
You can test in here

Oracle SQL Correlated subquery - Returning count(*) in some columns

I have my initial statement which is :
SELECT TEAM.ID PKEY_SRC_OBJECT,
TEAM.MODF_DAT UPDATE_DATE,
TEAM.MODF_USR UPDATED_BY,
PERSO.FIRST_NAM FISRT_NAME
FROM TEAM
LEFT OUTER JOIN PERSO ON (TEAM.ID=PERSO.TEAM_ID)
I want to calculate some "flags" and return them in my initial statement.
There are 3 flags which can be calculated like this :
1) Flag ISMASTER:
SELECT Count(*)
FROM TEAM_TEAM_REL A, TEAM B
WHERE B.PARTY_PTY_ID = A.RLTD_TEAM_ID
AND CODE = 'Double';
2) Flag ISAGENT:
SELECT Count(*)
FROM TEAM_ROL_REL A, TEAM B
WHERE B.PARTY_PTY_ID = A.TEAM_ID;
3) Flag NUMPACTS:
SELECT Count(*)
FROM TEAM_ROL_REL A,
TEAM_ROL_POL_REL B,
PERSO_POL_STA_REL C,
TEAM D
WHERE A.ROL_CD IN ('1','2')
AND A.T_ROL_REL_ID = B.P_ROL_REL_ID
AND B.P_POL_ID = C.P_POL_ID
AND C.STA_CD = 'A'
AND D.PARTY_PTY_ID = A.TEAM_ID;
To try to achieve this, I've updated my initial statement like this :
WITH ABC AS (
SELECT TEAM.ID PKEY_SRC_OBJECT,
TEAM.MODF_DAT UPDATE_DATE,
TEAM.MODF_USR UPDATED_BY,
PERSO.FIRST_NAM FISRT_NAME
FROM TEAM
LEFT OUTER JOIN PERSO ON (TEAM.ID=PERSO.TEAM_ID)
)
SELECT ABC.*, MAST.ISMASTER, AGENT.ISAGENT, PACTS.NUMPACTS FROM ABC
LEFT OUTER JOIN (
select
RLTD_TEAM_ID,
Count(RLTD_TEAM_ID) OVER (PARTITION BY RLTD_TEAM_ID) as ISMASTER
FROM TEAM_TEAM_REL
WHERE CODE = 'Double'
) MAST
ON ABC.PKEY_SRC_OBJECT = MAST.RLTD_TEAM_ID
LEFT OUTER JOIN (
select
TEAM_ID,
Count(TEAM_ID) OVER (PARTITION BY TEAM_ID) as ISAGENT
FROM TEAM_ROL_REL
) AGENT
ON ABC.PKEY_SRC_OBJECT = AGENT.TEAM_ID
LEFT OUTER JOIN (
select
TEAM_ID,
Count(TEAM_ID) OVER (PARTITION BY TEAM_ID) as NUMPACTS
FROM TEAM_ROL_REL, TEAM_ROL_POL_REL, PERSO_POL_STA_REL
WHERE TEAM_ROL_REL.ROL_CD IN ('1','2')
AND TEAM_ROL_REL.T_ROL_REL_ID = TEAM_ROL_POL_REL.P_ROL_REL_ID
AND TEAM_ROL_POL_REL.P_POL_ID = PERSO_POL_STA_REL.P_POL_ID
AND PERSO_POL_STA_REL.STA_CD = 'A'
) PACTS
ON ABC.PKEY_SRC_OBJECT = PACTS.TEAM_ID;
For the two first flags (ISMASTER and ISAGENT) I get the result in less than 1min, but for the last flag (NUMPACTS) it runs few minutes without provide any result.
I think my statement is too heavy, maybe I should do it in a totally different way.
I think you have perhaps over complicated things.
You could do this (assuming I have understood your requirements correctly) like so:
WITH ttr AS (SELECT rltd_team_id,
COUNT(*) is_master
FROM team_team_rel
AND CODE = 'Double'
GROUP BY rltd_team_id),
trr AS (SELECT team_id,
COUNT(*) is_agent
FROM team_rol_rel
GROUP BY team_id)
pacts AS (SELECT trr1.team_id,
COUNT(*) num_pacts
FROM team_rol_rel trr1
INNER JOIN team_rol_pol_rel trpr ON (trr1.t_rol_rel_id = trpr.p_rol_rel_id)
INNER JOIN perso_pol_sta_rel ppsr ON (trpr.p_pol_id = ppsr.p_pol_id
WHERE trr1.rol_cd IN ('1', '2')
AND ppsr.st_cd = 'A'
GROUP BY trr1.team_id)
SELECT t.id pkey_src_object,
t.modf_dat update_date,
t.modf_usr updated_by,
p.first_nam first_name,
ttr.is_master,
trr.is_agent,
pacts.num_pacts
FROM team t
LEFT OUTER JOIN perso p ON t.id = p.team_id
LEFT OUTER JOIN ttr ON t.party_pty_id = ttr.rltd_team_id
LEFT OUTER JOIN trr ON t.party_pty_id = trr.team_id
LEFT OUTER JOIN pacts ON t.pkey_src_object = pacts.team_id;
N.B. untested, since you didn't provide any test data.

Convert Exist condition to Join with T-SQL

I am trying to convert the following T-SQL Select query to exclude "Exists" Clause and Include "Join" Clause. but i am ending up not getting the right result. can some one from this expert team help me with some tips.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
AND EXISTS ( SELECT 1 FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
WHERE eedeeid = xeeid AND eedcoid = xcoid )
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate
The algorithm to convert EXISTS to JOIN is very simple.
Instead of
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE A.Foo = B.Foo)
Use
FROM A
INNER JOIN (SELECT DISTINCT Foo
FROM B) AS B
ON A.Foo = B.Foo
But the first one probably will be optimised better
Interesting request.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
INNER JOIN
( SELECT DISTINCT xeeid, xcoid FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
-- WHERE eedeeid = xeeid AND eedcoid = xcoid
) AS A ON xeeid = A.xeeid AND eedcoid = A.eedcoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate
Another method of converting an exists to a join is to use a ROW_NUMBER() in the subselect to assist in removing duplicates.
EXISTS:
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE B.Condition = 'true' AND A.Foo = B.Foo)
JOIN:
FROM A
JOIN (SELECT B.Foo, ROW_NUMBER() OVER (PARTITION BY B.Foo ORDER BY B.Foo) RN
FROM B
WHERE B.Condition = 'true') DT
ON A.Foo = DT.Foo AND DT.RN = 1
The ORDER BY is totally arbitrary since you don't care which record it selects, but it's required. You may be able to use (SELECT NULL) instead.

Sum of resulting set of rows in SQL

I've got the following query:
SELECT DISTINCT CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
/*MC.chemical_percentage,*/
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
--AND CU.permit_id = 2118
--GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month, MC.chemical_id, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound
--ORDER BY C.chemical_name ASC
Which returns:
But what I need is to return one row per month per material adding up the values of POC per month and NON_POC per month.
So, I should end up with something like:
Month material_id material_name POC NON_POC
1 52 Krylon... 0.107581 0.074108687
2 52 Krylon... 0.143437 0.0988125
I tried using SUM but it sums up the same result multiple times:
SELECT /*DISTINCT*/ CU.permit_id, CU.month, /*CU.year,*/ M.material_id, M.material_name, /*MC.chemical_id, C.chemical_name,
C.precursor_organic_compound, C.non_precursor_organic_compound,*/
--MC.chemical_percentage,
POC_emissions = SUM(
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END),
NON_POC_emissions = SUM(
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END)
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE M.material_id = 52
--AND CU.permit_id = 187
AND (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
GROUP BY CU.permit_id, M.material_id, M.material_name, CU.month/*, CU.year, MC.chemical_id, C.chemical_name, C.precursor_organic_compound, C.non_precursor_organic_compound*/
--ORDER BY C.chemical_name ASC
The first query has a DISTINCT clause. What is the output without the DISTINCT clause. I suspect you have more rows than shows in your screenshot.
Regardless, you could try something like this to get the desired result.
select permit_id, month, material_id, material_name,
sum(poc_emissions), sum(non_poc_emissions)
from (
SELECT DISTINCT CU.permit_id, CU.month, M.material_id, M.material_name,
POC_emissions =
CASE
WHEN (C.precursor_organic_compound = 'true')
THEN (CU.chemical_usage_lbs / CU.material_density) * M.VOC
ELSE 0
END,
NON_POC_emissions =
CASE
WHEN (C.non_precursor_organic_compound = 'true')
THEN CU.chemical_usage_lbs * (MC.chemical_percentage / 100)
ELSE 0
END
FROM material M
LEFT OUTER JOIN material_chemical MC ON MC.material_id = M.material_id
LEFT OUTER JOIN chemical_usage CU ON CU.material_id = MC.material_id
LEFT OUTER JOIN chemical C ON C.chemical_id = MC.chemical_id
WHERE (CU.month >=1 AND CU.month <= 2)
AND CU.year = 2013
AND M.material_id = 52
) main
group by permit_id, month, material_id, material_name
Explanation
Since the results you retrieved by doing a DISTINCT was consider source-of-truth, I created an in-memory table by making it a sub-query. However, this subquery must have a name of some kind...whatever name. I gave it a name main. Subqueries look like this:
select ... from (sub-query) <give-it-a-table-name>
Simple Example:
select * from (select userid, username from user) user_temp
Advanced Example:
select * from (select userid, username from user) user_temp
inner join (select userid, sum(debits) as totaldebits from debittable) debit
on debit.userid = user_temp.userid
Notice how user_temp alias for the subquery can be used as if the sub-query was a real table.
Use above query in subquery and group by (month) and select sum(POC_emissions) and sum(NON_POC_emissions )

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA