Convert Exist condition to Join with T-SQL - sql

I am trying to convert the following T-SQL Select query to exclude "Exists" Clause and Include "Join" Clause. but i am ending up not getting the right result. can some one from this expert team help me with some tips.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
AND EXISTS ( SELECT 1 FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
WHERE eedeeid = xeeid AND eedcoid = xcoid )
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate

The algorithm to convert EXISTS to JOIN is very simple.
Instead of
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE A.Foo = B.Foo)
Use
FROM A
INNER JOIN (SELECT DISTINCT Foo
FROM B) AS B
ON A.Foo = B.Foo
But the first one probably will be optimised better

Interesting request.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
INNER JOIN
( SELECT DISTINCT xeeid, xcoid FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
-- WHERE eedeeid = xeeid AND eedcoid = xcoid
) AS A ON xeeid = A.xeeid AND eedcoid = A.eedcoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate

Another method of converting an exists to a join is to use a ROW_NUMBER() in the subselect to assist in removing duplicates.
EXISTS:
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE B.Condition = 'true' AND A.Foo = B.Foo)
JOIN:
FROM A
JOIN (SELECT B.Foo, ROW_NUMBER() OVER (PARTITION BY B.Foo ORDER BY B.Foo) RN
FROM B
WHERE B.Condition = 'true') DT
ON A.Foo = DT.Foo AND DT.RN = 1
The ORDER BY is totally arbitrary since you don't care which record it selects, but it's required. You may be able to use (SELECT NULL) instead.

Related

Rewrite union query to a single query

I wondered if it was possible to write this as one query:
SELECT * FROM
(SELECT "markets".*
FROM "markets"
INNER JOIN "positions" ON "positions"."id" = "markets"."position_id"
INNER JOIN "players" ON "players"."id" = "positions"."player_id"
INNER JOIN "other_players" ON "other_players"."id" = "players"."other_player_id"
WHERE (markets.updated_at > '2021-01-10 11:50:14.136015')
AND "markets"."on_feed" = true
AND "markets"."deleted_at" IS NULL
UNION
SELECT "markets".*
FROM "markets"
WHERE (markets.updated_at > '2021-01-10 11:50:14.136015')
AND "markets"."on_feed" = true
AND "markets"."deleted_at" IS NOT NULL) results
ORDER BY results.updated_at DESC
There's a lot of overlap in the two queries. In the first one we want all markets that have those associations in place. For the 2nd query we don't really care if the associations are there or not.
SELECT * FROM "markets"
LEFT JOIN "positions" ON "positions"."id" = "markets"."position_id"
LEFT JOIN "players" ON "players"."id" = "positions"."player_id"
LEFT JOIN "other_players" ON "other_players"."id" = "players"."other_player_id"
WHERE markets.updated_at > '2021-01-10 11:50:14.136015'
AND "markets"."on_feed" = true
AND ("markets"."deleted_at" IS NOT NULL
OR "positions"."id" IS NOT NULL AND "players"."id" IS NOT NULL
AND "other_players"."id" IS NOT NULL)
ORDER BY markets.updated_at DESC
Use EXISTS instead:
SELECT m.*
FROM "markets" m
WHERE m.updated_at > '2021-01-10 11:50:14.136015' AND
markets."on_feed" = true AND
(m."deleted_at" IS NOT NULL OR
EXISTS (SELECT 1
FROM "positions" p JOIN
"players" pl
ON pl."id" = p."player_id" JOIN
"other_players" op
ON op."id" = p."other_player_id"
WHERE p."id" = m."position_id"
)
)
ORDER BY m.updated_at DESC

access compare three tables

Is there a way to triple check that the data is consistent between the three tables and show if there are any discrepancies in any of the three tables?
Each table is not completely the same but they all have an EmployeeID, amount, datepaid, and CID column.
When I did with two tables I thought I had the SQL as:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.CID
FROM tbMaster LEFT JOIN tbCID ON (tbMaster.CID = tbCID.CID) AND (tbMaster.Amount = tbCID.Amount) AND (tbMaster.DatePaid = tbCID.DatePaid)
WHERE (((tbMaster.[Advance/paid])="Paid Respondent") AND ((tbCID.CID) Is Null));
It worked but if the tbCID had any discrepancies it did not catch it...
So for the three tables I thought:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.[Advance/paid], tbMaster.CID
FROM tbMaster, tbCID, table3
WHERE tbMaster.CID <> tbCID.CID
OR table3.CID <> tbMaster.CID
OR table3.CID <> tbCID.CID
OR tbMaster.Amount <> tbCID.Amount
OR table3.AMOUNT <> tbMaster.AMOUNT
OR table3.AMOUNT <> tbCID.AMOUNT
OR tbMaster.Datepaid <> tbCID.Datepaid
OR table3.DATEPAID <> tbMaster.DATEPAID
OR table3.DATEPAID <> tbCID.DATEPAID
But there is only 30 entries yet I get 5x or 6x copies and get over 30000 rows/entries in the query...
You are doing a full cross join on your three tables, so for every combination of tbMaster row, tbCID row, and table3 row where the three don't match on the three columns, a row will be returned.
If you are trying to return a tbMaster row where the CID, Amount, and DatePaid do not appear in both tbSUID and table3, you need something similar to your first query:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.CID
FROM tbMaster
LEFT JOIN tbSUID ON (tbMaster.CID = tbSUID.CID) AND (tbMaster.Amount = tbSUID.Amount) AND (tbMaster.DatePaid = tbSUID.DatePaid)
LEFT JOIN table3 ON (tbMaster.CID = table3.CID) AND (tbMaster.Amount = table3.Amount) AND (tbMaster.DatePaid = table3.DatePaid)
WHERE ((tbMaster.[Advance/paid])="Paid Respondent")
AND ((tbCID.CID) Is Null) OR (table3.CID Is Null));
If the record has to appear in just one of tbSUID and table3, replace the last "OR" with "AND".
Consider subqueries in NOT EXISTS clauses:
SELECT m.EmployeeID, m.Amount, m.DatePaid, m.[Advance/paid], m.CID
FROM tbMaster m
WHERE NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = m.CID OR c.AMOUNT = m.AMOUNT OR c.DATEPAID = m.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = m.CID OR t.AMOUNT = m.AMOUNT OR t.DATEPAID = m.DATEPAID)
For inverse comparisons, consider adding union queries with a source_table indicator:
SELECT m.EmployeeID, m.Amount, m.DatePaid, m.[Advance/paid], m.CID,
'tbMaster' AS [source_table]
FROM tbMaster m
WHERE NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = m.CID OR c.AMOUNT = m.AMOUNT OR c.DATEPAID = m.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = m.CID OR t.AMOUNT = m.AMOUNT OR t.DATEPAID = m.DATEPAID)
UNION ALL
SELECT c.EmployeeID, c.Amount, c.DatePaid, c.[Advance/paid], c.CID,
'tbCID' AS [source_table]
FROM tbCID c
WHERE NOT EXISTS
(SELECT 1 FROM tbMaster m
WHERE m.CID = c.CID OR m.AMOUNT = c.AMOUNT OR m.DATEPAID = c.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = c.CID OR t.AMOUNT = c.AMOUNT OR t.DATEPAID = c.DATEPAID)
UNION ALL
SELECT t.EmployeeID, t.Amount, t.DatePaid, t.[Advance/paid], t.CID,
'table3' AS [source_table]
FROM table3 t
WHERE NOT EXISTS
(SELECT 1 FROM tbMaster m
WHERE m.CID = t.CID OR m.AMOUNT = t.AMOUNT OR m.DATEPAID = t.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = t.CID OR c.AMOUNT = t.AMOUNT OR c.DATEPAID = t.DATEPAID)

SQL - left join generate duplicates

I have code to select some applications but LEFT JOIN is creating duplicates.
Question: How to get rid of duplicates generated by
LEFT JOIN
attachments as att
ON
(a.ssn = att.ssn AND att.doc_type = 'id_copy' AND att.source = 'cpt3')
My Work:
I am considering SELECT DISTINCT or GROUP BY but, can't figure out where to put them.
This is the query:
SELECT
a.*, c.code, l.who, l.locked_time, att.id AS attnew,
( SELECT
COUNT(*)
FROM
applications as a2
WHERE
a2.approved = 'Y'
AND
a2.paid = 'Y'
AND
a2.paid_back = 'Y'
AND
a2.ssn = a.ssn
) AS previous_apps
FROM
applications as a
LEFT JOIN
locked_by as l
USING(id)
LEFT JOIN
campaign_codes as c
ON
c.id = a.campaign
LEFT JOIN
attachments as att
ON
(a.ssn = att.ssn AND att.doc_type = 'id_copy' AND att.source = 'cpt3')
WHERE
a.closed='N'
AND
a.paid = 'N'
ORDER BY
a.arrived_date
DESC
Use GROUP BY
to avoid duplicates. In your case:
...
WHERE
a.closed='N'
AND
a.paid = 'N'
GROUP BY
a.id
ORDER BY
a.arrived_date
DESC
If you want to make an one to one relation to avoid duplicates and att.id is the same for each att.ssn or you need only one (MAX,MIN,..ANY?) att.id. Try this:
LEFT JOIN
(SELECT ssn,
MAX(id) as id,
FROM attachments
WHERE doc_type = 'id_copy' AND source = 'cpt3'
GROUP BY ssn
)as att
ON
(a.ssn = att.ssn)

Select only the rows where column values appear more than once

I have a select statement similar to the following:
select *
from A
inner join B on A.id_x = B.id_x
inner join C on B.id_y = C.id_y
inner join D on C.id_z = D.id_z
where
A.date > '2014-01-01'
and A.id_y = 154
and D.id_t = 2
What I want is to do something like this and count(A.id_x) > 1, which returns only the parts of the original select which repeat on A.id_x.
Is this possible?
EDIT:
I just tried to solve it using temp tables, with the code I got from T-SQL Insert into table without having to specify every column
Select * Into
#tmpBigTable
From [YourBigTable]
But I got an error message because my tables have the same column names, A.id_x and B.id_x, for example.
"Column names in each table must be unique."
Is there some way to force the issue, or declare arbitrary naming extensions?
select *
from A
inner join B on A.id_x = B.id_x
inner join C on B.id_y = C.id_y
inner join D on C.id_z = D.id_z
where
A.date > '2014-01-01'
and A.id_y = 154
and D.id_t = 2
AND A.id_x IN
(
SELECT A.id_x FROM A
GROUP BY A.id_x
HAVING count(A.id_x)>1);
You can do this with window functions:
select *
from (select *, count(*) over (partition by A.id_x) as cnt
from A inner join
B
on A.id_x = B.id_x inner join
C
on B.id_y = C.id_y inner join
D
on C.id_z = D.id_z
where A.date > '2014-01-01' and A.id_y = 154 and D.id_t = 2
) abcd
where cnt > 1;

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA