access compare three tables - sql

Is there a way to triple check that the data is consistent between the three tables and show if there are any discrepancies in any of the three tables?
Each table is not completely the same but they all have an EmployeeID, amount, datepaid, and CID column.
When I did with two tables I thought I had the SQL as:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.CID
FROM tbMaster LEFT JOIN tbCID ON (tbMaster.CID = tbCID.CID) AND (tbMaster.Amount = tbCID.Amount) AND (tbMaster.DatePaid = tbCID.DatePaid)
WHERE (((tbMaster.[Advance/paid])="Paid Respondent") AND ((tbCID.CID) Is Null));
It worked but if the tbCID had any discrepancies it did not catch it...
So for the three tables I thought:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.[Advance/paid], tbMaster.CID
FROM tbMaster, tbCID, table3
WHERE tbMaster.CID <> tbCID.CID
OR table3.CID <> tbMaster.CID
OR table3.CID <> tbCID.CID
OR tbMaster.Amount <> tbCID.Amount
OR table3.AMOUNT <> tbMaster.AMOUNT
OR table3.AMOUNT <> tbCID.AMOUNT
OR tbMaster.Datepaid <> tbCID.Datepaid
OR table3.DATEPAID <> tbMaster.DATEPAID
OR table3.DATEPAID <> tbCID.DATEPAID
But there is only 30 entries yet I get 5x or 6x copies and get over 30000 rows/entries in the query...

You are doing a full cross join on your three tables, so for every combination of tbMaster row, tbCID row, and table3 row where the three don't match on the three columns, a row will be returned.
If you are trying to return a tbMaster row where the CID, Amount, and DatePaid do not appear in both tbSUID and table3, you need something similar to your first query:
SELECT tbMaster.EmployeeID, tbMaster.Amount, tbMaster.DatePaid, tbMaster.CID
FROM tbMaster
LEFT JOIN tbSUID ON (tbMaster.CID = tbSUID.CID) AND (tbMaster.Amount = tbSUID.Amount) AND (tbMaster.DatePaid = tbSUID.DatePaid)
LEFT JOIN table3 ON (tbMaster.CID = table3.CID) AND (tbMaster.Amount = table3.Amount) AND (tbMaster.DatePaid = table3.DatePaid)
WHERE ((tbMaster.[Advance/paid])="Paid Respondent")
AND ((tbCID.CID) Is Null) OR (table3.CID Is Null));
If the record has to appear in just one of tbSUID and table3, replace the last "OR" with "AND".

Consider subqueries in NOT EXISTS clauses:
SELECT m.EmployeeID, m.Amount, m.DatePaid, m.[Advance/paid], m.CID
FROM tbMaster m
WHERE NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = m.CID OR c.AMOUNT = m.AMOUNT OR c.DATEPAID = m.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = m.CID OR t.AMOUNT = m.AMOUNT OR t.DATEPAID = m.DATEPAID)
For inverse comparisons, consider adding union queries with a source_table indicator:
SELECT m.EmployeeID, m.Amount, m.DatePaid, m.[Advance/paid], m.CID,
'tbMaster' AS [source_table]
FROM tbMaster m
WHERE NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = m.CID OR c.AMOUNT = m.AMOUNT OR c.DATEPAID = m.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = m.CID OR t.AMOUNT = m.AMOUNT OR t.DATEPAID = m.DATEPAID)
UNION ALL
SELECT c.EmployeeID, c.Amount, c.DatePaid, c.[Advance/paid], c.CID,
'tbCID' AS [source_table]
FROM tbCID c
WHERE NOT EXISTS
(SELECT 1 FROM tbMaster m
WHERE m.CID = c.CID OR m.AMOUNT = c.AMOUNT OR m.DATEPAID = c.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM table3 t
WHERE t.CID = c.CID OR t.AMOUNT = c.AMOUNT OR t.DATEPAID = c.DATEPAID)
UNION ALL
SELECT t.EmployeeID, t.Amount, t.DatePaid, t.[Advance/paid], t.CID,
'table3' AS [source_table]
FROM table3 t
WHERE NOT EXISTS
(SELECT 1 FROM tbMaster m
WHERE m.CID = t.CID OR m.AMOUNT = t.AMOUNT OR m.DATEPAID = t.DATEPAID)
OR NOT EXISTS
(SELECT 1 FROM tbCID c
WHERE c.CID = t.CID OR c.AMOUNT = t.AMOUNT OR c.DATEPAID = t.DATEPAID)

Related

Pull a separate column that matches the (min) of an aggregate function

It works well so far but I am stumped from here as I am brand new to this. This query finds the closest distance match, pairing up every item in the "FAILED" folder against everything that isn't in the "FAILED" folder.
There is a column "RouteID" in the "table p" that I want to match up with the min() aggregate.
I cannot process how to make the SELECT query simply show the associated "RouteID" column from tbl p but ultimately, I want to turn this into an update query that will SET a.Route = p.Route that is associated with the min()
Any help would be appreciated.
SELECT a.name, a.Reference1,
MIN(round(ACOS(COS(RADIANS(90-a.lat))
*COS(RADIANS(90-p.latpoint))
+SIN(RADIANS(90-a.lat))
*SIN(RADIANS(90-p.latpoint))
*COS(RADIANS(a.lon-p.longpoint)))
*3958.756,2)) AS 'DISTANCE'
FROM tblOrder AS a WITH (NOLOCK)
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrder b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS p ON 1=1
WHERE a.CustID = 180016
AND a.RouteID = 'FAILED'
AND a.StopType = 1
AND P.RouteID <> 'FAILED'
GROUP BY
a.name, a.Reference1
You can select them separately and then join them
SELECT c.name, c.Reference1, q.RouteID
FROM
(
SELECT a.name, a.Reference1,
MIN(round(ACOS(COS(RADIANS(90-a.lat))
*COS(RADIANS(90-p.latpoint))
+SIN(RADIANS(90-a.lat))
*SIN(RADIANS(90-p.latpoint))
*COS(RADIANS(a.lon-p.longpoint)))
*3958.756,2)) AS 'DISTANCE'
FROM tblOrder AS a WITH (NOLOCK)
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrder b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS p ON 1=1
WHERE a.CustID = 180016
AND a.RouteID = 'FAILED'
AND a.StopType = 1
AND P.RouteID <> 'FAILED'
GROUP BY
a.name, a.Reference1
) c
LEFT JOIN
(
SELECT b.lat AS latpoint, b.lon AS longpoint,
b.Sequence, b.routeid
from tblOrderRouteStops b WITH (NOLOCK)
WHERE b.CUSTID = 180016
AND b.routeID <> 'FAILED'
AND b.StopType = 1
) AS q
ON q.routeID = c.DISTANCE

Convert Exist condition to Join with T-SQL

I am trying to convert the following T-SQL Select query to exclude "Exists" Clause and Include "Join" Clause. but i am ending up not getting the right result. can some one from this expert team help me with some tips.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
AND EXISTS ( SELECT 1 FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
WHERE eedeeid = xeeid AND eedcoid = xcoid )
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate
The algorithm to convert EXISTS to JOIN is very simple.
Instead of
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE A.Foo = B.Foo)
Use
FROM A
INNER JOIN (SELECT DISTINCT Foo
FROM B) AS B
ON A.Foo = B.Foo
But the first one probably will be optimised better
Interesting request.
select *
FROM HRData
INNER JOIN (
SELECT eeceeid, MIN(eecdateoftermination) eTermDate
FROM dbo.empcomp
INNER JOIN
(
SELECT xeeid FROM HRData_EEList
INNER JOIN dbo.empcomp t ON xeeid = eeceeid AND xcoid = eeccoid
INNER JOIN
( SELECT DISTINCT xeeid, xcoid FROM dbo.empded
INNER JOIN dbo.dedcode on deddedcode = eeddedcode AND deddedtype = 'MED' AND (eedbenstopdate IS NULL OR eedbenstopdate > '12/31/2005')
-- WHERE eedeeid = xeeid AND eedcoid = xcoid
) AS A ON xeeid = A.xeeid AND eedcoid = A.eedcoid
WHERE eecemplstatus = 'T' AND eectermreason <> 'TRO' AND eeccoid <> 'WAON6'
GROUP BY xeeid
HAVING COUNT(*) > 1) Term ON xeeid = eeceeid
group by eeceeid
) Terms ON eeid = eeceeid AND Termdate = eTermDate
Another method of converting an exists to a join is to use a ROW_NUMBER() in the subselect to assist in removing duplicates.
EXISTS:
FROM A
WHERE EXISTS (SELECT *
FROM B
WHERE B.Condition = 'true' AND A.Foo = B.Foo)
JOIN:
FROM A
JOIN (SELECT B.Foo, ROW_NUMBER() OVER (PARTITION BY B.Foo ORDER BY B.Foo) RN
FROM B
WHERE B.Condition = 'true') DT
ON A.Foo = DT.Foo AND DT.RN = 1
The ORDER BY is totally arbitrary since you don't care which record it selects, but it's required. You may be able to use (SELECT NULL) instead.

Using a group by to group a select statement

Using a group by to group a select stament
SELECT
k.Ivalue, k.JOBDESCRIPTION ,
count( k.Ivalue) as TOTAL
FROM
(SELECT
a."ID" as Ivalue, b."JOBDESCRIPTION", rq."CURRENTSTATUS"
FROM
tblG2o_Requests a
INNER JOIN
tblG2o_JOBS b ON a."JOBPOSTID" = b."ID"
INNER JOIN
(SELECT
r.REQUESTID, ir."CURRENTSTATUS"
FROM
TBLG2O_RESULTSPOOL r
INNER JOIN
tblG2o_Requests ir ON r.RequestID = ir."ID"
WHERE
r.ShortListed = '1') rq ON rq.REQUESTID = a."ID"
WHERE
"ACTIVE" = '1'
AND "DATECOMPLETED" IS NULL
ORDER BY
"REQUESTDATE" DESC) k
GROUP BY
k.JOBDESCRIPTION
What is the question? You seem to be missing the group by clause, and you do not need double quotes around field names unless you have spaces in them, and even then, if TSQL for example, you would use [] in preference.
I had to remove an ORDER BY in the subquery, that isn't allowed unless other conditions demand it (like TOP n in TSQL)
SELECT
k.Ivalue
, k.JOBDESCRIPTION
, COUNT(k.Ivalue) AS TOTAL
FROM (
SELECT
a.ID AS Ivalue
, b.JOBDESCRIPTION
, rq.CURRENTSTATUS
FROM tblG2o_Requests a
INNER JOIN tblG2o_JOBS b
ON a.JOBPOSTID = b.ID
INNER JOIN (
SELECT
r.REQUESTID
, ir.CURRENTSTATUS
FROM TBLG2O_RESULTSPOOL r
INNER JOIN tblG2o_Requests ir
ON r.RequestID = ir.ID
WHERE r.ShortListed = '1'
) rqenter
ON rq.REQUESTID = a.ID
WHERE ACTIVE = '1'
AND DATECOMPLETED IS NULL
) k
GROUP BY
k.Ivalue
, k.JOBDESCRIPTION
Finally worked
SELECT
k.Ivalue
, l.JOBDESCRIPTION
, k.TOTAL,
k.CURRENTSTATUS
FROM (
SELECT
a.ID AS Ivalue
,b.ID as JobPostID
, rq."CURRENTSTATUS"
,COUNT(a.ID) AS TOTAL
FROM tblG2o_Requests a
INNER JOIN tblG2o_JOBS b
ON a."JOBPOSTID" = b.ID
INNER JOIN (
SELECT
r."REQUESTID"
, ir."CURRENTSTATUS"
FROM TBLG2O_RESULTSPOOL r
INNER JOIN tblG2o_Requests ir
ON r."REQUESTID" = ir.ID
WHERE r."SHORTLISTED" = 1
) rq
ON rq."REQUESTID" = a.ID
WHERE ACTIVE = '1'
AND DATECOMPLETED IS NULL
GROUP BY
a.ID ,b.ID
, rq."CURRENTSTATUS" ) k
inner join tblG2o_JOBS l on k.JobPostID =l.ID
enter code here

SQL - left join generate duplicates

I have code to select some applications but LEFT JOIN is creating duplicates.
Question: How to get rid of duplicates generated by
LEFT JOIN
attachments as att
ON
(a.ssn = att.ssn AND att.doc_type = 'id_copy' AND att.source = 'cpt3')
My Work:
I am considering SELECT DISTINCT or GROUP BY but, can't figure out where to put them.
This is the query:
SELECT
a.*, c.code, l.who, l.locked_time, att.id AS attnew,
( SELECT
COUNT(*)
FROM
applications as a2
WHERE
a2.approved = 'Y'
AND
a2.paid = 'Y'
AND
a2.paid_back = 'Y'
AND
a2.ssn = a.ssn
) AS previous_apps
FROM
applications as a
LEFT JOIN
locked_by as l
USING(id)
LEFT JOIN
campaign_codes as c
ON
c.id = a.campaign
LEFT JOIN
attachments as att
ON
(a.ssn = att.ssn AND att.doc_type = 'id_copy' AND att.source = 'cpt3')
WHERE
a.closed='N'
AND
a.paid = 'N'
ORDER BY
a.arrived_date
DESC
Use GROUP BY
to avoid duplicates. In your case:
...
WHERE
a.closed='N'
AND
a.paid = 'N'
GROUP BY
a.id
ORDER BY
a.arrived_date
DESC
If you want to make an one to one relation to avoid duplicates and att.id is the same for each att.ssn or you need only one (MAX,MIN,..ANY?) att.id. Try this:
LEFT JOIN
(SELECT ssn,
MAX(id) as id,
FROM attachments
WHERE doc_type = 'id_copy' AND source = 'cpt3'
GROUP BY ssn
)as att
ON
(a.ssn = att.ssn)

Query for logistic regression, multiple where exists

A logistic regression is a composed of a uniquely identifying number, followed by multiple binary variables (always 1 or 0) based on whether or not a person meets certain criteria. Below I have a query that lists several of these binary conditions. With only four such criteria the query takes a little longer to run than what I would think. Is there a more efficient approach than below? Note. tblicd is a large table lookup table with text representations of 15k+ rows. The query makes no real sense, just a proof of concept. I have the proper indexes on my composite keys.
select patient.patientid
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1000
)
then '1' else '0'
end as moreThan1000
,case when exists
(
select c.patientid from tblclaims as c
inner join patient as p on p.patientid=c.patientid
and c.admissiondate = p.admissiondate
and c.dischargedate = p.dischargedate
where patient.patientid = p.patientid
group by c.patientid
having count(*) > 1500
)
then '1' else '0'
end as moreThan1500
,case when exists
(
select distinct picd.patientid from patienticd as picd
inner join patient as p on p.patientid= picd.patientid
and picd.admissiondate = p.admissiondate
and picd.dischargedate = p.dischargedate
inner join tblicd as t on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%' and patient.patientid = picd.patientid
)
then '1' else '0'
end as diabetes
,case when exists
(
select r.patientid, count(*) from patient as r
where r.patientid = patient.patientid
group by r.patientid
having count(*) >1
)
then '1' else '0'
end
from patient
order by moreThan1000 desc
I would start by using subqueries in the from clause:
select q.patientid, moreThan1000, moreThan1500,
(case when d.patientid is not null then 1 else 0 end),
(case when pc.patientid is not null then 1 else 0 end)
from patient p left outer join
(select c.patientid,
(case when count(*) > 1000 then 1 else 0 end) as moreThan1000,
(case when count(*) > 1500 then 1 else 0 end) as moreThan1500
from tblclaims as c inner join
patient as p
on p.patientid=c.patientid and
c.admissiondate = p.admissiondate and
c.dischargedate = p.dischargedate
group by c.patientid
) q
on p.patientid = q.patientid left outer join
(select distinct picd.patientid
from patienticd as picd inner join
patient as p
on p.patientid= picd.patientid and
picd.admissiondate = p.admissiondate and
picd.dischargedate = p.dischargedate inner join
tblicd as t
on t.icd_id = picd.icd_id
where t.descrip like '%diabetes%'
) d
on p.patientid = d.patientid left outer join
(select r.patientid, count(*) as cnt
from patient as r
group by r.patientid
having count(*) >1
) pc
on p.patientid = pc.patientid
order by 2 desc
You can then probably simplify these subqueries more by combining them (for instance "p" and "pc" on the outer query can be combined into one). However, without the correlated subqueries, SQL Server should find it easier to optimize the queries.
Example of left joins as requested...
SELECT
patientid,
ISNULL(CondA.ConditionA,0) as IsConditionA,
ISNULL(CondB.ConditionB,0) as IsConditionB,
....
FROM
patient
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionA from ... where ... ) CondA
ON patient.patientid = CondA.patientID
LEFT JOIN
(SELECT DISTINCT patientid, 1 as ConditionB from ... where ... ) CondB
ON patient.patientid = CondB.patientID
If your Condition queries only return a maximum one row, you can simplify them down to
(SELECT patientid, 1 as ConditionA from ... where ... ) CondA