Finding unique matches in 2 separate databases - sql

I have 2 databases that have the same structure, but different data. Both are SQL 2005.
I am trying to find which of the Persons in Database A exist in Database B. My best option for a match is to match on FirstName and LastName.
I only want to bring back a list of:
DatabaseA.Person
DatabaseB.Person
Where:
1. I want all records from DatabaseA, even if there is not a match in Database B.
2. I only want records from DatabaseB where the FirstName/LastName match only one record in DatabaseB.
I have written a query, where I group by, but since I need to see more data than FirstName and LastName, I cannot bring it back without grouping it - which gives me many duplicates. What kind of query should I be using? Do I need to use a cursor?
Here is my query now, which sort of works - except I'm getting results for duplicates in DatabaseB and all I want to know about Database B is when FirstName/LastName matches to one distinct record and no others. My objective is to get a list of people that I know are the same person in 2 databases so that I can build a dictionary list of department code mappings between employees.
select
count(DatabaseAEmployee.id) as matchcount
, DatabaseAPerson.id as DatabaseAPersonid
, DatabaseAEmployee.DeptCode DatabaseADeptCode
, DatabaseAPerson.firstname as DatabaseAfirst
, DatabaseAPerson.lastname as DatabaseAlast
, DatabaseBPerson.id as DatabaseBPersonid
, DatabaseBEmployee.DeptCode as DatabaseBDeptCode
, DatabaseBPerson.firstname as DatabaseBfirst
, DatabaseBPerson.lastname as DatabaseBlast
, DatabaseAPerson.ssn as DatabaseAssn
, DatabaseBPerson.ssn as DatabaseBssn
, DatabaseAPerson.dateofbirth as DatabaseAdob
, DatabaseBPerson.dateofbirth as DatabaseBdob
FROM [DatabaseA].[dbo].Employee DatabaseAEmployee
LEFT OUTER JOIN [DatabaseA].[dbo].Person DatabaseAPerson
ON DatabaseAPerson.id = DatabaseAEmployee.id
LEFT OUTER JOIN [DatabaseB].[dbo].Person DatabaseBPerson
ON
DatabaseAPerson.firstname = DatabaseBPerson.firstname
AND
DatabaseAPerson.lastname = DatabaseBPerson.lastname
LEFT OUTER JOIN [DatabaseB].[dbo].Employee DatabaseBEmployee
on DatabaseBEmployee.id = DatabaseBPerson.id
group by
DatabaseAPerson.firstname
, DatabaseAPerson.lastname
, DatabaseAPerson.id
, DatabaseAEmployee.DeptCode
, DatabaseBPerson.id
, DatabaseBEmployee.DeptCode
, DatabaseBPerson.firstname
, DatabaseBPerson.lastname
, DatabaseBPerson.ssn
, DatabaseAPerson.ssn
, DatabaseBPerson.dateofbirth
, DatabaseAPerson.dateofbirth
Here's what I'm trying now, but I'm getting duplicates on the left side:
with UniqueMatchedPersons (Id, FirstName, LastName)
as (
select
p2.ID, p2.FirstName, p2.LastName
from
[DatabaseA].[dbo].[Employee] p1
INNER JOIN [DatabaseA].[dbo].[Person] p2 on p1.id = p2.id
inner join [DatabaseB].[dbo].[Person] p3
on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName
INNER JOIN [DatabaseB].[dbo].[Employee] p4
on p3.id = p4.id
group by p2.ID, p2.FirstName, p2.LastName
having count(p2.ID) = 1
)
select p1.*, p2.*
from DatabaseA.dbo.Person p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join DatabaseB.dbo.Person p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

Try this:
SELECT id,FirstName,Lastname
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName
FROM dbb.Persons as b
INNER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
If you want to get all from A and only those from B that DON'T have a match (which would make more sense to me), I'd use this:
SELECT id,FirstName,Lastname
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName
FROM dbb.Persons as b
LEFT OUTER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
WHERE a.id is null

Try something like:
Select dta.LastName, dta.FirstName, dta.[otherColumns], dtb.LastName, dtb.FirstName,
dtb.[otherColumns]
From [databaseA].[table] as dta
LEFT OUTER JOIN [databaseB].[table] as dtb
on dta.Lastname = dtb.LastName and dta.FirstName = dtb.FirstName
That should get you: 1) everyone in table A, and 2) everyone in table B who has a Lastname/Firstname match in table A.

Works on SQL Server (at least it should):
SELECT
A.*
, B.*
FROM
DatabaseA.dbo.Person A
LEFT JOIN DatabaseB.dbo.Person B
ON A.FirstName = B.FirstName AND A.LastName = B.LastName
Edit: You mention that you receive duplicates from DatabaseB when you only need the match on first and last name. The problem is that you also request other data (beyond first/last name). If you want distinct data, then request only that data.

Using transact-sql, the following untested query should allow you to view unique matches only:
select
p1.ID, p1.FirstName, p1.LastName
from
[DatabaseA].[dbo].[Persons] p1
left outer join [DatabaseB].[dbo].[Persons] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
group by p1.ID, p1.FirstName, p1.LastName
having count(p1.ID) = 1
If using SQL Server, this can then be encapsulated within a common table expression, which you can then join against.
with UniqueMatchedPersons (Id, FirstName, LastName)
as (
--query in previous code snippet
)
select persons.*
from Persons
inner join UniqueMatchedPersons on Persons.ID = UniqueMatchedPersons.ID
Update:
If you wish to select fields from both tables, you can simply respecify the original join condition that evaluated name matching before; this is because duplicated matches on the left hand side of the join have been filtered out by the having aggregate condition.
Modifying the select portion of the above snippet to read the following will allow you to select fields from either side of the join:
select p1.*, p2.*
from [DatabaseA].[dbo].[Persons] p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join [DatabaseB].[dbo].[Persons] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
Update 2:
To filter out duplicates on the left hand side (which will also cause duplicates on the right) you'll have to remove the grouping on [DatabaseA].[dbo].[Persons].[ID].
When I refer to duplicates, I mean names in adjacent rows that are identical in terms of characters and padding. If you have diacritic variations of first and last names, then the results of the name comparison will be subject to the database collation (unless you explicitly declare a collation on a join expression). Likewise, if you have variations in spacing, padding or punctuation between names, you may have to consider a different approach than a direct equality operator for name matching.
Try the following:
with UniqueMatchedPersons (FirstName, LastName)
as (
select
p1.FirstName, p1.LastName
from
[DatabaseA].[dbo].[Person] p1
left outer join [DatabaseB].[dbo].[Person] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
group by p1.FirstName, p1.LastName
having count(p1.FirstName) = 1
)
select p1.*, p2.*, e1.*, e2.*
from [DatabaseA].[dbo].[Person] p1
inner join UniqueMatchedPersons ump
on p1.FirstName = ump.FirstName and p1.LastName = ump.LastName
left outer join [DatabaseB].[dbo].[Person] p2
on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
inner join [DatabaseA].[dbo].[Employee] e1 on p1.ID = e1.ID
inner join [DatabaseB].[dbo].[Employee] e2 on e2.ID = p2.ID
order by p1.id asc
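
One way to read the original requirement (all rows from A, and a B row only when the name occurs exactly once in B) is to reduce B to its unique names first and then left join A to that reduced set. The following is a minimal sketch of that idea, not the approach of any answer above. It uses Python's built-in sqlite3 as a stand-in for SQL Server, and the tables person_a and person_b with their sample rows are hypothetical.

```python
import sqlite3

def unique_name_matches(conn):
    """Return every A person, plus the B id only when the name is unique in B."""
    sql = """
        SELECT a.id, a.first_name, a.last_name, b.id
        FROM person_a AS a
        LEFT JOIN (
            -- keep only names that occur exactly once in B;
            -- MIN(id) is then simply the id of that single row
            SELECT first_name, last_name, MIN(id) AS id
            FROM person_b
            GROUP BY first_name, last_name
            HAVING COUNT(*) = 1
        ) AS b
          ON b.first_name = a.first_name AND b.last_name = a.last_name
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

# Hypothetical sample data: 'Bob Ray' appears twice in B, 'Cy Fox' not at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_a (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE person_b (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO person_a VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO person_b VALUES (10, 'Ann', 'Lee'), (20, 'Bob', 'Ray'), (21, 'Bob', 'Ray');
""")
rows = unique_name_matches(conn)
```

Only Ann Lee gets a B id here; the other two A rows come back with NULL on the B side. If a name is also duplicated within A, each of those A rows would pair with the same B row, so a similar uniqueness filter on A may be needed before treating the pair as the same person.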

Related

Access Subquery On multiple conditions

This SQL query needs to be done in ACCESS.
I am trying to do a subquery on the total sales, but I want to link the sale to the province AND to the product. The query below will work with one or the other: (po.product_name = allp.all_products) AND (p.province = allp.all_province); -- but it will not take both.
I will be including every month in this query, once I can figure out the subquery with two criteria.
Select
p.province as [Province],
po.product_name as [Product],
all_price
FROM
(purchase_order po
INNER JOIN person p
on p.person_id = po.person_id)
left join
(
select
po1.product_name AS [all_products],
sum(pp1.price) AS [all_price],
p1.province AS [all_province]
from (purchase_order po1
INNER JOIN product pp1
on po1.product_name = pp1.product_name)
INNER JOIN person p1
on po1.person_id = p1.person_id
group by po1.product_name, pp1.price, p1.province
)
as allp
on (po.product_name = allp.all_products) AND (p.province = allp.all_province);
Make the first select sql into a table by giving it an alias and join table 1 to table 2. I don't have your table structure or data to test it but I think this will lead you down the right path:
select table1.*, table2.*
from
(Select
p.province as [Province],
po.product_name as [Product]
--removed this ,all_price
FROM
(purchase_order po
INNER JOIN person p
on p.person_id = po.person_id)) as table1
left join
(
select
po1.product_name AS [all_products],
sum(pp1.price) AS [all_price],
p1.province AS [all_province]
from (purchase_order po1
INNER JOIN product pp1
on po1.product_name = pp1.product_name)
INNER JOIN person p1
on po1.person_id = p1.person_id
group by po1.product_name, pp1.price, p1.province --check your group by, I dont think you want pp1.price here if you want to aggregate
) as table2 --changed from allp
on (table1.product = table2.all_products) AND (table1.province = table2.all_province);
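
The shape of that suggestion, two derived tables joined on both product and province, can be tried out as follows. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 rather than on Access, the sample rows are invented, the DISTINCT in the first derived table is my own addition to get one row per province/product pair, and price is left out of the GROUP BY so that SUM actually aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, province TEXT);
    CREATE TABLE product (product_name TEXT PRIMARY KEY, price REAL);
    CREATE TABLE purchase_order (person_id INTEGER, product_name TEXT);
    INSERT INTO person VALUES (1, 'ON'), (2, 'ON'), (3, 'BC');
    INSERT INTO product VALUES ('tea', 2.0), ('mug', 5.0);
    INSERT INTO purchase_order VALUES (1, 'tea'), (2, 'tea'), (3, 'tea'), (3, 'mug');
""")

def totals_by_province_and_product(conn):
    """Join the order list to per-(province, product) totals on both keys."""
    sql = """
        SELECT t1.province, t1.product_name, t2.all_price
        FROM (
            SELECT DISTINCT p.province, po.product_name
            FROM purchase_order AS po
            INNER JOIN person AS p ON p.person_id = po.person_id
        ) AS t1
        LEFT JOIN (
            -- group by the two join keys only, not by price
            SELECT p1.province, po1.product_name, SUM(pp1.price) AS all_price
            FROM purchase_order AS po1
            INNER JOIN product AS pp1 ON pp1.product_name = po1.product_name
            INNER JOIN person AS p1 ON p1.person_id = po1.person_id
            GROUP BY p1.province, po1.product_name
        ) AS t2
          ON t2.product_name = t1.product_name AND t2.province = t1.province
        ORDER BY t1.province, t1.product_name
    """
    return conn.execute(sql).fetchall()

totals = totals_by_province_and_product(conn)
```

With this data, the two tea orders in 'ON' are summed into a single total, while 'BC' gets one row per product.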

PostgreSQL select multiple columns of a table that is connected via a many-to-many pivot

I have this query:
SELECT
a.account_uuid,
a.account_no,
a.account_group_uuid,
a.account_scope_uuid,
a.created_at,
a.deleted_at,
s.service_uuid,
s.status,
st.service_type,
(
SELECT
c.company
FROM companies c
WHERE a.company_owner_uuid = c.company_uuid
)
FROM
accounts a
LEFT JOIN
services s
ON a.account_uuid = s.account_uuid
LEFT JOIN
service_types st
ON s.service_type_uuid = st.service_type_uuid
WHERE
a.deleted_at IS NULL
ORDER BY
a.account_no
And I need to join and select multiple columns from a people table by way of a pivot table accounts_contacts that would have the account_uuid and a person_uuid. There are also is_primary and is_active columns on the accounts_contacts table and there will only be one primary at a time, so the end result would be a single first and last name. This is the idea of the query:
SELECT
p.first_name, p.last_name
FROM
people p
INNER JOIN
accounts_contacts ac
ON ac.account_uuid = a.account_uuid
AND ac.person_uuid = p.person_uuid
WHERE
ac.is_primary = true
AND ac.is_active = true
But not sure how to fit it into the above query. A subquery would only allow for one of the columns.
accounts_contacts is an "association" or "junction" table. It is not a pivot table.
The basic idea should be joins:
SELECT . . . ,
p.first_name, p.last_name
FROM accounts a LEFT JOIN
services s
ON a.account_uuid = s.account_uuid LEFT JOIN
service_types st
ON s.service_type_uuid = st.service_type_uuid LEFT JOIN
accounts_contacts ac
ON ac.account_uuid = a.account_uuid AND
ac.is_primary = true AND
ac.is_active = true LEFT JOIN
people p
ON ac.person_uuid = p.person_uuid
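
To see the effect of filtering the junction table inside its own ON clause, here is a small sketch. It assumes SQLite (via Python's sqlite3) in place of PostgreSQL, so 1 and 0 stand in for true and false, and the sample accounts and people are made up. Account 100 has one primary and one non-primary contact; account 200 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_uuid TEXT PRIMARY KEY, account_no INTEGER);
    CREATE TABLE people (person_uuid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE accounts_contacts (
        account_uuid TEXT, person_uuid TEXT, is_primary INTEGER, is_active INTEGER
    );
    INSERT INTO accounts VALUES ('a1', 100), ('a2', 200);
    INSERT INTO people VALUES ('p1', 'Ada', 'Ng'), ('p2', 'Max', 'Roy');
    INSERT INTO accounts_contacts VALUES ('a1', 'p1', 1, 1), ('a1', 'p2', 0, 1);
""")

def accounts_with_primary_contact(conn):
    """One row per account, with the active primary contact's name or NULLs."""
    sql = """
        SELECT a.account_no, p.first_name, p.last_name
        FROM accounts AS a
        LEFT JOIN accounts_contacts AS ac
               ON ac.account_uuid = a.account_uuid
              AND ac.is_primary = 1   -- filter the junction rows in the ON clause
              AND ac.is_active = 1
        LEFT JOIN people AS p
               ON p.person_uuid = ac.person_uuid
        ORDER BY a.account_no
    """
    return conn.execute(sql).fetchall()

contacts = accounts_with_primary_contact(conn)
```

Because the non-primary junction row is discarded before people is joined, account 100 appears once, and account 200 is kept with NULL names.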

How can I join 3 tables, limit the query based on a condition, and return empty columns if condition not met?

I have the following query currently:
select * from people
LEFT JOIN addresses
ON people.id = addresses.id
LEFT JOIN pers
ON people.id = pers.pers_id
WHERE people.id =:id
AND addresses.is_primary = 'Y'
Of course if there is no address where is_primary = 'Y' for a given person, the query doesn't return any results.
Without is_primary='Y', the query returns multiple addresses.
Is there any way, instead, to return null columns for all of the address fields in the event where there is no record for the id where is_primary = 'Y'?
You can do something like this -
select *
from people
LEFT JOIN addresses
ON people.pidm = addresses.pidm
and addresses.is_primary = 'Y'
RIGHT JOIN pers
ON people.id = pers.pers_id
WHERE people.id = :id
Use CASE WHEN:
select *,case when addresses.is_primary not in('Y') then 'primary address different' else addresses.is_primary end as is_primary from people
LEFT JOIN addresses
ON people.pidm = addresses.pidm
RIGHT JOIN pers
ON people.id = pers.pers_id
WHERE people.id =:id
I strongly recommend that you not mix left join and right join. The query is just so hard to follow.
Instead, start with the table where you want to keep all the rows. Then only use left join. Sometimes, you may need to put conditions in the on clause.
In your case:
select *
from people p left join
addresses a
on p.pidm = a.pidm and a.is_primary = 'Y' left join
pers
on p.id = pers.pers_id
where p.id = :id;
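
The difference between filtering in WHERE and filtering in ON can be checked with a small experiment. The sketch below assumes SQLite through Python's sqlite3, with simplified hypothetical tables (people and addresses linked by person_id); person 2 has an address, but not a primary one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (person_id INTEGER, city TEXT, is_primary TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO addresses VALUES (1, 'Oslo', 'Y'), (1, 'Rome', 'N'), (2, 'Lima', 'N');
""")

def filter_in_where(conn, person_id):
    # The WHERE clause runs after the join, so it discards the NULL-extended row.
    sql = """
        SELECT p.id, a.city
        FROM people AS p
        LEFT JOIN addresses AS a ON a.person_id = p.id
        WHERE p.id = ? AND a.is_primary = 'Y'
    """
    return conn.execute(sql, (person_id,)).fetchall()

def filter_in_on(conn, person_id):
    # The ON clause only limits which addresses match; the person row survives.
    sql = """
        SELECT p.id, a.city
        FROM people AS p
        LEFT JOIN addresses AS a
               ON a.person_id = p.id AND a.is_primary = 'Y'
        WHERE p.id = ?
    """
    return conn.execute(sql, (person_id,)).fetchall()

where_rows = filter_in_where(conn, 2)
on_rows = filter_in_on(conn, 2)
primary_rows = filter_in_on(conn, 1)
```

For person 2 the WHERE version returns nothing, while the ON version returns the person with a NULL city; for person 1 the ON version returns only the primary address.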

Join query with not equal

I want to get all unpaid male customers, i.e. those who are not in any plan:
SELECT cr.id, cr.type FROM mydb.customer cr
JOIN mydb.plan1 p1 on cr.id != p1.id
JOIN mydb.plan2 p2 on cr.id != p2.id
JOIN mydb.plan3 p3 on cr.id != p3.id
WHERE cr.type = 'male'
Is this query correct?
You could use a series of three left joins along with IS NULL:
SELECT cr.id, cr.type
FROM mydb.customer cr
LEFT JOIN mydb.plan1 p1
ON cr.id = p1.id
LEFT JOIN mydb.plan2 p2
ON cr.id = p2.id
LEFT JOIN mydb.plan3 p3
ON cr.id = p3.id
WHERE p1.id IS NULL AND p2.id IS NULL AND p3.id IS NULL AND
cr.type = 'male'
Since all you seem to need is the id, EXCEPT should be a good choice here:
SELECT id FROM mydb.customer WHERE type = 'male'
EXCEPT ALL SELECT id FROM mydb.plan1
EXCEPT ALL SELECT id FROM mydb.plan2
EXCEPT ALL SELECT id FROM mydb.plan3;
To be precise: EXCEPT ALL:
Using EXCEPT clause in PostgreSQL
Basic techniques:
Select rows which are not present in other table
Multiple joins may not perform as fast if each table can have multiple related rows due to multiplication of rows in the intermediary derived table. Just test performance with EXPLAIN ANALYZE.
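
To illustrate why joining on != does not answer the question, the sketch below compares it with NOT EXISTS, which is another common anti-join form alongside the two shown above. It assumes SQLite through Python's sqlite3 and invented sample ids; plan1 deliberately holds two rows so the problem with != becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE plan1 (id INTEGER);
    CREATE TABLE plan2 (id INTEGER);
    CREATE TABLE plan3 (id INTEGER);
    INSERT INTO customer VALUES (1, 'male'), (2, 'male'), (3, 'female'), (4, 'male');
    INSERT INTO plan1 VALUES (1), (5);
    INSERT INTO plan2 VALUES (3);
    INSERT INTO plan3 VALUES (4);
""")

def not_equal_join(conn):
    # Pairs each customer with every plan row that has a DIFFERENT id,
    # which is not the same as "has no plan row".
    sql = """
        SELECT cr.id
        FROM customer AS cr
        JOIN plan1 AS p1 ON cr.id != p1.id
        JOIN plan2 AS p2 ON cr.id != p2.id
        JOIN plan3 AS p3 ON cr.id != p3.id
        WHERE cr.type = 'male'
        ORDER BY cr.id
    """
    return conn.execute(sql).fetchall()

def not_in_any_plan(conn):
    # Anti-join: keep a customer only if no plan table has a row with the same id.
    sql = """
        SELECT cr.id
        FROM customer AS cr
        WHERE cr.type = 'male'
          AND NOT EXISTS (SELECT 1 FROM plan1 AS p WHERE p.id = cr.id)
          AND NOT EXISTS (SELECT 1 FROM plan2 AS p WHERE p.id = cr.id)
          AND NOT EXISTS (SELECT 1 FROM plan3 AS p WHERE p.id = cr.id)
        ORDER BY cr.id
    """
    return conn.execute(sql).fetchall()

wrong = not_equal_join(conn)
right = not_in_any_plan(conn)
```

With this data the != version returns customer 1 even though that customer is in plan1, and it returns customer 2 twice; the NOT EXISTS version returns only customer 2.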

Extra row to SELECT statement, crosstab? Access 2007

I'm working with some biostats people and of course they love SAS. I have a select statement below that works for testing the presence of certain problems a person can have. It's a binary thing so they either do or they don't. If a person has a heart problem and a respiratory problem, then their patientID will be listed twice. How can I add an extra column of a 1 or 0 for every morbidity? So, if I have three problems and they are "HEART", "LUNG" and "UTI", an extra column would be generated for each one, holding a 1 or 0 depending on whether the person had that problem or not.
I suppose I can use Excel to make it a crosstab, but eventually it will need to be in that format. Below is my SELECT statement. Thanks, folks!
EDITED:
TRANSFORM First(Person.PersonID) AS Morbidity
SELECT Person.PersonID, Person.Age, Person.Sex
FROM tblKentuckyCounties INNER JOIN ((tblComorbidity INNER JOIN comorbidVisits ON tblComorbidity.ID = comorbidVisits.comorbidFK) INNER JOIN (Person INNER JOIN tblComorbidityPerson ON Person.PersonID = tblComorbidityPerson.personID) ON tblComorbidity.ID = tblComorbidityPerson.comorbidityFK) ON tblKentuckyCounties.ID = Person.County
WHERE (((tblComorbidity.comorbidityexplanation)="anxiety and depression" Or (tblComorbidity.comorbidityexplanation)="heart" Or (tblComorbidity.comorbidityexplanation)="hypertension" Or (tblComorbidity.comorbidityexplanation)="pressure sores" Or (tblComorbidity.comorbidityexplanation)="tobacco" Or (tblComorbidity.comorbidityexplanation)="uti"))
GROUP BY Person.PersonID, Person.Age, Person.Sex, tblComorbidity.comorbidityexplanation
PIVOT Person.Race;
This is not tested:
TRANSFORM IIf(Count(c.comorbidityexplanation) > 0, 1, 0) AS Morbidity
SELECT p.PersonID, p.Age, p.Sex, p.Race
FROM tblKentuckyCounties kc
INNER JOIN ((tblComorbidity c
INNER JOIN comorbidVisits cv
ON c.ID = cv.comorbidFK)
INNER JOIN (Person p
INNER JOIN tblComorbidityPerson cp
ON p.PersonID = cp.personID)
ON c.ID = cp.comorbidityFK)
ON kc.ID = p.County
GROUP BY p.PersonID, p.Age, p.Sex, p.Race
PIVOT c.comorbidityexplanation
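
Outside Access, the same one-column-per-morbidity layout is often built with conditional aggregation instead of TRANSFORM/PIVOT. The following is a sketch of that alternative, assuming SQLite through Python's sqlite3, a simplified hypothetical schema (person and person_morbidity), and a fixed list of three morbidities; in Access, IIf would take the place of CASE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE person_morbidity (person_id INTEGER, morbidity TEXT);
    INSERT INTO person VALUES (1, 70), (2, 64), (3, 58);
    INSERT INTO person_morbidity VALUES (1, 'heart'), (1, 'uti'), (2, 'lung');
""")

def morbidity_flags(conn):
    """One row per person with a 1/0 flag column for each morbidity."""
    sql = """
        SELECT p.person_id,
               MAX(CASE WHEN m.morbidity = 'heart' THEN 1 ELSE 0 END) AS heart,
               MAX(CASE WHEN m.morbidity = 'lung'  THEN 1 ELSE 0 END) AS lung,
               MAX(CASE WHEN m.morbidity = 'uti'   THEN 1 ELSE 0 END) AS uti
        FROM person AS p
        -- LEFT JOIN keeps people with no recorded morbidity (all flags 0)
        LEFT JOIN person_morbidity AS m ON m.person_id = p.person_id
        GROUP BY p.person_id
        ORDER BY p.person_id
    """
    return conn.execute(sql).fetchall()

flags = morbidity_flags(conn)
```

Each person appears exactly once, regardless of how many morbidities they have, which is the shape a SAS import typically expects.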