Join multiple tables to return only one result for each record from main table - sql

Currently I have three tables I am joining. I have data that was migrated from one system (old) to another system (new). I need to compare this data to find matches as well as mismatches. The two systems use different ID types, so the first table maps the old ID to the new ID for each account that was moved. This is my base population.
ID1 ID2
ABC 123
ABC 123
ABC 123
DEF 456
DEF 456
DEF 456
I then have table 2 which is all the data from the old system.
ID Fname Lname
ABC John Smith
ABC Tom Smith
ABC Kate Smith
DEF Jason Thomas
DEF Ruby Thomas
DEF Alex Johnson
Then table 3 is all the data found in the new system.
ID Fname Lname
123 John Smith
123 Tom Smith
123 Kate Smith
456 Jason Thomas
456 Ruby Thomas
Right now when I join these tables on the ID I get a lot more rows than I need.
When I do my join I receive this:
ID Fname_old Lname_old ID2 Fname_new Lname_new
ABC John Smith 123 John Smith
ABC John Smith 123 Tom Smith
ABC John Smith 123 Kate Smith
I am trying to join them where it only returns the row that matches, and if it can't find a match I should still get the ID from the ID file and the data from table 2(old data) as this is the data that was sent to the new system.
ID1 ID2 Fname_old Lname_old Fname_new Lname_new
ABC 123 John Smith John Smith
ABC 123 Tom Smith Tom Smith
ABC 123 Kate Smith Kate Smith
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
DEF 456 Alex Johnson
The code I am using is:
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from table1 a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID

If it's just duplicate rows in your first table, you could try removing them with DISTINCT in a derived table, like below:
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from (SELECT DISTINCT ID1, ID2 FROM table1) a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID

You are joining them on the ID columns.
ID columns are usually unique, but here each ID appears several times in every table, so the join pairs every row for an ID with every other row for that same ID.
Since you need to compare data, I suggest you look up MATCH and how it works, as that seems closer to what you are looking for here.
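To make the row multiplication concrete, here is a minimal sketch that loads only the ABC/123 rows from the question into an in-memory SQLite database (SQLite and Python are assumptions made for illustration; the question targets a different engine) and counts the rows the original join produces, with and without de-duplicating table1 first:

```python
import sqlite3

# In-memory database holding only the ABC/123 sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID1 TEXT, ID2 TEXT);
CREATE TABLE table2 (ID TEXT, Fname TEXT, Lname TEXT);
CREATE TABLE table3 (ID TEXT, Fname TEXT, Lname TEXT);
INSERT INTO table1 VALUES ('ABC','123'),('ABC','123'),('ABC','123');
INSERT INTO table2 VALUES ('ABC','John','Smith'),('ABC','Tom','Smith'),('ABC','Kate','Smith');
INSERT INTO table3 VALUES ('123','John','Smith'),('123','Tom','Smith'),('123','Kate','Smith');
""")

# Same join shape as in the question; {base} is swapped between the raw
# table and a DISTINCT derived table.
join_sql = """
SELECT COUNT(*)
FROM {base} a
LEFT JOIN table2 b ON a.ID1 = b.ID
LEFT JOIN table3 c ON a.ID2 = c.ID
"""

rows_raw = conn.execute(join_sql.format(base="table1")).fetchone()[0]
rows_distinct = conn.execute(
    join_sql.format(base="(SELECT DISTINCT ID1, ID2 FROM table1)")
).fetchone()[0]

print(rows_raw, rows_distinct)  # 27 (3 x 3 x 3) versus 9 (1 x 3 x 3)
```

De-duplicating table1 removes one factor of three, but every old row is still paired with every new row for the same account, so an extra join condition (or a row number) is needed to get down to one row per person.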

You can get a match using row_number():
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from (select a.*,
row_number() over (partition by ID1 order by ID1) as seqnum
from table1 a
) a left join
(select b.*,
row_number() over (partition by id order by id) as seqnum
from table2 b
) b
on a.ID1 = b.ID and a.seqnum = b.seqnum left join
(select c.*,
row_number() over (partition by id order by id) as seqnum
from table3 c
) c
on a.ID2 = c.ID and a.seqnum = c.seqnum;
Note: This does not preserve the "ordering" of the original values, so any rows can be matched with any other. Why? SQL tables represent unordered sets.
If there is an ordering in the tables, you can use that in the order by clauses to get a match consistent with the ordering.

If you can compare on first name and last name, this code will work.
select DISTINCT a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old, c.fname as
fname_new, c.lname as lname_new from table2 b
left join table1 a on a.ID1=b.ID
left join table3 c on a.ID2=c.ID and b.Fname=c.Fname and b.Lname=c.Lname
My Result :
ID1 ID2 fname_old lname_old fname_new lname_new
ABC 123 John Smith John Smith
ABC 123 Kate Smith Kate Smith
ABC 123 Tom Smith Tom Smith
DEF 456 Alex Johnson NULL NULL
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
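As a rough check of the name-matching approach, the sketch below runs the same join against the question's sample data in an in-memory SQLite database and adds a status column so mismatches can be filtered directly. SQLite, Python, and the status column are assumptions added for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID1 TEXT, ID2 TEXT);
CREATE TABLE table2 (ID TEXT, Fname TEXT, Lname TEXT);
CREATE TABLE table3 (ID TEXT, Fname TEXT, Lname TEXT);
INSERT INTO table1 VALUES ('ABC','123'),('ABC','123'),('ABC','123'),
                          ('DEF','456'),('DEF','456'),('DEF','456');
INSERT INTO table2 VALUES ('ABC','John','Smith'),('ABC','Tom','Smith'),('ABC','Kate','Smith'),
                          ('DEF','Jason','Thomas'),('DEF','Ruby','Thomas'),('DEF','Alex','Johnson');
INSERT INTO table3 VALUES ('123','John','Smith'),('123','Tom','Smith'),('123','Kate','Smith'),
                          ('456','Jason','Thomas'),('456','Ruby','Thomas');
""")

# Old-system rows drive the query; a new-system row only joins when the
# mapped ID and both name columns agree. The status column is hypothetical.
compare_sql = """
SELECT DISTINCT a.ID1, a.ID2, b.Fname, b.Lname,
       CASE WHEN c.ID IS NULL THEN 'missing' ELSE 'match' END AS status
FROM table2 b
LEFT JOIN table1 a ON a.ID1 = b.ID
LEFT JOIN table3 c ON a.ID2 = c.ID AND b.Fname = c.Fname AND b.Lname = c.Lname
ORDER BY a.ID1, b.Fname
"""

compared = conn.execute(compare_sql).fetchall()
missing = [row for row in compared if row[4] == "missing"]
print(len(compared), missing)
```

With the sample data this yields six rows, of which only Alex Johnson is flagged as missing from the new system.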

You say that this is data migrated from one system to another, so you expect all data to match. You could hence reduce the query to find only the data that doesn't match, if any.
Here is a SQL standard compliant query. You tagged your request with hive. I don't know about hive, so you may have to adjust the query.
select
t2.id as id1,
t3.id as id2,
t2.fname as fname_old,
t2.lname as lname_old,
t3.fname as fname_new,
t3.lname as lname_new
from table2 t2
full outer join table3 t3
on t3.fname = t2.fname
and t3.lname = t2.lname
and exists (select null from table1 t1 where t1.id1 = t2.id and t1.id2 = t3.id)
where t2.id is null or t3.id is null;
This is a full anti join. It returns all rows that have no exact match in the other table. It doesn't, however, try to guess which deviating rows may be pairs. You will get a result like this:
ID1 | ID2 | Fname_old | Lname_old | Fname_new | Lname_new
----+-----+-----------+-----------+-----------+----------
DEF | | Alex | Johnson | |
GHI | | Jone | Miller | |
GHI | | Maxx | Miller | |
GHI | | Fritz | Miller | |
| 789 | | | Joan | Miller
| 789 | | | Max | Miller
| 799 | | | Fritz | Miller
As you see, you would have to examine this result manually. But ideally the query shouldn't return any row at all, which would just prove that everything went as expected and nobody (system or person) messed with the data :-)

Related

Pull up the most recent record including joining 2 tables and filters

I have seen a lot of posts on pulling up the most recent record. I haven't been able to find one that includes joining another table and filters.
What I need is information regarding the most recent document (record) created, but only if it meets certain criteria. PLUS I need to pull in some data from another table.
s504Plans Table
Student ID | Firstname | Startdate | Status
---------- --------- --------- ------
111111 Johnny 1/5/2015 F
222222 Sue 4/7/2016 I
333333 Barb 2/5/2016 F
111111 Johnny 2/1/2016 F
Cases Table
Student ID | School |
---------- ------
111111 Franklin
222222 Eisenhower
333333 Franklin
And the results I'd like to see are only the most recent document where the status of the document is F...
Student ID | Firstname | Startdate | Status | School
---------- --------- --------- ------ ------
111111 Johnny 2/1/2016 F Franklin
333333 Barb 2/5/2016 F Franklin
Thanks!
You can use inner join and where
select
a.Student_ID
, a.Firstname
, a.Startdate
, a.Status
, b.School
from s504Plans as a
inner join Cases as b on a.Student_ID = b.Student_ID
inner join ( select Student_ID, max(Startdate ) as max_startdate
from s504Plans
group by Student_ID) t
on ( a.Student_id = t.Student_id and a.Startdate = t.max_startdate)
where a.Status = 'F'
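Note that the query above takes the latest Startdate across all statuses and only then filters on F, so a student whose newest document is not F drops out entirely. If the intent is instead "the latest document among those with status F", the filter can move into the grouped subquery. A minimal sketch in SQLite via Python, assuming the dates are stored in a sortable ISO format (the question shows them as m/d/yyyy text, which would not sort correctly as strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s504Plans (Student_ID TEXT, Firstname TEXT, Startdate TEXT, Status TEXT);
CREATE TABLE Cases (Student_ID TEXT, School TEXT);
-- Dates rewritten as ISO strings (assumption) so MAX() picks the latest one.
INSERT INTO s504Plans VALUES
  ('111111','Johnny','2015-01-05','F'),
  ('222222','Sue','2016-04-07','I'),
  ('333333','Barb','2016-02-05','F'),
  ('111111','Johnny','2016-02-01','F');
INSERT INTO Cases VALUES ('111111','Franklin'),('222222','Eisenhower'),('333333','Franklin');
""")

# The status filter is applied inside the grouped subquery, so the maximum
# is taken over F documents only.
latest_f_sql = """
SELECT a.Student_ID, a.Firstname, a.Startdate, a.Status, b.School
FROM s504Plans a
JOIN Cases b ON a.Student_ID = b.Student_ID
JOIN (SELECT Student_ID, MAX(Startdate) AS max_startdate
      FROM s504Plans
      WHERE Status = 'F'
      GROUP BY Student_ID) t
  ON a.Student_ID = t.Student_ID AND a.Startdate = t.max_startdate
WHERE a.Status = 'F'
ORDER BY a.Student_ID
"""

latest_f = conn.execute(latest_f_sql).fetchall()
for row in latest_f:
    print(row)
```

On the sample data both readings give the same two rows (Johnny and Barb); they differ only when a student's most recent document has a status other than F.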

SQL Query, GROUP/COUNT issue with INNER JOIN

I've got a data set composed primarily of dates, IDs, and addresses, that looks a bit like this:
datadate id address
20150801 Bob 123
20150801 Bob 123
20150801 Dan 345
20150801 Dan 456
20150801 Dan 567
20150801 George 234
20150801 Jim 123
20150801 Jim 123
20150801 John 678
20150801 John 123
20150802 Tom 123
20150802 Tom 234
20150802 Tom 345
My goal is to write a query which identifies any IDs which are associated with multiple distinct addresses for a specific date (or date range). I want the query results to give me the name and distinct addresses. So, for this data set, the results I'd like to see would look like this, for date 8/1/2015:
datadate id address
20150801 Dan 345
20150801 Dan 456
20150801 Dan 567
20150801 John 678
20150801 John 123
The query I've worked up so far is this, but it's not really working for me:
SELECT a.[datadate], a.[id], a.[address], b.[count1]
FROM table1 AS a INNER JOIN (SELECT [id], COUNT([address]) as [count1] FROM table1 GROUP BY [id] having count1 > 1 ) AS b ON a.[id]=b.[id]
WHERE a.[datadate] = '20150801'
ORDER BY a.[id], a.[address];
Any suggestions?
Just modifying your existing query a little bit, you can change your having to count(distinct address) and then joining back to the table to get your address values like this:
SELECT t.datadate
,t.id
,t1.address
FROM (
SELECT datadate
,id
,count(DISTINCT address) address
FROM test
WHERE datadate = '20150801'
GROUP BY datadate,id
HAVING count(DISTINCT address) > 1
) t
INNER JOIN test t1 ON t.datadate = t1.datadate
AND t.id = t1.id;
I tested this on SQL Server, but should be similar in MS-Access as well.
Edit
I just read your question again and it appears you want all duplicates. In which case I would use exists to see if another row with the same id but a different address exists.
select * from mytable t1
where datadate = '20150801'
and exists (
select 1 from mytable t2
where t2.id = t1.id
and t2.address <> t1.address
and t2.datadate = t1.datadate
)
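To check the EXISTS version against the question's sample rows, here is a small sketch using an in-memory SQLite database from Python (both are assumptions; the original was tested on SQL Server). DISTINCT is added here, as an assumption about the desired output, so that repeated identical rows would collapse to one line per address:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (datadate TEXT, id TEXT, address TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("20150801", "Bob", "123"), ("20150801", "Bob", "123"),
    ("20150801", "Dan", "345"), ("20150801", "Dan", "456"), ("20150801", "Dan", "567"),
    ("20150801", "George", "234"),
    ("20150801", "Jim", "123"), ("20150801", "Jim", "123"),
    ("20150801", "John", "678"), ("20150801", "John", "123"),
    ("20150802", "Tom", "123"), ("20150802", "Tom", "234"), ("20150802", "Tom", "345"),
])

# Keep a row only if the same id has a different address on the same date.
multi = conn.execute("""
SELECT DISTINCT t1.datadate, t1.id, t1.address
FROM mytable t1
WHERE t1.datadate = ?
  AND EXISTS (SELECT 1
              FROM mytable t2
              WHERE t2.id = t1.id
                AND t2.datadate = t1.datadate
                AND t2.address <> t1.address)
ORDER BY t1.id, t1.address
""", ("20150801",)).fetchall()

for row in multi:
    print(row)
```

Bob and Jim repeat the same address, so they are excluded; only Dan and John come back.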

How to bring together multiple delta tables?

I have a table with IDs and primary information. I also have two delta tables keyed on ID and date of change. I need to build a view that merges these three tables together indicating all changes over time.
Main Table:
ID Name
-- ------------------
1 Bob Jones
2 Dave Smith
First Attribute Table:
ID Date Attr1
-- ---------- -----
1 01/01/2013 25
1 02/15/2013 33
1 02/17/2013 47
1 03/02/2013 58
2 02/01/2013 1
...
Second Attribute Table
ID Date Attr2
-- ---------- -----
1 01/01/2013 ABC
1 01/05/2013 DEF
1 01/15/2013 RST
1 02/10/2013 XYZ
1 02/15/2013 Foo
1 03/05/2013 Blah
2 02/01/2013 Two
...
Based on that data, for Bob Jones, I need the view to return the following:
ID Name Date Attr1 Attr2
-- ----------- ---------- ----- -----
1 Bob Jones 01/01/2013 25 ABC
1 Bob Jones 01/05/2013 25 DEF
1 Bob Jones 01/15/2013 25 RST
1 Bob Jones 02/10/2013 25 XYZ
1 Bob Jones 02/15/2013 33 Foo
1 Bob Jones 02/17/2013 47 Foo
1 Bob Jones 03/02/2013 58 Foo
1 Bob Jones 03/05/2013 58 Blah
I tried outer joining the attribute tables to get all change values ordered by date and then used an outer join on the entire query with itself to get "prior" records:
with qry as (
select
rownum = ROW_NUMBER() OVER (ORDER BY m.ID, a.DATE),
m.ID,
m.Name,
a.DATE,
a.Attr1,
a.Attr2
from Main m
inner join (
select
COALESCE(a1.ID, a2.ID) as ID,
COALESCE(a1.DATE, a2.DATE) as DATE,
a1.Attr1,
a2.Attr2
from Attributes1 a1
full outer join Attributes2 a2
on (a1.ID = a2.ID and a1.DATE = a2.DATE)
) a on (a.ID = m.ID)
)
select
COALESCE(qry.ID, prev.ID) as ID,
COALESCE(qry.Name, prev.Name) as Name,
COALESCE(qry.DATE, prev.DATE) as DATE,
COALESCE(qry.Attr1, prev.Attr1) as Attr1,
COALESCE(qry.Attr2, prev.Attr2) as Attr2
from qry
left join qry prev
on (prev.rownum = qry.rownum - 1)
order by ID, DATE
However, that doesn't work when one attribute table changes quicker than the other because the attributes that didn't change are null in the results of the attribute table join and if two nulls show up back-to-back, the coalesce will return a null when I need the last non-null value that was in that column.
Can this even be done in a view in SQL Server 2012?
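One possible way around the back-to-back nulls is to avoid looking at the "previous row" at all: build the set of all change dates, then for each date look up the most recent value of each attribute on or before that date. The sketch below illustrates the idea in SQLite via Python; both are assumptions for illustration, the dates are rewritten as ISO strings so they sort correctly, and on SQL Server the LIMIT 1 subqueries would need TOP 1 or OUTER APPLY instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Main (ID INTEGER, Name TEXT);
CREATE TABLE Attributes1 (ID INTEGER, Date TEXT, Attr1 INTEGER);
CREATE TABLE Attributes2 (ID INTEGER, Date TEXT, Attr2 TEXT);
INSERT INTO Main VALUES (1, 'Bob Jones');
INSERT INTO Attributes1 VALUES
  (1,'2013-01-01',25),(1,'2013-02-15',33),(1,'2013-02-17',47),(1,'2013-03-02',58);
INSERT INTO Attributes2 VALUES
  (1,'2013-01-01','ABC'),(1,'2013-01-05','DEF'),(1,'2013-01-15','RST'),
  (1,'2013-02-10','XYZ'),(1,'2013-02-15','Foo'),(1,'2013-03-05','Blah');
""")

# change_dates: every date on which either attribute changed (UNION removes
# dates present in both tables). Each correlated subquery then carries the
# latest known value forward to that date.
timeline_sql = """
WITH change_dates AS (
    SELECT ID, Date FROM Attributes1
    UNION
    SELECT ID, Date FROM Attributes2
)
SELECT m.ID, m.Name, d.Date,
       (SELECT a1.Attr1 FROM Attributes1 a1
         WHERE a1.ID = d.ID AND a1.Date <= d.Date
         ORDER BY a1.Date DESC LIMIT 1) AS Attr1,
       (SELECT a2.Attr2 FROM Attributes2 a2
         WHERE a2.ID = d.ID AND a2.Date <= d.Date
         ORDER BY a2.Date DESC LIMIT 1) AS Attr2
FROM Main m
JOIN change_dates d ON d.ID = m.ID
ORDER BY m.ID, d.Date
"""

timeline = conn.execute(timeline_sql).fetchall()
for row in timeline:
    print(row)
```

For Bob Jones this produces the eight rows listed in the question, with Attr1 staying at 25 until 2013-02-15 and Attr2 staying at Foo until 2013-03-05.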

Access 2007 SQL Merge tables without creating duplicates

I would like to add the unique values of tblA to tblB without creating duplicate values based on multiple fields. In the following example, FirstName and LastName determine a duplicate, Foo and Source are irrelevant.
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
tblB:
FirstName LastName Foo Source
John Doe 1 B
Bob Smith 5 B
Steve Smith 4 B
This is the result I want:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
Bob Smith 5 B
Here's an equivalent of the code I've tried:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
LEFT JOIN tblA AS A ON A.FirstName = B.FirstName AND A.LastName = B.LastName
WHERE A.FirstName IS NULL
And this is the result I get:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
John Doe 1 B
Bob Smith 5 B
Steve Smith from tblB is ignored, which is good. John Doe from tblB is added, which is bad. I've spent way too much time on this and I've inspected the data every way I can think of to ensure John Doe in tblA and tblB are the same first and last name. Any ideas on what could be going wrong?
Update: FYI, on my real tblB, about 10,000 of 30,000 should be moved to tblA. This is actually moving over 21,000. The problem is this is one step of a common process.
When I try:
SELECT tbb.*
FROM tbb
LEFT JOIN tba
ON (tbb.FirstName = tba.FirstName)
AND (tbb.LastName = tba.LastName)
WHERE (((tba.LastName) Is Null));
The only line returned is:
Bob Smith 5 B
Is it possible that John Doe has a hidden character?
Edit: Sorry, this doesn't work in Access 2007.
There are several ways to do that:
INSERT INTO tblA
SELECT B.* FROM tblB AS B
WHERE (B.firstname, B.lastname) NOT IN (select firstname, lastname from tblA)
Or
INSERT INTO tblA
SELECT * FROM tblB
MINUS
SELECT * FROM tblA
This one works in Access.
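You can run it any number of times; it won't add more rows than needed. The first and last name have to be checked together as a pair, otherwise a new person who merely shares a first or last name with someone in tblA would be skipped:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
WHERE NOT EXISTS (SELECT 1 FROM tblA AS A
WHERE A.FirstName = B.FirstName AND A.LastName = B.LastName)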
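The "run it repeatedly without adding duplicates" property can be checked with a small sketch. It uses an in-memory SQLite database from Python (an assumption for illustration; Access itself is not involved) and a NOT EXISTS anti-join that compares first and last name together for each candidate row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (FirstName TEXT, LastName TEXT, Foo INTEGER, Source TEXT);
CREATE TABLE tblB (FirstName TEXT, LastName TEXT, Foo INTEGER, Source TEXT);
INSERT INTO tblA VALUES ('John','Doe',1,'A'),('Jane','Doe',2,'A'),
                        ('Steve','Smith',3,'A'),('Bill','Johnson',2,'A');
INSERT INTO tblB VALUES ('John','Doe',1,'B'),('Bob','Smith',5,'B'),('Steve','Smith',4,'B');
""")

# Insert only those tblB rows whose (FirstName, LastName) pair is absent from tblA.
merge_sql = """
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
WHERE NOT EXISTS (SELECT 1
                  FROM tblA AS A
                  WHERE A.FirstName = B.FirstName
                    AND A.LastName = B.LastName)
"""

# Running the merge twice should leave the table unchanged after the first run.
for _ in range(2):
    conn.execute(merge_sql)

rows_a = conn.execute("SELECT FirstName, LastName, Source FROM tblA").fetchall()
print(len(rows_a))
print(rows_a)
```

Bob Smith is added once even though the last name Smith already exists in tblA, while John Doe and Steve Smith are skipped.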

SQL Join Ignore multiple matches (fuzzy results ok)

I don't even know what the name of my problem is called, so I'm just gonna put some sample data. I don't mind fuzzy results on this (this is the best way I can think to express it. I don't mind if I overlook some data, this is for approximated evaluation, not for detailed accounting, if that makes sense). But I do need every record in TABLE 1, and I would like to avoid the nulls case indicated below.
IS THIS POSSIBLE?
TABLE 1
acctnum sub fname lname phone
12345 1 john doe xxx-xxx-xxxx
12346 0 jane doe xxx-xxx-xxxx
12347 0 rob roy xxx-xxx-xxxx
12348 0 paul smith xxx-xxx-xxxx
TABLE 2
acctnum sub division
12345 1 EAST
12345 2 WEST
12345 3 NORTH
12346 1 TOP
12346 2 BOTTOM
12347 2 BALLOON
12348 1 NORTH
So if we do a "regular outer" join, we'd get some results like this, since the sub 0's don't match the second table:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx null
12347 0 rob roy xxx-xxx-xxxx null
12348 0 paul smith xxx-xxx-xxxx null
But I would rather get
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12346 0 jane doe xxx-xxx-xxxx TOP
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
And I'm trying to avoid:
TABLE AFTER JOIN
acctnum sub fname lname phone division
12345 1 john doe xxx-xxx-xxxx EAST
12345 1 john doe xxx-xxx-xxxx WEST
12345 1 john doe xxx-xxx-xxxx NORTH
12346 0 jane doe xxx-xxx-xxxx TOP
12346 0 jane doe xxx-xxx-xxxx BOTTOM
12347 0 rob roy xxx-xxx-xxxx BALLOON
12348 0 paul smith xxx-xxx-xxxx NORTH
So I decided to go with using a union and two if conditions. I'll accept a null for conditions where the sub account is defined in table 1 but not in table 2, and for everything else, I'll just match against the min.
If I'm understanding correctly, it looks like you're trying to join on the sub column if it matches. If there's no match on sub, then you want it to select the "first" row for that acctnum. Is this correct?
If so, you'll need to left join on the full match, then perform another left join on a select statement that determines the division that corresponds to the lowest sub value for that acctnum. The row_number() function can help you with this, like this:
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
isnull(t2_match.division, t2_first.division) as division
from table1 t1
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
left join
(
select
acctnum,
sub,
division,
row_number() over (partition by acctnum order by sub) as rownum
from table2
) t2_first on t2_first.acctnum = t1.acctnum and t2_first.rownum = 1
EDIT
If you don't care at all about which record you get back from table 2 when a matching sub doesn't exist, you could combine two different queries (one that matches the sub and one that just takes the min or max division) with a union.
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
t2.division
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum and t2.sub = t1.sub
union
select
t1.acctnum,
t1.sub,
t1.fname,
t1.lname,
t1.phone,
min(t2.division)
from table1 t1
join table2 t2 on t2.acctnum = t1.acctnum
left join table2 t2_match on t2_match.acctnum = t1.acctnum and t2_match.sub = t1.sub
where t2_match.acctnum is null
group by t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone
Personally, I don't find the union syntax any more compelling and you now have to maintain the query in two places. For this reason, I'd favor the row_number() approach.
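For reference, the row_number() idea can be tried on the question's sample data with the sketch below. It runs in SQLite through Python, which is an assumption for illustration: window functions need SQLite 3.25 or later, COALESCE stands in for SQL Server's ISNULL, and the phone column is left out for brevity. The fallback join is restricted to the first row per acctnum so that it cannot multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (acctnum INTEGER, sub INTEGER, fname TEXT, lname TEXT);
CREATE TABLE table2 (acctnum INTEGER, sub INTEGER, division TEXT);
INSERT INTO table1 VALUES (12345,1,'john','doe'),(12346,0,'jane','doe'),
                          (12347,0,'rob','roy'),(12348,0,'paul','smith');
INSERT INTO table2 VALUES (12345,1,'EAST'),(12345,2,'WEST'),(12345,3,'NORTH'),
                          (12346,1,'TOP'),(12346,2,'BOTTOM'),
                          (12347,2,'BALLOON'),(12348,1,'NORTH');
""")

# t2_match: exact match on acctnum and sub.
# t2_first: the lowest-sub row per acctnum, used only when there is no exact match.
fallback_sql = """
SELECT t1.acctnum, t1.sub, t1.fname, t1.lname,
       COALESCE(t2_match.division, t2_first.division) AS division
FROM table1 t1
LEFT JOIN table2 t2_match
       ON t2_match.acctnum = t1.acctnum AND t2_match.sub = t1.sub
LEFT JOIN (
    SELECT acctnum, division,
           ROW_NUMBER() OVER (PARTITION BY acctnum ORDER BY sub) AS rownum
    FROM table2
) t2_first
       ON t2_first.acctnum = t1.acctnum AND t2_first.rownum = 1
ORDER BY t1.acctnum
"""

result = conn.execute(fallback_sql).fetchall()
for row in result:
    print(row)
```

Each account in table1 comes back exactly once, with EAST, TOP, BALLOON, and NORTH as the divisions, matching the desired output in the question.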
Try this:
SELECT MIN(Table_1.acctnum) as acctnum , MIN(Table_1.sub) as sub,MIN( Table_1.fname) as fname, MIN(Table_1.lname) as lname, MIN(Table_1.phone) as phone, MIN(Table_2.division) as division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum AND Table_1.sub = Table_2.sub
where Table_1.sub>0
group by Table_1.acctnum
union
SELECT MIN(Table_1.acctnum) as acctnum , MIN(Table_1.sub) as sub,MIN( Table_1.fname) as fname, MIN(Table_1.lname) as lname, MIN(Table_1.phone) as phone, MIN(Table_2.division) as division
FROM Table_1 INNER JOIN Table_2 ON Table_1.acctnum = Table_2.acctnum
where Table_1.sub=0
group by Table_1.acctnum
This is the result:
12345 1 john doe xxxxxxxxxx EAST
12346 0 jane doe xxxxxxxxxx BOTTOM
12347 0 rob roy xxxxxxxxxx BALLOON
12348 0 paul smith xxxxxxxxxx NORTH
If you change MIN to MAX, TOP will appear instead of BOTTOM in the second row.
It may also work for you:
SELECT t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone,
ISNULL(MAX(t2.division),MAX(t3.division)) as division
FROM table_1 t1
LEFT JOIN table_2 t2 ON (t2.acctnum = t1.acctnum AND t1.sub = t2.sub)
LEFT JOIN table_2 t3 ON (t3.acctnum = t1.acctnum)
GROUP BY t1.acctnum, t1.sub, t1.fname, t1.lname, t1.phone
This will give your desired result, exactly (for the shown data):
Updated to not assume there is always a sub==1 value:
SELECT
T1.acctnum,
T1.sub,
T1.fname,
T1.lname,
T1.phone,
T2.division
FROM
TABLE_1 T1
LEFT JOIN
TABLE_2 T2 ON T1.acctnum = T2.acctnum
AND
T2.sub = (SELECT MIN(T3.sub) FROM TABLE_2 T3 WHERE T1.acctnum = T3.acctnum)
ORDER BY
T1.lname,
T1.fname,
T1.acctnum