Access 2007 SQL Merge tables without creating duplicates - sql

I would like to add the unique values of tblA to tblB without creating duplicate values based on multiple fields. In the following example, FirstName and LastName determine a duplicate, Foo and Source are irrelevant.
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
tblB:
FirstName LastName Foo Source
John Doe 1 B
Bob Smith 5 B
Steve Smith 4 B
This is the result I want:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
Bob Smith 5 B
Here's an equivalent of the code I've tried:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
LEFT JOIN tblA AS A ON A.FirstName = B.FirstName AND A.LastName = B.LastName
WHERE A.FirstName IS NULL
And this is the result I get:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
John Doe 1 B
Bob Smith 5 B
Steve Smith from tblB is ignored, which is good. John Doe from tblB is added, which is bad. I've spent way too much time on this and I've inspected the data every way I can think of to ensure John Doe in tblA and tblB are the same first and last name. Any ideas on what could be going wrong?
Update: FYI, on my real tblB, about 10,000 of 30,000 should be moved to tblA. This is actually moving over 21,000. The problem is this is one step of a common process.

When I try:
SELECT tbb.*
FROM tbb
LEFT JOIN tba
ON (tbb.FirstName = tba.FirstName)
AND (tbb.LastName = tba.LastName)
WHERE (((tba.LastName) Is Null));
The only line returned is:
Bob Smith 5 B
Is it possible that John Doe has a hidden character?

Edit : Sorry, it doesn't work on Access2007
You have many way to do that :
INSERT INTO tblA
SELECT B.* FROM tblB AS B
WHERE B.firstname, B.lastname NOT IN (select firstname, lastname from tblA)
Or
INSERT INTO tblA
SELECT * FROM tblB
MINUS
SELECT * FROM tblA

This one works in Access.
You can run it to infinity - it won't add more rows than needed:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
WHERE (((B.FirstName) Not In (select firstname from tblA))
AND ((B.LastName) Not In (select firstname from tblA)))

Related

SQL Join on Like Operator

I know this question has been asked a couple of times and i've tried to use the solution for my problem. Unfortunately it did not get me the output i wanted. I need to update ID column in one table by joining it to another table where the joining column does not have exact value.
TableA TableB
EmpNo EmpName ID EmpNo EmpName ID TermDate
101 John Doe Null 250termed_101 John Doe 250 11-15-2018
102 Jane Doe Null 251termed_102 Jane Doe 251 02-25-2019
101 Bryan Adams Null 252termed_101 Bryan Adams 252 03-12-2020
Here's what i tried but was unable to get the required output because the below query is giving me duplicates:
select *
from TableA as A left join
TableB as B
on B.EmpNo like '%' + A.EmpNo + '%' and A.EmpNo is not null
Output Required:
EmpNo EmpName ID
101 John Doe 250
102 Jane Doe 251
101 Bryan Adams 252
I need to populate ID column from TableB into TableA by joining these 2 tables on EmpNo. For the first record, John Doe is terminated on 11-15-2018 and his employee number is assigned to Bryan Adams with unique ID. I need to populate the ID column from TableB into Table A for the corresponding employee who had that number at the time.
Thanks in advance
If your problem is truly that you're getting duplicates that you don't want, you can throw a DISTINCT in. But your problem is in the data: Bryan Adams and John Doe both have employee numbers of 101, so they look duplicated when you join to TableB.
This SQL Fiddle might help you: http://sqlfiddle.com/#!18/f30476/10
You seem on the right path, the update statement should look like the following, also notice how I am making the like comparison more accurate:
update A SET
ID = B.ID
from
TableA as A
left join TableB as B on
B.EmpNo like '%_' + A.EmpNo and
A.EmpNo is not null;
that will break when you have 101 and 1101 or 2101, so it's not a good match, so let's revisit:
update A SET
ID = B.ID
from
TableA as A
inner join TableB as B on
RIGHT(B.EmpNo, len(B.EmpNo) - charindex('_', B.EmpNo)) = A.EmpNo
A.EmpNo is not null and --you don't need this,
charindex('_', B.EmpNo) > 0;--needed, otherwise you get string errors

Join multiple tables to return only one result for each record from main table

Currently I have three tables I am joining. I have data that was migrated from one system(old) to another system(new). I need to compare this data to ensure matches but also mismatches. I have three tables. One has the list of accounts being moved. The two systems have differnt ID types so this first table is a list of all IDs for the two tables and each account that was moved. So this is my base population.
ID1 ID2
ABC 123
ABC 123
ABC 123
DEF 456
DEF 456
DEF 456
I then have table 2 which is all the data from the old system.
ID Fname Lname
ABC John Smith
ABC Tom Smith
ABC Kate Smith
DEF Jason Thomas
DEF Ruby Thomas
DEF Alex Johnson
Then table 3 is all the data found in the new system.
ID Fname Lname
123 John Smith
123 Tom Smith
123 Kate Smith
456 Jason Thomas
456 Ruby Thomas
Right now when I join these tables on the ID I get a lot more rows than I need.
When I do my join I receive this:
ID Fname_old Lname_old ID2 Fname_new Lname_new
ABC John Smith 123 John Smith
ABC John Smith 123 Tom Smith
ABC John Smith 123 Kate Smith
I am trying to join them where it only returns the row that matches, and if it can't find a match I should still get the ID from the ID file and the data from table 2(old data) as this is the data that was sent to the new system.
ID1 ID2 Fname_old Lname_old Fname_new Lname_new
ABC 123 John Smith John Smith
ABC 123 Tom Smith Tom Smith
ABC 123 Kate Smith Kate Smith
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
DEF 456 Alex Johnson
The code I am using is:
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from table1 a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID
If its just duplicate rows in your first table you could try distincting them in a derived table like below:
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from (SELECT DISTINCT ID1, ID2 FROM table1) a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID
You are joining them on ID columns.
ID columns are usually UNIQUE while you have multiple identical IDs and specify join on those IDs.
Since you need to compare data, i suggest you lookup MATCH and how it works as that seems to be closer to what you are looking for here.
You can get a match using row_number():
Select a.ID1, a.ID2, b.fname as fname_old, b.lnam as lname_old,
c.fname as fname_new, c.lname as lname_new
from (select a.*,
row_number() over (partition by id order by id) as seqnum
from table1 a
) a left join
(select b.*,
row_number() over (partition by id order by id) as seqnum
from table2 b
) b
on a.ID1 = b.ID and a.seqnum = b.seqnum
(select c.*,
row_number() over (partition by id order by id) as seqnum
from table3 c
) c
on a.ID2 = c.ID and a.seqnum = c.seqnum;
Note: This does not preserve the "ordering" of the original values, so any rows can be matched with any other. Why? SQL tables represent unordered sets.
If there is an ordering in the tables, you can use that in the order by clauses to get a match consistent with the ordering.
If you have a compare chance for name and last name this code will work.
select DISTINCT a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old, c.fname as
fname_new, c.lname as lname_new from table2 b
left join table1 a on a.ID1=b.ID
left join table3 c on a.ID2=c.ID and b.Fname=c.Fname and b.Lname=c.Lname
My Result :
ID1 ID2 fname_old lname_old fname_new lname_new
ABC 123 John Smith John Smith
ABC 123 Kate Smith Kate Smith
ABC 123 Tom Smith Tom Smith
DEF 456 Alex Johnson NULL NULL
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
You say that this is data transferred to two systems. So you expect all data to match. You could hence reduce the query to only find data that doesn't match, if any.
Here is a SQL standard compliant query. You tagged your request with hive. I don't know about hive, so you may have to adjust the query.
select
t2.id as id1,
t3.id as id2,
t2.fname as fname_old,
t2.lname as lname_old,
t3.fname as fname_new,
t3.lname as lname_new
from table2 t2
full outer join t3
on t3.fname = t2.fname
and t3.lname = t2.lname
and exists (select null from table1 t1 where t1.id1 = t2.id and t1.id2 = t3.id)
where t2.id is null or t3.id is null;
This is a full anti join. It returns all rows that have no exact match in the other table. It doesn't, however guesstimate which deviating rows may be pairs. You will get a result like this:
ID1 | ID2 | Fname_old | Lname_old | Fname_new | Lname_new
----+-----+-----------+-----------+-----------+----------
DEF | | Alex | Johnson | |
GHI | | Jone | Miller | |
GHI | | Maxx | Miller | |
GHI | | Fritz | Miller | |
| 789 | | | Joan | Miller
| 789 | | | Max | Miller
| 799 | | | Fritz | Miller
As you see, you would have to examine this result manually. But ideally the query shouldn't return any row at all, which would just prove that everything went as expected and nobody (system or person) messed with the data :-)

Alias scoping in SQL

I'm having an issue with a complex query on an SQLite3 database that I think has to do with a misunderstanding on my part of how to refer to columns in a results table returned by a select statement, especially when aliases are involved.
Here is an example table - a list of movie IDs with a row for each actor working on the movie:
CREATE TABLE movie_actor (imdb_id TEXT, actor TEXT);
INSERT INTO movie_actor VALUES('44r4', 'John Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jane Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jacob Doe');
INSERT INTO movie_actor VALUES('55r5', 'John Doe');
INSERT INTO movie_actor VALUES('55r5', 'Jane Doe');
INSERT INTO movie_actor VALUES('55r5', 'Nathan Deer');
INSERT INTO movie_actor VALUES('66r6', 'Bob Duck');
INSERT INTO movie_actor VALUES('66r6', 'John Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jane Doe');
INSERT INTO movie_actor VALUES('77r7', 'John Doe');
I am trying to find out the how many times each pair of actors worked with each other across all movies. I decided to go about this with a self-join, but ran into issues where I would get record pairs such as "John Doe, Jane Doe, 3" and "Jane Doe, John Doe, 3" - this is really the same thing, and I wanted to only count the first version. This is the code that resulted:
SELECT DISTINCT
CASE WHEN d.actor_1 > d.actor_2 THEN d.actor_1 ELSE d.actor_2 END d.actor_1,
CASE WHEN d.actor_2 > d.actor_1 THEN d.actor_2 ELSE d.actor_1 END d.actor_2,
d.v
FROM (
SELECT c.actor_1 AS actor_1, c.actor_2 AS actor_2, COUNT(*) AS v
FROM (
SELECT a.actor AS actor_1, b.actor AS actor_2
FROM movie_actor a JOIN movie_actor b ON a.imdb_id=b.imdb_id
) AS c
WHERE c.actor_1 <> c.actor_2
GROUP BY c.actor_1, c.actor_2
HAVING COUNT(*) > 2
ORDER BY COUNT(*) DESC
LIMIT 20
)
AS d
This doesn't run, but I can't figure out why. My assumption is that I am not using aliases properly, but I really don't know. Any ideas?
(SQL Fiddle link here)
We get a simpler query, if we add the condition a.actor < b.actor. This excludes pairs with equal actors and at the same time it removed the need of swapping actors.
SELECT
a.actor AS actor_1, b.actor AS actor_2, COUNT(*) AS v
FROM
movie_actor a
INNER JOIN movie_actor b
ON a.imdb_id = b.imdb_id
WHERE
a.actor < b.actor
GROUP BY a.actor, b.actor
ORDER BY COUNT(*) DESC, a.actor, b.actor
LIMIT 20
Note: SQL always creates a cross product when joining, i.e. it creates all possible combinations of records that match the join condition. Therefore for imdb 55r5 (including 3 actors) it will first generate the following 3 x 3 = 9 pairs:
John Doe John Doe
John Doe Jane Doe
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Jane Doe
Jane Doe Nathan Deer
Nathan Deer John Doe
Nathan Deer Jane Doe
Nathan Deer Nathan Deer
Then the WHERE-clause excludes all a >= b pairs and we get
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Nathan Deer
Generate the distinct pairs first, then count them.
select actor_1, actor_2, count(*)
from (select distinct a.imdb_id, a.actor as actor_1, b.actor as actor_2
from movie_actor a
inner join movie_actor b on a.imdb_id = b.imdb_id
where a.actor < b.actor) x
group by actor_1, actor_2
order by actor_1, actor_2;
actor_1 actor_2 count(*)
---------- ---------- ----------
Bob Duck Jane Doe 1
Bob Duck Jermaine D 1
Bob Duck John Doe 1
Jacob Doe Jane Doe 1
Jacob Doe Jermaine D 1
Jacob Doe John Doe 1
Jane Doe Jermaine D 2
Jane Doe John Doe 3
Jane Doe Nathan Dee 1
Jermaine D John Doe 2
John Doe Nathan Dee 1

select rows from a database table with common field in sqlite

i have a table as follows:
create table table1(id integer,firstname text,lastname text);
firstname lastname======== =========
1 ben taylor
2 rob taylor
3 rob smith
4 rob zombie
5 peter smith
6 ben smith
7 peter taylor
I want to select rows with a lastname , where the lastname must be shared by ben and rob and firstnames must be ben and rob.
Hence in the above table the result of the select query must be:
1 ben taylor
2 rob taylor
3 rob smith
6 ben smith
what must be the sql query to get the above results?
I tried - select * from table1 as a,table1 as b where a.firstname='ben' and b.firstname='rob' and a.lastname=b.lastname this joined all the resultant rows which is not what i inteneded.
You can use two filtering joins to demand that the lastname is shared with both a Ben and a Rob:
select *
from Table1 t1
join Table1 rob
on rob.firstname = 'rob'
and t1.lastname = rob.lastname
join Table1 ben
on ben.firstname = 'ben'
and t1.lastname = ben.lastname
where t1.firstname in ('ben', 'rob')
Live example at SQL Fiddle.
select *
from table1 where first_name = 'ben' or first_name = 'rob'
and last_name
in (select last_name from table1 where first_name = 'rob' and last_name
in (select last_name from table1 where first_name = 'ben'))

Select query which returns exect no of rows as compare in values of sub query

I have got a table named student. I have written this query:
select * From student where sname in ('rajesh','rohit','rajesh')
In the above query it's returning me two records; one matching 'rajesh' and another matching: 'rohit'.
But i want there to be 3 records: 2 for 'rajesh' and 1 for 'rohit'.
Please provide me some solution or tell me where i am missing.
NOTE: the count of result of sub query is not fix there can be many words there some distinct and some multiple occurrence .
Thanks
Your requirements are not clear, and I'll try to explain why.
Let's define table students
ID FirstName LastName
1 John Smith
2 Mike Smith
3 Ben Bray
4 John Bray
5 John Smith
6 Bill Lynch
7 Bill Smith
Query with WHERE clause:
FirstName in ('Mike', 'Ben', 'Mike')
will return 2 rows only, because it could be rewritten as:
FirstName='Mike' or FirstName='Ben' or FirstName='Mike'
WHERE is filtering clause that just says if existing row satisfy given conditions or not (for each of rows created by FROM clause.
Let's say we have subquery that returns any number of non distinct FirstNames
In case if SQ contains 'Mike', 'Ben', 'Mike' using inner join you can get those 3 rows without problem
Select ST.* from Students ST
Inner Join (Select name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Result will be:
ID FirstName LastName
2 Mike Smith
2 Mike Smith
3 Ben Bray
Note data are not ordered by order of names returning by SQ. If you want that, SQ should return some ordering number, eg.:
Ord Name
1. Mike
2. Ben
3. Mike
In that case query should be:
Select ST.* from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
Order By SQ.ord
And result:
ID FirstName LastName
2 Mike Smith (1)
3 Ben Bray (2)
2 Mike Smith (3)
Now, let's se what will happen if subquery returns
Ord Name
1. Mike
2. Bill
3. Mike
You will end up with
ID FirstName LastName
2 Mike Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
2 Mike Smith (3)
Even worse, if you have something like:
Ord Name
1. John
2. Bill
3. John
Result is:
ID FirstName LastName
1 John Smith (1)
4 John Bray (1)
5 John Smith (1)
6 Bill Lynch (2)
7 Bill Smith (2)
1 John Smith (3)
4 John Bray (3)
5 John Smith (3)
This is an complex situation, and you have to clarify precisely what requirement is.
If you need only one student with the same name, for each of rows in SQ, you can use something like SQL 2005+):
;With st1 as
(
Select Row_Number() over (Partition by SQ.ord Order By ID) as rowNum,
ST.ID,
ST.FirstName,
ST.LastName,
SQ.ord
from Students ST
Inner Join (Select ord, name from …. <your subquery>) SQ
On ST.FirstName=SQ.name
)
Select ID, FirstName, LastName
From st1
Where rowNum=1 -- that was missing row, added later
Order By ord
It will return (for SQ values John, Bill, John)
ID FirstName LastName
1 John Smith (1)
6 Bill Lynch (2)
1 John Smith (3)
Note, numbers (1),(2),(3) are shown to display value of ord although they are not returned by query.
If you can split the where clause in your calling code, you could perform a UNION ALL on each clause.
SELECT * FROM Student WHERE sname = 'rajesh'
UNION ALL SELECT * FROM Student WHERE sname = 'rohit'
UNION ALL SELECT * FROM Student WHERE sname = 'rajesh'
Try using a JOIN:
SELECT ...
FROM Student s
INNER JOIN (
SELECT 'rajesh' AS sname
UNION ALL
SELECT 'rohit'
UNION ALL
SELECT 'rajesh') t ON s.sname = t.sname
just because you've got a criteria in there two times doesn't mean that it will return 1 result per criteria. SQL engines usually just use the unique criteria - thus, from your example, there will be 2 criteria in IN clause: 'rajesh','rohit'
WHY do you need to return 2 results? are there two rajesh in your table? they should BOTH return then. You don't need to ask for rajesh twice for that to happen. What does your data look like? What do you want to see returned?
Hi i am query just as you give above and it give me all data that matches in the condition of in clause. just like your post
select * from person
where personid in (
'Carson','Kim','Carson'
)
order by FirstName
and its give me all records which fulfill this Criteria