Remove Duplicate Data Based on Three Fields and Two tables - sql

I have two tables that have more than three fields each. There is a group of records that are on both files, the below is a mock example:
Table 1:
ID Name Town State
1 Dave Chicago IL
2 Mark Tea MD
Table 2:
ID Name State Job Married
1 Dave IL Manager Yes
2 Mark MD Driver No
For my purpose duplicates exist if ID, Name, and State are the same. So the above data are duplicates. How do I delete them from one table (I have over 900 duplicates so deleting one by one is not possible)?

delete table1
where ID in(select distinct ID from table1 where ID in (Select ID from table2))
i dont understand which table has duplicates, if you want to delete duplicate data from one table1 then you can use this query

This query will produce a de-duplicated result set:
SELECT Table1.ID,
Table1.NAME,
Table1.Town,
Table1.STATE,
NULL AS Job,
NULL AS Married
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1.ID
FROM Table1
INNER JOIN Table2 ON (Table2.STATE = Table1.STATE)
AND (Table2.NAME = Table1.NAME)
AND (Table1.ID = Table2.ID)
)
UNION
SELECT Table2.ID,
Table2.NAME,
NULL AS Town,
Table2.STATE,
Table2.Job,
Table2.Married
FROM Table2

This is the most straightforward way to do it, assuming that you want to delete from Table1. I'm a little rusty with Access SQL syntax, but I believe this works:
DELETE FROM [Table1]
WHERE EXISTS (
SELECT 1
FROM [Table2]
WHERE [Table2].[ID] = [Table1].[ID]
AND [Table2].[Name] = [Table1].[Name]
AND [Table2].[State] = [Table1].[State]
)

Related

SQL comparing two sets of complex data

Sorry, probably the title is not the best one but I hope you will understand what problem I have.
I need to compare and analyse two sets of data and I'm using MS-Access for that. My data is organized in two tables. Following is not the real data I'm working with but will serve ok as example:
TABLE 1
ID Name
1 Zoie
2 Rohan
2 Simon
3 Jerome
4 Jakob
4 Mathew
4 Cora
6 Keely
7 Aiyana
7 Jake
8 Reid
9 Emerson
TABLE 2
ID Name
1 Michael
2 Rohan
2 Simon
3 Jill
4 Jakob
4 Cora
5 Charlie
7 John
8 Reid
9 Yadiel
9 Emerson
9 Paris
So, I need to select only those IDs which fully corresponds (all names under specific IDs are the same) in both tables and those are: 2 and 8
I would also like to have separate select statement which will result with IDs 2 and 8 but also IDs with names from table 1 which also appears in table 2 (all from table 1 plus possible some extra in table 2 under the same ID). So that would be: 2, 8, 9
I would also like to have separate select statement which will result with IDs 2 and 8 but also IDs with names from table 2 which also appears in table 1 (all from table 2 plus possible some extra in table 1 under the same ID). So that would be: 2, 4, 8
I would also like to have separate select statement which would be a combination of last two.
So result would be: 2, 4, 8, 9
I would appreciate any suggestions.
Thanks in advance!
Best regards,
Mario
Q#1:
select id
from table1
group by id
having count(*) =
(
select count(*)
from table2
group by table2.id
having table2.id = table1.id
)
and count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)
Explanation of this query:
It is grouping table1 by ID
For each group (for each ID), it is counting the number of rows in table1 that have this ID.
For each group, it is counting the number of rows in table2 that have this ID.
For each group, it is counting the number of rows where the name appears in both tables for this ID (it does that by inner joining table1 and table2 on the ID and Name which means only rows where both ID and Name match in both tables will be counted, for each ID).
It then returns IDs (from table1) where each of the above counts are equal. This is what results in returning IDs where all names are in both tables (no more, no less).
Q#2 - In this case you don't care that table2 has the same number of names per ID. So remove the first sub-query (that counts matching rows in table2).
select id
from table1
group by id
having count(*) =
(
select count(*)
from table1 table1_1
inner join table2 on table1_1.id = table2.id and table1_1.name = table2.name
group by table1_1.id
having table1_1.id = table1.id
)
Although the above is easy enough to understand following the same logic as Q#1, it is probably more efficient to do the following, and more straightforward. It only matters if you find it running too slow for your data (which is subjective and context dependent).
select table1.id
from table1
left join table2 on table1.id = table2.id and table1.name = table2.name
group by table1.id
having count(table1.id) = count(table2.id)
Here, the two tables are LEFT (outer) joined which means all records from table1 are gathered and records in table2 that match by ID and Name are also included alongside. Then, we group them by ID and we compare the count of each group in table1 with those that had matching names in table2.
Q#3 - This case is the same as Q#2 except table1 and table2 are swapped.
Q#4 - In this case you only care about IDs that have at least one name that appears in both tables. So join the tables and return the distinct IDs:
select distinct id
from table1
inner join table2 on table1.id = table2.id and table1.name = table2.name
Here is a SQLFiddle to play with containing the four queries: http://www.sqlfiddle.com/#!18/3fc71/22

Join table on Count

I have two tables in Access, one containing IDs (not unique) and some Name and one containing IDs (not unique) and Location. I would like to return a third table that contains only the IDs of the elements that appear more than 1 time in either Names or Location.
Table 1
ID Name
1 Max
1 Bob
2 Jack
Table 2
ID Location
1 A
2 B
Basically in this setup it should return only ID 1 because 1 appears twice in Table 1 :
ID
1
I have tried to do a JOIN on the tables and then apply a COUNT but nothing came out.
Thanks in advance!
Here is one method that I think will work in MS Access:
(select id
from table1
group by id
having count(*) > 1
) union -- note: NOT union all
(select id
from table2
group by id
having count(*) > 1
);
MS Access does not allow union/union all in the from clause. Nor does it support full outer join. Note that the union will remove duplicates.
Simple Group By and Having clause should help you
select ID
From Table1
Group by ID
having count(1)>1
union
select ID
From Table2
Group by ID
having count(1)>1
Based on your description, you do not need to join tables to find duplicate records, if your table is what you gave above, simply use:
With A
as
(
select ID,count(*) as Times From table group by ID
)
select * From A where A.Times>1
Not sure I understand what query you already tried, but this should work:
select table1.ID
from table1 inner join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1
Or if you have ID's in one table but not the other
select table1.ID
from table1 full outer join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1

Compare two sql server tables

Can anyone suggest me how to compare two database tables in sql server and return the rows in the second table which are not in the first table. The primary key in both the tables is not the same. For instance, the tables are as follows.
Table1
ID Name DoB
1 John Doe 20/03/2012
2 Joe Bloggs 31/12/2011
Table2
ID Name DoB
11 John Doe 20/03/2012
21 Joe Bloggs 31/12/2011
31 James Anderson 14/04/2010
The sql query should compare only the Name and DoB in both tables and return
31 James Anderson 14/04/2010
Thanks.
Pretty simple, use a LEFT OUTER JOIN to return everything from Table2 even if there isn't a match in Table1, then limit that down to only rows that don't have a match:
SELECT Table2.ID, Table2.Name, Table2.DoB
FROM Table2
LEFT OUTER JOIN Table1 ON Table2.Name = Table1.Name AND Table2.DoB = Table1.DoB
WHERE Table1.ID IS NULL
Look into the use of SQL EXCEPT
SELECT Name, DOB
FROM Table1
EXCEPT
SELECT Name, DOB
FROM Table2
http://msdn.microsoft.com/en-us/library/ms188055.aspx
You want a LEFT OUTER JOIN. http://en.wikipedia.org/wiki/Join_(SQL)#Left_outer_join
This type of JOIN will return all records of the 'left' table (the table in the FROM clause in this example) even if there are no matching records in the joined table.
SELECT Table2.ID, Table2.Name, Table2.DoB
FROM Table2
LEFT OUTER JOIN Table1 ON Table1.Name = Table2.Name AND Table1.DoB = Table2.DoB
WHERE Table1.ID IS NULL
Note that you can substitue LEFT OUTER JOIN for LEFT JOIN. It's a short cut that most DBMSs use.
use CHECKSUM () function in sql server
select T1.* from Table1 T1 join Table2 T2
on CHECKSUM(T1.Name,T1.DOB)!= CHECKSUM(T2.Name,T2.DOB)
Details
This SQL statement compares two tables without having to specify column names.
SELECT 'Table1' AS Tbl, binary_checksum(*) AS chksum, * FROM Table1
WHERE binary_checksum(*) NOT IN (SELECT binary_checksum(*) FROM Table2)
UNION
SELECT 'Table2' AS Tbl, binary_checksum(*) AS chksum, * FROM Table2
WHERE binary_checksum(*) NOT IN (SELECT binary_checksum(*) FROM Table1)
ORDER BY <optional_column_names>, Tbl
The output will display any rows that are different and rows that are in Table1, but not Table2 and vice versa.

Fetch values from 3 tables without redundancy-SQLite

I want to retrieve column values of Name from 3 different tables
Table1 contains Name
Table2 contains Name, BAmt
Table3 contains Name, RAmt
Now I want to fetch names from all the 3 tables.
Let's say the table contains these values
Table1: Table2: Table3:
Name Name BAmt Name RAmt
1. Jack 1.Alice 1000 1.Mark 5000
2. Mark 2.Jack 500 2.Tarzon 1000
3. Ricky 3.Beth 5500
4.Jack 100
Now I want a table that contains all the names (without any repeats of any name). New table should ideally contain these values
Name BAmt RAmt
1.Jack 500 100
2.Mark 5000
3.Ricky
4.Alice 1000
5.Tarzon 1000
6.Beth 5500
Here it has ignored repeats but has taken the necessary values of those columns from the table.
What kind of a select statement should I use.
Here is something I think, but I guess its wrong approach:
Statement statement1 = db.createStatement("SELECT DISTINCT Name,BAmt,RAmt FROM Table1,Table2,Table3");
statement1.prepare();
statement1.execute();
Please guide.
You will need to union all all your tables to get that outline:
select Name,
ifnull (sum(BAmt), 0) BAmt,
ifnull (sum(RAmt), 0) RAmt
from
(
select Name,
null BAmt,
null RAmt
from Table1
union all
select Name,
BAmt,
null
from Table2
union all
select Name,
null,
RAmt
from Table3
) allTables
group by Name
order by Name
EDIT: updated to replace nulls with zeros.
See Documentation for ifnull.
The Join will take care of it:
select name, bamt, ramt from table1 t1 left outer join table2 t2 on t1.name = t2.name left outer join table3 on t1.name = t3.name order by name;

sql simple 3 table join using same table?

I have two tables:
Table 1
id, name1
Table 2
id, name2a, name2b
Table 2's column names name2a, and name2b are references to table 1's id. I need to create a query that pulls both the names out of table 1 based on the id's used in Table 2.
Therefore, if Table one contained:
1 Peter
2 Paul
And Table 2 contained:
1 1 2
2 2 2
Then a select statement should give me:
Peter Paul
Paul Paul
I've gone around the bend trying to build this SQL and the best I came up with was:
SELECT table1.name AS 'name', table1.name AS 'Other name'
FROM table1, table2
WHERE table1.id = table2.name2a
Which only gives me the name2a column correctly.
Any help appreciated! I guess I need to do a join, but I'm really struggling...
Start with your 2nd table and join TWICE to table 1 (different aliases respectively), then get the name field from each aliased Table1 entry.
select
T2.ID,
TJ1.Name1 as FirstName,
TJ2.Name1 as SecondName
from
Table2 t2
join Table1 TJ1
on t2.Name2a = TJ1.ID
join Table1 TJ2
on t2.Name2b = TJ2.ID
select foo.*, t1.x, t2.y
join t1 on t1.id = foo.a
join t1 as t2 on t2.id = foo.b
If there's a chance that col a or col b is null, use a left join.
Have you tried using an INNER JOIN?
SELECT table1.name AS 'name', table1.name AS 'Other name'
FROM table1 INNER JOIN table2 ON table1.id = table2.name2a;
Sorry if I'm no help, not that great at SQL myself hehe.
Your problem is that you need to reference table1 twice: once for the plain table1.name and again to look up what table2 is pointing at. You can join one table in multiple times if you give them aliases:
SELECT t1.name1, o.name1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.name2a
JOIN table1 o ON t2.name2b = o.id -- And JOIN back to table1 to get the name1