Get list of duplicate rows in MySql

I have a table like this:
ID nachname vorname
1 john doe
2 john doe
3 jim doe
4 Michael Knight
I need a query that will return all the fields (select *) from the records that have the same nachname and vorname (in this case, records 1 and 2).
Can anyone help me with this? Thanks

The following query will give the list of duplicates :
SELECT n1.* FROM table n1
inner join table n2 on n2.vorname=n1.vorname and n2.nachname=n1.nachname
where n1.id <> n2.id
BTW, the data you posted seems to be wrong: "Doe" and "Knight" are last names, not first names :p.

The general solution to your problem is a query of the form
SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
This will return one row for each set of duplicate rows in the table. The last column in this result is the number of duplicates for those particular values.
If you really want the ID, try something like this:
SELECT id FROM
t1,
( SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1 ) as t2
WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2
Haven't tested it though
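For anyone who wants to try the group-and-join-back idea, here is a minimal sketch run against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the table name people is made up for the example; the data is the sample from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; "people" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, nachname TEXT, vorname TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "john", "doe"), (2, "john", "doe"), (3, "jim", "doe"), (4, "Michael", "Knight")],
)

# Find the duplicated (nachname, vorname) pairs, then join back to get the full rows.
duplicate_rows = conn.execute(
    """
    SELECT p.id, p.nachname, p.vorname
    FROM people AS p
    JOIN (
        SELECT nachname, vorname
        FROM people
        GROUP BY nachname, vorname
        HAVING COUNT(*) > 1
    ) AS d ON d.nachname = p.nachname AND d.vorname = p.vorname
    ORDER BY p.id
    """
).fetchall()

print(duplicate_rows)
```

With the sample data this should list only records 1 and 2.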

You can do it with a self-join:
select distinct t1.id from t as t1 inner join t as t2
on t1.col1=t2.col1 and t1.col2=t2.col2 and t1.id<>t2.id
The t1.id<>t2.id condition is necessary to avoid ids matching against themselves. (With t1.id<t2.id instead, each set of duplicates returns every row except the one with the highest id, which is a single row when duplicates come in pairs.)
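A small sketch of the difference between the two join conditions, using SQLite from Python as a stand-in for MySQL. The table t and its three "john doe" rows are invented for illustration, so that a duplicate set with more than two members is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "john", "doe"), (2, "john", "doe"), (3, "jim", "doe"), (4, "john", "doe")],
)

# Same self-join, parameterised only by the comparison operator on id.
join_sql = """
    SELECT DISTINCT t1.id
    FROM t AS t1
    JOIN t AS t2
      ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.id {op} t2.id
    ORDER BY t1.id
"""

# "<>" keeps every member of a duplicate set.
all_duplicates = [row[0] for row in conn.execute(join_sql.format(op="<>"))]
# "<" drops the row with the highest id in each set.
without_highest = [row[0] for row in conn.execute(join_sql.format(op="<"))]

print(all_duplicates, without_highest)
```

Here the first list holds ids 1, 2 and 4, while the second holds only 1 and 2.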

select * from table AS t1 inner join
(select max(id) As id,nachname,vorname, count(*)
from table group by nachname,vorname
having count(*) >1) AS t2 on t1.id=t2.id
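This should return ALL of the columns from the table where there is a duplicate nachname and vorname. Note that because of the max(id), only one row (the one with the highest id) comes back per set of duplicates. I recommend changing * to the exact columns that you need.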
Edit: I added a max(id) so that the group by wouldn't be a problem. My query isn't as elegant as I would want though. There's probably an easier way to do it.

Related

Find duplicate records in SQL Server, but also return the set of unique keys of each one

I have a SQL Server database with a unique key column and 49 columns of data elements (name/address/etc......). I have "duplicate" entries but with different keys and I want to find those duplicates entries.
As an example, I may have "John Smith" (with 47 other columns of information) in the table twice. Both John Smith entries will have a different unique key column, but other than that, all other columns would be identical. That includes NULLs: if one of the columns is NULL, it will be NULL for both John Smith entries.
To complicate things, there are two tables which I need to join together, then once joined find any entries where data elements (everything except for the key) is the same.
Table1 layout
MyKey, table2ID, Col1, Col2, Col3....Col46.
Table2 layout
ID, col47, col48, col49
Col1 through to Col49 is where the "duplicate" data could be.
I have tried something like the below, which almost works. It fails if I have NULL values. For example, if Col22 is NULL on both John Smith entries (i.e. they both have the same NULL value), then they are not picked up in the selection.
Question: how do I get something like the below to work even when there are NULL values that need to be compared against each other?
with MyJoinedTable as
(
select PolicyNumber, col01, col02, col03......col49
from table1
inner join table2 on table2id = table2.id
)
select PolicyNumber, t1.col01, t1.col02, t1.col03.......t1.col49
from MyJoinedTable t1
inner join (select col01, col02, col03......col49
from MyJoinedTable
group by col01, col02, col03......col49
having count(*) > 1) t2
on t1.col01 = t2.col01
and t1.col02 = t2.col02
.......
and t1.col49 = t2.col49
order by t1.col01, t1.col02

Group in a subquery with HAVING count(*) > 1 and join it back in.
SELECT to1.policynumber,
to1.col1,
...
to1.col49
FROM elbat to1
INNER JOIN (SELECT ti.col1,
...
ti.col49
FROM elbat ti
GROUP BY col1,
...
col49
HAVING count(*) > 1) to2
ON to2.col1 = to1.col1
...
AND to2.col49 = to1.col49;
Or use an EXISTS.
SELECT to1.policynumber,
to1.col1,
...
to1.col49
FROM elbat to1
WHERE EXISTS (SELECT *
FROM elbat ti
WHERE ti.policynumber <> to1.policynumber
AND ti.col1 = to1.col1
...
AND ti.col49 = to1.col49);

One method is:
select t.*
from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and
t2.col2 = t.col2 and
. . .
t2.policyNumber <> t.policyNumber
);
This works assuming that none of the other columns are NULL.
EDIT:
If you are using SQL Server, I would just do:
select t.*
from (select t.*,
min(id) over (partition by col1, col2, . . . ) as min_id,
max(id) over (partition by col1, col2, . . . ) as max_id
from t
) t
where min_id <> max_id;
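To see why the plain = comparison misses rows with NULLs, here is a rough sketch using SQLite from Python as a stand-in for SQL Server. The table t, its columns policy, col1, col2 and the three rows are all made up. The null-safe variant spells out the comparison as (a = b OR (a IS NULL AND b IS NULL)), which is one possible way to do it, not necessarily the neatest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (policy INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "John", None), (2, "John", None), (3, "Jane", "x")],
)

# Plain equality: NULL = NULL evaluates to NULL, so the two John rows never match.
plain_sql = """
    SELECT a.policy FROM t AS a
    WHERE EXISTS (
        SELECT 1 FROM t AS b
        WHERE b.policy <> a.policy
          AND b.col1 = a.col1
          AND b.col2 = a.col2
    )
    ORDER BY a.policy
"""

# Null-safe equality: two NULLs are explicitly treated as a match.
null_safe_sql = """
    SELECT a.policy FROM t AS a
    WHERE EXISTS (
        SELECT 1 FROM t AS b
        WHERE b.policy <> a.policy
          AND (b.col1 = a.col1 OR (b.col1 IS NULL AND a.col1 IS NULL))
          AND (b.col2 = a.col2 OR (b.col2 IS NULL AND a.col2 IS NULL))
    )
    ORDER BY a.policy
"""

with_equals = [row[0] for row in conn.execute(plain_sql)]
null_safe = [row[0] for row in conn.execute(null_safe_sql)]

print(with_equals, null_safe)
```

The first query finds nothing, the second finds policies 1 and 2. GROUP BY and PARTITION BY put NULLs into the same group, which is why the windowed min/max version above does not need the extra condition.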

How to use a subquery result for another sql select?

I want to use the result of a sql query and send another query based on the result.
Example (of course the real-life query is more complex):
table1: name, age
table2: name, age, field1, fieldN
First query:
select name, age from table1 where age > 18.
Now I'd like to find all entries from table2 that match the multiple resulting fields of the first query.
Important note: I want to retrieve the full rows of table2 where the match is.
But how?

If you want to automatically join based on matching column names, then you can use a NATURAL JOIN:
WITH query1 AS (
SELECT age, name FROM table1 WHERE age > 18
)
SELECT age, name, t2.field1, t2.fieldN
FROM table2 t2 NATURAL JOIN query1;
Now, NATURAL JOIN is generally not recommended: it is fragile, because queries using it can easily break after schema changes. It may be OK for hand-written ad-hoc queries, or for queries like the one above, where you can make the columns used explicit. In either case, I advise against it and recommend the common join style:
WITH query1 AS (
SELECT age, name FROM table1 WHERE age > 18
)
SELECT t2.age, t2.name, t2.field1, t2.fieldN
FROM table2 t2 JOIN query1 q1 ON t2.age = q1.age AND t2.name = q1.name;
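A quick way to check the explicit-join version is to run it against a throwaway database. The sketch below uses SQLite from Python; the rows (ann, bob, cy) and field values are invented, only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (name TEXT, age INTEGER);
    CREATE TABLE table2 (name TEXT, age INTEGER, field1 TEXT, fieldN TEXT);
    INSERT INTO table1 VALUES ('ann', 30), ('bob', 17), ('cy', 45);
    INSERT INTO table2 VALUES
        ('ann', 30, 'a1', 'aN'),
        ('bob', 17, 'b1', 'bN'),
        ('cy', 44, 'c1', 'cN');
    """
)

# The CTE holds the first query; the join keeps table2 rows matching on both columns.
matches = conn.execute(
    """
    WITH query1 AS (
        SELECT name, age FROM table1 WHERE age > 18
    )
    SELECT t2.name, t2.age, t2.field1, t2.fieldN
    FROM table2 AS t2
    JOIN query1 AS q1 ON t2.name = q1.name AND t2.age = q1.age
    """
).fetchall()

print(matches)
```

Only the ann row comes back: bob is filtered out by age > 18, and cy has a different age in the two tables.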

Now I'd like to:
find
all entries from table2
that match the multiple resulting fields
of the first query
SELECT * -- find
FROM table2 t2 -- from t2
WHERE EXISTS (
SELECT * FROM table1 t1
WHERE t1.name = t2.name -- that match
AND t1.age = t2.age -- Huh? "multiple matching fields" ?
AND t1.age > 18 -- with the same condition
);

Actually this is what I was looking for, but thanks for any help:
select * from table2 where (name, age) IN (
select name, age from table1 where age > 18
)

Query built for MS SQL Server (selecting from t2, since the full rows of table2 are wanted):
select t2.*
from table1 as t1
join table2 as t2 on t1.name=t2.name and t1.age=t2.age
where t2.age > 18

Using MAX when selecting a high number of fields in a query

I understand some varieties of this question have been asked, but I could not find an answer to my specific scenario.
My query has over 50 fields being selected, and only one of them is an aggregate, using MAX(). On the GROUP BY clause, I would only like to pass two specific fields, name and UserID, not all 50 to make the query run. See small subset below.
SELECT
t1.name,
MAX(t2.id) as UserID,
t3.age,
t3.height,
t3.dob,
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
LEFT JOIN table3 t3 ON t1.id = t3.id
GROUP BY t1.name, UserID
Is there any workaround or better approach to accomplish my goal?
The database is SQL Server and any help would be greatly appreciated.

Hmmm . . . What values do you want for the other fields? If you want the max() of one column for each id and code, you can do:
select t.*
from (select t.*, max(col) over (partition by id, code) as maxcol
from t
) t
where col = maxcol;
Given that id might be unique, you might want the maximum id as well as the other columns for each code:
select t.*
from (select t.*, max(id) over (partition by code) as maxid
from t
) t
where id = maxid;
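As a rough illustration of the windowed MAX, here is the second query run through SQLite from Python. This assumes the bundled SQLite is 3.25 or newer (needed for window functions); the table t with columns id, code, age and its three rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, age INTEGER);
    INSERT INTO t VALUES (1, 'a', 20), (2, 'a', 31), (3, 'b', 40);
    """
)

# Attach the per-code maximum id to every row, then keep the rows that carry it.
# No GROUP BY is needed, so all other columns can be selected freely.
latest_per_code = conn.execute(
    """
    SELECT id, code, age
    FROM (
        SELECT t.*, MAX(id) OVER (PARTITION BY code) AS maxid
        FROM t
    ) AS ranked
    WHERE id = maxid
    ORDER BY id
    """
).fetchall()

print(latest_per_code)
```

Each code keeps only its highest-id row, together with the other columns of that row.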

how to return unique value and also find duplicate values? SQL

I have this general idea to find duplicate values taken from this post:
Select statement to find duplicates on certain fields
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
This works great to find the duplicates, but I also need to pull out a unique number, in this case an "order number" column that goes along with each row returned. This unique value cannot be added to the query above, because it would then return no rows, as none would be exact duplicates. I need to be able to return this data but also find the records that occur multiple times in a table. I think this can be done with a union or using exists, but I'm not sure how that would be accomplished. Any ideas?
sample result idea:
order number, field1, field2, field3
123 a b c
456 d e f
789 a b c
would want it to return order numbers 123 and 789 like this:
order number, field1, field2, field3
123 a b c
789 a b c

;with a as
(
select count(*) over (partition by field1,field2,field3) count, order_number, field1,field2,field3
from table_name
)
select order_number, field1,field2,field3 from a where count > 1
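The windowed count can be tried on the sample data from the question. This is a sketch in SQLite via Python (assuming SQLite 3.25+ for window functions), with the count column aliased as cnt and the table created just for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_name (order_number INTEGER, field1 TEXT, field2 TEXT, field3 TEXT);
    INSERT INTO table_name VALUES
        (123, 'a', 'b', 'c'),
        (456, 'd', 'e', 'f'),
        (789, 'a', 'b', 'c');
    """
)

# Count the rows sharing the same (field1, field2, field3) without collapsing them,
# so order_number stays available on every row.
duplicated_orders = conn.execute(
    """
    WITH a AS (
        SELECT COUNT(*) OVER (PARTITION BY field1, field2, field3) AS cnt,
               order_number, field1, field2, field3
        FROM table_name
    )
    SELECT order_number, field1, field2, field3
    FROM a
    WHERE cnt > 1
    ORDER BY order_number
    """
).fetchall()

print(duplicated_orders)
```

This returns order numbers 123 and 789, as in the desired output.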

I'm not entirely sure if this is what you want, but it sounds like maybe?
select min(t2.order_no), t2.field1, t2.field2, t2.field3, t1.cnt
from table_name t2, (
select field1,field2,field3, count(*) as cnt
from table_name
group by field1,field2,field3
having count(*) > 1
) t1
where t1.field1 = t2.field1
and t1.field2 = t2.field2
and t1.field3 = t2.field3
group by t2.field1, t2.field2, t2.field3, t1.cnt
For each record returned in your deduplicating subquery, the outer query will attach to that record the smallest "order number" that matches the given combination of fields. If this isn't what you're looking for, please clarify. Some sample data and sample output would be helpful.
EDIT: From your posted sample data, it looks like you just want to return the records that have duplicates. If that's what you're looking for, try this:
select *
from table_name t2
where exists (
select field1,field2,field3, count(*)
from table_name t1
where t1.field1 = t2.field1
and t1.field2 = t2.field2
and t1.field3 = t2.field3
group by field1,field2,field3
having count(*) > 1
)

mysql - union tables by unique field

I have two tables with the same structure:
id name
1 Merry
2 Mike
and
id name
1 Mike
2 Alis
I need to union second table to first with keeping unique names, so that result is:
id name
1 Merry
2 Mike
3 Alis
Is it possible to do this with MySQL query, without using php script?

This is not a join (set multiplication), this is a union (set addition).
SELECT @r := @r + 1 AS id, name
FROM (
SELECT @r := 0
) vars,
(
SELECT name
FROM table1
UNION
SELECT name
FROM table2
) q

This will select all names from table1 and combine those with all the names from table2 which are not in table1.
(
select *
from table1
)
union
(
select t2.*
from table2 t2
left join table1 t1 on t2.name = t1.name
where t1.id is null
)

Use:
SELECT a.id,
a.name
FROM TABLE_A a
UNION
SELECT b.id,
b.name
FROM TABLE_B b
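UNION will remove duplicates, but only rows that are identical in every selected column: with the sample data, (2, Mike) and (1, Mike) differ in id, so both would be kept.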
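To make the effect of the id column visible, here is a small sketch with the two sample tables, run in SQLite from Python as a stand-in for MySQL. The renumbering is done in Python with enumerate purely for illustration, and the rows are ordered by name, so the new ids do not follow the order shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'Merry'), (2, 'Mike');
    INSERT INTO table2 VALUES (1, 'Mike'), (2, 'Alis');
    """
)

# UNION over (id, name): the two Mike rows differ in id, so both survive.
with_ids = conn.execute(
    "SELECT id, name FROM table1 UNION SELECT id, name FROM table2 ORDER BY name, id"
).fetchall()

# UNION over name only: Mike appears once.
names_only = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM table1 UNION SELECT name FROM table2 ORDER BY name"
    )
]

# Assign fresh consecutive ids to the unique names.
renumbered = list(enumerate(names_only, start=1))

print(with_ids)
print(renumbered)
```

The first result still contains Mike twice; the second has three unique names with new ids 1 to 3.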

As commented, it all depends on what your 'id' means, because in the example it means nothing.
SELECT DISTINCT(name) FROM t1 JOIN t2 ON something
if you only want the names
SELECT SUM(something), name FROM t1 JOIN t2 ON something GROUP BY name
if you want to do some group by
SELECT DISTINCT(name) FROM t1 JOIN t2 ON t1.id = t2.id
if the id's are the same

SELECT DISTINCT COALESCE(t1.name,t2.name) FROM
mytable t1 LEFT JOIN mytable t2 ON (t1.name=t2.name);
will get you a list of unique names from the 2 tables. If you want them to get new ids (like Alis does in your desired results), that's something else and requires the answers to a couple of questions:
Do any of the names need to maintain their previous id? And if they do, which table's id should be preferred?
Why do you have 2 tables with the same structure, i.e. what are you trying to accomplish when you generate the unique name list?
