SQL: Get duplicates based on all columns [duplicate]

This question already has answers here:
SQL query to find duplicate rows, in any table
(4 answers)
Closed 4 years ago.
SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo;
Something like this will produce the duplicates in the CUSTOMER table based on FirstName, LastName and MobileNo. However, I would like to produce a list of duplicates based on ALL columns, which are not known in advance. How would I accomplish this?

You could use CHECKSUM(*) (SQL Server).
sqlfiddle.com/#!18/0a33d/4
Ex.
CREATE TABLE TEST_DATA
( Field1 VARCHAR(10),
Field2 VARCHAR(10)
);
INSERT INTO TEST_DATA VALUES ('1','1');
INSERT INTO TEST_DATA VALUES ('1','1');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('3','3');
SELECT TD1_CS.*
FROM (SELECT TD1.*,
             CHECKSUM(*) AS CS1
      FROM TEST_DATA TD1
     ) TD1_CS
INNER JOIN (SELECT CHECKSUM(*) AS CS2
            FROM TEST_DATA TD2
            GROUP BY CHECKSUM(*)
            HAVING COUNT(*) > 1
           ) TD2_CS
    ON TD1_CS.CS1 = TD2_CS.CS2
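Note that CHECKSUM(*) can produce hash collisions, so two different rows can occasionally report the same checksum. If the column list really is unknown at write time, an alternative sketch (assuming SQL Server 2017+ for STRING_AGG, and the CUSTOMER table from the question) builds the GROUP BY list dynamically from INFORMATION_SCHEMA instead:
-- Hedged sketch: build the GROUP BY list from the system catalog (SQL Server 2017+).
-- CUSTOMER is the table name from the question; add a TABLE_SCHEMA filter if needed.
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

SELECT @cols = STRING_AGG(CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)), ', ')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'CUSTOMER';

SET @sql = N'SELECT ' + @cols + N', COUNT(*) AS CNT
             FROM CUSTOMER
             GROUP BY ' + @cols + N'
             HAVING COUNT(*) > 1;';

EXEC sp_executesql @sql;
Grouping by every column exactly avoids checksum collisions, at the cost of building the statement dynamically.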

SELECT FirstName, LastName, MobileNo, COUNT(*)
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo
HAVING COUNT(*) > 1

Related

How to delete duplicate records in SQL when similar in two columns and different in one column

I have a table like this:
ID  Name  Family  Phone_Number
1   A     B       123456
2   c     d       321456
3   A     B
4   A     B       456789
I want to delete records 3 and 4.
First identify the duplicates with ROW_NUMBER, then delete the extra rows (ordering by ID so the row with the lowest ID is kept):
WITH cte AS (
SELECT
FirstName
, LastName
, row_number() OVER(PARTITION BY FirstName, LastName ORDER BY ID) AS RN
FROM YourTABLE
)
DELETE cte WHERE RN > 1
An example:
DECLARE #table TABLE
(
ID INT,
FirstName VARCHAR(10),
LastName VARCHAR(10)
);
INSERT INTO #table
(
ID,
FirstName,
LastName
)
VALUES
(1, 'A' , 'B')
, (2, 'c' , 'd')
, (3, 'A' , 'B')
, (4, 'A' , 'B')
Query to delete:
;WITH cte AS (
SELECT
FirstName
, LastName
, row_number() OVER(PARTITION BY FirstName, LastName ORDER BY ID) AS RN
FROM #table
)
DELETE cte WHERE RN > 1
SELECT * FROM #table
OUTPUT:
ID FirstName LastName
1 A B
2 c d
Write this SQL and execute it:
; WITH TableBWithRowID AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Name, Family) AS RowID, Name, Family
FROM TABLE1
)
DELETE o
FROM TableBWithRowID o
WHERE RowID < (SELECT MAX(RowID)
               FROM TableBWithRowID i
               WHERE i.Name = o.Name AND i.Family = o.Family
               GROUP BY Name, Family)
Replace TABLE1 with your table name.
The query below will delete all duplicate records based on the FirstName and LastName columns, assuming there are no NULLs in those columns.
You just need to fill in your table name in the two places in the query below:
DELETE FROM <YourTableName>
where Id not in (
SELECT MIN(ID) as RowId
FROM <YourTableName>
GROUP BY FirstName, LastName
)
With EXISTS:
delete t from tablename t
where exists (
select 1 from tablename
where name = t.name and family = t.family and id < t.id
)
See the demo

Counting repeated data

I'm trying to get the most-repeated integer in a table. I have tried many ways but could not make it work. The result I'm looking for is:
"james";"108"
This 108 is the concatenation of the two fields loca + locb; that combination is repeated two times while the others are not. The sqlfiddle link has the sample table structure and the query I tried.
The query I tried is:
select * from (
select name,CONCAT(loca,locb),loca,locb
, row_number() over (partition by CONCAT(loca,locb) order by CONCAT(loca,locb) ) as att
from Table1
) tt
where att=1
Please click the sqlfiddle link to see the complete sample table and the query I tried.
Edit: adding the complete table structure and data:
CREATE TABLE Table1
(name varchar(50),loca int,locb int)
;
insert into Table1 values ('james',100,2);
insert into Table1 values ('james',100,3);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,7);
insert into Table1 values ('james',10,6);
insert into Table1 values ('james',0,7);
insert into Table1 values ('james',10,0);
insert into Table1 values ('james',10,null);
insert into Table1 values ('james',10,null);
What I'm looking for is to get (james, 108), as that value is repeated two times in the entire data set. There is also a repetition of (james, 10), but those rows have a NULL in one of the columns; zero and NULL values are to be ignored, and only rows that have a value in both loca and locb should be considered.
SQL Fiddle
select distinct on (name) *
from (
select name, loca, locb, count(*) as total
from Table1
where loca is not null and locb is not null
group by 1,2,3
) s
order by name, total desc
WITH concat AS (
-- get concat values
SELECT name,concat(loca,locb) as merged
FROM table1 t1
WHERE t1.locb NOTNULL
AND t1.loca NOTNULL
), concat_count AS (
-- calculate count for concat values
SELECT name,merged,count(*) OVER (PARTITION BY name,merged) as merged_count
FROM concat
)
SELECT cc.name,cc.merged
FROM concat_count cc
WHERE cc.merged_count = (SELECT max(merged_count) FROM concat_count)
GROUP BY cc.name,cc.merged;
SqlFiddleDemo
select name,
newvalue
from (
select name,
CONCAT(loca,locb) newvalue,
COUNT(CONCAT(loca,locb)) as total,
row_number() over (order by COUNT(CONCAT(loca,locb)) desc) as att
from Table1
where loca is not null
and locb is not null
GROUP BY name, CONCAT(loca,locb)
) tt
where att=1
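Since the question also says zero values should be ignored, the NULL filters in the answers above could be extended with a zero check; a minimal sketch (same Table1 as in the question, plain GROUP BY/HAVING, no window functions):
-- Sketch: also skip zero values, per the question text; returns the repeated combinations
select name, concat(loca, locb) as merged, count(*) as total
from Table1
where loca is not null and locb is not null
  and loca <> 0 and locb <> 0
group by name, concat(loca, locb)
having count(*) > 1
order by total desc;
For the sample data this returns only (james, 108) with a count of 2.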

How to insert many records excluding some

I want to create a table with a subset of records from a master table.
For example, I have:
id   name   code
1    peter  73
2    carl   84
3    jack   73
I want to store peter and carl but not jack, because jack has the same code as peter.
I need high performance because I have 20M records.
I tried this:
SELECT id, name, DISTINCT(code) INTO new_tab
FROM old_tab
WHERE (conditions)
but it doesn't work.
Assuming you want to pick the row with the maximum id per code, then this should do it:
insert into new_tab (id, name, code)
SELECT id, name, code
FROM
(
  SELECT id, name, code, rank() OVER (PARTITION BY code ORDER BY id DESC) as rnk
  FROM old_tab
) ranked
WHERE rnk = 1
And for the minimum id per code, just change the sort order in the rank from DESC to ASC:
insert into new_tab (id, name, code)
SELECT id, name, code
FROM
(
  SELECT id, name, code, rank() OVER (PARTITION BY code ORDER BY id ASC) as rnk
  FROM old_tab
) ranked
WHERE rnk = 1
Using a derived table, you can find the minimum id for each code, then join back to it in the outer query to get the rest of the columns for that id from old_tab.
select t.id, t.name, t.code
into newTab
from old_tab t
inner join (SELECT min(id) as minId, code
            from old_tab group by code) x
    on t.id = x.minId
WHERE (conditions)
Try this:
CREATE TABLE #Temp
(
ID INT,
Name VARCHAR(50),
Code INT
)
INSERT #Temp VALUES (1, 'Peter', 73)
INSERT #Temp VALUES (2, 'Carl', 84)
INSERT #Temp VALUES (3, 'Jack', 73)
SELECT t2.ID, t2.Name, t2.Code
FROM #Temp t2
JOIN (
SELECT t.Code, MIN(t.ID) ID
FROM #temp t
JOIN (
SELECT DISTINCT Code
FROM #Temp
) d
ON t.Code = d.Code
GROUP BY t.Code
) b
ON t2.ID = b.ID

How to select top 3 values from each group in a table with SQL which have duplicates [duplicate]

This question already has answers here:
Select top 10 records for each category
(14 answers)
Closed 5 years ago.
Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.
The issue can be solved, when there are no duplicates in the table, by the query provided in this article: Select top 3 values from each group in a table with SQL. But if there are duplicates, what is the solution?
For example, the name John has 5 values related to him: 20, 7, 7, 7, 4. I need to return the name/value pairs as below, ordered by value descending for each name:
+-----------+-------+
| name      | value |
+-----------+-------+
| John      |    20 |
| John      |     7 |
| John      |     7 |
+-----------+-------+
Only 3 rows should be returned for John even though there are three 7s for John.
In many modern DBMSs (e.g. Postgres, Oracle, SQL Server, DB2 and many others), the following will work just fine. It uses a CTE and the ranking function ROW_NUMBER(), which is part of the SQL standard:
WITH cte AS
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
)
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
Without a CTE, using only ROW_NUMBER():
SELECT name, value, rn
FROM
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
Postgres
Oracle
SQL-Server
In MySQL and other DBMSs that do not have ranking functions, one has to use derived tables, correlated subqueries, or self-joins with GROUP BY.
The (tid) column is assumed to be the primary key of the table:
SELECT t.tid, t.name, t.value, -- self join and GROUP BY
COUNT(*) AS rn
FROM t
JOIN t AS t2
ON t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;
SELECT t.tid, t.name, t.value, rn
FROM
( SELECT t.tid, t.name, t.value,
( SELECT COUNT(*) -- inline, correlated subquery
FROM t AS t2
WHERE t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
) AS rn
FROM t
) AS t
WHERE rn <= 3
ORDER BY name, rn ;
Tested in MySQL
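A note on why ROW_NUMBER() is the right ranking function here: for John's values 20, 7, 7, 7, 4, RANK() would assign 1, 2, 2, 2, 5, so a rn <= 3 filter would return four rows, while ROW_NUMBER() numbers the tied 7s 2, 3, 4 and returns exactly three. A small check, sketched with inline data (works in SQL Server and Postgres):
WITH t (name, value) AS (
    SELECT 'John', 20 UNION ALL
    SELECT 'John', 7  UNION ALL
    SELECT 'John', 7  UNION ALL
    SELECT 'John', 7  UNION ALL
    SELECT 'John', 4
)
SELECT name, value,
       ROW_NUMBER() OVER (PARTITION BY name ORDER BY value DESC) AS rn,
       RANK()       OVER (PARTITION BY name ORDER BY value DESC) AS rk
FROM t
ORDER BY name, value DESC;
-- rn = 1, 2, 3, 4, 5  but  rk = 1, 2, 2, 2, 5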
I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.
Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3)
However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.
Here is an example of it working in SQL Server:
with t as (
select 1 as personid, 5 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 7 as val union all
select 1 as personid, 8 as val
)
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3);
Using MySQL's GROUP_CONCAT and FIND_IN_SET you can do that. Check the SQLFiddle:
SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
FROM tbl t1
WHERE t1.name = t.name
GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc
If your result set is not too heavy, you can write a stored procedure (or an anonymous PL/SQL block) for this problem which iterates over the result set and finds the biggest three with a simple comparison algorithm.
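For what that could look like, here is a minimal sketch of such an anonymous block, assuming Oracle PL/SQL and a table t(name, value) as used in the answers above; it just walks the ordered rows and prints the first three values per name:
-- Hedged sketch of the iterate-and-compare idea (Oracle; requires SET SERVEROUTPUT ON)
DECLARE
    v_prev_name  t.name%TYPE := NULL;
    v_rank       PLS_INTEGER := 0;
BEGIN
    FOR r IN (SELECT name, value FROM t ORDER BY name, value DESC) LOOP
        IF v_prev_name IS NULL OR r.name <> v_prev_name THEN
            v_prev_name := r.name;      -- new group: reset the counter
            v_rank := 0;
        END IF;
        v_rank := v_rank + 1;
        IF v_rank <= 3 THEN             -- keep only the top 3 values per name
            DBMS_OUTPUT.PUT_LINE(r.name || ' ' || r.value);
        END IF;
    END LOOP;
END;
/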
Try this -
CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);
WITH cte
AS (
SELECT NAME
,value
,ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY (value) DESC
) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC
This works for MS SQL. It should be workable in any other SQL dialect that can assign row numbers with a GROUP BY or OVER clause (or equivalent).
if object_id('tempdb..#Data') is not null drop table #Data;
GO
create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Jack', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO
with D as (
SELECT
name
,Value
,row_number() over (partition by name order by value desc) rn
From
#Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc
Name Value
Jack 5
Jane 30
Jane 21
Jane 18
John 20
John 7
John 7

Delete Rows with Duplicate Column Data in Specific Columns

I have a table we'll call Table1 with a bunch of junk data in it and no unique identifier column.
I want to select some columns from Table1 and transfer the data to Table2. However, after the transfer, I need to delete rows from Table2 with duplicates in 3 of the columns.
Let's say I have a row in Table2 with columns [FirstName], [LastName], [CompanyName], [City], and [State] that were transferred. I want only the rows with unique combinations of [FirstName], [LastName], and [CompanyName] to remain. To add to the confusion, [LastName] and/or [CompanyName] could contain NULL values. How could I accomplish this? Thanks in advance for any help.
Unique entries can be created using the distinct keyword.
select distinct
FirstName,
LastName,
CompanyName
from MyTable
So if you issue the following command, you will only add distinct values to the new table
insert into newTable
(
FirstName,
LastName,
CompanyName
)
select distinct
FirstName,
LastName,
CompanyName
from MyTable
where not exists (
select 1 from newTable
where newTable.FirstName = MyTable.FirstName
and newTable.LastName = MyTable.LastName
and newTable.CompanyName = MyTable.CompanyName
)
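One caveat: the question says LastName and CompanyName can be NULL, and the equality checks in the NOT EXISTS above never treat two NULLs as a match. A NULL-safe variant (a sketch, using the same table names as above) spells that case out:
insert into newTable (FirstName, LastName, CompanyName)
select distinct FirstName, LastName, CompanyName
from MyTable
where not exists (
    select 1
    from newTable
    where newTable.FirstName = MyTable.FirstName
      -- treat NULL = NULL as a match for the nullable columns
      and (newTable.LastName = MyTable.LastName
           or (newTable.LastName is null and MyTable.LastName is null))
      and (newTable.CompanyName = MyTable.CompanyName
           or (newTable.CompanyName is null and MyTable.CompanyName is null))
)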
Another nice way to add distinct new values to a table is to use the MERGE command.
merge newtable as target
using (select distinct
              FirstName,
              LastName,
              CompanyName
       from MyTable
      ) as source
on target.FirstName = source.FirstName
and target.LastName = source.LastName
and target.CompanyName = source.CompanyName
when not matched by target then
    insert (FirstName,
            LastName,
            CompanyName)
    values (source.FirstName,
            source.LastName,
            source.CompanyName);
The MERGE command gives you the option to control when you want to synchronize tables.
See this example; maybe it is what you wanted:
insert into Table2 (`firstname`, `lastname`, `companyname`)
select a.firstname, a.lastname, a.companyname
from (select distinct concat(firstname, ',', lastname, ',', companyname),
             firstname, lastname, companyname
      from Table1) a;
create table t2
as
select distinct FirstName,LastName,CompanyName,City,State from t1;
With the query below you can check whether there are duplicate entries or not:
select FirstName,LastName,CompanyName,count(*) from t2
group by FirstName,LastName,CompanyName
having Count(*) >1;
delete from t2 a
where rowid not in (select min(rowid) from t2 b
                    where a.column1 = b.column1
                    and .....);