Delete Rows with Duplicate Column Data in Specific Columns


I have a table we'll call Table1 with a bunch of junk data in it and no unique identifier column.
I want to select some columns from Table1 and transfer the data to Table2. However, after the transfer, I need to delete rows from Table2 with duplicates in 3 of the columns.
Let's say Table2 has the transferred columns [FirstName], [LastName], [CompanyName], [City], and [State]. I want only the rows with unique combinations of [FirstName], [LastName], and [CompanyName] to remain. To add to the confusion, [LastName] and/or [CompanyName] could contain NULL values. How could I accomplish this? Thanks in advance for any help.

You can get unique combinations with the DISTINCT keyword:
select distinct
FirstName,
LastName,
CompanyName
from MyTable
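So if you issue the following command, only distinct values are added to the new table. Because LastName and CompanyName can be NULL, and NULL = NULL is not true in SQL, those two columns are compared NULL-safely:
insert into newTable
(
FirstName,
LastName,
CompanyName
)
select distinct
FirstName,
LastName,
CompanyName
from MyTable
where not exists (
select 1 from newTable
where newTable.FirstName = MyTable.FirstName
and (newTable.LastName = MyTable.LastName
or (newTable.LastName is null and MyTable.LastName is null))
and (newTable.CompanyName = MyTable.CompanyName
or (newTable.CompanyName is null and MyTable.CompanyName is null))
)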
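Because LastName and CompanyName may be NULL, and NULL = NULL does not evaluate to true, a plain = comparison inside NOT EXISTS never matches a row that contains NULL, so running the insert a second time would copy those rows again. Below is a minimal sketch of a NULL-safe version, assuming SQLite through Python's built-in sqlite3 module rather than SQL Server; SQLite's IS operator treats two NULLs as equal. The table names follow the answer, and the sample rows are made up for illustration.

```python
import sqlite3


def copy_distinct(conn: sqlite3.Connection) -> None:
    # Insert each distinct (FirstName, LastName, CompanyName) combination
    # that is not already in newTable. In SQLite, "IS" treats NULL as equal
    # to NULL, so rows with a NULL LastName/CompanyName are not re-inserted.
    conn.execute("""
        INSERT INTO newTable (FirstName, LastName, CompanyName)
        SELECT DISTINCT FirstName, LastName, CompanyName
        FROM MyTable
        WHERE NOT EXISTS (
            SELECT 1 FROM newTable
            WHERE newTable.FirstName   IS MyTable.FirstName
              AND newTable.LastName    IS MyTable.LastName
              AND newTable.CompanyName IS MyTable.CompanyName
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (FirstName TEXT, LastName TEXT, CompanyName TEXT, City TEXT, State TEXT)"
)
conn.execute("CREATE TABLE newTable (FirstName TEXT, LastName TEXT, CompanyName TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Lee", "Acme", "Austin", "TX"),
        ("Ann", "Lee", "Acme", "Dallas", "TX"),  # duplicate on the 3 key columns
        ("Bob", None, "Acme", "Reno", "NV"),
        ("Bob", None, "Acme", "Elko", "NV"),     # duplicate that includes a NULL
        ("Cy", None, None, "Mesa", "AZ"),
    ],
)

copy_distinct(conn)
copy_distinct(conn)  # a second run adds nothing

rows = conn.execute(
    "SELECT FirstName, LastName, CompanyName FROM newTable ORDER BY FirstName"
).fetchall()
print(rows)  # [('Ann', 'Lee', 'Acme'), ('Bob', None, 'Acme'), ('Cy', None, None)]
```

In other dialects the same NULL-safe test can be written as (a = b OR (a IS NULL AND b IS NULL)).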
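Another way to add only new distinct values to a table is the MERGE command (T-SQL syntax shown). The ON clause must compare the target with the source, and the inserted values come from the source:
merge newtable as target
using (select distinct
FirstName,
LastName,
CompanyName
from MyTable
) as source
on target.FirstName = source.FirstName
and (target.LastName = source.LastName
or (target.LastName is null and source.LastName is null))
and (target.CompanyName = source.CompanyName
or (target.CompanyName is null and source.CompanyName is null))
when not matched by target then
insert (FirstName,
LastName,
CompanyName)
values (source.FirstName,
source.LastName,
source.CompanyName);
MERGE also accepts WHEN MATCHED and WHEN NOT MATCHED BY SOURCE clauses, so the same statement can update or delete rows if you want to keep the two tables synchronized.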

Here is an example in MySQL; this might be what you want. DISTINCT applies to the whole selected row, so concatenating the columns first is not needed:
insert into Table2(`firstname`, `lastname`, `companyname`)
select distinct firstname, lastname, companyname
from Table1;

create table t2
as
select distinct FirstName,LastName,CompanyName,City,State from t1;
The query below tells you whether there are duplicate entries:
select FirstName,LastName,CompanyName,count(*) from t2
group by FirstName,LastName,CompanyName
having Count(*) >1;
If there are, delete all but one row of each group using Oracle's rowid, repeating the comparison for each key column:
delete from t2 a where rowid not in (select min(rowid) from t2 b where a.column1=b.column1
and .....);
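The delete step can be sketched end to end in SQLite, via Python's sqlite3 as an assumed stand-in for Oracle; SQLite also exposes a rowid on ordinary tables. Instead of the correlated subquery, this sketch uses the uncorrelated GROUP BY form of the same keep-the-lowest-rowid idea, which groups NULLs together without extra conditions. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t2 (FirstName TEXT, LastName TEXT, CompanyName TEXT, City TEXT, State TEXT)"
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Lee", "Acme", "Austin", "TX"),
        ("Ann", "Lee", "Acme", "Dallas", "TX"),  # same 3 key columns as the row above
        ("Bob", None, "Acme", "Reno", "NV"),
        ("Bob", None, "Acme", "Elko", "NV"),     # duplicate that includes a NULL
        ("Cy", None, None, "Mesa", "AZ"),
    ],
)

# Keep the lowest rowid in each (FirstName, LastName, CompanyName) group and
# delete every other row. GROUP BY puts NULLs in the same group, so the two
# "Bob" rows count as duplicates even though NULL = NULL is not true.
conn.execute("""
    DELETE FROM t2
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM t2
        GROUP BY FirstName, LastName, CompanyName
    )
""")

remaining = conn.execute("SELECT FirstName, City FROM t2 ORDER BY rowid").fetchall()
print(remaining)  # [('Ann', 'Austin'), ('Bob', 'Reno'), ('Cy', 'Mesa')]
```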

Related

SQL Get Duplicates based on all columns

SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo;
Something like this finds duplicates in the CUSTOMER table based on FirstName, LastName and MobileNo. However, I would like to produce a list of duplicates based on ALL columns (which are not known in advance). How would I accomplish this?

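You could use CHECKSUM(*) in SQL Server. Note that CHECKSUM can return the same value for different rows, so rows with matching checksums are likely, but not guaranteed, to be duplicates.
sqlfiddle.com/#!18/0a33d/4
Example: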
CREATE TABLE TEST_DATA
( Field1 VARCHAR(10),
Field2 VARCHAR(10)
);
INSERT INTO TEST_DATA VALUES ('1','1');
INSERT INTO TEST_DATA VALUES ('1','1');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('2','2');
INSERT INTO TEST_DATA VALUES ('3','3');
SELECT TD1_CS.*
FROM (SELECT TD1.*,
CHECKSUM(*) CS1
FROM TEST_DATA TD1
) TD1_CS
INNER JOIN (SELECT CHECKSUM(*) CS2
FROM TEST_DATA TD2
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
) TD2_CS
ON TD1_CS.CS1 = TD2_CS.CS2;
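When the columns are not known in advance, another option is to read the column list from the catalog and build a GROUP BY over all of them, which compares the actual values instead of a hash. This is a sketch assuming SQLite through Python's sqlite3, where PRAGMA table_info lists a table's columns; the helper name duplicate_rows is hypothetical, and the data is the TEST_DATA sample above. In SQL Server the column names could be read from INFORMATION_SCHEMA.COLUMNS in a similar way.

```python
import sqlite3


def duplicate_rows(conn: sqlite3.Connection, table: str) -> list:
    """Return each fully duplicated row once, followed by its count.

    The table name is interpolated into SQL, so it must come from a trusted
    source, not from user input.
    """
    # Discover the column names at run time; field 1 of PRAGMA table_info is the name.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in cols)
    # Grouping by every column compares the real values, so there are no
    # false matches of the kind a checksum collision can cause.
    sql = (
        f'SELECT {col_list}, COUNT(*) AS cnt FROM "{table}" '
        f"GROUP BY {col_list} HAVING COUNT(*) > 1 ORDER BY {col_list}"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST_DATA (Field1 VARCHAR(10), Field2 VARCHAR(10))")
conn.executemany(
    "INSERT INTO TEST_DATA VALUES (?, ?)",
    [("1", "1"), ("1", "1"), ("2", "2"), ("2", "2"), ("2", "2"), ("3", "3")],
)
print(duplicate_rows(conn, "TEST_DATA"))  # [('1', '1', 2), ('2', '2', 3)]
```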

SELECT FirstName, LastName, MobileNo, COUNT(*)
FROM CUSTOMER
GROUP BY FirstName, LastName, MobileNo
HAVING COUNT(*) > 1

Insert or update data using cte_results in SQL Server

I have a query with a CTE that returns several columns. For each ID in its results, I want to insert a record if that ID does not exist in the table I am inserting into, or update the data for that ID if it does.
So far I have tried this:
WITH cte_base as(
SELECT DISTINCT ID, statusID
FROM testtable
)
SELECT *
FROM cte_base
IF EXISTS(SELECT * FROM Newtable WHERE EXISTS (SELECT ID FROM cte_base))
UPDATE newtable
SET statusID = 2
WHERE Newtable.ID = cte_base.ID
ELSE
INSERT INTO newtable(ID, statusID)
SELECT ID, statusID
FROM cte_base
WHERE Newtable.ID <> cte_base.ID
I have to run this query against live data, hence I would like to know if my logic is correct.

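Basic MERGE example based on your code. If testtable contains the same ID with more than one statusID, the source has several rows per ID, and SQL Server's MERGE raises an error when more than one source row matches the same target row.
MERGE INTO NewTable AS T
USING
(
SELECT DISTINCT ID,statusID
FROM testtable
) AS S
ON S.ID = T.ID
WHEN MATCHED THEN UPDATE SET
T.StatusID = 2
WHEN NOT MATCHED THEN INSERT (ID,statusID)
VALUES (S.ID,S.statusID)
;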
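The intended logic can also be written as two set-based statements with no IF: first update the rows whose ID already exists, then insert the ones that do not. Running the UPDATE first keeps freshly inserted rows from being overwritten with statusID = 2. The sketch below assumes SQLite via Python's sqlite3 (SQLite has no MERGE statement) and uses made-up sample rows; the connection's context manager commits both statements together or rolls them back on error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (ID INTEGER, statusID INTEGER)")
conn.execute("CREATE TABLE newtable (ID INTEGER PRIMARY KEY, statusID INTEGER)")
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?)", [(1, 5), (1, 5), (2, 7), (3, 9)]
)
conn.execute("INSERT INTO newtable VALUES (1, 0)")  # ID 1 already exists in the target

with conn:  # commit both statements together, or roll back if either fails
    # Step 1: update target rows whose ID appears in the source.
    conn.execute("""
        UPDATE newtable SET statusID = 2
        WHERE ID IN (SELECT ID FROM testtable)
    """)
    # Step 2: insert source rows whose ID is not in the target yet.
    # DISTINCT collapses the repeated (1, 5) source row.
    conn.execute("""
        INSERT INTO newtable (ID, statusID)
        SELECT DISTINCT ID, statusID FROM testtable AS s
        WHERE NOT EXISTS (SELECT 1 FROM newtable AS n WHERE n.ID = s.ID)
    """)

result = conn.execute("SELECT ID, statusID FROM newtable ORDER BY ID").fetchall()
print(result)  # [(1, 2), (2, 7), (3, 9)]
```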

What are you trying to do? This condition:
EXISTS (SELECT ID FROM cte_base)
is true every time cte_base has any records, because the subquery is not correlated with Newtable. It is no different from checking
SELECT DISTINCT ID, statusID
FROM testtable
so it will be true every time there are any records in testtable.

Counting repeated data

I'm trying to find the most repeated value in a table. I tried many ways but could not make it work. The result I'm looking for is:
"james";"108"
This is because 108, the concatenation of the two fields loca and locb, is repeated two times while the other combinations are not. I tried the query below in SQL Fiddle against the sample table structure.
The query I tried:
select * from (
select name,CONCAT(loca,locb),loca,locb
, row_number() over (partition by CONCAT(loca,locb) order by CONCAT(loca,locb) ) as att
from Table1
) tt
where att=1
Edit: adding the complete table structure and data:
CREATE TABLE Table1
(name varchar(50),loca int,locb int)
;
insert into Table1 values ('james',100,2);
insert into Table1 values ('james',100,3);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,8);
insert into Table1 values ('james',10,7);
insert into Table1 values ('james',10,6);
insert into Table1 values ('james',0,7);
insert into Table1 values ('james',10,0);
insert into Table1 values ('james',10);
insert into Table1 values ('james',10);
What I'm looking for is (james,108), because that value is repeated two times in the data. (james,10) is also repeated, but those rows have a NULL locb. Zero and NULL values are to be ignored; only rows that have a value in both loca and locb should be considered.

select distinct on (name) *
from (
select name, loca, locb, count(*) as total
from Table1
where loca is not null and locb is not null
group by 1,2,3
) s
order by name, total desc

WITH concat AS (
-- get concat values
SELECT name,concat(loca,locb) as merged
FROM table1 t1
WHERE t1.locb NOTNULL
AND t1.loca NOTNULL
), concat_count AS (
-- calculate count for concat values
SELECT name,merged,count(*) OVER (PARTITION BY name,merged) as merged_count
FROM concat
)
SELECT cc.name,cc.merged
FROM concat_count cc
WHERE cc.merged_count = (SELECT max(merged_count) FROM concat_count)
GROUP BY cc.name,cc.merged;

select name,
newvalue
from (
select name,
CONCAT(loca,locb) newvalue,
COUNT(CONCAT(loca,locb)) as total,
row_number() over (order by COUNT(CONCAT(loca,locb)) desc) as att
from Table1
where loca is not null
and locb is not null
GROUP BY name, CONCAT(loca,locb)
) tt
where att=1
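A sketch of the same idea in SQLite via Python's sqlite3, assumed here in place of PostgreSQL, using the sample data from the question. It counts each (loca, locb) pair rather than the concatenated string, because different pairs can concatenate to the same text: (1, 18) and (11, 8) both give "118". It also drops zeros, following the question's wording, and keeps the highest count per name with a correlated MAX instead of DISTINCT ON or ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (name TEXT, loca INTEGER, locb INTEGER)")
conn.executemany(
    "INSERT INTO Table1 (name, loca, locb) VALUES (?, ?, ?)",
    [
        ("james", 100, 2), ("james", 100, 3), ("james", 10, 8), ("james", 10, 8),
        ("james", 10, 7), ("james", 10, 6), ("james", 0, 7), ("james", 10, 0),
        ("james", 10, None), ("james", 10, None),
    ],
)

rows = conn.execute("""
    WITH counts AS (
        -- Count each (loca, locb) pair, not the concatenated text.
        SELECT name, loca, locb, COUNT(*) AS total
        FROM Table1
        WHERE loca IS NOT NULL AND locb IS NOT NULL
          AND loca <> 0 AND locb <> 0   -- the question also asks to ignore zeros
        GROUP BY name, loca, locb
    )
    -- Keep the pair(s) with the highest count for each name;
    -- || turns the two integers into text such as '108'.
    SELECT name, loca || locb AS merged, total
    FROM counts AS c
    WHERE total = (SELECT MAX(total) FROM counts AS c2 WHERE c2.name = c.name)
""").fetchall()
print(rows)  # [('james', '108', 2)]
```

If two pairs tie for the highest count, both rows are returned.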

Optimal way of determining number of table entries that are duplicate on particular columns

I need to determine whether particular table rows are unique on particular columns. Currently I'm doing this using a subquery like so:
SELECT
t1.ID,
(SELECT COUNT(*)
FROM MyTable AS t2
WHERE (t2.FirstName = t1.FirstName) AND (t2.Surname = t1.Surname)
) AS cnt
FROM MyTable AS t1
WHERE t1.ID IN (100, 101, 102);
Which works fine. However, I'd like to know if anyone knows of a more efficient way of achieving the same result than using a subquery.
I'm doing this on an Azure SQL Server, by the way.

You could use a group by like this:
SELECT
t1.FirstName,
t1.Surname,
COUNT(t1.ID) as cnt
FROM MyTable AS t1
WHERE t1.ID IN (100, 101, 102)
GROUP BY t1.FirstName, t1.Surname
ORDER BY cnt DESC
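You can add HAVING COUNT(t1.ID) > 1 after the GROUP BY clause if you want only the duplicates (SQL Server does not accept the cnt alias in HAVING). Note that the WHERE clause also limits the counts to those three IDs.
However, this does not return the ID column; if you need it, you may have to use a subquery.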
Here you can find more information on the subject:
http://technet.microsoft.com/en-us/library/ms177673.aspx

I don't know how this will compare with your query in your environment but I would expect this to perform better:
Select id, qty
From mytable
Inner join
(
Select firstname, surname, count(0) as qty
From mytable
Group by firstname, surname
) as qtytable
On mytable.firstname = qtytable.firstname and mytable.surname = qtytable.surname
Where mytable.id in (100, 101, 102)
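A quick way to check that a join rewrite returns the same counts as the original correlated subquery is to run both on a small table. This sketch assumes SQLite via Python's sqlite3 instead of Azure SQL, with invented rows; two extra "Ann Lee" rows outside the ID list confirm that the counts still cover the whole table while only IDs 100 to 102 are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(100, "Ann", "Lee"), (101, "Bob", "Ray"), (102, "Cy", "Do"),
     (200, "Ann", "Lee"), (201, "Ann", "Lee")],
)

# Original approach: one correlated COUNT(*) per selected row.
subquery = conn.execute("""
    SELECT t1.ID,
           (SELECT COUNT(*) FROM MyTable AS t2
            WHERE t2.FirstName = t1.FirstName AND t2.Surname = t1.Surname) AS cnt
    FROM MyTable AS t1
    WHERE t1.ID IN (100, 101, 102)
    ORDER BY t1.ID
""").fetchall()

# Join approach: count every name pair once over the whole table, then join.
# The ID filter stays in the outer query so the counts include rows 200 and 201.
joined = conn.execute("""
    SELECT m.ID, q.qty
    FROM MyTable AS m
    JOIN (SELECT FirstName, Surname, COUNT(*) AS qty
          FROM MyTable
          GROUP BY FirstName, Surname) AS q
      ON m.FirstName = q.FirstName AND m.Surname = q.Surname
    WHERE m.ID IN (100, 101, 102)
    ORDER BY m.ID
""").fetchall()

print(subquery)  # [(100, 3), (101, 1), (102, 1)]
print(joined)    # [(100, 3), (101, 1), (102, 1)]
```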

I think a more efficient way would be to use either the COUNT function with an OVER clause or the ROW_NUMBER ranking function. Apply the ID filter in an outer query: WHERE is evaluated before window functions, so filtering in the same SELECT would count only the listed IDs.
SELECT ID, cnt
FROM (SELECT ID, COUNT(*) OVER(PARTITION BY FirstName, Surname) AS cnt
FROM MyTable) AS t
WHERE ID IN (100, 101, 102)
OR
SELECT ID, rn
FROM (SELECT ID, ROW_NUMBER() OVER(PARTITION BY FirstName, Surname ORDER BY ID) AS rn
FROM MyTable) AS t
WHERE ID IN (100, 101, 102)
ROW_NUMBER returns the sequential number of a row within a partition
of a result set, starting at 1 for the first row in each partition, so rn > 1 marks a repeat.
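Where the ID filter sits matters for COUNT(*) OVER: WHERE is applied before window functions, so filtering in the same SELECT shrinks each partition to the listed IDs. The sketch below compares both placements in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later), an assumed stand-in for Azure SQL, with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(100, "Ann", "Lee"), (101, "Bob", "Ray"), (102, "Cy", "Do"),
     (200, "Ann", "Lee"), (201, "Ann", "Lee")],
)

# Filter in the same SELECT: the window sees only IDs 100-102,
# so the two other "Ann Lee" rows are not counted.
filtered_first = conn.execute("""
    SELECT ID, COUNT(*) OVER (PARTITION BY FirstName, Surname) AS cnt
    FROM MyTable
    WHERE ID IN (100, 101, 102)
    ORDER BY ID
""").fetchall()

# Compute the window over the whole table first, then filter outside.
window_first = conn.execute("""
    SELECT ID, cnt
    FROM (SELECT ID, COUNT(*) OVER (PARTITION BY FirstName, Surname) AS cnt
          FROM MyTable) AS w
    WHERE ID IN (100, 101, 102)
    ORDER BY ID
""").fetchall()

print(filtered_first)  # [(100, 1), (101, 1), (102, 1)]
print(window_first)    # [(100, 3), (101, 1), (102, 1)]
```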

A little extreme, but since the IN (100, 101, 102) filter would otherwise be needed twice, you can copy those rows into a #temp table first:
CREATE TABLE #temp(
[ID] [int] NOT NULL,
[fname] [varchar](50) NOT NULL,
[lname] [varchar](50) NOT NULL);
insert into #temp([ID],[fname],[lname])
SELECT ID, FirstName, Surname
FROM MyTable
WHERE ID IN (100, 101, 102);
select t1.ID, t2.count
from #temp as t1
join
(
select [fname],[lname], count(*) as count
from #temp
group by [fname],[lname]
) as t2
on t1.[fname] = t2.[fname]
and t1.[lname] = t2.[lname];
Alexander's solution is probably better; it is certainly less code.

SQL DISTINCT Value Question

How can I filter my results in a query? For example, I have 5 records:
John,Smith,apple
Jane,Doe,apple
Fred,James,apple
Bill,evans,orange
Willma,Jones,grape
Now I want a query that brings back 3 records with DISTINCT fruit, but (and here is the tricky part) I still want the First Name and Last Name columns. I do not care which of the 3 apple rows it returns, but I need it to return only 3 rows (or however many distinct fruits there are).
For example, the result could be:
John,Smith,apple
Bill,evans,orange
Willma,Jones,grape
Thanks in advance I've been banging my head on this all day.

Oddly enough, the best solution doesn't involve GROUP BY.
WITH DistinctFruit AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY Fruit ORDER BY LastName) AS FruitNo,
LastName,
FirstName,
Fruit
FROM YourTable) -- YourTable is a placeholder; "table" itself is a reserved word
SELECT FirstName, LastName, Fruit
FROM DistinctFruit
WHERE FruitNo = 1;

If you have a small amount of data (not tens of thousands of rows), you can do sub-queries.
select distinct t1.fruit as Fruit,
(select top 1 t2.lastname
from t1 as t2
where t1.fruit = t2.fruit
order by t2.lastname) as LastName,
(select top 1 t2.firstname
from t1 as t2
where t1.fruit = t2.fruit
order by t2.lastname, t2.firstname) as FirstName
from t1
Note the FirstName column is sorted the same as the LastName column. This will give you a matching last name with the correct first name.
Here is my test data:
create table t1
(firstname varchar(20),
lastname varchar(20),
fruit varchar(20))
insert into t1
values ('John','Smith','apple')
insert into t1
values ('Jane','Doe','apple')
insert into t1
values ('Fred','James','apple')
insert into t1
values ('Bill','evans','orange')
insert into t1
values ('Willma','Jones','grape')

Just another solution
select distinct x.*,fruit from t1
cross apply
(select top 1 firstname, lastname from t1 t2 where t1.fruit=t2.fruit) x

SELECT DISTINCT x.*,fruit FROM peopleFruit pf
CROSS APPLY
(SELECT TOP 1 firstname, lastname FROM peopleFruit pf1 WHERE pf.fruit=pf1.fruit) x
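One more option keeps one whole row per fruit, so the first and last name always come from the same record. This sketch assumes SQLite via Python's sqlite3, where every ordinary table has a rowid, and uses the test data from the answer above; in SQL Server the ROW_NUMBER answer plays the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (firstname TEXT, lastname TEXT, fruit TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("John", "Smith", "apple"), ("Jane", "Doe", "apple"), ("Fred", "James", "apple"),
        ("Bill", "evans", "orange"), ("Willma", "Jones", "grape"),
    ],
)

# Keep the first-inserted row (lowest rowid) for each fruit. Both name
# columns come from that single row, so they always belong together.
rows = conn.execute("""
    SELECT firstname, lastname, fruit
    FROM t1
    WHERE rowid IN (SELECT MIN(rowid) FROM t1 GROUP BY fruit)
    ORDER BY rowid
""").fetchall()
print(rows)  # [('John', 'Smith', 'apple'), ('Bill', 'evans', 'orange'), ('Willma', 'Jones', 'grape')]
```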
