Delete multiple duplicate rows in table

I have multiple groups of duplicates in one table (3 records for one value, 2 for another, etc.), i.e. several values that occur in more than one row.
Below is what I came up with to delete them, but I have to run the script as many times as there are duplicates:
set rowcount 1
delete from Table
where code in (
select code from Table
group by code
having (count(code) > 1)
)
set rowcount 0
This works to a degree, but I need to run it for every group of duplicates, and each run deletes only one row (which is all I need right now).

If the table has a key column, you can use it to uniquely identify the "distinct" rows.
Use a subquery to build the list of IDs of the rows to keep (one per group), then delete everything outside that set. Something along these lines:
create table #TempTable
(
ID int identity(1,1) not null primary key,
SomeData varchar(100) not null
)
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')
select * from #TempTable
--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
select MAX(ID)
from #TempTable
group by SomeData
)
--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
select MAX(ID)
from #TempTable
group by SomeData
)
--Final Result Set
select * from #TempTable
drop table #TempTable;
Alternatively, you could use a CTE, for example:
WITH UniqueRecords AS
(
select MAX(ID) AS ID
from #TempTable
group by SomeData
)
DELETE A
FROM #TempTable A
LEFT outer join UniqueRecords B on
A.ID = B.ID
WHERE B.ID IS NULL
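Here is a minimal runnable sketch of the keep-the-highest-ID idea. It assumes SQLite, driven through Python's built-in sqlite3 module, as a stand-in for SQL Server. INTEGER PRIMARY KEY plays the role of IDENTITY(1,1), and the helper name dedupe_keep_max_id is hypothetical:

```python
import sqlite3


def dedupe_keep_max_id(conn: sqlite3.Connection) -> int:
    """Delete every row whose ID is not the highest ID in its SomeData group.

    Returns the number of deleted rows (hypothetical helper, SQLite dialect).
    """
    cur = conn.execute(
        """
        DELETE FROM TempTable
        WHERE ID NOT IN (SELECT MAX(ID) FROM TempTable GROUP BY SomeData)
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TempTable (ID INTEGER PRIMARY KEY, SomeData TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO TempTable (SomeData) VALUES (?)",
    [(v,) for v in ("someData1", "someData1", "someData2", "someData2",
                    "someData2", "someData3", "someData4")],
)

deleted = dedupe_keep_max_id(conn)
remaining = conn.execute("SELECT ID, SomeData FROM TempTable ORDER BY ID").fetchall()
print(deleted, remaining)
```

MAX(ID) keeps the newest row of each group. Switching to MIN(ID) keeps the oldest instead.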

It is often more efficient to copy the unique rows into a temporary copy, drop the source table, and then rename the copy back to the source table's name.
I reused the definition and data of #TempTable, called SrcTable here instead, since a temporary table cannot be renamed into a regular one:
create table SrcTable
(
ID int identity(1,1) not null primary key,
SomeData varchar(100) not null
)
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')
-- table definition and data taken from John Sansom's previous answer
-- cloning "unique" part
SELECT * INTO TempTable
FROM SrcTable --original table
WHERE id IN
(SELECT MAX(id) AS ID
FROM SrcTable
GROUP BY SomeData);
GO
DROP TABLE SrcTable
GO
EXEC sys.sp_rename 'TempTable', 'SrcTable';
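The copy, drop and rename sequence can be tried as a sketch in SQLite through Python's sqlite3 module. This assumes SQLite as a stand-in: CREATE TABLE ... AS replaces SELECT ... INTO, and ALTER TABLE ... RENAME TO replaces sp_rename:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SrcTable (ID INTEGER PRIMARY KEY, SomeData TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO SrcTable (SomeData) VALUES (?)",
    [(v,) for v in ("someData1", "someData1", "someData2", "someData2",
                    "someData2", "someData3", "someData4")],
)

# Clone only the "unique" part: the highest ID of each SomeData group.
conn.execute(
    """
    CREATE TABLE TempTable AS
    SELECT * FROM SrcTable
    WHERE ID IN (SELECT MAX(ID) FROM SrcTable GROUP BY SomeData)
    """
)
# Drop the original table and give the copy its name.
conn.execute("DROP TABLE SrcTable")
conn.execute("ALTER TABLE TempTable RENAME TO SrcTable")
conn.commit()

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
rows = conn.execute("SELECT ID, SomeData FROM SrcTable ORDER BY ID").fetchall()
print(tables, rows)
```

Like SELECT ... INTO, CREATE TABLE ... AS copies only columns and data. The primary key and any indexes have to be recreated on the renamed table.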

Alternatively, you can use the ROW_NUMBER() function to filter out duplicates (ordering by ID makes the row with the lowest ID in each group the one that is kept):
;WITH [CTE_DUPLICATES] AS
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY ID)
FROM #TempTable
)
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1
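The same ROW_NUMBER() technique can be run as a sketch, assuming SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module. SQLite cannot DELETE through a CTE, so this version numbers the rows in a subquery and deletes by ID instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TempTable (ID INTEGER PRIMARY KEY, SomeData TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO TempTable (SomeData) VALUES (?)",
    [(v,) for v in ("someData1", "someData1", "someData2", "someData2",
                    "someData2", "someData3", "someData4")],
)

# Number the rows inside each SomeData group (lowest ID first) and
# delete every row after the first one of its group.
conn.execute(
    """
    DELETE FROM TempTable
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY ID) AS RN
            FROM TempTable
        )
        WHERE RN > 1
    )
    """
)
rows = conn.execute("SELECT ID, SomeData FROM TempTable ORDER BY ID").fetchall()
print(rows)
```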

SET ROWCOUNT 1
DELETE a
FROM Table a
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code) > 1
WHILE @@ROWCOUNT > 0
DELETE a
FROM Table a
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code) > 1
SET ROWCOUNT 0
This deletes duplicate rows one at a time until only one row per Code remains. If duplicates are defined by more columns than Code, add those columns to the comparison.
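The delete-one-row-then-loop pattern can be sketched in SQLite via Python's sqlite3 module. This is a stand-in under stated assumptions: a `rowid` subquery with LIMIT 1 plays the role of SET ROWCOUNT 1, Cursor.rowcount plays the role of @@ROWCOUNT, and the table name CodeTable is hypothetical, since Table is a reserved word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CodeTable (Code TEXT NOT NULL)")
conn.executemany("INSERT INTO CodeTable (Code) VALUES (?)",
                 [("A",), ("A",), ("A",), ("B",), ("B",), ("C",)])

# Each execution deletes at most one row whose Code still occurs more than once.
DELETE_ONE = """
    DELETE FROM CodeTable
    WHERE rowid = (
        SELECT a.rowid FROM CodeTable AS a
        WHERE (SELECT COUNT(*) FROM CodeTable AS b WHERE b.Code = a.Code) > 1
        LIMIT 1
    )
"""

iterations = 0
while conn.execute(DELETE_ONE).rowcount > 0:  # mirrors WHILE @@ROWCOUNT > 0
    iterations += 1
conn.commit()

codes = [r[0] for r in conn.execute("SELECT Code FROM CodeTable ORDER BY Code")]
print(iterations, codes)
```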

Related

how to see difference between 2 tables

I have two tables: a temp table and my main table.
Each day I update my temp table, and I want to update my main table based on the changes I made to the temp table.
Example: the temp table initially contains an id and a name, and I insert those values from temp into the main table. When I later change the temp table, e.g. by inserting another id and name, I want the main table to be compared against it so that only the new ids from the temp table are inserted.

From your description, it seems you have a regular table object named temptable. If so, you can use an AFTER INSERT trigger on it to insert the newly inserted values into your main table.
CREATE TRIGGER AfterINSERTTrigger ON [Temptable]
FOR INSERT
AS
-- INSERTED can hold many rows (multi-row INSERT), so copy them set-based
-- instead of reading single values into variables
INSERT INTO [MainTable] (
[id]
,[col1]
.
.)
SELECT
ins.[id],
ins.[col1],
.
.
FROM INSERTED ins;
PRINT 'We Successfully Fired the AFTER INSERT Triggers in SQL Server.'
GO
Similarly, you can update your main table when a record in temptable is updated by using an UPDATE trigger. You can find more information on triggers at this LINK.
OR
If you create the temp table object only to hold newly inserted records, use a simple NOT IN or NOT EXISTS clause to pick out the new records.
Using NOT IN
insert into maintable ( id, col1, ...)
select Id , col1, .... from temptable
where id not in (select id from maintable)
Using NOT EXISTS
insert into maintable ( id, col1, ... )
select id, col1, ... from temptable as temp
where not exists (select id from maintable as main where main.id=temp.id)
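The NOT EXISTS approach can be exercised as a sketch in SQLite through Python's sqlite3 module. The table layout (id, name) and the sample rows are made-up assumptions. Running the same statement on two "days" shows that only the new id is copied the second time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maintable (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE temptable (id INTEGER PRIMARY KEY, name TEXT)")

# Copy only the rows whose id is not in maintable yet.
SYNC = """
    INSERT INTO maintable (id, name)
    SELECT t.id, t.name FROM temptable AS t
    WHERE NOT EXISTS (SELECT 1 FROM maintable AS m WHERE m.id = t.id)
"""

# Day 1: the temp table holds one row, which is copied over.
conn.execute("INSERT INTO temptable VALUES (1, 'alice')")
first = conn.execute(SYNC).rowcount

# Day 2: a new row is added; only that id is new to maintable.
conn.execute("INSERT INTO temptable VALUES (2, 'bob')")
second = conn.execute(SYNC).rowcount
conn.commit()

rows = conn.execute("SELECT id, name FROM maintable ORDER BY id").fetchall()
print(first, second, rows)
```

NOT EXISTS is often preferred over NOT IN because NOT IN returns no rows at all if the subquery produces a NULL.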

You can use NOT EXISTS as follows:
INSERT into main_table(
id, name,
...
)
SELECT
id,name,
...
FROM temp_table t
WHERE
NOT EXISTS(
SELECT 1
FROM main_table m
WHERE m.id = t.id
)
Cheers!!

How to write a check to avoid the message "INSERT statement conflicted with the FOREIGN KEY constraint"?

I read and understood the answers to this question: "INSERT statement conflicted with the FOREIGN KEY constraint". I get the point; however, I have around 1 GB of records that need to be inserted into a table, and some of those records have conflicting foreign keys. The query looks like this:
IF NOT EXISTS (SELECT * FROM [dbo].[tbl_R_TaskHistory] WHERE [TaskID] = 10000529)
BEGIN
insert into [dbo].[tbl_History] ([TaskID],[UserID],[ActD],[RequestD],[No],[SignID],[Completed])
values (10000529,'A0000187',NULL,5738366,0,NULL,CAST(N'2011-03-16 04:53:37.210' AS DateTime))
END
The conflict occurs on RequestID, so I was thinking there must be a way to add a check that avoids the error messages.
What I want is for the query to check whether the RequestID satisfies the FOREIGN KEY constraint; if it does not, the query should skip that record and move on to the next one.

If your query contains only one row, you can just expand the check like this:
IF NOT EXISTS (SELECT * FROM [dbo].[tbl_R_TaskHistory] WHERE [TaskID] = 10000529) AND EXISTS(SELECT 1 FROM [dbo].[...referencing table...] WHERE [RequestD] = 5738366)
BEGIN
insert into [dbo].[tbl_History] ([TaskID],[UserID],[ActD],[RequestD],[No],[SignID],[Completed])
values (10000529,'A0000187',NULL,5738366,0,NULL,CAST(N'2011-03-16 04:53:37.210' AS DateTime));
END
Anyway, if you are inserting many rows at the same time, it is better for performance to load the values into a buffer table first. Something like this:
insert into #tbl_History ([TaskID],[UserID],[ActD],[RequestD],[No],[SignID],[Completed])
values (10000529,'A0000187',NULL,5738366,0,NULL,CAST(N'2011-03-16 04:53:37.210' AS DateTime))
,(...)
,(...)
,(...)
Then, just perform an inner join to your referencing table:
insert into [dbo].[tbl_History] ([TaskID],[UserID],[ActD],[RequestD],[No],[SignID],[Completed])
SELECT [TaskID],[UserID],[ActD],[RequestD],[No],[SignID],[Completed]
FROM #tbl_History A
INNER JOIN [dbo].[...referencing table...] B
ON A.[RequestD] = B.[RequestD];
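Here is the buffer-table-plus-inner-join idea as a sketch in SQLite via Python's sqlite3 module. Several assumptions apply: the parent table name Request stands in for the elided "...referencing table...", tbl_History is trimmed to three columns, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The final statement shows that the orphan row really would violate the constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Request (RequestD INTEGER PRIMARY KEY)")  # hypothetical parent
conn.execute(
    """CREATE TABLE tbl_History (
           TaskID INTEGER, UserID TEXT,
           RequestD INTEGER REFERENCES Request (RequestD))"""
)
conn.execute("CREATE TEMP TABLE stage (TaskID INTEGER, UserID TEXT, RequestD INTEGER)")

conn.execute("INSERT INTO Request VALUES (5738366)")
conn.executemany(
    "INSERT INTO stage VALUES (?, ?, ?)",
    [(10000529, "A0000187", 5738366),   # parent row exists
     (10000530, "A0000188", 9999999)],  # no parent row: would violate the FK
)

# Only staged rows whose RequestD has a parent row survive the inner join.
conn.execute(
    """INSERT INTO tbl_History (TaskID, UserID, RequestD)
       SELECT s.TaskID, s.UserID, s.RequestD
       FROM stage AS s
       INNER JOIN Request AS r ON r.RequestD = s.RequestD"""
)
inserted = conn.execute("SELECT TaskID FROM tbl_History").fetchall()

# Inserting the orphan row directly is rejected by the constraint.
try:
    conn.execute("INSERT INTO tbl_History VALUES (10000530, 'A0000188', 9999999)")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)
print(inserted, fk_error)
```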

This syntax also works:
declare @a int = 5;
declare @b int = 18;
insert into sample (a, b)
select @a, @b
where not exists (select 1 from sample where b = @b)
and exists (select 1 from student where iden = @a)
This version avoids creating a #temp table:
insert into sample (a, b)
select a, b
from ( values (5,19)
, (5,30)
, (5,31)
, (5,32)
, (7,41)
, (7,42)
) v(a,b)
where not exists (select 1 from sample where b = v.b)
and exists (select 1 from student where iden = v.a)
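The VALUES-list version can be run as a sketch in SQLite through Python's sqlite3 module. One assumption matters here: SQLite does not accept a column alias list such as v(a,b) on a VALUES derived table, so the sketch names the columns in a CTE instead. The existing sample row and the student data are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (iden INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE sample (a INTEGER REFERENCES student (iden), b INTEGER)")
conn.execute("INSERT INTO student VALUES (5)")  # 7 deliberately has no student row
conn.execute("INSERT INTO sample VALUES (5, 18)")

# Keep a candidate row only if its b is new and its a has a parent row.
conn.execute(
    """
    WITH v(a, b) AS (VALUES (5, 18), (5, 19), (5, 30), (7, 41), (7, 42))
    INSERT INTO sample (a, b)
    SELECT a, b FROM v
    WHERE NOT EXISTS (SELECT 1 FROM sample WHERE sample.b = v.b)
      AND EXISTS (SELECT 1 FROM student WHERE student.iden = v.a)
    """
)
conn.commit()
rows = conn.execute("SELECT a, b FROM sample ORDER BY b").fetchall()
print(rows)
```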

How to create an auto increment column that is segmented by another column

I need to create a table that will contain an incremental id, but I would like the ids to be automatically segmented by another column. Here is what I want:
CREATE TABLE dbo.MyTable (
myKey INT IDENTITY PRIMARY KEY,
category INT,
incrementalId INT
);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (200);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (200);
SELECT *
FROM dbo.MyTable;
I would like this to display something like:
myKey category incrementalId
----------- ----------- -------------
1 100 1
2 200 1
3 100 2
4 100 3
5 100 4
6 200 2
Meaning I want the incrementalId to be automatically incremented per category and restart from 1 for any new category inserted. I want this to be done by itself on any inserts in the table (I don't want to have to remember to do that when I insert in this table).
I think this might be done with window functions and maybe a trigger, but I just can't figure how.
EDIT:
I would like the data to be persisted so that incrementalId does not shift when rows are deleted. Also, ideally the same ID would not be handed out again after a row is deleted (the same way sequences or IDENTITY work).
Any ideas?

CREATE TABLE dbo.MyTable (
myKey INT IDENTITY PRIMARY KEY,
category INT,
incrementalId INT
);
GO
create table dbo.nextCategoryID (
category int,
nextidvalue int,
constraint PK_nextCategoryID primary key clustered( category, nextidvalue )
);
GO
create trigger numberByCategory on dbo.MyTable
after insert as
-- Automatically add any net new category
insert into dbo.nextCategoryID ( category, nextidvalue )
select distinct category, 1 as nextidvalue
from inserted
where not exists ( select * from dbo.nextCategoryID s
where s.category = inserted.category );
-- Number the new rows in each incoming category
with numberedrows as (
select
i.myKey,
i.category,
n.nextidvalue - 1 + row_number() over ( partition by i.category order by i.myKey ) as incrementalId -- number new rows in insertion order
from inserted i
join dbo.nextCategoryID n on i.category = n.category
)
update m
set incrementalId = n.incrementalId
from dbo.MyTable m
join inserted i on m.myKey = i.myKey
join numberedrows n on n.myKey = i.myKey;
update dbo.nextCategoryID
set nextidvalue = 1 + ( select max( m.incrementalId )
from inserted i
join dbo.MyTable m on i.myKey = m.myKey
where i.category = nextCategoryID.category
)
where exists ( select *
from inserted i
where i.category = nextCategoryID.category
);
GO
-- Test data
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (200);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (100);
INSERT INTO dbo.MyTable (category) VALUES (200);
insert into dbo.MyTable (category)
values
( 200 ),
( 200 ),
( 100 ),
( 300 ),
( 400 ),
( 400 )
SELECT *
FROM dbo.MyTable;
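The counter-table idea can be sketched in SQLite through Python's sqlite3 module. The idea is a persisted "next value" per category that only moves forward, so a deleted number is not handed out again. This is an adaptation, not a port: SQLite triggers fire once per row, so the per-row trigger below replaces the set-based ROW_NUMBER() step, and nextCategoryID is keyed on category alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MyTable (
        myKey INTEGER PRIMARY KEY AUTOINCREMENT,
        category INTEGER NOT NULL,
        incrementalId INTEGER
    );
    -- One counter row per category; it only ever moves forward.
    CREATE TABLE nextCategoryID (
        category INTEGER PRIMARY KEY,
        nextidvalue INTEGER NOT NULL
    );
    -- Fires once per inserted row: create the counter if needed,
    -- stamp the row with the current value, then advance the counter.
    CREATE TRIGGER numberByCategory AFTER INSERT ON MyTable
    BEGIN
        INSERT OR IGNORE INTO nextCategoryID (category, nextidvalue)
        VALUES (NEW.category, 1);
        UPDATE MyTable
        SET incrementalId = (SELECT nextidvalue FROM nextCategoryID
                             WHERE category = NEW.category)
        WHERE myKey = NEW.myKey;
        UPDATE nextCategoryID
        SET nextidvalue = nextidvalue + 1
        WHERE category = NEW.category;
    END;
    """
)

for cat in (100, 200, 100, 100, 100, 200):
    conn.execute("INSERT INTO MyTable (category) VALUES (?)", (cat,))

# Deleting the newest row of category 100 must not free its number for reuse.
conn.execute("DELETE FROM MyTable WHERE myKey = 5")
conn.execute("INSERT INTO MyTable (category) VALUES (100)")
conn.commit()

rows = conn.execute(
    "SELECT myKey, category, incrementalId FROM MyTable ORDER BY myKey"
).fetchall()
counters = conn.execute(
    "SELECT category, nextidvalue FROM nextCategoryID ORDER BY category"
).fetchall()
print(rows, counters)
```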

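You can achieve this with a trigger (note that it renumbers every row on each insert, so IDs can shift after deletions):
CREATE TRIGGER dbo.UpdateIncrementalID
ON dbo.MyTable
AFTER INSERT
AS
-- ROW_NUMBER() cannot appear directly in SET, so compute it in a CTE and update through the CTE
WITH numbered AS (
SELECT incrementalId,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY myKey) AS rn
FROM dbo.MyTable
)
UPDATE numbered
SET incrementalId = rn;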

I don't think you need to add an extra IncrementalID column to your table.
You can compute it in your SELECT statement:
SELECT myKey,category,ROW_NUMBER() OVER(PARTITION BY category ORDER BY myKey )incrementalId
FROM MyTable
ORDER BY myKey
sample output.
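Here is what the computed-at-query-time approach does, sketched in SQLite 3.25+ via Python's sqlite3 module with the question's sample data. Because the numbers are recalculated on every query, deleting a row shifts the numbers after it, which is the behaviour the question's EDIT wants to avoid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (myKey INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany("INSERT INTO MyTable (category) VALUES (?)",
                 [(100,), (200,), (100,), (100,), (100,), (200,)])

# incrementalId is derived on the fly; nothing is stored in the table.
QUERY = """
    SELECT myKey, category,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY myKey) AS incrementalId
    FROM MyTable
    ORDER BY myKey
"""

before = conn.execute(QUERY).fetchall()
conn.execute("DELETE FROM MyTable WHERE myKey = 3")
after = conn.execute(QUERY).fetchall()
print(before)
print(after)
```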
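Alternatively, you can create a view over the actual table. SQL Server rejects ORDER BY in a view definition without TOP or OFFSET, so sort when you query the view:
CREATE VIEW dbo.VIEW_MyTable
AS
SELECT myKey,category,ROW_NUMBER() OVER(PARTITION BY category ORDER BY myKey) AS incrementalId
FROM dbo.MyTable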

You can update the existing rows of the same table with the following UPDATE query:
;with cte as (
select mykey, category, incrementalid, row_number() over (partition by category order by mykey,category) as rn from MyTable
)
update cte
set incrementalId = rn

Extending @Kannan's solution into a UDF that's called from a computed column:
create function dbo.fnsPartId(@mykey int)
returns int
as
begin
declare @Ret int
;
with
enum as
(
-- incrementalid is left out: it is the computed column that calls this function
select mykey, category,
row_number() over (partition by category order by mykey, category) as rn
from dbo.MyTable
)
select @Ret = rn from enum where mykey = @mykey
return @Ret
end
And modify the table as:
CREATE TABLE dbo.MyTable (
myKey INT IDENTITY PRIMARY KEY,
category INT,
incrementalId AS ([dbo].[fnsPartId]([mykey]))
);

Try creating a default constraint on the column that calls a function to generate the next value for the row.

Please try this AFTER INSERT trigger on the table:
create trigger InsertIncrementalID
on dbo.MyTable
after insert
as
begin
update mt
set incrementalId = (select count(mt1.category) from dbo.MyTable mt1 where mt1.category = mt.category)
from dbo.MyTable mt
inner join inserted i on i.myKey = mt.myKey
end
Please keep two points in mind when using this trigger:
1. Because the trigger updates the table, any other AFTER UPDATE trigger on this table will fire too.
2. When multiple rows are inserted with a single statement (for example INSERT ... SELECT), the trigger fires only once, so the new rows in the same category all receive the same count.

How can I copy an ID from table A to table B?

This is my SQL code:
Declare @A_ID AS A_ID
insert into TBL_FO VALUES(NEWID(), @A_ID, '30000.00','1','1')
(select * from TBL_DETAIL where A_ID = '59366409-2EB6-49BC-A88F-801692B735D6')
I want to copy the A_ID '59366409-2EB6-49BC-A88F-801692B735D6' from TBL_DETAIL into TBL_FO as @A_ID. How can I declare @A_ID so that it holds the same ID?

Try this:
Declare @A_ID as varchar(100) -- adjust the length as needed
select @A_ID = A_ID from TBL_DETAIL where A_ID = '59366409-2EB6-49BC-A88F-801692B735D6'
Declare @FO_ID as uniqueidentifier -- NEWID() returns a uniqueidentifier, not an int; reuse it in one or more tables
SET @FO_ID = NEWID() -- Initialize it one time.
insert into TBL_FO VALUES(@FO_ID, @A_ID, '30000.00','1','1') -- First use of @FO_ID
insert into TBL_SOMEOTHERTBL VALUES( @FO_ID, .... ) -- Second use of @FO_ID
etc ...

Specify the columns you want to set, then follow them with a subquery that returns the values you need along with A_ID (in the same order as the column names):
insert
into
TBL_FO (FirstColumnName,
SecondColumnName,
ThirdColumnName,
FourthColumnName,
FifthColumnName)
(select
NEWID(),
A_ID,
'30000.00',
'1',
'1'
from
TBL_DETAIL
where
A_ID = '59366409-2EB6-49BC-A88F-801692B735D6')
Note: you can also omit the column names if the values are listed in the same order as the table's columns.
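Here is the INSERT ... SELECT approach as a runnable sketch, assuming SQLite through Python's sqlite3 module. SQLite has no NEWID(), so Python's uuid.uuid4() supplies the new key. The TBL_FO column names FO_ID, Amount, Col4 and Col5 are hypothetical placeholders for FirstColumnName through FifthColumnName above:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_DETAIL (A_ID TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE TBL_FO (FO_ID TEXT PRIMARY KEY, A_ID TEXT, Amount TEXT, Col4 TEXT, Col5 TEXT)"
)
a_id = "59366409-2EB6-49BC-A88F-801692B735D6"
conn.execute("INSERT INTO TBL_DETAIL VALUES (?)", (a_id,))

# uuid4() plays the role of NEWID(); A_ID is read straight from TBL_DETAIL.
fo_id = str(uuid.uuid4()).upper()
conn.execute(
    """
    INSERT INTO TBL_FO (FO_ID, A_ID, Amount, Col4, Col5)
    SELECT ?, A_ID, '30000.00', '1', '1'
    FROM TBL_DETAIL
    WHERE A_ID = ?
    """,
    (fo_id, a_id),
)
conn.commit()
row = conn.execute("SELECT FO_ID, A_ID, Amount FROM TBL_FO").fetchone()
print(row)
```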

deleting duplicate rows?

I want to delete duplicate rows from my table based on category ID, but I don't want to delete all of them: I want to keep one row whenever more than one row has the same category ID.
This is the query I have so far, and I need to change it:
delete from twinhead_tblcategory where categoryid in (select categoryid from twinhead_tblcategory group by categoryid having count(categoryid) > 1 )

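For SQL Server you can do it like this (PARTITION BY restarts the numbering for each CategoryId, so one row per category is kept):
WITH MyTableCTE (CategoryId, RowNumber)
AS
(
SELECT CategoryId, ROW_NUMBER() OVER (PARTITION BY CategoryId ORDER BY CategoryId) AS RowNumber
FROM MyTable
)
Delete From MyTableCTE Where RowNumber > 1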

Do a SELECT DISTINCT into a new table, drop the old table, and rename the new table to the old table's name.

If your rows have a distinct id column, then this should work:
DELETE t1 FROM your_table t1, your_table t2
WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
AND ... /* check equality of all relevant columns */
AND t1.id < t2.id
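The self-join rule ("delete t1 when an otherwise-equal t2 with a larger id exists") can be sketched in SQLite through Python's sqlite3 module. SQLite does not accept the multi-table DELETE t1 FROM t1, t2 form, so the same condition is expressed with EXISTS. The column names follow the answer, and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO your_table (column1, column2) VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "3")],
)

# Delete a row whenever an otherwise-equal row with a larger id exists,
# so the highest id of each duplicate group survives.
conn.execute(
    """
    DELETE FROM your_table
    WHERE EXISTS (
        SELECT 1 FROM your_table AS t2
        WHERE t2.column1 = your_table.column1
          AND t2.column2 = your_table.column2
          AND your_table.id < t2.id
    )
    """
)
conn.commit()
rows = conn.execute("SELECT id, column1, column2 FROM your_table ORDER BY id").fetchall()
print(rows)
```

Rows with NULL in a compared column never match, because NULL = NULL is not true. In SQLite, comparing those columns with IS instead of = treats two NULLs as equal.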

For SQL Server, check http://support.microsoft.com/kb/139444; that should get you started.

This is probably heavy-handed, but you could SELECT DISTINCT * into a temp table, truncate the original table, and then insert the contents of the temp table back into it. Foreign key constraints may prevent this, though.
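Here is the distinct-copy-and-reload approach as a sketch in SQLite via Python's sqlite3 module. SQLite has no TRUNCATE, so DELETE FROM without a WHERE clause stands in for it. The table name comes from the question, while the name column, the sample rows and the temp table name distinct_rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twinhead_tblcategory (categoryid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO twinhead_tblcategory VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")],
)
conn.commit()

with conn:  # commits if every statement succeeds, rolls back the open transaction otherwise
    conn.execute(
        "CREATE TEMP TABLE distinct_rows AS SELECT DISTINCT * FROM twinhead_tblcategory"
    )
    conn.execute("DELETE FROM twinhead_tblcategory")  # stands in for TRUNCATE TABLE
    conn.execute("INSERT INTO twinhead_tblcategory SELECT * FROM distinct_rows")
    conn.execute("DROP TABLE distinct_rows")

rows = conn.execute(
    "SELECT categoryid, name FROM twinhead_tblcategory ORDER BY categoryid"
).fetchall()
print(rows)
```

SELECT DISTINCT * only collapses rows that are identical in every column. Rows that share a categoryid but differ elsewhere all survive.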

For SQL Server, you could use a cursor to loop through all rows ordered by categoryID.
If the current ID is the same as the previous one, delete the row (see example C of this article).
Otherwise, remember the ID for the next round.
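A cursor-style pass can be sketched client-side in Python with sqlite3. This is a stand-in: the loop plays the role of a server-side T-SQL cursor, rows are deleted by SQLite's rowid, and the name column and sample rows are made up. The rows are fetched up front so the result set is not read while rows are being deleted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE twinhead_tblcategory (categoryid INTEGER, name TEXT)")
conn.executemany("INSERT INTO twinhead_tblcategory VALUES (?, ?)",
                 [(2, "b1"), (1, "a1"), (2, "b2"), (1, "a2"), (3, "c1")])

# Walk the rows ordered by categoryid; delete a row when its categoryid
# equals the one seen just before it, otherwise remember the new id.
rows = conn.execute(
    "SELECT rowid, categoryid FROM twinhead_tblcategory ORDER BY categoryid, rowid"
).fetchall()
previous = None
for rowid, category_id in rows:
    if category_id == previous:
        conn.execute("DELETE FROM twinhead_tblcategory WHERE rowid = ?", (rowid,))
    else:
        previous = category_id
conn.commit()

remaining = conn.execute(
    "SELECT categoryid, name FROM twinhead_tblcategory ORDER BY categoryid"
).fetchall()
print(remaining)
```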

There are several ways to delete duplicate rows.
For all of the solutions below, consider this example table:
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Vahid', 'Nasiri' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution: use another table for the distinct rows.
SELECT DISTINCT *
FROM #Employee
-- WHERE 1 = 0 copies only the table structure, not the rows
SELECT * INTO #DuplicateEmployee
FROM #Employee
WHERE 1 = 0
INSERT #DuplicateEmployee
SELECT DISTINCT *
FROM #Employee
BEGIN TRAN
DELETE #Employee
INSERT #Employee
SELECT *
FROM #DuplicateEmployee
COMMIT TRAN
DROP TABLE #DuplicateEmployee
SELECT DISTINCT *
FROM #Employee
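Second solution: set aside one copy of each duplicated row, delete all copies from #Employee, then insert the single copies back.
SELECT DISTINCT * FROM #Employee
-- WHERE 1 = 0 copies only the table structure, not the rows
SELECT * INTO #DuplicateEmployee FROM #Employee WHERE 1 = 0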
INSERT #DuplicateEmployee
SELECT ID,
FIRST_NAME,
LAST_NAME
FROM #Employee
GROUP BY
ID,FIRST_NAME,LAST_NAME
HAVING COUNT(*) > 1
BEGIN TRAN
DELETE #Employee
FROM #DuplicateEmployee
WHERE #Employee.ID = #DuplicateEmployee.ID
AND #Employee.FIRST_NAME = #DuplicateEmployee.FIRST_NAME
AND #Employee.LAST_NAME = #DuplicateEmployee.LAST_NAME
INSERT #Employee
SELECT *
FROM #DuplicateEmployee
COMMIT TRAN
DROP TABLE #DuplicateEmployee
SELECT DISTINCT * FROM #Employee
Third solution: use ROWCOUNT
SELECT DISTINCT *
FROM #Employee
SET ROWCOUNT 1
SELECT 1 -- sets @@ROWCOUNT to 1 so the loop starts
WHILE @@ROWCOUNT > 0
DELETE #Employee
WHERE 1 < (
SELECT COUNT(*)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
SET ROWCOUNT 0
SELECT DISTINCT *
FROM #Employee
Fourth solution: use analytic (window) functions
SELECT DISTINCT *
FROM #Employee;
WITH DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
DELETE
FROM DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Fifth solution: use an identity column
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
-- start a new batch so the DELETE below can see the new UNIQ_ID column
GO
DELETE
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
At the end of all the solutions, run this command:
DROP TABLE #Employee
The source of my answer is this site.
