deleting duplicate rows?

deleting duplicate rows? - sql

i want to delete duplicate rows from my table on the basis of category ID, but don't want to delete all, i want to left one rows if there are more than one row with the same category ID.
this is my query i am making i need to change it.
delete from twinhead_tblcategory where categoryid in (select categoryid from twinhead_tblcategory group by categoryid having count(categoryid) > 1 )

For SQL Server you can do it:
WITH MyTableCTE (CategoryId, RowNumber)
AS
(
SELECT CategoryId, ROW_NUMBER() OVER (ORDER BY CategoryId) AS 'RowNumber'
FROM MyTable
)
Delete From MyTableCTE Where RowNumber > 1

Do a select distinct into a new table, delete the old one and rename the new one into old table name.

If your rows have a distinct id column, then this should work:
DELETE t1 FROM your_table t1, your_table t2
WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
AND ... /* check equality of all relevant columns */
AND t1.id < t2.id

Check here for sql server - http://support.microsoft.com/kb/139444 - that should get you started.

This is probably heavy-handed but perhaps you could select distinct * into a temp table, then truncate the table, then insert into the table the contents of the temp table. Foreign key constraints may prevent this, though.

For SqlServer, you could use a cursor to loop through all items, ordered by that categoryID.
Is the current ID the same as the previous one? Then delete it, see example C of this article.
Else remember the ID for the next round.

You have several way for delete duplicate rows.
for my solutions , first consider this table for example
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Vahid', 'Nasiri' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution : Use another table for duplicate rows.
SELECT DISTINCT *
FROM #Employee
SELECT * INTO #DuplicateEmployee
FROM #Employee
INSERT #DuplicateEmployee
SELECT DISTINCT *
FROM #Employee
BEGIN TRAN
DELETE #Employee
INSERT #Employee
SELECT *
FROM #DuplicateEmployee
COMMIT TRAN
DROP TABLE #DuplicateEmployee
SELECT DISTINCT *
FROM #Employee
Second solution :
SELECT DISTINCT * FROM #Employee
SELECT * INTO #DuplicateEmployee FROM #Employee
INSERT #DuplicateEmployee
SELECT ID,
FIRST_NAME,
LAST_NAME
FROM #Employee
GROUP BY
ID,FIRST_NAME,LAST_NAME
HAVING COUNT(*) > 1
BEGIN TRAN
DELETE #Employee
FROM #DuplicateEmployee
WHERE #Employee.ID = #DuplicateEmployee.ID
AND #Employee.FIRST_NAME = #DuplicateEmployee.FIRST_NAME
AND #Employee.LAST_NAME = #DuplicateEmployee.LAST_NAME
INSERT #Employee
SELECT *
FROM #DuplicateEmployee
COMMIT TRAN
DROP TABLE #DuplicateEmployee
SELECT DISTINCT * FROM #Employee
teared solution : use rowcount
SELECT DISTINCT *
FROM #Employee
SET ROWCOUNT 1
SELECT 1
WHILE ##rowcount > 0
DELETE #Employee
WHERE 1 < (
SELECT COUNT(*)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
SET ROWCOUNT 0
SELECT DISTINCT *
FROM #Employee
Fourth solution : use Analytical Functions
SELECT DISTINCT *
FROM #Employee;
WITH #DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
DELETE
FROM #DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Fifth solution : Use identity field
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
DELETE
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
and end of all solution use this command
DROP TABLE #Employee
Source of my answer is this site

Related

Looping over the insert data into multiple tables

I have a query where I need to run to do manual inserts
I can do it but there are many records and was looking if I can build something.
I have a structure somewhat like this:
Have 4 id of a table - primary key values as:
var ids = "1,2,3,4";
loop over ids {
insert into table1(col1,col2,col3)
select col1,newid(),getdate() from table1 where id = ids - 1 at a time
var selectedID = get the id of the inserted row and then insert into anotehr table as:
insert into table2(col1,col2,col3,col4)
select selectedID, getdate(),getdate(),4 from table2 where fkID = ids - one at a time
}

You can use both loops and cursors but often they can be avoided.
Is there a specific reason you note you want them inserted one at a time? An alternative would be to have the IDs staged, in a temp table, or CTE, e.g.
;WITH [Ids] AS
(
SELECT '1' AS [ID]
UNION
SELECT '2'
UNION
SELECT '3'
UNION
SELECT '4'
)
INSERT INTO [Table1]
(
[Col1],
[Col2],
[Col3]
)
SELECT [Col1],
NEWID(),
GETDATE()
FROM [Table1] T
INNER JOIN [Ids] I ON I.[ID] = T.[Id];
Which avoids the need for any loops, and should perform much better.
Edit
The way I would structure this, to make the query reusable would be as follows:
IF OBJECT_ID('tempdb..#IDS') IS NOT NULL
BEGIN
DROP TABLE #IDS
END
IF OBJECT_ID('tempdb..#Inserted_IDS') IS NOT NULL
BEGIN
DROP TABLE #Inserted_IDS
END
CREATE TABLE #IDS
(
ID INT
);
CREATE TABLE #Inserted_IDS
(
ID INT,
);
INSERT INTO #IDS
(
ID
)
SELECT 1 UNION
SELECT 2 UNION
SELECT 3 UNION
SELECT 4;
INSERT INTO [Table1]
(
[Col1],
[Col2],
[Col3]
)
OUTPUT Inserted.ID
INTO #Inserted_IDS
SELECT [Col1],
NEWID(),
GETDATE()
FROM [Table1] T
INNER JOIN #IDS I ON I.[ID] = T.[Id];
INSERT INTO [table2]
(
[col1],
[col2],
[col3],
[col4]
)
SELECT I.[ID],
getdate(),
getdate(),
4
FROM [#Inserted_IDS] I
DROP TABLE #IDS;
DROP TABLE #Inserted_IDS;
Therefore you only need to amend the IDs being entered into the temp table each time you need to do the inserts.

SQL Insert row when a cell does not contain a specific set of characters

My database has a column named Group.
This Group can be one of two values:
Group101 = Main group
Group101D1 = Subgroup
Every group has two options like this.
But I have some situations where Group101D1 exists but Group101 does not.
Now I want to create an insert where I search for groups with D1 that doesn't have a main group. for example I have Group105D1 but don't have Group105. I want an insert to create a row with Group105.
This is as far as I have come:
INSERT INTO (Group)
SELECT [Table1].[Group], [Table2].[Group]
FROM [Table] Table1
INNER JOIN [Table] Table2 ON [table1].[Group] = [Table2].[Group]
-- WHERE [Table2].[Group] LIKE '%D1'
-- AND [Table1].[Group] NOT LIKE '%D1'
Can some of you please help, I don't know how to finish this.
I know I probably need to use inner join, replace and a where not clause.

You will need replace Also query should exclude rows already exists
Sample data:
DECLARE #groups TABLE ([Group] VARCHAR(100), description VARCHAR(100))
INSERT #groups ([Group], description)
SELECT 'Group101', 'Something1'
UNION ALL
SELECT 'Group101d1', 'Something1'
UNION ALL
SELECT 'Group105d1', 'description_Bleh'
UNION ALL
SELECT 'Group2054', 'desc_2054'
UNION ALL
SELECT 'Group2054d1', 'desc_2054'
Use replace function and exclude Maingroup if it exists
SELECT Replace([group], 'd1', '') [Group], description
FROM #groups
WHERE [Group] NOT IN (
SELECT p.[Group]
FROM #groups g
INNER JOIN #groups p
ON g.[Group] = replace(p.[Group], 'd1', '')
)
Result:
Group description
Group105 description_Bleh

If all subgroups ends with "D1" you can use below query to insert missing main groups.
INSERT INTO (Group)
SELECT
left(subtable.Group,(len(subtable.Group)-len('D1')))
FROM
[Table1].[Group] subtable
where
charindex ( 'D1', subtable.Group) > 0 -- if it is sub-record
and
not exists --check if main group exists
(SELECT
1
FROM
[Table1].[Group] main
where (charindex ( main.Group+'D1', subtable.Group) != 0)
)

Stuff you have already just packaged with data
DECLARE #groups TABLE (grp VARCHAR(100), description VARCHAR(100));
INSERT #groups (grp, description) values
('Group101', 'Something1')
, ('Group101d1', 'Something1')
, ('Group105d1', 'description_Bleh')
, ('Group106d1', 'description_Bleh6')
, ('Group2054', 'desc_2054')
, ('Group2054d1', 'desc_2054');
select * from #groups order by grp;
insert into #groups
select replace(g1.grp, 'd1', ''), g1.description
from #groups g1
where g1.grp like '%d1'
and not exists ( select 1
from #groups g2
where g2.Grp = replace(g1.grp, 'd1', '')
);
select * from #groups order by grp;

You can use NOT EXISTS and REPLACE to get the desired results like below :
INSERT INTO Table2(Group)
select replace([Group], 'd1', '') from Table1 a
where [Group] like '%d1' and
not exists(
select 1 from Table1 where [group] = replace(a.[Group], 'd1', '')
)

I did it this way in the end!
insert into Table (Group, Description)
select replace(Group,'D1','') , description from Table
where Group like '%D1'
and replace(Group,'D1','') not in (
select Group from Table
where Group like 'Group%' and Group not like '%D1')

Inserting temp table values into a table.

I have a temp table declared
declare #tmptable(
value nvarchar(500) not null
);
I use a function to insert values into that temp table.
I am trying to figure out how to update a table using the values of #tmptable
insert into t1 (
active
,SchoolId
,inserted
)
select
1
,temp.value
,#insertedDate
select temp.value from #tmptable;
When i try to insert in table t1 it doesn't work. I guess there are two Select statements is causing the problem. Please let me know how to fix it. Thanks

Try this one -
INSERT INTO dbo.t1
(
Active
, SchoolId
, Inserted
)
SELECT
1
, t.value
, #insertedDate
FROM #tmptable t;

INSERT INTO t1
(
ACTIVE
,SchoolId
,INSERTED
)
SELECT 1
,temp.value
,#insertedDate
FROM #tmptable temp;

insert into t1 (
active
,SchoolId
,inserted
)
select
1
,temp.value
,#insertedDate
from #tmptable;
this will work...

Union temporary tables to a final temporary table

I have like 10 diff temporary tables created in SQL server, what I am looking to do is union them all to a final temporary table holding them all on one table. All the tables have only one row and look pretty much exactly like the two temp tables below.
Here is what I have so far this is an example of just two of the temp tables as their all exactly like this one then #final is the table I want to union the all to:
create table #lo
(
mnbr bigint
)
insert into #login (mnbr)
select distinct (_ID)
FROM [KDB].[am_LOGS].[dbo].[_LOG]
WHERE time >= '2012-7-26 9:00:00
Select count(*) as countreject
from #lo
create table #pffblo
(
mber_br
)
insert into #pffblo (mber_br)
select distinct (mber_br)
from individ ip with (nolock)
join memb mp with (nolock)
on( ip.i_id=mp.i_id and mp.p_type=101)
where ip.times >= '2012-9-26 11:00:00.000'
select count(*) as countaccept
create table #final
(
countreject bigint
, Countacceptbigint
.....
)
insert into #final (Countreject, Countaccept....more rows here...)
select Countreject, Countaccept, ...more rows selected from temp tables.
from #final
union
(select * from #lo)
union
(select * from #pffblo)
select *
from #final
drop table #lo
drop table #pffblo
drop table #final
if this the form to union the rows form those temp tables to this final one. Then is this correct way to show all those rows that were thus unioned. When I do this union I get message number of columns in union need to match number of columns selected in union

I think you're using a union the wrong way. A union is used when you have to datasets that are the same structure and you want to put them into one dataset.
e.g.:
CREATE TABLE #Table1
(
col1 BIGINT
)
CREATE TABLE #Table2
(
col1 BIGINT
)
--populate the temporary tables
CREATE TABLE #Final
(
col1 BIGINT
)
INSERT INTO #Final (col1)
SELECT *
FROM #Table1
UNION
SELECT *
FROM #Table2
drop table #table1
drop table #table2
drop table #Final
I think what you're trying to do is get 1 data set with the count of all your tables in it. Union won't do this.
The easiest way (although not very performant) would be to do select statements like the following:
CREATE TABLE #Table1
(
col1 BIGINT
)
CREATE TABLE #Table2
(
col1 BIGINT
)
--populate the temporary tables
CREATE TABLE #Final
(
col1 BIGINT,
col2 BIGINT
)
INSERT INTO #Final (col1, col2)
select (SELECT Count(*) FROM #Table1) as a, (SELECT Count(*) FROM #Table2) as b
select * From #Final
drop table #table1
drop table #table2
drop table #Final

It appears that you want to take the values from each of temp tables and then place then into a single row of data. This is basically a PIVOT, you can use something like this:
create table #final
(
countreject bigint
, Countaccept bigint
.....
)
insert into #final (Countreject, Countaccept....more rows here...)
select
from
(
select count(*) value, 'Countreject' col -- your UNION ALL's here
from #lo
union all
select count(*) value, 'countaccept' col
from #pffblo
) x
pivot
(
max(value)
for col in ([Countreject], [countaccept])
) p
Explanation:
You will create a subquery similar to this that will contain the COUNT for each of your individual temp table. There are two columns in the subquery, one column contains the count(*) from the table and the other column is the name of the alias:
select count(*) value, 'Countreject' col
from #lo
union all
select count(*) value, 'countaccept' col
from #pffblo
You then PIVOT these values to insert into your final temp table.
If you do not want to use PIVOT, then you can use a CASE statement with an aggregate function:
insert into #final (Countreject, Countaccept....more rows here...)
select max(case when col = 'Countreject' then value end) Countreject,
max(case when col = 'countaccept' then value end) countaccept
from
(
select count(*) value, 'Countreject' col -- your UNION ALL's here
from #lo
union all
select count(*) value, 'countaccept' col
from #pffblo
) x
Or you might be able to JOIN all of the temp tables similar to this, where you create a row_number() for the one record in the table and then you join the tables with the row_number():
insert into #final (Countreject, Countaccept....more rows here...)
select isnull(lo.Countreject, 0) Countreject,
isnull(pffblo.Countaccept, 0) Countaccept
from
(
select count(*) Countreject,
row_number() over(order by (SELECT 0)) rn
from #lo
) lo
left join
(
select count(*) Countaccept,
row_number() over(order by (SELECT 0)) rn
from #pffblo
) pffblo
on lo.rn = pffblo.rn

SELECT *
INTO #1
FROM TABLE2
UNION
SELECT *
FROM TABLE3
UNION
SELECT *
FROM TABLE4

If you would like to get count for each temporary table in the resulting table, you will need just to calculate it for each column in subquery:
INSERT INTO result (col1, col2,...
SELECT
(SELECT COUNT() FROM tbl1) col1
,(SELECT COUNT() FROM tbl2) col2
..

Delete multiple duplicate rows in table

I have multiple groups of duplicates in one table (3 records for one, 2 for another, etc) - multiple rows where more than 1 exists.
Below is what I came up with to delete them, but I have to run the script for however many duplicates there are:
set rowcount 1
delete from Table
where code in (
select code from Table
group by code
having (count(code) > 1)
)
set rowcount 0
This works well to a degree. I need to run this for every group of duplicates, and then it only deletes 1 (which is all I need right now).

If you have a key column on the table, then you can use this to uniquely identify the "distinct" rows in your table.
Just use a sub query to identify a list of ID's for unique rows and then delete everything outside of this set. Something along the lines of.....
create table #TempTable
(
ID int identity(1,1) not null primary key,
SomeData varchar(100) not null
)
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')
select * from #TempTable
--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
select MAX(ID)
from #TempTable
group by SomeData
)
--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
select MAX(ID)
from #TempTable
group by SomeData
)
--Final Result Set
select * from #TempTable
drop table #TempTable;
Alternatively you could use a CTE for example:
WITH UniqueRecords AS
(
select MAX(ID) AS ID
from #TempTable
group by SomeData
)
DELETE A
FROM #TempTable A
LEFT outer join UniqueRecords B on
A.ID = B.ID
WHERE B.ID IS NULL

It is frequently more efficient to copy unique rows into temporary table,
drop source table, rename back temporary table.
I reused the definition and data of #TempTable, called here as SrcTable instead, since it is impossible to rename temporary table into a regular one)
create table SrcTable
(
ID int identity(1,1) not null primary key,
SomeData varchar(100) not null
)
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')
by John Sansom in previous answer
-- cloning "unique" part
SELECT * INTO TempTable
FROM SrcTable --original table
WHERE id IN
(SELECT MAX(id) AS ID
FROM SrcTable
GROUP BY SomeData);
GO;
DROP TABLE SrcTable
GO;
sys.sp_rename 'TempTable', 'SrcTable'

You can alternatively use ROW_NUMBER() function to filter out duplicates
;WITH [CTE_DUPLICATES] AS
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY SomeData)
FROM #TempTable
)
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1

SET ROWCOUNT 1
DELETE Table
FROM Table a
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1
WHILE ##rowcount > 0
DELETE Table
FROM Table a
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1
SET ROWCOUNT 0
this will delete all duplicate rows, But you can add attributes if you want to compare according to them .

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

deleting duplicate rows? - sql

For SQL Server you can do it: WITH MyTableCTE (CategoryId, RowNumber) AS ( SELECT CategoryId, ROW_NUMBER() OVER (ORDER BY CategoryId) AS 'RowNumber' FROM MyTable ) Delete From MyTableCTE Where RowNumber > 1

Do a select distinct into a new table, delete the old one and rename the new one into old table name.

If your rows have a distinct id column, then this should work: DELETE t1 FROM your_table t1, your_table t2 WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND ... /* check equality of all relevant columns */ AND t1.id < t2.id

Check here for sql server - http://support.microsoft.com/kb/139444 - that should get you started.

This is probably heavy-handed but perhaps you could select distinct * into a temp table, then truncate the table, then insert into the table the contents of the temp table. Foreign key constraints may prevent this, though.

For SqlServer, you could use a cursor to loop through all items, ordered by that categoryID. Is the current ID the same as the previous one? Then delete it, see example C of this article. Else remember the ID for the next round.

Related

Looping over the insert data into multiple tables

SQL Insert row when a cell does not contain a specific set of characters

Inserting temp table values into a table.

Union temporary tables to a final temporary table

Delete multiple duplicate rows in table

Categories

Resources