I have an insert query that goes into a table linked to QuickBooks. The table Test_InvoiceLine has a lot of rows whose Part_IDs and Descriptions are exactly the same.
INSERT INTO InvoiceLine (Part_ID, More_Info )
SELECT Part_ID, Description
FROM Test_InvoiceLine;
How can I write a query that goes into the InvoiceLine table and deletes duplicates with the same Part_ID and Description that are already there?
Use a CTE as a temporary result set. You essentially number the records within each partition defined by your criteria, then delete every record that is not the first in its partition (this keeps the original record).
Double-check me before you blow anything away, because it has been a while since I've done this, but this should work.
WITH CTE AS(
SELECT Part_ID, Description,
RowNum = ROW_NUMBER()OVER(PARTITION BY Part_ID, Description ORDER BY Part_ID, Description)
FROM Test_InvoiceLine
)
DELETE FROM CTE WHERE RowNum > 1
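For anyone who wants to try the idea outside SQL Server: SQLite cannot DELETE through a CTE, so this minimal sketch uses Python's built-in sqlite3 module and targets SQLite's implicit rowid instead. The sample rows are made up, ORDER BY rowid (keep the first-inserted copy) is my assumption, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

def dedupe_invoice_lines(conn):
    """Keep the first row of each (Part_ID, Description) group and delete the rest."""
    conn.execute("""
        DELETE FROM Test_InvoiceLine
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY Part_ID, Description
                                          ORDER BY rowid) AS RowNum
                FROM Test_InvoiceLine
            ) AS numbered
            WHERE RowNum > 1
        )
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test_InvoiceLine (Part_ID TEXT, Description TEXT)")
# Hypothetical sample data with two duplicated (Part_ID, Description) pairs.
conn.executemany(
    "INSERT INTO Test_InvoiceLine VALUES (?, ?)",
    [("A1", "Bolt"), ("A1", "Bolt"), ("A1", "Nut"), ("B2", "Washer"), ("B2", "Washer")],
)
dedupe_invoice_lines(conn)
rows = conn.execute(
    "SELECT Part_ID, Description FROM Test_InvoiceLine ORDER BY Part_ID, Description"
).fetchall()
print(rows)  # [('A1', 'Bolt'), ('A1', 'Nut'), ('B2', 'Washer')]
```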
Source: How to delete duplicate records in SQL Server
Is there a reason why you need to do an insert then delete? If you're trying to insert -> delete, it sounds like you're actually trying to update the data by Part_ID.
-- SQL Server syntax: the alias goes after UPDATE and the joined table goes in FROM
update a
set More_Info = b.Description
from InvoiceLine a
inner join Test_InvoiceLine b on b.Part_ID = a.Part_ID
This updates More_Info in the InvoiceLine table with the matching Description from Test_InvoiceLine.
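A small runnable sketch of the update-instead-of-insert idea, using SQLite through Python's sqlite3. It swaps the UPDATE ... FROM join for a correlated subquery, which older SQLite versions also accept; the sample rows are hypothetical, and MIN() is my own choice for picking one Description when a Part_ID appears more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InvoiceLine (Part_ID TEXT, More_Info TEXT);
    CREATE TABLE Test_InvoiceLine (Part_ID TEXT, Description TEXT);
    INSERT INTO InvoiceLine VALUES ('A1', 'old'), ('B2', 'old'), ('C3', 'old');
    INSERT INTO Test_InvoiceLine VALUES ('A1', 'Bolt'), ('B2', 'Washer');
""")

# Update matching rows in place; rows with no match in Test_InvoiceLine are left alone.
conn.execute("""
    UPDATE InvoiceLine
    SET More_Info = (SELECT MIN(t.Description)
                     FROM Test_InvoiceLine AS t
                     WHERE t.Part_ID = InvoiceLine.Part_ID)
    WHERE EXISTS (SELECT 1 FROM Test_InvoiceLine AS t
                  WHERE t.Part_ID = InvoiceLine.Part_ID)
""")
rows = conn.execute("SELECT Part_ID, More_Info FROM InvoiceLine ORDER BY Part_ID").fetchall()
print(rows)  # [('A1', 'Bolt'), ('B2', 'Washer'), ('C3', 'old')]
```

Note that this only touches existing rows; Part_IDs that exist only in Test_InvoiceLine would still need an insert.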
This is SQL Server syntax; however, ROW_NUMBER is widely supported in other RDBMSs.
Here is the query you need.
;WITH Data AS
(
SELECT
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
DELETE FROM Data WHERE RowNumber > 1
I don't know how More_Info will make a difference here, since according to your post the duplicate key does not include it. However, if you need to inspect the More_Info values in the delete statement, you could use something similar to the query below.
;WITH Data AS
(
SELECT
More_Info,
Part_ID,Description,
RowNumber = ROW_NUMBER() OVER(PARTITION BY Part_Id,Description ORDER BY Part_Id,Description)
FROM Test_InvoiceLine
)
-- filter on More_Info inside the CTE delete: joining back on Part_ID alone
-- would also match (and delete) the first row you meant to keep
DELETE FROM Data
WHERE RowNumber > 1 AND More_Info = 'Y'
Related
I have the following SQL to delete duplicate rows, but no rows are ever affected.
DELETE FROM content_stacks WHERE id NOT IN (
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
);
The subquery itself is returning the id list of first entries correctly.
SELECT id
FROM content_stacks
GROUP BY user_id, content_id
When I paste the resulting list in as literal values, it works too:
DELETE FROM content_stacks WHERE id NOT IN (239,231,217,218,219,232,233,220,230,226,234,235,224,225,221,223,222,227,228,229,236,237,238,216,208,209,210,204,211,212,242,203,240,201,241,205,206,207,213,214,215);
I checked many similar examples and this should be working in my opinion. What am I missing?
First number the rows in each group with ROW_NUMBER, then delete the records whose row number is greater than 1:
WITH CTE AS (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id ORDER BY id) rn
FROM content_stacks
)
DELETE cs
FROM content_stacks cs
INNER JOIN CTE ON CTE.id = cs.id
WHERE rn > 1
Sorry to ask, but if you're deleting, why would you need to group the records?
Aren't you just increasing the runtime?
The code from Meyssam Toluie does not work as-is, but I built a similar solution on the same idea, using row numbers:
DELETE FROM content_stacks WHERE id IN
(SELECT id FROM (
SELECT id, ROW_NUMBER() OVER(PARTITION BY user_id, content_id ORDER BY id) row_num
FROM content_stacks
) sub
WHERE row_num > 1)
This is working for me now.
My first command did not work because GROUP BY does not show all ids in the output, but they are still there, so in fact every id was returned in the NOT IN list. Row numbers seem to be the easiest approach to this problem.
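A different fix, not the poster's: make the surviving id explicit by aggregating it. GROUP BY user_id, content_id with MIN(id) returns exactly one id per group, so NOT IN then removes every other row. The sketch below runs against SQLite with made-up rows. In MySQL a DELETE generally cannot read the same table directly in a subquery (error 1093), so there the subquery would need the derived-table wrapper used in the ROW_NUMBER answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_stacks (id INTEGER PRIMARY KEY, user_id INT, content_id INT)")
# Hypothetical rows: (10, 100) appears twice, (20, 100) three times.
conn.executemany(
    "INSERT INTO content_stacks VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 10, 200), (4, 20, 100), (5, 20, 100), (6, 20, 100)],
)

# MIN(id) picks the surviving id for each (user_id, content_id) group explicitly.
cur = conn.execute("""
    DELETE FROM content_stacks
    WHERE id NOT IN (
        SELECT MIN(id) FROM content_stacks GROUP BY user_id, content_id
    )
""")
print(cur.rowcount)  # 3
remaining = [r[0] for r in conn.execute("SELECT id FROM content_stacks ORDER BY id")]
print(remaining)  # [1, 3, 4]
```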
How do I delete specific records from multiple duplicates?
Below is the table, for example:
This is just one example; we have many cases like this. From this table I need to delete ranks 2 and 3.
Please suggest the best way to identify duplicate records and delete the specific rows.
This should work
delete t
from <your table> t
where t.[rank] <> (select top (1) tt.[rank]
from <your table> tt
where tt.emp_id = t.emp_id
order by tt.[rank] asc) --use desc if you want to keep the highest rank instead
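The example table from the question did not survive, so the emp_id/rank rows below are hypothetical. This SQLite sketch is the MIN() counterpart of the TOP (1) ... ORDER BY rank ASC query: it keeps the lowest rank per emp_id and deletes the rest. If two rows tie on the lowest rank, both survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "rank" is quoted so it is always read as a column name.
conn.execute('CREATE TABLE emp_ranks (emp_id INT, "rank" INT)')
conn.executemany(
    "INSERT INTO emp_ranks VALUES (?, ?)",
    [(101, 1), (101, 2), (101, 3), (102, 1), (102, 2)],
)

# Delete every row whose rank differs from the lowest rank for the same emp_id.
conn.execute('''
    DELETE FROM emp_ranks
    WHERE "rank" <> (SELECT MIN(tt."rank")
                     FROM emp_ranks AS tt
                     WHERE tt.emp_id = emp_ranks.emp_id)
''')
rows = conn.execute('SELECT emp_id, "rank" FROM emp_ranks ORDER BY emp_id').fetchall()
print(rows)  # [(101, 1), (102, 1)]
```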
I don't encourage deleting records, but this solution can help with either expiring them or deleting them:
The table should have a unique ID and a field that lets you identify that the record has expired. If it does not, I recommend adding them to the table. You can create a composite ID in your query, but down the road you will wish you had these attributes.
Create a query that identifies every record where RANK <> 1. This will be your subquery.
Write your UPDATE query:
UPDATE A
SET [EXPIRE_DTTM] = GETDATE()
FROM <TableNameWithTheRecords> A
INNER JOIN (<SubQuery>) B ON A.UniqueID = B.UniqueID
If you truly want to delete the records, use this:
DELETE FROM <TableNameWithTheRecords>
WHERE <UniqueID> IN (SELECT <UniqueID> FROM <TableNameWithTheRecords> WHERE RANK <> 1)
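A small SQLite sketch of the expire-instead-of-delete approach. The table name, UniqueID values and rank column are illustrative only, and SQLite has no GETDATE(), so CURRENT_TIMESTAMP stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE records (UniqueID INTEGER PRIMARY KEY, emp_id INT, '
             '"rank" INT, EXPIRE_DTTM TEXT)')
conn.executemany(
    'INSERT INTO records (UniqueID, emp_id, "rank") VALUES (?, ?, ?)',
    [(1, 101, 1), (2, 101, 2), (3, 101, 3), (4, 102, 1)],
)

# The subquery finds every record whose rank is not 1; those get an expiry
# timestamp instead of being deleted, so the history stays in the table.
conn.execute('''
    UPDATE records
    SET EXPIRE_DTTM = CURRENT_TIMESTAMP
    WHERE UniqueID IN (SELECT UniqueID FROM records WHERE "rank" <> 1)
''')
active = [r[0] for r in conn.execute(
    "SELECT UniqueID FROM records WHERE EXPIRE_DTTM IS NULL ORDER BY UniqueID")]
print(active)  # [1, 4]
```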
WITH tbl_alias AS
(
SELECT emp_ID,
RN = ROW_NUMBER() OVER(PARTITION BY emp_ID ORDER BY emp_ID)
FROM tblName
)
DELETE FROM tbl_alias WHERE RN > 1
My issue is how to delete a row when its primary key value is duplicated. The other fields may or may not be duplicates. I am only interested in the primary key being duplicated, and I would like to retain the first instance while deleting the other duplicate entries.
For example,
I have 2 tables with the following data:
Table 1: Portfolio
Columns: PortfolioID (PK), PortfolioName
Sample data:
1, North America
2, Europe
3, Asia
Table 2: Account
Columns: AccountID (PK), PortfolioID (FK), AccountName
Sample data:
1, 1, Quake
1, 1, Wind
2, 1, Fire
3, 1, Quake
4, 2, Flood
5, 2, Wind
Let's say that for PortfolioID = 1, I am trying to delete row number 2 from the Account table, where AccountID 1 is repeated for PortfolioID = 1. I have tried using a CTE with ROW_NUMBER and deleting the rows where ROWNUMBER <> 1, but this query doesn't work: it deletes all the rows in the table.
The query I tried:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [Account].[AccountID] ORDER BY [Account].[AccountID]) AS [ROWNUMBER],
[Account].[AccountID]
FROM [Account]
INNER JOIN [Portfolio] ON [Portfolio].[PortfolioID] = [Account]. [PortfolioID]
WHERE [Portfolio].[PortfolioID] = 1
)
DELETE [Account]
FROM [CTE]
WHERE [ROWNUMBER] <> 1
Am I doing something wrong in the query? Thanks in advance for the help.
Firstly, if you define the AccountID column as an actual primary key in your database, that will prevent these kinds of problems going forward.
Secondly, are you using SQL Server? Which version?
Assuming you are using SQL Server, in a recent version that supports window functions, you can try something like this to delete any duplicates you have.
This will delete ALL copies of ALL duplicates:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY AccountID,PortfolioID)
FROM Account)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
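To check the "delete ALL copies" behaviour on the question's own sample data, here is a SQLite sketch. It swaps the RANK() trick for COUNT(*) OVER (PARTITION BY ...), which flags every row that belongs to a group of two or more, and it deletes by SQLite's rowid because SQLite cannot delete through a CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountID INT, PortfolioID INT, AccountName TEXT)")
conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", [
    (1, 1, "Quake"), (1, 1, "Wind"), (2, 1, "Fire"),
    (3, 1, "Quake"), (4, 2, "Flood"), (5, 2, "Wind"),
])

# Count rows per (AccountID, PortfolioID); every row in a group of 2+ is removed.
conn.execute("""
    DELETE FROM Account
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   COUNT(*) OVER (PARTITION BY AccountID, PortfolioID) AS cnt
            FROM Account
        ) AS counted
        WHERE cnt > 1
    )
""")
rows = conn.execute("SELECT * FROM Account ORDER BY AccountID").fetchall()
print(rows)  # [(2, 1, 'Fire'), (3, 1, 'Quake'), (4, 2, 'Flood'), (5, 2, 'Wind')]
```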
This alternative script will keep one of the duplicates if that is what you prefer:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
)
DELETE FROM CTE WHERE RN<>1
Finally, if you want to only delete duplicates for Portfolio Id 1:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY AccountID,PortfolioID ORDER BY AccountID,PortfolioID) AS RN
FROM Account
Where PortfolioID = 1
)
DELETE FROM CTE WHERE RN<>1
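The same sample data run through the "keep one, PortfolioID 1 only" variant, as a SQLite sketch. Ordering by rowid (insertion order) is my assumption so that the first copy deterministically gets RN = 1; ordering by the partition columns themselves, as in the query above, leaves the surviving copy up to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountID INT, PortfolioID INT, AccountName TEXT)")
conn.executemany("INSERT INTO Account VALUES (?, ?, ?)", [
    (1, 1, "Quake"), (1, 1, "Wind"), (2, 1, "Fire"),
    (3, 1, "Quake"), (4, 2, "Flood"), (5, 2, "Wind"),
])

# Number rows inside each (AccountID, PortfolioID) group, for PortfolioID = 1 only,
# then delete every row that is not the first in its group.
conn.execute("""
    DELETE FROM Account
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY AccountID, PortfolioID
                                      ORDER BY rowid) AS RN
            FROM Account
            WHERE PortfolioID = 1
        ) AS numbered
        WHERE RN <> 1
    )
""")
rows = conn.execute("SELECT * FROM Account ORDER BY AccountID, rowid").fetchall()
print(rows)
```

Only the (1, 1, Wind) row is removed; the other five rows stay.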
A primary key column never supports duplicate entries.
Try the query below to get the desired result for the given data.
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY a.[AccountID],a.PortfolioID ORDER BY a.[AccountID]) AS [ROWNUMBER],*
FROM [Account] a
WHERE a.[PortfolioID] = 1
)
DELETE
FROM [CTE]
WHERE [ROWNUMBER] > 1
I need advice regarding a SQL statement that has to run on both DB2 and Oracle.
Some time ago a database table was set up without an ID column. Adding the ID column is not the problem, but I absolutely need to fill it with the row number of each row.
I found that rank() would be perfect, but when I select specific values I always get the value '1'.
When I set up an intermediate table as shown below, I get all the data I need:
WITH MY_TEMP_TABLE AS
(
SELECT RANK() OVER (ORDER BY CODE ASC) MY_ROW, CODE, LAND
FROM SECOND_TABLE
)
SELECT *
FROM SECOND_TABLE
INNER JOIN MY_TEMP_TABLE ON SECOND_TABLE.CODE=MY_TEMP_TABLE.CODE
How is it possible to update the ID column in the database table (here: SECOND_TABLE) with the values in MY_ROW?
Thanks a lot...
Use row_number() instead of rank():
WITH MY_TEMP_TABLE AS
(
SELECT row_number() OVER (ORDER BY CODE ASC) MY_ROW, CODE, LAND
FROM SECOND_TABLE
)
SELECT *
FROM SECOND_TABLE
INNER JOIN MY_TEMP_TABLE ON SECOND_TABLE.CODE=MY_TEMP_TABLE.CODE
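The query above still only selects; it does not write MY_ROW back. One way to do that is a correlated subquery against the numbered set, sketched here in SQLite under two assumptions: CODE is unique, and the ID column has already been added. The same shape should carry over to Oracle and DB2, but I have not tested it there. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SECOND_TABLE (CODE TEXT, LAND TEXT)")
conn.executemany("INSERT INTO SECOND_TABLE VALUES (?, ?)",
                 [("DE", "Germany"), ("AT", "Austria"), ("FR", "France")])
# The new, still-empty ID column.
conn.execute("ALTER TABLE SECOND_TABLE ADD COLUMN ID INTEGER")

# Number rows by CODE and write each number back to the matching row.
# Assumes CODE is unique; duplicate codes would make the match ambiguous.
conn.execute("""
    UPDATE SECOND_TABLE
    SET ID = (SELECT numbered.MY_ROW
              FROM (SELECT CODE,
                           ROW_NUMBER() OVER (ORDER BY CODE ASC) AS MY_ROW
                    FROM SECOND_TABLE) AS numbered
              WHERE numbered.CODE = SECOND_TABLE.CODE)
""")
rows = conn.execute("SELECT ID, CODE, LAND FROM SECOND_TABLE ORDER BY ID").fetchall()
print(rows)  # [(1, 'AT', 'Austria'), (2, 'DE', 'Germany'), (3, 'FR', 'France')]
```

row_number() matters here because rank() would give tied CODE values the same number.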
I want to know how to remove duplicate records when the PK is a uniqueidentifier.
I have to delete records based on duplicate values in a set of fields. One option is to build a temp table using ROW_NUMBER() and delete every record except row number one.
But I want to build a one-liner query. Any suggestions?
You can use a CTE to do this. Without seeing your table structure, here is the basic SQL:
;with cte as
(
select *, row_number() over(partition by yourfields order by yourfields) rn
from yourTable
)
delete
from cte
where rn > 1
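-- PostgreSQL-style DELETE ... USING: keeps the row with the smallest pk per dup_field
delete from <your table> t using <your table> ta where ta.dup_field = t.dup_field and t.pk > ta.pk;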
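SQLite has no DELETE ... USING, so this sketch expresses the same self-join rule (delete a row when another row with the same dup_field has a smaller pk) with EXISTS. The pk values are fixed, hypothetical UUID-style strings that compare as text here; SQL Server orders uniqueidentifier values by its own byte grouping rather than as strings, so which copy survives there may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (pk TEXT PRIMARY KEY, dup_field TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    ("0b6f1c2e-0000-0000-0000-000000000001", "alpha"),
    ("5d2a9e10-0000-0000-0000-000000000002", "alpha"),
    ("9c3e7f44-0000-0000-0000-000000000003", "beta"),
    ("1a4d8b07-0000-0000-0000-000000000004", "beta"),
])

# Drop a row when some other row shares its dup_field and has a smaller pk,
# so the smallest pk in each group survives.
conn.execute("""
    DELETE FROM items
    WHERE EXISTS (SELECT 1 FROM items AS ta
                  WHERE ta.dup_field = items.dup_field
                    AND items.pk > ta.pk)
""")
rows = conn.execute("SELECT pk, dup_field FROM items ORDER BY dup_field").fetchall()
print(rows)
```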