I'm using MariaDB and am trying to do two things, both of which fail.
(1) I'm trying to delete all duplicate rows while keeping one record per ASIN.
WITH CTE AS (
SELECT asin, ROW_NUMBER() OVER (PARTITION BY asin ORDER BY created_at) AS n
FROM asin_list
)
DELETE
FROM CTE
WHERE n > 1
This returns the following error:
You have an error in your SQL syntax; check the manual that
corresponds to your MariaDB.
(2) As a workaround for the query above, I tried to insert all duplicated ASINs into a table, with the goal of selecting max(asin) later on and deleting it.
WITH CTE AS (
SELECT asin, ROW_NUMBER() OVER (PARTITION BY asin ORDER BY created_at) AS n
FROM asin_list
)
INSERT INTO temp1 *
FROM FROM CTE
WHERE n > 1
But this returns the same error. Can you please help me fix this?
You could write the statement as:
select * -- delete
from asin_list as newer
where exists (
select *
from asin_list as older
where older.asin = newer.asin and (
older.created_at < newer.created_at or
older.created_at = newer.created_at and older.pri_key < newer.pri_key
)
)
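To check the logic of the EXISTS approach before running it against real data, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: it runs on SQLite rather than MariaDB, the sample rows are made up, and it assumes the table has a unique pri_key column as in the query above. Because the delete target is not aliased here, the subquery refers to the outer row as asin_list.

```python
import sqlite3

# In-memory SQLite database; table layout and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asin_list (pri_key INTEGER PRIMARY KEY, asin TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO asin_list (pri_key, asin, created_at) VALUES (?, ?, ?)",
    [
        (1, "A1", "2023-01-01"),
        (2, "A1", "2023-01-02"),  # newer duplicate of row 1
        (3, "A1", "2023-01-01"),  # same timestamp as row 1, higher key
        (4, "B2", "2023-01-05"),  # no duplicate
    ],
)

# Delete every row for which an "older" row with the same asin exists.
# Ties on created_at are broken by pri_key, so exactly one row per asin survives.
conn.execute(
    """
    DELETE FROM asin_list
    WHERE EXISTS (
        SELECT 1
        FROM asin_list AS older
        WHERE older.asin = asin_list.asin
          AND (older.created_at < asin_list.created_at
               OR (older.created_at = asin_list.created_at
                   AND older.pri_key < asin_list.pri_key))
    )
    """
)

remaining = conn.execute(
    "SELECT pri_key, asin FROM asin_list ORDER BY pri_key"
).fetchall()
print(remaining)
```

The tie-break on pri_key matters: without it, two rows with the same asin and the same created_at would both be kept.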
Try adding ";" before "WITH". Something like:
;WITH CTE AS ( SELECT asin , row_number() OVER(PARTITION BY asin ORDER BY asin_list.created_at) AS n FROM asin_list ) delete from CTE WHERE n > 1
Let me know if that works.
Related
As you can see below, I'm able to select all the duplicate rows. I identified them using the window function ROW_NUMBER().
Now I want to delete them from the database.
How can I change my code to remove the duplicates it identifies? I'm currently getting an error.
WITH RowNumCTE AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY ParcelID,
PropertyAddress,
SalePrice,
SaleDate,
LegalReference
ORDER BY
UniqueID
) row_num
FROM housing_data
)
SELECT *
FROM RowNumCTE
WHERE row_num > 1
Duplicates are identified as having a row_number greater than 1.
Thanks
I found the solution. I used
DELETE FROM housing_data
WHERE ROWID NOT IN (
SELECT MIN(ROWID)
FROM housing_data
GROUP BY ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference
);
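For anyone who wants to try this on throwaway data first, here is a small sketch with Python's sqlite3 module. It assumes SQLite, where ROWID is an implicit column on ordinary tables; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing_data (ParcelID TEXT, PropertyAddress TEXT, "
    "SalePrice INTEGER, SaleDate TEXT, LegalReference TEXT)"
)
# Made-up sample rows; SQLite assigns ROWIDs 1 to 4 in insertion order.
rows = [
    ("P1", "1 Main St", 100000, "2020-01-01", "REF-1"),
    ("P1", "1 Main St", 100000, "2020-01-01", "REF-1"),  # exact duplicate
    ("P2", "2 Oak Ave", 250000, "2020-02-01", "REF-2"),
    ("P1", "1 Main St", 100000, "2020-01-01", "REF-1"),  # another duplicate
]
conn.executemany("INSERT INTO housing_data VALUES (?, ?, ?, ?, ?)", rows)

# Keep the lowest ROWID in each group of identical rows and delete the rest.
conn.execute(
    """
    DELETE FROM housing_data
    WHERE ROWID NOT IN (
        SELECT MIN(ROWID)
        FROM housing_data
        GROUP BY ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference
    )
    """
)

kept = conn.execute(
    "SELECT ROWID, ParcelID FROM housing_data ORDER BY ROWID"
).fetchall()
print(kept)
```

Rows that have no duplicate are their own group, so their ROWID is the group minimum and they are left alone.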
I've found answers online, but none with specifically what I am doing. I couldn't get anything to work.
I have a select that randomly selects records and I just want to be able to have it insert into a table instead.
My SQL is
with data as (
select *, row_number() over (partition by DIVISION order by DIVISION) as rn
from WORK
)
select *
from data
where rn <= #randomNumber or (rn - #randomNumber) % 18 = 1 AND DIVISION != 4;
I know I am not supposed to do the partition by DIVISION order by DIVISION but that's a separate issue I believe.
I just need to be able to insert this data into another table WORK_CLEAN
You just need to add an insert statement.
with data as (
select *, row_number() over (partition by DIVISION order by DIVISION) as rn
from WORK
)
insert into WORK_CLEAN ([ColumnsHere])
select [ColumnsHere] -- list the same columns here; select * would also return rn
from data
where rn <= #randomNumber or (rn - #randomNumber) % 18 = 1 AND DIVISION != 4;
I have a query that returns the duplicates. Now I need to build a query that returns the rows without duplicates. I've tried, but my query takes a long time to run. My query for the duplicates:
SELECT
c.*
FROM
Clients c
INNER JOIN
(
SELECT
iin,
COUNT(iin) AS countIIN
FROM
Clients
GROUP BY
iin
HAVING
COUNT(iin) > 1
) cc
ON c.IIN = cc.IIN
ORDER BY
c.last_name DESC
I need the opposite of the query above.
You can use the query below to return only the records whose IIN occurs exactly once.
WITH CTE AS
(SELECT *, COUNT(IIN) OVER (PARTITION BY IIN) RECORDCOUNT FROM CLIENTS)
SELECT * FROM CTE WHERE RECORDCOUNT =1
Make sure to replace * in the query with the columns you actually need.
If you also want one record from each group of duplicates, you can use the query below instead.
WITH CTE AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY IIN ORDER BY IIN) RECORDCOUNT FROM CLIENTS)
SELECT * FROM CTE WHERE RECORDCOUNT =1
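The difference between the two queries above (counting rows per IIN versus numbering them) is easy to miss, so here is a small sketch that runs both side by side with Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or later). The id column and the sample rows are hypothetical, and the numbering is ordered by id instead of IIN so that the row kept from each group is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down Clients table; only iin comes from the question.
conn.execute("CREATE TABLE Clients (id INTEGER PRIMARY KEY, iin TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO Clients (id, iin, last_name) VALUES (?, ?, ?)",
    [
        (1, "111", "Adams"),
        (2, "111", "Adams"),   # duplicate iin
        (3, "222", "Baker"),   # unique iin
        (4, "333", "Clark"),
        (5, "333", "Clarke"),  # duplicate iin
    ],
)

# Variant 1: only the IINs that occur exactly once.
only_unique = conn.execute(
    """
    WITH cte AS (
        SELECT iin, COUNT(iin) OVER (PARTITION BY iin) AS record_count
        FROM Clients
    )
    SELECT iin FROM cte WHERE record_count = 1 ORDER BY iin
    """
).fetchall()

# Variant 2: one row per IIN, including one representative of each duplicate group.
one_per_iin = conn.execute(
    """
    WITH cte AS (
        SELECT id, iin, ROW_NUMBER() OVER (PARTITION BY iin ORDER BY id) AS rn
        FROM Clients
    )
    SELECT id, iin FROM cte WHERE rn = 1 ORDER BY iin
    """
).fetchall()

print(only_unique)
print(one_per_iin)
```

The first variant drops every client whose IIN is duplicated; the second keeps the lowest id from each IIN group.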
The ROW_NUMBER() function is a good option for finding duplicates in SQL.
Please check the following query:
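WITH [CTE NoDuplicates] AS
(
SELECT
-- number the rows within each iin group; the row numbered 1 is the one kept
RN = ROW_NUMBER() OVER (PARTITION BY iin ORDER BY last_name DESC),
*
FROM Clients
)
-- rows numbered above 1 are the duplicates
DELETE FROM [CTE NoDuplicates] WHERE RN > 1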
I have the following SQL select. How can I convert it to a delete statement so it keeps 1 of the rows but deletes the duplicate?
select s.ForsNr, t.*
from [testDeleteDublicates] s
join (
select ForsNr, period, count(*) as qty
from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
) t on s.ForsNr = t.ForsNr and s.Period = t.Period
Try using following:
Method 1:
DELETE FROM Mytable WHERE RowID NOT IN (SELECT MIN(RowID) FROM Mytable GROUP BY Col1,Col2,Col3)
Method 2:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY ForsNr, period
ORDER BY ( SELECT 0)) RN
FROM testDeleteDublicates)
DELETE FROM cte
WHERE RN > 1
Hope this helps!
NOTE:
Please change the table & column names according to your need!
This is easy as long as you have a generated primary key column (which is a good idea). You can simply select the min(id) of each duplicate group and delete everything else - Note that I have removed the having clause so that the ids of non-duplicate rows are also excluded from the delete.
delete from [testDeleteDublicates]
where id not in (
select Min(Id) as Id
from [testDeleteDublicates]
group by ForsNr, period
)
If you don't have an artificial primary key you may have to achieve the same effect using row numbers, which will be a bit more fiddly as their implementation varies from vendor to vendor.
You can do this in one of two ways.
1. Add a primary key and delete accordingly:
http://www.mssqltips.com/sqlservertip/1103/delete-duplicate-rows-with-no-primary-key-on-a-sql-server-table/
2. Use row_number() with a partition clause to number the rows within each duplicate group at query time, then delete the rows numbered above 1. See also:
Removing duplicates using partition by SQL Server
-- put the columns that define a duplicate in the partition by clause
;with cte as (
select ROW_NUMBER() over (partition by ForsNr, period order by ForsNr, period) as RowNo, *
from [testDeleteDublicates]
)
select * -- change "select *" to "delete" once the listed rows look right
from cte
where RowNo > 1
I am using SQL Server 2008 R2.
I found duplicate rows with this script:
SELECT CLDest, CdClient,
COUNT(CLDest) AS NumOccurrences
FROM DEST
GROUP BY CLDest,CdClient
HAVING ( COUNT(CLDest) > 1 )
It returns 48 entries.
Before I delete anything, I want to make sure that I only delete the duplicates:
SELECT DEST.CdClient
,DEST.CLDest
FROM [Soft8Exp_Client_WEB].[dbo].[DEST]
WHERE DEST.CdClient IN (SELECT CdClient
FROM DEST
GROUP BY CdClient
HAVING (COUNT(CLDest) > 1) )
AND DEST.CLDest IN (SELECT CLDest
FROM DEST
GROUP BY CLDest
HAVING (COUNT(CLDest) > 1) )
This query returns 64628 entries
So I suppose my select is wrong.
SQL Server has the nice property of updatable CTEs. When combined with the function row_number(), this does what you want:
with todelete as (
select d.*,
row_number() over (partition by CLDest, CdClient order by newid()) as seqnum
from dest d
)
delete from todelete
where seqnum > 1;
This version keeps one randomly chosen row from each group of duplicates and deletes the rest. It assigns a sequential number to the rows that share the same CLDest and CdClient values, in random order because of newid(), and deletes every row except the one numbered 1. If you want to keep a specific row, for example the most recent by date, use a different expression in the order by.
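As a rough illustration of the "keep something by date" variant, here is a sketch using Python's sqlite3 module. Several things here are assumptions: the created_at column is hypothetical (the question does not show one), the rows are made up, and SQLite cannot delete through a CTE the way SQL Server can, so the sketch swaps in a filter on rowid while using the same row_number() numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# created_at is a hypothetical column added only for this illustration.
conn.execute("CREATE TABLE dest (CLDest TEXT, CdClient TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO dest VALUES (?, ?, ?)",
    [
        ("D1", "C1", "2021-01-01"),
        ("D1", "C1", "2021-03-01"),  # most recent of the D1/C1 group
        ("D1", "C1", "2021-02-01"),
        ("D2", "C1", "2021-01-15"),  # no duplicate
    ],
)

# Number the rows in each (CLDest, CdClient) group, newest first,
# then delete everything except the row numbered 1.
conn.execute(
    """
    DELETE FROM dest
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY CLDest, CdClient
                       ORDER BY created_at DESC
                   ) AS seqnum
            FROM dest
        )
        WHERE seqnum > 1
    )
    """
)

left = conn.execute(
    "SELECT CLDest, CdClient, created_at FROM dest ORDER BY CLDest"
).fetchall()
print(left)
```

Changing the ORDER BY inside the window is the only thing that decides which row of each group survives.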
;WITH Duplicates
AS
(
SELECT CLDest
, CdClient
, ROW_NUMBER() OVER (PARTITION BY CLDest, CdClient ORDER BY CdClient) AS Rn
FROM DEST
)
DELETE FROM Duplicates
WHERE RN > 1
SELECT DEST.CdClient,DEST.CLDest
FROM [Soft8Exp_Client_WEB].[dbo].[DEST]
WHERE DEST.CdClient+DEST.CLDest
IN (
SELECT CdClient+CLDest FROM DEST GROUP BY CdClient, CLDest HAVING ( COUNT(CLDest) > 1 )
)