Delete Duplicate SQL Records - sql

What is the simplest way to delete records with duplicate name in a table? The answers I came across are very confusing.
Related:
Removing duplicate records from table

I got it! Simple and it worked great.
delete
t1
from
tTable t1, tTable t2
where
t1.locationName = t2.locationName and
t1.id > t2.id
http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm

SQL Server 2005:
with FirstKey
AS
(
SELECT MIN(ID), Name, COUNT(*) AS Cnt
FROM YourTable
GROUP BY Name
HAVING COUNT(*) > 1
)
DELETE YourTable
FROM YourTable YT
JOIN FirstKey FK ON FK.Name = YT.Name AND FK.ID != YT.ID

Related

How to keep only the first unique element and delete its duplicates in oracle?

So this query looks like a very easy by its statement but in actual isnt that easy.
Heres my code what ive tried.
Delete from table where id
In (select id from (select
id, row_number()
over(partition by id)
rn from table where rn>1)
The above can work but thats not standard sql for almost all databases like partition by may not be supported in most of the other databases. What i was trying was below is it possible using group by. I tried below but i am not sure this will work or not. Any suggestions and which one is optimized
//using group by
Delete from table where id
In (select id from(select
id from table
Group by id
Having sum(1)>1)
)
As question says
delete its duplicates in oracle
then
delete from your_table a
where a.rowid > (select min(b.rowid)
from your_table b
where b.id = a.id
);
You can use exists as follows:
Delete from your_table t
Where exists (select 1 from your_table t1
Where t1.id = t.id
And t1.rowid > t.rowid)

SQL Server 2008 Delete Records from Self-Referencing Table in correct order [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
SQL: DELETE data from self-referencing table in specific order
I need to delete a sub set of records from a self-referencing table in SQL Server 2008. I am trying to do the following but it is does not like the order by.
WITH SelfReferencingTable (ID, depth)
AS
(
SELECT id, 0 as [depth]
FROM dbo.Table
WHERE parentItemID IS NULL AND [t].ColumnA = '123'
UNION ALL
SELECT [t].ID, [srt].[depth] + 1
FROM dbo.Table t
INNER JOIN SelfReferencingTable srt ON [t].parentItemID = [srt].id
WHERE [t].ColumnA = '123'
)
DELETE y FROM dbo.Table y
JOIN SelfReferencingTable x on x.ID = y.id
ORDER BY x.depth DESC
Any ideas why this isn't working?
You don't need to delete them in any particular order.
Just delete them all in one operation. The constraints are checked at the end of the statement.
Assuming that your recursive CTE code correctly identifies all the rows that you need to delete you can just use
WITH SelfReferencingTable (ID)
AS
(
SELECT id
FROM dbo.Table
WHERE parentItemID IS NULL AND [t].ColumnA = '123'
UNION ALL
SELECT [t].ID
FROM dbo.Table t
INNER JOIN SelfReferencingTable srt ON [t].parentItemID = [srt].id
WHERE [t].ColumnA = '123'
)
DELETE
FROM dbo.Table
WHERE ID IN (SELECT id FROM SelfReferencingTable )
A delete clause doesn't allow an order by. The delete is a single atomic operation, and it does not matter in which order the steps of the delete operation are completed.
An common solution is to delete rows with no dependent rows in a loop:
while 1=1
begin
delete yt
from YourTable yt
where where ColumnA = '123'
and not exists
(
select *
from YourTable yt2
where yt2.parentItemId = yt.id
)
if ##rowcount = 0
break
end

SQL - using a value in a nested select

Hope the title makes some kind of sense - I'd basically like to do a nested select, based on a value in the original select, like so:
SELECT MAX(iteration) AS maxiteration,
(SELECT column
FROM table
WHERE id = 223652
AND iteration = maxiteration)
FROM table
WHERE id = 223652;
I get an ORA-00904 invalid identifier error.
Would really appreciate any advice on how to return this value, thanks!
It looks like this should be rewritten with a where clause:
select iteration,
col
from tbl
where id = 223652
and iteration = (select max(iteration) from tbl where id = 223652);
You can circumvent the problem alltogether by placing the subselect in an INNER JOIN of its own.
SELECT t.iteration
, t.column
FROM table t
INNER JOIN (
SELECT id, MAX(iteration) AS iteration
FROM table
WHERE id = 223652
) tm ON tm.id = t.id AND tm.iteration = t.iteration
Since you're using Oracle, I'd suggest using analytic functions for this:
SELECT * FROM (
SELECT col,
iteration,
row_number() over (partition by id order by iteration desc) rn
FROM tab
WHERE id = 223652
) WHERE rn = 1
do it like this:
with maxiteration as
(
SELECT MAX(iteration) AS maxiteration
FROM table
WHERE id = 223652
)
select
column,
iteration
from
table
where
id = 223652
AND iteration = maxiteration
;
Not 100% sure on Oracle syntax, but isn't it something like:
select iteration, column from table where id = 223652 order by iteration desc limit 1
I would approach this problem in a slightly different way. You're basically looking for the row that has no other iterations greater than it. There are at least 3 ways I can think of to do this:
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
WHERE
T1.id = 223652 AND
NOT EXISTS
(
SELECT *
FROM Table T2
WHERE
T2.id = 223652 AND
T2.iteration > T1.iteration
)
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
LEFT OUTER JOIN Table T2 ON
T2.id = T1.id AND
T2.iteration > T1.iteration
WHERE
T1.id = 223652 AND
T2.id IS NULL
Or...
SELECT
T1.iteration AS maxiteration,
T1.column
FROM
Table T1
INNER JOIN (SELECT id, MAX(iteration) AS maxiteration FROM Table T2 GROUP BY id) SQ ON
SQ.id = T1.id AND
SQ.maxiteration = T1.iteration
WHERE
T1.id = 223652
EDIT: I didn't see the ORA error the first time reading the question and it wasn't tagged as Oracle specific. I think that there may be some differences in the syntax and use of aliases in Oracle, so you may need to tweak some of the above queries.
The Oracle error is telling you that it doesn't know what maxiteration is, because the column alias isn't available yet inside the subquery. You need to refer to it by the table alias and column name instead of the column alias I believe.
You do something like
select maxiteration,column from table a join (select max(iteration) as maxiteration from table where id=1) b using (id) where b.maxiteration=a.iteration;
This could of course return multiple rows for one maxiteration unless your table has a constraint against it.

How to remove duplicate records in a table?

I've got a table in a testing DB that someone apparently got a little too trigger-happy on when running INSERT scripts to set it up. The schema looks like this:
ID UNIQUEIDENTIFIER
TYPE_INT SMALLINT
SYSTEM_VALUE SMALLINT
NAME VARCHAR
MAPPED_VALUE VARCHAR
It's supposed to have a few dozen rows. It has about 200,000, most of which are duplicates in which TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and ID is not.
Now, I could probably make a script to clean this up that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to grab all the unique values, TRUNCATE the original table and then copy everything back. But is there a simpler way to do it, like a DELETE query with something special in the WHERE clause?
You don't give your table name but I think something like this should work. Just leaving the record which happens to have the lowest ID. You might want to test with the ROLLBACK in first!
BEGIN TRAN
DELETE <table_name>
FROM <table_name> T1
WHERE EXISTS(
SELECT * FROM <table_name> T2
WHERE
T1.TYPE_INT = T2.TYPE_INT AND
T1.SYSTEM_VALUE = T2.SYSTEM_VALUE AND
T1.NAME = T2.NAME AND
T1.MAPPED_VALUE = T2.MAPPED_VALUE AND
T2.ID > T1.ID
)
SELECT * FROM <table_name>
ROLLBACK
here is a great article on that: Deleting duplicates, which basically uses this pattern:
WITH q AS
(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
FROM t_duplicate d
)
DELETE
FROM q
WHERE rn > 1
SELECT *
FROM t_duplicate
WITH Duplicates(ID , TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE )
AS
(
SELECT Min(Id) ID TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM T1
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING Count(Id) > 1
)
DELETE FROM T1
WHERE ID IN (
SELECT T1.Id
FROM T1
INNER JOIN Duplicates
ON T1.TYPE_INT = Duplicates.TYPE_INT
AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
AND T1.NAME = Duplicates.NAME
AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
AND T1.Id <> Duplicates.ID
)

SQL - Query to return result

There is a table with Columns as below:
Id : long autoincrement;
timestamp:long;
price:long
Timestamp is given as a unix_time in ms.
Question: what is the average time difference between the records ?
First thought is a sub-query grabbing the record immediately previous:
SELECT timestamp -
(select top 1 timestamp from Table T1 where T1.Id < Table.Id order by Id desc)
FROM Table
Then you can take the average of that:
SELECT AVG(delta)
from (SELECT timestamp -
(select top 1 timestamp from Table T1 where T1.Id < Table.Id order by Id desc) as delta
FROM Table) T
There will probably need to be some handling of the null that results for the first row, but I haven't tested to be sure.
In SQL Server, you could write something like that to get that information:
SELECT
t1.ID, t2.ID,
DATEDIFF(MILLISECOND, t2.PriceTime, test2.PriceTime)
FROM table t1
INNER JOIN table t2 ON t2.ID = t1.ID-1
WHERE t1.ID > (SELECT MIN(ID) FROM table)
and if you're only interested in the AVG across all entries, you could use:
SELECT
AVG(DATEDIFF(MILLISECOND, t2.PriceTime, test2.PriceTime))
FROM table t1
INNER JOIN table t2 ON t2.ID = t1.ID-1
WHERE t1.ID > (SELECT MIN(ID) FROM table)
Basically, you need to join the table with itself, and use "t1.ID = t2.ID-1" to associate item no. 2 in one table with item no. 1 in the other table and then calculate the time difference between the two. In order to avoid accessing item no. 0 which doesn't exist, use the "T1.ID > (SELECT MIN(ID) FROM table)" clause to start from the second item.
Marc
At a guess:
SELECT AVG(timestamp)
I think you need to provide more information in your question for us to help.
If you mean difference between each-other row:
select AVG(x) from (
select a.timestamp - b.timestamp as x
from table a, table b -- this multiplies a*b ) sub
SELECT AVG(T2.Timestamp - T1.TimeStamp)
FROM Table T1
JOIN Table T2 ON T2.ID = T1.ID + 1
try this
Select Avg(E.Timestamp - B.Timestamp)
From Table B Join Table E
On E.Timestamp =
(Select Max(Timestamp)
From Table
Where Timestamp < R.Timestamp)