SQL: how to remove duplicate records

How can I clean up a table by removing the duplicate records?
+----------+--------+------------+
| clientID | status | Insertdate |
+----------+--------+------------+
| 1 | new | 20191206 |
| 1 | new | 20191206 |
| 2 | old | 20191206 |
| 2 | old | 20191206 |
| 3 | new | 20191205 |
| 3 | new | 20191205 |
+----------+--------+------------+
I don't have any identity field.

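You can use ROW_NUMBER() to number the identical copies of each row and then delete every copy after the first (SQL Server syntax, which allows deleting through the CTE):
;WITH cte AS (
    SELECT clientid, status, Insertdate,
           -- copies of the same (clientid, status, Insertdate) get 1, 2, 3, ...
           ROW_NUMBER() OVER (PARTITION BY clientid, status, Insertdate
                              ORDER BY clientid) AS RowNumber
    FROM Yourtable
)
-- keep copy 1 of each group and delete the rest
DELETE FROM cte WHERE RowNumber > 1;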
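A minimal sketch of the same ROW_NUMBER() idea, runnable with Python's built-in sqlite3 module. This is an assumption-laden stand-in: the answer targets SQL Server, SQLite needs version 3.25 or later for window functions, and SQLite cannot DELETE through a CTE, so the sketch deletes by SQLite's hidden rowid instead. The table name Yourtable follows the answer.

```python
import sqlite3

# In-memory database with the sample rows from the question (no identity column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Yourtable (clientID INTEGER, status TEXT, Insertdate TEXT)")
conn.executemany(
    "INSERT INTO Yourtable VALUES (?, ?, ?)",
    [(1, "new", "20191206"), (1, "new", "20191206"),
     (2, "old", "20191206"), (2, "old", "20191206"),
     (3, "new", "20191205"), (3, "new", "20191205")],
)

# Number the copies inside each group of identical rows, then delete every
# copy numbered 2 or higher, identifying rows by SQLite's implicit rowid.
conn.execute("""
    DELETE FROM Yourtable
    WHERE rowid IN (
        SELECT rid
        FROM (SELECT rowid AS rid,
                     ROW_NUMBER() OVER (PARTITION BY clientID, status, Insertdate
                                        ORDER BY rowid) AS RowNumber
              FROM Yourtable)
        WHERE RowNumber > 1
    )
""")

remaining = conn.execute(
    "SELECT clientID, status, Insertdate FROM Yourtable ORDER BY clientID"
).fetchall()
print(remaining)  # [(1, 'new', '20191206'), (2, 'old', '20191206'), (3, 'new', '20191205')]
```

On SQL Server the CTE form above is simpler; the rowid detour is only needed because SQLite has no updatable CTEs.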

If you are running MySQL, this query at least shows which rows are duplicated and how many copies of each exist (it does not delete anything):
SELECT clientID, status, Insertdate, count(*)
FROM table_name
GROUP BY clientID, status, Insertdate
having count(*) > 1
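To see what the GROUP BY/HAVING query returns on the sample rows, here is a small sketch that uses Python's sqlite3 module as a stand-in for MySQL (an assumption; the query itself is plain SQL). table_name is the answer's placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (clientID INTEGER, status TEXT, Insertdate TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, "new", "20191206"), (1, "new", "20191206"),
     (2, "old", "20191206"), (2, "old", "20191206"),
     (3, "new", "20191205"), (3, "new", "20191205")],
)

# Identical rows collapse into one group; HAVING keeps only groups seen more than once.
dupes = conn.execute("""
    SELECT clientID, status, Insertdate, COUNT(*)
    FROM table_name
    GROUP BY clientID, status, Insertdate
    HAVING COUNT(*) > 1
    ORDER BY clientID
""").fetchall()
print(dupes)  # [(1, 'new', '20191206', 2), (2, 'old', '20191206', 2), (3, 'new', '20191205', 2)]
```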

Related

SQL Server: how to select the latest record in each group? [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:00:20 | 0 |
| 1 | 0:00:40 | 1 |
| 1 | 0:01:00 | 1 |
| 2 | 0:01:20 | 1 |
| 2 | 0:01:40 | 0 |
| 2 | 0:02:00 | 1 |
| 3 | 0:02:20 | 1 |
| 3 | 0:02:40 | 1 |
| 3 | 0:03:00 | 0 |
I have this and I would like to turn it into
| ID | TimeStamp | Item |
|----|-----------|------|
| 1 | 0:01:00 | 1 |
| 2 | 0:02:00 | 1 |
| 3 | 0:03:00 | 0 |
Please advise, thank you!
A correlated subquery is often the fastest method:
select t.*
from t
where t.timestamp = (select max(t2.timestamp)
                     from t t2
                     where t2.id = t.id
                    );
For this, you want an index on (id, timestamp).
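A sketch of the correlated-subquery answer run against the sample data, using Python's sqlite3 as a stand-in for SQL Server (assumption). The TimeStamp values are stored as 'H:MM:SS' text, which sorts correctly here only because every value has the same width; the index name ix_t_id_timestamp is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, timestamp TEXT, item INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "0:00:20", 0), (1, "0:00:40", 1), (1, "0:01:00", 1),
     (2, "0:01:20", 1), (2, "0:01:40", 0), (2, "0:02:00", 1),
     (3, "0:02:20", 1), (3, "0:02:40", 1), (3, "0:03:00", 0)],
)
# The (id, timestamp) index lets the per-id MAX() lookup be answered from the index.
conn.execute("CREATE INDEX ix_t_id_timestamp ON t (id, timestamp)")

# Keep a row only if its timestamp equals the maximum timestamp of its own id.
latest = conn.execute("""
    SELECT t.*
    FROM t
    WHERE t.timestamp = (SELECT MAX(t2.timestamp)
                         FROM t t2
                         WHERE t2.id = t.id)
    ORDER BY t.id
""").fetchall()
print(latest)  # [(1, '0:01:00', 1), (2, '0:02:00', 1), (3, '0:03:00', 0)]
```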
You can also use row_number():
select t.*
from (select t.*,
             row_number() over (partition by id order by timestamp desc) as seqnum
      from t
     ) t
where seqnum = 1;
This is typically a wee bit slower because it needs to assign the row number to every row, even those not being returned.
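The same result via ROW_NUMBER(), again sketched with Python's sqlite3 as a stand-in (assumes SQLite 3.25 or later for window functions). Every row gets a seqnum, and only seqnum = 1 survives the outer filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, timestamp TEXT, item INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "0:00:20", 0), (1, "0:00:40", 1), (1, "0:01:00", 1),
     (2, "0:01:20", 1), (2, "0:01:40", 0), (2, "0:02:00", 1),
     (3, "0:02:20", 1), (3, "0:02:40", 1), (3, "0:03:00", 0)],
)

# seqnum = 1 marks the latest row of each id (newest timestamp first).
latest = conn.execute("""
    SELECT id, timestamp, item
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC) AS seqnum
          FROM t) t
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()
print(latest)  # [(1, '0:01:00', 1), (2, '0:02:00', 1), (3, '0:03:00', 0)]
```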
Partition by ID and rank the rows by TimeStamp descending with an analytic function, so that the latest record(s) of each ID get rank 1 in the subquery; then keep only those:
SELECT *
FROM
(
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY ID ORDER BY TimeStamp DESC) AS dr
    FROM t
) t
WHERE t.dr = 1
DENSE_RANK() is used (rather than ROW_NUMBER()) so that records tied on the latest TimeStamp are all included.
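To illustrate the tie handling, this sketch adds one hypothetical row that ties with the latest TimeStamp of ID 3 and compares DENSE_RANK() with ROW_NUMBER() (Python's sqlite3 as a stand-in; SQLite 3.25 or later assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, timestamp TEXT, item INTEGER)")
rows = [(1, "0:00:20", 0), (1, "0:00:40", 1), (1, "0:01:00", 1),
        (2, "0:01:20", 1), (2, "0:01:40", 0), (2, "0:02:00", 1),
        (3, "0:02:20", 1), (3, "0:02:40", 1), (3, "0:03:00", 0),
        (3, "0:03:00", 1)]  # hypothetical extra row: ties with ID 3's latest TimeStamp
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

def latest(rank_function):
    # rank_function is DENSE_RANK or ROW_NUMBER; rank 1 = latest TimeStamp per id.
    return conn.execute(f"""
        SELECT id, timestamp, item
        FROM (SELECT t.*,
                     {rank_function}() OVER (PARTITION BY id ORDER BY timestamp DESC) AS dr
              FROM t) t
        WHERE dr = 1
        ORDER BY id, item
    """).fetchall()

dense = latest("DENSE_RANK")   # both tied rows for ID 3 are kept
rownum = latest("ROW_NUMBER")  # exactly one row per ID; which tied row wins is arbitrary
print(len(dense), len(rownum))  # 4 3
```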

Delete first rows after qualified ones

Let's suppose that I have the following table called Orders:
---------------------------------
| OrderId | Status | CustomerId |
---------------------------------
| 1 | + | 2 |
---------------------------------
| 2 | - | 1 |
---------------------------------
| 3 | + | 2 |
---------------------------------
| 4 | + | 1 |
---------------------------------
| 5 | - | 3 |
---------------------------------
| 6 | + | 4 |
---------------------------------
| 7 | + | 3 |
---------------------------------
How can I delete, for each customer, the next order after a cancelled one? In this example I want to delete the orders with OrderId 4 and 7.
So the result should be:
---------------------------------
| OrderId | Status | CustomerId |
---------------------------------
| 1 | + | 2 |
---------------------------------
| 2 | - | 1 |
---------------------------------
| 3 | + | 2 |
---------------------------------
| 5 | - | 3 |
---------------------------------
| 6 | + | 4 |
---------------------------------
I use SQL Server, but I'm really curious about how to write it in ANSI SQL.
You can get the first cancelled order for each customer, then delete every order after it:
with todelete as (
      select t.*,
             min(case when status = '-' then orderid end) over
                 (partition by customerid) as deleted_orderid
      from Orders t
     )
delete from todelete
where orderid > deleted_orderid;
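A sketch of this first query on the question's Orders data, using Python's sqlite3 as a stand-in (assumption). SQLite cannot delete through a CTE, so the CTE's logic moves into a subquery whose OrderId values are matched back against Orders; SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Status TEXT, CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "+", 2), (2, "-", 1), (3, "+", 2), (4, "+", 1),
     (5, "-", 3), (6, "+", 4), (7, "+", 3)],
)

# deleted_orderid = first cancelled order per customer (NULL if none).
# Comparisons with NULL are not true, so customers without a cancellation keep everything.
conn.execute("""
    DELETE FROM Orders
    WHERE OrderId IN (
        SELECT OrderId
        FROM (SELECT o.OrderId,
                     MIN(CASE WHEN o.Status = '-' THEN o.OrderId END)
                         OVER (PARTITION BY o.CustomerId) AS deleted_orderid
              FROM Orders o) x
        WHERE OrderId > deleted_orderid
    )
""")
kept = [r[0] for r in conn.execute("SELECT OrderId FROM Orders ORDER BY OrderId")]
print(kept)  # [1, 2, 3, 5, 6]
```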
EDIT:
To delete just the next order after the cancelled one, take the smallest orderid greater than deleted_orderid:
with todelete as (
      select t.*,
             min(case when orderid > deleted_orderid then orderid end) over
                 (partition by customerid) as orderid_to_delete
      from (select t.*,
                   min(case when status = '-' then orderid end) over
                       (partition by customerid) as deleted_orderid
            from Orders t
           ) t
     )
delete from todelete
where orderid = orderid_to_delete;
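To show the difference from the first query, this sketch adds a hypothetical order 8 for customer 1 (not in the question). Only order 4, the one immediately after the cancellation, should go for that customer. Python's sqlite3 stands in for SQL Server, with the CTE rewritten as nested subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Status TEXT, CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "+", 2), (2, "-", 1), (3, "+", 2), (4, "+", 1),
     (5, "-", 3), (6, "+", 4), (7, "+", 3),
     (8, "+", 1)],  # hypothetical extra order for customer 1
)

# Inner level: first cancelled order per customer.
# Middle level: smallest OrderId after it, i.e. the single order to delete.
conn.execute("""
    DELETE FROM Orders
    WHERE OrderId IN (
        SELECT OrderId
        FROM (SELECT x.OrderId,
                     MIN(CASE WHEN x.OrderId > x.deleted_orderid THEN x.OrderId END)
                         OVER (PARTITION BY x.CustomerId) AS orderid_to_delete
              FROM (SELECT o.OrderId, o.CustomerId,
                           MIN(CASE WHEN o.Status = '-' THEN o.OrderId END)
                               OVER (PARTITION BY o.CustomerId) AS deleted_orderid
                    FROM Orders o) x) y
        WHERE OrderId = orderid_to_delete
    )
""")
kept = [r[0] for r in conn.execute("SELECT OrderId FROM Orders ORDER BY OrderId")]
print(kept)  # [1, 2, 3, 5, 6, 8]
```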
EDIT II:
If you want to delete the next order after every cancelled order, the query is a bit simpler:
with todelete as (
      select t.*,
             lag(status) over (partition by customerid order by orderid) as prev_status
      from Orders t
     )
delete from todelete
where prev_status = '-';
The window functions are ANSI SQL, although deleting through a CTE is SQL Server syntax. LAG() requires SQL Server 2012 or later; on SQL Server 2008 you need a correlated subquery or maybe CROSS APPLY (I'm not 100% sure that CROSS APPLY will work in a delete CTE, but it should).
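A sketch of the LAG() version with two hypothetical extra orders for customer 1 (a second cancellation, 8, followed by order 9), so that more than one "next order" exists for the same customer. Python's sqlite3 stands in for SQL Server (SQLite 3.25 or later assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Status TEXT, CustomerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "+", 2), (2, "-", 1), (3, "+", 2), (4, "+", 1),
     (5, "-", 3), (6, "+", 4), (7, "+", 3),
     (8, "-", 1), (9, "+", 1)],  # hypothetical extra orders for customer 1
)

# prev_status = status of the same customer's previous order (by OrderId);
# every order that directly follows a cancellation is deleted.
conn.execute("""
    DELETE FROM Orders
    WHERE OrderId IN (
        SELECT OrderId
        FROM (SELECT o.OrderId,
                     LAG(o.Status) OVER (PARTITION BY o.CustomerId
                                         ORDER BY o.OrderId) AS prev_status
              FROM Orders o) x
        WHERE prev_status = '-'
    )
""")
kept = [r[0] for r in conn.execute("SELECT OrderId FROM Orders ORDER BY OrderId")]
print(kept)  # [1, 2, 3, 5, 6, 8]
```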

Do I need a recursive CTE to update a table that relies on itself?

I need to apologize for the title. I put a lot of thought into it but didn't get too far.
I have a table that looks like this:
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
| accountid | pricexxxxxid | accountid | pricelevelid | counts |
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
| 36B077D4-E765-4C70-BE18-2ECA871420D3 | 00000000-0000-0000-0000-000000000000 | 36B077D4-E765-4C70-BE18-2ECA871420D3 | F43C47CE-28C6-42E2-8399-92C58ED4BA9D | 1 |
| EBC18CBC-2D2E-44CB-B36A-0ADE9E2BDE9F | 00000000-0000-0000-0000-000000000000 | EBC18CBC-2D2E-44CB-B36A-0ADE9E2BDE9F | 3BEEA9D3-F26B-47E4-88FA-A2AA366980ED | 1 |
| 8DC8D0FC-3138-425A-A922-2F0CAC57E887 | 00000000-0000-0000-0000-000000000000 | 8DC8D0FC-3138-425A-A922-2F0CAC57E887 | F1B8AD5D-B008-4C3F-94A0-AD3F90C777D7 | 1 |
| 8F908A92-1327-4655-BAE4-C890D971A554 | 00000000-0000-0000-0000-000000000000 | 8F908A92-1327-4655-BAE4-C890D971A554 | 2E0EC67E-5F8F-4305-932E-BBF8DF83DBEC | 1 |
| 37221AAC-B885-4002-B7D9-591F8C14D019 | 00000000-0000-0000-0000-000000000000 | 37221AAC-B885-4002-B7D9-591F8C14D019 | F4A2A0CA-FDFF-4C21-AE92-D4583DC18DED | 1 |
| 66F406B4-0D9B-40B8-9A23-119EE74B00B7 | 00000000-0000-0000-0000-000000000000 | 66F406B4-0D9B-40B8-9A23-119EE74B00B7 | 204B8570-CEBA-4C72-9B72-8B9B14AF625E | 2 |
| D0168CE3-479E-439E-967C-4FF0D701291A | 00000000-0000-0000-0000-000000000000 | D0168CE3-479E-439E-967C-4FF0D701291A | 204B8570-CEBA-4C72-9B72-8B9B14AF625E | 2 |
| 57E5F6E5-0A8A-4E54-B793-2F6493DC1EA3 | 00000000-0000-0000-0000-000000000000 | 57E5F6E5-0A8A-4E54-B793-2F6493DC1EA3 | 893F9FD2-43C9-4355-AEFC-08A62BF2B066 | 3 |
+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+--------+
It is sorted by ascending counts.
I would like to update the pricexxxxids that are all 00000000-0000-0000-0000-000000000000 with their corresponding pricelevelid.
For example for accountid = 36B077D4-E765-4C70-BE18-2ECA871420D3 I would like the pricexxxxid to be F43C47CE-28C6-42E2-8399-92C58ED4BA9D.
After that is done, I would like all the records FOLLOWING this one where accountid = 36B077D4-E765-4C70-BE18-2ECA871420D3 to be deleted.
In other words, I want to end up with a distinct list of accountids, each with pricexxxxid set to the corresponding pricelevelid.
Thank you so much for your guidance.
For your first case:
UPDATE YourTable
SET pricexxxxid = pricelevelid
WHERE pricexxxxid = '00000000-0000-0000-0000-000000000000';
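A sketch of the UPDATE with Python's sqlite3 as a stand-in (assumption). It uses two rows from the question plus one hypothetical row whose pricexxxxid is already set, to show that the WHERE clause leaves such rows alone. The table name YourTable is a placeholder, and the duplicated accountid column from the question's output is collapsed into one column.

```python
import sqlite3

ZERO = "00000000-0000-0000-0000-000000000000"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable "
             "(accountid TEXT, pricexxxxid TEXT, pricelevelid TEXT, counts INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [("36B077D4-E765-4C70-BE18-2ECA871420D3", ZERO, "F43C47CE-28C6-42E2-8399-92C58ED4BA9D", 1),
     ("66F406B4-0D9B-40B8-9A23-119EE74B00B7", ZERO, "204B8570-CEBA-4C72-9B72-8B9B14AF625E", 2),
     ("HYPOTHETICAL-ACCOUNT", "ALREADY-SET", "OTHER-LEVEL", 5)],  # hypothetical row
)

# Copy pricelevelid into pricexxxxid only where pricexxxxid is still the all-zero GUID.
conn.execute("UPDATE YourTable SET pricexxxxid = pricelevelid WHERE pricexxxxid = ?", (ZERO,))

result = dict(conn.execute("SELECT accountid, pricexxxxid FROM YourTable"))
print(result["36B077D4-E765-4C70-BE18-2ECA871420D3"])  # F43C47CE-28C6-42E2-8399-92C58ED4BA9D
```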
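If I understand your second case correctly (delete the duplicates, or just select distinct rows), keep the row with the lowest counts for each accountid and delete the rest:
;WITH x AS
(
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY accountid ORDER BY counts) FROM YourTable
)
DELETE FROM x
WHERE rn > 1;
-- or, if whole rows are duplicated: SELECT DISTINCT * FROM YourTable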
Edit: to select one row per accountid instead of deleting:
SELECT * FROM
(
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY accountid ORDER BY counts) FROM YourTable
) x
WHERE x.rn = 1
Update: return each accountid with the pricelevelid of its lowest-counts row:
SELECT accountid, pricelevelid FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY accountid ORDER BY counts, pricelevelid) AS Recency
    FROM YourTable
) x
WHERE x.Recency = 1
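A sketch of the updated query with Python's sqlite3 as a stand-in (assumption). The question's sample has no repeated accountid, so one hypothetical later row for 36B077D4-... is added to show that only the lowest-counts row per account is returned; YourTable is a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (accountid TEXT, pricelevelid TEXT, counts INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [("36B077D4-E765-4C70-BE18-2ECA871420D3", "F43C47CE-28C6-42E2-8399-92C58ED4BA9D", 1),
     ("36B077D4-E765-4C70-BE18-2ECA871420D3", "HYPOTHETICAL-LEVEL", 2),  # hypothetical later row
     ("66F406B4-0D9B-40B8-9A23-119EE74B00B7", "204B8570-CEBA-4C72-9B72-8B9B14AF625E", 2)],
)

# Number each account's rows by ascending counts (pricelevelid breaks ties), keep row 1.
first_rows = conn.execute("""
    SELECT accountid, pricelevelid
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY accountid
                                    ORDER BY counts, pricelevelid) AS Recency
          FROM YourTable) x
    WHERE x.Recency = 1
    ORDER BY accountid
""").fetchall()
print(first_rows)
```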

SELECT only latest record of an ID from given rows

I have the table shown below. How do I select only the latest row for each id, based on changeno?
+----+-------+----------+
| id | data  | changeno |
+----+-------+----------+
| 1  | Yes   | 1        |
| 2  | Yes   | 2        |
| 2  | Maybe | 3        |
| 3  | Yes   | 4        |
| 3  | Yes   | 5        |
| 3  | No    | 6        |
| 4  | No    | 7        |
| 5  | Maybe | 8        |
| 5  | Yes   | 9        |
+----+-------+----------+
I would want this result...
+----+-------+----------+
| id | data  | changeno |
+----+-------+----------+
| 1  | Yes   | 1        |
| 2  | Maybe | 3        |
| 3  | No    | 6        |
| 4  | No    | 7        |
| 5  | Yes   | 9        |
+----+-------+----------+
I currently have this SQL statement...
SELECT id, data, MAX(changeno) as changeno FROM Table1 GROUP BY id;
Clearly it doesn't return what I want; in fact it should return an error, because data is neither aggregated nor listed in the GROUP BY clause. If I add those columns to the GROUP BY clause it runs, but it still doesn't return what I want. This statement is the closest I could come up with. I'd appreciate it if anybody could help me with this. Thank you in advance :)
This is typically referred to as the "greatest-n-per-group" problem. One way to solve this in SQL Server 2005 and higher is to use a CTE with a calculated ROW_NUMBER() based on the grouping of the id column, and sorting those by largest changeno first:
;WITH cte AS
(
SELECT id, data, changeno,
rn = ROW_NUMBER() OVER (PARTITION BY id ORDER BY changeno DESC)
FROM dbo.Table1
)
SELECT id, data, changeno
FROM cte
WHERE rn = 1
ORDER BY id;
You want to use row_number() for this:
select id, data, changeno
from (SELECT t.*,
row_number() over (partition by id order by changeno desc) as seqnum
FROM Table1 t
) t
where seqnum = 1;
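This is not a well-formed or performance-optimized query, but for small tasks it works fine. It relies on changeno being unique across the whole table, because the IN list is not matched per id.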
SELECT * FROM TEST
WHERE changeno IN (SELECT MAX(changeno)
FROM TEST
GROUP BY id)
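A sketch that runs the IN query on the sample data and then adds two hypothetical rows for a new id 6, whose older change reuses changeno 1 (the maximum of id 1). Python's sqlite3 stands in for the original database (assumption); TEST is the answer's table name.

```python
import sqlite3

IN_QUERY = """
    SELECT * FROM TEST
    WHERE changeno IN (SELECT MAX(changeno) FROM TEST GROUP BY id)
    ORDER BY id, changeno
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST (id INTEGER, data TEXT, changeno INTEGER)")
conn.executemany(
    "INSERT INTO TEST VALUES (?, ?, ?)",
    [(1, "Yes", 1), (2, "Yes", 2), (2, "Maybe", 3), (3, "Yes", 4), (3, "Yes", 5),
     (3, "No", 6), (4, "No", 7), (5, "Maybe", 8), (5, "Yes", 9)],
)
before = conn.execute(IN_QUERY).fetchall()
print(len(before))  # 5: one latest row per id

# Hypothetical rows: id 6's older change has changeno 1, which is id 1's maximum.
conn.executemany("INSERT INTO TEST VALUES (?, ?, ?)", [(6, "Maybe", 1), (6, "Yes", 10)])
after = conn.execute(IN_QUERY).fetchall()
print(len(after))  # 7: id 6 now appears twice
```

Correlating the subquery on id (or joining on both id and changeno) removes this dependency on globally unique change numbers.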
Another alternative is to join each row to the maximum changeno of its id (note that a table variable is declared with @, not #):
DECLARE @Table1 TABLE
(
    id INT, data VARCHAR(5), changeno INT
);
INSERT INTO @Table1
SELECT 1,'Yes',1
UNION ALL
SELECT 2,'Yes',2
UNION ALL
SELECT 2,'Maybe',3
UNION ALL
SELECT 3,'Yes',4
UNION ALL
SELECT 3,'Yes',5
UNION ALL
SELECT 3,'No',6
UNION ALL
SELECT 4,'No',7
UNION ALL
SELECT 5,'Maybe',8
UNION ALL
SELECT 5,'Yes',9;

SELECT Y.id, Y.data, Y.changeno
FROM @Table1 Y
INNER JOIN (
    -- latest changeno per id
    SELECT id, changeno = MAX(changeno)
    FROM @Table1
    GROUP BY id
) X ON X.id = Y.id
WHERE X.changeno = Y.changeno
ORDER BY Y.id;
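A sketch of the join-to-maximum approach with Python's sqlite3 as a stand-in (assumption), rewritten with standard AS aliases instead of T-SQL's alias = expression form. It includes two hypothetical rows for an id 6 whose older changeno (1) equals id 1's maximum; because the join matches on id as well as changeno, that older row is not returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, data TEXT, changeno INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "Yes", 1), (2, "Yes", 2), (2, "Maybe", 3), (3, "Yes", 4), (3, "Yes", 5),
     (3, "No", 6), (4, "No", 7), (5, "Maybe", 8), (5, "Yes", 9),
     (6, "Maybe", 1), (6, "Yes", 10)],  # last two rows are hypothetical
)

# Join each row to its own id's maximum, so a changeno reused by another id cannot match.
latest = conn.execute("""
    SELECT Y.id, Y.data, Y.changeno
    FROM Table1 Y
    INNER JOIN (SELECT id, MAX(changeno) AS changeno
                FROM Table1
                GROUP BY id) X
            ON X.id = Y.id AND X.changeno = Y.changeno
    ORDER BY Y.id
""").fetchall()
print(latest)
```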

How to select row with the latest timestamp from duplicated rows in a database table?

I have a table with duplicate & triplicate rows - how do I select the rows that are duplicated but have the latest timestamp as well as the un-duped rows?
-------------------------------------
| pk_id | user_id | some_timestamp |
|-------------------------------------|
| 1 | 123 | 10-Jun-12 14.30 |
| 2 | 123 | 19-Jun-12 21.50 |
| 3 | 567 | 10-Jun-12 09.23 |
| 4 | 567 | 12-Jun-12 09.45 |
| 5 | 567 | 13-Jun-12 08.40 |
| 6 | 890 | 13-Jun-12 08.44 |
-------------------------------------
So that I end up with:
-------------------------------------
| pk_id | user_id | some_timestamp |
|-------------------------------------|
| 2 | 123 | 19-Jun-12 21.50 |
| 5 | 567 | 13-Jun-12 08.40 |
| 6 | 890 | 13-Jun-12 08.44 |
-------------------------------------
SELECT * FROM (
    SELECT pk_id,
           user_id,
           some_timestamp,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY some_timestamp DESC) col
    FROM YourTable) x
WHERE x.col = 1
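Try this (the IN list is not matched per user_id, so it assumes an older some_timestamp of one user never equals another user's latest one):
select * from YourTable
where some_timestamp in (select max(some_timestamp)
                         from YourTable
                         group by user_id)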
Try this; I made a SQL Fiddle that returns the correct set of data:
SELECT * FROM YourTable AS T1
INNER JOIN
( SELECT user_id , MAX(some_timestamp) AS some_timestamp FROM YourTable
GROUP BY user_id
) AS T2
ON T1.User_Id = T2.User_Id AND T1.some_timestamp = T2.some_timestamp
ORDER BY 1
http://sqlfiddle.com/#!6/f7bba/6
Try this (row-value IN comparisons like this work in Oracle, MySQL and PostgreSQL, but not in SQL Server):
select * from my_table
where (user_id, some_timestamp) IN (select user_id, max(some_timestamp) from my_table group by user_id);
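A sketch of the row-value IN query with Python's sqlite3, which supports row values from SQLite 3.15 onward (assumption: a stand-in for the original database). The sample timestamps are converted to ISO 'YYYY-MM-DD HH:MM' text, reading the two-digit year 12 as 2012, so that string comparison orders them correctly; my_table follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (pk_id INTEGER, user_id INTEGER, some_timestamp TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 123, "2012-06-10 14:30"), (2, 123, "2012-06-19 21:50"),
     (3, 567, "2012-06-10 09:23"), (4, 567, "2012-06-12 09:45"),
     (5, 567, "2012-06-13 08:40"), (6, 890, "2012-06-13 08:44")],
)

# The (user_id, some_timestamp) pair must match a (user, latest timestamp) pair,
# so the comparison is made per user rather than against a flat list of maxima.
latest = conn.execute("""
    SELECT * FROM my_table
    WHERE (user_id, some_timestamp) IN (SELECT user_id, MAX(some_timestamp)
                                        FROM my_table
                                        GROUP BY user_id)
    ORDER BY pk_id
""").fetchall()
print(latest)  # [(2, 123, '2012-06-19 21:50'), (5, 567, '2012-06-13 08:40'), (6, 890, '2012-06-13 08:44')]
```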
select YourTable.*
from
YourTable JOIN
(select User_Id, Max(Some_Timestamp) as Mx
from YourTable
group by User_Id) Mx
on YourTable.User_Id=Mx.User_Id
and YourTable.Some_Timestamp=Mx.Mx