Remove duplicate records based on timestamp - sql

I'm writing a query to find duplicate records. I have a table with the following columns:
Id, Deliveries, TankId, Timestamp.
I have inserted duplicate records, that is, records for the same TankId with the same deliveries, but with the timestamp offset by +1 day.
Now I want to remove, from each set of duplicates, the record with the earlier timestamp.
E.g. I have duplicate deliveries added for the same TankId on 24th and 25th July. I need to remove the record from the 24th.
I tried the following query:
SELECT raw.TimeStamp,raw.[Delivery],raw.[TankId]
FROM [dbo].[tObservationData] raw
INNER JOIN (
SELECT [Delivery],[TankSystemId]
FROM [dbo].[ObservationData]
GROUP BY [Delivery],[TankSystemId]
HAVING COUNT([ObservationDataId]) > 1
) dup
ON raw.[Delivery] = dup.[Delivery] AND raw.[TankId] = dup.[TankId]
AND raw.TimeStamp >'2019-06-30 00:00:00.0000000' AND raw.[DeliveryL]>0
ORDER BY [TankSystemId],TimeStamp
But the above returns other records too. How can I find and delete those duplicate records?

In this case you can use ROW_NUMBER() with a PARTITION BY ... ORDER BY clause. Partition by TankID and Delivery, and order by Timestamp in descending order:
Select * from (
Select *,ROW_NUMBER() OVER (PARTITION BY TankID,Delivery ORDER BY [Timestamp] DESC) AS rn
from [dbo].[ObservationData]
) t
where rn = 1
In the above code, the records with rn = 1 have the latest timestamp in each group, so you can select only those and ignore the others. You can use the same approach to delete the older records from your table:
;WITH TempObservationdata (TankID, Delivery, rn)
AS
(
SELECT TankID, Delivery, ROW_NUMBER() OVER (PARTITION BY TankID, Delivery ORDER BY [Timestamp] DESC)
AS rn
FROM dbo.ObservationData
)
--Now delete the duplicate rows, keeping the latest one (rn = 1) in each group
DELETE FROM TempObservationdata
WHERE rn > 1
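
To try the idea without touching a real database, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. The table name, the lowercase column names and the sample rows are made up for illustration. SQLite cannot delete through a CTE the way SQL Server can, so this sketch deletes by id instead; it also assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_data ("
    "id INTEGER PRIMARY KEY, tank_id INTEGER, delivery REAL, ts TEXT)"
)
# Hypothetical sample rows: tank 1 has the same delivery on two consecutive days.
conn.executemany(
    "INSERT INTO observation_data (tank_id, delivery, ts) VALUES (?, ?, ?)",
    [
        (1, 500.0, "2019-07-24 10:00:00"),  # duplicate, older: should be deleted
        (1, 500.0, "2019-07-25 10:00:00"),  # duplicate, newer: should be kept
        (2, 300.0, "2019-07-24 09:00:00"),  # not duplicated: should be kept
    ],
)

# Number the rows in each (tank_id, delivery) group, newest first,
# then delete every row that is not the newest one in its group.
conn.execute(
    """
    DELETE FROM observation_data
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY tank_id, delivery
                       ORDER BY ts DESC
                   ) AS rn
            FROM observation_data
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT tank_id, delivery, ts FROM observation_data ORDER BY tank_id"
).fetchall()
print(remaining)
```

Only the 25th July row for tank 1 and the single row for tank 2 should remain. Running the inner SELECT on its own first is a cheap way to check which rows would be deleted.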

I think this will work; it returns the record with the earliest timestamp in each group of duplicates:
SELECT raw.TimeStamp,raw.[Delivery],raw.[TankId]
FROM [dbo].[tObservationData] raw
INNER JOIN (
SELECT [Delivery],[TankSystemId],min([TimeStamp]) as min_ts
FROM [dbo].[ObservationData]
GROUP BY [Delivery],[TankSystemId]
HAVING COUNT([ObservationDataId]) > 1
) dup
ON raw.[Delivery] = dup.[Delivery] AND raw.[TankId] = dup.[TankId] and raw.[TimeStamp] = dup.min_ts
AND raw.TimeStamp >'2019-06-30 00:00:00.0000000' AND raw.[DeliveryL]>0
ORDER BY [TankSystemId],TimeStamp

Are you just looking for this?
SELECT od.*
FROM (SELECT od.*,
ROW_NUMBER() OVER (PARTITION BY od.TankId, od.Delivery ORDER BY od.TimeStamp DESC) as seqnum
FROM [dbo].[tObservationData] od
) od
WHERE seqnum = 1;

Related

Get last record by month/year and id

I need to get the last record of each month/year for each id.
My table captures, daily and for each id, a cumulative order value. So in the end I need only the last record of each month for each id.
I believe this is something simple, but I could not adapt the examples I found to my case.
Here is an example of my input data and the expected result: db_fiddle.
My attempt doesn't include grouping by month and year:
select ar.id, ar.value, ar.aquisition_date
from table_views ar
inner join (
select id, max(aquisition_date) as last_aquisition_date_month
from table_views
group by id
)ld
on ar.id = ld.id and ar.aquisition_date = ld.last_aquisition_date_month
You could do this:
with tn as (
select
*,
row_number() over (partition by id, date_trunc('month', aquisition_date) order by aquisition_date desc) as rn
from table_views
)
select * from tn where rn = 1
The tn CTE adds a row number that counts up in descending order of date within each id/month. Then you take only the rows with rn = 1, which hold the last aquisition_date of each month for each id.
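
A small runnable sketch of the same query, using Python's sqlite3 module. The sample rows are invented, and because SQLite has no date_trunc, the sketch assumes strftime('%Y-%m', ...) as a stand-in for truncating to the month (window functions need SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_views (id INTEGER, value INTEGER, aquisition_date TEXT)"
)
# Hypothetical cumulative values captured on different days.
conn.executemany(
    "INSERT INTO table_views VALUES (?, ?, ?)",
    [
        (1, 10, "2021-01-05"),
        (1, 25, "2021-01-28"),  # last January row for id 1
        (1, 40, "2021-02-15"),  # only February row for id 1
        (2, 9, "2021-01-30"),
        (2, 7, "2021-01-31"),   # last January row for id 2
    ],
)

rows = conn.execute(
    """
    WITH tn AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   -- strftime('%Y-%m', ...) plays the role of date_trunc('month', ...)
                   PARTITION BY id, strftime('%Y-%m', aquisition_date)
                   ORDER BY aquisition_date DESC
               ) AS rn
        FROM table_views
    )
    SELECT id, value, aquisition_date
    FROM tn
    WHERE rn = 1
    ORDER BY id, aquisition_date
    """
).fetchall()
print(rows)
```

Partitioning on the year and month together matters: partitioning on the month number alone would mix January of different years into one group.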

How to get the latest date time record in an sql query with a where clause?

I have two tables : RO_LAMEL_DATA and RO_MAIN_TABLE. RO_MAIN_TABLE includes all the serial numbers (serial_nr) for the productions which have a record key (record_key). RO_LAMEL_DATA has several records (on the same day) for each record key such as machine status (machine_status) with a date time value (pr_date_time). I want to get the latest machine status of one production. For this I do:
select a.machine_status
from ro_lamel_Data a inner join (
select record_key, max(pr_date_time) as MaxDate
from ro_lamel_Data
group by record_key
) ro_main_table on (a.record_key = ro_main_table.record_key) and a.pr_date_time = MaxDate
where a.record_key =(
select record_key from ro_main_table where serial_nr = 'Y39489');
However I get the error:
single-row subquery returns more than one row
How can I solve this? Thanks in advance!
Maybe you need something like
WITH cte AS ( SELECT machine_status,
record_key,
ROW_NUMBER() OVER (PARTITION BY record_key
ORDER BY pr_date_time DESC) rn
FROM ro_lamel_Data )
SELECT cte.record_key, cte.machine_status last_status
FROM cte
JOIN ro_main_table ON cte.record_key = ro_main_table.record_key
WHERE rn = 1
If you want one row, then use order by and fetch first:
select ld.machine_status
from ro_lamel_Data ld join
ro_main_table mt
using (record_key)
where mt.serial_nr = 'Y39489'
order by ld.pr_date_time desc
fetch first 1 row only;
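
Here is a hedged sketch of that last query in Python's sqlite3 module. The rows are invented, and they assume the serial number maps to more than one record key, which is one way the "single-row subquery returns more than one row" error can arise. SQLite spells FETCH FIRST 1 ROW ONLY as LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ro_main_table (record_key INTEGER, serial_nr TEXT);
    CREATE TABLE ro_lamel_data (
        record_key INTEGER, machine_status TEXT, pr_date_time TEXT
    );
    -- Hypothetical data: serial Y39489 has two record keys.
    INSERT INTO ro_main_table VALUES (1, 'Y39489'), (2, 'Y39489'), (3, 'Z00001');
    INSERT INTO ro_lamel_data VALUES
        (1, 'RUNNING', '2020-05-01 08:00:00'),
        (1, 'STOPPED', '2020-05-01 12:00:00'),
        (2, 'IDLE',    '2020-05-01 15:30:00'),
        (3, 'RUNNING', '2020-05-01 18:00:00');
    """
)

# The scalar subquery in the question would see both keys for this serial.
key_count = conn.execute(
    "SELECT COUNT(*) FROM ro_main_table WHERE serial_nr = ?", ("Y39489",)
).fetchone()[0]

# Join instead of comparing against a subquery, sort newest first, keep one row.
latest = conn.execute(
    """
    SELECT ld.machine_status
    FROM ro_lamel_data ld
    JOIN ro_main_table mt ON mt.record_key = ld.record_key
    WHERE mt.serial_nr = ?
    ORDER BY ld.pr_date_time DESC
    LIMIT 1
    """,
    ("Y39489",),
).fetchone()
print(key_count, latest)
```

Because the join tolerates several record keys per serial number, the query returns the single most recent status across all of them.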

Delete Duplicate Rows in SQL

I have a table with unique id but duplicate row information.
I can find the rows with duplicates using this query
SELECT
PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
Attendance
GROUP BY
PersonAliasId, StartDateTime, GroupId
HAVING
COUNT(*) > 1
I can manually delete the rows while keeping the 1 I need with this query
Delete
From Attendance
Where Id IN(SELECT
Id
FROM
Attendance
Where PersonAliasId = 15
and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)
I am not versed enough in SQL to figure out how to use the rows from the first query to delete the duplicates, leaving behind the most recent. The first query returns over 3481 records, which is too many to do this one by one manually.
How can I find the duplicate rows like the first query and delete all but the most recent like the second?
You can use a Common Table Expression to delete the duplicates:
WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.
Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.
DELETE a
FROM attendance as a
INNER JOIN (
SELECT
PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId
FROM
Attendance
GROUP BY
PersonAliasId, GroupId
HAVING
COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime
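
Before running either delete, it may be worth checking which rows each approach would pick. The sketch below uses Python's sqlite3 module with three invented Attendance rows: two are duplicates as the question defines them (same PersonAliasId, StartDateTime and GroupId), and the third is a separate attendance a week earlier. It only selects the ids each approach would delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (id INTEGER PRIMARY KEY, PersonAliasId INTEGER, "
    "StartDateTime TEXT, GroupId INTEGER, ModifiedDateTime TEXT)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?, ?)",
    [
        (1, 15, "2017-09-24", 1429, "2017-09-24 09:00:00"),  # duplicate, older edit
        (2, 15, "2017-09-24", 1429, "2017-09-24 10:00:00"),  # duplicate, newest edit
        (3, 15, "2017-09-17", 1429, "2017-09-17 09:00:00"),  # different day, no duplicate
    ],
)

# ROW_NUMBER approach: rows beyond the most recently modified one in each
# (PersonAliasId, StartDateTime, GroupId) group.
row_number_ids = [r[0] for r in conn.execute(
    """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY PersonAliasId, StartDateTime, GroupId
                   ORDER BY ModifiedDateTime DESC
               ) AS rn
        FROM attendance
    )
    WHERE rn > 1
    ORDER BY id
    """
)]

# MAX join approach: rows whose StartDateTime is earlier than the latest
# StartDateTime of their (PersonAliasId, GroupId) group.
max_join_ids = [r[0] for r in conn.execute(
    """
    SELECT a.id
    FROM attendance AS a
    INNER JOIN (
        SELECT PersonAliasId, GroupId, MAX(StartDateTime) AS LatestTime
        FROM attendance
        GROUP BY PersonAliasId, GroupId
        HAVING COUNT(*) > 1
    ) AS b
      ON a.PersonAliasId = b.PersonAliasId
     AND a.GroupId = b.GroupId
     AND a.StartDateTime < b.LatestTime
    ORDER BY a.id
    """
)]
print(row_number_ids, max_join_ids)
```

With this sample data the two approaches disagree: the ROW_NUMBER version picks the older duplicate (id 1), while the MAX join picks the earlier, non-duplicate attendance (id 3) and leaves both duplicates in place. Whether that matters depends on what counts as a duplicate in your table.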
Same as the CTE answer - give Felix the check
delete tt
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
) tt
where tt.rn > 1

Filter the table with latest date having duplicate OrderId

I have following table:
I need to keep only the rows whose start date is the latest for their order id. With reference to the given table, rows 2 and 3 should be the output.
Rows 1 and 2 have the same order id and order date, but the start date of row 2 is later than that of row 1. The same goes for rows 3 and 4, so I need to take row 3. I am trying to write the query in SQL Server. Any help is appreciated. Please let me know if you need more details. Apologies for my poor English.
You can do this easily with a ROW_NUMBER() windowed function:
;With Cte As
(
Select *,
Row_Number() Over (Partition By OrderId Order By StartDate Desc) RN
From YourTable
)
Select *
From Cte
Where RN = 1
But I question the StartDate datatype. It looks like these are being stored as VARCHAR. If that is the case, you need to CONVERT the value to a DATETIME:
;With Cte As
(
Select *,
Row_Number() Over (Partition By OrderId
Order By Convert(DateTime, StartDate) Desc) RN
From YourTable
)
Select *
From Cte
Where RN = 1
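
To see why the datatype matters, here is a tiny Python illustration. The date strings and the month/day/year format are assumptions, since the original table is not shown. Compared as text, the values are ordered character by character, so the "latest" string is not the latest date.

```python
from datetime import datetime

# Hypothetical StartDate values stored as text in m/d/yyyy format.
start_dates = ["9/24/2017", "10/1/2017", "11/15/2016"]

# Text comparison: "9..." sorts after "1...", regardless of the actual date.
latest_as_text = max(start_dates)

# Date comparison: parse first, then compare.
latest_as_date = max(start_dates, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))

print(latest_as_text)  # 9/24/2017
print(latest_as_date)  # 10/1/2017
```

The same thing happens inside ORDER BY StartDate DESC on a VARCHAR column, which is what the CONVERT in the second query avoids.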
Another way using a derived table.
select
t.*
from
YourTable t
inner join
(select OrderId, max(StartDate) dt
from YourTable
group by OrderId) t2 on t2.dt = t.StartDate and t2.OrderId = t.OrderId

Delete duplicates but keep 1 with multiple column key

I have the following SQL select. How can I convert it to a delete statement so it keeps 1 of the rows but deletes the duplicate?
select s.ForsNr, t.*
from [testDeleteDublicates] s
join (
select ForsNr, period, count(*) as qty
from [testDeleteDublicates]
group by ForsNr, period
having count(*) > 1
) t on s.ForsNr = t.ForsNr and s.Period = t.Period
Try using following:
Method 1:
DELETE FROM Mytable WHERE RowID NOT IN (SELECT MIN(RowID) FROM Mytable GROUP BY Col1,Col2,Col3)
Method 2:
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY ForsNr, period
ORDER BY ( SELECT 0)) RN
FROM testDeleteDublicates)
DELETE FROM cte
WHERE RN > 1
Hope this helps!
NOTE:
Please change the table & column names according to your need!
This is easy as long as you have a generated primary key column (which is a good idea). You can simply select the min(id) of each duplicate group and delete everything else - Note that I have removed the having clause so that the ids of non-duplicate rows are also excluded from the delete.
delete from [testDeleteDublicates]
where id not in (
select Min(Id) as Id
from [testDeleteDublicates]
group by ForsNr, period
)
If you don't have an artificial primary key you may have to achieve the same effect using row numbers, which will be a bit more fiddly as their implementation varies from vendor to vendor.
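
A quick way to convince yourself that the delete keeps exactly one row per group is to run it on a throwaway table. This sketch uses Python's sqlite3 module; the table name test_delete_duplicates and the sample rows are invented, and id is assumed to be an auto-generated primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_delete_duplicates ("
    "id INTEGER PRIMARY KEY, ForsNr INTEGER, period TEXT)"
)
conn.executemany(
    "INSERT INTO test_delete_duplicates (ForsNr, period) VALUES (?, ?)",
    [
        (100, "2014-01"),  # id 1, kept (lowest id in its group)
        (100, "2014-01"),  # id 2, duplicate
        (100, "2014-02"),  # id 3, kept (no duplicate)
        (200, "2014-01"),  # id 4, kept (lowest id in its group)
        (200, "2014-01"),  # id 5, duplicate
        (200, "2014-01"),  # id 6, duplicate
    ],
)

# Keep the lowest id of every (ForsNr, period) group and delete the rest.
conn.execute(
    """
    DELETE FROM test_delete_duplicates
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM test_delete_duplicates
        GROUP BY ForsNr, period
    )
    """
)

remaining = conn.execute(
    "SELECT id, ForsNr, period FROM test_delete_duplicates ORDER BY id"
).fetchall()
print(remaining)
```

Groups with a single row survive because their only id is also their minimum id, which is why the HAVING clause has to be dropped from the subquery.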
You can do this in two ways:
1. Add a primary key and delete accordingly:
http://www.mssqltips.com/sqlservertip/1103/delete-duplicate-rows-with-no-primary-key-on-a-sql-server-table/
2. Use row_number() with a partition, which numbers the rows within each group at run time, and then delete the duplicate rows.
Removing duplicates using partition by SQL Server
--put the group by fields in the partition.
;with cte as (
select ROW_NUMBER() over (partition by ForsNr, period order by ForsNr, period) as RowNo, *
from [testDeleteDublicates]
)
--rows with RowNo > 1 are the duplicates; replace "select *" with "delete" to remove them
select * from cte
where RowNo > 1