Remove duplicates based on a condition - sql

For the below given data set I want to remove the row which has later timestamp.
**37C1Z2990E5E0 (TRXID) should be UNIQUE** in the below dataSet
JKLAMMSDF123 20141112 20141117 5000.0 P 1.22 RT101018 *2014-11-12 10:10:26* 37C1Z2990E5E0 101018
JKLAMMSDF123 20141110 20141114 5000.0 P 1.22 RT161002 *2014-11-12 10:11:33* 37C1Z2990E5E0 161002
-- More rows

Try this:
;WITH DATA AS
(
SELECT TRXID, MAX(YourTimestampColumn) AS TS
FROM YourTable
GROUP BY TRXID
HAVING COUNT(*) > 1
)
DELETE T
FROM YourTable AS T
INNER JOIN DATA AS D
ON T.TRXID = D.TRXID
AND T.YourTimestampColumn = D.TS;

Select the min of the timestamp column and group by all of the other columns.
SELECT MIN(TIMESTAMP), C1, C2, C3...
FROM YOUR_TABLE
GROUP BY C1, C2, C3..

I will do this by using window function plus CTE.
To check the result after removing duplicates use this.
;WITH DATA
AS (SELECT *,
Row_number()OVER(partition BY TRXID ORDER BY YourTimestampColumn) rn
FROM YourTable)
select *
FROM data
WHERE rn = 1
To delete the duplicates use this.
;WITH DATA
AS (SELECT *,
Row_number()OVER(partition BY TRXID ORDER BY YourTimestampColumn) rn
FROM YourTable)
DELETE FROM data
WHERE rn > 1
This will work even if you more than one duplicate for same TRXID

Related

Selecting the latest order

I need to select the data of all my customers with the records displayed in the image. But I need to get the most recent record only, for example I need to get the order # E987 for John and E888 for Adam. As you can see from the example, when I do the select statement, I get all the order records.
You don't mention the specific database, so I'll answer with a generic solution.
You can do:
select *
from (
select t.*,
row_number() over(partition by name order by order_date desc) as rn
from t
) x
where rn = 1
You can use analytical function row_number.
Select * from
(Select t.*,
Row_number() over (partition by customer_id order by order_date desc) as rn
From your_table t) t
Where rn = 1
Or you can use not exists as follows:
Select *
From yoir_table t
Where not exists
(Select 1 from your_table tt
Where t.customer_id = tt.custome_id
And tt.order_date > t.order_date)
You can do it with a subquery that finds the last order date.
SELECT t.*
FROM yoir_table t
JOIN (SELECT tt.custome_id,
MAX(tt.order_date) MaxOrderDate
FROM yoir_table tt
GROUP BY tt.custome_id) AS tt
ON t.custome_id = tt.custome_id
AND t.order_date = tt.MaxOrderDate

Select every second record then determine earliest date

I have table that looks like the following
I have to select every second record per PatientID that would give the following result (my last query returns this result)
I then have to select the record with the oldest date which would be the following (this is the end result I want)
What I have done so far: I have a CTE that gets all the data I need
WITH cte
AS
(
SELECT visit.PatientTreatmentVisitID, mat.PatientMatchID,pat.PatientID,visit.RegimenDate AS VisitDate,
ROW_NUMBER() OVER(PARTITION BY mat.PatientMatchID, pat.PatientID ORDER BY visit.VisitDate ASC) AS RowNumber
FROM tblPatient pat INNER JOIN tblPatientMatch mat ON mat.PatientID = pat.PatientID
LEFT JOIN tblPatientTreatmentVisit visit ON visit.PatientID = pat.PatientID
)
I then write a query against the CTE but so far I can only return the second row for each patientID
SELECT *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate, RowNumber FROM cte
) as X
WHERE RowNumber = 2
How do I return the record with the oldest date only? Is there perhaps a MIN() function that I could be including somewhere?
If I follow you correctly, you can just order your existing resultset and retain the top row only.
In standard SQL, you would write this using a FETCH clause:
SELECT *
FROM (
SELECT
visit.PatientTreatmentVisitID,
mat.PatientMatchID,
pat.PatientID,
visit.RegimenDate AS VisitDate,
ROW_NUMBER() OVER(PARTITION BY mat.PatientMatchID, pat.PatientID ORDER BY visit.VisitDate ASC) AS rn
FROM tblPatient pat
INNER JOIN tblPatientMatch mat ON mat.PatientID = pat.PatientID
LEFT JOIN tblPatientTreatmentVisit visit ON visit.PatientID = pat.PatientID
) t
WHERE rn = 2
ORDER BY VisitDate
OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY
This syntax is supported in Postgres, Oracle, SQL Server (and possibly other databases).
If you need to get oldest date from all selected dates (every second row for each patient ID) then you can try window function Min:
SELECT * FROM
(
SELECT *, MIN(VisitDate) OVER (Order By VisitDate) MinDate
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate,
RowNumber FROM cte
) as X
WHERE RowNumber = 2
) Y
WHERE VisitDate=MinDate
Or you can use SELECT TOP statement. The SELECT TOP clause allows you to limit the number of rows returned in a query result set:
SELECT TOP 1 PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate FROM
(
SELECT *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate,
RowNumber FROM cte
) as X
WHERE RowNumber = 2
) Y
ORDER BY VisitDate
For simplicity add order desc on date column and use TOP to get the first row only
SELECT TOP 1 *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate, RowNumber FROM cte
) as X
WHERE RowNumber = 2
order by VisitDate desc

Filter the table with latest date having duplicate OrderId

I have following table:
I need to filter out the rows for which start date is latest corresponding to its order id .With reference to given table row no 2 and 3 should be the output.
As row 1 and row 2 has same order id and order date but start date is later than first row. And same goes with row number 3 and 4 hence I need to take out row no 3 . I am trying to write the query in SQL server. Any help is appreciated.Please let me know if you need more details.Apologies for poor English
You can do this easily with a ROW_NUMBER() windowed function:
;With Cte As
(
Select *,
Row_Number() Over (Partition By OrderId Order By StartDate Desc) RN
From YourTable
)
Select *
From Cte
Where RN = 1
But I question the StartDate datatype. It looks like these are being stored as VARCHAR. If that is the case, you need to CONVERT the value to a DATETIME:
;With Cte As
(
Select *,
Row_Number() Over (Partition By OrderId
Order By Convert(DateTime, StartDate) Desc) RN
From YourTable
)
Select *
From Cte
Where RN = 1
Another way using a derived table.
select
t.*
from
YourTable t
inner join
(select OrderId, max(StartDate) dt
from YourTable
group by OrderId) t2 on t2.dt = t.StartDate and t2.OrderId = t.OrderId

How to get the row that holds the last value in a queue of identical values? (SQL)

I think it's easier to show you an image:
So, for each fld_call_id, go to the next value, if it's identical. When we get to the last value, I need the value in column fld_menu_id.
Or, to put it in another way, eliminate fld_call_id duplicates and save only the last one.
You can use ROW_NUMBER:
WITH CTE AS(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY fld_call_id ORDER BY fld_id DESC),
fld_menu_id
FROM dbo.TableName
)
SELECT fld_menu_id FROM CTE WHERE RN = 1
You can create a Rank column and only select that row, something along the lines of the following:
;WITH cte AS
(
SELECT
*
,RANK() OVER (PARTITION BY fld_call_id ORDER BY fld_id DESC) Rnk
FROM YourTable
)
SELECT
*
FROM cte
WHERE Rnk=1
So you GROUP BY fld_call_id and ORDER BY fld_id in descending order so that the last value comes first. These are the rows where Rnk=1.
Edit after comments of OP.
SELECT Table.*
FROM Table
INNER JOIN
(
SELECT MAX(fldMenuID) AS fldMenuID,
fldCallID
FROM Table
GROUP BY fldCallID
) maxValues
ON (maxValues.fldMenuID = Table.fldMenuID
AND maxValues.fldCallID= Table.fldCallID)
Hope This works
SELECT A.*
FROM table A
JOIN (SELECT fld_id,
ROW_NUMBER() OVER (PARTITION BY Fld_call_id ORDER BY fld_id DESC) [Row]
FROM table) LU ON A.fld_id = LU.fld_id
WHERE LU.[Row] = 1

SQL query for column threaded relationship

This is a simplified view of a table. I apologize, but I could not save a picture of the table so I hope this is ok.
c1___c2
1____a
1____b
2____a
2____b
2____c
2____d
3____e
3____a
4____z
5____d
The result is that due to the relationships of column C2,
Group 1 would include, 1,2,3,5 (because they have overlapping c2 values basically stating a=b=c=d=e)
Group 2 would include 4
I have millions of rows with this kind of data and currently there is a cursor job that runs x number of times to build these groups. I am able to visualize how this should work, but I have not been able to build a query that can pull out this relationship.
Any suggestions?
Thank you
Tested on SQL Server 2012:
WITH t AS (
SELECT
t.c1,
t.c2,
tm.c1_min
FROM
Test t
JOIN
(
SELECT
c2,
MIN(c1) AS c1_min
FROM
Test
GROUP BY
c2
) AS tm
ON
t.c2 = tm.c2
),
rt AS (
SELECT
c1_min,
c1,
1 AS cnt
FROM
t
UNION ALL
SELECT
rt.c1_min,
t.c1,
rt.cnt + 1 AS cnt
FROM
rt
JOIN
t
ON
rt.c1 = t.c1_min
AND
rt.c1 < t.c1
)
SELECT
SUM(t.rst) OVER (ORDER BY t.ord ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS group_number,
t.c1
FROM
(
SELECT
t.c1,
t.rst,
t.ord
FROM
(
SELECT
rt.c1,
CASE
WHEN rt.c1_min = MIN(rt.c1_min) OVER (ORDER BY rt.c1_min, rt.c1 ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0
ELSE 1
END AS rst,
ROW_NUMBER() OVER (ORDER BY rt.c1_min, rt.c1) AS ord,
ROW_NUMBER() OVER (PARTITION BY rt.c1 ORDER BY rt.c1_min, rt.cnt) AS qfy
FROM
rt
) AS t
WHERE
t.qfy = 1
) AS t;