Filter and keep most recent duplicate - SQL

Please help me with this one; I'm stuck and can't figure out how to write my query. I'm working with SQL Server 2014.
Table A (approx. 65k rows), CEID = primary key:
CEID State Checksum
1 2 666
2 2 666
3 2 666
4 2 333
5 2 333
6 9 333
7 9 111
8 9 111
9 9 741
10 2 656
Desired output
CEID State Checksum
3 2 666
6 9 333
8 9 111
9 9 741
10 2 656
I want to keep the row with the highest CEID if State is equal for all rows sharing a Checksum. If State differs but Checksum is equal, I want to keep the row with the highest CEID for State = 9. Unique rows like CEID 9 and 10 should be included in the result regardless of State.
This join returns all duplicates:
SELECT a1.*, a2.*
FROM tableA a1
INNER JOIN tableA a2
        ON a1.Checksum = a2.Checksum
       AND a1.CEID <> a2.CEID
I've also identified MAX(CEID) for each duplicate Checksum with this query:
SELECT a.Checksum, a.State, MAX(a.CEID) AS CEID_MAX, COUNT(*) AS cnt
FROM tableA a
GROUP BY a.Checksum, a.State
HAVING COUNT(*) > 1
ORDER BY a.Checksum, a.State
With the first query, I can't figure out how to select only the row with the highest CEID per Checksum.
The problem with the second query is that I can't get the GROUP BY to work inside a subquery when I try to join on it.

You can use ROW_NUMBER() partitioned by Checksum and ordered by State DESC, CEID DESC. Note that both of your conditions are satisfied by ORDER BY State DESC, CEID DESC.
Then take the first row number:
;WITH cte AS
(
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY Checksum
                                   ORDER BY State DESC, CEID DESC)
    FROM TableA
)
SELECT *
FROM cte
WHERE rn = 1        -- keep only the top-ranked row per Checksum
ORDER BY CEID;
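If you want to verify the logic against the sample data, here is a minimal setup sketch (the table name TableA comes from the query above; the integer column types are an assumption):

-- Sample data from the question; column types are assumed.
CREATE TABLE TableA (CEID int PRIMARY KEY, State int, Checksum int);

INSERT INTO TableA (CEID, State, Checksum) VALUES
(1, 2, 666), (2, 2, 666), (3, 2, 666),
(4, 2, 333), (5, 2, 333), (6, 9, 333),
(7, 9, 111), (8, 9, 111), (9, 9, 741),
(10, 2, 656);

-- The CTE query above should now return CEIDs 3, 6, 8, 9 and 10.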

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID   Identifier  Admission_Date  Release_Date
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
234  2           4/15/22         4/18/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
789  2           7/1/22          7/5/22
321  2           6/1/21          6/3/21
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
321  2           5/6/21          5/10/21
I want all rows with Identifier = 1. I also want the rows that are directly below or above rows with Identifier = 1, sorted from most recent to least recent.
There is always a row below a row with Identifier = 1; there may or may not be a row above. If an ID has no row with Identifier = 1, it will not be brought in by a prior step.
The resulting data set should be as follows:
ID   Identifier  Admission_Date  Release_Date
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
I am using DBeaver, connected to a PostgreSQL database.
I admittedly don't know Postgres well, so the following could possibly be optimised. However, using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
    select *,
           -- previous and next admission_date, computed only for identifier = 1 rows
           case when identifier = 1
                then lag(admission_date) over (partition by id order by admission_date desc) end as pd,
           case when identifier = 1
                then lead(admission_date) over (partition by id order by admission_date desc) end as nd
    from t
)
select id, identifier, admission_date, release_date
from d
where identifier = 1
   or exists (                    -- rows directly above or below an identifier = 1 row
        select *
        from d d2
        where d2.id = d.id
          and (d.admission_date = d2.pd or d.admission_date = d2.nd)
      )
order by id, admission_date desc;
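If you want to run either answer against the sample data, here is a minimal setup sketch (the table name t and the column types are assumptions taken from the query above; the second answer below refers to the same table as tbl):

-- Setup sketch: table name and column types are assumptions based on the queries.
CREATE TABLE t (id int, identifier int, admission_date date, release_date date);

INSERT INTO t (id, identifier, admission_date, release_date) VALUES
(234, 2, '2022-05-01', '2022-05-05'),
(234, 1, '2022-04-25', '2022-04-30'),
(234, 2, '2022-04-20', '2022-04-24'),
(234, 2, '2022-04-15', '2022-04-18'),
(789, 1, '2022-07-15', '2022-07-19'),
(789, 2, '2022-07-08', '2022-07-14'),
(789, 2, '2022-07-01', '2022-07-05'),
(321, 2, '2021-06-01', '2021-06-03'),
(321, 2, '2021-05-27', '2021-05-31'),
(321, 1, '2021-05-20', '2021-05-26'),
(321, 2, '2021-05-15', '2021-05-19'),
(321, 2, '2021-05-06', '2021-05-10');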
One way:
SELECT (x.my_row).*                        -- decompose fields from row type
FROM  (
    SELECT identifier
         , lag(t)  OVER w AS t0            -- take whole previous row
         , t              AS t1            -- current row
         , lead(t) OVER w AS t2            -- whole next row
    FROM   tbl t
    WINDOW w AS (PARTITION BY id ORDER BY admission_date)
    ) sub
CROSS JOIN LATERAL (
    VALUES (t0), (t1), (t2)                -- pivot the three rows into one column
    ) x(my_row)
WHERE sub.identifier = 1
AND   (x.my_row).id IS NOT NULL;           -- exclude rows with NULL ( = missing row)
The query is designed to make only a single pass over the table.
It uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions with proper index support will be (much) faster. You did not specify ...

Postgres query with limit that selects all records with similar identifier

I have a table that looks something like this:
customer_id  data
1            123
1            456
2            789
2            101
2            121
2            123
3            123
4            456
What I would like to do is perform a SELECT combined with a LIMIT X to get X records, plus any other records that share the customer_id of the last record returned.
Example query: SELECT customer_id, data FROM table ORDER BY customer_id LIMIT 3;
This query returns:
customer_id  data
1            123
1            456
2            789
I'd like a query that will look at the last customer_id value and return all remaining records that match beyond the LIMIT specified. Is it possible to do this in a single operation?
Desired output:
customer_id  data
1            123
1            456
2            789
2            101
2            121
2            123
In Postgres 13 you can use WITH TIES:
select t.*
from t
order by customer_id
fetch first 3 rows with ties;
In earlier versions you can use IN:
select t.*
from t
where t.customer_id in (select t2.customer_id
                        from t t2
                        order by t2.customer_id
                        limit 3);
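A quick sanity check against the sample data (a sketch; the table name t matches the queries above, the integer column types are assumed):

-- Sample data from the question; column types are assumed.
CREATE TABLE t (customer_id int, data int);

INSERT INTO t (customer_id, data) VALUES
(1, 123), (1, 456),
(2, 789), (2, 101), (2, 121), (2, 123),
(3, 123), (4, 456);

-- The third row ordered by customer_id belongs to customer 2, so WITH TIES
-- also returns the remaining customer 2 rows: six rows in total.
SELECT *
FROM t
ORDER BY customer_id
FETCH FIRST 3 ROWS WITH TIES;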
You can use a correlated subquery with COUNT as follows:
select t.*
from t
-- keep rows whose customer_id is among the 3 smallest distinct customer_id values
where 3 >= (select count(distinct tt.customer_id)
            from t tt
            where tt.customer_id <= t.customer_id)

Delete rows, which are duplicated and follow each other consequently

It's hard to formulate, so I'll just show an example; you are welcome to edit my question and title.
Suppose I have a table:
   flag  id  value  datetime
0  b      1    343        13
1  a      1     23        12
2  b      1     21        11
3  b      1     32        10
4  c      2     43        11
5  d      2     43        10
6  d      2     32         9
7  c      2      1         8
For each id I want to squeeze the table by the flag column so that all duplicate flag values that follow each other collapse into one row, summing value. Desired result:
   flag  id  value
0  b      1    343
1  a      1     23
2  b      1     53
3  c      2     75
4  d      2     32
5  c      2      1
P.S.: I found functions like CONDITIONAL_CHANGE_EVENT, which seem to be able to do that, but the examples in the docs don't work for me.
Use the difference-of-row-numbers approach to assign a group to each run of consecutive rows with the same flag. Then sum value within each group:
select distinct id, flag,
       sum(value) over (partition by id, flag, grp) as finalvalue
from (
    select t.*,
           -- rows in the same consecutive run of a flag share a grp value
           row_number() over (partition by id order by datetime)
         - row_number() over (partition by id, flag order by datetime) as grp
    from tbl t
) t
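To see how grp identifies the runs, here is the inner query's output for id = 1, computed by hand from the sample data (ordering by datetime ascending):

flag  datetime  rn(id)  rn(id, flag)  grp
b     10        1       1             0
b     11        2       2             0
a     12        3       1             2
b     13        4       3             1

Summing value within each (id, flag, grp) group gives b = 32 + 21 = 53, a = 23 and b = 343, matching the desired result for id = 1.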
Here's an approach which uses CONDITIONAL_CHANGE_EVENT:
select flag,
       id,
       sum(value) as value
from (
    select conditional_change_event(flag) over (order by datetime desc) as part,
           flag,
           id,
           value
    from so
) t
group by part, flag, id
order by part;
The result is different from your desired result stated in the question because of order by datetime. Adding a separate column for the row number and sorting on that gives the correct result.

Query to return all results except for the first record

I have an archive table that has records of transactions per locationId.
A location will have 0, 1 or many rows in this table.
I need a SELECT query that will return the rows for any location that has more than one row, skipping the first entry.
e.g.
Transactions table
transactionId locationId amount
1 11 2343
2 11 23434
3 25 342
4 32 234
5 77 234
6 11 38938
7 43 234
8 43 1235
So given the above, for each locationId that has multiple rows I will get back all rows except for the first one (lowest transactionId):
2 11 23434
6 11 38938
8 43 1235
You can use ROW_NUMBER to do this. It assumes there are no duplicate transactionIds:
select transactionid, locationid, amount
from (
    select t.*,
           row_number() over (partition by locationid order by transactionid) as rn
    from transactions t
) t
where rn > 1
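A minimal sketch to try this out (table and column names are taken from the question; the integer column types are an assumption):

-- Sample data from the question; column types are assumed.
CREATE TABLE transactions (transactionId int, locationId int, amount int);

INSERT INTO transactions (transactionId, locationId, amount) VALUES
(1, 11, 2343), (2, 11, 23434), (3, 25, 342), (4, 32, 234),
(5, 77, 234), (6, 11, 38938), (7, 43, 234), (8, 43, 1235);

-- The ROW_NUMBER query above should return transactionIds 2, 6 and 8.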
The other answer is fine. You could also write it this way; it might give you a little insight into grouping practices:
SELECT Transactions.TransactionID, Transactions.locationID, Transactions.amount
FROM Transactions
INNER JOIN (SELECT locationID,
                   MIN(TransactionID) AS MinTransaction,
                   COUNT(TransactionID) AS CountTransaction
            FROM Transactions
            GROUP BY locationID) TableSum
        ON Transactions.locationID = TableSum.locationID
WHERE (Transactions.TransactionID <> TableSum.MinTransaction)
  AND (TableSum.CountTransaction > 1)

SQL Server Group by clause - Simple stuff

I have a table with questions, answers, and a sessionid.
Sometimes the same person (sessionid) will answer the same question more than once, and that gets stored in the table.
The qapp_answer table content looks something like this:
Id, SessionID, QNumber, Qanswer
72 11 1 3
73 11 1 4
74 11 2 1
75 11 2 3
76 11 3 1
So I only want each QNumber to be displayed once (3 rows in total), and basically just use the latest answer (Qanswer) for display.
This is the code so far:
select Qnumber, Qanswer
from qapp_answers
where sessionid = 11
group by QNumber, Qanswer
And it returns 5 rows.
It should be simple, but I haven't used SQL for years.
You can use ROW_NUMBER(), which generates a sequential number within each group specified. The query below groups the records by QNumber and generates a sequential number sorted by ID in descending order. The latest ID in every group gets the value 1, so you keep only the records whose generated value is 1.
SELECT ID, SessionID, QNumber, Qanswer
FROM (
    SELECT ID, SessionID, QNumber, Qanswer,
           ROW_NUMBER() OVER (PARTITION BY QNumber ORDER BY ID DESC) AS rn
    FROM tableName
    WHERE SessionID = 11
) a
WHERE a.rn = 1
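To verify against the sample data (a sketch; tableName in the query above stands for the qapp_answers table from the question, and the integer column types are assumed):

-- Sample data from the question; column types are assumed.
CREATE TABLE qapp_answers (Id int, SessionID int, QNumber int, Qanswer int);

INSERT INTO qapp_answers (Id, SessionID, QNumber, Qanswer) VALUES
(72, 11, 1, 3), (73, 11, 1, 4),
(74, 11, 2, 1), (75, 11, 2, 3),
(76, 11, 3, 1);

-- With tableName replaced by qapp_answers, the query above returns rows
-- 73, 75 and 76: the latest answer for each QNumber.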