Select the first row in the last group of consecutive rows - sql

How would I select the row that is the first occurrence in the last 'grouping' of consecutive rows, where a grouping is defined by the consecutive appearance of a particular column value (state in the example below)?
For example, given the following table:
id  datetime                    state       value_needed
--------------------------------------------------------
1   2021-04-01 09:42:41.319000  incomplete  A
2   2021-04-04 09:42:41.319000  done        B
3   2021-04-05 09:42:41.319000  incomplete  C
4   2021-04-05 10:42:41.319000  incomplete  C
5   2021-04-07 09:42:41.319000  done        D
6   2021-04-12 09:42:41.319000  done        E
I would want the row with id=5, as it is the first occurrence of state=done in the last (i.e. most recent) grouping of state=done.

Assuming all columns NOT NULL.
SELECT *
FROM   tbl t1
WHERE  NOT EXISTS (
   SELECT FROM tbl t2
   WHERE  t2.state <> t1.state
   AND    t2.datetime > t1.datetime
   )
ORDER  BY datetime
LIMIT  1;
NOT EXISTS is only true for the last group of peers. (There is no later row with a different state.)
ORDER BY datetime and take the first row. Voilà.
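To see which rows survive the NOT EXISTS filter, here is a minimal sketch that runs the same idea against the question's sample data. It uses Python's built-in sqlite3 module rather than Postgres (an assumption made only so the example is self-contained), stores the timestamps as ISO-8601 text, and writes SELECT 1 in the subquery because SQLite does not accept an empty select list.

```python
import sqlite3

# Sample rows from the question. ISO-8601 text sorts chronologically
# under plain string comparison, so TEXT is enough for this sketch.
rows = [
    (1, "2021-04-01 09:42:41.319", "incomplete", "A"),
    (2, "2021-04-04 09:42:41.319", "done", "B"),
    (3, "2021-04-05 09:42:41.319", "incomplete", "C"),
    (4, "2021-04-05 10:42:41.319", "incomplete", "C"),
    (5, "2021-04-07 09:42:41.319", "done", "D"),
    (6, "2021-04-12 09:42:41.319", "done", "E"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER, datetime TEXT, state TEXT, value_needed TEXT)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)

# Rows with no later row in a different state, i.e. the last group of peers.
last_group_ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM   tbl t1
    WHERE  NOT EXISTS (
       SELECT 1 FROM tbl t2      -- SQLite needs a select list here
       WHERE  t2.state <> t1.state
       AND    t2.datetime > t1.datetime
       )
    ORDER  BY datetime
""")]

first_id = last_group_ids[0]     # same effect as LIMIT 1
print(last_group_ids, first_id)
```

Only ids 5 and 6 pass the filter, and ordering by datetime puts id 5 first.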

Here's a window function solution that accesses your table only once (which may or may not perform better for large data sets):
SELECT *
FROM (
    SELECT *,
           LEAD (state) OVER (ORDER BY datetime DESC)
               IS DISTINCT FROM state AS first_in_group
    FROM tbl
) t
WHERE first_in_group
ORDER BY datetime DESC
LIMIT 1
A dbfiddle based on Erwin Brandstetter's. To illustrate, here's the value of first_in_group for each row:
id  datetime                 state       value_needed  first_in_group
---------------------------------------------------------------------
6   2021-04-12 09:42:41.319  done        E             f
5   2021-04-07 09:42:41.319  done        D             t
4   2021-04-05 10:42:41.319  incomplete  C             f
3   2021-04-05 09:42:41.319  incomplete  C             t
2   2021-04-04 09:42:41.319  done        B             t
1   2021-04-01 09:42:41.319  incomplete  A             t
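The flags in this table can be reproduced with a short sketch using Python's sqlite3 module. Two assumptions: the bundled SQLite is recent enough to support window functions (3.25 or later), and the null-safe comparison is written as IS NOT, which SQLite has long supported, in place of Postgres's IS DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, datetime TEXT, state TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "2021-04-01 09:42:41.319", "incomplete"),
    (2, "2021-04-04 09:42:41.319", "done"),
    (3, "2021-04-05 09:42:41.319", "incomplete"),
    (4, "2021-04-05 10:42:41.319", "incomplete"),
    (5, "2021-04-07 09:42:41.319", "done"),
    (6, "2021-04-12 09:42:41.319", "done"),
])

# With ORDER BY datetime DESC, LEAD() looks at the chronologically previous
# row. A row starts a group when that row's state differs (or is missing).
flagged = conn.execute("""
    SELECT id, first_in_group
    FROM (
        SELECT id, datetime,
               (LEAD(state) OVER (ORDER BY datetime DESC)) IS NOT state
                   AS first_in_group
        FROM tbl
    ) t
    ORDER BY datetime DESC
""").fetchall()

# Equivalent to WHERE first_in_group ... ORDER BY datetime DESC LIMIT 1.
winner = next(row_id for row_id, is_first in flagged if is_first)
print(flagged, winner)
```

SQLite reports the flag as 1 or 0 instead of t or f, but the pattern is the same, and the most recent flagged row is id 5.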

Related

SQL select 1 row out of several rows that have similar values

I have a table like this:
ID  OtherID  Date
----------------------
1   z        2022-09-19
1   b        2021-04-05
2   e        2022-04-05
3   t        2022-07-08
3   z        2021-03-02
I want a table like this:
ID  OtherID  Date
----------------------
1   z        2022-09-19
2   e        2022-04-05
3   t        2022-07-08
That is, one row per ID, keeping the ID-OtherID pair whose Date is the most recent.
The problem I have now is that the relationship between ID and OtherID is 1:M.
I've looked at SELECT DISTINCT, GROUP BY, LAG but I couldn't figure it out. I'm sorry if this is a duplicate question. I couldn't find the right keywords to search for the answer.
Update: I use Postgres but would like to know other SQL as well.
This works in many DBMSs (versions of Postgres, MySQL and others that support window functions), but you may need to adapt it for other systems. You could use a CTE, a join, or a subquery such as this:
select id, otherid, date
from (
    select id, otherid, date,
           rank() over (partition by id order by date desc) as id_rank
    from my_table
) z
where id_rank = 1
id  otherid  date
------------------------------------
1   z        2022-09-19T00:00:00.000Z
2   e        2022-04-05T00:00:00.000Z
3   t        2022-07-08T00:00:00.000Z
You can use a Common Table Expression (CTE) with ROW_NUMBER() to assign a row number based on the ID column (then return the first row for each ID in the WHERE clause rn = 1):
WITH cte AS
  (SELECT ID,
          OtherID,
          Date,
          ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
   FROM sample_table)
SELECT ID,
       OtherID,
       Date
FROM cte
WHERE rn = 1;
Result:
ID  OtherID  Date
----------------------
1   z        2022-09-19
2   e        2022-04-05
3   t        2022-07-08
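The two answers differ only in the ranking function, and the difference shows up when two rows of the same ID share the latest Date. The sketch below, run through Python's sqlite3 module (assuming a SQLite build with window functions), adds one hypothetical tied row (1, q, 2022-09-19) that is not in the question's data. The helper name latest_per_id is also just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (ID INTEGER, OtherID TEXT, Date TEXT)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?)", [
    (1, "z", "2022-09-19"),
    (1, "q", "2022-09-19"),  # hypothetical row that ties with (1, z)
    (1, "b", "2021-04-05"),
    (2, "e", "2022-04-05"),
    (3, "t", "2022-07-08"),
    (3, "z", "2021-03-02"),
])

def latest_per_id(ranking_function):
    """Return the rows ranked first per ID by the given window function."""
    sql = f"""
        SELECT ID, OtherID
        FROM (
            SELECT ID, OtherID,
                   {ranking_function}() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
            FROM sample_table
        ) ranked
        WHERE rn = 1
        ORDER BY ID, OtherID
    """
    return conn.execute(sql).fetchall()

with_rank = latest_per_id("RANK")              # keeps every tied row
with_row_number = latest_per_id("ROW_NUMBER")  # keeps exactly one row per ID
print(with_rank)
print(with_row_number)
```

RANK() returns both tied rows for ID 1, while ROW_NUMBER() returns a single row, but which of the tied rows it picks is not determined unless the ORDER BY gets a tiebreaker such as OtherID.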

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID   Identifier  Admission_Date  Release_Date
---------------------------------------------
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
234  2           4/15/22         4/18/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
789  2           7/1/22          7/5/22
321  2           6/1/21          6/3/21
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
321  2           5/6/21          5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID   Identifier  Admission Date  Release Date
---------------------------------------------
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
I am using DBeaver, which runs PostgreSQL.
I admittedly don't know Postgres well, so the following could possibly be optimised. Using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
    select *,
        case when identifier = 1 then lag(admission_date) over (partition by id order by admission_date desc) end as pd,
        case when identifier = 1 then lead(admission_date) over (partition by id order by admission_date desc) end as nd
    from t
)
select id, identifier, admission_date, release_date
from d
where identifier = 1
   or exists (
        select * from d d2
        where d2.id = d.id
          and (d.admission_date = d2.pd or d.admission_date = d2.nd)
   )
order by id, admission_date desc;
One way:
SELECT (x.my_row).*             -- decompose fields from row type
FROM  (
   SELECT identifier
        , lag(t)  OVER w AS t0  -- take whole row
        , t              AS t1
        , lead(t) OVER w AS t2
   FROM   tbl t
   WINDOW w AS (PARTITION BY id ORDER BY admission_date)
   ) sub
CROSS  JOIN LATERAL (
   VALUES (t0), (t1), (t2)      -- pivot
   ) x(my_row)
WHERE  sub.identifier = 1
AND    (x.my_row).id IS NOT NULL;  -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...
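For comparison, here is a simpler variant that is not taken from either answer: flag each row with the identifier of its neighbours and keep a row when it, the row above, or the row below has identifier = 1. This sketch runs through Python's sqlite3 module and makes two assumptions: the dates are rewritten as ISO text so they sort chronologically, and admission dates are unique per id. The column names ident_above and ident_below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, identifier INTEGER, admission_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (234, 2, "2022-05-01"), (234, 1, "2022-04-25"),
    (234, 2, "2022-04-20"), (234, 2, "2022-04-15"),
    (789, 1, "2022-07-15"), (789, 2, "2022-07-08"), (789, 2, "2022-07-01"),
    (321, 2, "2021-06-01"), (321, 2, "2021-05-27"), (321, 1, "2021-05-20"),
    (321, 2, "2021-05-15"), (321, 2, "2021-05-06"),
])

# In descending date order, LAG() is the row "above" (more recent) and
# LEAD() is the row "below" (older). NULL neighbours never equal 1.
kept = conn.execute("""
    SELECT id, identifier, admission_date
    FROM (
        SELECT t.*,
               LAG(identifier)  OVER (PARTITION BY id ORDER BY admission_date DESC) AS ident_above,
               LEAD(identifier) OVER (PARTITION BY id ORDER BY admission_date DESC) AS ident_below
        FROM t
    ) flagged
    WHERE identifier = 1 OR ident_above = 1 OR ident_below = 1
    ORDER BY id, admission_date DESC
""").fetchall()

for row in kept:
    print(row)
```

On the sample data this keeps the same eight rows as the expected result in the question, although sorted by id rather than in the question's display order.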

Group by in SQL returning error: Selected non-aggregate values must be part of the associated group

I have a table that looks like this:
date store flag
1 5/4/2018 a 1
2 5/4/2018 a 1
3 5/3/2018 b 1
4 5/3/2018 b 0
5 5/2/2018 a 1
6 5/2/2018 b 0
I want to group by date and store and sum the number of flags
i.e. table_a below:
date store total_flag
1 5/4/2018 a 2
3 5/3/2018 b 1
4 5/2/2018 a 1
5 5/2/2018 b 0
This is what I'm trying:
create multiset volatile table flag_summary as (
    sel table_a.*, SUM(table_a.flag) as total_flag
    group by date, store
)
with data primary index (date, store) on commit preserve rows;
The above gives me an error: "CREATE TABLE Failed. [3504] Selected non-aggregate values must be part of the associated group."
You are selecting all of tableA (including the flag). You should just be pulling the date and the store since you want the sum of the flag.
SELECT date, store, SUM(flag)
FROM tableA
GROUP BY date, store
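As a quick check of the corrected query, the sketch below runs it on the question's six rows using Python's sqlite3 module. Note the assumption: this is SQLite, not Teradata, and SQLite is permissive about non-aggregated columns, so it would not reproduce error 3504 for the original statement. It only confirms the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (date TEXT, store TEXT, flag INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    ("5/4/2018", "a", 1),
    ("5/4/2018", "a", 1),
    ("5/3/2018", "b", 1),
    ("5/3/2018", "b", 0),
    ("5/2/2018", "a", 1),
    ("5/2/2018", "b", 0),
])

# Every selected column is either in the GROUP BY or inside an aggregate.
totals = {
    (date, store): total_flag
    for date, store, total_flag in conn.execute("""
        SELECT date, store, SUM(flag) AS total_flag
        FROM table_a
        GROUP BY date, store
    """)
}
print(totals)
```

The four (date, store) pairs come back with totals 2, 1, 1 and 0, matching the desired table in the question.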

SQL Group by only correlative rows

Say I have the following table:
Code A B C Date ID
------------------------------
50 1 1 A 2018-01-08 150001
50 1 1 A 2018-01-15 165454
50 1 1 B 2018-02-01 184545
50 1 1 A 2018-02-02 195487
I need the sql query to output the following:
Code A B C Min(Date) Min(ID)
-------------------------------
50 1 1 A 2018-01-08 150001
50 1 1 B 2018-02-01 184545
50 1 1 A 2018-02-02 195487
If I use a standard GROUP BY, rows 1, 2 and 4 are grouped into one row, which is not what I want.
I want to select the row with MIN(date) and MIN(id) from the duplicate records that are adjacent, based on columns Code, A, B and C.
In this case the first two rows are duplicates, so I want the MIN() row, and the third and fourth rows are distinct.
Note that the database is Vertica 8.1, which is very similar to Oracle or PostgreSQL.
I think you would need the analytic function LAG(). Using this function, you can get the value of the previous row (or NULL if it's the first row itself). So you can check if the value on the previous row is different or not, and filter accordingly.
I'm not familiar with Vertica, but this should be the correct documentation for it: https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/LAGAnalytic.htm
Please try the query below, it should do it:
SELECT l.Code, l.A, l.B, l.C, l.Date, l.ID
FROM (SELECT t.*,
             LAG(t.C, 1) OVER (PARTITION BY t.Code, t.A ORDER BY t.Date) prev_val
      FROM table_1 t) l
WHERE l.C != l.prev_val
   OR l.prev_val IS NULL
ORDER BY l.Code, l.A, l.Date
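A minimal sketch of this LAG() filter on the question's four rows, executed through Python's sqlite3 module instead of Vertica (an assumption made for the sake of a runnable example; it needs a SQLite build with window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (Code INTEGER, A INTEGER, B INTEGER, C TEXT, Date TEXT, ID INTEGER)"
)
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?, ?)", [
    (50, 1, 1, "A", "2018-01-08", 150001),
    (50, 1, 1, "A", "2018-01-15", 165454),
    (50, 1, 1, "B", "2018-02-01", 184545),
    (50, 1, 1, "A", "2018-02-02", 195487),
])

# Keep a row only when C changed compared with the previous row
# (or when there is no previous row in the partition).
kept_ids = [r[0] for r in conn.execute("""
    SELECT l.ID
    FROM (SELECT t.*,
                 LAG(t.C, 1) OVER (PARTITION BY t.Code, t.A ORDER BY t.Date) AS prev_val
          FROM table_1 t) l
    WHERE l.C != l.prev_val
       OR l.prev_val IS NULL
    ORDER BY l.Code, l.A, l.Date
""")]
print(kept_ids)
```

Because the first row of each run is also the one with the smallest Date, filtering on the change returns the MIN(Date) row of every run. It returns the MIN(ID) as well only if ID increases with Date, as it does in the sample.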

Delete rows, which are duplicated and follow each other consequently

It's hard to formulate, so I'll just show an example; you are welcome to edit my question and title.
Suppose I have a table:
flag id value datetime
0 b 1 343 13
1 a 1 23 12
2 b 1 21 11
3 b 1 32 10
4 c 2 43 11
5 d 2 43 10
6 d 2 32 9
7 c 2 1 8
For each id I want to squeeze the table by the flag column so that all duplicate flag values that follow each other collapse into one row, with the values summed. Desired result:
flag id value
0 b 1 343
1 a 1 23
2 b 1 53
3 c 2 75
4 d 2 32
5 c 2 1
P.S.: I found functions like CONDITIONAL_CHANGE_EVENT, which seem to be able to do that, but the examples of them in the docs don't work for me.
Use the difference-of-row-numbers approach to assign groups based on consecutive rows having the same flag, then sum the values within each group. The difference is only unique per (id, flag), so flag has to be part of the grouping:
select id, flag, sum(value) as finalvalue
from (
    select t.*,
           row_number() over (partition by id order by datetime)
         - row_number() over (partition by id, flag order by datetime) as grp
    from tbl t
) t
group by id, flag, grp
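To make the row-number difference concrete, the following sketch runs the technique on the question's table through Python's sqlite3 module (assuming a SQLite build with window functions). It groups by id, flag and the difference together, because two runs with different flags can share the same difference value within one id, and it orders the output by each run's earliest datetime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (flag TEXT, id INTEGER, value INTEGER, datetime INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    ("b", 1, 343, 13), ("a", 1, 23, 12), ("b", 1, 21, 11), ("b", 1, 32, 10),
    ("c", 2, 43, 11), ("d", 2, 43, 10), ("d", 2, 32, 9), ("c", 2, 1, 8),
])

# Within a run of equal flags both row numbers advance together, so their
# difference stays constant; it changes whenever another flag interrupts.
collapsed = conn.execute("""
    SELECT id, flag, SUM(value) AS finalvalue
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime)
             - ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY datetime) AS grp
        FROM tbl t
    ) runs
    GROUP BY id, flag, grp
    ORDER BY id, MIN(datetime)
""").fetchall()

for row in collapsed:
    print(row)
```

For id 1 the two oldest b rows collapse to 53 while the later b row stays separate at 343. For id 2 the two adjacent d rows collapse to 75.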
Here's an approach which uses CONDITIONAL_CHANGE_EVENT:
select
    flag,
    id,
    sum(value) value
from (
    select
        conditional_change_event(flag) over (order by datetime desc) part,
        flag,
        id,
        value
    from so
) t
group by part, flag, id
order by part;
The result is different from your desired result stated in the question because of order by datetime. Adding a separate column for the row number and sorting on that gives the correct result.