Find the Nth element in each sequence using SQL

Given the following table:
Sequence Tag
----- ----
1 a
2 a
3 a
88 a
100 a
1 b
7 b
88 b
101 b
I would like a query that returns the 4th in each sequence of tags (ordered by Tag, Sequence asc):
Tag 4thInSequence
----- --------
a 88
b 101
What is the most efficient SQL I can use here? (Note: SQL Server 2008 tricks are allowed)

WITH Enumerated AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Tag ORDER BY Sequence) AS RN
FROM MyTable
)
SELECT * FROM Enumerated WHERE RN = 4;
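The query above can be tried end to end with a small harness. This is only a sketch: it assumes an in-memory SQLite database (3.25 or later, for window function support) standing in for SQL Server, and the variable names `N` and `nth_rows` are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Sequence INTEGER, Tag TEXT)")
conn.executemany(
    "INSERT INTO MyTable (Sequence, Tag) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (88, "a"), (100, "a"),
     (1, "b"), (7, "b"), (88, "b"), (101, "b")],
)

N = 4  # which element of each Tag's sequence to return

# Number the rows within each Tag by ascending Sequence, then keep row N.
nth_rows = conn.execute(
    """
    WITH Enumerated AS (
        SELECT Tag, Sequence,
               ROW_NUMBER() OVER (PARTITION BY Tag ORDER BY Sequence) AS RN
        FROM MyTable
    )
    SELECT Tag, Sequence
    FROM Enumerated
    WHERE RN = ?
    ORDER BY Tag
    """,
    (N,),
).fetchall()

for tag, sequence in nth_rows:
    print(tag, sequence)
```

A Tag with fewer than N rows simply produces no row in the result.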

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID  Identifier Admission_Date Release_Date
--- ---------- -------------- ------------
234 2          5/1/22         5/5/22
234 1          4/25/22        4/30/22
234 2          4/20/22        4/24/22
234 2          4/15/22        4/18/22
789 1          7/15/22        7/19/22
789 2          7/8/22         7/14/22
789 2          7/1/22         7/5/22
321 2          6/1/21         6/3/21
321 2          5/27/21        5/31/21
321 1          5/20/21        5/26/21
321 2          5/15/21        5/19/21
321 2          5/6/21         5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID  Identifier Admission_Date Release_Date
--- ---------- -------------- ------------
234 2          5/1/22         5/5/22
234 1          4/25/22        4/30/22
234 2          4/20/22        4/24/22
789 1          7/15/22        7/19/22
789 2          7/8/22         7/14/22
321 2          5/27/21        5/31/21
321 1          5/20/21        5/26/21
321 2          5/15/21        5/19/21
I am using DBeaver, connected to a PostgreSQL database.
I admittedly don't know Postgres well, so the following could probably be optimised. However, you could use a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by):
with d as (
select *,
case when identifier = 1 then Lag(admission_date) over(partition by id order by Admission_Date desc) end pd,
case when identifier = 1 then Lead(admission_date) over(partition by id order by Admission_Date desc) end nd
from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
or exists (
select * from d d2
where d2.id = d.id
and (d.Admission_Date = d2.pd or d.Admission_Date = d2.nd)
)
order by Id, Admission_Date desc;
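A variation on the same idea is to look at the neighbouring rows' identifier directly instead of matching dates. The sketch below is not the query above: it assumes an in-memory SQLite database (3.25 or later) in place of PostgreSQL, and it assumes the dates are stored in ISO format so that they sort chronologically. The names `flagged`, `prev_identifier` and `next_identifier` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, identifier INTEGER, "
    "admission_date TEXT, release_date TEXT)"
)
# The question's sample rows, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (234, 2, "2022-05-01", "2022-05-05"),
        (234, 1, "2022-04-25", "2022-04-30"),
        (234, 2, "2022-04-20", "2022-04-24"),
        (234, 2, "2022-04-15", "2022-04-18"),
        (789, 1, "2022-07-15", "2022-07-19"),
        (789, 2, "2022-07-08", "2022-07-14"),
        (789, 2, "2022-07-01", "2022-07-05"),
        (321, 2, "2021-06-01", "2021-06-03"),
        (321, 2, "2021-05-27", "2021-05-31"),
        (321, 1, "2021-05-20", "2021-05-26"),
        (321, 2, "2021-05-15", "2021-05-19"),
        (321, 2, "2021-05-06", "2021-05-10"),
    ],
)

# Keep a row if it has identifier = 1, or if the row directly before or
# after it (within the same id, newest first) has identifier = 1.
neighbours = conn.execute(
    """
    WITH flagged AS (
        SELECT id, identifier, admission_date, release_date,
               LAG(identifier)  OVER (PARTITION BY id ORDER BY admission_date DESC) AS prev_identifier,
               LEAD(identifier) OVER (PARTITION BY id ORDER BY admission_date DESC) AS next_identifier
        FROM t
    )
    SELECT id, identifier, admission_date, release_date
    FROM flagged
    WHERE identifier = 1 OR prev_identifier = 1 OR next_identifier = 1
    ORDER BY id, admission_date DESC
    """
).fetchall()

for row in neighbours:
    print(row)
```

LAG and LEAD return NULL at the edges of each partition, and a comparison with NULL is never true, so a missing neighbour needs no special handling.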
One way:
SELECT (x.my_row).* -- decompose fields from row type
FROM (
SELECT identifier
, lag(t) OVER w AS t0 -- take whole row
, t AS t1
, lead(t) OVER w AS t2
FROM tbl t
WINDOW w AS (PARTITION BY id ORDER BY admission_date)
) sub
CROSS JOIN LATERAL (
VALUES (t0), (t1), (t2) -- pivot
) x(my_row)
WHERE sub.identifier = 1
AND (x.my_row).id IS NOT NULL; -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

Retrieve max date for distinct IDs in a table

I have the table ABC with the following data
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
1 AA 08SEP2019 22
1 AA 07SEP2019 33
2 BB 09SEP2019 44
2 BB 08SEP2019 55
2 BB 07SEP2019 66
And I want to get for every distinct ID in the table its max date. So the result set must be as the following
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
2 BB 09SEP2019 44
The query that returns the result I need
WITH MaxDate AS (
SELECT Id, Name, MAX(Date) AS Date FROM ABC GROUP BY Id, Name
)
SELECT view1.*, view2.execution_id
FROM
MaxDate view1,
ABC view2
WHERE
view1.date = view2.date AND
view1.name = view2.name;
I don't like getting the max date for each distinct ID this way. Is there another, simpler way?
One way is to use RANK:
WITH cte AS (
SELECT ABC.*, RANK() OVER(PARTITION BY Id,Name ORDER BY Date DESC) rnk
FROM ABC
)
SELECT *
FROM cte
WHERE rnk = 1
ORDER BY id;
You can use keep dense_rank last to do this in one level of query, as long as you only want one or a small number of columns retained:
select id,
name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id, name
order by id;
ID NAME DATE_ EXECUTION_ID
---------- ---- ---------- ------------
1 AA 2019-09-09 11
2 BB 2019-09-09 44
If ID and name are not always the same, and you want the name from the latest date too, then use the same pattern:
select id,
max(name) keep (dense_rank last order by date_) as name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id
order by id;
which gets the same result with your sample data.
With lots of columns it's probably simpler to use a subquery (CTE or inline view) with a ranking function and a filter (as @Lukasz shows).
With NOT EXISTS:
select t.* from ABC t
where not exists (
select 1 from ABC
where "Id" = t."Id" and "Name" = t."Name" and "Date" > t."Date"
)
I used and name = t.name only because you have it in your code.
If it is not needed you can remove it.
Results:
Id | Name | Date | Execution id
-: | :--- | :---------| -----------:
1 | AA | 09-SEP-19 | 11
2 | BB | 09-SEP-19 | 44

SQL Server GROUP BY COUNT Consecutive Rows Only

I have a table called DATA on Microsoft SQL Server 2008 R2 with three non-nullable integer fields: ID, Sequence, and Value. Sequence values with the same ID will be consecutive, but can start with any value. I need a query that will return a count of consecutive rows with the same ID and Value.
For example, let's say I have the following data:
ID Sequence Value
-- -------- -----
1 1 1
5 1 100
5 2 200
5 3 200
5 4 100
10 10 10
I want the following result:
ID Start Value Count
-- ----- ----- -----
1 1 1 1
5 1 100 1
5 2 200 2
5 4 100 1
10 10 10 1
I tried
SELECT ID, MIN([Sequence]) AS Start, Value, COUNT(*) AS [Count]
FROM DATA
GROUP BY ID, Value
ORDER BY ID, Start
but that gives
ID Start Value Count
-- ----- ----- -----
1 1 1 1
5 1 100 2
5 2 200 2
10 10 10 1
which groups all rows with the same values, not just consecutive rows.
Any ideas? From what I've seen, I believe I have to left join the table with itself on consecutive rows using ROW_NUMBER(), but I am not sure exactly how to get counts from that.
Thanks in advance.
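You can use Sequence - ROW_NUMBER() OVER (ORDER BY ID, Val, Sequence) AS g to create a group key: within a run of consecutive rows that share the same ID and value, Sequence and the row number both increase by 1, so their difference stays constant. (Here Val and yourtable stand in for the question's Value column and DATA table.)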
SELECT
ID,
MIN(Sequence) AS Sequence,
Val,
COUNT(*) AS cnt
FROM
(
SELECT
ID,
Sequence,
Sequence - ROW_NUMBER() OVER (ORDER BY ID, Val, Sequence) AS g,
Val
FROM
yourtable
) AS s
GROUP BY
ID, Val, g
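To see the grouping trick at work on the question's data, here is a minimal sketch. It assumes an in-memory SQLite database (3.25 or later) rather than SQL Server 2008 R2, uses the question's DATA table and Value column, and the aliases `StartSeq` and `Cnt` are made up to avoid clashing with keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA (ID INTEGER NOT NULL, "
    "Sequence INTEGER NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO DATA VALUES (?, ?, ?)",
    [(1, 1, 1), (5, 1, 100), (5, 2, 200), (5, 3, 200), (5, 4, 100), (10, 10, 10)],
)

# Inside a run of consecutive rows with the same ID and Value, Sequence and
# ROW_NUMBER() both grow by 1, so g = Sequence - ROW_NUMBER() is constant.
# A gap in Sequence for the same (ID, Value) changes g and starts a new group.
runs = conn.execute(
    """
    SELECT ID, MIN(Sequence) AS StartSeq, Value, COUNT(*) AS Cnt
    FROM (
        SELECT ID, Sequence, Value,
               Sequence - ROW_NUMBER() OVER (ORDER BY ID, Value, Sequence) AS g
        FROM DATA
    ) AS s
    GROUP BY ID, Value, g
    ORDER BY ID, StartSeq
    """
).fetchall()

for row in runs:
    print(row)
```

For ID 5, the two rows with Value 100 (Sequence 1 and 4) get g values of -1 and 1, so they land in separate groups, while the two rows with Value 200 both get g = -2 and are counted together.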

Highest per each group

It's hard to show my actual table and data here so I'll describe my problem with a sample table and data:
create table foo(id int,x_part int,y_part int,out_id int,out_idx text);
insert into foo values (1,2,3,55,'BAK'),(2,3,4,77,'ZAK'),(3,4,8,55,'RGT'),(9,10,15,77,'UIT'),
(3,4,8,11,'UTL'),(3,4,8,65,'MAQ'),(3,4,8,77,'YTU');
Following is the table foo:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
1 2 3 55 BAK
3 4 8 65 MAQ
9 10 15 77 UIT
2 3 4 77 ZAK
3 4 8 77 YTU
I need to select all fields of the row with the highest id for each out_id.
Expected output:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
3 4 8 65 MAQ
9 10 15 77 UIT
Using PostgreSQL.
Postgres specific (and fastest) solution:
select distinct on (out_id) *
from foo
order by out_id, id desc;
Standard SQL solution using a window function (second fastest)
select id, x_part, y_part, out_id, out_idx
from (
select id, x_part, y_part, out_id, out_idx,
row_number() over (partition by out_id order by id desc) as rn
from foo
) t
where rn = 1
order by id;
Note that both solutions return only one row per out_id, even if several rows share the same highest id for that out_id. If you want all of them returned, use dense_rank() instead of row_number().
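The difference between row_number() and dense_rank() only shows up when several rows tie for the highest id within one out_id. The sketch below assumes an in-memory SQLite database (3.25 or later) in place of PostgreSQL, and adds one hypothetical row, (9, 1, 1, 77, 'TIE'), that is not in the question's data, purely to create such a tie. The helper `top_per_group` is also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INT, x_part INT, y_part INT, out_id INT, out_idx TEXT)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 3, 55, "BAK"), (2, 3, 4, 77, "ZAK"), (3, 4, 8, 55, "RGT"),
        (9, 10, 15, 77, "UIT"), (3, 4, 8, 11, "UTL"), (3, 4, 8, 65, "MAQ"),
        (3, 4, 8, 77, "YTU"),
        (9, 1, 1, 77, "TIE"),  # hypothetical extra row: ties with UIT on id = 9
    ],
)

def top_per_group(ranking_function):
    """Return the top-ranked rows per out_id using the given window function."""
    # The name is interpolated into the SQL text, so only allow known functions.
    if ranking_function not in ("row_number", "rank", "dense_rank"):
        raise ValueError("unsupported ranking function: " + ranking_function)
    sql = f"""
        SELECT id, out_id, out_idx
        FROM (
            SELECT id, out_id, out_idx,
                   {ranking_function}() OVER (PARTITION BY out_id ORDER BY id DESC) AS rn
            FROM foo
        ) AS t
        WHERE rn = 1
        ORDER BY out_id, out_idx
    """
    return conn.execute(sql).fetchall()

one_per_group = top_per_group("row_number")  # exactly one row per out_id
with_ties = top_per_group("dense_rank")      # every row tied for the highest id

print(len(one_per_group), "rows with row_number()")
print(len(with_ties), "rows with dense_rank()")
```

With row_number(), which of the two tied rows survives for out_id 77 is not defined unless the ORDER BY inside the window is extended with a tie-breaker.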
select *
from foo
where (id,out_id) in (
select max(id),out_id from foo group by out_id
) order by out_id
Finding max(val) := finding the record for which no larger val exists:
SELECT *
FROM foo f
WHERE NOT EXISTS (
SELECT 317
FROM foo nx
WHERE nx.out_id = f.out_id
AND nx.id > f.id
);

SQL query to get value of another column corresponding to a max value of a column based on group by

I have the following table:
ID BLOWNUMBER TIME LADLE
--- ---------- ---------- -----
124 1 01/01/2012 2
124 1 02/02/2012 1
124 1 03/02/2012 0
124 2 04/01/2012 1
125 2 04/06/2012 1
125 2 01/03/2012 0
I want to have the TIME for the maximum value of LADLE for a group of ID & BLOWNUMBER.
Output required:
124 1 01/01/2012
124 2 04/01/2012
125 2 04/06/2012
If you're using SQL Server (or another engine which supports CTE's and ROW_NUMBER), you can use this CTE (Common Table Expression) query:
;WITH CTE AS
(
SELECT
ID, BlowNumber, [Time],
RN = ROW_NUMBER() OVER (PARTITION BY ID, BlowNumber ORDER BY Ladle DESC)
FROM Sample
)
SELECT *
FROM CTE
WHERE RN = 1
This CTE "partitions" your data by (ID, BLOWNUMBER), and the ROW_NUMBER() function hands out numbers, starting at 1, for each of those "partitions", ordered by the Ladle column (highest value first).
Then, you just select from that CTE and use RN = 1 to get the row with the maximum Ladle in each partition, which carries the [Time] you are after.
If you are using SQLite (this is probably compatible with other databases as well), you could do:
select
ct.id
, ct.blownumber
, time
from
new
, (
select
id
, blownumber
, max(ladle) as ldl
from
new
group by
id
, blownumber
) ct
where
ct.id = new.id
and ct.blownumber = new.blownumber
and ct.ldl = new.ladle;
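The expected output in the question can be checked with a short script. This is a sketch under a few assumptions: it uses an in-memory SQLite database (3.25 or later), a hypothetical table name `blows`, and a window function ordered by LADLE descending (rather than the join shown above), since the question asks for the TIME that belongs to the maximum LADLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "blows" is a made-up table name; the dates are kept as the question's strings.
conn.execute(
    "CREATE TABLE blows (id INTEGER, blownumber INTEGER, time TEXT, ladle INTEGER)"
)
conn.executemany(
    "INSERT INTO blows VALUES (?, ?, ?, ?)",
    [
        (124, 1, "01/01/2012", 2),
        (124, 1, "02/02/2012", 1),
        (124, 1, "03/02/2012", 0),
        (124, 2, "04/01/2012", 1),
        (125, 2, "04/06/2012", 1),
        (125, 2, "01/03/2012", 0),
    ],
)

# Rank rows within each (id, blownumber) group by ladle, highest first,
# and keep the top row so that its time column comes along with it.
time_of_max_ladle = conn.execute(
    """
    SELECT id, blownumber, time
    FROM (
        SELECT id, blownumber, time,
               ROW_NUMBER() OVER (PARTITION BY id, blownumber ORDER BY ladle DESC) AS rn
        FROM blows
    ) AS ranked
    WHERE rn = 1
    ORDER BY id, blownumber
    """
).fetchall()

for row in time_of_max_ladle:
    print(row)
```

If two rows in a group share the maximum LADLE, ROW_NUMBER() returns only one of them, whereas the join on max(ladle) returns both.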