Select the nth most recent row from multiple groups - sql

Using Oracle 12c, I have a table table1 like this:
ID DATA1 DATA2 LAST_UPDATE_TIMESTAMP
1 1 2 time_stamp1
2 1 2 time_stamp2
3 2 1 time_stamp3
4 2 2 time_stamp4
5 1 2 time_stamp5
6 1 1 time_stamp6
7 2 2 time_stamp7
8 1 1 time_stamp8
9 2 1 time_stamp9
10 1 2 time_stamp10
DATA1 and DATA2 have only four possible pairs:
1,1 1,2 2,1 2,2
For every pair, how can I get the ID of the nth most recent record when ordered by LAST_UPDATE_TIMESTAMP?
For example, if the rows above are already in descending LAST_UPDATE_TIMESTAMP order, then the IDs of the most recent record for the four pairs would be 1,3,4,6. For the second most recent, they would be 2,7,8,9.
Solution
Thanks to @kordirko. This is the SQL I ended up using:
SELECT ID
FROM (
    SELECT t.*,
           row_number()
               over (partition by data1, data2
                     ORDER BY last_update_timestamp DESC) as rn
    FROM table1 t
)
WHERE rn = n --n means the nth most recent, starts from 1
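
To see the query's behaviour on the sample data, here is a minimal sketch that runs the same ROW_NUMBER() pattern through Python's built-in sqlite3 module rather than Oracle. It assumes an SQLite build with window-function support (3.25 or later). The integer timestamps are invented for illustration: they simply decrease with ID, matching the question's "already ordered in descending order" example. The function name nth_most_recent_ids is hypothetical.

```python
import sqlite3

def nth_most_recent_ids(conn, n):
    """Return the IDs of the nth most recent row of every (data1, data2) pair."""
    sql = """
        SELECT id
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY data1, data2
                                      ORDER BY last_update_timestamp DESC) AS rn
            FROM table1 t
        )
        WHERE rn = ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (n,))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, data1 INTEGER, "
    "data2 INTEGER, last_update_timestamp INTEGER)"
)
# (data1, data2) for IDs 1..10, copied from the question's table.
pairs = [(1, 2), (1, 2), (2, 1), (2, 2), (1, 2),
         (1, 1), (2, 2), (1, 1), (2, 1), (1, 2)]
# Assumption: timestamps decrease as ID increases (ID 1 is the newest row).
rows = [(i + 1, d1, d2, 100 - i) for i, (d1, d2) in enumerate(pairs)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

print(nth_most_recent_ids(conn, 1))
print(nth_most_recent_ids(conn, 2))
```

Pairs with fewer than n rows simply drop out of the result, so asking for n = 3 here returns only the ID from the (1, 2) pair.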

Try:
SELECT ID, DATA1, DATA2, LAST_UPDATE_TIMESTAMP,
       rn -- position within the pair: 1 = most recent, 2 = second most recent, etc.
FROM (
    SELECT t.*,
           row_number()
               over (partition by data1, data2
                     ORDER BY last_update_timestamp DESC) as rn
    FROM table1 t
)
WHERE rn <= 5 -- 5 is a limit: you get at most the 5 most recent records for each pair

If you want only one row returned, you can use the fetch first clause (technically called a "row-limiting clause"). For instance, to get the fifth row for (1, 1):
select t.*
from table1 t
where data1 = 1 and data2 = 1
order by last_update_timestamp desc
offset 4 rows
fetch next 1 row only;
Note that the offset is "4", not "5", because four rows are skipped to get to the fifth row. For performance, an index on (data1, data2, last_update_timestamp) is recommended.

Related

How to use LIMIT to sample rows dynamically

I have a table as follows:
SampleReq Group ID
2 1 _001
2 1 _002
2 1 _003
1 2 _004
1 2 _005
1 2 _006
I want my query to pick IDs based on the column SampleReq, resulting in output like the following:
Group ID
1 _001
1 _003
2 _006
The query should pick any 2 IDs from group 1, any 1 ID from group 2, and so on (depending on the column SampleReq).
I tried a query using LIMIT, but it gives me an error saying column names cannot be parsed to a limit.
SELECT Group, ID
FROM Table
LIMIT SampleReq
ORDER BY RAND()
One method is row_number():
select t.*
from (select t.*,
             row_number() over (partition by Group order by random()) as seqnum
      from t
     ) t
where seqnum <= SampleReq;  -- keeps SampleReq random rows per group; quote Group if your dialect reserves it
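
As a rough check of the row_number() approach, the sketch below runs it against the question's six rows using Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The table name samples and the column name grp are stand-ins chosen here, since GROUP is a reserved word; the filter compares each row's random sequence number with that row's own SampleReq.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# "samples" and "grp" are illustrative names, not taken from the question.
conn.execute("CREATE TABLE samples (samplereq INTEGER, grp INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [(2, 1, "_001"), (2, 1, "_002"), (2, 1, "_003"),
     (1, 2, "_004"), (1, 2, "_005"), (1, 2, "_006")],
)

sampled = conn.execute("""
    SELECT grp, id
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY random()) AS seqnum
        FROM samples s
    )
    WHERE seqnum <= samplereq   -- per-row limit taken from the data itself
    ORDER BY grp, id
""").fetchall()

# Which IDs come back varies between runs; the per-group counts do not.
counts = Counter(grp for grp, _ in sampled)
print(sampled)
print(dict(counts))
```

Because the limit is read from the row, adding a new group with its own SampleReq needs no change to the query.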

(SQL) Per ID, starting from the first row, return all successive rows with a value N greater than the prior returned row

I have the following example dataset:
(Row index is for reference purposes only and does not need to exist in the final output.)
ID Value Row index
a 4 1
a 7 2
a 12 3
a 12 4
a 13 5
b 1 6
b 2 7
b 3 8
b 4 9
b 5 10
I would like to write a SQL script that, starting from the first row per ID and ordered ascending by [Value], returns each next row whose Value is at least N greater than that of the previously returned row. An example of the final table for N = 3 should look like the following:
ID Value Row index
a 4 1
a 7 2
a 12 3
b 1 6
b 4 9
Can this script be written in a vectorised manner? Or must a loop be utilised? Any advice would be greatly appreciated. Thanks!
SQL tables represent unordered sets. There is no definition of "previous" value, unless you have a column that specifies the ordering. With such a column, you can use lag():
select t.*
from (select t.*,
lag(value) over (partition by id order by <ordering column>) as prev_value
from t
) t
where prev_value is null or prev_value <= value - 3;
EDIT:
I think I misunderstood what you want to do. You seem to want to start with the first row for each id. Then get the next row that is 3 or higher in value. Then hold onto that value and get the next that is 3 or higher than that. And so on.
You can do this in SQL using a recursive CTE:
with ts as (
select distinct t.id, t.value, dense_rank() over (partition by id order by value) as seqnum
from t
),
cte as (
select id, value, value as grp_value, 1 as within_seqnum, seqnum
from ts
where seqnum = 1
union all
select ts.id, ts.value,
(case when ts.value >= cte.grp_value + 3 then ts.value else cte.grp_value end),
(case when ts.value >= cte.grp_value + 3 then 1 else cte.within_seqnum + 1 end),
ts.seqnum
from cte join
ts
on ts.id = cte.id and ts.seqnum = cte.seqnum + 1
)
select *
from cte
where within_seqnum = 1
order by id, value;
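
The difference between the two answers is easiest to see outside SQL. The sketch below is plain Python, not a translation of the CTE: pick_spaced (a hypothetical name) compares each value with the last kept value, which is the behaviour the question asks for, while pick_by_lag compares with the immediately preceding row, as the lag() query does.

```python
from itertools import groupby

def pick_spaced(rows, n):
    """Greedy pass: keep a row when its value is >= the last KEPT value + n."""
    kept = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        last_kept = None
        for _, value in group:
            if last_kept is None or value >= last_kept + n:
                kept.append((key, value))
                last_kept = value
    return kept

def pick_by_lag(rows, n):
    """lag()-style filter: compare each row with the row just before it."""
    kept = []
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev = None
        for _, value in group:
            if prev is None or prev <= value - n:
                kept.append((key, value))
            prev = value
    return kept

# (ID, Value) pairs from the question's example dataset.
rows = [("a", 4), ("a", 7), ("a", 12), ("a", 12), ("a", 13),
        ("b", 1), ("b", 2), ("b", 3), ("b", 4), ("b", 5)]

print(pick_spaced(rows, 3))
print(pick_by_lag(rows, 3))
```

On this data the lag-style filter misses ("b", 4): its neighbour 3 is only 1 lower, even though the last kept value, 1, is 3 lower. That running dependency on the previously kept row is why the SQL version needs recursion.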

How to sample from different values in a column but only return records that are unique from another column?

I am struggling with a sampling issue using Teradata
Below is the format of the data
ID Group Rank
1 dog 1
1 cat 1
1 lion 1
1 elephant 2
2 dog 1
2 cat 1
2 lion 1
2 elephant 1
3 dog 1
3 cat 2
3 lion 1
3 elephant 1
4 dog 2
4 cat 1
4 lion 1
4 elephant 1
...
Ideally, I would like to return a sample of a given size for each value in Group, but with only unique values of ID.
Below is the current query I produced but this returns duplicates for ID
SELECT ID, Group FROM Table
WHERE rank = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
with cte as
(
SELECT ID, Group,
random(1,10000) as rnd -- RANDOM can't be directly used in OLAP-functions
FROM Table
WHERE rank = 1
)
SELECT ID, Group
FROM cte
QUALIFY
ROW_NUMBER() -- get one random row per ID
OVER (PARTITION BY ID
ORDER BY rnd) = 1
SAMPLE
WHEN group = 'dog' then 10
WHEN group = 'cat' then 10
WHEN group = 'elephant' then 5
WHEN group = 'lion' then 5
END
Assuming you have enough records, choose a random row for each id and then choose the appropriate numbers from that:
select t.*
from (select t.*,
             row_number() over (partition by group order by seqnum) as seqnum_g
      from (select t.*,
                   row_number() over (partition by id order by random(1, 1000000)) as seqnum
            from t
           ) t
      where seqnum = 1
     ) t
where (group in ('dog', 'cat') and seqnum_g <= 10) or
      (group in ('elephant', 'lion') and seqnum_g <= 5) ;
This doesn't guarantee that the groups will be big enough in the result set. But if you have enough data relative to the size of the groups, then it should work.

Group by data with the same group occurring multiple times

Input data
id group
1 a
1 a
1 b
1 b
1 a
1 a
1 a
expected result
id group row_number
1 a 1
1 a 1
1 b 2
1 b 2
1 a 4
1 a 4
1 a 4
I need the row_number shown in the result above: when the same group occurs a second time, it should get a different row_number. I also have one more column, a date sequence, that orders the rows from top to bottom.
This is an example of a gaps-and-islands problem. Solving it, though, requires that the data be ordered -- and SQL tables represent unordered sets.
Let me assume you have such a column. Then the difference of row numbers can be used:
select t.*,
dense_rank() over (partition by id order by grp, (seqnum - seqnum_g)) as grouping
from (select t.*,
row_number() over (partition by id order by ?) as seqnum,
row_number() over (partition by id, grp order by ?) as seqnum_g
from t
) t;
This does not produce the values that you specifically request, but it does identify each group.
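
To make the difference-of-row-numbers trick concrete, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later). The table name events, the ordering column pos, and the output alias island are assumptions added for illustration; pos stands in for the date sequence column the question mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "pos" is a hypothetical ordering column standing in for the date sequence.
conn.execute("CREATE TABLE events (id INTEGER, grp TEXT, pos INTEGER)")
groups = ["a", "a", "b", "b", "a", "a", "a"]
conn.executemany(
    "INSERT INTO events VALUES (1, ?, ?)",
    [(g, p) for p, g in enumerate(groups, start=1)],
)

rows = conn.execute("""
    SELECT t.grp, t.pos, t.seqnum - t.seqnum_g AS diff,
           DENSE_RANK() OVER (PARTITION BY id
                              ORDER BY grp, seqnum - seqnum_g) AS island
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY pos) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY pos) AS seqnum_g
        FROM events e
    ) t
    ORDER BY t.pos
""").fetchall()

for row in rows:
    print(row)

islands = [r[3] for r in rows]
```

Within one run of consecutive rows from the same group, both row numbers advance together, so their difference stays constant. The first run of a has difference 0 and the second has difference 2, which is what separates them. The island numbers follow the sort on (grp, diff) rather than row order, so they label each run uniquely but are not sequential by position.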

Update Last Row as First row Value

I have a table in the format below:
SeqID Control_ID Data_Value RowNum
1 SEARCH SEARCH 3
1 BROKERREF BZ815 4
1 SYSTEM 0 5
2 pdp pdp 1
2 test 123 2
2 system 235 3
I want to update the table so that it is in the format below:
SeqID Control_ID Data_Value RowNum
1 BROKERREF BZ815 4
1 SYSTEM 0 5
1 SEARCH SEARCH 3
2 test 123 2
2 system 235 3
2 pdp pdp 1
For each SeqID group, I need the top row to be selected as the last row.
Note: in the row that needs to be selected as the last row, Control_ID and Data_Value hold the same value, for each unique SeqID.
Thanks in advance.
Use ROW_NUMBER() to locate the first row per SeqID. Then, in an outer query, use a CASE expression in the ORDER BY clause to place this row in the last position within its partition:
SELECT SeqID, Control_ID, Data_Value, RowNum
FROM (
SELECT SeqID, Control_ID, Data_Value, RowNum,
ROW_NUMBER() OVER (PARTITION BY SeqID ORDER BY RowNum) AS rn
FROM mytable ) t
ORDER BY SeqID,
CASE WHEN t.rn = 1 THEN 1 ELSE 0 END,
RowNum
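
The ordering can be checked against the question's data with a short sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). The table name mytable comes from the answer; the use of an in-memory SQLite database is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (SeqID INTEGER, Control_ID TEXT, "
    "Data_Value TEXT, RowNum INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "SEARCH", "SEARCH", 3), (1, "BROKERREF", "BZ815", 4),
     (1, "SYSTEM", "0", 5), (2, "pdp", "pdp", 1),
     (2, "test", "123", 2), (2, "system", "235", 3)],
)

ordered = conn.execute("""
    SELECT SeqID, Control_ID, Data_Value, RowNum
    FROM (
        SELECT SeqID, Control_ID, Data_Value, RowNum,
               ROW_NUMBER() OVER (PARTITION BY SeqID ORDER BY RowNum) AS rn
        FROM mytable
    ) t
    ORDER BY SeqID,
             CASE WHEN t.rn = 1 THEN 1 ELSE 0 END,  -- first row sorts last
             RowNum
""").fetchall()

for row in ordered:
    print(row)

control_ids = [r[1] for r in ordered]
```

This only changes the order in which rows are returned. A table has no stored row order to update, so the ORDER BY has to be applied wherever the rows are read.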