ROW_NUMBER query - sql

I have a table:
Trip Stop Time
-----------------
1 A 1:10
1 B 1:16
1 B 1:20
1 B 1:25
1 C 1:31
1 B 1:40
2 A 2:10
2 B 2:17
2 C 2:20
2 B 2:25
I want to add one more column to my query output:
Trip Stop Time Sequence
-------------------------
1 A 1:10 1
1 B 1:16 2
1 B 1:20 2
1 B 1:25 2
1 C 1:31 3
1 B 1:40 4
2 A 2:10 1
2 B 2:17 2
2 C 2:20 3
2 B 2:25 4
The hard part is B: consecutive B rows should share the same sequence number, but a B that appears again after a different stop should count as a new one.
I know
row_number() over (partition by trip order by time)
row_number() over (partition by trip, stop order by time)
Neither of them meets the condition I want. Is there a way to query this?

create table test
(trip number
,stp varchar2(1)
,tm varchar2(10)
,seq number);
insert into test values (1, 'A', '1:10', 1);
insert into test values (1, 'B', '1:16', 2);
insert into test values (1, 'B', '1:20', 2);
insert into test values (1 , 'B', '1:25', 2);
insert into test values (1 , 'C', '1:31', 3);
insert into test values (1, 'B', '1:40', 4);
insert into test values (2, 'A', '2:10', 1);
insert into test values (2, 'B', '2:17', 2);
insert into test values (2, 'C', '2:20', 3);
insert into test values (2, 'B', '2:25', 4);
select t1.*
,sum(decode(t1.stp,t1.prev_stp,0,1)) over (partition by trip order by tm) new_seq
from
(select t.*
,lag(stp) over (order by t.tm) prev_stp
from test t
order by tm) t1
;
TRIP S TM SEQ P NEW_SEQ
------ - ---------- ---------- - ----------
1 A 1:10 1 1
1 B 1:16 2 A 2
1 B 1:20 2 B 2
1 B 1:25 2 B 2
1 C 1:31 3 B 3
1 B 1:40 4 C 4
2 A 2:10 1 B 1
2 B 2:17 2 A 2
2 C 2:20 3 B 3
2 B 2:25 4 C 4
10 rows selected
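You want to see whether the stop changes between one row and the next. If it does, you want to increment the sequence. So use LAG to get the previous stop into the current row.
I used DECODE because it treats two NULLs as equal and is more concise than CASE, but if you are following the textbook, you should probably use CASE.
Using SUM as an analytic function with an ORDER BY clause then gives a running count of stop changes, which is the sequence you are looking for.
Note that the LAG above is not partitioned by trip, so the first row of trip 2 is compared with the last stop of trip 1. That happens to work for this data; adding partition by trip to the LAG makes it safe in general.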
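For readers without an Oracle instance, the same LAG plus running SUM idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under some assumptions: SQLite 3.25 or newer (for window functions), a CASE with the null-safe IS operator standing in for DECODE, and a LAG that is partitioned by trip.

```python
import sqlite3

# In-memory database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("create table test (trip integer, stp text, tm text)")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [
        (1, "A", "1:10"), (1, "B", "1:16"), (1, "B", "1:20"),
        (1, "B", "1:25"), (1, "C", "1:31"), (1, "B", "1:40"),
        (2, "A", "2:10"), (2, "B", "2:17"), (2, "C", "2:20"),
        (2, "B", "2:25"),
    ],
)

query = """
select trip, stp, tm,
       -- running count of stop changes within the trip
       sum(case when prev_stp is stp then 0 else 1 end)
           over (partition by trip order by tm) as new_seq
from (select t.*,
             -- previous stop on the same trip; NULL on the trip's first row
             lag(stp) over (partition by trip order by tm) as prev_stp
      from test t) t1
order by trip, tm
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The times are stored as text here, as in the original test table, so the ordering is a string ordering; that is fine for this sample but a real table would want a proper time column.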

select *, dense_rank() over(partition by trip, stop order by time) as sqnc
from yourtable;
Use dense_rank so you get all the numbers consecutively, with no skipped numbers in between.

I think this is more complicated than a simple row_number(). You need to identify groups of adjacent stops and then enumerate them.
You can identify the groups using a difference of row numbers. Then, a dense_rank() on the difference does what you want if there are no repeated stops on a trip:
select t.*,
dense_rank() over (partition by trip order by grp, stop)
from (select t.*,
(row_number() over (partition by trip order by time) -
row_number() over (partition by trip, stop order by time)
) as grp
from table t
) t;
If there are repeated stops:
select t.*, dense_rank() over (partition by trip order by mintime)
from (select t.*,
min(time) over (partition by trip, grp, stop) as mintime
from (select t.*,
(row_number() over (partition by trip order by time) -
row_number() over (partition by trip, stop order by time)
) as grp
from table t
) t
) t;
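The difference-of-row-numbers approach above can be tried out with a small sqlite3 script. This is a sketch rather than the answer's exact code: the table name trips and the column names stop and tm are assumptions made for the example, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names, chosen only for this sketch.
conn.execute("create table trips (trip integer, stop text, tm text)")
conn.executemany(
    "insert into trips values (?, ?, ?)",
    [
        (1, "A", "1:10"), (1, "B", "1:16"), (1, "B", "1:20"),
        (1, "B", "1:25"), (1, "C", "1:31"), (1, "B", "1:40"),
        (2, "A", "2:10"), (2, "B", "2:17"), (2, "C", "2:20"),
        (2, "B", "2:25"),
    ],
)

query = """
select trip, stop, tm,
       -- rank the groups by their earliest time within the trip
       dense_rank() over (partition by trip order by mintime) as seq
from (select g.*,
             min(tm) over (partition by trip, grp, stop) as mintime
      from (select t.*,
                   -- constant for each run of adjacent identical stops
                   row_number() over (partition by trip order by tm)
                 - row_number() over (partition by trip, stop order by tm) as grp
            from trips t) g) m
order by trip, tm
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The grp value alone is not unique across stops, which is why the minimum time is taken per (trip, grp, stop) before ranking.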

Calculate Rank Based on Shared Column Values and Consecutive Date Ranges (same rank for records with consecutive range)

I am trying to get the rank of a table that has specific id's and a start and end date for each record, as such:
id1  id2  flag  startdate   enddate
1    1    y     2007-01-10  2007-02-12
1    1    y     2007-02-13  2007-08-04
1    1    y     2007-08-05  2008-10-04
1    1    n     2008-10-05  2008-11-14
1    1    n     2008-11-15  2008-12-02
1    1    n     2008-12-08  2008-12-20
2    2    y     2012-01-10  2012-02-12
2    2    y     2012-02-13  2012-08-04
2    3    y     2012-01-10  2012-02-14
2    4    y     2012-08-14  2013-01-10
2    4    y     2013-01-15  2013-01-26
2    4    y     2013-01-27  2013-02-04
2    4    n     2016-03-14  2016-04-12
Where I essentially want to give the same count value to all records which share the same id1, id2, and flag, and are consecutive in their dates. Consecutive, meaning the start date of one record is equal to the end date of the previous record + 1 day.
The desired output should look like:
id1  id2  flag  startdate   enddate     rank_t
1    1    y     2007-01-10  2007-02-12  1
1    1    y     2007-02-13  2007-08-04  1
1    1    y     2007-08-05  2008-10-04  1
1    1    n     2008-10-05  2008-11-14  2
1    1    n     2008-11-15  2008-12-02  2
1    1    n     2008-12-08  2008-12-20  3
2    2    y     2012-01-10  2012-02-12  4
2    2    y     2012-02-13  2012-08-04  4
2    3    y     2012-01-10  2012-02-14  5
2    4    y     2012-08-14  2013-01-10  6
2    4    y     2013-01-15  2013-01-26  7
2    4    y     2013-01-27  2013-02-04  7
2    4    n     2016-03-14  2016-04-12  8
The output or rank does not have to be in that exact order, but the idea is still the same. Records which share the same id1, id2, and flag, and are consecutive in their dates should all have the same rank. And that rank value should not be used again for any other 'group' of records.
Here is the code to generate a temp table with this structure:
if object_id('tempdb..#temp1') is not null drop table #temp1
CREATE TABLE #temp1 (id1 INT, id2 int, flag varchar(10), startdate DATETIME, enddate DATETIME)
INSERT INTO #temp1 values
(1, 1, 'y', '2007-01-10', '2007-02-12'),
(1, 1, 'y', '2007-02-13', '2007-08-04'),
(1, 1,'y', '2007-08-05', '2008-10-04'),
(1, 1,'n', '2008-10-05', '2008-11-14'),
(1, 1,'n', '2008-11-15', '2008-12-02'),
(1, 1,'n', '2008-12-08', '2008-12-20'),
(2, 2,'y', '2012-01-10', '2012-02-12'),
(2, 2,'y', '2012-02-13', '2012-08-04'),
(2, 3,'y', '2012-01-10', '2012-02-14'),
(2, 4,'y', '2012-08-14', '2013-01-10'),
(2, 4,'y', '2013-01-15', '2013-01-26'),
(2, 4,'y', '2013-01-27', '2013-02-04'),
(2, 4,'n', '2016-03-14', '2016-04-12')
Thanks in advance for any help.
Same logic as the existing answer, just written as two CTEs, which I find clearer than a combination of a CTE and a subquery.
with cte1 as (
select *
-- Identify if there is a gap between the current startdate and the previous enddate
, case when lag(enddate,1,dateadd(day,-1,startdate)) over (partition by id1, id2, flag order by startdate asc) = dateadd(day,-1,startdate) then 0 else 1 end DateGap
from #temp1
), cte2 as (
select *
-- Running sum of gaps within each (id1, id2, flag) partition, so every gap starts a new group
, sum(DateGap) over (partition by id1, id2, flag order by startdate asc) DateGapSum
from cte1
)
select id1, id2, flag, startdate, enddate
-- Use dense_rank to generate the ranking where ties are allocated the same value
, dense_rank() over (order by id1 asc, id2 asc, flag desc, DateGapSum asc) rank_t
from cte2
order by id1, id2, startdate;
Try the following:
WITH T AS
(
SELECT *,
IIF(
DATEDIFF(DAY,
LAG(enddate, 1, DATEADD(DAY, -1, startdate)) OVER (PARTITION BY id1, id2 ORDER BY enddate),
startdate)=1,
0, 1) AS chk
FROM #temp1
)
SELECT id1, id2, flag, startdate, enddate,
DENSE_RANK() OVER (ORDER BY id1, id2, flag DESC, grp) AS rank_t
FROM
(
SELECT *,
SUM(CHK) OVER (PARTITION BY id1, id2 ORDER BY enddate) AS grp
FROM T
) Groups
ORDER BY id1, id2, startdate
In the CTE T, we check whether the difference between the start date and the previous end date is exactly one day, and set chk to zero if it is and to one if it is not.
Then a running sum of the chk value from the CTE defines a unique group for each run of consecutive rows.
Finally, DENSE_RANK ordered by (id1, id2, flag DESC, grp) generates the required ranks.
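The three steps (flag a gap, take a running sum, apply DENSE_RANK) can be checked against the sample data with Python's sqlite3 module. This is a sketch under assumptions: the table is called temp1 instead of #temp1, dates are stored as ISO text, SQLite's date(x, '-1 day') stands in for DATEADD, and SQLite 3.25 or newer is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table temp1 (id1 integer, id2 integer, flag text, "
    "startdate text, enddate text)"
)
conn.executemany(
    "insert into temp1 values (?, ?, ?, ?, ?)",
    [
        (1, 1, "y", "2007-01-10", "2007-02-12"),
        (1, 1, "y", "2007-02-13", "2007-08-04"),
        (1, 1, "y", "2007-08-05", "2008-10-04"),
        (1, 1, "n", "2008-10-05", "2008-11-14"),
        (1, 1, "n", "2008-11-15", "2008-12-02"),
        (1, 1, "n", "2008-12-08", "2008-12-20"),
        (2, 2, "y", "2012-01-10", "2012-02-12"),
        (2, 2, "y", "2012-02-13", "2012-08-04"),
        (2, 3, "y", "2012-01-10", "2012-02-14"),
        (2, 4, "y", "2012-08-14", "2013-01-10"),
        (2, 4, "y", "2013-01-15", "2013-01-26"),
        (2, 4, "y", "2013-01-27", "2013-02-04"),
        (2, 4, "n", "2016-03-14", "2016-04-12"),
    ],
)

query = """
with flagged as (
    select *,
           -- 0 when this row starts the day after the previous row ended
           case when lag(enddate) over (partition by id1, id2, flag
                                        order by startdate)
                     = date(startdate, '-1 day')
                then 0 else 1 end as date_gap
    from temp1
),
grouped as (
    select *,
           -- running sum of gaps: one value per run of consecutive rows
           sum(date_gap) over (partition by id1, id2, flag
                               order by startdate) as grp
    from flagged
)
select id1, id2, flag, startdate, enddate,
       dense_rank() over (order by id1, id2, flag desc, grp) as rank_t
from grouped
order by id1, id2, startdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here the first row of each partition is flagged as a gap (LAG returns NULL), so grp starts at 1 rather than 0; the ranks are unaffected because only changes in grp matter.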

First 7 days sales

I want to check the sum of the amount for an item over the 7 days starting from its first day of sale. Basically, I want the sum of sales for the first 7 days.
I am using the below query.
select item, sum(amt)
from table
where first_sale_dt = (first_sale_dt + 6).
When I run this query, I don't get any results.
Your code as it stands will give you no results, because it looks at each row in isolation and asks whether first_sale_dt equals itself plus 6 days, which is never true.
You need to use a window function to look across many rows, or self-join the table and filter the joined rows to get the result you want.
So, with this CTE of data for testing:
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 4, '2022-03-04'::date),
(1, 200,'2022-04-01'::date),
(3, 20, '2022-03-01'::date)
t(item, amt, first_sale_dt)
)
this SQL shows the filtered rows that we want to SUM. It uses a sub-select (which could be moved into a CTE) to find the earliest first_sale_dt per item, which anchors the date range.
select a.item, b.amt
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
ITEM  AMT
1     2
1     4
3     20
and therefore with a SUM added:
select a.item, sum(b.amt)
from (
select
item,
min(first_sale_dt) as first_first_sale_dt
from data
group by 1
) as a
join data as b
on a.item = b.item and b.first_sale_dt <= (a.first_first_sale_dt + 6)
group by 1;
you get:
ITEM  SUM(B.AMT)
1     6
3     20
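The join-and-filter approach can be reproduced outside Snowflake with Python's sqlite3 module. This is a minimal sketch, assuming dates stored as ISO text and SQLite's date(x, '+6 days') in place of the + 6 date arithmetic; the alias first_week_amt is introduced only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (item integer, amt integer, first_sale_dt text)")
conn.executemany(
    "insert into data values (?, ?, ?)",
    [
        (1, 2, "2022-03-01"),
        (1, 4, "2022-03-04"),
        (1, 200, "2022-04-01"),
        (3, 20, "2022-03-01"),
    ],
)

query = """
select a.item, sum(b.amt) as first_week_amt
from (select item, min(first_sale_dt) as first_first_sale_dt
      from data
      group by item) a
join data b
  on a.item = b.item
 -- keep only sales within the first 7 days (first day plus 6)
 and b.first_sale_dt <= date(a.first_first_sale_dt, '+6 days')
group by a.item
order by a.item
"""
result = conn.execute(query).fetchall()
print(result)
```

The sale of 200 on 2022-04-01 falls outside item 1's first week, so it is excluded from the sum.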
Sliding Window:
This relies on dense data (one row for every day). Also, the sliding window does work that gets thrown away, which is a strong sign this is not the performant solution; I would stick to the first solution.
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
first_sale_dt,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
,count(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as c
from data
order by 2;
ITEM  FIRST_SALE_DT  S   C
1     2022-03-01     14  7
1     2022-03-02     14  7
1     2022-03-03     12  6
1     2022-03-04     10  5
1     2022-03-05     8   4
1     2022-03-06     6   3
1     2022-03-07     4   2
1     2022-03-08     2   1
Thus you then need to filter out all but the first row per item.
WITH data as (
select * from values
(1, 2, '2022-03-01'::date),
(1, 2, '2022-03-02'::date),
(1, 2, '2022-03-03'::date),
(1, 2, '2022-03-04'::date),
(1, 2, '2022-03-05'::date),
(1, 2, '2022-03-06'::date),
(1, 2, '2022-03-07'::date),
(1, 2, '2022-03-08'::date)
t(item, amt, first_sale_dt)
)
select item,
sum(amt) over(partition by item order by first_sale_dt rows BETWEEN current row and 6 following ) as s
from data
qualify row_number() over (partition by item order by first_sale_dt) = 1
gives:
ITEM  S
1     14
If you really want to use a window function, here is a beginner-friendly version:
with cte as
(select *, min(first_sale_dt) over (partition by item) as sale_start_date
from data) --thanks Simeon
select item, sum(amt) as amount
from cte
where first_sale_dt <= sale_start_date + 6 --limit to first week
group by item;
On a side note, I suggest using dateadd instead of + on dates.

Get running no series on basis of one column value

I have a table like
SELECT str AS company, item#, Qty
FROM temp_on_hand
WHERE qty > 2
ORDER BY Item# ASC
output of that query is -
company item# Qty
1 746 3
5 9526 1
1 14096 1
2 14096 2
3 14095 2
I want to generate a new item# (the current item# with a suffix such as '-00001' appended) based on the Qty column, i.e. if Qty is 3 for company 1, then the query should return three rows, like this:
company NewItem# Item# Qty
1 746-00001 746 3
1 746-00002 746 3
1 746-00003 746 3
5 9526-00001 9526 1
1 14096-00001 14096 1
2 14096-00002 14096 2
2 14096-00003 14096 2
3 14095-00001 14095 3
3 14095-00002 14095 3
3 14095-00003 14095 3
. . . . . . .
Table structure like that
CREATE TABLE temp_on_hand(str INT, item# INT,Qty INT)
INSERT INTO temp_on_hand VALUES (1, 746, 3)
INSERT INTO temp_on_hand VALUES (5, 9526, 1)
INSERT INTO temp_on_hand VALUES (1, 14096, 1)
INSERT INTO temp_on_hand VALUES (2, 14096, 2)
INSERT INTO temp_on_hand VALUES (3, 14095, 2)
ALTER TABLE temp_on_hand ADD new_item# VARCHAR
similarly for upcoming values.
Thanks in advance
You can join to a Numbers table.
You can use a real one, but I will use Itzik Ben-Gan's on-the-fly tally table (it's actually better as an inline Table-valued Function).
EDIT: According to your comments, you don't actually need the numbering from Nums, you want a fresh overall numbering. So you can just select from L1
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 A, L0 B ) -- add more cross joins for more rows
SELECT
t.str AS company,
-- item# is an INT, so use CONCAT rather than + to avoid numeric addition
CONCAT(t.item#, FORMAT(ROW_NUMBER() OVER (ORDER BY t.item# ASC), '-0000')) NewItem#,
t.item#,
t.Qty
FROM temp_on_hand t
CROSS APPLY(
SELECT TOP (t.Qty) c
FROM L1
) n
WHERE t.qty > 2
ORDER BY t.Item#, NewItem#;
The key to good performance with a numbers-table approach is to constrain the row expansion with a row goal, i.e. SELECT TOP(n); without a row goal the full Cartesian product is used. Also, the FORMAT function is known to be slow.
You could try something like this
[EDIT]: The sequence assigned to NewItem# does not reset for each item#; it runs continuously across the whole result.
drop TABLE if exists #temp_on_hand;
go
CREATE TABLE #temp_on_hand(str INT, item# INT,Qty INT)
INSERT INTO #temp_on_hand VALUES
(1, 746, 3),
(5, 9526, 1),
(1, 14096, 1),
(2, 14096, 2),
(3, 14095, 3);
with
l as (select 1 n from (values (1),(1),(1),(1),(1),(1),(1),(1)) as v(n))
select *, concat_ws('-', item#,
right('00000'+cast(row_number() over (order by (select null)) as varchar(5)), 5)) NewItem#
from #temp_on_hand toh
cross apply (select top (toh.Qty) 1 n
from l l1, l l2,l l3, l l4) tally;
str item# Qty n NewItem#
1 746 3 1 746-00001
1 746 3 1 746-00002
1 746 3 1 746-00003
5 9526 1 1 9526-00004
1 14096 1 1 14096-00005
2 14096 2 1 14096-00006
2 14096 2 1 14096-00007
3 14095 3 1 14095-00008
3 14095 3 1 14095-00009
3 14095 3 1 14095-00010
A recursive CTE is a simple method:
WITH cte as (
SELECT str AS company, item#, Qty, 1 as n
FROM temp_on_hand
WHERE qty > 2
UNION ALL
SELECT company, item#, Qty, n + 1
FROM cte
WHERE n < Qty
)
SELECT company, CONCAT(item#, format(n, '0000')) as newitem#, item#, qty
FROM cte;
Note that if qty exceeds 100, you will also need option (maxrecursion 0).
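The recursive expansion can be tried in SQLite through Python's sqlite3 module. This is only a sketch with a few assumptions: the column is named item rather than item#, printf builds the zero-padded suffix in place of FORMAT, and the qty > 2 filter is left out so that every sample row is expanded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "item" stands in for item#, which is awkward to write in SQLite.
conn.execute("create table temp_on_hand (str integer, item integer, qty integer)")
conn.executemany(
    "insert into temp_on_hand values (?, ?, ?)",
    [(1, 746, 3), (5, 9526, 1), (1, 14096, 1), (2, 14096, 2), (3, 14095, 2)],
)

query = """
with recursive cte(company, item, qty, n) as (
    -- anchor: one row per source row, numbered 1
    select str, item, qty, 1 from temp_on_hand
    union all
    -- recursive step: keep adding rows until n reaches qty
    select company, item, qty, n + 1 from cte where n < qty
)
select company, printf('%d-%05d', item, n) as new_item, item, qty
from cte
order by item, company, n
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this version the numbering restarts at 00001 for every source row, so the two companies holding item 14096 both get a 14096-00001; a cumulative offset would be needed to avoid that.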
EDIT:
If you want the numbering within a company, you can use window functions and a cumulative sum:
WITH cte as (
SELECT str AS company, item#, Qty, 1 as n,
SUM(qty) OVER (PARTITION BY str ORDER BY item#) - qty as start_qty
FROM temp_on_hand
WHERE qty > 2
UNION ALL
SELECT company, item#, Qty, n + 1, start_qty
FROM cte
WHERE n < Qty
)
SELECT company,
CONCAT(item#, format(n + start_qty, '0000')) as newitem#, item#, qty
FROM cte;

Distance from maximum value for each distance

I would like to calculate the distance to maximum value for each possible distance. As an example:
Row Distance Value
1 1 2 --> 1 (Distance from Row 1)
2 2 3 --> 2 (Distance from Row 2)
3 3 3 --> 2 (Distance from Row 2)
4 4 1 --> 2 (Distance from Row 2)
5 5 5 --> 5 (Distance from Row 5)
6 6 1 --> 5 (Distance from Row 5)
Explanation: Row 6 has value of 5 because the first occurrence of maximum value between rows 1 through 6 was at distance 5.
I have tried to use some windows functions but cannot figure out how to put it together.
Sample data:
--drop table tmp_maxval;
create table tmp_maxval (dst number, val number);
insert into tmp_maxval values(1, 3);
insert into tmp_maxval values(2, 2);
insert into tmp_maxval values(3, 1);
insert into tmp_maxval values(4, 2);
insert into tmp_maxval values(5, 4);
insert into tmp_maxval values(6, 2);
insert into tmp_maxval values(7, 2);
insert into tmp_maxval values(8, 5);
insert into tmp_maxval values(9, 5);
insert into tmp_maxval values(10,1);
commit;
Functions I think can be useful in solving this:
select t.*,
max(val) over(order by dst),
case when val >= max(val) over(order by dst) then 1 else 0 end ,
case when row_number() over(partition by val order by dst) = 1 then 1 else 0 end as first_occurence
from
ap_risk.tmp_maxval t
select dst, val,
max(case when flag is null then dst end) over (order by dst)
as first_occurrence
from (
select dst, val,
case when val <= max(val) over (order by dst
rows between unbounded preceding and 1 preceding)
then 1 end as flag
from tmp_maxval
)
order by dst
;
DST VAL FIRST_OCCURRENCE
---------- ---------- ----------------
1 3 1
2 2 1
3 1 1
4 2 1
5 4 5
6 2 5
7 2 5
8 5 8
9 5 8
10 1 8
Or, if you are on Oracle version 12.1 or higher, MATCH_RECOGNIZE can do quick work of this assignment:
select dst, val, first_occurrence
from tmp_maxval t
match_recognize(
order by dst
measures a.dst as first_occurrence
all rows per match
pattern (a x*)
define x as val <= a.val
)
order by dst
;
You can get the maximum value using a cumulative max:
select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv;
I think this answers your question. If you want the distance itself, partition by the running maximum so that each new maximum starts its own group:
select mv.*,
min(case when max_value = val then dst end) over (partition by max_value order by dst) as first_distance_at_max_value
from (select mv.*, max(mv.val) over (order by mv.dst) as max_value
from ap_risk.tmp_maxval mv
) mv;
You could use either max() or min() combined with case when:
select t.*,
min(case when val = mv then dst end) over (partition by mv order by dst) v1,
max(case when val = mv then dst end) over (partition by mv order by dst) v2
from (select t.*, max(val) over (order by dst) mv from tmp_maxval t) t
order by dst
Result:
DST VAL MV V1 V2
---------- ---------- ---------- ---------- ----------
1 3 3 1 1
2 2 3 1 1
3 1 3 1 1
4 2 3 1 1
5 4 4 5 5
6 2 4 5 5
7 2 4 5 5
8 5 5 8 8
9 5 5 8 9
10 1 5 8 9
The explained logic and the words "first occurrence" suggest that you need min(), but the third row in your example suggests max() ;-) In the data you provided, you can see the difference in rows 9-10. Choose whichever you want.
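The min() variant can be verified against the sample data with Python's sqlite3 module. This is a sketch assuming SQLite 3.25 or newer; the aliases mv and first_occurrence are just names picked for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tmp_maxval (dst integer, val integer)")
conn.executemany(
    "insert into tmp_maxval values (?, ?)",
    [(1, 3), (2, 2), (3, 1), (4, 2), (5, 4),
     (6, 2), (7, 2), (8, 5), (9, 5), (10, 1)],
)

query = """
select dst, val,
       -- within each running-max group, the earliest dst that reaches it
       min(case when val = mv then dst end)
           over (partition by mv order by dst) as first_occurrence
from (select t.*,
             -- running maximum of val up to and including this row
             max(val) over (order by dst) as mv
      from tmp_maxval t) r
order by dst
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Swapping min for max in the outer window would report the latest row that ties the running maximum instead, which only differs here for dst 9 and 10.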

Smarter GROUP BY

Consider Table like this.
I will call it Test
Id A B C D
1 1 1 8 25
2 1 2 5 35
3 1 3 2 75
4 2 2 2 45
5 3 2 5 26
Now I want rows with max 'Id' Grouped by 'A'
Id A B C D
3 1 3 2 75
4 2 2 2 45
5 3 2 5 26
-- Works, but is not what I want
SELECT MAX(Id), A FROM Test GROUP BY A
-- What I want, but it does not work
SELECT MAX(Id), A, B, C, D FROM Test GROUP BY A
-- Works, but is not what I want
SELECT MAX(Id), A, B, C, D FROM Test GROUP BY A, B, C, D
-- Works, and is what I want
SELECT old.Id, old.A, new.B, new.C, new.D
FROM(
SELECT
MAX(Id) AS Id, A
FROM
Test GROUP BY A
)old
JOIN Test new
ON old.Id = new.Id
Is there a better way to write the last query without a join?
Most databases support window functions:
select *
from (
select *, row_number() over (partition by a order by id desc) rn
from test
) t
where rn = 1
Most DBMS now support Common Table Expressions (CTE). You can use one.
;with maxa as (
select row_number() over(partition by a order by id desc) rn,
id,a,b,c,d from test
)
select id,a,b,c,d
from maxa
where rn=1
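Either form can be checked quickly with Python's sqlite3 module. The following is a small sketch using the sample table from the question, assuming SQLite 3.25 or newer for ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer, a integer, b integer, c integer, d integer)")
conn.executemany(
    "insert into test values (?, ?, ?, ?, ?)",
    [(1, 1, 1, 8, 25), (2, 1, 2, 5, 35), (3, 1, 3, 2, 75),
     (4, 2, 2, 2, 45), (5, 3, 2, 5, 26)],
)

query = """
select id, a, b, c, d
from (select t.*,
             -- number rows within each a, highest id first
             row_number() over (partition by a order by id desc) as rn
      from test t) ranked
where rn = 1
order by id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because id is unique, ROW_NUMBER picks exactly one row per value of a; with ties in the ordering column the choice between tied rows would be arbitrary.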