Partition by consecutive dates - SQL

I have a table with three columns, X being the unique identifier. I want to get the row number when I partition by column Y, but only while Z is in consecutive order. For example, I have this table:
X Y Z
A 1 1-jan
A 1 2-jan
A 1 3-jan
B 3 1-jan
B 3 2-jan
A 1 5-jan
The result should look like this:
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 1
The code I am using right now:
select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn
from yourTable
I am getting this as my result (this is not the result I want):
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 4 <---- Column Z is 5-jan, not 4-jan, so this should not be row 4. It should start again at row 1

You first need to derive a column that can be used to partition your table.
The query below uses LAG() to flag each row that starts a new run (its previous z for the same y is not exactly one day earlier), then a running SUM() OVER () to accumulate those flags into a "partition id", and finally ROW_NUMBER() partitioned by y and that identifier.
WITH
  gap_marker AS
  (
    SELECT
      yourTable.*,
      IIF(
        LAG(z) OVER (PARTITION BY y ORDER BY z)
          =
        DATEADD(day, -1, z),
        0,
        1
      )
        AS new_date_range
    FROM
      yourTable
  ),
  date_range_partition AS
  (
    SELECT
      gap_marker.*,
      SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
    FROM
      gap_marker
  )
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z) AS rn
FROM
  date_range_partition
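To try the flag-then-running-sum technique without a SQL Server instance, here is a minimal sketch that runs the same three steps in SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions), stores z as ISO-8601 text with the "jan" dates assumed to be 2022, and swaps DATEADD for a julianday() comparison and IIF for CASE.

```python
import sqlite3

# Sketch only: SQLite (3.25+ for window functions) stands in for SQL Server.
# z is ISO-8601 text (2022 is an assumed year); julianday() replaces DATEADD
# and CASE replaces IIF.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (x TEXT, y INTEGER, z TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("A", 1, "2022-01-01"), ("A", 1, "2022-01-02"), ("A", 1, "2022-01-03"),
     ("B", 3, "2022-01-01"), ("B", 3, "2022-01-02"), ("A", 1, "2022-01-05")],
)

rows = conn.execute("""
    WITH gap_marker AS (
        SELECT yourTable.*,
               -- 0 when the previous z for this y is exactly one day earlier,
               -- 1 when this row starts a new run (including the first row)
               CASE WHEN julianday(LAG(z) OVER (PARTITION BY y ORDER BY z))
                         = julianday(z) - 1
                    THEN 0 ELSE 1 END AS new_date_range
        FROM yourTable
    ),
    date_range_partition AS (
        SELECT gap_marker.*,
               -- running count of run starts gives each run its own id
               SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
        FROM gap_marker
    )
    SELECT x, y, z,
           ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z) AS rn
    FROM date_range_partition
    ORDER BY y, z
""").fetchall()

for row in rows:
    print(row)
```

The row for 2022-01-05 gets rn = 1 because its flag bumps the running sum, putting it in a new partition.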
Alternatively, you could compute an amount to subtract from the plain rn, so that it resets to 1 whenever a date is skipped.
WITH
  enumerated AS
  (
    SELECT
      yourTable.*,
      ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
      DATEDIFF(
        day,
        LAG(z) OVER (PARTITION BY y ORDER BY z),
        z
      )
        AS delta
    FROM
      yourTable
  )
SELECT
  x, y, z,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated
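A minimal sketch of the subtract-an-offset variant, again assuming SQLite 3.25+ via Python's sqlite3 as a stand-in for SQL Server: the julianday() difference replaces DATEDIFF(day, ...), CASE replaces IIF, and the output column is renamed reset_rn (a name chosen here) to keep it distinct from the input rn.

```python
import sqlite3

# Sketch only: SQLite (3.25+) in place of SQL Server; dates are ISO text with
# an assumed year of 2022. julianday() difference stands in for DATEDIFF.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (x TEXT, y INTEGER, z TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("A", 1, "2022-01-01"), ("A", 1, "2022-01-02"), ("A", 1, "2022-01-03"),
     ("B", 3, "2022-01-01"), ("B", 3, "2022-01-02"), ("A", 1, "2022-01-05")],
)

rows = conn.execute("""
    WITH enumerated AS (
        SELECT yourTable.*,
               ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
               -- days since the previous row for the same y (NULL on the first row)
               julianday(z) - julianday(LAG(z) OVER (PARTITION BY y ORDER BY z)) AS delta
        FROM yourTable
    )
    SELECT x, y, z,
           -- subtract the rn reached just before the most recent gap;
           -- NULL delta (first row) falls to ELSE and contributes rn - 1 = 0
           rn - MAX(CASE WHEN delta = 1 THEN 0 ELSE rn - 1 END)
                    OVER (PARTITION BY y ORDER BY z) AS reset_rn
    FROM enumerated
    ORDER BY y, z
""").fetchall()

for row in rows:
    print(row)
```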
Finally, if your rows are always whole days apart, you can use DATEDIFF() and avoid ROW_NUMBER() altogether: window functions find the first date of the current run, and the row number is the number of days since that date, plus one.
WITH
  check_previous AS
  (
    SELECT
      yourTable.*,
      IIF(
        LAG(z) OVER (PARTITION BY y ORDER BY z)
          =
        DATEADD(day, -1, z),
        NULL,
        z
      )
        AS new_base_date
    FROM
      yourTable
  )
SELECT
  x, y, z,
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous
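The base-date variant can be sketched the same way, assuming SQLite 3.25+ through Python's sqlite3 instead of SQL Server. Because z is stored as ISO-8601 text (an assumption), MAX() over the text values still picks the latest run start, and a julianday() difference replaces DATEDIFF.

```python
import sqlite3

# Sketch only: SQLite (3.25+) in place of SQL Server; ISO text dates sort
# chronologically, so MAX() over them returns the latest run start.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (x TEXT, y INTEGER, z TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("A", 1, "2022-01-01"), ("A", 1, "2022-01-02"), ("A", 1, "2022-01-03"),
     ("B", 3, "2022-01-01"), ("B", 3, "2022-01-02"), ("A", 1, "2022-01-05")],
)

rows = conn.execute("""
    WITH check_previous AS (
        SELECT yourTable.*,
               -- NULL when this row continues a run, otherwise the run's first date
               CASE WHEN julianday(LAG(z) OVER (PARTITION BY y ORDER BY z))
                         = julianday(z) - 1
                    THEN NULL ELSE z END AS new_base_date
        FROM yourTable
    )
    SELECT x, y, z,
           -- days since the latest run start (MAX ignores NULLs), plus one
           CAST(julianday(z)
                - julianday(MAX(new_base_date) OVER (PARTITION BY y ORDER BY z))
                AS INTEGER) + 1 AS rn
    FROM check_previous
    ORDER BY y, z
""").fetchall()

for row in rows:
    print(row)
```

This relies on one row per date within each y; duplicate dates would break the day arithmetic.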
Demo of all three: https://dbfiddle.uk/K8x8gOqh

Supposing that column Z is a date column, you could try the following:
SELECT X, Y, Z,
ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
SELECT *,
DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
FROM table_name
) T
ORDER BY X, Z
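The date-minus-ROW_NUMBER() trick works because, within a run of consecutive days, the date and its position both increase by one per row, so their difference stays constant and can serve as a group key. A minimal sketch, assuming SQLite 3.25+ via Python's sqlite3 in place of SQL Server, with julianday(z) minus the row number standing in for DATEDIFF(DAY, rn, Z):

```python
import sqlite3

# Sketch only: SQLite (3.25+) instead of SQL Server; dates are ISO text with an
# assumed year of 2022. julianday(z) - rn plays the role of DATEDIFF(DAY, rn, Z).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (x TEXT, y INTEGER, z TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("A", 1, "2022-01-01"), ("A", 1, "2022-01-02"), ("A", 1, "2022-01-03"),
     ("B", 3, "2022-01-01"), ("B", 3, "2022-01-02"), ("A", 1, "2022-01-05")],
)

rows = conn.execute("""
    SELECT x, y, z,
           ROW_NUMBER() OVER (PARTITION BY x, grp ORDER BY z) AS rn
    FROM (
        SELECT x, y, z,
               -- date minus its position: constant within a run of consecutive days
               julianday(z) - ROW_NUMBER() OVER (PARTITION BY x ORDER BY z) AS grp
        FROM yourTable
    ) AS t
    ORDER BY x, z
""").fetchall()

for row in rows:
    print(row)
```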
If the Z column's data type is not date, you can generate the groups of consecutive values as follows:
SELECT X, Y, Z,
       ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
    SELECT *,
           CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) -
           ROW_NUMBER() OVER (
               PARTITION BY X
               ORDER BY SUBSTRING(Z, CHARINDEX('-', Z) + 1, LEN(Z)),
                        CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)
           ) AS GRP
    FROM table_name2
) T
ORDER BY X,
         MONTH(SUBSTRING(Z, CHARINDEX('-', Z) + 1, LEN(Z)) + ' 1 1'),
         CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT)
See a demo.

I solved this problem using PostgreSQL. Extract the logic and convert it to your SQL dialect.
DDL statement:
create table demo
(
  x varchar(10) not null,
  y int not null,
  z date
);
insert into demo(x,y,z) values
('A',1,'2022-01-01'),
('A',1,'2022-01-02'),
('A',1,'2022-01-03'),
('B',3,'2022-01-01'),
('B',3,'2022-01-02'),
('A',1,'2022-01-05');
query:
with base_data as (
  select x, y, z,
         row_number() over (partition by x, y order by z) as sno
  from demo
),
staging_data as (
  select x, y, z,
         z - coalesce(lag(z) over (partition by x, y order by z), z-1::INT) as diff
  from base_data
)
select x, y, z,
       row_number() over (partition by x, diff order by z)
from staging_data
order by x, z;
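For SQL Server, replace z-1::INT with DATEADD(day, -1, z), and compute the difference with DATEDIFF(day, previous_z, z) instead of subtracting dates, because SQL Server does not support the - operator on date values. SQL Server also requires an ORDER BY inside OVER() for LAG() and ROW_NUMBER(). With those changes it should work in SQL Server.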
output:
x|y|z |row_number|
-+-+----------+----------+
A|1|2022-01-01| 1|
A|1|2022-01-02| 2|
A|1|2022-01-03| 3|
A|1|2022-01-05| 1|
B|3|2022-01-01| 1|
B|3|2022-01-02| 2|

Related

How to extract the first 5 rows for each id user in Oracle? [duplicate]

I have a single table from which I want to extract the first 5 rows for each ID_USER.
I execute the following query:
SELECT
ID_USER as U_ID_USER,
COUNT(ID_USER) as NUM_TIC
FROM
TABLE_USERS
GROUP BY ID_USER
ORDER BY ID_USER
Which returns the following information when run:
U_ID_USER NUM_TIC
16469 34
29012 4
33759 2
Then I want to plug each ID_USER value into the following query to extract only the first 5 rows:
SELECT *
FROM(
SELECT
DATE,
ID_USER,
C1,
C2,
C3
FROM
TABLE_USERS
WHERE
ID_USER = '16469'
ORDER BY ID_USER)
WHERE ROWNUM < 6;
For example, for ID_USER 16469 it returns the following when run:
DATE ID_USER C1 C2 C3
13/12/17 16469 X X X
11/12/17 16469 X X X
07/12/17 16469 X X X
04/12/17 16469 X X X
01/12/17 16469 X X X
What I want is an automatic process in PL/SQL, or a query, that gives me output like this:
DATE ID_USER C1 C2 C3
13/12/17 16469 X X X
11/12/17 16469 X X X
07/12/17 16469 X X X
04/12/17 16469 X X X
01/12/17 16469 X X X
25/12/17 29012 X X X
20/12/17 29012 X X X
15/11/17 29012 X X X
10/11/17 29012 X X X
18/12/17 33759 X X X
15/12/17 33759 X X X
Is it possible to get this output with PL/SQL or with a query?
Use ROW_NUMBER():
SELECT date, id_user, c1, c2, c3
FROM (SELECT u.*,
ROW_NUMBER() OVER (PARTITION BY id_user ORDER BY date DESC) as seqnum
FROM table_users u
)
WHERE seqnum <= 5;
A ROWNUM filter limits the whole result set to that many rows. ROW_NUMBER() is different: it is a function that enumerates the rows, starting again at 1 for each id_user (because of the PARTITION BY clause). The row with the most recent date gets 1, the next most recent gets 2, and so on (because of the ORDER BY clause).
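The top-N-per-group pattern can be checked with a minimal sketch, assuming SQLite 3.25+ through Python's sqlite3 instead of Oracle. The rows are hypothetical, and dt stands in for the DATE column because DATE is a reserved word in Oracle.

```python
import sqlite3

# Sketch only: SQLite (3.25+) instead of Oracle. "dt" replaces the DATE column
# and the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_users (dt TEXT, id_user INTEGER, c1 TEXT)")
conn.executemany("INSERT INTO table_users VALUES (?, ?, ?)", [
    ("2017-11-28", 16469, "X"), ("2017-12-01", 16469, "X"),
    ("2017-12-04", 16469, "X"), ("2017-12-07", 16469, "X"),
    ("2017-12-11", 16469, "X"), ("2017-12-13", 16469, "X"),
    ("2017-12-15", 33759, "X"), ("2017-12-18", 33759, "X"),
])

rows = conn.execute("""
    SELECT dt, id_user, c1
    FROM (SELECT u.*,
                 -- numbering restarts at 1 for each user, newest date first
                 ROW_NUMBER() OVER (PARTITION BY id_user ORDER BY dt DESC) AS seqnum
          FROM table_users u)
    WHERE seqnum <= 5
    ORDER BY id_user, dt DESC
""").fetchall()

for row in rows:
    print(row)
```

User 16469 keeps only its five newest rows, while 33759 keeps both of its rows, which is the per-group behaviour a plain ROWNUM filter cannot give.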
This can be done with ROW_NUMBER():
SELECT DATECOL,ID_USER,C1,C2,C3
FROM (SELECT
T.*
,ROW_NUMBER() OVER(PARTITION BY ID_USER ORDER BY DATECOL DESC) AS RNUM
FROM TABLE_USERS T
) T
WHERE RNUM < 6

Select values from different rows of same column INTO multiple variables in Oracle SQL

Here's an example:
ID | value
1 51
2 25
3 11
4 27
5 21
I need to get the first three values and place them into variables, e.g. out_x, out_y, out_z.
Is it possible to do it without multiple selects?
You can do something like this:
select max(case when id = 1 then value end),
max(case when id = 2 then value end),
max(case when id = 3 then value end)
into out_x, out_y, out_z
from t
where id in (1, 2, 3);
However, I think three queries of the form:
select value into out_x
from t
where id = 1;
is a cleaner approach.
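The single-query version relies on conditional aggregation: each MAX(CASE ...) sees exactly one non-NULL value, so one scan fills all three outputs. A minimal sketch, assuming SQLite through Python's sqlite3 instead of Oracle, where unpacking the result row into three Python names plays the role of SELECT ... INTO:

```python
import sqlite3

# Sketch only: SQLite instead of Oracle; tuple unpacking of the single result
# row stands in for SELECT ... INTO out_x, out_y, out_z.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 51), (2, 25), (3, 11), (4, 27), (5, 21)])

# Each MAX(CASE ...) ignores the NULLs produced for the other ids,
# so it returns the single value belonging to its id.
out_x, out_y, out_z = conn.execute("""
    SELECT MAX(CASE WHEN id = 1 THEN value END),
           MAX(CASE WHEN id = 2 THEN value END),
           MAX(CASE WHEN id = 3 THEN value END)
    FROM t
    WHERE id IN (1, 2, 3)
""").fetchone()

print(out_x, out_y, out_z)
```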
You can use a PIVOT:
SELECT x, y, z
INTO out_x, out_y, out_z
FROM your_table
PIVOT ( MAX( value ) FOR id IN ( 1 AS x, 2 AS y, 3 AS z ) )
Or, if you do not know which IDs you need (but just want the first 3) then:
SELECT x, y, z
INTO out_x, out_y, out_z
FROM (
SELECT value, ROWNUM AS rn
FROM ( SELECT value FROM your_table ORDER BY id )
WHERE ROWNUM <= 3
)
PIVOT ( MAX( value ) FOR rn IN ( 1 AS x, 2 AS y, 3 AS z ) )
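SQLite has no PIVOT clause, so this sketch (Python's sqlite3, SQLite 3.25+ assumed) swaps in conditional aggregation over ROW_NUMBER() to get the same "first three by id" result. The ids are hypothetically renumbered to 10, 20, 30, ... to show that the query does not depend on knowing them.

```python
import sqlite3

# Sketch only: SQLite (3.25+) has no PIVOT, so ROW_NUMBER() plus
# MAX(CASE ...) replaces it. The ids here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(10, 51), (20, 25), (30, 11), (40, 27), (50, 21)])

out_x, out_y, out_z = conn.execute("""
    SELECT MAX(CASE WHEN rn = 1 THEN value END) AS x,
           MAX(CASE WHEN rn = 2 THEN value END) AS y,
           MAX(CASE WHEN rn = 3 THEN value END) AS z
    FROM (
        -- number rows by id, whatever the actual id values are
        SELECT value, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM t
    )
    WHERE rn <= 3
""").fetchone()

print(out_x, out_y, out_z)
```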

Getting both an individual value and a sum from a table in BigQuery

Suppose that, after a query on a bigger dataset, I have a table like this:
day x y
1 4 5
2 3 6
3 3 2
4 2 1
5 8 3
From that table I want to get the values of x and y for day 1, and the sums of x and y over all days, into a new table. How can I get the results as a table with two rows instead of just one, like this:
x y
day1 4 5
days1-5 20 17
The best I can do right now is this:
SELECT
SUM(x) AS allx,
SUM(y) AS ally,
SUM(CASE WHEN day = 1 THEN x END) AS day1x,
SUM(CASE WHEN day = 1 THEN y END) AS day1y
FROM (
..
..
)
I guess there is a more clever way of doing this.
BigQuery - Legacy SQL:
Using comma style UNION ALL
SELECT
day, x, y
FROM
( SELECT 'day1' AS day, x, y
FROM YourTable
WHERE day = 1 ),
( SELECT
CONCAT('day1-',STRING(COUNT(1))) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable )
OR
Using ROLLUP
SELECT
CONCAT('day_', IFNULL(STRING(day), 'all')) AS day,
x,
y
FROM (
SELECT
DAY,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
GROUP BY ROLLUP(day)
)
WHERE IFNULL(day, 1) = 1
BigQuery - Standard SQL:
Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.
SELECT
'day1' AS day,
x,
y
FROM YourTable
WHERE day = 1
UNION ALL
SELECT
FORMAT('day1-%d', COUNT(1)) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
Output from all of them is as expected (the ROLLUP version labels its rows day_1 and day_all instead):
day x y
day1 4 5
day1-5 20 17
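The UNION ALL idea (one detail row stacked on one aggregate row) can be sketched in SQLite through Python's sqlite3 instead of BigQuery. Here || replaces CONCAT()/FORMAT(), and an ORDER BY on the label column (named label here as an assumption, to avoid clashing with the day column) makes the row order explicit.

```python
import sqlite3

# Sketch only: SQLite instead of BigQuery; || replaces CONCAT()/FORMAT().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (day INTEGER, x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)",
                 [(1, 4, 5), (2, 3, 6), (3, 3, 2), (4, 2, 1), (5, 8, 3)])

rows = conn.execute("""
    SELECT 'day1' AS label, x, y FROM YourTable WHERE day = 1
    UNION ALL
    -- one aggregate row over all days, labelled with the number of days
    SELECT 'day1-' || COUNT(*), SUM(x), SUM(y) FROM YourTable
    ORDER BY label
""").fetchall()

for row in rows:
    print(row)
```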

Query to return first date of missing date ranges

I'm looking for help with a query on SQL Server 2008 R2. I have a table with client and date fields. Most clients have a record for most dates, but some don't.
For example I have this data:
CLIENTID DT
1 5/1/14
1 5/2/14
2 5/3/14
3 5/1/14
3 5/2/14
I can find the missing dates for each CLIENTID by creating a temp table with all possible dates for the period and then joining that to each CLIENTID and DT and only selecting where there is a NULL.
This is what I can get easily for the date range 5/1/14 to 5/4/14:
CLIENTID DTMISSED
1 5/3/14
1 5/4/14
2 5/1/14
2 5/2/14
2 5/4/14
3 5/3/14
3 5/4/14
However, I want to group each run of consecutive missed dates together and get the start date and length of each run.
For example, if I use the date range 5/1/14 to 5/4/14 I'd like to get:
CLIENTID DTSTART MISSED
1 5/3/14 2
2 5/1/14 2
2 5/4/14 1
3 5/3/14 2
Thanks for helping!
It's fascinating how much more elegantly, and also more efficiently, this kind of problem can be solved in SQL Server 2012.
First, the tables:
create table #t (CLIENTID int, DT date)
go
insert #t values
(1, '5/1/14'),
(1, '5/2/14'),
(2, '5/3/14'),
(3, '5/1/14'),
(3, '5/2/14')
go
create table #calendar (dt date)
go
insert #calendar values ('5/1/14'),('5/2/14'),('5/3/14'),('5/4/14')
go
Here's the 2008 solution:
;with x as (
    select *, row_number() over (order by clientid, dt) as rn
    from #calendar c
    cross join (select distinct clientid from #t) x
    where not exists (select * from #t where c.dt = #t.dt and x.clientid = #t.clientid)
),
y as (
    select x1.*, x2.dt as x2_dt, x2.clientid as x2_clientid
    from x x1
    left join x x2 on x1.clientid = x2.clientid and x1.dt = dateadd(day, 1, x2.dt)
),
z as (
    select *,
           (select sum(case when x2_dt is null then 1 else 0 end)
            from y y2
            where y2.rn <= y.rn) as grp
    from y
)
select clientid, min(dt) as dtstart, count(*) as missed
from z
group by clientid, grp
order by clientid, dtstart
Compare it with the 2012 version, which replaces the correlated subquery with a windowed SUM():
;with x as (
    select *, row_number() over (order by dt) as rn
    from #calendar c
    cross join (select distinct clientid from #t) x
    where not exists (select * from #t where c.dt = #t.dt and x.clientid = #t.clientid)
),
y as (
    select x1.*,
           sum(case when x2.dt is null then 1 else 0 end)
               over (order by x1.clientid, x1.dt) as grp
    from x x1
    left join x x2 on x1.clientid = x2.clientid and x1.dt = dateadd(day, 1, x2.dt)
)
select clientid, min(dt) as dtstart, count(*) as missed
from y
group by clientid, grp
order by clientid, dtstart
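To check the 2012-style approach against the expected output in the question, here is a minimal sketch in SQLite (3.25+ assumed) through Python's sqlite3. The temp tables become ordinary tables, dates are stored as ISO-8601 text, and date(dt, '+1 day') replaces DATEADD(day, 1, dt).

```python
import sqlite3

# Sketch only: SQLite (3.25+) instead of SQL Server 2012; #t/#calendar become
# plain tables t/calendar and date(dt, '+1 day') replaces DATEADD(day, 1, dt).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (clientid INTEGER, dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2014-05-01"), (1, "2014-05-02"), (2, "2014-05-03"),
    (3, "2014-05-01"), (3, "2014-05-02"),
])
conn.execute("CREATE TABLE calendar (dt TEXT)")
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [("2014-05-01",), ("2014-05-02",), ("2014-05-03",), ("2014-05-04",)])

rows = conn.execute("""
    WITH missing AS (
        -- every (client, calendar date) pair with no row in t
        SELECT cl.clientid, c.dt
        FROM calendar c
        CROSS JOIN (SELECT DISTINCT clientid FROM t) cl
        WHERE NOT EXISTS (SELECT 1 FROM t
                          WHERE t.dt = c.dt AND t.clientid = cl.clientid)
    ),
    grouped AS (
        -- a missed date whose previous day was not missed starts a new group;
        -- the running SUM of those starts is the group id
        SELECT m1.clientid, m1.dt,
               SUM(CASE WHEN m2.dt IS NULL THEN 1 ELSE 0 END)
                   OVER (ORDER BY m1.clientid, m1.dt) AS grp
        FROM missing m1
        LEFT JOIN missing m2
          ON m1.clientid = m2.clientid AND m1.dt = date(m2.dt, '+1 day')
    )
    SELECT clientid, MIN(dt) AS dtstart, COUNT(*) AS missed
    FROM grouped
    GROUP BY clientid, grp
    ORDER BY clientid, dtstart
""").fetchall()

for row in rows:
    print(row)
```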

How to count most consecutive occurrences of a value in a Column in SQL Server

I have a table Attendance in my database.
Date | Present
------------------------
20/11/2013 | Y
21/11/2013 | Y
22/11/2013 | N
23/11/2013 | Y
24/11/2013 | Y
25/11/2013 | Y
26/11/2013 | Y
27/11/2013 | N
28/11/2013 | Y
I want to find the longest run of consecutive occurrences of a value (Y or N).
For example, in the table above Y occurs in runs of 2, 4 and 1, so I want 4 as my result.
How to achieve this in SQL Server?
Any help will be appreciated.
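Try this:
Within a run of consecutive dates, the date minus its ROW_NUMBER() (numbered separately for each value of Present) stays constant, so it can be used to group each run: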
Select max(Sequence)
from
(
    select present, count(*) as Sequence,
           min(date) as MinDt, max(date) as MaxDt
    from (
        select t.Present, t.Date,
               dateadd(day,
                       -(row_number() over (partition by present order by date)),
                       date
               ) as grp
        from Table1 t
    ) t
    group by present, grp
) a
where Present = 'Y'
SQL FIDDLE
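A minimal sketch of the same grouping in SQLite (3.25+ assumed) through Python's sqlite3, with the question's dates rewritten as ISO-8601 text and julianday(d) minus the row number standing in for the DATEADD expression:

```python
import sqlite3

# Sketch only: SQLite (3.25+) instead of SQL Server; dates converted to ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (d TEXT, present TEXT)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2013-11-20", "Y"), ("2013-11-21", "Y"), ("2013-11-22", "N"),
    ("2013-11-23", "Y"), ("2013-11-24", "Y"), ("2013-11-25", "Y"),
    ("2013-11-26", "Y"), ("2013-11-27", "N"), ("2013-11-28", "Y"),
])

longest_y = conn.execute("""
    SELECT MAX(seq)
    FROM (
        SELECT present, COUNT(*) AS seq
        FROM (
            SELECT present,
                   -- date minus its position among rows with the same value:
                   -- constant inside a run of consecutive days
                   julianday(d) - ROW_NUMBER() OVER (PARTITION BY present ORDER BY d) AS grp
            FROM attendance
        )
        GROUP BY present, grp
    )
    WHERE present = 'Y'
""").fetchone()[0]

print(longest_y)
```

This assumes one row per calendar day; a missing date would split a run even if the value did not change.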
You can do this with a recursive CTE:
;WITH cte AS (
    SELECT Date, Present, ROW_NUMBER() OVER (ORDER BY Date) RN
    FROM Table1
),
cte2 AS (
    SELECT Date, Present, RN, ct = 1
    FROM cte
    WHERE RN = 1
    UNION ALL
    SELECT a.Date, a.Present, a.RN,
           ct = CASE WHEN a.Present = b.Present THEN b.ct + 1 ELSE 1 END
    FROM cte a
    JOIN cte2 b
      ON a.RN = b.RN + 1
)
SELECT TOP 1 *
FROM cte2
ORDER BY ct DESC
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
Demo: SQL Fiddle
Note: the dates in the demo were changed because of the format in which they were posted in the question.
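The recursive CTE walks the rows in date order, carrying a counter that grows while Present repeats and resets to 1 when it changes. A minimal sketch of the same recursion in SQLite (3.25+ assumed, for ROW_NUMBER()) through Python's sqlite3, with the question's dates rewritten as ISO-8601 text and LIMIT 1 in place of TOP 1:

```python
import sqlite3

# Sketch only: SQLite (3.25+) instead of SQL Server; LIMIT 1 replaces TOP 1
# and dates are ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance (d TEXT, present TEXT)")
conn.executemany("INSERT INTO attendance VALUES (?, ?)", [
    ("2013-11-20", "Y"), ("2013-11-21", "Y"), ("2013-11-22", "N"),
    ("2013-11-23", "Y"), ("2013-11-24", "Y"), ("2013-11-25", "Y"),
    ("2013-11-26", "Y"), ("2013-11-27", "N"), ("2013-11-28", "Y"),
])

best = conn.execute("""
    WITH RECURSIVE
    numbered AS (
        SELECT d, present, ROW_NUMBER() OVER (ORDER BY d) AS rn
        FROM attendance
    ),
    counted(d, present, rn, ct) AS (
        -- anchor: the first day starts a run of length 1
        SELECT d, present, rn, 1 FROM numbered WHERE rn = 1
        UNION ALL
        -- step: extend the run if the value repeats, otherwise restart at 1
        SELECT a.d, a.present, a.rn,
               CASE WHEN a.present = b.present THEN b.ct + 1 ELSE 1 END
        FROM numbered a
        JOIN counted b ON a.rn = b.rn + 1
    )
    SELECT d, present, ct FROM counted ORDER BY ct DESC LIMIT 1
""").fetchone()

print(best)
```

Unlike the ROW_NUMBER() grouping answer, this one considers both Y and N and reports the last day of the longest run.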