Sum over the rows using SQL but we need to stop and start the sum at specific condition - sql

Here is an example of the data I have and the output I want in SQL.
id  date        flag
--  ----------  ----
a   2022-04-05  0
a   2022-04-06  1
a   2022-04-07  1
a   2022-04-08  1
a   2022-04-09  0
a   2022-04-10  0
a   2022-04-11  1
a   2022-04-12  1
a   2022-04-13  1
a   2022-04-14  1
a   2022-04-15  0
a   2022-04-16  0
b   2022-04-05  0
b   2022-04-06  1
b   2022-04-07  1
b   2022-04-08  0
Desired Output
id  date        flag  count
--  ----------  ----  -----
a   2022-04-05  0     0
a   2022-04-06  1     1
a   2022-04-07  1     2
a   2022-04-08  1     3
a   2022-04-09  0     0
a   2022-04-10  0     0
a   2022-04-11  1     1
a   2022-04-12  1     2
a   2022-04-13  1     3
a   2022-04-14  1     4
a   2022-04-15  0     0
a   2022-04-16  0     0
b   2022-04-05  0     0
b   2022-04-06  1     1
b   2022-04-07  1     2
b   2022-04-08  0     0
Basically the increment should start if the value of flag is 1 and continue incrementing until a flag of 0 is reached, then continue incrementing from the next flag of 1 until the next 0, and so on.

This is a gaps and islands problem. One approach uses the difference in row numbers method:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) AS rn2
    FROM yourTable
)
SELECT id, date, flag,
       SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2 ORDER BY date) AS count
FROM cte
ORDER BY id, date;
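To see why rn1 - rn2 identifies each run of equal flags, the query can be tried on the sample data. The following is only a sketch that runs it through Python's built-in sqlite3 module (it assumes a SQLite build with window functions, 3.25 or later); the extra island column and the name running_count are added here for illustration and are not part of the answer.

```python
import sqlite3

# Sample data from the question: one row per id and consecutive date.
A_FLAGS = [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
B_FLAGS = [0, 1, 1, 0]
ROWS = [("a", f"2022-04-{5 + i:02d}", f) for i, f in enumerate(A_FLAGS)]
ROWS += [("b", f"2022-04-{5 + i:02d}", f) for i, f in enumerate(B_FLAGS)]

QUERY = """
WITH cte AS (
    SELECT id, date, flag,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) AS rn2
    FROM yourTable
)
SELECT id, date, flag,
       rn1 - rn2 AS island,  -- constant within a run of equal flags
       SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2 ORDER BY date) AS running_count
FROM cte
ORDER BY id, date
"""


def running_counts(rows):
    """Load rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id TEXT, date TEXT, flag INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in running_counts(ROWS):
        print(row)
```

Within one id, rn1 grows by one on every row while rn2 grows only on rows with the same flag, so the difference stays fixed inside a run and changes once the flag flips. Partitioning by id, flag and that difference therefore restarts the running sum at every run.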

Related

How to count consecutive days in a table where days are duplicated (PostgreSQL)

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
The closest I could get is this query, which only works if the user has trained on a single ground each day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
      FROM logs
      WHERE created_at >= '2023-01-24 00:00:00'
        AND user_id = 1) x
GROUP BY grp
logs table:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        3    2          2023-01-26 10:00:00
5   1        4    1          2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user trained on days 1, 2, 3 and 4 in a row.
Thank you for reading.
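Reduce the data to one row per day first, so that a day with several training sessions is counted only once:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
      FROM (
          -- one row per day; DISTINCT day, created_at would still keep
          -- duplicates when the sessions of a day have different timestamps
          SELECT day, min(created_at) AS created_at
          FROM logs
          WHERE created_at >= '2023-01-24 00:00:00'
            AND user_id = 1
          GROUP BY day) t
     ) x
GROUP BY grp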
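The query above returns one row per streak, while the question asks for the highest count. A minimal sketch of that last step, run through Python's sqlite3 module rather than PostgreSQL: it collapses the log to one row per day, groups by day minus row number, and keeps the longest streak. The row with day 6 is a hypothetical addition to the question's data so that there is a gap to detect, and longest_streak and streak_start are names chosen for this example.

```python
import sqlite3

# Second sample from the question (two sessions on day 3),
# plus one hypothetical row on day 6 to create a gap.
LOGS = [
    (1, 1, 1, 1, "2023-01-24 10:00:00"),
    (2, 1, 2, 1, "2023-01-25 10:00:00"),
    (3, 1, 3, 1, "2023-01-26 10:00:00"),
    (4, 1, 3, 2, "2023-01-26 10:00:00"),
    (5, 1, 4, 1, "2023-01-27 10:00:00"),
    (6, 1, 6, 1, "2023-01-29 10:00:00"),  # hypothetical, not in the question
]

QUERY = """
SELECT MIN(created_at) AS streak_start, COUNT(*) AS days_in_row
FROM (
    SELECT created_at,
           day - ROW_NUMBER() OVER (ORDER BY day) AS grp  -- constant per streak
    FROM (
        SELECT day, MIN(created_at) AS created_at  -- one row per day
        FROM logs
        WHERE user_id = ? AND created_at >= ?
        GROUP BY day
    ) AS t
) AS x
GROUP BY grp
ORDER BY days_in_row DESC
LIMIT 1
"""


def longest_streak(rows, user_id, since):
    """Return (streak_start, days_in_row) for the user's longest streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER, user_id INTEGER, day INTEGER, "
        "ground_id INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (user_id, since)).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(longest_streak(LOGS, 1, "2023-01-24 00:00:00"))
```

Like the queries above, this relies on day being a gapless day counter; if only created_at were available, the grouping key would have to be derived from the date instead.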

Query to get First Value and Second value with Filter

I have the following need but I am not able to get an effective query:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
--  ----------  ------  ------  ----  ----------  ----------
1   2021-10-15  28      3       R     2021-10-15  2021-10-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
3   2021-12-15  30      3       R     2021-12-15  2021-12-15
4   2022-01-15  31      3       R     2022-01-15  2022-01-15
5   2022-02-15  32      3       R     2022-02-15  2022-02-15
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
7   2022-04-15  34      0       R     1900-01-01  2022-04-15
8   2022-05-15  35      0       R     1900-01-01  2022-05-15
9   2022-06-15  36      0       R     1900-01-01  2022-06-15
10  2022-07-15  37      3       R     2022-07-15  2022-07-15
With the data in the table above, I need the following result:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
--  ----------  ------  ------  ----  ----------  ----------
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
I need to list the first row with STATUS = 0 that appears right after a row with STATUS = 3, and also the second such occurrence (again a STATUS = 0 row that follows a STATUS = 3 row), ordered from the most recent date to the oldest. Here 2022-03-15 is the most recent date that meets this condition and 2021-11-15 is the older one.
My query only finds rows with STATUS = 3, but I need the equivalent for STATUS = 0.
with TopDates as
(select row_number() over (order by DT desc) as Row, *
 from DBO.TABLE
 WHERE DT < GETDATE ()
   AND DT_PAY <> '1900-01-01'
   AND STATUS = '3'
)
select
   TB.ID
  ,TB.DATE
  ,TB.PARCEL
  ,TB.STATUS
  ,TB.DT_PAY
  ,TB.DT
from TopDates TB
where Row<=2
Just add an OR clause in there?
Or am I not understanding you correctly?
with TopDates as
(select row_number() over (order by DT desc) as Row, *
 from DBO.TABLE
 WHERE DT < GETDATE ()
   AND DT_PAY <> '1900-01-01'
   AND STATUS = '3'
   OR STATUS = '0'
)
select
   TB.ID
  ,TB.DATE
  ,TB.PARCEL
  ,TB.STATUS
  ,TB.DT_PAY
  ,TB.DT
from TopDates TB
where Row<=2
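Two caveats on the OR version: AND binds tighter than OR, so the filter reads as (DT < ... AND DT_PAY <> ... AND STATUS = '3') OR STATUS = '0', and nothing in it checks that a STATUS = 0 row comes right after a STATUS = 3 row. One possible alternative, shown here only as a sketch, compares each row with the previous one using LAG. It runs on SQLite through Python's sqlite3 module; the table name parcels, the reduced column list, the integer status values and the cutoff parameter (standing in for GETDATE()) are assumptions of this example, and on SQL Server the final LIMIT 2 would be written as TOP (2).

```python
import sqlite3

# (id, parcel, status, dt_pay, dt) taken from the question's table.
PARCELS = [
    (1, 28, 3, "2021-10-15", "2021-10-15"),
    (2, 29, 0, "1900-01-01", "2021-11-15"),
    (3, 30, 3, "2021-12-15", "2021-12-15"),
    (4, 31, 3, "2022-01-15", "2022-01-15"),
    (5, 32, 3, "2022-02-15", "2022-02-15"),
    (6, 33, 0, "1900-01-01", "2022-03-15"),
    (7, 34, 0, "1900-01-01", "2022-04-15"),
    (8, 35, 0, "1900-01-01", "2022-05-15"),
    (9, 36, 0, "1900-01-01", "2022-06-15"),
    (10, 37, 3, "2022-07-15", "2022-07-15"),
]

QUERY = """
WITH marked AS (
    SELECT id, parcel, status, dt,
           LAG(status) OVER (ORDER BY dt) AS prev_status  -- status of the row before
    FROM parcels
    WHERE dt < ?
)
SELECT id, dt, parcel, status
FROM marked
WHERE status = 0 AND prev_status = 3  -- a 0 that directly follows a 3
ORDER BY dt DESC
LIMIT 2
"""


def first_unpaid_after_paid(rows, cutoff):
    """Return the two most recent status-0 rows that directly follow a status-3 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parcels (id INTEGER, parcel INTEGER, status INTEGER, "
        "dt_pay TEXT, dt TEXT)"
    )
    conn.executemany("INSERT INTO parcels VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_unpaid_after_paid(PARCELS, "2022-08-01"):
        print(row)
```

On the sample data this picks IDs 6 and 2, the same rows as the result requested in the question; IDs 7, 8 and 9 are skipped because the row before each of them also has STATUS = 0.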

Period and Quarter Sequence

I'm trying to find a way to build a sequence for date periods and quarters (not sure if this is the correct term).
Basically this will help people to navigate dates based on weeks, periods, and quarters once I join this to our sales data. For example, if I just want to know the sales from last week, I could just use WHERE WeekSequence = -1... Another example is, a manager wants to get the sales data for the past quarter, I could just use WHERE QuarterSequence = -1... something like that.
My current table:
WeekStartDate WeekEndDate CurrentWeek Period Quarter WeekSequence
----------------------------------------------------------------------
2020-08-03 2020-08-09 0 2 1 -5
2020-08-10 2020-08-16 0 2 1 -4
2020-08-17 2020-08-23 0 2 1 -3
2020-08-24 2020-08-30 0 2 1 -2
2020-08-31 2020-09-06 0 2 1 -1
2020-09-07 2020-09-13 1 3 1 0
2020-09-14 2020-09-20 0 3 1 1
2020-09-21 2020-09-27 0 3 1 2
2020-09-28 2020-10-04 0 3 1 3
2020-10-05 2020-10-11 0 4 2 4
2020-10-12 2020-10-18 0 4 2 5
What I want it to look like (highlighted):
If I understand correctly, just use window functions:
select t.*,
       (period -
        max(case when currentweek = 1 then period end) over ()
       ) as periodsequence,
       (quarter -
        max(case when currentweek = 1 then quarter end) over ()
       ) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
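To check what these expressions produce, here is a small sketch that runs the same window expressions on the question's rows through Python's sqlite3 module. Only the columns the query needs are loaded, and the helper name sequences is made up for this example. It also assumes that period and quarter numbers keep increasing over time; if they restart every year, the plain subtraction would need a year component, which the sample data does not show.

```python
import sqlite3

# (weekstartdate, currentweek, period, quarter) from the question's table.
WEEKS = [
    ("2020-08-03", 0, 2, 1), ("2020-08-10", 0, 2, 1), ("2020-08-17", 0, 2, 1),
    ("2020-08-24", 0, 2, 1), ("2020-08-31", 0, 2, 1), ("2020-09-07", 1, 3, 1),
    ("2020-09-14", 0, 3, 1), ("2020-09-21", 0, 3, 1), ("2020-09-28", 0, 3, 1),
    ("2020-10-05", 0, 4, 2), ("2020-10-12", 0, 4, 2),
]

QUERY = """
SELECT weekstartdate,
       period - MAX(CASE WHEN currentweek = 1 THEN period END) OVER ()
           AS periodsequence,
       quarter - MAX(CASE WHEN currentweek = 1 THEN quarter END) OVER ()
           AS quartersequence
FROM t
ORDER BY weekstartdate
"""


def sequences(rows):
    """Return (weekstartdate, periodsequence, quartersequence) per week."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (weekstartdate TEXT, currentweek INTEGER, "
        "period INTEGER, quarter INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in sequences(WEEKS):
        print(row)
```

The empty OVER () makes the MAX look at every row, and the CASE keeps only the value from the current week, so each row is compared against the current period or quarter. With the sample data, the period 2 weeks get periodsequence -1, the period 3 weeks get 0 and the period 4 weeks get 1.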

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date, grouping on the person and the type.
The data is stored in SQL Server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
    ROW_NUMBER() OVER (ORDER BY date) AS seq,
    person,
    date,
    type
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem, so you can take the difference of two row_number() values and use it in the grouping:
select person, min(date) as start, type
from (select *,
             row_number() over (partition by person order by seq) seq1,
             row_number() over (partition by person, type order by seq) seq2
      from table
     ) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
             row_number() over (partition by person order by seq) as seqnum_p,
             row_number() over (partition by person, type order by seq) as seqnum_pt
      from t
     ) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.
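As a quick check of the difference-of-row-numbers grouping, the sketch below loads the question's 18 rows into an in-memory SQLite database via Python's sqlite3 module and runs the same query shape. The table name events and the alias start_date are placeholders chosen for this example.

```python
import sqlite3

# (seq, person, date, type) exactly as listed in the question.
EVENTS = [
    (1, 1, "2018-02-10 08:00:00", 1), (2, 1, "2018-02-11 08:00:00", 1),
    (3, 1, "2018-02-12 08:00:00", 1), (4, 1, "2018-02-14 16:00:00", 1),
    (5, 1, "2018-02-15 16:00:00", 1), (6, 1, "2018-02-16 16:00:00", 1),
    (7, 1, "2018-02-20 08:00:00", 2), (8, 1, "2018-02-21 08:00:00", 2),
    (9, 1, "2018-02-22 08:00:00", 2), (10, 1, "2018-02-23 08:00:00", 1),
    (11, 1, "2018-02-24 08:00:00", 1), (12, 1, "2018-02-25 08:00:00", 2),
    (13, 2, "2018-02-10 08:00:00", 1), (14, 2, "2018-02-11 08:00:00", 1),
    (15, 2, "2018-02-12 08:00:00", 1), (16, 2, "2018-02-14 16:00:00", 3),
    (17, 2, "2018-02-15 16:00:00", 3), (18, 2, "2018-02-16 16:00:00", 3),
]

QUERY = """
SELECT person, MIN(date) AS start_date, type
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY seq) AS seqnum_p,
           ROW_NUMBER() OVER (PARTITION BY person, type ORDER BY seq) AS seqnum_pt
    FROM events AS e
) AS numbered
GROUP BY person, type, (seqnum_p - seqnum_pt)  -- one group per unbroken run
ORDER BY person, start_date
"""


def collapse(rows):
    """Return one (person, start_date, type) row per unbroken run of a type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (seq INTEGER, person INTEGER, date TEXT, type INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse(EVENTS):
        print(row)
```

For person 1 the two separate runs of type 1 get differences 0 and 3, which is what keeps them apart even though person and type are the same.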

find nonbreaking period with condition

There is a table of hotel quotas per day. How can I get, for each date, the number of consecutive days the hotel is available starting from that date?
q_id q_hotel q_date q_value
1 1 2013-02-01 1
2 1 2013-02-02 1
3 1 2013-02-03 1
4 1 2013-02-04 0
5 1 2013-02-05 2
6 1 2013-02-06 3
7 1 2013-02-07 3
8 1 2013-02-08 2
9 1 2013-02-09 0
10 1 2013-02-10 0
11 1 2013-02-11 1
12 1 2013-02-12 1
Wanted output:
q_hotel q_date days_available
1 2013-02-01 3
1 2013-02-02 2
1 2013-02-03 1
1 2013-02-04 0
1 2013-02-05 4
1 2013-02-06 3
1 2013-02-07 2
1 2013-02-08 1
1 2013-02-09 0
1 2013-02-10 0
1 2013-02-11 2
1 2013-02-12 1
For now I can get the number of days only if a zero-quota date exists after the date in question: I find the closest unavailable day and calculate the difference between the dates.
http://sqlfiddle.com/#!12/1a64c/14
select q_hotel
,q_date
,(select extract(day from (min(B.q_date)-A.q_date)) from Table1 B where B.q_date>A.q_date
and B.q_value=0 and A.q_value<>0)
from Table1 A
But this fails when there is no later date with a zero quota to close the period.
Here is a solution:
select
    a.q_date
  , a.q_hotel
  , case
      when a.q_value = 0 then 0
      else (
        -- b is the last day of the available run that contains a.q_date
        select extract(day from min(b.q_date) - a.q_date + interval '1 day')
        from table1 b
        where b.q_date >= a.q_date
          and b.q_hotel = a.q_hotel
          and not exists (
            -- a following day for the same hotel that still has quota
            select 1
            from table1 c
            where c.q_date = b.q_date + interval '1 day'
              and c.q_hotel = a.q_hotel
              and c.q_value <> 0
          )
      )
    end as days_available
from table1 a
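The same idea can be tried on the question's data with a rough SQLite translation, run here through Python's sqlite3 module. This is a sketch, not the PostgreSQL answer itself: the date arithmetic uses SQLite's julianday() and date(..., '+1 day') in place of interval, q_date is stored as text, and the helper name days_available is introduced for this example.

```python
import sqlite3

# q_value for hotel 1 on 2013-02-01 .. 2013-02-12, as in the question.
VALUES = [1, 1, 1, 0, 2, 3, 3, 2, 0, 0, 1, 1]
QUOTAS = [(1, f"2013-02-{i + 1:02d}", v) for i, v in enumerate(VALUES)]

QUERY = """
SELECT a.q_hotel, a.q_date,
       CASE WHEN a.q_value = 0 THEN 0
            ELSE (
                -- last day of the run containing a.q_date, then count days inclusively
                SELECT CAST(julianday(MIN(b.q_date)) - julianday(a.q_date) AS INTEGER) + 1
                FROM table1 AS b
                WHERE b.q_hotel = a.q_hotel
                  AND b.q_date >= a.q_date
                  AND NOT EXISTS (
                      -- a following day for the same hotel that still has quota
                      SELECT 1
                      FROM table1 AS c
                      WHERE c.q_hotel = a.q_hotel
                        AND c.q_date = date(b.q_date, '+1 day')
                        AND c.q_value <> 0
                  )
            )
       END AS days_available
FROM table1 AS a
ORDER BY a.q_hotel, a.q_date
"""


def days_available(rows):
    """Return (q_hotel, q_date, days_available) for every quota row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (q_hotel INTEGER, q_date TEXT, q_value INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in days_available(QUOTAS):
        print(row)
```

Because the inner search looks for a day whose next day is either missing or has no quota, the last run (2013-02-11 and 2013-02-12) is handled even though no zero-quota row follows it.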