Sum over the rows using SQL but we need to stop and start the sum at specific condition

Here is an example of the data I have and the output I want in SQL.
id  date        flag
--  ----------  ----
a   2022-04-05  0
a   2022-04-06  1
a   2022-04-07  1
a   2022-04-08  1
a   2022-04-09  0
a   2022-04-10  0
a   2022-04-11  1
a   2022-04-12  1
a   2022-04-13  1
a   2022-04-14  1
a   2022-04-15  0
a   2022-04-16  0
b   2022-04-05  0
b   2022-04-06  1
b   2022-04-07  1
b   2022-04-08  0
Desired Output
id  date        flag  count
--  ----------  ----  -----
a   2022-04-05  0     0
a   2022-04-06  1     1
a   2022-04-07  1     2
a   2022-04-08  1     3
a   2022-04-09  0     0
a   2022-04-10  0     0
a   2022-04-11  1     1
a   2022-04-12  1     2
a   2022-04-13  1     3
a   2022-04-14  1     4
a   2022-04-15  0     0
a   2022-04-16  0     0
b   2022-04-05  0     0
b   2022-04-06  1     1
b   2022-04-07  1     2
b   2022-04-08  0     0
Basically, the count should start at 1 when flag is 1 and keep incrementing until a flag of 0 is reached, where it resets to 0. It then starts again from 1 at the next flag of 1 and runs until the next 0, and so on.

This is a gaps and islands problem. One approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn1,
ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) rn2
FROM yourTable
)
SELECT id, date, flag,
SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2 ORDER BY date) AS count
FROM cte
ORDER BY id, date;
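
The query above can be checked against the sample data from the question. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later). The alias run_count is a stand-in for the answer's count, and storing the dates as ISO text is an assumption of this sketch.

```python
import sqlite3

# Sample rows from the question: (id, date, flag).
rows = [
    ("a", "2022-04-05", 0), ("a", "2022-04-06", 1), ("a", "2022-04-07", 1),
    ("a", "2022-04-08", 1), ("a", "2022-04-09", 0), ("a", "2022-04-10", 0),
    ("a", "2022-04-11", 1), ("a", "2022-04-12", 1), ("a", "2022-04-13", 1),
    ("a", "2022-04-14", 1), ("a", "2022-04-15", 0), ("a", "2022-04-16", 0),
    ("b", "2022-04-05", 0), ("b", "2022-04-06", 1), ("b", "2022-04-07", 1),
    ("b", "2022-04-08", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id TEXT, date TEXT, flag INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT *,
           -- position of the row within its id
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn1,
           -- position of the row within its id and flag value
           ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) AS rn2
    FROM yourTable
)
SELECT id, date, flag,
       -- rn1 - rn2 is constant inside each run of equal flags,
       -- so the running sum restarts with every new run
       SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2 ORDER BY date) AS run_count
FROM cte
ORDER BY id, date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The difference rn1 - rn2 only identifies a run together with flag, which is why both appear in the PARTITION BY of the running sum.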

Related

How to count consecutive days in a table where days are duplicated (PostgreSQL)

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
The closest I could get is this query, which only works if the user has trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
FROM logs
WHERE created_at >= '2023-01-24 00:00:00'
AND user_id = 1) x
GROUP BY grp
logs table:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        4    1          2023-01-27 10:00:00
5   1        5    1          2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id  user_id  day  ground_id  created_at
--  -------  ---  ---------  -------------------
1   1        1    1          2023-01-24 10:00:00
2   1        2    1          2023-01-25 10:00:00
3   1        3    1          2023-01-26 10:00:00
4   1        3    2          2023-01-26 10:00:00
5   1        4    1          2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user has trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.

Select only the distinct data of interest first:
SELECT min(created_at) start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
FROM (
select distinct day, created_at
from logs
where created_at >= '2023-01-24 00:00:00'
AND user_id = 1) t
) x
GROUP BY grp
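
Note that DISTINCT day, created_at only collapses two sessions on the same day when their timestamps are identical, as they happen to be in the sample. A variation that groups by day instead can be sketched as follows. This is a hedged illustration using Python's sqlite3 module (SQLite 3.25 or later assumed) rather than PostgreSQL; the 15:00 timestamp, the alias streak_start, and the final ORDER BY ... LIMIT 1 that keeps only the longest streak are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER, user_id INTEGER, day INTEGER,"
    " ground_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, "2023-01-24 10:00:00"),
        (2, 1, 2, 1, "2023-01-25 10:00:00"),
        (3, 1, 3, 1, "2023-01-26 10:00:00"),
        # Hypothetical change: a second session later on the same day.
        (4, 1, 3, 2, "2023-01-26 15:00:00"),
        (5, 1, 4, 1, "2023-01-27 10:00:00"),
    ],
)

query = """
SELECT MIN(first_seen) AS streak_start, COUNT(*) AS days_in_row
FROM (
    SELECT first_seen,
           -- constant for every day inside one unbroken streak
           ROW_NUMBER() OVER (ORDER BY day) - day AS grp
    FROM (
        -- one row per day, however many sessions were logged
        SELECT day, MIN(created_at) AS first_seen
        FROM logs
        WHERE created_at >= '2023-01-24 00:00:00'
          AND user_id = 1
        GROUP BY day
    ) AS d
) AS x
GROUP BY grp
ORDER BY days_in_row DESC
LIMIT 1
"""
longest = conn.execute(query).fetchone()
print(longest)
```

With the duplicate day collapsed, days 1 to 4 form a single group, so the streak length is 4.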

Query to get First Value and Second value with Filter

I have the following need but I am not able to get an effective query:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
--  ----------  ------  ------  ----  ----------  ----------
1   2021-10-15  28      3       R     2021-10-15  2021-10-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
3   2021-12-15  30      3       R     2021-12-15  2021-12-15
4   2022-01-15  31      3       R     2022-01-15  2022-01-15
5   2022-02-15  32      3       R     2022-02-15  2022-02-15
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
7   2022-04-15  34      0       R     1900-01-01  2022-04-15
8   2022-05-15  35      0       R     1900-01-01  2022-05-15
9   2022-06-15  36      0       R     1900-01-01  2022-06-15
10  2022-07-15  37      3       R     2022-07-15  2022-07-15
With the data in the table above, I need the following result:
ID  DATE        PARCEL  STATUS  TYPE  DT_PAY      DT
--  ----------  ------  ------  ----  ----------  ----------
6   2022-03-15  33      0       R     1900-01-01  2022-03-15
2   2021-11-15  29      0       R     1900-01-01  2021-11-15
I need to list the first row with STATUS = 0 that appears after a row with STATUS = 3, and then the previous occasion on which the same thing happens, ordered from the most recent date to the oldest. In this case, 2022-03-15 is the most recent date on which a STATUS = 0 row follows a STATUS = 3 row, and 2021-11-15 is the older one.
My query only works for finding STATUS = 3 rows, but I need the equivalent for STATUS = 0:
with TopDates as
(select row_number() over (order by DT desc) as Row, *
from DBO.TABLE
WHERE DT < GETDATE ()
AND DT_PAY <> '1900-01-01'
AND STATUS = '3'
)
select
TB.ID
,TB.DATE
,TB.PARCEL
,TB.STATUS
,TB.DT_PAY
,TB.DT
from TopDates TB
where Row<=2

Just add an OR clause in there?
Or am I not understanding you correctly?
with TopDates as
(select row_number() over (order by DT desc) as Row, *
from DBO.TABLE
WHERE DT < GETDATE ()
AND DT_PAY <> '1900-01-01'
AND STATUS = '3'
OR STATUS = '0'
)
select
TB.ID
,TB.DATE
,TB.PARCEL
,TB.STATUS
,TB.DT_PAY
,TB.DT
from TopDates TB
where Row<=2
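
If the requirement is read as "a STATUS = 0 row whose immediately preceding row, by DT, has STATUS = 3", a LAG-based query is another possible approach. The sketch below is an illustration under that reading, not the asker's or the answerer's method. It runs on SQLite through Python's sqlite3 module (3.25 or later assumed); the table name parcels is hypothetical, the DATE and TYPE columns are left out for brevity, STATUS is stored as an integer, and the DT < GETDATE() filter is omitted. On SQL Server, LAG is available as well and TOP (2) would take the place of LIMIT 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; DATE and TYPE columns omitted for brevity.
conn.execute(
    "CREATE TABLE parcels (id INTEGER, parcel INTEGER, status INTEGER,"
    " dt_pay TEXT, dt TEXT)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?, ?)",
    [
        (1, 28, 3, "2021-10-15", "2021-10-15"),
        (2, 29, 0, "1900-01-01", "2021-11-15"),
        (3, 30, 3, "2021-12-15", "2021-12-15"),
        (4, 31, 3, "2022-01-15", "2022-01-15"),
        (5, 32, 3, "2022-02-15", "2022-02-15"),
        (6, 33, 0, "1900-01-01", "2022-03-15"),
        (7, 34, 0, "1900-01-01", "2022-04-15"),
        (8, 35, 0, "1900-01-01", "2022-05-15"),
        (9, 36, 0, "1900-01-01", "2022-06-15"),
        (10, 37, 3, "2022-07-15", "2022-07-15"),
    ],
)

query = """
WITH marked AS (
    -- attach the status of the previous row in date order
    SELECT *, LAG(status) OVER (ORDER BY dt) AS prev_status
    FROM parcels
)
SELECT id, parcel, status, dt_pay, dt
FROM marked
WHERE status = 0 AND prev_status = 3   -- first 0 right after a 3
ORDER BY dt DESC                       -- most recent occurrence first
LIMIT 2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this selects the rows with ID 6 and ID 2, in that order.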

Period and Quarter Sequence

I'm trying to find a way to build a sequence for date periods and quarters (not sure if this is the correct term).
Basically this will help people to navigate dates based on weeks, periods, and quarters once I join this to our sales data. For example, if I just want to know the sales from last week, I could just use WHERE WeekSequence = -1... Another example is, a manager wants to get the sales data for the past quarter, I could just use WHERE QuarterSequence = -1... something like that.
My current table:
WeekStartDate WeekEndDate CurrentWeek Period Quarter WeekSequence
----------------------------------------------------------------------
2020-08-03 2020-08-09 0 2 1 -5
2020-08-10 2020-08-16 0 2 1 -4
2020-08-17 2020-08-23 0 2 1 -3
2020-08-24 2020-08-30 0 2 1 -2
2020-08-31 2020-09-06 0 2 1 -1
2020-09-07 2020-09-13 1 3 1 0
2020-09-14 2020-09-20 0 3 1 1
2020-09-21 2020-09-27 0 3 1 2
2020-09-28 2020-10-04 0 3 1 3
2020-10-05 2020-10-11 0 4 2 4
2020-10-12 2020-10-18 0 4 2 5
What I want it to look like(highlighted):

If I understand correctly, just use window functions:
select t.*,
(period -
max(case when currentweek = 1 then period end) over ()
) as periodsequence,
(quarter -
max(case when currentweek = 1 then quarter end) over ()
) as quartersequence
from t;
You can include this in a view rather than putting it in a table.
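
The window-function query above can be tried on the sample rows. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later assumed); only the columns the query needs are loaded, and the table name t follows the answer.

```python
import sqlite3

# (weekstartdate, currentweek, period, quarter) from the question's table.
rows = [
    ("2020-08-03", 0, 2, 1), ("2020-08-10", 0, 2, 1), ("2020-08-17", 0, 2, 1),
    ("2020-08-24", 0, 2, 1), ("2020-08-31", 0, 2, 1), ("2020-09-07", 1, 3, 1),
    ("2020-09-14", 0, 3, 1), ("2020-09-21", 0, 3, 1), ("2020-09-28", 0, 3, 1),
    ("2020-10-05", 0, 4, 2), ("2020-10-12", 0, 4, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (weekstartdate TEXT, currentweek INTEGER,"
    " period INTEGER, quarter INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT weekstartdate,
       -- MAX(...) OVER () broadcasts the current week's value to every row
       period  - MAX(CASE WHEN currentweek = 1 THEN period  END) OVER () AS periodsequence,
       quarter - MAX(CASE WHEN currentweek = 1 THEN quarter END) OVER () AS quartersequence
FROM t
ORDER BY weekstartdate
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the sequences are plain differences of period and quarter numbers, this sketch assumes those numbers keep increasing over the range of interest. If they restart each fiscal year, the subtraction would need a year component as well.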

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date, grouping on the person and the type.
The data is stored in SQL Server and looks like this:
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I get the data in the first format by running the following query:
select
ROW_NUMBER() OVER (ORDER BY date) AS seq,
person,
date,
type
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.

This is a gaps-and-islands problem, so you can take the difference of two row_number() values and use it in the grouping:
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;

The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY: the row-number difference alone can coincide for islands of different types.
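
The difference-of-row-numbers grouping can be verified on the sample data. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later assumed) in place of SQL Server; the alias start_date replaces start and the subquery alias numbered is introduced only for this example.

```python
import sqlite3

# (seq, person, date, type) rows from the question.
rows = [
    (1, 1, "2018-02-10 08:00:00", 1), (2, 1, "2018-02-11 08:00:00", 1),
    (3, 1, "2018-02-12 08:00:00", 1), (4, 1, "2018-02-14 16:00:00", 1),
    (5, 1, "2018-02-15 16:00:00", 1), (6, 1, "2018-02-16 16:00:00", 1),
    (7, 1, "2018-02-20 08:00:00", 2), (8, 1, "2018-02-21 08:00:00", 2),
    (9, 1, "2018-02-22 08:00:00", 2), (10, 1, "2018-02-23 08:00:00", 1),
    (11, 1, "2018-02-24 08:00:00", 1), (12, 1, "2018-02-25 08:00:00", 2),
    (13, 2, "2018-02-10 08:00:00", 1), (14, 2, "2018-02-11 08:00:00", 1),
    (15, 2, "2018-02-12 08:00:00", 1), (16, 2, "2018-02-14 16:00:00", 3),
    (17, 2, "2018-02-15 16:00:00", 3), (18, 2, "2018-02-16 16:00:00", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, person INTEGER, date TEXT, type INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT person, MIN(date) AS start_date, type
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY seq) AS seqnum_p,
           ROW_NUMBER() OVER (PARTITION BY person, type ORDER BY seq) AS seqnum_pt
    FROM t
) AS numbered
-- the difference is constant within one unbroken run of a type
GROUP BY person, type, seqnum_p - seqnum_pt
ORDER BY person, start_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each run of equal type values per person collapses to one row carrying its earliest date, which gives the six rows requested in the question.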

find nonbreaking period with condition

There are quotas for hotels per day in a table. How can I get, for each date, the number of consecutive days from that date on which the hotel is available?
q_id q_hotel q_date q_value
1 1 2013-02-01 1
2 1 2013-02-02 1
3 1 2013-02-03 1
4 1 2013-02-04 0
5 1 2013-02-05 2
6 1 2013-02-06 3
7 1 2013-02-07 3
8 1 2013-02-08 2
9 1 2013-02-09 0
10 1 2013-02-10 0
11 1 2013-02-11 1
12 1 2013-02-12 1
Wanted output:
q_hotel q_date days_available
1 2013-02-01 3
1 2013-02-02 2
1 2013-02-03 1
1 2013-02-04 0
1 2013-02-05 4
1 2013-02-06 3
1 2013-02-07 2
1 2013-02-08 1
1 2013-02-09 0
1 2013-02-10 0
1 2013-02-11 2
1 2013-02-12 1
For now I can get the number of days only if a zero-quota day exists after the date in question: I find the closest unavailable day and calculate the difference between the dates.
http://sqlfiddle.com/#!12/1a64c/14
select q_hotel
,q_date
,(select extract(day from (min(B.q_date)-A.q_date)) from Table1 B where B.q_date>A.q_date
and B.q_value=0 and A.q_value<>0)
from Table1 A
But there is a problem when I don't have a zero closing date.

Here is a solution:
select
    a.q_date
  , a.q_hotel
  , case
      when a.q_value = 0 then 0
      else (
        -- earliest day b on or after a.q_date that ends the available run
        select extract(day from min(b.q_date) - a.q_date + interval '1 day')
        from table1 b
        where b.q_date >= a.q_date
          and b.q_hotel = a.q_hotel
          and not exists (
            -- b ends the run when the next day is missing or has q_value = 0
            select 1
            from table1 c
            where c.q_date = b.q_date + interval '1 day'
              and c.q_hotel = b.q_hotel
              and c.q_value <> 0
          )
      )
    end as days_available
from table1 a
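
A window-function alternative avoids the correlated subquery and also copes with a run that has no closing zero day. The sketch below is a different technique from the answer above (a running count of zero days used as an island key), shown on SQLite via Python's sqlite3 module (3.25 or later assumed). It assumes one row per hotel per day with no missing dates, and the names marked, is_closed and grp are introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (q_id INTEGER, q_hotel INTEGER, q_date TEXT, q_value INTEGER)"
)
# q_value for 2013-02-01 .. 2013-02-12, hotel 1, as in the question.
values = [1, 1, 1, 0, 2, 3, 3, 2, 0, 0, 1, 1]
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(i + 1, 1, f"2013-02-{i + 1:02d}", v) for i, v in enumerate(values)],
)

query = """
WITH marked AS (
    SELECT q_hotel, q_date, q_value,
           q_value = 0 AS is_closed,
           -- running count of closed days: constant within each available run
           SUM(q_value = 0) OVER (PARTITION BY q_hotel ORDER BY q_date) AS grp
    FROM table1
)
SELECT q_hotel, q_date,
       CASE WHEN is_closed THEN 0
            -- counting from the end of the run backwards gives the days left
            ELSE COUNT(*) OVER (PARTITION BY q_hotel, grp, is_closed
                                ORDER BY q_date DESC)
       END AS days_available
FROM marked
ORDER BY q_hotel, q_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The last run (2013-02-11 and 2013-02-12) has no closing zero day, yet it still gets 2 and 1, because the count is taken within the run rather than up to the next unavailable date.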
