How to get running total from consecutive columns in Oracle SQL

I'm having trouble displaying consecutive holidays from an existing date dataset in Oracle SQL. For example, between 20 and 30 December 2017 there are the following days off (Christmas and weekend days):
23.12.2017 Saturday
24.12.2017 Sunday
25.12.2017 Christmas
30.12.2017 Saturday
Now I want my result dataset to look like this (RUNTOT is needed):
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 0
27.12.2017 0 0
28.12.2017 0 0
29.12.2017 0 0
30.12.2017 1 1
That means when "ISOFF" changes I want to count (or sum) the consecutive rows where "ISOFF" is 1.
I tried to approach a solution with an analytic function that sums "ISOFF" up to the current row:
SELECT DAT,
       ISOFF,
       SUM (ISOFF)
          OVER (ORDER BY DAT ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS RUNTOT
  FROM (TIME_DATASET)
 WHERE DAT BETWEEN DATE '2017-12-20' AND DATE '2017-12-30'
ORDER BY 1
What I get now is the following dataset:
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 3
27.12.2017 0 3
28.12.2017 0 3
29.12.2017 0 3
30.12.2017 1 4
How can I reset the running total if ISOFF changes to 0? Or is this the wrong approach to solve this problem?
Thank you for your help!

This is a gaps-and-islands problem. One method assigns each row to a group based on the number of 0s seen up to that row, then numbers the holiday rows within each group:
select t.*,
       (case when isoff = 1
             -- partition by isoff as well, so the 0 row that opens each group is not counted
             then row_number() over (partition by grp, isoff order by dat)
             else 0
        end) as runtot
from (select t.*,
             sum(case when isoff = 0 then 1 else 0 end) over (order by dat) as grp
      from TIME_DATASET t
     ) t;
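A minimal sketch of this grouping idea, run in SQLite through Python's `sqlite3` module rather than in Oracle (an assumption made so the example is self-contained; window functions need SQLite 3.25 or later). The dates are stored as ISO strings, and the window partitions by both `grp` and `isoff` so that the 0 row sharing a group number with the holidays that follow it is not counted.

```python
import sqlite3

# Sample rows from the question as (dat, isoff); ISO date strings sort correctly.
rows = [
    ("2017-12-20", 0), ("2017-12-21", 0), ("2017-12-22", 0),
    ("2017-12-23", 1), ("2017-12-24", 1), ("2017-12-25", 1),
    ("2017-12-26", 0), ("2017-12-27", 0), ("2017-12-28", 0),
    ("2017-12-29", 0), ("2017-12-30", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_dataset (dat TEXT PRIMARY KEY, isoff INTEGER)")
conn.executemany("INSERT INTO time_dataset VALUES (?, ?)", rows)

query = """
SELECT dat,
       isoff,
       CASE WHEN isoff = 1
            THEN ROW_NUMBER() OVER (PARTITION BY grp, isoff ORDER BY dat)
            ELSE 0
       END AS runtot
FROM (SELECT t.*,
             -- grp grows by one on every working day, so a run of
             -- holidays shares the group number of the 0 row before it
             SUM(CASE WHEN isoff = 0 THEN 1 ELSE 0 END) OVER (ORDER BY dat) AS grp
      FROM time_dataset t) AS t
ORDER BY dat
"""

result = conn.execute(query).fetchall()
runtot = [row[2] for row in result]
for row in result:
    print(*row)
```

On the sample data the `runtot` column comes out as 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 1, which matches the result asked for in the question.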

You may use recursive subquery factoring. The precondition is that your dates are consecutive without gaps (or that you have some other row number sequence to follow in steps of one).
WITH t1(dat, isoff, runtot) AS (
  -- anchor: the first day of the range starts its own run if it is a holiday
  SELECT dat, isoff, isoff AS runtot
  FROM tab
  WHERE dat = DATE '2017-12-20'
  UNION ALL
  -- step to the next day: reset on a working day, otherwise extend the run
  SELECT t2.dat, t2.isoff,
         CASE WHEN t2.isoff = 0 THEN 0 ELSE t1.runtot + t2.isoff END AS runtot
  FROM tab t2, t1
  WHERE t2.dat = t1.dat + 1
)
SELECT dat, isoff, runtot
FROM t1;
DAT ISOFF RUNTOT
------------------- ---------- ----------
20.12.2017 00:00:00 0 0
21.12.2017 00:00:00 0 0
22.12.2017 00:00:00 0 0
23.12.2017 00:00:00 1 1
24.12.2017 00:00:00 1 2
25.12.2017 00:00:00 1 3
26.12.2017 00:00:00 0 0
27.12.2017 00:00:00 0 0
28.12.2017 00:00:00 0 0
29.12.2017 00:00:00 0 0
30.12.2017 00:00:00 1 1

Another variation, which doesn't need a subquery or CTE but does need all days to be present and have the same time, is - for the holiday dates only (where isoff = 1) - to see how many days it's been since the last non-holiday date:
select dat,
       isoff,
       case
         when isoff = 1 then
           coalesce(dat - max(case when isoff = 0 then dat end)
                            over (order by dat range between unbounded preceding and 1 preceding), 1)
         else 0
       end as runtot
from time_dataset
order by dat;
DAT ISOFF RUNTOT
---------- ---------- ----------
2017-12-20 0 0
2017-12-21 0 0
2017-12-22 0 0
2017-12-23 1 1
2017-12-24 1 2
2017-12-25 1 3
2017-12-26 0 0
2017-12-27 0 0
2017-12-28 0 0
2017-12-29 0 0
2017-12-30 1 1
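The coalesce() is there in case the first date in the range is a holiday: as there is no previous non-holiday date to compare against, the subtraction would yield null. Note that the fallback returns 1 for every holiday in such a leading run, so it is only exact when the range starts with at most one holiday.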
db<>fiddle with a slightly larger data set.

Related

SQL Conditional Counting

I am working with a dataset that contains information about train delays. The dataset contains an arrival delay column and departing delay column. Each delay column is measured in minutes. I need to calculate the number of total delays for each day of the week to determine which day has the most train delays. If the delay is equal to or more than 1 minute, it needs to be counted as a delay. How can I complete this in SQL? I have tried the following code.
select dayofweek
count(case when arrivaldelay>=1 then 1 end)+
count(case when departuredelay>=1 then 1 end)
group by dayofweek;
dayofweek arrivaldelay departuredelay
2 12 5
4 7 10
4 6 -3
6 5 4
dayofweek delays
2 1
4 1
6 1
Assuming dayofweek is a stored column and not a function, you can use either count or sum:
select
      dayofweek
    , count(case when arrivaldelay >= 1 then 1 end)
      + count(case when departuredelay >= 1 then 1 end)
      as delays
from mytable as t
group by dayofweek;
select
      dayofweek
    , sum(case when arrivaldelay >= 1 then 1 else 0 end)
      + sum(case when departuredelay >= 1 then 1 else 0 end)
      as delays
from mytable as t
group by dayofweek;
Both give the following result from the sample data in the question:
+-----------+--------+
| dayofweek | delays |
+-----------+--------+
| 2 | 2 |
| 4 | 3 |
| 6 | 2 |
+-----------+--------+
If dayofweek is not a stored column, you can extract the day of week from a date or timestamp, but how this is done differs between databases.
Demonstrated in a db<>fiddle.
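As a quick check that the count and sum forms agree, here is a small sketch that loads the four sample rows into an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite and the table name `mytable` are assumptions made for illustration; the question does not say which database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (dayofweek INTEGER, arrivaldelay INTEGER, departuredelay INTEGER)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2, 12, 5), (4, 7, 10), (4, 6, -3), (6, 5, 4)],
)

# COUNT ignores NULL, so a CASE without ELSE counts only the matching rows.
count_sql = """
SELECT dayofweek,
       COUNT(CASE WHEN arrivaldelay >= 1 THEN 1 END)
     + COUNT(CASE WHEN departuredelay >= 1 THEN 1 END) AS delays
FROM mytable
GROUP BY dayofweek
ORDER BY dayofweek
"""

# SUM needs an explicit ELSE 0 so that non-matching rows add nothing.
sum_sql = """
SELECT dayofweek,
       SUM(CASE WHEN arrivaldelay >= 1 THEN 1 ELSE 0 END)
     + SUM(CASE WHEN departuredelay >= 1 THEN 1 ELSE 0 END) AS delays
FROM mytable
GROUP BY dayofweek
ORDER BY dayofweek
"""

by_count = conn.execute(count_sql).fetchall()
by_sum = conn.execute(sum_sql).fetchall()
print(by_count)
print(by_sum)
```

The row with a departure delay of -3 contributes only its arrival delay, which is why day 4 totals 3 rather than 4.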
You can use sum() like this:
select dayofweek,
       ( sum(case when arrivaldelay >= 1 then 1 else 0 end) +
         sum(case when departuredelay >= 1 then 1 else 0 end)
       ) as delays
from t
group by dayofweek;

SQL query which converts sets or range of records on the basis of record before and after rows of that range

Suppose this table
Day Present Absent Holiday
1/1/2019 1 0 0
1/2/2019 0 1 0
1/3/2019 0 0 1
1/4/2019 0 0 1
1/5/2019 0 0 1
1/6/2019 0 1 0
1/7/2019 1 0 0
1/8/2019 0 1 0
1/9/2019 0 0 1
1/10/2019 0 1 0
I want to set to zero all holidays that fall between absences: if an employee is absent before and after the holidays, those holidays become absent days for that employee. I don't want to use a loop; I want a set-based query.
As a select, you can use lead() and lag(). Note that these look only at the rows directly before and after, so this handles a single holiday between two absences, not a run of several holidays:
select t.*,
       (case when prev_absent = 1 and next_absent = 1 and holiday = 1
             then 0 else holiday
        end) as new_holiday
from (select t.*,
             lag(absent) over (order by day) as prev_absent,
             lead(absent) over (order by day) as next_absent
      from t
     ) t;
If this does what you want, then you can incorporate this into an update:
with toupdate as (
      select t.*,
             (case when prev_absent = 1 and next_absent = 1 and holiday = 1
                   then 0 else holiday
              end) as new_holiday
      from (select t.*,
                   lag(absent) over (order by day) as prev_absent,
                   lead(absent) over (order by day) as next_absent
            from t
           ) t
     )
update toupdate
    set holiday = new_holiday
    where holiday <> new_holiday;
EDIT:
You can also do this with joins:
select t.*,
       (case when tprev.absent = 1 and tnext.absent = 1 and t.holiday = 1
             then 0 else t.holiday
        end) as new_holiday
from t left join
     t tprev
     on tprev.day = dateadd(day, -1, t.day) left join
     t tnext
     on tnext.day = dateadd(day, 1, t.day);

Sequence of Patterns within Date/time range

I have a problem I need help with.
In the example below, I want to find scenarios within each id based on the status patterns: 010 as scenario1, 000 as scenario2, and 111 as scenario3. Records that don't follow a pattern should be ignored.
Example:
id date Status
1 2012-10-18 1
1 2012-10-19 1
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
1 2012-10-26 0
1 2012-10-27 0
1 2012-10-28 1
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
2 2012-10-22 1
2 2012-10-23 1
scenario1:
1 2012-10-23 0
1 2012-10-24 1
1 2012-10-25 0
Scenario2:
1 2012-10-20 0
1 2012-10-21 0
1 2012-10-22 0
2 2012-10-19 0
2 2012-10-20 0
2 2012-10-21 0
Scenario3 - none (no records)
You can construct the patterns as strings and then use string comparison.
At least part of the trick is that you want all rows in the pattern, so you need to construct all potential patterns where each row might appear:
select t.*
from (select t.*,
             -- pattern in which the current row is the last of three
             concat(lag(status, 2) over (partition by id order by date),
                    lag(status, 1) over (partition by id order by date),
                    status
                   ) as pat1,
             -- pattern in which the current row is the middle one
             concat(lag(status, 1) over (partition by id order by date),
                    status,
                    lead(status, 1) over (partition by id order by date)
                   ) as pat2,
             -- pattern in which the current row is the first of three
             concat(status,
                    lead(status, 1) over (partition by id order by date),
                    lead(status, 2) over (partition by id order by date)
                   ) as pat3
      from t
     ) t
where '010' in (pat1, pat2, pat3);
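The three-window idea can be tried on the question's sample data with the sketch below. It runs in SQLite through Python's `sqlite3` module, which is an assumption made to keep the example self-contained. For the same reason it uses the `||` operator instead of `concat()`, a named `WINDOW` clause, a column called `dt` instead of `date`, and a hypothetical helper `rows_matching`.

```python
import sqlite3

# Sample rows from the question as (id, dt, status).
data = [
    (1, "2012-10-18", 1), (1, "2012-10-19", 1), (1, "2012-10-20", 0),
    (1, "2012-10-21", 0), (1, "2012-10-22", 0), (1, "2012-10-23", 0),
    (1, "2012-10-24", 1), (1, "2012-10-25", 0), (1, "2012-10-26", 0),
    (1, "2012-10-27", 0), (1, "2012-10-28", 1),
    (2, "2012-10-19", 0), (2, "2012-10-20", 0), (2, "2012-10-21", 0),
    (2, "2012-10-22", 1), (2, "2012-10-23", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dt TEXT, status INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

# A row is kept if it is the last, middle, or first row of a matching triple.
# Near partition edges LAG/LEAD return NULL, which makes the whole
# concatenation NULL, so incomplete triples never match.
query = """
SELECT id, dt, status
FROM (SELECT t.*,
             LAG(status, 2) OVER w || LAG(status, 1) OVER w || status AS pat1,
             LAG(status, 1) OVER w || status || LEAD(status, 1) OVER w AS pat2,
             status || LEAD(status, 1) OVER w || LEAD(status, 2) OVER w AS pat3
      FROM t
      WINDOW w AS (PARTITION BY id ORDER BY dt)) AS t
WHERE ? IN (pat1, pat2, pat3)
ORDER BY id, dt
"""

def rows_matching(pattern):
    """Return every row that takes part in a run equal to the 3-character pattern."""
    return conn.execute(query, (pattern,)).fetchall()

scenario1 = rows_matching("010")
scenario3 = rows_matching("111")
print(scenario1)
print(scenario3)
```

For `'010'` this returns the three rows of id 1 from 2012-10-23 to 2012-10-25, and for `'111'` it returns nothing, in line with scenario1 and scenario3 in the question.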

SQL cumulative sum until a flag value and resetting the sum

I'm still learning SQL and I'm trying to solve a problem I haven't been able to figure out. I'm selecting from a table (let's say Expense), ordered by date. The table has a column named Charged, and I want a cumulative sum of the charges (this part I figured out). There is also a column named PayOut that acts as a flag: when PayOut is 1, I want the running sum of Charged (SumValue) to reset to zero. How would I do this? Below are what I have tried, the output I currently get, and the output I want. Note: I saw some posts using CTEs, but they covered a different, more complex scenario.
select ex.date,
ex.Charged,
(case when(ex.PayOut=1) then 0
else sum(ex.Charged) over (order by ex.date)end) as SumValue,
ex.PayOut
from Expense ex
order by ex.date asc
The data looks like this
Date Charged PayOut
01/10/2018 10 0
01/20/2018 5 0
01/30/2018 3 0
02/01/2018 0 1
02/11/2018 12 0
02/21/2018 15 0
Output I get
Date Charged PayOut SumValue
01/10/2018 10 0 10
01/20/2018 5 0 15
01/30/2018 3 0 18
02/01/2018 0 1 0
02/11/2018 12 0 30
02/21/2018 15 0 45
Output Wanted
Date Charged PayOut SumValue
01/10/2018 10 0 10
01/20/2018 5 0 15
01/30/2018 3 0 18
02/01/2018 0 1 0
02/11/2018 12 0 12
02/21/2018 15 0 27
Just create a group from your PayOut column and use it as a partition in OVER:
WITH Expense AS (
SELECT CAST('01/10/2018' AS DATE) AS Date, 10 AS Charged, 0 AS PayOut
UNION ALL SELECT CAST('01/20/2018' AS DATE), 5, 0
UNION ALL SELECT CAST('01/30/2018' AS DATE), 3, 0
UNION ALL SELECT CAST('02/01/2018' AS DATE), 0, 1
UNION ALL SELECT CAST('02/11/2018' AS DATE), 12, 0
UNION ALL SELECT CAST('02/21/2018' AS DATE), 15, 0
)
SELECT
dat.date
,dat.Charged
,dat.PayOut
,dat.PayOutGroup
,SUM(dat.Charged) OVER (PARTITION BY dat.PayOutGroup ORDER BY dat.date) as SumValue
FROM (
SELECT
e.date
,e.Charged
,e.PayOut
,SUM(e.PayOut) OVER (ORDER BY e.date) AS PayOutGroup
FROM Expense e
) dat
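The same two-step idea (a running sum of the flag as a group id, then a running sum within each group) can be checked with the sketch below. It runs in SQLite through Python's `sqlite3` module instead of the original database, and it names the date column `dt` and stores ISO date strings; these are assumptions made so the example is self-contained.

```python
import sqlite3

# Sample rows from the question as (dt, charged, payout).
data = [
    ("2018-01-10", 10, 0),
    ("2018-01-20", 5, 0),
    ("2018-01-30", 3, 0),
    ("2018-02-01", 0, 1),
    ("2018-02-11", 12, 0),
    ("2018-02-21", 15, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expense (dt TEXT, charged INTEGER, payout INTEGER)")
conn.executemany("INSERT INTO expense VALUES (?, ?, ?)", data)

query = """
SELECT dt, charged, payout, payout_group,
       -- running sum restarts whenever payout_group changes
       SUM(charged) OVER (PARTITION BY payout_group ORDER BY dt) AS sum_value
FROM (SELECT e.*,
             -- each PayOut = 1 row bumps the group id for itself and later rows
             SUM(payout) OVER (ORDER BY dt) AS payout_group
      FROM expense e) AS grouped
ORDER BY dt
"""

result = conn.execute(query).fetchall()
sum_values = [row[4] for row in result]
for row in result:
    print(*row)
```

The `sum_value` column comes out as 10, 15, 18, 0, 12, 27, which is the output wanted in the question.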

SQL Server 2012 need to increment sequence number when bit column changes

I have a table like this:
dDate amount sigma
-----------------------------
2015-01-01 0,00 1
2015-11-01 150,00 0
2015-11-10 25,00 0
2015-11-11 1028,90 0
2015-11-12 878,90 1
2015-11-15 150,00 0
2015-12-13 723,90 1
I need to have an increment of a sequence number every time the sigma column changes, so this should be the result:
dDate Amount sigma seqNr
------------------------------------
2015-01-01 0,00 1 0
2015-11-01 150,00 0 1
2015-11-10 25,00 0 1
2015-11-11 1028,90 0 1
2015-11-12 878,90 1 2
2015-11-15 150,00 0 3
2015-12-13 723,90 1 4
Should I use the lag function for this?
Thanks
It seems like you want to count the number of "1"s up to any given value. (This assumes the first value should really be "1" in your example.) In SQL Server 2012+, you can do this using a cumulative sum:
select t.*,
sum(case when sigma = 1 then 1 else 0 end) over (order by ddate)
from t;
Actually, if sigma only takes on 0 and 1, you can simplify this to:
select t.*,
sum(cast(sigma as int)) over (order by ddate)
from t;
If you want the actual changes (either 1 --> 0 or 0 --> 1), then lag and cumulative sum are helpful:
select t.*,
sum(inc) over (order by ddate)
from (select t.*,
(case when sigma <> lag(sigma) over (order by ddate) then 1
else 0 end) as inc
from t
) t;
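The lag-plus-cumulative-sum variant can be tried against the question's sample rows with this sketch. It runs in SQLite through Python's `sqlite3` module, not SQL Server, and leaves out the amount column; both choices are assumptions made to keep the example small and self-contained.

```python
import sqlite3

# Sample rows from the question as (ddate, sigma).
data = [
    ("2015-01-01", 1),
    ("2015-11-01", 0),
    ("2015-11-10", 0),
    ("2015-11-11", 0),
    ("2015-11-12", 1),
    ("2015-11-15", 0),
    ("2015-12-13", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ddate TEXT, sigma INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

query = """
SELECT ddate, sigma,
       SUM(inc) OVER (ORDER BY ddate) AS seqnr
FROM (SELECT t.*,
             -- LAG is NULL on the first row, so the comparison is not true
             -- there and the first row gets inc = 0
             CASE WHEN sigma <> LAG(sigma) OVER (ORDER BY ddate) THEN 1
                  ELSE 0 END AS inc
      FROM t) AS t
ORDER BY ddate
"""

result = conn.execute(query).fetchall()
seqnr = [row[2] for row in result]
for row in result:
    print(*row)
```

Counting the changes this way gives 0, 1, 1, 1, 2, 3, 4, which is the seqNr column shown in the question.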
This uses different column names but is the same idea as Gordon's answer, with abs() rather than case. I thought I might be able to do it in one pass:
select *, sum(inc) over (order by [ID]) as seq
from
( SELECT [ID], [OnOff]
, isnull(abs(cast(OnOff as int) - lag(cast(OnOff as int)) over (order by [ID])),0) as inc
FROM [test].[dbo].[Table_2]
) tt