SQL Server 2012: need to increment a sequence number when a bit column changes

I have a table like this:
dDate amount sigma
-----------------------------
2015-01-01 0,00 1
2015-11-01 150,00 0
2015-11-10 25,00 0
2015-11-11 1028,90 0
2015-11-12 878,90 1
2015-11-15 150,00 0
2015-12-13 723,90 1
I need to have an increment of a sequence number every time the sigma column changes, so this should be the result:
dDate Amount sigma seqNr
------------------------------------
2015-01-01 0,00 1 0
2015-11-01 150,00 0 1
2015-11-10 25,00 0 1
2015-11-11 1028,90 0 1
2015-11-12 878,90 1 2
2015-11-15 150,00 0 3
2015-12-13 723,90 1 4
Should I use the lag function for this?
Thanks

It seems like you want to count the number of "1"s up to any given value. (This assumes the first value should really be "1" in your example.) In SQL Server 2012+, you can do this using a cumulative sum:
select t.*,
sum(case when sigma = 1 then 1 else 0 end) over (order by ddate)
from t;
Actually, if sigma only takes on 0 and 1, you can simplify this to:
select t.*,
sum(cast(sigma as int)) over (order by ddate)
from t;
If you want the actual changes (either 1 --> 0 or 0 --> 1), then lag and cumulative sum are helpful:
select t.*,
sum(inc) over (order by ddate)
from (select t.*,
(case when sigma <> lag(sigma) over (order by ddate) then 1
else 0 end) as inc
from t
) t;
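To see why the lag-plus-cumulative-sum query yields the requested seqNr, the same logic can be traced outside the database. This is only a sketch in Python, not SQL Server code: the function name change_sequence and the variable names are made up for illustration, and None stands in for the NULL that lag() returns on the first row.

```python
# Rows from the question: (dDate, sigma). Amounts are omitted for brevity.
rows = [
    ("2015-01-01", 1),
    ("2015-11-01", 0),
    ("2015-11-10", 0),
    ("2015-11-11", 0),
    ("2015-11-12", 1),
    ("2015-11-15", 0),
    ("2015-12-13", 1),
]


def change_sequence(rows):
    """Return the running count of sigma changes, one value per row in date order."""
    result = []
    previous = None  # plays the role of lag(sigma); None stands in for NULL
    running = 0      # plays the role of sum(inc) over (order by ddate)
    for _, sigma in sorted(rows):  # ISO dates sort correctly as strings
        # inc is 1 only when there is a previous row and its sigma differs
        inc = 1 if previous is not None and sigma != previous else 0
        running += inc
        result.append(running)
        previous = sigma
    return result


seq_nr = change_sequence(rows)
print(seq_nr)
```

The first row gets 0 because the comparison against a missing previous value never counts as a change, which is the same reason the case expression falls through to else 0 in the query.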

This uses different table and column names, but it is the same idea as Gordon's answer, with abs() in place of case. I thought I might be able to do it in one pass:
select *, sum(inc) over (order by [ID]) as seq
from
( SELECT [ID], [OnOff]
, isnull(abs(cast(OnOff as int) - lag(cast(OnOff as int)) over (order by [ID])),0) as inc
FROM [test].[dbo].[Table_2]
) tt

Related

Query for the longest duration of consecutive TRUE [duplicate]

I have the following table in SQL Server. I would like to find the longest duration for the machine running.
Row DateTime Machine On
-----------------------------------
1 9/22/2022 8:20 1
2 9/22/2022 9:10 0
3 9/22/2022 10:40 1
4 9/22/2022 10:52 0
5 9/22/2022 12:30 1
6 9/22/2022 14:30 0
7 9/22/2022 15:00 1
8 9/22/2022 15:40 0
9 9/22/2022 16:25 1
10 9/22/2022 16:55 0
In the example above, the longest period during which the machine is ON is 2 hours (rows 5 and 6). What would be the best SQL statement to find the longest ON duration within a given time range?
Desired Result:
120 minutes
I have looked into the LAG Function and the LEAD Function in SQL.
Here's another way that uses traditional gaps & islands methodology:
WITH src AS
(
SELECT Island, mint = MIN([Timestamp]), maxt = MAX([Timestamp])
FROM
(
SELECT [Timestamp], Island =
ROW_NUMBER() OVER (ORDER BY [Timestamp]) -
ROW_NUMBER() OVER (PARTITION BY Running ORDER BY [Timestamp])
FROM dbo.Machine_Status
) AS x GROUP BY Island
)
SELECT TOP (1) delta =
(DATEDIFF(second, mint, LEAD(mint,1) OVER (ORDER BY island)))
FROM src ORDER BY delta DESC;
Example db<>fiddle based on the sample data in your new duplicate.
If this is really your data, you can simply use INNER JOIN and DATEDIFF:
SELECT MAX(DATEDIFF(MINUTE, T1.[DateTime], T2.[DateTime]))
FROM [my_table] T1
INNER JOIN [my_table] T2
ON T1.[Row] + 1 = T2.[Row]
WHERE T1.[Machine On] = 1; -- only measure intervals that start with the machine ON
This is a gaps and islands problem. One option is to use a running sum, taken in descending datetime order, that increases by 1 whenever machine_on = 0. This defines a unique group for each run of consecutive 1s together with the 0 that ends it.
select top 1 datediff(minute, min([datetime]), max([datetime])) duration
from
(
select *,
sum(case when machine_on = 0 then 1 else 0 end) over (order by datetime desc) grp
from table_name
) T
group by grp
order by datediff(minute, min([datetime]), max([datetime])) desc
See demo
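The grouping used by the running-sum answer can be checked by hand against the sample rows from the question. The following is a Python sketch of that idea rather than SQL; the function name longest_on_minutes is hypothetical, and the timestamp format string is an assumption based on how the sample data is written.

```python
from datetime import datetime

# Sample rows from the question: (DateTime, Machine On).
samples = [
    ("9/22/2022 8:20", 1), ("9/22/2022 9:10", 0),
    ("9/22/2022 10:40", 1), ("9/22/2022 10:52", 0),
    ("9/22/2022 12:30", 1), ("9/22/2022 14:30", 0),
    ("9/22/2022 15:00", 1), ("9/22/2022 15:40", 0),
    ("9/22/2022 16:25", 1), ("9/22/2022 16:55", 0),
]


def longest_on_minutes(samples):
    """Longest span, in whole minutes, from an ON row to the OFF row that ends it."""
    parsed = sorted(
        (datetime.strptime(ts, "%m/%d/%Y %H:%M"), on) for ts, on in samples
    )
    groups = {}  # grp -> (earliest timestamp, latest timestamp)
    grp = 0
    # Walk from the latest row backwards, mirroring "order by datetime desc":
    # every OFF row opens a new group that also collects the ON rows before it.
    for ts, on in reversed(parsed):
        if on == 0:
            grp += 1
        lo, hi = groups.get(grp, (ts, ts))
        groups[grp] = (min(lo, ts), max(hi, ts))
    return max(
        (int((hi - lo).total_seconds() // 60) for lo, hi in groups.values()),
        default=0,
    )


print(longest_on_minutes(samples))
```

For the sample data the widest group runs from 12:30 to 14:30, so the function returns 120.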
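This is a classic gaps-and-islands problem with a little twist: Adj, the time from the last running row of an island to the following non-running row, is added to the island's duration.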
Example
Select Top 1
Row1 = min(row)
,Row2 = max(row)+1
,TS1 = min(TimeStamp)
,TS2 = dateadd(SECOND,max(Adj),max(TimeStamp))
,Dur = datediff(Second,min(TimeStamp),max(TimeStamp)) + max(Adj)
From (
Select *
,Grp = row_number() over( partition by Running order by TimeStamp) - row_number() over (order by timeStamp)
,Adj = case when Running=1 and lead(Running,1) over (order by timestamp) = 0 then datediff(second,TimeStamp,lead(TimeStamp,1) over (order by TimeStamp) ) else 0 end
From Machine_Status
) A
Where Running=1
Group By Grp
Order By Dur Desc
Results
Row1 Row2 TS1 TS2 Dur
8 12 2023-01-10 08:25:30.000 2023-01-10 08:28:55.000 205

Using the earliest date of a partition to determine what other dates belong to that partition

Assume this is my table:
ID DATE
--------------
1 2018-11-12
2 2018-11-13
3 2018-11-14
4 2018-11-15
5 2018-11-16
6 2019-03-05
7 2019-05-07
8 2019-05-08
9 2019-05-08
I need the partitions to be determined by the first date in each partition: any date within 2 days of that first date belongs to the same partition.
The table would end up looking like this if each partition was ranked
PARTITION ID DATE
------------------------
1 1 2018-11-12
1 2 2018-11-13
1 3 2018-11-14
2 4 2018-11-15
2 5 2018-11-16
3 6 2019-03-05
4 7 2019-05-07
4 8 2019-05-08
4 9 2019-05-08
I've tried using datediff with lag to compare each date to the previous one, but that lets a partition grow without bound when the dates are evenly spaced. For example, all of these dates would end up in the same partition:
ID DATE
--------------
1 2018-11-12
2 2018-11-14
3 2018-11-16
4 2018-11-18
5 2018-11-20
6 2018-11-22
Previous flawed attempt:
Mark when a date is more than 2 days past the previous date:
(case when datediff(day, lag(event_time, 1) over (partition by user_id, stage order by event_time), event_time) > 2 then 1 else 0 end)
You need to use a recursive CTE for this, so the operation is expensive.
with numbered as (
-- add an incrementing column with no gaps; id breaks ties between equal dates
select t.*, row_number() over (order by date, id) as seqnum
from t
),
cte as (
-- anchor: the earliest row starts the first partition
select id, date, date as mindate, seqnum
from numbered
where seqnum = 1
union all
-- step: keep the current partition start, or start a new partition at this date
select n.id, n.date,
(case when n.date <= dateadd(day, 2, cte.mindate)
then cte.mindate else n.date
end) as mindate,
n.seqnum
from cte join
numbered n
on n.seqnum = cte.seqnum + 1
)
select cte.*, dense_rank() over (order by mindate) as partition_num
from cte;
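The recursion is needed because each row's partition depends on the partition start carried forward from the previous row, which a single lag() cannot express. As a rough illustration of that carried state, here is the same walk written as a Python loop. It is a sketch, not SQL; the names anchored_partitions and window_days are invented for this example.

```python
from datetime import date, timedelta

# Dates from the question's first table.
dates = [
    date(2018, 11, 12), date(2018, 11, 13), date(2018, 11, 14),
    date(2018, 11, 15), date(2018, 11, 16), date(2019, 3, 5),
    date(2019, 5, 7), date(2019, 5, 8), date(2019, 5, 8),
]


def anchored_partitions(dates, window_days=2):
    """Number partitions so each one spans at most window_days from its first date."""
    partitions = []
    anchor = None  # first date of the current partition (mindate in the query)
    number = 0
    for d in sorted(dates):
        # A date beyond the window of the current anchor starts a new partition.
        if anchor is None or d > anchor + timedelta(days=window_days):
            anchor = d
            number += 1
        partitions.append(number)
    return partitions


print(anchored_partitions(dates))
```

On the evenly spaced dates from the flawed lag() attempt (every second day from 2018-11-12 to 2018-11-22), this yields three partitions of two dates each rather than one large partition.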

How to write a query to attach rownumber(1 to n) to each records for each group

I have a dataset something like below
|date|flag|
|20190503|0|
|20190504|1|
|20190505|1|
|20190506|1|
|20190507|1|
|20190508|0|
|20190509|0|
|20190510|0|
|20190511|1|
|20190512|1|
|20190513|0|
|20190514|0|
|20190515|1|
What I want to achieve is to group consecutive dates where flag=1 and add a counter column: 1 for the first day of each run of consecutive flag=1 days, 2 for the second day, and so on, with 0 wherever flag=0.
|date|flag|counter|
|20190503|0|0|
|20190504|1|1|
|20190505|1|2|
|20190506|1|3|
|20190507|1|4|
|20190508|0|0|
|20190509|0|0|
|20190510|0|0|
|20190511|1|1|
|20190512|1|2|
|20190513|0|0|
|20190514|0|0|
|20190515|1|1|
I tried analytical function and hierarchy query, but still haven't found a solution, seeking help, any hint is appreciated!
Thanks,
Hong
You can define the groups using a cumulative sum of the zeros. Then use row_number():
select t.*,
(case when flag = 0 then 0
else row_number() over (partition by grp, flag order by date)
end) as counter
from (select t.*,
sum(case when flag = 0 then 1 else 0 end) over (order by date) as grp
from t
) t;
A very different approach is to take the difference between the current date and a cumulative max of the flag = 0 date:
select t.*,
datediff(day,
max(case when flag = 0 then date end) over (order by date),
date
) as counter
from t;
Note that the logic of these two approaches is different, although they should produce the same results for the data you have provided. The first counts rows, so gaps in the dates do not affect it. The second counts calendar days, so the counter jumps across any missing date.
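The first approach (a cumulative count of zeros as the group key, then numbering the flag = 1 rows inside each group) can be traced in a few lines of Python. This is a sketch of the idea rather than a translation of the query; consecutive_counters is a hypothetical name.

```python
from collections import defaultdict
from itertools import accumulate

# flag column from the question, in date order (20190503 .. 20190515).
flags = [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]


def consecutive_counters(flags):
    """Number the flag=1 rows within each run; flag=0 rows get 0."""
    # grp: how many zeros have been seen up to and including each row.
    groups = accumulate(1 if flag == 0 else 0 for flag in flags)
    seen = defaultdict(int)  # grp -> number of flag=1 rows counted so far
    counters = []
    for flag, grp in zip(flags, groups):
        if flag == 0:
            counters.append(0)
        else:
            seen[grp] += 1
            counters.append(seen[grp])
    return counters


print(consecutive_counters(flags))
```

Only the flag = 1 rows are numbered within a group, so the zero that opens each group does not push the first flagged day to 2.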
Well - Vertica has a very nice CONDITIONAL_CHANGE_EVENT() function that could help you there ...
Every time the expression in the parentheses changes, an integer is incremented by 1. This gives you a new group identifier, or a criterion to PARTITION BY, every time the flag changes. So one SELECT gets the grouping info, and the next partitions by it. Here goes:
WITH
input(dt,flag) AS (
SELECT '2019-05-03'::DATE,0
UNION ALL SELECT '2019-05-04'::DATE,1
UNION ALL SELECT '2019-05-05'::DATE,1
UNION ALL SELECT '2019-05-06'::DATE,1
UNION ALL SELECT '2019-05-07'::DATE,1
UNION ALL SELECT '2019-05-08'::DATE,0
UNION ALL SELECT '2019-05-09'::DATE,0
UNION ALL SELECT '2019-05-10'::DATE,0
UNION ALL SELECT '2019-05-11'::DATE,1
UNION ALL SELECT '2019-05-12'::DATE,1
UNION ALL SELECT '2019-05-13'::DATE,0
UNION ALL SELECT '2019-05-14'::DATE,0
UNION ALL SELECT '2019-05-15'::DATE,1
)
,
grp_input AS (
SELECT
*
, CONDITIONAL_CHANGE_EVENT(flag) OVER(ORDER BY dt) AS grp
FROM input
)
SELECT
dt
, flag
, CASE FLAG
WHEN 0 THEN 0
ELSE ROW_NUMBER() OVER(PARTITION BY grp ORDER BY dt)
END AS counter
FROM grp_input;
-- out dt | flag | counter
-- out ------------+------+---------
-- out 2019-05-03 | 0 | 0
-- out 2019-05-04 | 1 | 1
-- out 2019-05-05 | 1 | 2
-- out 2019-05-06 | 1 | 3
-- out 2019-05-07 | 1 | 4
-- out 2019-05-08 | 0 | 0
-- out 2019-05-09 | 0 | 0
-- out 2019-05-10 | 0 | 0
-- out 2019-05-11 | 1 | 1
-- out 2019-05-12 | 1 | 2
-- out 2019-05-13 | 0 | 0
-- out 2019-05-14 | 0 | 0
-- out 2019-05-15 | 1 | 1
-- out (13 rows)
-- out

Calculating Percentages in Postgres

I'm completely new to PostgreSQL. I have the following table called my_table:
a b c date
1 0 good 2019-05-02
0 1 good 2019-05-02
1 1 bad 2019-05-02
1 1 good 2019-05-02
1 0 bad 2019-05-01
0 1 good 2019-05-01
1 1 bad 2019-05-01
0 0 bad 2019-05-01
I want to calculate the percentage of 'good' from column c for each date. I know how to get the number of 'good':
SELECT COUNT(c), date FROM my_table WHERE c != 'bad' GROUP BY date;
That returns:
count date
3 2019-05-02
1 2019-05-01
My goal is to get this:
date perc_good
2019-05-02 75
2019-05-01 25
So I tried the following:
SELECT date,
(SELECT COUNT(c)
FROM my_table
WHERE c != 'bad'
GROUP BY date) / COUNT(c) * 100 as perc_good
FROM my_table
GROUP BY date;
And I get an error saying
more than one row returned by a subquery used as an expression.
I found this answer but not sure how to or if it applies to my case:
Calculating percentage in PostgreSql
How do I go about calculating the percentage for multiple rows?
avg() is convenient for this purpose:
select date,
avg( (c = 'good')::int ) * 100 as percent_good
from t
group by date
order by date;
How does this work? c = 'good' is a boolean expression. The ::int converts it to a number, with 1 for true and 0 for false. The average is then the average of a bunch of 1s and 0s -- and is the ratio of the true values.
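The same reasoning can be checked on the question's rows without a database. The sketch below is plain Python, not PostgreSQL; percent_good is a made-up helper name, and only the c and date columns are kept because a and b play no part in the calculation.

```python
from collections import defaultdict

# (c, date) pairs from my_table in the question.
rows = [
    ("good", "2019-05-02"), ("good", "2019-05-02"),
    ("bad", "2019-05-02"), ("good", "2019-05-02"),
    ("bad", "2019-05-01"), ("good", "2019-05-01"),
    ("bad", "2019-05-01"), ("bad", "2019-05-01"),
]


def percent_good(rows):
    """Average of 1/0 indicators per date, scaled to a percentage."""
    indicators = defaultdict(list)
    for c, day in rows:
        # Same role as (c = 'good')::int in the query.
        indicators[day].append(1 if c == "good" else 0)
    return {
        day: 100 * sum(values) / len(values)
        for day, values in sorted(indicators.items())
    }


print(percent_good(rows))
```

For the sample rows, 2019-05-02 has three 'good' values out of four and 2019-05-01 has one out of four, so the averages come out as 75.0 and 25.0.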
For this case you need to use conditional AVG():
SELECT
date,
100 * avg(case when c = 'good' then 1 else 0 end) perc_good
FROM my_table
GROUP BY date;
See the demo.
You could use a conditional sum to get the number of good values and a count for the total. The ::numeric cast avoids integer division, which would otherwise truncate each ratio to 0 or 1.
Below is an exhaustive code sample:
select date
, count(c) total
, sum(case when c='good' then 1 else 0 end) total_good
, sum(case when c='bad' then 1 else 0 end) total_bad
, (sum(case when c='good' then 1 else 0 end)::numeric / count(c)) * 100 perc_good
, (sum(case when c='bad' then 1 else 0 end)::numeric / count(c)) * 100 perc_bad
from my_table
group by date
and for your result:
select date
, (sum(case when c='good' then 1 else 0 end)::numeric / count(c)) * 100 perc_good
from my_table
group by date
or, as suggested by a_horse_with_no_name, using count(*) filter():
select date
, ((count(*) filter (where c='good'))::numeric / count(*)) * 100 perc_good
from my_table
group by date

How to get running total from consecutive columns in Oracle SQL

I am having trouble displaying consecutive holidays from an existing date dataset in Oracle SQL. For example, between the 20th and 30th of December 2017, the following days are days off (Christmas and weekend days):
23.12.2017 Saturday
24.12.2017 Sunday
25.12.2017 Christmas
30.12.2017 Saturday
Now I want my result dataset to look like this (RUNTOT is needed):
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 0
27.12.2017 0 0
28.12.2017 0 0
29.12.2017 0 0
30.12.2017 1 1
That means when "ISOFF" changes I want to count (or sum) the consecutive rows where "ISOFF" is 1.
I tried to approach a solution with an analytic function, where I summarize the "ISOFF" to the current row.
SELECT DAT,
ISOFF,
SUM (ISOFF)
OVER (ORDER BY DAT ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNTOT
FROM (TIME_DATASET)
WHERE DAT BETWEEN DATE '2017-12-20' AND DATE '2017-12-30'
ORDER BY 1
What I get now is following dataset:
DAT ISOFF RUNTOT
20.12.2017 0 0
21.12.2017 0 0
22.12.2017 0 0
23.12.2017 1 1
24.12.2017 1 2
25.12.2017 1 3
26.12.2017 0 3
27.12.2017 0 3
28.12.2017 0 3
29.12.2017 0 3
30.12.2017 1 4
How can I reset the running total if ISOFF changes to 0? Or is this the wrong approach to solve this problem?
Thank you for your help!
This is a gaps-and-islands problem. Here is one method that assigns the groups by the number of 0s up to that row:
select t.*,
(case when isoff = 1
then row_number() over (partition by grp, isoff order by dat)
else 0
end) as runtot
from (select t.*,
sum(case when isoff = 0 then 1 else 0 end) over (order by dat) as grp
from TIME_DATASET t
) t;
You may use recursive subquery factoring. The precondition is that your dates are consecutive without gaps (or that you have some other row-number sequence to follow in steps of one).
WITH t1(dat, isoff, runtot) AS (
SELECT dat, isoff, 0 runtot
FROM tab
WHERE DAT = DATE'2017-12-20'
UNION ALL
SELECT t2.dat, t2.isoff,
case when t2.isoff = 0 then 0 else runtot + t2.isoff end as runtot
FROM tab t2, t1
WHERE t2.dat = t1.dat + 1
)
SELECT dat, isoff, runtot
FROM t1;
DAT ISOFF RUNTOT
------------------- ---------- ----------
20.12.2017 00:00:00 0 0
21.12.2017 00:00:00 0 0
22.12.2017 00:00:00 0 0
23.12.2017 00:00:00 1 1
24.12.2017 00:00:00 1 2
25.12.2017 00:00:00 1 3
26.12.2017 00:00:00 0 0
27.12.2017 00:00:00 0 0
28.12.2017 00:00:00 0 0
29.12.2017 00:00:00 0 0
30.12.2017 00:00:00 1 1
Another variation, which doesn't need a subquery or CTE but does need all days to be present and have the same time, is - for the holiday dates only (where isoff = 1) - to see how many days it's been since the last non-holiday date:
select dat,
isoff,
case
when isoff = 1 then
coalesce(dat - max(case when isoff = 0 then dat end)
over (order by dat range between unbounded preceding and 1 preceding), 1)
else 0
end as runtot
from time_dataset
order by dat;
DAT ISOFF RUNTOT
---------- ---------- ----------
2017-12-20 0 0
2017-12-21 0 0
2017-12-22 0 0
2017-12-23 1 1
2017-12-24 1 2
2017-12-25 1 3
2017-12-26 0 0
2017-12-27 0 0
2017-12-28 0 0
2017-12-29 0 0
2017-12-30 1 1
The coalesce() is there in case the first date in the range is a holiday - as there is no previous non-holiday date to compare against, that subtraction would get null.
db<>fiddle with a slightly larger data set.
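The "days since the last non-holiday" variation can also be traced in Python to confirm the expected RUNTOT column. This is a sketch under the same precondition as that answer (every calendar day is present); running_holidays and the way the calendar list is built are assumptions for illustration.

```python
from datetime import date, timedelta

# Days off between 20.12.2017 and 30.12.2017, taken from the question.
days_off = {date(2017, 12, d) for d in (23, 24, 25, 30)}
calendar = [
    (day, 1 if day in days_off else 0)
    for day in (date(2017, 12, 20) + timedelta(days=i) for i in range(11))
]


def running_holidays(calendar):
    """For each day off, count the days since the last working day; else 0."""
    runtot = []
    last_workday = None  # latest date with isoff = 0 seen so far
    for day, isoff in sorted(calendar):
        if isoff == 0:
            last_workday = day
            runtot.append(0)
        elif last_workday is None:
            # No earlier working day to subtract from: same fallback as coalesce(..., 1).
            runtot.append(1)
        else:
            runtot.append((day - last_workday).days)
    return runtot


print(running_holidays(calendar))
```

Because the counter is a date difference rather than a row count, a missing calendar day inside a holiday run would inflate the result, which is why the precondition matters.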