SQL: find consecutive days on which a specific threshold was reached

I have two columns; the_day and amount_raised. I want to find the count of consecutive days that at least 1 million dollars was raised. Am I able to do this in SQL? Ideally, I'd like to create a column that counts the consecutive days and then starts over if the 1 million dollar threshold is not reached.
What I've done thus far is create a third column that puts a 1 in the row if 1 million was reached. Could I create a subquery and count the consecutive 1's listed, then reset when it hits 0?
And here is the desired output:

select dt,
       amt,
       -- running counter of consecutive threshold days; restarts after a below-threshold day
       case when amt >= 1000000 then -1 + row_number() over (partition by col order by dt)
            else 0
       end as col1
from (select *,
             -- running count of below-threshold days, used as the island id
             sum(case when amt >= 1000000 then 0 else 1 end) over (order by dt) as col
      from t) x
Sample Demo

SELECT the_day,
       amount_raised,
       million_threshold,
       CASE WHEN million_threshold <> lag_million_threshold
             AND million_threshold = lead_million_threshold
            THEN 1
            WHEN million_threshold = lag_million_threshold
            THEN SUM(million_threshold) OVER (ORDER BY the_day ROWS UNBOUNDED PRECEDING)
            ELSE 0
       END AS consecutive_day_cnt
FROM
(
    SELECT the_day,
           amount_raised,
           million_threshold,
           LAG(million_threshold, 1)  OVER (ORDER BY the_day) AS lag_million_threshold,
           LEAD(million_threshold, 1) OVER (ORDER BY the_day) AS lead_million_threshold
    FROM
    (
        SELECT the_day,
               amount_raised,
               -- 1 when the day reached the $1,000,000 threshold, otherwise 0
               CASE WHEN amount_raised >= 1000000 THEN 1 ELSE 0 END AS million_threshold
        FROM Yourtable
    ) flagged            -- derived tables need an alias in most databases
) flagged_with_neighbors;

Current and previous days date diff in days with some condition

I have the first three fields of the following table. I want to compute the number of consecutive days an amount was higher than 0 ("days" field).
key  date        amount  days
1    2023-01-23  0       0
1    2023-01-22  10      2
1    2023-01-21  20      1
1    2023-01-20  0       0
1    2023-01-19  0       0
1    2023-01-18  0       0
1    2023-01-17  3       1
1    2023-01-16  0       0
I have tried some window functions following this link, but the count does not accumulate and reset to 1 when the previous amount is 0.
My code:
case when f.amount > 0
then SUM ( DATE_PART('day',
date::text::timestamp - previou_bus_date::text::timestamp )
) OVER (partition by f.key
ORDER BY f.date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
else 0
end as days
As another option, you could use the difference between two ROW_NUMBERs, as follows:
select key, date, amount,
sum(case when amount > 0 then 1 else 0 end) over
(partition by key, grp, case when amount > 0 then 1 else 0 end order by date) days
from
(
select *,
row_number() over (partition by key order by date) -
row_number() over (partition by key, case when amount > 0 then 1 else 0 end order by date) grp
from table_name
) T
order by date desc
See demo
This falls into the gaps-and-islands family of problems, since you need to count runs of consecutive non-zero amounts.
You can reliably solve this problem in 3 steps:
flag when a new partition starts, by emitting 1 when the current amount > 0 and the previous amount = 0
compute a running sum (with SUM) over the flags generated in step 1, to create the partitions over which the consecutive values are counted
compute a ranking (with ROW_NUMBER) to rank the non-zero consecutive amounts within each partition generated in step 2
WITH cte AS (
SELECT *,
CASE WHEN amount > 0
AND LAG(amount) OVER(PARTITION BY key_ ORDER BY date_) = 0
THEN 1
END AS change_part
FROM tab
), cte2 AS (
SELECT *,
SUM(change_part) OVER(PARTITION BY key_ ORDER BY date_) AS parts
FROM cte
)
SELECT key_, date_, amount,
CASE WHEN amount > 0
THEN ROW_NUMBER() OVER(PARTITION BY key_, parts ORDER BY date_)
ELSE 0
END AS days
FROM cte2
ORDER BY date_ DESC
Check the demo here.
Note: this is not the most performant solution, but I'm leaving it as a reference for the next part (missing consecutive dates). @Ahmed's answer is likely to perform better in this case.
If your data can ever have holes in the dates (missing records, which break the consecutiveness of the amounts), you should add a further condition in step 1, where you create the flag that changes the partition.
The partition should change:
either when the current amount > 0 and the previous amount = 0,
or when the current date is greater than the previous date + 1 day (consecutive rows are not consecutive in time).
WITH cte AS (
SELECT *,
CASE WHEN (amount > 0
AND LAG(amount) OVER(PARTITION BY key_ ORDER BY date_) = 0)
OR date_ > LAG(date_) OVER(PARTITION BY key_ ORDER BY date_)
+ INTERVAL '1 day'
THEN 1
END AS change_part
FROM tab
), cte2 AS (
...
Check the demo here.

CASE WHEN condition with MAX() function

There are a lot of questions on the CASE WHEN topic, but the closest to mine is this one: How to use CASE WHEN condition with MAX() function, which has not been resolved.
Here is some of my sample data:
date        debet
2022-07-15  57190.33
2022-07-14  815616516.00
2022-07-15  40866.67
2022-07-14  1221510.00
So, I want all records for the last two dates plus three additional columns: the sum of debet for the previous day, the sum for the current day, and the difference between them:
SELECT
[debet],
[date] ,
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END ) AS sum_act,
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END ) AS sum_prev ,
(
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END )
-
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END )
) AS diff
FROM
Table
WHERE
[date] = ( SELECT MAX(date) FROM Table WHERE date < ( SELECT MAX(date) FROM Table) )
OR
[date] = ( SELECT MAX(date) FROM Table WHERE date = ( SELECT MAX(date) FROM Table ) )
GROUP BY
[date],
[debet]
Of course, this reports that I can't use an aggregate function inside CASE WHEN. For now I use this combination: sum(CASE WHEN [date] = dateadd(dd, -3, cast(getdate() as date)) THEN [debet] ELSE 0 END), but I constantly need to adjust it for weekends and holidays. The question is: is there any way other than using getdate() inside the CASE WHEN to get the max date?
Expected result:
date        sum_act   sum_prev   diff
2022-07-15  97190.33  0.00       97190.33
2022-07-14  0.00      508769.96  -508769.96
You can use dense_rank() to filter the last two dates in your table. After that you can use a conditional case expression with sum() to calculate the required values:
select [date],
sum_act = sum(case when rn = 1 then [debet] else 0 end),
sum_prev = sum(case when rn = 2 then [debet] else 0 end),
diff = sum(case when rn = 1 then [debet] else 0 end)
- sum(case when rn = 2 then [debet] else 0 end)
from
(
select *, rn = dense_rank() over (order by [date] desc)
from tbl
) t
where rn <= 2
group by [date]
db<>fiddle demo
Two steps:
Get the sums for the last three dates
Show the results for the last two dates.
Well, we could also get all daily sums in step 1, but we just need the last three in order to calculate the sums for the last two days, so why aggregate more data than necessary?
Here is the query. You may have to put the date column name in brackets in SQL Server, as date is a keyword in SQL.
select top(2)
       date,
       sum_debit_current,
       sum_debit_previous,
       sum_debit_current - sum_debit_previous as diff
from
(
    select
        date,
        sum(debet) as sum_debit_current,
        lag(sum(debet)) over (order by date) as sum_debit_previous
    from table
    where date in (select distinct top(3) date from table order by date desc)
    group by date
) daily
order by date desc;
(SQL Server uses TOP(n) instead of the standard SQL FETCH FIRST n ROWS ONLY, and while SELECT DISTINCT TOP(3) date looks like "get the top 3 rows, then apply DISTINCT to their dates", it really means "apply DISTINCT to the dates, then take the top 3", just as in standard SQL.)
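For reference, here is a minimal sketch of the same idea in standard-SQL syntax, using FETCH FIRST instead of TOP(n) (an assumption on my part: a dialect such as PostgreSQL, with the same placeholder table and column names as above):
-- daily totals for the three most recent dates; lag() supplies the previous
-- day's total so the outer query can show the last two days and their difference
select date,
       sum_debit_current,
       sum_debit_previous,
       sum_debit_current - sum_debit_previous as diff
from
(
    select date,
           sum(debet) as sum_debit_current,
           lag(sum(debet)) over (order by date) as sum_debit_previous
    from t
    where date in (select distinct date from t order by date desc fetch first 3 rows only)
    group by date
) daily
order by date desc
fetch first 2 rows only;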

SQL for begin and end of data rows

I've got the following table:
and I was wondering if there is an SQL query which would give me the beginning and ending calendar week (CW) where the value is greater than 0.
So, in the case of the table above, a result like the one below:
Thanks in advance!
You can assign a group by counting the number of zeros and then aggregating:
select article_nr, min(year), max(year)
from (select t.*,
sum(case when amount = 0 then 1 else 0 end) over (partition by article_nr order by year) as grp
from t
) t
where amount > 0
group by article_nr, grp;
select Article_Nr,
       min(Year&CW) as 'Begin(Year&CW)',
       max(Year&CW) as 'End(Year&CW)'
from table
where Amount > 0
group by Article_Nr;

Working out a rolling average where you don't always divide by the total count

I am trying to work out a rolling average for batsmen in cricket. Anyone who knows the sport will know that the average is worked out as runs scored / innings, unless the batsman is not out. If a batsman plays 2 innings and is 'not out' in 1 of them, their average would be worked out as
(runs scored in innings 1 + runs scored in innings 2) / 1
If they were out in both innings, the calculation would be
(runs scored in innings 1 + runs scored in innings 2) / 2
This is easy enough to work out for an overall average; however, I would like to calculate it as a running average. I have done this before using a loop and calculating the average for each row individually, but can anyone suggest a way to do this using built-in functions?
Current code example:
with cte as (
select
Innings_Player,
Innings_Runs_Scored,
Innings_Date,
CASE WHEN Innings_Runs_Scored = "DNB" THEN null WHEN Innings_Runs_Scored LIKE "%*%" THEN REPLACE(Innings_Runs_Scored,"*","") ELSE Innings_Runs_Scored END AS RunsNum,
CASE WHEN Innings_Runs_Scored LIKE "%*%" THEN 1 ELSE 0 END AS NotOutFlag,
ROW_NUMBER() OVER (PARTITION BY Innings_Player ORDER BY Innings_Date) as RN
from TABLE
where Innings_Player = "JE Root"
AND Innings_Runs_Scored IS NOT NULL
ORDER BY Innings_Date
)
,cte2 as
(
select
*,
SUM(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN) AS RunningTotal,
AVG(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN) AS RunningAvg,
from cte
where runsNum IS NOT NULL AND runsNum <> "TDNB"
)
select * from cte2
Resulting dataset:
So, the average is not correct. For the rolling average, the calculation for row three should be the running total of innings_runs_scored over the first three rows divided by 2 rather than 3, because, as you can see from NotOutFlag, the third innings in the list was a not out.
Similarly, row 4 should be divided by 3, row 5 by 4, and then, as row 6 was a not out as well, row 6 should be divided by 4, row 7 by 5, etc. I think the equation would be
Innings_Runs_Scored / (Innings - Not Out Count)
AVG is basically SUM / COUNT. Since you want to alter the COUNT portion, I would suggest forgoing the AVG function. You can count using SUM with a CASE expression so that only the rows where NotOutFlag is 0 are counted.
so the line
AVG(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN) AS RunningAvg,
would become
SUM(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN)
/ SUM(CASE WHEN NotOutFlag = 0 THEN 1 ELSE 0 END) OVER (PARTITION BY Innings_Player ORDER BY RN)
AS RunningAvg,
Of course you will need to add some more logic to avoid division by 0.
CASE WHEN SUM(CASE WHEN NotOutFlag = 0 THEN 1 ELSE 0 END) OVER (PARTITION BY Innings_Player ORDER BY RN) = 0
THEN 0
ELSE
SUM(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN)
/
SUM(CASE WHEN NotOutFlag = 0 THEN 1 ELSE 0 END) OVER (PARTITION BY Innings_Player ORDER BY RN)
END AS RunningAvg,
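A more compact variant (my own suggestion, not part of the original answer) is to make the denominator NULL with NULLIF, so the division yields NULL instead of an error, and then map that back to 0; in BigQuery, SAFE_DIVIDE would achieve much the same thing:
-- NULLIF turns a zero denominator into NULL; COALESCE maps the resulting NULL average back to 0
COALESCE(
    SUM(CAST(RunsNum AS INT64)) OVER (PARTITION BY Innings_Player ORDER BY RN)
    / NULLIF(SUM(CASE WHEN NotOutFlag = 0 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY Innings_Player ORDER BY RN), 0),
    0) AS RunningAvg,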

select first non-null row with minimum date (Big Query)

I want to select the first non-null row with the minimum date. I'd like to use a CASE WHEN: when that condition is met, then 1, else 0.
So, more like CASE WHEN the row IS NOT NULL AND the DATE is the minimum DATE THEN 1 ELSE 0. I just need to select ONLY one row.
Another option (for BigQuery Standard SQL)
#standardSQL
SELECT *, 0 AS marker FROM `project.dataset.table` WHERE item_count IS NULL
UNION ALL
SELECT *, IF(1 = ROW_NUMBER() OVER(PARTITION BY user ORDER BY date), 1, 0)
FROM `project.dataset.table` WHERE NOT item_count IS NULL
ORDER BY user, date
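If the goal is literally to return only the single earliest non-null row per user, rather than marking it, a minimal sketch using BigQuery's QUALIFY clause (assuming the same column names as in the query above) would be:
-- keep, per user, only the earliest row among rows where item_count is not null
SELECT *
FROM `project.dataset.table`
WHERE item_count IS NOT NULL
QUALIFY ROW_NUMBER() OVER(PARTITION BY user ORDER BY date) = 1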
Consider:
select
    t.*,
    case when date = min(case when itemcount is not null then date end)
                     over(partition by user order by date)
         then 1
         else 0
    end as marker
from mytable t
I am unsure whether BigQuery supports minif() as a window function:
select
    t.*,
    case when date = minif(date, itemcount is not null) over(partition by user order by date)
         then 1
         else 0
    end as marker
from mytable t
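If minif() turns out not to be supported as a window function, the same effect can be had (a sketch under the same hypothetical table and column names) by pushing the condition into an IF inside MIN, since MIN ignores NULLs:
select
    t.*,
    -- if(...) returns NULL for rows with a null itemcount, and MIN skips NULLs,
    -- so this picks the earliest date that has a non-null itemcount
    case when date = min(if(itemcount is not null, date, null))
                     over(partition by user order by date)
         then 1
         else 0
    end as marker
from mytable t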