Reset rolling sum to 0 after reaching the threshold - sql

I'm trying to compute a running total and reset it to 0 based on 2 conditions or if the limit is reached.
Here is an example.
As in the image above, I need to get the running total while the following conditions are met:
monthly discount = 0 and monthly ticket=1
If one of discount=1 and ticket=0, the next value for running total has to be 0.
running_total<50
If running total>=50, the value for running total has to start from the value on the same row.
Here is what I'm trying to do now:
Is there any possibility to do this in HIVE? Thank you so much!!!
SELECT * ,
SUM(tag_flg) OVER (PARTITION BY account, flg_sum
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_sum
FROM
( SELECT * ,
SUM(CASE
WHEN tag_flg>=50 THEN value
ELSE tag_flg
END) OVER (PARTITION BY account
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS flg_sum
FROM
( SELECT * ,
CASE
WHEN month_disc =0
AND month_ticket = 1 THEN value
ELSE 0
END AS tag_flg
FROM source_table) x) y

Do the 40, 60 and 20 that aren't being accounted for matter at all in your report? Like would you want them to be counted then a new row added with a total of 0 to restart?

Here is the way I managed to do it:
SELECT *,
SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum, band_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
SELECT *,
FLOOR(SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/50.000001) as band_sum ---- create bands for running total
FROM (
SELECT *,
SUM(tag_flg) OVER (PARTITION BY account ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
FROM (
SELECT *,
CASE WHEN (month_disc=1 OR month_ticket=0) THEN 1 ELSE 0 END AS tag_flg ---- flag to count when the value is reset due to one of the conditions
FROM source_table) x ) y) z

Related

How to calculate total 6 months in Oracle SQL?

I try this SQL statement:
SELECT
custodycd,
SUM(mramt) mramt_6_month,
txdate,
CASE
WHEN LAG(mramt, 6) OVER (ORDER BY txdate) IS NOT NULL
THEN SUM(mramt) OVER (ORDER BY txdate ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING)
ELSE NULL
END AS mramt_6_month_1
FROM
(SELECT
MAX(mramt) mramt,
t.afacctno,
t.custodycd,
t.txdate
FROM
tbl_mr3007_log t
WHERE
txdate >= '30/nov/2020'
AND mramt <> 0
GROUP BY
t.afacctno,
t.txdate,
t.custodycd)
GROUP BY
custodycd,
txdate
ORDER BY
txdate
and I get an error:
ORA-00979: not a GROUP BY expression
Thanks for your help
The error message seems pretty clear. However, what you want to do is not clear.
I am guessing that for each custodycd you have multiple rows by date. Starting at the seventh row, you want the sum of the previous six rows.
If so, then the code looks like:
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY custodycd ORDER BY txdate) > 6
THEN SUM(SUM(mramt)) OVER (PARTITION BY custodycd
ORDER BY txdate
ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
)
END) AS mramt_6_month_1

Retain function in SQL

Have scenario where need to retain values based on condition.
Assign "Change date" on "AA start Date" then check for "Change in Duration".
If "change in Duration" < 60 then 1st Change date will be assigned till next Change in duration > 60
then retain new change date in "AA start date". Sample is given below.
"AA_START_DATE" is final column which I am looking for.
I think this is a type of gap and islands problem, where you want to remember the first date in sequence that is 60+ days from the previous date.
You can handle this by using lag() to get the previous date. Then use a cumulative conditional maximum to get the time when the most recent change occurred:
select t.*,
max(case when change_date > date_add(change_date, interval -60 day) then null else change_date
end) over (partition by cn, aa_code
order by change_date
) as aa_start_date
from (select t.*,
lag(change_date) over (partition by cn, aa_code order by change_date) as prev_change_date
from t
) t
I have achieved this as follows: Any alternate way is appreciable
(select CIN,AA_CODE,EXT_AA_MESSAGE,CHG_DATE,EXP_DATE,PREV_CHG_DATE,CHG_DATE_DUR,AA_START_DTE,
sum(case when AA_START_DTE is null then 0 else 1 end) over (partition by CIN,AA_CODE order by CHG_DATE) as value_partition from (select CIN,AA_CODE,EXT_AA_MESSAGE,CHG_DATE,EXP_DATE,
case when CIN_AA_CODE_FIRST = 1 then NULL else PREV_CHG_DATE end as PREV_CHG_DATE,
case when CIN_AA_CODE_FIRST <> 1 then date_diff(CHG_DATE,PREV_CHG_DATE,DAY) end as CHG_DATE_DUR,
case when CIN_AA_CODE_FIRST = 1 or (case when CIN_AA_CODE_FIRST <> 1 then date_diff(CHG_DATE,PREV_CHG_DATE,DAY) end ) > 60 then CHG_DATE
end as AA_START_DTE,
from
( select CIN,AA_CODE,EXT_AA_MESSAGE,CHG_DATE,EXP_DATE, LAG(CHG_DATE) OVER (PARTITION BY CIN,AA_CODE ORDER BY CHG_DATE ASC) as PREV_CHG_DATE,
rank() OVER (PARTITION BY CIN,AA_CODE ORDER BY CHG_DATE ASC) AS CIN_AA_CODE_FIRST from TABLE )))`

How to return all the rows in the yellow census blocks?

Hey the schema is like this: for the whole dataset, we should order by machine_id first, then order by ss2k. after that, for each machine, we should find all the rows with at least consecutively 5 flag = 'census'. In this dataset, the result should be all the yellow rows..
I cannot return the last 4 rows of the yellow blocks by using this:
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
row_number() over (partition by machine_id, date order by ss2k) as seqnum,
count(*) over (partition by machine_id, date) as cnt
from qz_panel_census_228 t
) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You were close, but you need to search in both directions:
select t.*
from (select t.*,
case when count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between 4 preceding and current row) = 5
or count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between current row and 4 following) = 5
then 1
else 0
end as flag
from qz_panel_census_228 t
) t
where flag = 1
Edit:
This approach will not work unless you add an extra count for each possible 5 row window, e.g. 3 preceding and 1 following, 2 preceding and 2 following, etc. This results in ugly code and is not very flexible.
The common way to solve this gaps & islands problem is to assign consecutive rows to a common group first:
select *
from
(
select t2.*,
count(*) over (partition by machine_id, date, grp) as cnt
from
(
select t1.*
from (select t.*,
-- keep the same number for 'census' rows
sum(case when flag = 'census' then 0 else 1 end)
over (partition by machine_id, date
order by ss2k
rows unbounded preceding) as grp
from qz_panel_census_228 t
) t1
where flag = 'census' -- only census rows
) as t2
) t3
where cnt >= 5 -- only groups of at least 5 census rows
Wow, there has to be a better way of doing this, but the only way I could figure out was to create blocks of consecutive 'census' values. This looks awful but might be a catalyst to a better idea.
with q1 as (
select
machine_id, recorded, ss2k, flag, date,
case
when flag = 'census' and
lag (flag) over (order by machine_id, ss2k) != 'census'
then 1
else 0
end as block
from foo
),
q2 as (
select
machine_id, recorded, ss2k, flag, date,
sum (block) over (order by machine_id, ss2k) as group_id,
case when flag = 'census' then 1 else 0 end as census
from q1
),
q3 as (
select
machine_id, recorded, ss2k, flag, date, group_id,
sum (census) over (partition by group_id order by ss2k) as max_count
from q2
),
groups as (
select group_id
from q3
group by group_id
having max (max_count) >= 5
)
select
q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
q2
join groups g on q2.group_id = g.group_id
where
q2.flag = 'census'
If you run each query within the with clauses in isolation, I think you will see how this evolves.

Calculate moving weather stats in PostgreSQL

I'm trying to calculate the days since last rain and the amount of rain in that event for each day in my PostgreSQL table of weather data. I've been trying to achieve this with window functions but the limitation of ranges having to be unbounded has left me a bit stuck on how to proceed.
Here's the query I have so far:
SELECT
station_num,
ob_date,
rain,
max(rain) OVER (PARTITION BY station_num ORDER BY ob_date ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as prev_rain_mm,
'' as days_since_rain --haven't attempted this calculation yet
FROM
obs_daily_ground_moisture
This results in the following:
but I'm trying to achieve something more like this:
I feel like all the pieces are there in regards to window functions range & filter and nested queries but I'm not sure how to pull it all together. Also the above data is just a subset of the actual dataset, the entire dataset is just over half a million rows.
The key here is to group the observations starting from the first occurrence of rain>0 value to the next occurrence of rain>0 value. Thereafter you can use window functions to calculate the needed columns.
select
x.station_num,
x.ob_date,
max(rain) over(partition by station_num,col) prev_rain,
case when rain > 0 then 0
else row_number() over(partition by station_num, col order by ob_date)-1 end days_since_rain
from (select t.*,
sum(case when rain > 0 then 1 else 0 end) over(partition by station_num order by ob_date) col
from t) x
Sample Demo
try this.
DECLARE #Rain AS FLOAT
UPDATE A
SET
#Rain = CASE WHEN A.Rain = 0 THEN #Rain ELSE A.Rain END,
A.Rain = CASE WHEN #Rain IS NULL OR A.Rain <> 0 THEN A.Rain ELSE #Rain END
FROM obs_daily_ground_moisture A
SELECT ob_date, Rain,
max(rain) OVER (PARTITION BY station_num ORDER BY ob_date ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as prev_rain_mm,
ROW_NUMBER() OVER(PARTITION BY Rain ORDER BY ob_date) - 1 as days_since_rain
FROM obs_daily_ground_moisture ORDER BY ob_date

Query for getting previous date in oracle in specific scenario

I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
In table B, there should be no of days to be shown when balance is negative and
the date one which it has gone into negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, no of days should be the days when the balance has gone negative in recent time and
not from the old entry.
I need to write a SQL query to get the no of days passed and the start date from when the
balance has gone negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
Even the first_value function was given a try but not getting how to partition in it based on negative value.
Any help or direction on this will be really helpful.
Here's a way to achive this using analytical functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as analytical function to assign group number to consecutive balances less than 0.
LAST_VALUE function is then used to find the last -ve balance in each group
Finally, the result is aggregated based on each group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two gives number of days.
Demo at sqlfiddle.
Try this and use gone_negative to computing specified column value for insert into another table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY account_no) prev_balance,
lag(on_date, 1) OVER(partition by account_no ORDER BY account_no) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's on way to do it.
Select all records from my_table where the balance is positive.
Do a self-join and get all the records that have a as_on_date is greater than the current row, but the amounts are in negative
Once we get these, we cut-off the rows WHERE the date difference between the current and the previous row for as_on_date is > 1. We then filter the results a outer sub query
The Final select just groups the rows and gets the min, max values for the filtered rows which are grouped.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have a added a few more rows in the FIDDLE, just to test it out