Retain function in SQL

I have a scenario where I need to retain values based on a condition.
Assign "Change date" to "AA start date", then check "Change in Duration".
If "Change in Duration" < 60, the first change date is carried forward until the next row where "Change in Duration" > 60;
at that point the new change date is retained in "AA start date". A sample is given below.
"AA_START_DATE" is the final column I am looking for.

I think this is a type of gaps-and-islands problem, where you want to remember the first date in a sequence that is 60+ days after the previous date.
You can handle this by using lag() to get the previous date. Then use a cumulative conditional maximum to get the date of the most recent change:
select t.*,
       -- a row within 60 days of the previous row contributes null; any other row contributes its own date
       max(case when prev_change_date > date_add(change_date, interval -60 day) then null else change_date
           end) over (partition by cn, aa_code
                      order by change_date
                     ) as aa_start_date
from (select t.*,
             lag(change_date) over (partition by cn, aa_code order by change_date) as prev_change_date
      from t
     ) t
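To sanity-check the query's output, the same rule can be written as a plain loop. This is only a sketch in Python for a single (cn, aa_code) partition. The function name retain_start_dates and the min_gap_days parameter are made up for illustration, and it assumes a new start date begins when the gap from the previous change date is 60 days or more:

```python
from datetime import date

def retain_start_dates(change_dates, min_gap_days=60):
    """Return (change_date, aa_start_date) pairs for one partition."""
    result = []
    prev = None    # plays the role of lag(change_date)
    start = None   # the retained start date
    for d in sorted(change_dates):
        # First row, or a gap of min_gap_days or more: start a new island.
        if prev is None or (d - prev).days >= min_gap_days:
            start = d
        result.append((d, start))
        prev = d
    return result

sample = [date(2020, 1, 1), date(2020, 1, 20), date(2020, 2, 10),
          date(2020, 5, 1), date(2020, 5, 15)]
for change_date, start_date in retain_start_dates(sample):
    print(change_date, start_date)
```

With this sample, the first three rows keep 2020-01-01 and the last two keep 2020-05-01, because the jump from 2020-02-10 to 2020-05-01 is 81 days.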

I achieved this as follows; any alternative approach would be appreciated.
(select CIN, AA_CODE, EXT_AA_MESSAGE, CHG_DATE, EXP_DATE, PREV_CHG_DATE, CHG_DATE_DUR, AA_START_DTE,
        sum(case when AA_START_DTE is null then 0 else 1 end)
            over (partition by CIN, AA_CODE order by CHG_DATE) as value_partition
 from (select CIN, AA_CODE, EXT_AA_MESSAGE, CHG_DATE, EXP_DATE,
              case when CIN_AA_CODE_FIRST = 1 then NULL else PREV_CHG_DATE end as PREV_CHG_DATE,
              case when CIN_AA_CODE_FIRST <> 1 then date_diff(CHG_DATE, PREV_CHG_DATE, DAY) end as CHG_DATE_DUR,
              case when CIN_AA_CODE_FIRST = 1
                     or (case when CIN_AA_CODE_FIRST <> 1 then date_diff(CHG_DATE, PREV_CHG_DATE, DAY) end) > 60
                   then CHG_DATE
              end as AA_START_DTE
       from (select CIN, AA_CODE, EXT_AA_MESSAGE, CHG_DATE, EXP_DATE,
                    lag(CHG_DATE) over (partition by CIN, AA_CODE order by CHG_DATE asc) as PREV_CHG_DATE,
                    rank() over (partition by CIN, AA_CODE order by CHG_DATE asc) as CIN_AA_CODE_FIRST
             from TABLE)))

Related

Create partitions based on column values in sql

I am very new to SQL and query writing, and after a lot of trying I am asking for help.
As shown in the picture, I want to partition the data on is_late = 1 and show its count (that is, 2), but at the same time capture the value of last_status where is_late = 0, all displayed in a single row.
The task is to calculate how many times the rider was late and the time he took from the first occurrence of the estimated time to the last_status.
Desired output:
You can use the following query:
SELECT
    rider_id,
    task_created_time,
    expected_time_to_arrive,
    is_late,
    last_status,
    task_count,
    CONVERT(VARCHAR(5), DATEADD(MINUTE, DATEDIFF(MINUTE, expected_time_to_arrive, last_status), 0), 114) AS time_delayed
FROM
    (SELECT
        rider_id,
        task_created_time,
        expected_time_to_arrive,
        is_late,
        -- whole-partition aggregates, so no ORDER BY is needed
        SUM(CASE WHEN is_late = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY rider_id) AS task_count,
        -- order by creation time so that num = 1 is reliably the rider's first task
        ROW_NUMBER() OVER(PARTITION BY rider_id ORDER BY task_created_time) AS num,
        MAX(last_status) OVER(PARTITION BY rider_id) AS last_status
    FROM myTestTable) t
WHERE num = 1

t-sql repeat row numbers within group

I need to create an ID for every time a name changes in the task history.
The rank needs to restart with each task and step.
The closest I got to my goal is using the code below.
But it does not produce the correct result when a person appears again in the historical list of actions.
DENSE_RANK() OVER (ORDER BY TaskName, Person)
Thanks in advance
You can use lag() to see where a person changes. Then use a cumulative sum:
select t.*,
sum(case when prev_person = person then 0 else 1 end) over
(partition by task_name order by timestamp) as desired_output
from (select t.*,
lag(person) over (partition by task_name order by timestamp) as prev_person
from t
) t ;
Note: I am interpreting your question as wanting the numbers separately for each task ("every time a name changes in the task history").
EDIT:
Based on your comment:
select t.*,
       sum(case when prev_person = person and prev_step_name = step_name then 0 else 1 end) over
           (partition by task_name order by timestamp) as desired_output
from (select t.*,
             lag(person) over (partition by task_name order by timestamp) as prev_person,
             lag(step_name) over (partition by task_name order by timestamp) as prev_step_name
      from t
     ) t ;
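The lag-plus-cumulative-sum idea may be easier to see as a loop: the counter goes up by one whenever the current row differs from the previous one. Below is a small Python sketch for one task's rows, already ordered by timestamp. The function name change_ids is made up for illustration, and each key can be a person or a (person, step_name) pair:

```python
def change_ids(keys):
    """Number consecutive runs of equal keys: 1, 1, 2, 3, ..."""
    ids = []
    run_id = 0
    previous = object()  # sentinel that never equals a real key
    for key in keys:
        if key != previous:  # the "else 1" branch of the CASE expression
            run_id += 1
        ids.append(run_id)
        previous = key
    return ids

history = ["Ann", "Ann", "Bob", "Ann", "Ann", "Bob"]
print(change_ids(history))  # [1, 1, 2, 3, 3, 4]
```

Unlike DENSE_RANK() over the name, a person who reappears later gets a new number, which is the behaviour the question asks for.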

SQL calculation with previous row + current row

I want to make a calculation based on the Excel file. I managed to obtain the first 2 records with LAG (as you can check in the 2nd screenshot). I'm out of ideas on how to proceed from here and need help. I just need the Calculation column to use its own previous value, and I want it calculated automatically over all the dates. I also tried applying LAG to the calculation manually, but the result was one more row of data instead of NULL. This is a headache.
LAG(Data ingested, 1) OVER ( ORDER BY DATE ASC ) AS LAG
You seem to want cumulative sums:
select t.*,
(sum(reconciliation + aves - microa) over (order by date) -
first_value(aves - microa) over (order by date)
) as calculation
from CalcTable t;
EDIT:
Based on your comment, you just need to define a group:
select t.*,
(sum(reconciliation + aves - microa) over (partition by grp order by date) -
first_value(aves - microa) over (partition by grp order by date)
) as calculation
from (select t.*,
count(nullif(reconciliation, 0)) over (order by date) as grp
from CalcTable t
) t
order by date;
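As I read the grouped query, each non-zero reconciliation starts a new group whose first value is the reconciliation itself, and every later row adds aves - microa to the previous result. Here is a Python sketch of that reading, for rows already ordered by date. The function name rolling_calculation and the sample numbers are invented for illustration:

```python
def rolling_calculation(rows):
    """rows: iterable of (reconciliation, aves, microa), ordered by date."""
    out = []
    calc = None
    for reconciliation, aves, microa in rows:
        if calc is None or reconciliation != 0:
            # First row overall, or a non-zero reconciliation: restart the group.
            calc = reconciliation
        else:
            # Otherwise carry the previous value forward.
            calc = calc + aves - microa
        out.append(calc)
    return out

rows = [(100, 5, 2), (0, 10, 3), (0, 1, 1), (50, 9, 9), (0, 4, 1)]
print(rolling_calculation(rows))  # [100, 107, 107, 50, 53]
```

Comparing a few hand-computed rows like these against the query's calculation column is a quick way to confirm the grouping does what the spreadsheet does.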
In my opinion this could be solved using a "gaps and islands" approach. When Reconciliation > 0, create a gap. SUM(gap) OVER converts the gaps into island groupings. In the outer query the 'sum_over' column (which corresponds to 'Calculation') is a cumulative sum partitioned by the island groupings.
with
gap_cte as (
select *, case when [Reconciliation]>0 then 1 else 0 end gap
from CalcTable),
grp_cte as (
select *, sum(gap) over (order by [Date]) grp
from gap_cte)
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;
[EDIT]
The CASE expression could be computed with CROSS APPLY instead:
with
grp_cte as (
select c.*, v.gap, sum(v.gap) over (order by [Date]) grp
from CalcTable c
cross apply (values (case when [Reconciliation]>0 then 1 else 0 end)) v(gap))
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;

Reset rolling sum to 0 after reaching the threshold

I'm trying to compute a running total and reset it to 0 based on 2 conditions or if the limit is reached.
Here is an example.
As in the image above, I need to keep the running total going while the following conditions are met:
monthly discount = 0 and monthly ticket = 1.
If either discount = 1 or ticket = 0, the next value of the running total has to be 0.
running_total < 50.
If the running total is >= 50, the running total has to restart from the value on the same row.
Is there any way to do this in Hive? Thank you so much!
Here is what I'm trying to do now:
SELECT * ,
SUM(tag_flg) OVER (PARTITION BY account, flg_sum
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS running_sum
FROM
( SELECT * ,
SUM(CASE
WHEN tag_flg>=50 THEN value
ELSE tag_flg
END) OVER (PARTITION BY account
ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS flg_sum
FROM
( SELECT * ,
CASE
WHEN month_disc =0
AND month_ticket = 1 THEN value
ELSE 0
END AS tag_flg
FROM source_table) x) y
Do the 40, 60 and 20 that aren't being accounted for matter at all in your report? For example, would you want them counted, and then a new row added with a total of 0 to restart?
Here is the way I managed to do it:
SELECT *,
SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum, band_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
SELECT *,
FLOOR(SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/50.000001) as band_sum ---- create bands for running total
FROM (
SELECT *,
SUM(tag_flg) OVER (PARTITION BY account ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
FROM (
SELECT *,
CASE WHEN (month_disc=1 OR month_ticket=0) THEN 1 ELSE 0 END AS tag_flg ---- flag to count when the value is reset due to one of the conditions
FROM source_table) x ) y) z
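For checking results, here is how I read the stated rules as a plain loop. It is a Python sketch rather than Hive, for a single account already ordered by date. The function name and the limit parameter are placeholders. The handling of resets is my interpretation of the question, not something confirmed in the thread: the total becomes 0 on a row with discount = 1 or ticket = 0, and it restarts from the row's own value once it would reach 50.

```python
def running_total_with_reset(rows, limit=50):
    """rows: iterable of (month_disc, month_ticket, value), ordered by date."""
    out = []
    total = 0
    for month_disc, month_ticket, value in rows:
        if month_disc == 1 or month_ticket == 0:
            total = 0              # condition broken: reset to 0
        else:
            total += value
            if total >= limit:
                total = value      # limit reached: restart from this row's value
        out.append(total)
    return out

rows = [(0, 1, 20), (0, 1, 20), (0, 1, 20), (1, 1, 5), (0, 1, 10)]
print(running_total_with_reset(rows))  # [20, 40, 20, 0, 10]
```

A true reset depends on the previous output row, so a loop like this is a useful reference when testing a window-function version on real data.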

SQL Date intelligence: filtering data by seconds ran from last known valid result

Help! We're trying to create a new column (Is Valid?) that reproduces the logic below.
It is a binary result:
It is 1 if it is the first known value of an ID.
It is 1 if it is 3 seconds or more after the previous "1" of that ID.
Note 1: this is not the difference in seconds from the previous record.
It is 0 if it is less than 3 seconds after the previous "1" of that ID.
Note 2: there are many IDs in the data set.
Note 3: the original data set has ID and Date.
Attached is a PoC of the data and the expected result.
You would have to do this using a recursive CTE, which is quite expensive:
with recursive tt as (
      select t.*, row_number() over (partition by id order by time) as seqnum
      from t
     ),
     cte as (
      select tt.*, tt.time as grp_start
      from tt
      where seqnum = 1
      union all
      select tt.*,
             (case when tt.time < cte.grp_start + interval '3 second'
                   then cte.grp_start   -- still within 3 seconds of the last valid row
                   else tt.time         -- 3 seconds or more later: this row becomes the new reference
              end)
      from cte join
           tt
           on tt.id = cte.id and tt.seqnum = cte.seqnum + 1
     )
select cte.*,
       (case when grp_start = lag(grp_start) over (partition by id order by time)
             then 0 else 1
        end) as isValid
from cte;
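The reason a recursive CTE is needed is that each row is compared with the last row flagged 1, not with the row just before it. The same rule as a loop, as a Python sketch: the function name flag_valid and the min_gap parameter are made up for illustration, and the input is assumed to be (id, time) pairs.

```python
from datetime import datetime, timedelta

def flag_valid(events, min_gap=timedelta(seconds=3)):
    """events: iterable of (id, time). Returns (id, time, is_valid) sorted by id, time."""
    last_valid = {}   # id -> time of the most recent row flagged 1
    out = []
    for id_, ts in sorted(events):
        prev = last_valid.get(id_)
        # Valid if it is the first row for this id, or at least min_gap after the last valid row.
        is_valid = 1 if prev is None or ts - prev >= min_gap else 0
        if is_valid:
            last_valid[id_] = ts
        out.append((id_, ts, is_valid))
    return out

base = datetime(2024, 1, 1)
events = [("A", base + timedelta(seconds=s)) for s in (0, 1, 2, 3, 4, 7)]
print([flag for _, _, flag in flag_valid(events)])  # [1, 0, 0, 1, 0, 1]
```

Note that the row at second 4 is flagged 0 even though second 3 is only one row back, because the reference moved to second 3 when that row was flagged 1.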