Below you can see my query, which gives the following result:
select t.actual_date,
t.id_key,
t.attendance_status,
t.money_step,
sum(t.money_step) over (partition by t.id_key order by t.actual_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)as accumulated
from example t
order by t.id_key, t.actual_date
I want the "accumulated" column to add up the value of "money_step" for each id_key.
If the attendance_status is the second time '15' for an Id, the counter should add up from the beginning. For ID_KEY = 1 it should look like this:
accumulated:
Row 1:20
Row 2: 80
Row 3: 100
Row 4: 120
How can I do this in the query? Can someone help me?
I understand that you want to reset the sum after the first row where attendance_status has value 15. One option uses a conditional max() to define the subgroup:
select
actual_date,
id_key,
attendance_status,
money_step,
sum(t.money_step) over (
partition by id_key, grp
order by actual_date
) as accumulated
from (
select
e.*,
max(case when attendance_status = 15 then 1 else 0 end) over(
partition by id_key
order by actual_date
rows between unbounded preceding and 1 preceding
) grp
from example e
) e
order by id_key, actual_date
Related
I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
rank #1 is assigned to the ID with the lowest date. However, the subsequent dates for that particular ID can be higher but they will get the incremental rank w.r.t other IDs.
(E.g. ADF32 series will be considered to be ranked first as it had the lowest date, although it ends with dates 09-Nov, and RT659 starts with 13-Aug it will be ranked subsequently)
For a particular ID, if the days are consecutive then ranks are same, else they add by 1.
For a particular ID, ranks are given in date ASC.
How to formulate a query?
You need two steps:
select
id_col
,dt_col
,dense_rank()
over (order by min_dt, id_col, dt_col - rnk) as part_col
from
(
select
id_col
,dt_col
,min(dt_col)
over (partition by id_col) as min_dt
,rank()
over (partition by id_col
order by dt_col) as rnk
from tab
) as dt
dt_col - rnk caluclates the same result for consecutives dates -> same rank
Try datediff on lead/lag and then perform partitioned ranking
select t.ID_COL,t.dt_col,
rank() over(partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from ( SELECT ID_COL,dt_col,
DATEDIFF(day, Lag(dt_col, 1) OVER(ORDER BY dt_col),dt_col) as date_diff FROM table1 ) t
One way to think about this problem is "when to add 1 to the rank". Well, that occurs when the previous value on a row with the same id_col differs by more than one day. Or when the row is the earliest day for an id.
This turns the problem into a cumulative sum:
select t.*,
sum(case when prev_dt_col = dt_col - 1 then 0 else 1
end) over
(order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
min(dt_col) over (partition by id_col) as min_dt_col
from t
) t;
I need an sql code for the below. I want it to RANK however if DSLR >= 60 then I want the rank to start again like below.
Thanks
Assuming that you have a column that defines the ordering of the rows, say id, you can address this as a gaps-and-islands problem. Islands are group of adjacent record that start with a dslr above 60. We can identify them with a window sum, then rank within each island:
select dslr, rank() over(partition by grp order by id) as rn
from (
select t.*,
sum(case when dslr >= 60 then 1 else 0 end) over(order by id) as grp
from mytable t
) t
I'm trying to put information to identify GROUP ID by replicating this Excel formula:
IF(OR(A2<>A1,AND(B2<>"000",B1="000")),D1+1,D1)
This formula is written when my cursor is in "D2", meaning I've referred to the newly added column value in the previous row to generate the current value.
I'd like to this with Db2 SQL, but I'm not sure how to because I'll need to do LAG function on the column I'm going to add and referring their value.
Kindly advise if having better way to do.
Thanks.
You need nested OLAP-functions, assuming ORDER BY SERIAL_NUMBER, EVENT_TIMESTAMP returns the order shown in Excel:
with cte as
(
select ...
case --IF(OR(A2<>A1,AND(B2<>"000",B1="000"))
when (lag(OPERATION)
over (order by SERIAL_NUMBER, EVENT_TIMESTAMP) = '000'
and OPERATION <> '000')
or lag(SERIAL_NUMBER,1,'')
over (order by SERIAL_NUMBER, EVENT_TIMESTAMP) <> SERIAL_NUMBER
then 1
else 0
end as flag -- start of new group
from tab
)
select ...
sum(flag)
over (order by SERIAL_NUMBER, EVENT_TIMESTAMP
rows unbounded preceding) as GROUP_ID
from cte
Your code is counting the number of "breaks" in your data, where a "break" is defined as 000 or the value in the first column changing.
In SQL, you can do this as a cumulative sum:
select t.*,
sum(case when prev_serial_number = serial_number or operation <> '000'
then 0 else 1
end) over (order by event_timestamp rows between unbounded preceding and current row) as column_d
from (select t.*,
lag(serial_number) over (order by event_timestamp) as prev_serial_number
from t
) t
How would I construct a query to receive the MAX TurnTime per ID of the first 2 rounds? Rounds being defined as minimum Beginning_Date to mininmum End_Date of an ID. Without reusing either of the dates for the second round Turn Time calculation.
You can use row_number() . . . twice:
select d.*
from (select d.*,
row_number() over (partition by id order by turn_time desc) as seqnum_turntime
from (select d.*,
row_number() over (partition by id order by beginning_end desc) as seqnum_round
from data d
) d
where seqnum_round <= 2
) d
where seqnum_turntime = 1;
The innermost subquery gets the first two rounds. The outer subquery gets the maximum.
You could express this without window functions as well:
select top (1) with ties d.*
from data d
where d.beginning_date <= (select d2.beginning_date
from data d2
where d2.id = d.id
offset 1 fetch first 1 row only
)
order by row_number() over (partition by id order by turntime desc);
SELECT
ID
,turn_time
,beginning_date
,end_date
FROM
(
SELECT
ID
,MAX(turn_time) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS turn_time --Maximum turn time of the current row and preceding row
,MIN(BeginningDate) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS beginning_date --Minimum begin date over current row and preceding row (could also use LAG)
,end_date
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY BeginningDate) AS Turn_Number
FROM
<whatever your table is>
) turn_summary
WHERE
Turn_Number = 2
I am using Exasol, in other DBMS it was possible to use analytical functions such LAST_VALUE() and specify some condition for the ORDER BY clause withing the OVER() function, like:
select ...
LAST_VALUE(customer)
OVER (PARTITION BY ID ORDER BY date_x DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) as the_last
Unfortunately I get the following error:
ERROR: [0A000] Feature not supported: windowing clause (Session:
1606983630649130920)
the same do not happen if instead of AND 1 PRECEDING I use: CURRENT ROW.
Basically what I wanted is to get the last value according the Order by that is NOT the current row. In this example it would be the $customer of the previous row.
I know that I could use the LAG(customer,1) OVER ( ...) but the problem is that I want the previous customer that is NOT null, so the offset is not always 1...
How can I do that?
Many thanks!
Does this work?
select lag(customer) over (partition by id
order by (case when customer is not null then 1 else 0 end),
date
)
You can do this with two steps:
select t.*,
max(customer) over (partition by id, max_date) as max_customer
from (select t.*,
max(case when customer is not null then date end) over (partition by id order by date) as max_date
from t
) t;