I'm looking for a query that will calculate a rank based on a condition, as follows:
The "rank index" is the calculation I'm looking for: if the difference between the previous time and the current time is less than 6 hours, then the counter should remain the same. If 6 hours or more have passed, then increment the index by 1.
Any ideas?
Based on your explanation, the last value should be 4 not 5.
Use lag() and a cumulative sum. Assuming the datetime column is stored as a date:
select t.*,
sum(case when prev_datetime > datetime - interval '6' hour then 0 else 1 end) over
(order by datetime) as rank_index
from (select t.*,
lag(datetime) over (order by datetime) as prev_datetime
from t
) t;
Note: If you want this for each key1/key2 combination, then you want to include partition by key1, key2 in the window specifications.
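To make the lag()-plus-cumulative-sum rule concrete, here is a minimal Python sketch of the same logic on an in-memory list. The function name rank_index and the sample timestamps are illustrative assumptions, not part of the original question.

```python
from datetime import datetime, timedelta


def rank_index(timestamps, gap=timedelta(hours=6)):
    """Return (timestamp, index) pairs; the index grows by 1 whenever the
    previous timestamp is missing or at least `gap` older than the current one."""
    result = []
    counter = 0
    prev = None  # plays the role of lag(datetime)
    for current in sorted(timestamps):
        # Mirrors: case when prev_datetime > datetime - 6 hours then 0 else 1 end
        if prev is None or current - prev >= gap:
            counter += 1
        result.append((current, counter))
        prev = current
    return result


# Hypothetical sample data: gaps of 2h, 7h, 1h and 10h.
sample = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 2, 0),
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 1, 20, 0),
]
indexes = [idx for _, idx in rank_index(sample)]  # [1, 1, 2, 2, 3]
```

As in the query, the first row always gets index 1 because it has no previous value.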
Related
I'm trying to get the max value of a specific field in the last 3 months, last 6 months and max since inception with respect to a specific date and partitioned by part_id.
For the max since inception, I used the query below.
select
part_id,
date_field,
MAX(val) OVER(partition by part_id order by date_field rows unbounded preceding) as max_since_inception
FROM my_table;
How do I add a condition to get the max value only in the last 3 months of my date_field?
E.g., if date_field is 2020-09-25,
max_l3m must have max value between 2020-06-25 and 2020-09-25;
max_l6m must have max value between 2020-03-25 and 2020-09-25;
max_since_inception must have max_value since inception till 2020-09-25
and partitioned by part_id
Redshift doesn't support range window frames, so you are stuck with a self join or some other complicated construct:
select t.part_id, t.date_field,
       max(case when tprev.date_field > t.date_field - interval '3 month' then tprev.val end) as max_l3m,
       max(case when tprev.date_field > t.date_field - interval '6 month' then tprev.val end) as max_l6m
from t join
     t tprev
     on tprev.part_id = t.part_id and tprev.date_field <= t.date_field
group by t.part_id, t.date_field;
You might also want to limit the lookback period to 6 months, if that is the longest timeframe you really need.
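The self-join idea can be sketched in plain Python to show what each output row aggregates over. This is only an illustration under assumptions: the helper names months_back and rolling_maxima and the sample rows are hypothetical, and months_back clamps to the end of the month, which may differ from how a given database subtracts month intervals.

```python
import calendar
from datetime import date


def months_back(d, months):
    """Subtract whole months from a date, clamping the day to the month's end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def rolling_maxima(rows):
    """rows: list of (part_id, date_field, val).
    Returns (part_id, date_field, max_l3m, max_l6m, max_since_inception)."""
    out = []
    for part_id, d, _ in rows:
        # "Self join": every row of the same part dated on or before d.
        earlier = [(pd, v) for p, pd, v in rows if p == part_id and pd <= d]
        l3m = max(v for pd, v in earlier if pd > months_back(d, 3))
        l6m = max(v for pd, v in earlier if pd > months_back(d, 6))
        inception = max(v for _, v in earlier)
        out.append((part_id, d, l3m, l6m, inception))
    return out


# Hypothetical sample data for a single part.
rows = [
    ("A", date(2020, 1, 10), 50),
    ("A", date(2020, 5, 1), 30),
    ("A", date(2020, 8, 1), 10),
    ("A", date(2020, 9, 25), 20),
]
last = rolling_maxima(rows)[-1]  # ("A", date(2020, 9, 25), 20, 30, 50)
```

The current row always falls inside its own window, so each max() has at least one value to work with.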
I want this type of calculated date value in SQL.
Is there any way to get this type of calculated data?
I think that you want:
select
t.*,
dateadd(
day,
sum(t.duration) over(order by autoid),
first_value(t.date) over(order by autoid)
) date
from mytable t
Starting from the first value in the date column (which, as I understand, is the only non-null value in that column), this incrementally adds the number of days in the duration column.
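As a rough illustration of that running-total idea, the Python sketch below adds the cumulative duration to a starting date. The function name fill_dates and the sample values are assumptions made for the example; like the query, the running total includes the current row's duration.

```python
from datetime import date, timedelta
from itertools import accumulate


def fill_dates(start, durations):
    """Return one date per row: start + running total of durations (in days)."""
    return [start + timedelta(days=total) for total in accumulate(durations)]


# Hypothetical sample: start on 2020-01-01 with durations of 2, 3 and 5 days.
dates = fill_dates(date(2020, 1, 1), [2, 3, 5])
# -> 2020-01-03, 2020-01-06, 2020-01-11
```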
You seem to want the cumulative duration. I would do this just by subtracting the earliest:
select t.*,
datediff(day, min(date) over (), date) as total_duration
from t;
I have 3 columns:
Employee ID(numerical)
Day of work(a date yyyy-mm-dd when employee had a shift)
is_consecutive_work_day (1 if days of work are consecutive, else 0)
I need a 4th: Consecutive_work_days (a cumulative sum of is_consecutive_work_day, which resets to 1 when is_consecutive_work_day = 0). So this will go to a maximum of 5 for any employee id. Some will have 1,2,3 others 1,2...etc.
What I am failing to figure out is how to write the 4th column (consecutive_work_days). Not how to write a cumulative sum per employee id, but specifically how to reset to 1 when is_consecutive_work_day = 0 per employee id.
May I ask for your help regarding this 4th column please? Thanks.
You can use window functions. lag() lets you access the previous day_of_work for the same employee, which you can compare to the current day_of_work: if there is a one day difference, then you can set is_consecutive_work_day to 1.
select
employee_id,
day_of_work,
case
when day_of_work
= lag(day_of_work) over(partition by employee_id order by day_of_work)
+ interval 1 day
then 1
else 0
end is_consecutive_work_day
from mytable
Computing the cumulative sum is a bit more complicated. We can use a gaps-and-islands technique to put each record in the group it belongs to: every time a row with is_consecutive_work_day = 0 is met, a new group starts; we can then do a window sum() over each group:
select
employee_id,
day_of_work,
is_consecutive_work_day,
1 + sum(is_consecutive_work_day)
over(partition by employee_id, grp order by day_of_work)
consecutive_work_days
from (
select
t.*,
sum(1 - is_consecutive_work_day) over(partition by employee_id order by day_of_work) grp
from (
select
t.*,
case
when day_of_work
= lag(day_of_work) over(partition by employee_id order by day_of_work)
+ interval 1 day
then 1
else 0
end is_consecutive_work_day
from mytable t
) t
) t
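For readers who find the nested window functions hard to follow, the same streak logic can be sketched procedurally in Python. The function name consecutive_work_days and the sample shifts are hypothetical; the counter starts at 1 on the first day of each streak, as the question describes.

```python
from collections import defaultdict
from datetime import date, timedelta


def consecutive_work_days(shifts):
    """shifts: iterable of (employee_id, day_of_work).
    Returns {(employee_id, day_of_work): streak length so far}."""
    by_employee = defaultdict(list)
    for employee_id, day in shifts:
        by_employee[employee_id].append(day)

    result = {}
    for employee_id, days in by_employee.items():
        prev = None
        streak = 0
        for day in sorted(days):
            if prev is not None and day - prev == timedelta(days=1):
                streak += 1  # same island: the previous shift was yesterday
            else:
                streak = 1   # a gap (or the first shift) starts a new island
            result[(employee_id, day)] = streak
            prev = day
    return result


# Hypothetical sample: employee 1 works Jan 1-3, skips Jan 4, works Jan 5-6.
shifts = [(1, date(2020, 1, d)) for d in (1, 2, 3, 5, 6)] + [(2, date(2020, 1, 1))]
streaks = consecutive_work_days(shifts)
```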
Although this seems like a gaps-and-islands problem, there is a simpler solution. Find the most recent day_of_work, up to the current row, where is_consecutive_work_day is 0, and take the date difference.
The only caveat is if there is none, in which case the employee's earliest day_of_work is used instead.
That would be:
select t.*,
       -- + 1 so that the counter is 1 on the day the streak resets
       datediff(day_of_work,
                coalesce(max(case when is_consecutive_work_day = 0 then day_of_work end)
                             over (partition by employee_id order by day_of_work),
                         min(day_of_work) over (partition by employee_id)
                        )
               ) + 1 as fourth_column
from t;
Let's say I have a dataset with two columns: ID and timestamp. My goal is to return the IDs that have at least n timestamps in any 30 day window.
Here is an example:
ID Timestamp
1 '2019-01-01'
2 '2019-02-01'
3 '2019-03-01'
1 '2019-01-02'
1 '2019-01-04'
1 '2019-01-17'
So, let's say I want to return a list of IDs that have 3 timestamps in any 30 day window.
Given above, my resultset would just be ID = 1. I'm thinking some kind of windowing function would accomplish this, but I'm not positive.
Any chance you could help me write a query that accomplishes this?
A relatively simple way to do this involves lag()/lead():
select t.*
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
The lead() looks ahead to the third timestamp in a series, counting the current row as the first. The where checks if this is within 30 days of the original one. The result is rows where this occurs.
If you just want the ids, then:
select distinct id
from (select t.*,
lead(timestamp, 2) over (partition by id order by timestamp) as timestamp_2
from t
) t
where datediff(day, timestamp, timestamp_2) <= 30;
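The same look-ahead trick generalizes to any n by comparing each timestamp with the one n - 1 positions later. A minimal Python sketch, using the sample data from the question (the function name ids_with_n_in_window is an assumption for illustration):

```python
from collections import defaultdict
from datetime import date


def ids_with_n_in_window(rows, n=3, window_days=30):
    """rows: iterable of (id, timestamp). Returns the sorted ids that have at
    least n timestamps within some span of window_days days."""
    by_id = defaultdict(list)
    for id_, ts in rows:
        by_id[id_].append(ts)

    hits = []
    for id_, stamps in by_id.items():
        stamps.sort()
        # stamps[i + n - 1] plays the role of lead(timestamp, n - 1).
        for i in range(len(stamps) - n + 1):
            if (stamps[i + n - 1] - stamps[i]).days <= window_days:
                hits.append(id_)
                break
    return sorted(hits)


rows = [
    (1, date(2019, 1, 1)),
    (2, date(2019, 2, 1)),
    (3, date(2019, 3, 1)),
    (1, date(2019, 1, 2)),
    (1, date(2019, 1, 4)),
    (1, date(2019, 1, 17)),
]
result = ids_with_n_in_window(rows, n=3)  # [1]
```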
I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table attached below, if any dates fall within 15 days of each other based on the effectivedate column, I should return only the first one.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, as the offset between these two dates is within 15 days and I took the latest one from the list.
If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
             -- a new group starts when the previous effectivedate is more than 15 days back
             sum(case when prev_ed >= dateadd(day, -15, effectivedate)
                      then 0 else 1
                 end) over (partition by polno order by effectivedate) as grp
      from (select t.*,
                   lag(effectivedate) over (partition by polno order by effectivedate) as prev_ed
            from t
           ) t
     ) t
group by polno, grp;
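The grouping rule can also be sketched in Python: rows of the same policy stay in one group while each effectivedate is within 15 days of the previous one. The function name group_policies is hypothetical, and the expiration dates in the sample are made up for the example, since the original table is not shown.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_policies(rows, max_gap=timedelta(days=15)):
    """rows: iterable of (polno, effectivedate, expirationdate).
    Returns (polno, min effectivedate, max expirationdate) per group."""
    by_policy = defaultdict(list)
    for polno, eff, exp in rows:
        by_policy[polno].append((eff, exp))

    groups = []
    for polno, periods in by_policy.items():
        periods.sort()
        start = end = prev_eff = None
        for eff, exp in periods:
            if prev_eff is None or eff - prev_eff > max_gap:
                if start is not None:
                    groups.append((polno, start, end))  # close the previous group
                start, end = eff, exp                   # open a new group
            else:
                end = max(end, exp)                     # extend the current group
            prev_eff = eff
        groups.append((polno, start, end))
    return groups


# Hypothetical sample: two entries 6 days apart, then one months later.
rows = [
    ("A1234", date(2016, 6, 12), date(2017, 6, 12)),
    ("A1234", date(2016, 6, 18), date(2017, 6, 18)),
    ("A1234", date(2016, 9, 1), date(2017, 9, 1)),
]
groups = group_policies(rows)
```

Note that each row is compared with the row immediately before it, so a long chain of entries spaced under 15 days apart ends up in a single group.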