Find non consecutive date ranges - sql

I want to find out whether a series of date ranges that should be consecutive has any gaps. When some of the ranges are not consecutive, the query should return the RowId of the range that breaks the sequence.
Table Name: Subscriptions
RowId ClientId Status StartDate EndDate
1 1 1 01/01/2022 02/01/2022
2 1 1 03/01/2022 04/01/2022
3 1 1 12/01/2022 15/01/2022
4 2 1 03/01/2022 06/01/2022
I want a SQL statement that finds the RowId of the non-consecutive ranges for each client, for status in (1,3). Example of the result:
RowId
3
I want to solve the problem using SQL only.
thanks

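One way you could do this is to use LAG (or LEAD) to identify gaps between neighbouring rows' date ranges, then use TOP (1) WITH TIES to return the rows where the gap exceeds 1 day. Note that if there are no gaps at all, every row ties and all of them are returned.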
select top (1) with ties rowId
from t
where status in (1,3)
order by
case when DateDiff(day, lag(enddate,1,enddate)
over(partition by clientid order by startdate), startdate) >1
then 0 else 1 end;

You can detect gaps with LAG() and mark them. Then, it's easy to filter out the rows. For example:
select *
from (
select *,
case when dateadd(day, -1, start_date) >
lag(end_date) over(partition by client_id order by start_date)
then 1 else 0 end as i
from t
) x
where i = 1
Or simpler...
select *
from (
select *,
lag(end_date) over(partition by client_id order by start_date) as prev_end
from t
) x
where dateadd(day, -1, start_date) > prev_end
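To see the LAG() approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The table and column names (subscriptions, row_id, client_id and so on) are illustrative assumptions, dates are stored as ISO strings, and SQLite's date(x, '-1 day') stands in for dateadd. Partitioning by status as well as client is also an assumption based on the question's wording. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions ("
    "row_id INTEGER, client_id INTEGER, status INTEGER, "
    "start_date TEXT, end_date TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2022-01-01", "2022-01-02"),
        (2, 1, 1, "2022-01-03", "2022-01-04"),
        (3, 1, 1, "2022-01-12", "2022-01-15"),
        (4, 2, 1, "2022-01-03", "2022-01-06"),
    ],
)

query = """
SELECT row_id
FROM (
    SELECT row_id, start_date,
           LAG(end_date) OVER (
               PARTITION BY client_id, status
               ORDER BY start_date
           ) AS prev_end
    FROM subscriptions
    WHERE status IN (1, 3)
) AS x
-- a gap exists when the day before this start is later than the previous end
WHERE date(start_date, '-1 day') > prev_end
ORDER BY row_id
"""
gap_row_ids = [row[0] for row in conn.execute(query)]
print(gap_row_ids)
```

The first row of each partition has no previous end date, so the comparison is NULL and the row is never reported as a gap.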

Related

Create months between two dates Snowflake SQL

I just want to generate the months within a date range using a SQL query.
(The example was provided as a screenshot.)
You can use a table generator:
select '2022-07-04'::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 365))
;
Just change the start date and the number of days in the series. You can use the datediff function to calculate the number of days between the start and end dates.
Edit: I just realized the generator table function requires a constant for the number of rows. That's easily solvable. Just set a higher number of rows than you'll need and specify the end of the series in a qualify clause:
set startdate = (select '2022-04-15'::date);
set enddate = (select '2022-07-04'::date);
select $startdate::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 100000))
qualify GENERATED_DATE <= $enddate
;
You can use a table generator in a CTE, then select from the CTE, cartesian join it to your data table, and use a case statement to check whether each generated date falls between your from and to dates.
Then select from it:
select user_id, x_date
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t --deliberate cartesian join
)
where target_date = 'Y'
order by 1,2
Output:
USER_ID X_DATE
1 2/20/2019
1 2/21/2019
1 2/22/2019
1 2/23/2019
2 2/22/2019
2 2/23/2019
2 2/24/2019
2 2/25/2019
2 2/26/2019
2 2/27/2019
2 2/28/2019
3 3/1/2019
3 3/2/2019
3 3/3/2019
3 3/4/2019
3 3/5/2019
=======EDIT========
Based on your comments below, you are actually looking for something different than your original screenshots. Ok, so here we are still using the table generator, and then we're truncating the month to the first day of the month where the x-date is YES.
select distinct t.user_id, t.from_date, t.to_date, date_trunc('MONTH', z.x_date) as trunc_month
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t
)z
join my_table t
on z.user_id = t.user_id
where z.target_date = 'Y'
order by 1,2
Output (modified User ID 3 to span 2 months):
USER_ID FROM_DATE TO_DATE TRUNC_MONTH
1 2/20/2019 2/23/2019 2/1/2019
2 2/22/2019 2/28/2019 2/1/2019
3 2/25/2019 3/5/2019 2/1/2019
3 2/25/2019 3/5/2019 3/1/2019
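As a cross-check on the output above, the same "one row per month touched by the range" result can be sketched in plain Python. This is not Snowflake code. The function name month_starts is hypothetical, and it simply walks from the first day of the starting month to the end date one month at a time.

```python
from datetime import date


def month_starts(from_date, to_date):
    """Return the first day of every month that the range touches."""
    months = []
    current = from_date.replace(day=1)  # same idea as date_trunc('MONTH', ...)
    while current <= to_date:
        months.append(current)
        # Advance to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
    return months


# User 3 from the modified sample spans February and March 2019.
user3_months = month_starts(date(2019, 2, 25), date(2019, 3, 5))
print(user3_months)
```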

Find repeating values of a certain value

I have a table similar to:
Date Person Distance
2022/01/01 John 15
2022/01/02 John 0
2022/01/03 John 0
2022/01/04 John 0
2022/01/05 John 19
2022/01/01 Pete 25
2022/01/02 Pete 12
2022/01/03 Pete 0
2022/01/04 Pete 0
2022/01/05 Pete 1
I want to find all persons who have a distance of 0 for 3 or more consecutive days.
So in the above, it must return John and the count of the days with a zero distance.
I.e.
Person  Consecutive Days with Zero
John    3
I'm looking at something like this, but I think this might be way off:
Select Person, count(*),
(row_number() over (partition by Person, Date order by Person, Date))
from mytable
Provided I understand your requirement you could, for your sample data, just calculate the difference in days of a windowed min/max date:
select distinct Person, Consecutive from (
select *, DateDiff(day,
Min(date) over(partition by person),
Max(date) over(partition by person)
) + 1 Consecutive
from t
where distance = 0
)t
where Consecutive >= 3;
If you can have gaps in the dates you could try the following that only considers rows with 1 day between each date (and could probably be simplified):
with c as (
select *, Row_Number() over (partition by person order by date) rn,
DateDiff(day, Lag(date) over(partition by person order by date), date) c
from t
where distance = 0
), g as (
select Person, rn - Row_Number() over(partition by person, c order by date) grp
from c
)
select person, Count(*) + 1 consecutive
from g
group by person, grp
having Count(*) >= 2;
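The grouping idea behind the query above (a run continues only while each zero-distance date is exactly one day after the previous one) can be sketched in plain Python to check expected results. The function name zero_runs and the tuple layout are assumptions made for illustration only.

```python
from datetime import date, timedelta


def zero_runs(rows, min_days=3):
    """rows: iterable of (day, person, distance).

    Returns (person, run_length) for every run of consecutive calendar
    days with distance 0 that lasts at least min_days.
    """
    zero_days = {}
    for day, person, distance in rows:
        if distance == 0:
            zero_days.setdefault(person, []).append(day)

    result = []
    for person, days in zero_days.items():
        days.sort()
        run = 1
        for i in range(1, len(days) + 1):
            if i < len(days) and days[i] - days[i - 1] == timedelta(days=1):
                run += 1  # still inside the same run
            else:
                if run >= min_days:  # the run just ended, so report it
                    result.append((person, run))
                run = 1
    return result


sample = [
    (date(2022, 1, 1), "John", 15), (date(2022, 1, 2), "John", 0),
    (date(2022, 1, 3), "John", 0), (date(2022, 1, 4), "John", 0),
    (date(2022, 1, 5), "John", 19), (date(2022, 1, 1), "Pete", 25),
    (date(2022, 1, 2), "Pete", 12), (date(2022, 1, 3), "Pete", 0),
    (date(2022, 1, 4), "Pete", 0), (date(2022, 1, 5), "Pete", 1),
]
runs = zero_runs(sample)
print(runs)
```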
One option is to:
transform your "Distance" values into a flag, where a distance of 0 becomes 1 and any other value becomes 0
compute a sum over the flags in a sliding window of three rows, using a frame specification clause
keep only the "Person" values which have at least one window sum of 3.
WITH cte AS (
SELECT *, SUM(CASE WHEN Distance = 0 THEN 1 ELSE 0 END) OVER(
PARTITION BY Person
ORDER BY Date_
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS window_of_3
FROM tab
)
SELECT DISTINCT Person
FROM cte
WHERE window_of_3 = 3
Note: This solution requires your table to have no missing dates. In case missing dates is a possible scenario, then it's necessary to add missing rows corresponding to the dates not found for each "Person" value, for this solution to work.
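For a quick local check of the sliding-window query, here is a small sketch that runs it through Python's sqlite3 module on the question's sample data. The table name tab and column name date_ follow the answer; lower-casing the other columns and storing dates as ISO strings are assumptions of this sketch. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (date_ TEXT, person TEXT, distance INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [
        ("2022-01-01", "John", 15), ("2022-01-02", "John", 0),
        ("2022-01-03", "John", 0), ("2022-01-04", "John", 0),
        ("2022-01-05", "John", 19), ("2022-01-01", "Pete", 25),
        ("2022-01-02", "Pete", 12), ("2022-01-03", "Pete", 0),
        ("2022-01-04", "Pete", 0), ("2022-01-05", "Pete", 1),
    ],
)

query = """
WITH cte AS (
    SELECT person,
           -- count the zero-distance rows among this row and the two before it
           SUM(CASE WHEN distance = 0 THEN 1 ELSE 0 END) OVER (
               PARTITION BY person
               ORDER BY date_
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS window_of_3
    FROM tab
)
SELECT DISTINCT person
FROM cte
WHERE window_of_3 = 3
"""
persons = [row[0] for row in conn.execute(query)]
print(persons)
```

Pete has only two zero-distance days in a row, so his largest window sum is 2 and he is not returned.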

sum values based on 7-day cycle in SQL Oracle

I have dates and values, and I would like to sum the values within 7-day cycles starting from the first date.
date value
01-01-2021 1
02-01-2021 1
05-01-2021 1
07-01-2021 1
10-01-2021 1
12-01-2021 1
13-01-2021 1
16-01-2021 1
18-01-2021 1
22-01-2021 1
23-01-2021 1
30-01-2021 1
This is my input data; under the 7-day cycle it forms 4 groups.
Grouping should start with the first date and sum all values within the 7 days after it, first date included.
Then a new group starts with the next date and covers another 7 days (10-01 till 17-01), then another new group runs from 18-01 till 25-01, and so on.
So the output will be
group1 4
group2 4
group3 3
group4 1
With match_recognize this would be easy (current_day < first_day + 7 as the pattern condition), but please don't use the match_recognize clause in the solution.
One approach is a recursive CTE:
with tt as (
select dte, value, row_number() over (order by dte) as seqnum
from t
),
cte (dte, value, seqnum, firstdte) as (
select tt.dte, tt.value, tt.seqnum, tt.dte
from tt
where seqnum = 1
union all
select tt.dte, tt.value, tt.seqnum,
(case when tt.dte < cte.firstdte + interval '7' day then cte.firstdte else tt.dte end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select firstdte, sum(value)
from cte
group by firstdte
order by firstdte;
This identifies the groups by the first date. You can use row_number() over (order by firstdte) if you want a number.
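The recursion carries the first date of the current group forward and starts a new group once a row falls 7 or more days after it. As a sanity check on the expected totals, the same rule can be sketched as a simple loop in Python. The function name sum_by_cycle is hypothetical and this is not Oracle code.

```python
from datetime import date, timedelta


def sum_by_cycle(rows, days=7):
    """rows: iterable of (date, value). Returns [(first_date, total), ...]."""
    groups = []
    first = None
    for day, value in sorted(rows):
        # Same test as the CTE: stay in the group while day < first + 7 days.
        if first is None or day >= first + timedelta(days=days):
            first = day
            groups.append([first, 0])
        groups[-1][1] += value
    return [(first_date, total) for first_date, total in groups]


january_days = [1, 2, 5, 7, 10, 12, 13, 16, 18, 22, 23, 30]
cycles = sum_by_cycle([(date(2021, 1, d), 1) for d in january_days])
print(cycles)
```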

Select min/max dates for periods that don't intersect

Example: I have a table with 4 columns. The date format is dd.MM.yy.
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case the dates from rows 2 and 3 are included in the dates from row 1.
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case the dates from row 5 are included in the dates from row 4. Basically we have 2 periods that don't intersect.
01.01.15 31.12.18
and
02.04.19 31.12.20
A situation where a date range starts in one period and ends in another is impossible. The end result should look like this
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analytic functions (LAG)
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Here I order the rows and, if the current dates are included in the previous row's dates, I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot dates and assign 1 for start date, -1 for end date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()
You can shorten this syntax, I divided it to steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot work in Oracle 11 and later; for earlier versions you need case when.
BTW - START is reserved word in Oracle so in my example I changed slightly column names.
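The +1/-1 running-sum idea from the steps above can also be sketched outside the database. The Python below is only an illustration of the technique, not the answer's Oracle query. The function name merge_periods is hypothetical, and sorting starts before ends on the same date is a choice made in this sketch so that touching periods merge.

```python
from collections import defaultdict
from datetime import date


def merge_periods(rows):
    """rows: iterable of (id, ban, start, end). Returns merged periods."""
    events = defaultdict(list)
    for id_, ban, start, end in rows:
        events[(id_, ban)].append((start, 1))   # a period opens
        events[(id_, ban)].append((end, -1))    # a period closes

    merged = []
    for (id_, ban), evs in sorted(events.items()):
        open_count = 0
        period_start = None
        for dt, direction in sorted(evs, key=lambda e: (e[0], -e[1])):
            if direction == 1 and open_count == 0:
                period_start = dt               # running sum goes 0 -> 1
            open_count += direction
            if open_count == 0:                 # running sum is back to 0
                merged.append((id_, ban, period_start, dt))
    return merged


test_rows = [
    (1, 1, date(2015, 1, 1), date(2015, 3, 31)),
    (1, 1, date(2015, 2, 2), date(2015, 3, 31)),
    (1, 1, date(2015, 3, 15), date(2015, 3, 31)),
    (1, 1, date(2015, 8, 5), date(2015, 12, 31)),
    (1, 2, date(2015, 1, 1), date(2016, 12, 31)),
    (2, 1, date(2016, 1, 1), date(2017, 12, 31)),
]
periods = merge_periods(test_rows)
print(periods)
```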
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t.start <= t2.end and t.end >= t2.start and
t.start <> t2.start and t.end <> t2.end
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;

Redshift SQL Window Function frame_clause with days

I am trying to perform a window function on a data set in Redshift using days as an interval for the preceding rows.
Example data:
date ID score
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 3
3/5/2017 555 2
SQL window function for avg score from the last 3 scores:
select
date,
id,
avg(score) over
(partition by id order by date rows
between 2 preceding and
current row) LAST_3_SCORES_AVG
from DATASET
Result:
date ID LAST_3_SCORES_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2
The problem is that I would like the average score from the last 3 DAYS (a moving average) and not the last three tests. I have gone over the Redshift and PostgreSQL documentation and can't seem to find any way of doing it.
Desired Result:
date ID 3_DAY_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2.5
Any direction would be appreciated.
You can use lag() and explicitly calculate the average.
select t.*,
(score +
(case when lag(date, 1) over (partition by id order by date) >=
date - interval '2 day'
then lag(score, 1) over (partition by id order by date)
else 0
end) +
(case when lag(date, 2) over (partition by id order by date) >=
date - interval '2 day'
then lag(score, 2) over (partition by id order by date)
else 0
end)
) /
(1 +
(case when lag(date, 1) over (partition by id order by date) >=
date - interval '2 day'
then 1
else 0
end) +
(case when lag(date, 2) over (partition by id order by date) >=
date - interval '2 day'
then 1
else 0
end)
)
from dataset t;
The following approach could be used instead of the RANGE window option in a lot of (or all) cases.
You can introduce "expiry" for each of the input records. The expiry record would negate the original one, so when you aggregate all preceding records, only the ones in the desired range will be considered.
AVG is a bit harder as it doesn't have a direct opposite, so we need to think of it as SUM/COUNT and negate both.
SELECT id, date, running_avg_score
FROM
(
SELECT id, date, n,
-- n is added to the ordering so an expiry row (n = -1) sorts before a real row on the same date
SUM(score) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
/ NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) as running_avg_score
FROM
(
SELECT date, id, score, 1 as n
FROM DATASET
UNION ALL
-- expiry and negate
SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
FROM DATASET
) u
) a
WHERE a.n = 1
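To try the expiry technique without a Redshift cluster, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module. Several details are assumptions of this sketch rather than part of the answer: the date column is renamed to day and stored as an ISO string, date(day, '+3 day') stands in for DATEADD, the sum is multiplied by 1.0 to avoid integer division, and n is added to the ordering as a tie-break. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (day TEXT, id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        ("2017-03-01", 123, 1),
        ("2017-03-01", 555, 1),
        ("2017-03-02", 123, 1),
        ("2017-03-03", 555, 3),
        ("2017-03-05", 555, 2),
    ],
)

query = """
SELECT day, id, avg_score
FROM (
    SELECT day, id, n,
           1.0 * SUM(score) OVER w / NULLIF(SUM(n) OVER w, 0) AS avg_score
    FROM (
        SELECT day, id, score, 1 AS n FROM dataset
        UNION ALL
        -- expiry row: cancels the original score three days later
        SELECT date(day, '+3 day'), id, -score, -1 FROM dataset
    ) AS events
    WINDOW w AS (
        PARTITION BY id
        ORDER BY day, n
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
) AS running
WHERE n = 1          -- keep only the real rows, not the expiry rows
ORDER BY day, id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this reproduces the desired 3-day averages from the question, including 2.5 for ID 555 on 3/5/2017.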