Create months between two dates in Snowflake SQL

I just want to generate the months between a date range using a SQL query.
example

You can use a table generator:
select '2022-07-04'::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 365))
;
Just change the start date and the number of days in the series. You can use the datediff function to calculate the number of days between the start and end dates.
Edit: I just realized the generator table function requires a constant for the number of rows. That's easily solvable. Just set a higher number of rows than you'll need and specify the end of the series in a qualify clause:
set startdate = (select '2022-04-15'::date);
set enddate = (select '2022-07-04'::date);
select $startdate::date +
row_number() over(partition by 1 order by null) - 1 GENERATED_DATE
from table(generator(rowcount => 100000))
qualify GENERATED_DATE <= $enddate
;
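For the months themselves (the original question), the same over-generate-then-filter idea can be sketched outside the database. The Python below is only an illustration of the logic, not Snowflake code: date_series and month_starts are hypothetical helper names, and the 100000 cap simply mirrors the rowcount used above.

```python
from datetime import date, timedelta

def date_series(start: date, end: date, max_rows: int = 100_000) -> list[date]:
    """Over-generate up to max_rows consecutive dates, then keep those <= end
    (the role the qualify clause plays in the query)."""
    candidates = (start + timedelta(days=n) for n in range(max_rows))
    return [d for d in candidates if d <= end]

def month_starts(start: date, end: date) -> list[date]:
    """Distinct first-of-month dates touched by the series (hypothetical helper)."""
    return sorted({d.replace(day=1) for d in date_series(start, end)})

days = date_series(date(2022, 4, 15), date(2022, 7, 4))
print(len(days))  # 81
print(month_starts(date(2022, 4, 15), date(2022, 7, 4)))
```

In SQL the equivalent last step would presumably be a date_trunc to month plus a distinct over the generated dates.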

You can use a table generator in a CTE, then select from the CTE, cartesian join it to your table with data, and use a case statement to check whether the generated date falls between your from and to dates.
Then select from it:
select user_id, x_date
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t --deliberate cartesian join
)
where target_date = 'Y'
order by 1,2
Output:
USER_ID X_DATE
1 2/20/2019
1 2/21/2019
1 2/22/2019
1 2/23/2019
2 2/22/2019
2 2/23/2019
2 2/24/2019
2 2/25/2019
2 2/26/2019
2 2/27/2019
2 2/28/2019
3 3/1/2019
3 3/2/2019
3 3/3/2019
3 3/4/2019
3 3/5/2019
=======EDIT========
Based on your comments below, you are actually looking for something different from your original screenshots. OK, so here we are still using the table generator, and then we're truncating each date to the first day of its month wherever target_date is 'Y'.
select distinct t.user_id, t.from_date, t.to_date, date_trunc('MONTH', z.x_date) as trunc_month
from (
with dates as (
select '2019-01-01'::date + row_number() over(order by 0) x_date
from table(generator(rowcount => 1500))
)
select d.x_date, t.*,
case
when d.x_date between t.from_date and t.to_date then 'Y' else 'N' end target_date
from dates d, my_table t
)z
join my_table t
on z.user_id = t.user_id
where z.target_date = 'Y'
order by 1,2
Output (modified User ID 3 to span 2 months):
USER_ID FROM_DATE TO_DATE TRUNC_MONTH
1 2/20/2019 2/23/2019 2/1/2019
2 2/22/2019 2/28/2019 2/1/2019
3 2/25/2019 3/5/2019 2/1/2019
3 2/25/2019 3/5/2019 3/1/2019
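To make the cross join, filter, and truncate steps concrete, here is a rough Python sketch of the same logic. The rows of my_table are reconstructed from the output above (the original table was not shown), and months_per_user is a hypothetical name.

```python
from datetime import date, timedelta

# (user_id, from_date, to_date), reconstructed from the output shown above
my_table = [
    (1, date(2019, 2, 20), date(2019, 2, 23)),
    (2, date(2019, 2, 22), date(2019, 2, 28)),
    (3, date(2019, 2, 25), date(2019, 3, 5)),
]

# Same calendar as the CTE: 1500 days, starting the day after 2019-01-01
# because row_number() starts at 1
dates = [date(2019, 1, 1) + timedelta(days=n) for n in range(1, 1501)]

def months_per_user(rows, calendar):
    """Cross join every date with every row, keep dates inside the range,
    then reduce each kept date to the first day of its month."""
    months = {
        (user_id, d.replace(day=1))
        for d in calendar
        for user_id, from_date, to_date in rows
        if from_date <= d <= to_date
    }
    return sorted(months)

for user_id, month in months_per_user(my_table, dates):
    print(user_id, month)
```

The set plays the role of select distinct: many days in one month collapse to a single (user_id, month) pair.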

Related

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using lag(price) over (partition by id order by date),
but I can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate row numbers for each id (assuming multiple id values) in the source data, ordered by date. The rows with row number 1 (the minimum date for the corresponding id) are used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from the CTE (use the COALESCE function for this).
For SQL Server, the query can look like this:
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) so the recursion can run through all the steps; the default limit is 100, which is not enough to complete it.
The same approach for MySQL (you need MySQL 8.0 or later to use CTEs):
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
Both queries produce the same results; the only difference is the use of engine-specific date functions.
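As a language-neutral illustration of what the recursion does, the walk can be sketched in Python. This is only a sketch of the logic using the sample rows from the question; fill_daily is a hypothetical name, and the cutoff is passed in explicitly instead of calling GETDATE() or NOW().

```python
from datetime import date, timedelta

tbl = [  # (id, price, date), copied from the question
    (1, 2.0, date(2022, 5, 21)),
    (1, 3.0, date(2022, 6, 15)),
    (1, 2.5, date(2022, 6, 19)),
]

def fill_daily(rows, until):
    """Walk forward one day at a time per id, as the recursive member does."""
    price_on = {(id_, d): price for id_, price, d in rows}
    first_day = {}
    for id_, _, d in rows:
        first_day[id_] = min(d, first_day.get(id_, d))
    out = []
    for id_, day in sorted(first_day.items()):
        price = price_on[(id_, day)]  # anchor row (rn = 1)
        while day <= until:
            # COALESCE: the price stored for this day, else the carried one
            price = price_on.get((id_, day), price)
            out.append((id_, day, price))
            day += timedelta(days=1)
    return out

daily = fill_daily(tbl, until=date(2022, 8, 30))
print(daily[0], daily[-1])
```

Each loop iteration corresponds to one recursion step, which is why a range of more than 100 days runs into the default MAXRECURSION limit on SQL Server.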
For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table that stores date values (let's call it date for simplicity), you can use the CROSS JOIN operator to generate date ranges for the id values in your table, starting from each id's earliest date. Then you can use a subquery to get the latest price value stored in the table for the corresponding date or before it.
So the query would be like this
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id|price|date
1|2|2022-05-21
1|3|2022-05-25
1|2.5|2022-05-28
2|10|2022-05-25
2|100|2022-05-30
the query produces the following results
id|date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
1|2022-05-24|2
1|2022-05-25|3
1|2022-05-26|3
1|2022-05-27|3
1|2022-05-28|2.5
1|2022-05-29|2.5
1|2022-05-30|2.5
2|2022-05-25|10
2|2022-05-26|10
2|2022-05-27|10
2|2022-05-28|10
2|2022-05-29|10
2|2022-05-30|100
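The calendar-table variant can be sketched the same way. The Python below is an illustration only, using the sample rows and the 2022-05-20 to 2022-05-30 pseudo-calendar described above; daily_from_calendar is a hypothetical name, and today is passed in explicitly instead of calling NOW().

```python
from datetime import date, timedelta

tbl = [  # (id, price, date), the sample source data
    (1, 2, date(2022, 5, 21)),
    (1, 3, date(2022, 5, 25)),
    (1, 2.5, date(2022, 5, 28)),
    (2, 10, date(2022, 5, 25)),
    (2, 100, date(2022, 5, 30)),
]
# Pseudo-calendar table: 2022-05-20 through 2022-05-30
calendar = [date(2022, 5, 20) + timedelta(days=n) for n in range(11)]

def daily_from_calendar(rows, calendar, today):
    """Cross join ids with calendar dates, keep dates from each id's first
    date up to today, and look up the latest price on or before each date."""
    out = []
    for id_ in sorted({r[0] for r in rows}):
        history = sorted((d, price) for i, price, d in rows if i == id_)
        min_date = history[0][0]
        for day in calendar:
            if min_date <= day <= today:
                # ORDER BY date DESC LIMIT 1 over rows with date <= day
                price = [p for d, p in history if d <= day][-1]
                out.append((id_, day, price))
    return out

result = daily_from_calendar(tbl, calendar, today=date(2022, 5, 30))
print(len(result))  # 16
```

Unlike the recursive version, the range of dates here is bounded by whatever the calendar table contains.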

How to cross join but using latest value in BIGQUERY

I have this table below
date|id|value
2021-01-01|1|3
2021-01-04|1|5
2021-01-05|1|10
And I expect output like this, where the date column always increases daily and the value column carries forward the last value for an id:
date|id|value
2021-01-01|1|3
2021-01-02|1|3
2021-01-03|1|3
2021-01-04|1|5
2021-01-05|1|10
2021-01-06|1|10
I think I can use a cross join, but I can't get my expected output, and I suspect there is a special syntax or logic to solve this.
Consider the approach below:
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
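The idea in this query (pair each row with its predecessor, then emit the dates strictly between them with the predecessor's value) can be sketched in Python. This is an illustration of the logic with the question's sample rows, not BigQuery code; fill_gaps is a hypothetical name.

```python
from datetime import date, timedelta

table = [  # (date, id, value), from the question
    (date(2021, 1, 1), 1, 3),
    (date(2021, 1, 4), 1, 5),
    (date(2021, 1, 5), 1, 10),
]

def fill_gaps(rows):
    """Original rows plus one row per missing date between neighbours."""
    rows = sorted(rows, key=lambda r: (r[1], r[0]))  # by id, then date
    filled = list(rows)  # the first branch of the union all
    for prev, cur in zip(rows, rows[1:]):
        if prev[1] != cur[1]:
            continue  # first row of a new id: lag() has nothing to return
        missing = prev[0] + timedelta(days=1)
        while missing < cur[0]:
            filled.append((missing, prev[1], prev[2]))
            missing += timedelta(days=1)
    return sorted(filled, key=lambda r: (r[1], r[0]))

for row in fill_gaps(table):
    print(row)
```

As far as I can tell, both the sketch and the query fill only gaps between existing rows, so the trailing 2021-01-06 row from the expected output would need an extra step.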
I would write this using:
select dte, t.id, t.value
from (select t.*,
lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_date
from `table` t
) t cross join
unnest(generate_date_array(
date,
ifnull(
date_add(next_date, interval -1 day), -- generate missing date rows
(select max(date) from `table`) -- add last row
)
)) dte;
Note that this requires neither union all nor window function to fill in the values.
An alternative solution uses last_value. You may explore the following query and customize the logic that generates the days (if needed):
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day

MS-SQL how to add missing month in a table values

I have a table with the following entries,
ID|date|Frequency
1|'2012-04-30'|5
1|'2012-06-30'|4
1|'2012-07-31'|25
2|'2012-04-30'|7
2|'2012-05-31'|4
2|'2012-06-30'|1
2|'2012-07-31'|6
I need to add the missing months; each added row should use the last date of that month, with a frequency value of 0.
The expected output is
ID|date|Frequency
1|'2012-04-30'|5
1|'2012-05-31'|0
1|'2012-06-30'|4
1|'2012-07-31'|25
2|'2012-04-30'|7
2|'2012-05-31'|4
2|'2012-06-30'|1
2|'2012-07-31'|6
I would suggest recursive CTEs:
with cte as (
select id, date, frequency,
lead(date) over (partition by id order by date) as next_date
from t
union all
select id, eomonth(date, 1), 0, next_date
from cte
where eomonth(date, 1) < dateadd(day, -1, next_date)
)
select id, date, frequency
from cte
order by id, date;
The anchor part of the CTE calculates the next existing date for each row. The recursive part then keeps adding months to fill in the missing rows (and adds none if nothing is missing). eomonth(date, 1) is just a handy way of getting the last day of the next month.
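For readers who want to trace the recursion by hand, here is a rough Python sketch of the same steps on the question's data. The eomonth helper below is a hand-written stand-in for the SQL Server function, and fill_months is a hypothetical name.

```python
import calendar
from datetime import date, timedelta
from itertools import groupby

t = [  # (id, date, frequency), from the question
    (1, date(2012, 4, 30), 5), (1, date(2012, 6, 30), 4), (1, date(2012, 7, 31), 25),
    (2, date(2012, 4, 30), 7), (2, date(2012, 5, 31), 4),
    (2, date(2012, 6, 30), 1), (2, date(2012, 7, 31), 6),
]

def eomonth(d, months=0):
    """Last day of the month that lies `months` months after d."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month0 + 1
    return date(year, month, calendar.monthrange(year, month)[1])

def fill_months(rows):
    out = []
    for _, grp in groupby(sorted(rows), key=lambda r: r[0]):
        grp = list(grp)
        # Pair each row with the next row of the same id (lead in the anchor)
        for (id_, d, freq), nxt in zip(grp, grp[1:] + [None]):
            out.append((id_, d, freq))
            if nxt is None:
                continue
            cur = eomonth(d, 1)
            # Same stop condition as the recursive member
            while cur < nxt[1] - timedelta(days=1):
                out.append((id_, cur, 0))
                cur = eomonth(cur, 1)
    return out

for row in fill_months(t):
    print(row)
```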
If you have all dates in the table, you can also use cross join to generate the rows and then left join to bring in the existing data:
select i.id, d.date, coalesce(t.frequency, 0) as frequency
from (select distinct id from t) i cross join
(select distinct date from t) d left join
t
on i.id = t.id and d.date = t.date
order by i.id, d.date;
If you have a large amount of data, you can compare performance. This may be a case where a recursive CTE is faster than alternative methods.

sum values based on 7-day cycle in SQL Oracle

I have dates and values, and I would like to sum the values within a 7-day cycle starting from the first date.
date value
01-01-2021 1
02-01-2021 1
05-01-2021 1
07-01-2021 1
10-01-2021 1
12-01-2021 1
13-01-2021 1
16-01-2021 1
18-01-2021 1
22-01-2021 1
23-01-2021 1
30-01-2021 1
This is my input data; it forms 4 groups under the 7-day cycle.
Grouping should start with the first date and sum all values within 7 days after it, first date included.
Then a new group starts with the next date and covers another 7 days (10-01 till 17-01), then again a new group from 18-01 till 25-01, and so on.
So the output will be
group1 4
group2 4
group3 3
group4 1
With match_recognize this would be easy, using current_day < first_day + 7 as the pattern condition, but please don't use the match_recognize clause as a solution!
One approach is a recursive CTE:
with tt as (
select dte, value, row_number() over (order by dte) as seqnum
from t
),
cte (dte, value, seqnum, firstdte) as (
select tt.dte, tt.value, tt.seqnum, tt.dte
from tt
where seqnum = 1
union all
select tt.dte, tt.value, tt.seqnum,
(case when tt.dte < cte.firstdte + interval '7' day then cte.firstdte else tt.dte end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select firstdte, sum(value)
from cte
group by firstdte
order by firstdte;
This identifies the groups by the first date. You can use row_number() over (order by firstdte) if you want a number.
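The carry-forward of firstdte is easier to see in a loop. The following Python is a sketch of the same logic on the question's data; sum_by_cycle is a hypothetical name, and the dates are read as dd-mm-yyyy.

```python
from datetime import date, timedelta

# (dte, value), the input from the question (all dates in January 2021)
t = [(date(2021, 1, day), 1) for day in (1, 2, 5, 7, 10, 12, 13, 16, 18, 22, 23, 30)]

def sum_by_cycle(rows, days=7):
    """Sum values per cycle; a new cycle starts at the first date that is
    not within `days` days of the current cycle's first date."""
    totals = {}
    first = None
    for dte, value in sorted(rows):
        if first is None or dte >= first + timedelta(days=days):
            first = dte  # this row opens a new group
        totals[first] = totals.get(first, 0) + value
    return sorted(totals.items())

for first_dte, total in sum_by_cycle(t):
    print(first_dte, total)
```

Each group depends on where the previous one started, which is why the SQL needs recursion (or match_recognize) instead of a plain window function.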

Select min/max dates for periods that don't intersect

Example! I have a table with 4 columns. date format dd.MM.yy
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case dates from rows 2 and 3 are included in dates from row 1
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case dates from row 5 are included in dates from row 4. Basically we have 2 periods that don't intersect.
01.01.15 31.12.18
and
02.04.19 31.12.20
A situation where a row starts in one period and ends in another is impossible. The end result should look like this
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analytical functions (LAG)
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Here I order the rows, and if the current dates are included in the previous row's dates, I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot dates and assign 1 for start date, -1 for end date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()
You can shorten this syntax; I divided it into steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that for different (ID, BAN) we have to make calculations separately. If not - change partitioning and ordering in sum() and lead().
Pivot and unpivot work in Oracle 11 and later; for earlier versions you need case when.
BTW - START is a reserved word in Oracle, so in my example I changed the column names slightly.
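The +1/-1 running-sum idea behind the three steps can also be sketched procedurally. The Python below is a loose illustration on the same test data, not a line-by-line translation of the query: merge_periods is a hypothetical name, and the sketch assumes that starts should be processed before ends falling on the same date.

```python
from collections import defaultdict
from datetime import date

t = [  # (id, ban, dtstart, dtend), the test data from the query
    (1, 1, date(2015, 1, 1), date(2015, 3, 31)),
    (1, 1, date(2015, 2, 2), date(2015, 3, 31)),
    (1, 1, date(2015, 3, 15), date(2015, 3, 31)),
    (1, 1, date(2015, 8, 5), date(2015, 12, 31)),
    (1, 2, date(2015, 1, 1), date(2016, 12, 31)),
    (2, 1, date(2016, 1, 1), date(2017, 12, 31)),
]

def merge_periods(rows):
    events = defaultdict(list)
    for id_, ban, start, end in rows:
        events[(id_, ban)].append((start, 1))   # step1: +1 for a start
        events[(id_, ban)].append((end, -1))    #        -1 for an end
    merged = []
    for key in sorted(events):
        open_count = 0
        period_start = None
        # Sort by date; on ties, starts (+1) come before ends (-1)
        for dt, direction in sorted(events[key], key=lambda e: (e[0], -e[1])):
            if open_count == 0 and direction == 1:
                period_start = dt               # running sum goes 0 -> 1
            open_count += direction             # step2: cumulative sum
            if open_count == 0:                 # step3: running sum back to 0
                merged.append((*key, period_start, dt))
    return merged

for row in merge_periods(t):
    print(row)
```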
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t.start <= t2.end and t.end >= t2.start and
t.start <> t2.start and t.end <> t2.end
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;
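The three stages named above (flag period starts, cumulative sum, final aggregation) can be sketched in Python on the question's rows. This is an assumption-laden illustration, not a translation of the query: collapse_periods is a hypothetical name, and the start-flag rule used here (a row starts a new period when its start is later than every end seen so far in its id/ban group) is my own simplification.

```python
from datetime import date
from itertools import accumulate, groupby

rows = [  # (id, ban, start, end), from the question (dd.MM.yy)
    (1, 1, date(2015, 1, 1), date(2018, 12, 31)),
    (1, 1, date(2015, 2, 2), date(2018, 12, 31)),
    (1, 1, date(2015, 4, 5), date(2017, 12, 31)),
    (1, 1, date(2019, 4, 2), date(2020, 12, 31)),
    (1, 1, date(2019, 5, 5), date(2020, 12, 31)),
]

def collapse_periods(rows):
    out = []
    for key, members in groupby(sorted(rows), key=lambda r: r[:2]):
        members = list(members)
        # Stage 1: start_flag (assumed rule, see the lead-in)
        flags, max_end = [], None
        for _, _, start, end in members:
            flags.append(1 if max_end is None or start > max_end else 0)
            max_end = end if max_end is None else max(max_end, end)
        # Stage 2: the cumulative sum of the flags numbers the groups
        grp_ids = list(accumulate(flags))
        # Stage 3: min(start) and max(end) per group
        for _, pairs in groupby(zip(grp_ids, members), key=lambda p: p[0]):
            grp_rows = [r for _, r in pairs]
            out.append((*key, min(r[2] for r in grp_rows), max(r[3] for r in grp_rows)))
    return out

for row in collapse_periods(rows):
    print(row)
```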