create a temporary sql table using recursion as a loop to populate custom time interval - sql

Suppose you have a table like:
id subscription_start subscription_end segment
1 2016-12-01 2017-02-01 87
2 2016-12-01 2017-01-24 87
...
And wish to generate a temporary table with months.
One way would be to encode the month date as:
with months as (
select
'2016-12-01' as 'first',
'2016-12-31' as 'last'
union
select
'2017-01-01' as 'first',
'2017-01-31' as 'last'
...
) select * from months;
So that I have an output table like:
first_day last_day
2017-01-01 2017-01-31
2017-02-01 2017-02-28
2017-03-01 2017-03-31
I would like to generate a temporary table with a custom interval (as above), without manually encoding all the dates.
Say the interval is one month: 12 rows per year, for as many years as there are in the db.
I'd like a general approach that computes the months table with the same output as above.
Alternatively, one may want to adjust the range to a custom interval (months split a year into 12 parts, but one may want to split a time span into intervals of a custom number of days).
To start, I was thinking to use recursive query like:
with months(id, first_day, last_day, month) as (
select
id,
first_day,
last_day,
0
where
subscriptions.first_day = min(subscriptions.first_day)
union all
select
id,
first_day,
last_day,
months.month + 1
from
subscriptions
left join months on cast(
strftime('%m', datetime(subscriptions.subscription_start)) as int
) = months.month
where
months.month < 13
)
select
*
from
months
where
month = 1;
but it does not do what I'd expect. Here I was attempting to select the row with the minimum date from the table, and then populate a table at monthly intervals, ranging from 1 to 12. For each month, I was comparing against the month number extracted from the string date field of my table (e.g. 2017-03-01 gives 3, which is March).
The query above does not work and also seems a bit complicated. For the sake of learning, which alternative would you propose to create a temporary table months without manually coding the intervals?
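One possible approach, offered as a sketch and not a definitive answer: let a recursive CTE start at the first day of the earliest subscription month and keep adding one step until the latest subscription_end is covered. The snippet below runs the SQLite query through Python's sqlite3 module so that it is self-contained. The bounds CTE, the build_intervals helper, the :step parameter and the in-memory sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        subscription_start TEXT,
        subscription_end TEXT,
        segment INTEGER
    );
    INSERT INTO subscriptions VALUES
        (1, '2016-12-01', '2017-02-01', 87),
        (2, '2016-12-01', '2017-01-24', 87);
""")

INTERVALS_SQL = """
WITH RECURSIVE
bounds(lo, hi) AS (
    -- earliest month start and latest end date found in the data
    SELECT date(min(subscription_start), 'start of month'),
           max(subscription_end)
    FROM subscriptions
),
months(first_day) AS (
    SELECT lo FROM bounds
    UNION ALL
    -- advance by one step while the next interval still starts before hi
    SELECT date(first_day, :step)
    FROM months, bounds
    WHERE date(first_day, :step) <= hi
)
SELECT first_day,
       date(first_day, :step, '-1 day') AS last_day
FROM months
ORDER BY first_day;
"""

def build_intervals(connection, step):
    """Return (first_day, last_day) rows; step is a SQLite date modifier."""
    return connection.execute(INTERVALS_SQL, {"step": step}).fetchall()

months = build_intervals(conn, "+1 month")
for first_day, last_day in months:
    print(first_day, last_day)
```

Passing a different modifier, for example build_intervals(conn, "+30 days"), splits the same range into fixed-length intervals of days instead of calendar months.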

Related

How to create monthly snapshots for the last 6 months?

I'm trying to get detailed data (snapshot) for each month on Business Day=1 for the last 6 months and need to pass 6 different dates (BD1's only) through two date variables.
Two variables will be BOM which will be BD1 for the last 6 months and EOM which will be BD1+1.
For example, the first snapshot will be
declare @BOM date ='2022-08-01'
declare @EOM date ='2022-09-01'
The second snapshot will be
declare @BOM date ='2022-09-01'
declare @EOM date ='2022-10-01'
and so on for the last 6 months from the current month.
Here is what I'm trying to do:
declare @BOM date
set @BOM=
(
select top 6 cast(date_datetime as date) date_datetime
from date_dim
where
datediff(month, date_datetime, getdate()) <= 6
and bd=1
order by date_datetime asc);
declare @EOM date
set @EOM=
(
select top 6 date_datetime
from date_dim
where
datediff(month, date_datetime, getdate()) <= 5
and bd=1
order by date_datetime asc);
But my query does not process it as I'm passing more than 1 value through my BOM & EOM variables in my main query WHERE clause.
I need some help with defining and using these variables in my query so that they can take different snapshots and store it in a table.
As you discovered, you cannot store multiple values in a scalar variable. What you possibly need is to use a table variable (which behaves similarly to a temp table). The table variable can have multiple rows (one for each selected month) and multiple columns (BOM and EOM).
The following code defines such a table variable and populates it with BOM and EOM of the most recent 6 full months from the date_dim table. I used the LEAD() window function to select the corresponding EOM for each BOM.
Lacking any provided sample data to actually query, I added a simple query at the end to just list the selected date ranges and calculated number of business days in each.
-- Table variable to hold selected month information
DECLARE @selected_months TABLE (BOM DATE, EOM DATE)
-- Select last 6 full months
INSERT @selected_months
SELECT *
FROM (
SELECT
date_datetime AS BOM,
LEAD(date_datetime) OVER(ORDER BY date_datetime) AS EOM
FROM date_dim
) D
WHERE DATEDIFF(month, BOM, GETDATE()) BETWEEN 1 AND 6
ORDER BY BOM
-- Sample usage
SELECT M.*, DATEDIFF(day, M.BOM, M.EOM) business_days
FROM @selected_months M
-- JOIN your_data D
-- ON D.your_data_date >= M.BOM
-- AND D.your_data_date < M.EOM
GROUP BY M.BOM, M.EOM
ORDER BY M.BOM
Sample results:
BOM EOM business_days
2022-08-01 2022-09-05 35
2022-09-05 2022-10-03 28
2022-10-03 2022-11-07 35
2022-11-07 2022-12-05 28
2022-12-05 2023-01-02 28
2023-01-02 2023-02-06 35
See this db<>fiddle for a working demo.
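To make the LEAD() pairing and the DATEDIFF filter concrete, here is a small Python sketch of the same logic outside the database. It is only an illustration under stated assumptions: the list of BD1 dates reuses the sample results above plus one hypothetical earlier date (2022-07-04), "today" is fixed to 2023-02-15 so that the output is reproducible, and month_diff and select_snapshots are made-up helper names.

```python
from datetime import date

def month_diff(earlier, later):
    """Month boundaries crossed, mirroring T-SQL DATEDIFF(month, earlier, later)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def select_snapshots(bd1_dates, today, months_back=6):
    """Pair each BD1 date with the next one and keep the last full months."""
    ordered = sorted(bd1_dates)
    # zip(ordered, ordered[1:]) plays the role of LEAD(): each date with its successor.
    # The final date has no successor and is dropped (LEAD would yield NULL there).
    pairs = zip(ordered, ordered[1:])
    return [
        (bom, eom)
        for bom, eom in pairs
        if 1 <= month_diff(bom, today) <= months_back
    ]

bd1_dates = [
    date(2022, 7, 4),   # hypothetical extra row, filtered out as too old
    date(2022, 8, 1), date(2022, 9, 5), date(2022, 10, 3),
    date(2022, 11, 7), date(2022, 12, 5), date(2023, 1, 2),
    date(2023, 2, 6),
]
today = date(2023, 2, 15)  # assumed "current date" for a reproducible run

snapshots = select_snapshots(bd1_dates, today)
for bom, eom in snapshots:
    print(bom, eom, (eom - bom).days)
```

Each (bom, eom) pair corresponds to one row of the table variable, so a loop over snapshots is the procedural equivalent of joining your data against it.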

Select records by month and year between two dates

I have the table record_b. I want to select the records for which a specific month and year falls between begin_date and end_date.
id begin_date end_date
2 2022-09-04 2022-10-03
3 2022-10-04 2022-10-31
4 2022-11-04 2022-12-03
5 2022-12-04 2023-01-03
6 2023-01-04 2023-02-03
7 2023-02-04 null
eg1:
Input: 2023-01
Output should be the record with id 5 and 6
eg2:
Input: 2022-12
Output should be the record with id 4 and 5
I have tried using between however there is a problem evaluating the months after the year.
and v_year BETWEEN EXTRACT(YEAR FROM PC.begin_date)
AND EXTRACT(YEAR FROM PC.end_date)
AND v_month BETWEEN EXTRACT(MONTH FROM PC.begin_date)
AND EXTRACT(MONTH FROM PC.end_date)
A very basic rule: when you have a date, store it as a date.
This extends further: when you need to process dates, process them as dates.
Most of the time nothing else will be needed (no conversion, extract, date_part, or epoch), just dates.
The task here is to find those rows where a specified Year-Month (yyyy-mm) falls within the period begin and end dates from a table.
Realize that if any portion of the specified year-month falls within the period then the first day of that month (yyyy-mm-01) falls within that period.
You can use the make_date() function to get the first of the specified month. Then JOIN that result with between dates.
with input_val(yr_mon) as (values (:yyyymm)) --select * from input_val
, tgt_date(dt) as
( select make_date(substring(yr_mon,1,4)::integer
,substring(yr_mon,6,2)::integer
,01
)
from input_val
) --select * from tgt_date;
select rb.*
from tgt_date t
join record_b rb
on t.dt between date_trunc('month',rb.begin_date)
and date_trunc('month',rb.end_date);
The above, however, does NOT handle data point 7 well, since it has a null end date (nor would it handle a null start date). But should it?
If so, a null value is often interpreted as there being no ending date, which basically says all dates on or after the start date are included.
You can handle the situation by converting the period to a daterange, which deals with it without any null processing logic, and then using the range containment operator.
with input_val(yr_mon) as (values (:yyyymm)) --select * from input_val
, tgt_date(dt) as
( select make_date(substring(yr_mon,1,4)::integer
,substring(yr_mon,6,2)::integer
,01
)
from input_val
) --select * from tgt_date;
select rb.*
from tgt_date t
join record_b rb
on t.dt <@ daterange(date_trunc('month',rb.begin_date)::date
,date_trunc('month',rb.end_date)::date
, '[]'
);
Finally, depending on how you use the results, you can hide this whole thing within a SQL function, which can then be used in an SQL statement.
create or replace function periods_with_year_month(year_mm text)
returns setof record_b
language sql
as $$
with tgt_date(dt) as
(select make_date(substring(year_mm,1,4)::integer
,substring(year_mm,6,2)::integer
,01
)
)
select rb.*
from tgt_date t
join record_b rb
on t.dt <@ daterange( date_trunc('month',rb.begin_date)::date
, date_trunc('month',rb.end_date)::date
, '[]'
);
$$;
See demo here. Unfortunately, db<>fiddle is non-interactive, so the yyyy-mm parameters are hard-coded.
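The containment rule described above (the first day of the requested month must lie between the month-truncated begin and end dates, with a null end meaning "no upper bound") can be checked outside the database with a short Python sketch. This is only an illustration: the rows are copied from the question's table, the function name mirrors the SQL function above, and month_start is a made-up helper.

```python
from datetime import date

def month_start(d):
    """Equivalent of date_trunc('month', d) for plain dates."""
    return d.replace(day=1)

def periods_with_year_month(records, year_mm):
    """Return ids of records whose period touches the month given as 'yyyy-mm'."""
    year, month = (int(part) for part in year_mm.split("-"))
    target = date(year, month, 1)
    hits = []
    for rec_id, begin, end in records:
        lower = month_start(begin)
        # A missing end date is treated as an unbounded upper limit,
        # like a daterange with a NULL upper bound.
        upper = month_start(end) if end is not None else None
        if lower <= target and (upper is None or target <= upper):
            hits.append(rec_id)
    return hits

record_b = [
    (2, date(2022, 9, 4), date(2022, 10, 3)),
    (3, date(2022, 10, 4), date(2022, 10, 31)),
    (4, date(2022, 11, 4), date(2022, 12, 3)),
    (5, date(2022, 12, 4), date(2023, 1, 3)),
    (6, date(2023, 1, 4), date(2023, 2, 3)),
    (7, date(2023, 2, 4), None),
]

print(periods_with_year_month(record_b, "2023-01"))
print(periods_with_year_month(record_b, "2022-12"))
```

With the question's data this returns ids 5 and 6 for 2023-01 and ids 4 and 5 for 2022-12, matching the expected output, while any month from 2023-02 onward also matches the open-ended record 7.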

prestosql get average from last 7 days for each day

The question I have is very similar to the question here, but I am using Presto SQL (on aws athena) and couldn't find information on loops in presto.
To reiterate the issue, I want the query that:
Given table that contains: Day, Number of Items for this Day
I want: Day, Average Items for Last 7 Days before "Day"
So if I have a table that has data from Dec 25th to Jan 25th, my output table should have data from Jan 1st to Jan 25th. And for each day from Jan 1-25th, it will be the average number of items from last 7 days.
Is it possible to do this with presto?
Maybe you can try this one.
The calendar Common Table Expression (CTE) generates every date in a range between two dates.
with calendar as (
select date_generated
from (
values (sequence(date'2021-12-25', date'2022-01-25', interval '1' day))
) as t1(date_array)
cross join unnest(date_array) as t2(date_generated)),
The temp CTE builds a date group for each date, containing that date and the 6 days before it (the last 7 days).
temp as (select c1.date_generated as date_groups
, format_datetime(c2.date_generated, 'yyyy-MM-dd') as dates
from calendar c1, calendar c2
where c2.date_generated between c1.date_generated - interval '6' day and c1.date_generated
and c1.date_generated >= date'2021-12-25' + interval '6' day)
Output for this part:
date_groups dates
2022-01-01 2021-12-26
2022-01-01 2021-12-27
2022-01-01 2021-12-28
2022-01-01 2021-12-29
2022-01-01 2021-12-30
2022-01-01 2021-12-31
2022-01-01 2022-01-01
The last part joins the day column from your table with each date and then groups the result by date group.
select temp.date_groups as day
, avg(your_table.num_of_items) avg_last_7_days
from your_table
join temp on your_table.day = temp.dates
group by 1
You want a running average (AVG OVER)
select
day, amount,
avg(amount) over (order by day rows between 6 preceding and current row) as avg_amount
from mytable
order by day
offset 6;
I tried many different variations of getting the "running average" (which I now know is what I was looking for thanks to Thorsten's answer), but couldn't get the output I wanted exactly with my other columns (that weren't included in my original question) in the table, but this ended up working:
SELECT day, <other columns>, avg(amount) OVER (
PARTITION BY <other columns>
ORDER BY date(day) ASC
ROWS 6 PRECEDING) as avg_7_days_amount FROM table ORDER BY date(day) ASC
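For readers who want to see what the window frame computes, here is a minimal Python sketch of the same running average. It is an illustration only: trailing_average is a made-up helper and the amounts are invented sample data. Like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, it counts rows and not calendar days, so it assumes one row per day with no gaps.

```python
from collections import deque
from datetime import date, timedelta

def trailing_average(rows, window=7):
    """rows: (day, amount) pairs sorted by day.

    Returns (day, average) pairs, starting at the first day that has a
    full window of rows behind it (the SQL version skips the rest via OFFSET 6).
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` amounts
    result = []
    for day, amount in rows:
        recent.append(amount)
        if len(recent) == window:
            result.append((day, sum(recent) / window))
    return result

# Invented data: 10 consecutive days starting 2021-12-26 with amounts 1..10.
start = date(2021, 12, 26)
rows = [(start + timedelta(days=i), i + 1) for i in range(10)]

averages = trailing_average(rows)
for day, avg in averages:
    print(day, avg)
```

The first emitted row is 2022-01-01, whose average covers 2021-12-26 through 2022-01-01, the same seven dates listed in the date group table above.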

T-sql count number of times a week on rows with date interval

If you have table like this:
Name       Data type
UserID     INT
StartDate  DATETIME
EndDate    DATETIME
With data like this:
UserID  StartDate            EndDate
21      2021-01-02 00:00:00  2021-01-02 23:59:59
21      2021-01-03 00:00:00  2021-01-04 15:42:00
24      2021-01-02 00:00:00  2021-01-06 23:59:59
And you want to calculate number of users that is represented on each day in a week with a result like this:
Year  Week  NumberOfTimes
2021  1     8
2021  2     10
2021  3     4
Basically I want to do a Select like this:
SELECT YEAR(dateColumn) AS yearname, WEEK(dateColumn) AS weekname, COUNT(somecolumn)
GROUP BY YEAR(dateColumn), WEEK(dateColumn)
The problem I have is with the start and end dates: if an interval spans several days, I want it to be counted on each day. Preferably the same user should not be counted twice on the same day. There are millions of rows that are constantly being deleted and added, so speed is key.
The database is MS-SQL 2019
I would suggest a recursive CTE:
with cte as (
select userid, startdate, enddate
from t
union all
-- step forward 7 days at a time until the week of the end date is reached
select userid, dateadd(day, 7, startdate),
enddate
from cte
where startdate < enddate and
datepart(week, startdate) <> datepart(week, enddate)
)
select year(startdate), datepart(week, startdate), count(*)
from cte
group by year(startdate), datepart(week, startdate)
option (maxrecursion 0);
The CTE expands the data by adding 7 days to each row. This should be one day per week.
There is a little logic in the second part to handle the situation where the enddate ends in the same week as the last start date. The above solution assumes that the dates are all in the same year -- which seems quite reasonable given the sample data. There are other ways to prevent this problem.
You need to cross-join each row with the relevant dates.
Create a calendar table with columns of years and weeks, include a start and end date of the week. See here for an example of how to create one, and make sure you index those columns.
Then you can cross-join like this
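SELECT
c.Year AS yearname,
c.Week AS weekname,
COUNT(*) AS NumberOfTimes
FROM YourTable t
-- Year and Week are assumed column names of the calendar table;
-- the join matches every calendar week that overlaps the row's interval
JOIN CalendarWeek c ON c.StartDate <= t.EndDate AND c.EndDate >= t.StartDate
GROUP BY c.Year, c.Week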
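To pin down what "count each user once per day, then total per week" means, here is a small Python sketch that expands every interval into calendar days, removes duplicate (user, day) pairs, and groups by week. It is an illustration, not a tuned solution for millions of rows. The rows are the sample data from the question, user_days_per_week is a made-up helper, and the sketch groups by ISO year and week, which is an assumption: ISO numbering differs from the default DATEPART(week, ...) numbering in SQL Server.

```python
from collections import Counter
from datetime import datetime, timedelta

def user_days_per_week(rows):
    """Count distinct (user, day) pairs per (ISO year, ISO week)."""
    seen = set()
    for user_id, start, end in rows:
        day = start.date()
        # Expand the interval into one entry per calendar day it touches.
        while day <= end.date():
            seen.add((user_id, day))  # the set removes same-day duplicates
            day += timedelta(days=1)
    counts = Counter()
    for _, day in seen:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)

rows = [
    (21, datetime(2021, 1, 2, 0, 0, 0), datetime(2021, 1, 2, 23, 59, 59)),
    (21, datetime(2021, 1, 3, 0, 0, 0), datetime(2021, 1, 4, 15, 42, 0)),
    (24, datetime(2021, 1, 2, 0, 0, 0), datetime(2021, 1, 6, 23, 59, 59)),
]

weekly = user_days_per_week(rows)
for (year, week), total in sorted(weekly.items()):
    print(year, week, total)
```

In SQL the same idea is a join against a calendar table with one row per day, followed by COUNT(DISTINCT ...) over user and day, grouped by the calendar table's year and week columns.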

Create empty output table with set number of rows with Teradata

I know how I would do this with something like SAS, but if I wanted to create a table that had as many rows as there were month intervals derived from this statement:
cast((cast('2017-03-31' as date) - cast('2016-01-31' as date) month(4)) as int) as date_range
....to give an output like this:
2017-03-31
2017-02-28
2017-01-31
2016-12-31
2016-11-30
2016-10-31
2016-09-30
2016-08-31
2016-07-31
2016-06-30
2016-05-31
2016-04-30
What statement would I need to do this in Teradata?
Thanks
Are those dates calculated based on existing columns?
Or do you just need that list?
In both cases you can utilize Teradata's proprietary EXPAND ON feature:
SELECT BEGIN(pd)
FROM SYS_CALENDAR.CALENDAR -- your table here
WHERE calendar_date = DATE -- EXPAND requires FROM, so this is just to get a single row
EXPAND ON PERIOD(date '2016-01-31' -- start date
,date '2017-03-31' + 1 -- end date (+1 because it's not included in the date range)
) AS pd BY ANCHOR PERIOD MONTH_END -- one row for each month end within the period
It is safer to get the first of the month because adding months to the last day of the month can be problematic (if you started with '2016-02-29' you would get the 29th of succeeding months).
You can do what you want with a recursive cte:
with recursive cte(dte) as (
select cast('2016-02-01' as date)
union all
select add_months(cte.dte, 1)
from cte
where dte <= '2017-05-01'
),
dates as (
select dte - interval '1' day as month_end
from cte
)
. . .
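The "take the first of the next month, then subtract one day" idea can be sketched in Python to check which dates it produces. This is an illustration under assumptions: month_ends and next_month_start are made-up helper names, and the range endpoints are the ones from the question.

```python
from datetime import date, timedelta

def next_month_start(d):
    """First day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def month_ends(start, end):
    """Month-end dates between start and end (inclusive), ascending."""
    result = []
    first = next_month_start(start)
    # Stepping on month starts avoids the "29th of every month" drift
    # that comes from adding months to a month-end date.
    while first - timedelta(days=1) <= end:
        result.append(first - timedelta(days=1))
        first = next_month_start(first)
    return result

ends = month_ends(date(2016, 1, 31), date(2017, 3, 31))
for d in reversed(ends):  # newest first, as in the question's listing
    print(d)
```

For the question's range this yields one row per month from 2016-01-31 through 2017-03-31, including the leap day 2016-02-29.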