Query a 30 day interval for every 30 day interval in the last year - sql

I want to query every 30-day interval in 2021, but I don't know how to do it without a for loop in SQL.
Here's pseudocode of what I want to do with a table called _table and a date column called application_date:
for _day in range(335):
select '2021-01-01' + _day as start_date, count(*) as _count
from _table
where '2021-01-01' + _day <= application_date <= ('2021-01-01' + _day + interval '30' day )
It would output something like this:
start_date | _count
2021-01-01 | {number of rows between 2021-01-01 and 2021-01-31}
2021-01-02 | {number of rows between 2021-01-02 and 2021-02-01}
... | ...
2021-11-30 | {number of rows between 2021-11-30 and 2021-12-30}
2021-12-01 | {number of rows between 2021-12-01 and 2021-12-31}

Assuming that you have rows for each day, you can group the data by date, count the rows in each group, and then use the sum window function over a frame of the current row plus the 30 following rows (note that {rows between 2021-01-01 and 2021-01-31} spans 31 days, not 30):
-- sample data
WITH dataset(start_date) AS (
VALUES (date '2021-01-01'),
(date '2021-01-01'),
(date '2021-01-01'),
(date '2021-01-02'),
(date '2021-01-03'),
(date '2021-01-03')
)
-- query
select start_date
, sum(cnt) over (order by start_date ROWS BETWEEN CURRENT ROW AND 30 FOLLOWING) rolling_count_31_days
from (
select start_date
, count(*) cnt
from dataset
where year(start_date) = 2021
group by start_date
)
Output:
start_date | rolling_count_31_days
2021-01-01 | 6
2021-01-02 | 3
2021-01-03 | 2
If some dates are missing, check out this or this answer describing how to generate missing dates, and insert those dates into the grouped result with cnt set to 0.
Note that Trino (the new name for PrestoSQL) has updated its support for the RANGE frame type, so there you can implement this without inserting the missing rows.
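To make the handling of missing dates concrete, here is a minimal Python sketch of the same rolling count, written outside SQL purely for illustration. The function name rolling_counts and its parameters are hypothetical; the sample data is the dataset from the query above. Dates with no rows simply count as 0, which is the effect the missing-date insertion is meant to achieve.

```python
from collections import Counter
from datetime import date, timedelta

def rolling_counts(dates, first_start, last_start, window_days=31):
    """For each start day, count rows dated in [start, start + window_days)."""
    per_day = Counter(dates)  # plays the role of GROUP BY start_date / count(*)
    result = {}
    start = first_start
    while start <= last_start:
        # A Counter returns 0 for absent keys, so missing dates add nothing.
        result[start] = sum(
            per_day[start + timedelta(days=i)] for i in range(window_days)
        )
        start += timedelta(days=1)
    return result

# Same sample rows as the dataset CTE above.
sample = [date(2021, 1, 1)] * 3 + [date(2021, 1, 2), date(2021, 1, 3), date(2021, 1, 3)]
counts = rolling_counts(sample, date(2021, 1, 1), date(2021, 1, 5))
for day, n in counts.items():
    print(day, n)
```

For the three dates present in the sample this gives 6, 3 and 2, matching the output above; 2021-01-04 and 2021-01-05 have no rows and come out as 0.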

Related

Get days of the week from a date range in Postgres

So I have the following table :
id end_date name number_of_days start_date
1 "2022-01-01" holiday1 1 "2022-01-01"
2 "2022-03-20" holiday2 1 "2022-03-20"
3 "2022-04-09" holiday3 1 "2022-04-09"
4 "2022-05-01" holiday4 1 "2022-05-01"
5 "2022-05-04" holiday5 3 "2022-05-02"
6 "2022-07-20" holiday6 9 "2022-07-12"
I want to check if a week falls in a holiday range.
So far I can select the holidays that overlap with my chosen week (week_start_date, week_end_date), but I can't get the exact days on which the overlap happens.
This is the query I'm using. I want to add a mechanism to detect the days of the week on which the overlap happens:
SELECT * FROM holidays
where daterange(CAST(start_date AS date), CAST(end_date as date), '[]') && daterange('2022-07-18', '2022-07-26','[]')
The current query returns the overlapping holiday (id = 6), but I'm trying to get the exact days of the week on which the overlap happens (in this case, it should be Monday, Tuesday, Wednesday).
You can use the * (intersection) operator with tsranges, generate a series of dates between the lower and upper bounds of the overlap, and finally print the days of the week with to_char, e.g.
SELECT
id, name, start_date, end_date, array_agg(dow) AS days
FROM (
SELECT *,
trim(
to_char(
generate_series(lower(overlap), upper(overlap),'1 day'),
'Day')) AS dow
FROM holidays
CROSS JOIN LATERAL (SELECT tsrange(start_date,end_date) *
tsrange('2022-07-18', '2022-07-26')) t (overlap)
WHERE tsrange(start_date,end_date) && tsrange('2022-07-18', '2022-07-26')) j
GROUP BY id,name,start_date,end_date,number_of_days;
id | name | start_date | end_date | days
----+----------+------------+------------+----------------------------
6 | holiday6 | 2022-07-12 | 2022-07-20 | {Monday,Tuesday,Wednesday}
(1 row)
Demo: db<>fiddle
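The same idea (intersect the two ranges, walk the days, name each one) can be sketched in Python as a sanity check. This is only an illustration: overlap_weekdays and DAY_NAMES are hypothetical names, and both ranges are assumed to be inclusive at both ends, like the '[]' dateranges in the question.

```python
from datetime import date, timedelta

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def overlap_weekdays(start_a, end_a, start_b, end_b):
    """Return the weekday names of every day shared by two inclusive ranges."""
    lo = max(start_a, start_b)   # latest start
    hi = min(end_a, end_b)       # earliest end
    days = []
    current = lo
    while current <= hi:         # empty when the ranges do not overlap
        days.append(DAY_NAMES[current.weekday()])
        current += timedelta(days=1)
    return days

holiday6 = (date(2022, 7, 12), date(2022, 7, 20))
week = (date(2022, 7, 18), date(2022, 7, 26))
print(overlap_weekdays(*holiday6, *week))
```

For holiday6 and the week in the question this prints ['Monday', 'Tuesday', 'Wednesday'].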

How to work out overlap of the union of date intervals in BigQuery

In BigQuery, given a table of date intervals, how can I find the overlap of their union with a single date interval of interest?
For example, given a table of date intervals (call this table A) as:
start_date end_date
2021-02-01 2021-05-01
2021-04-01 2021-07-01
2020-12-01 2021-03-01
2021-09-01 2021-12-01
And the single date interval of interest (call this table B) as:
start_date end_date
2021-01-01 2021-11-01
I would like to calculate the overlap between the intervals in A with the interval in B as 8 months.
When A's intervals are disjoint, I can solve this with the following:
SELECT
SUM(GREATEST(0, DATE_DIFF(LEAST(B.end_date, A.end_date),
GREATEST(B.start_date,A.start_date), MONTH)))
AS months_overlap
FROM
A, B
The problem comes in when the date intervals in A overlap with each other, as in the above example, in which case the above code double counts overlapping intervals in A i.e. it will return 10 months for the above example.
Any suggestions on how to calculate the overlap of these intervals without double counting? I thought about introducing Lags into the date diff function but I'm not coming right.
Consider the approach below:
select count(1) as months_overlap
from (
select distinct date_trunc(day, month) month
from tableA, unnest(generate_date_array(start_date, end_date - 1)) day
)
join (
select distinct date_trunc(day, month) month
from tableB, unnest(generate_date_array(start_date, end_date - 1)) day
)
using(month)
If applied to the sample data in your question, the output is months_overlap = 8.
One approach is to expand the various intervals into months, join and count:
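with b_months as (
select mon
from b cross join
-- end_date is treated as exclusive, so stop one day before it;
-- this assumes each start_date is the first day of a month
unnest(generate_date_array(b.start_date, date_sub(b.end_date, interval 1 day), interval 1 month)) mon
),
a_months as (
select mon
from a cross join
unnest(generate_date_array(a.start_date, date_sub(a.end_date, interval 1 day), interval 1 month)) mon
)
select count(distinct mon) as months_overlap
from a_months join
b_months
using (mon);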
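Both answers boil down to "expand each interval into the set of months it covers, union the sets for A, intersect with B, and count". A minimal Python sketch of that set logic, assuming every date is the first of a month and end dates are exclusive (both are assumptions about the sample data; month_index and months_covered are hypothetical helpers):

```python
from datetime import date

def month_index(d):
    """Map a date to a running month number (year * 12 + zero-based month)."""
    return d.year * 12 + (d.month - 1)

def months_covered(start, end):
    """Months in [start, end), assuming both dates are the first of a month."""
    return set(range(month_index(start), month_index(end)))

table_a = [
    (date(2021, 2, 1), date(2021, 5, 1)),
    (date(2021, 4, 1), date(2021, 7, 1)),
    (date(2020, 12, 1), date(2021, 3, 1)),
    (date(2021, 9, 1), date(2021, 12, 1)),
]
interval_b = (date(2021, 1, 1), date(2021, 11, 1))

# Taking the union first is what removes the double counting.
covered_by_a = set().union(*(months_covered(s, e) for s, e in table_a))
months_overlap = len(covered_by_a & months_covered(*interval_b))
print(months_overlap)
```

With the sample data this prints 8, the figure the question asks for, whereas summing the per-interval overlaps gives 10.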

Count days from start_date to end_date or end of month

With datediff() I can count the days between two dates, but how can I count the days from the start date to the end date or the end of the month, whichever comes first?
CREATE TABLE table1 (id int, start_date datetime, end_date datetime, jan int);
INSERT INTO table1 (id, start_date, end_date) VALUES
(1, '2016-12-12', '2017-01-17'),
(2, '2017-01-10', '2017-01-10'),
(3, '2017-01-10', '2017-02-10'),
(4, '2017-01-03', '2017-02-03'),
(5, '2016-12-03', '2017-02-03');
If I run:
select id, month(start_date) as month, datediff(end_date, start_date) as diff
from table1;
it returns
id month diff
1 12 36
2 1 0
3 1 31
4 1 31
5 12 62
but I would like it to return:
id month diff
1 12 19
5 12 28
1 1 17
2 1 0
3 1 21
4 1 28
5 1 31
3 2 10
4 2 3
5 2 3
I'm trying to get the number of days in each month during which an event occurs.
I've created a separate query to update a new column with the values, but ideally it shouldn't need a new column, since I would need one new column for each year-month combination:
update table1 set jan= case
when start_date >= "2017-01-01" and end_date <= last_day("2017-01-01") then datediff(end_date, start_date)+1
when start_date >= "2017-01-01" and start_date <= last_day("2017-01-01") and end_date > last_day("2017-01-01") then datediff(last_day("2017-01-01"), start_date)+1
when start_date < "2017-01-01" and end_date between "2017-01-01" and last_day("2017-01-01") then datediff(end_date, "2017-01-01")+1
when start_date < "2017-01-01" and end_date > last_day("2017-01-01") then day(last_day("2017-01-01"))
else null
end;
Your problem is going to be getting multiple rows... so let's take a different tack.
This ends up being trivial if you have a calendar table: a table with a row-per-date (and a bunch of individual columns and indices):
SELECT Table1.id, Calendar.calendar_month, COUNT(*)
FROM Table1
JOIN Calendar
ON Calendar.calendar_date >= start_date
AND Calendar.calendar_date < end_date
GROUP BY Table1.id, Calendar.calendar_month
ORDER BY Table1.id, MIN(Calendar.calendar_date)
Fiddle Demo
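To see what the per-month split looks like on the sample rows, here is a small Python sketch that walks the days of an interval and tallies them by (year, month). It assumes the start day is excluded and the end day is included, which is the convention that reproduces the desired figures in the question (the total then equals datediff(end_date, start_date)); the calendar join above uses the opposite boundaries (start included, end excluded), so adjust the comparison operators to taste. The name days_per_month is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def days_per_month(start, end):
    """Count the days in (start, end] per (year, month)."""
    counts = Counter()
    current = start + timedelta(days=1)   # start day itself is not counted
    while current <= end:                 # end day is counted
        counts[(current.year, current.month)] += 1
        current += timedelta(days=1)
    return dict(counts)

# Row id = 1 from the question: 2016-12-12 to 2017-01-17.
print(days_per_month(date(2016, 12, 12), date(2017, 1, 17)))
```

For id = 1 this yields 19 days in December 2016 and 17 in January 2017. An interval whose start and end are the same day (id = 2) produces no entry at all rather than a 0 row.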
I don't know if this is what you're looking for.
select month(start_date) as month,
datediff(LAST_DAY(start_date), start_date) as diff
from table1
UNION ALL
select month(end_date) as month,
IF(end_date < LAST_DAY(start_date), datediff(start_date, end_date),
datediff(end_date, LAST_DAY(start_date)))
from table1;
DEMO

Find the first missing date in a column (Oracle)

I need to find the first missing date in a date column of the plan_table table. The result must not be in holiday_table, and it must not fall on a weekend.
holiday_table stores all the holiday dates.
Plan_table contains dates. Here we have to find the first missing date:
Plan_id Date
1 10/2/2016
2 10/3/2016
3 10/6/2016
4 10/9/2016
5 10/10/2016
6 10/12/2016
7 10/13/2016
8 10/16/2016
Here the first missing date is 10/4/2016, but if this date is in holiday_table then we have to show 10/5/2016 or the next available date.
Please help me to write a query for the same.
You can use the LEAD analytic function like this:
select d
from
(
select
date + 1 as d
from
(
select
date,
lead(date) over(order by date) as next_date
from
(
select date from plan_table
union
select date from holiday_table
)
order by date
)
where
trunc(date) + 1 < trunc(next_date)
order by d
)
where rownum = 1
;
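The query above treats holidays as if they were planned dates, so they are never reported as gaps; it does not look at weekends. As a way to check expected results, here is a minimal Python sketch of the full rule from the question (first date that is not planned, not a holiday, and not a Saturday or Sunday). The function name first_missing_date is hypothetical, and searching only between the earliest and latest planned dates is an assumption.

```python
from datetime import date, timedelta

def first_missing_date(plan_dates, holidays):
    """First day between min and max plan date that is free, not a holiday, not a weekend."""
    taken = set(plan_dates)
    skipped = set(holidays)
    candidate = min(taken)
    last = max(taken)
    while candidate <= last:
        is_weekend = candidate.weekday() >= 5   # 5 = Saturday, 6 = Sunday
        if candidate not in taken and candidate not in skipped and not is_weekend:
            return candidate
        candidate += timedelta(days=1)
    return None   # no gap inside the planned range

plan = [date(2016, 10, d) for d in (2, 3, 6, 9, 10, 12, 13, 16)]
print(first_missing_date(plan, holidays=[]))
print(first_missing_date(plan, holidays=[date(2016, 10, 4)]))
```

With the sample plan dates the first call gives 2016-10-04, and the second, where 10/4/2016 is a holiday, gives 2016-10-05.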

Calculating difference between daily sum and a average for the same day of the week in defined time range. SQL 10g Oracle

Hi I'm working with data depending mostly on the day of the week. Data is formatted in a table
Date - position - count/number.
There are multiple different positions.
I was able to sort my data by day of the week using:
select MOD(to_char(time, 'J'),7),
sum(COUNT)
from TABLE
where time > sysdate -x
group by to_char(time, 'J')
order by to_char(time, 'J');
This outputs daily sums according to day of the week.
Now I'm able to get an average for a single day of a week in a year.
This code outputs an average for only Sunday
SELECT AVG(asset_sums)
FROM (
select MOD(to_char(time, 'J'),7),
sum(COUNT) as asset_sums
from table
where time > sysdate -365
and MOD(TO_CHAR(time, 'J'), 7) + 1 IN (7)
group by to_char(time, 'J')
order by to_char(time, 'J')
);
My goal is to be able to get a table with daily sum compared with yearly average for that particular day of the week.
For example yearly average number for Mondays is 57 , Tuesdays 60.
This week my Monday is 59 and Tuesday is 57. Output of the table is
Monday +2, Tuesday -3.
What is the easiest way / most efficient ?
Thanks for your help.
Edit : Format of my data
Date : yyyy-mm-dd | Place : xxxx | Number( of customers) 0 to 10000
2013-09-16 | AAAA | 1534
2013-09-16 | AAAB | 534
2013-09-17 | AAAA | 1434
2013-09-17 | AAAC | 834
2013-09-18 | AAAA | 134
2013-09-18 | AAAD | 183
Needed output
2013-09-16 | Day of the week | Sum | Average monday this year | Difference Sum-AVG
2013-09-16 | 1 (= Monday) | 2068 | 2015| 53
For clarity I will use subquery factoring. First, select the current week's data. Next, subquery the sum for each day over the current week. Then, subquery the sum for each day over the past year. Then, average the daily sums for each day of the week. Finally, join the two and display the difference.
with
this_week as (
-- raw rows for the current week (x is the reference date)
select
time,
count
from table
where time > x - 7
),
this_week_dly_sum as (
-- one total per calendar day of the current week
select
trunc(time) day,
sum(count) sum
from this_week
group by trunc(time)
),
this_year_dly_sum as (
-- one total per calendar day of the past year
select
trunc(time) day,
sum(count) sum
from table
where time > x - 365
group by trunc(time)
),
this_year_dly_avg as (
-- average daily total per day of the week
select
to_char(day, 'd') dow,
avg(sum) avg
from this_year_dly_sum
group by to_char(day, 'd')
)
select
this_week_dly_sum.day,
to_char(this_week_dly_sum.day, 'day') day_of_week,
this_week_dly_sum.sum,
this_year_dly_avg.avg,
this_week_dly_sum.sum - this_year_dly_avg.avg difference
from this_week_dly_sum
inner join this_year_dly_avg
on to_char(this_week_dly_sum.day, 'd') = this_year_dly_avg.dow
;
You can use analytic functions for this.
select date1, to_char(date1, 'd'),
sum(val) over(partition by to_char(date1, 'd')),
avg(val) over(partition by to_char(date1, 'd')),
sum(val) over(partition by to_char(date1, 'd'))-
avg(val) over(partition by to_char(date1, 'd'))
from table1
where date1 > add_months(sysdate, -12);
This will give you daily counts for the last year:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
You can modify it to additionally return averages per day of the week for the specified range:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
At this point you have all the initial data you need but probably for more days than necessary. You can use the above query as a derived table to limit the rows to just those where date > SYSDATE - x:
WITH last_year_by_day AS
(
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
)
SELECT date,
TO_CHAR(date, 'D') AS day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
As some expressions are being repeated multiple times, it can be a good idea to re-factor the query to avoid the repetition. Here's one way:
WITH last_year AS
(
SELECT TRUNC(time, 'DD') AS date,
TO_CHAR(time, 'D') AS day_of_week,
count
FROM yourtable
WHERE time > SYSDATE - 365
),
last_year_by_day AS
(
SELECT date,
day_of_week,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER (PARTITION BY day_of_week) AS asset_sum_avg
FROM last_year
GROUP BY date, day_of_week
)
SELECT date,
day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
One last note is about TO_CHAR('D'), which is used to obtain the day_of_week values. Since you are using a different method for the same results, you may not be aware that the results of TO_CHAR('D') are affected by the NLS_TERRITORY setting. You may want to use an ALTER SESSION statement to set NLS_TERRITORY to the value that would cause TO_CHAR('D') to return 1 for Monday, 2 for Tuesday etc. Here is the list of territories supported.
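For checking the numbers a query should return, the whole computation (daily sums, average of those sums per day of the week, difference) can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the date-range filters are left out, and the row for 2013-09-09 is invented so that there is more than one Monday to average. Python's weekday() always numbers Monday as 0, so adding 1 gives the Monday = 1 numbering independently of any territory setting.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def daily_diff_from_weekday_avg(rows):
    """Map each day to (day_of_week, daily_sum, weekday_average, difference)."""
    daily_sum = defaultdict(int)
    for day, _place, number in rows:
        daily_sum[day] += number                  # SUM(count) ... GROUP BY day

    by_weekday = defaultdict(list)
    for day, total in daily_sum.items():
        by_weekday[day.weekday()].append(total)   # group daily sums by weekday
    weekday_avg = {wd: mean(totals) for wd, totals in by_weekday.items()}

    return {
        day: (day.weekday() + 1, total,
              weekday_avg[day.weekday()], total - weekday_avg[day.weekday()])
        for day, total in daily_sum.items()
    }

rows = [
    (date(2013, 9, 9), "AAAA", 1962),    # hypothetical earlier Monday, not from the question
    (date(2013, 9, 16), "AAAA", 1534),
    (date(2013, 9, 16), "AAAB", 534),
    (date(2013, 9, 17), "AAAA", 1434),
    (date(2013, 9, 17), "AAAC", 834),
]
result = daily_diff_from_weekday_avg(rows)
print(result[date(2013, 9, 16)])
```

With the invented extra Monday, 2013-09-16 comes out as day 1 with a sum of 2068, a Monday average of 2015 and a difference of 53, the shape of the "Needed output" row in the question.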