Get days of the week from a date range in Postgres - sql

So I have the following table :
id end_date name number_of_days start_date
1 "2022-01-01" holiday1 1 "2022-01-01"
2 "2022-03-20" holiday2 1 "2022-03-20"
3 "2022-04-09" holiday3 1 "2022-04-09"
4 "2022-05-01" holiday4 1 "2022-05-01"
5 "2022-05-04" holiday5 3 "2022-05-02"
6 "2022-07-20" holiday6 9 "2022-07-12"
I want to check if a week falls in a holiday range.
So far I can select the holidays that overlap with my chosen week (week_start_date, week_end_date), but I can't get the exact days on which the overlap happens.
This is the query I'm using. I want to add a mechanism to detect the days of the week on which the overlap happens:
SELECT * FROM holidays
where daterange(CAST(start_date AS date), CAST(end_date as date), '[]') && daterange('2022-07-18', '2022-07-26','[]')
The current query returns the overlapping holiday (id = 6). However, I'm trying to get the exact days of the week on which the overlap happens (in this case, it should be Monday, Tuesday, Wednesday).

You can use the * operator with tsranges, generate a series of dates with the lower and upper dates and finally with to_char print the days of the week, e.g.
SELECT
id, name, start_date, end_date, array_agg(dow) AS days
FROM (
SELECT *,
trim(
to_char(
generate_series(lower(overlap), upper(overlap),'1 day'),
'Day')) AS dow
FROM holidays
CROSS JOIN LATERAL (SELECT tsrange(start_date,end_date) *
tsrange('2022-07-18', '2022-07-26')) t (overlap)
WHERE tsrange(start_date,end_date) && tsrange('2022-07-18', '2022-07-26')) j
GROUP BY id,name,start_date,end_date,number_of_days;
id | name | start_date | end_date | days
----+----------+------------+------------+----------------------------
6 | holiday6 | 2022-07-12 | 2022-07-20 | {Monday,Tuesday,Wednesday}
(1 row)
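As a cross-check outside the database, the same intersection logic can be sketched in Python. This is only an illustration, not part of the answer's query: the function name `overlap_weekdays` is hypothetical, and both ranges are assumed to be inclusive, as in the question's `'[]'` bounds.

```python
from datetime import date, timedelta

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def overlap_weekdays(start_a, end_a, start_b, end_b):
    """Return the weekday name of every day shared by two inclusive date ranges."""
    lo = max(start_a, start_b)   # lower bound of the intersection
    hi = min(end_a, end_b)       # upper bound of the intersection
    names = []
    current = lo
    while current <= hi:         # loop body never runs if the ranges do not overlap
        names.append(DAY_NAMES[current.weekday()])
        current += timedelta(days=1)
    return names

# holiday6 (2022-07-12 to 2022-07-20) against the week from the question
print(overlap_weekdays(date(2022, 7, 12), date(2022, 7, 20),
                       date(2022, 7, 18), date(2022, 7, 26)))
```

For the holiday6 row this yields the same three days as the SQL output above.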

Related

Query a 30 day interval for every 30 day interval in the last year

I want to query every 30 day interval in 2021, but I don't know how to do it without a for loop in SQL.
Here's pseudocode of what I want to do with a table called _table and a date column called application_date:
for _day in range(335):
select '2021-01-01' + _day as start_date, count(*) as _count
from _table
where '2021-01-01' + _day <= application_date <= ('2021-01-01' + _day + interval '30' day )
It would output something like this:
start_date | _count
-----------+------
2021-01-01 | {number of rows between 2021-01-01 and 2021-01-31}
2021-01-02 | {number of rows between 2021-01-02 and 2021-02-01}
...        | ...
2021-11-30 | {number of rows between 2021-11-30 and 2021-12-30}
2021-12-01 | {number of rows between 2021-12-01 and 2021-12-31}
Assuming that you have rows for each day, you can group the data by date, count the rows in each group, and then apply the sum window function over a frame of 31 rows (the current row plus the next 30). Note that {rows between 2021-01-01 and 2021-01-31} spans 31 days, not 30:
-- sample data
WITH dataset(start_date) AS (
VALUES (date '2021-01-01'),
(date '2021-01-01'),
(date '2021-01-01'),
(date '2021-01-02'),
(date '2021-01-03'),
(date '2021-01-03')
)
-- query
select start_date
, sum(cnt) over (order by start_date ROWS BETWEEN CURRENT ROW AND 30 FOLLOWING) rolling_count_31_days
from (
select start_date
, count(*) cnt
from dataset
where year(start_date) = 2021
group by start_date
)
Output:
start_date | rolling_count_31_days
-----------+----------------------
2021-01-01 | 6
2021-01-02 | 3
2021-01-03 | 2
If some dates are missing, see the linked answers, which describe how to generate the missing dates and insert them into the grouped result with cnt set to 0.
Note that Trino (the new name for PrestoSQL) has updated support for the RANGE frame type, so you can implement this without inserting the missing rows.
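The difference between counting rows and counting days can be illustrated with a small Python sketch. It is not Trino code; the helper name `rolling_counts` and the default window of 31 days are assumptions made for this example. Because it filters on the date value itself, it behaves like a RANGE frame and does not need the missing dates to be filled in.

```python
from collections import Counter
from datetime import date, timedelta

def rolling_counts(dates, window_days=31):
    """For each distinct date, count rows dated within [date, date + window_days)."""
    per_day = Counter(dates)                       # rows per calendar day
    result = {}
    for start in sorted(per_day):
        end = start + timedelta(days=window_days)  # exclusive upper bound
        result[start] = sum(n for d, n in per_day.items() if start <= d < end)
    return result

# Same sample data as in the query above
sample = [date(2021, 1, 1)] * 3 + [date(2021, 1, 2)] + [date(2021, 1, 3)] * 2
for day, count in rolling_counts(sample).items():
    print(day, count)
```

With the sample data this gives 6, 3 and 2, matching the output above. With gaps in the data (say, one row on 2021-01-01 and the next on 2021-02-05) a row-based frame would add both rows together, while this date-based version keeps them apart.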

Get count of subscribers for each month in current year even if count is 0

I need to get the count of new subscribers each month of the current year.
DB Structure: Subscriber(subscriber_id, create_timestamp, ...)
Expected result:
date | count
-----------+------
2021-01-01 | 3
2021-02-01 | 12
2021-03-01 | 0
2021-04-01 | 8
2021-05-01 | 0
I wrote the following query:
SELECT
DATE_TRUNC('month',create_timestamp)
AS create_timestamp,
COUNT(subscriber_id) AS count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp);
Which works but does not include months where the count is 0. It's only returning the ones that are existing in the table. Like:
"2021-09-01 00:00:00" 3
"2021-08-01 00:00:00" 9
The first subquery generates one row for each month of the year. It is then LEFT JOINed to a second subquery that computes the total count per month. COALESCE() replaces NULL values with 0.
-- PostgreSQL (v11)
SELECT t.cdate
, COALESCE(p.total_count, 0) total_count
FROM (select generate_series('2021-01-01'::timestamp, '2021-12-15', '1 month') as cdate) t
LEFT JOIN (SELECT DATE_TRUNC('month',create_timestamp) create_timestamp
, COUNT(subscriber_id) total_count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp)) p
ON t.cdate = p.create_timestamp
Please check from url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=20dcf6c1784ed0d9c5772f2487bcc221
To get the count of new subscribers for each month of the current year:
SELECT month::date, COALESCE(s.count, 0) AS count
FROM generate_series(date_trunc('year', LOCALTIMESTAMP)
, date_trunc('year', LOCALTIMESTAMP) + interval '11 month'
, interval '1 month') m(month)
LEFT JOIN (
SELECT date_trunc('month', create_timestamp) AS month
, count(*) AS count
FROM subscriber
GROUP BY 1
) s USING (month);
That's assuming every row is a "new subscriber". So count(*) is simplest and fastest.
See:
Join a count query on generate_series() and retrieve Null values as '0'
Generating time series between two dates in PostgreSQL
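The idea behind both answers (generate every month first, then attach counts and default to 0) can be sketched in Python as follows. The function name `monthly_counts` and the sample timestamps are hypothetical and only serve to illustrate the zero-filling step.

```python
from collections import Counter
from datetime import date, datetime

def monthly_counts(timestamps, year):
    """Return (first_of_month, count) for all 12 months, using 0 where no rows exist."""
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    # Building the full month list first plays the role of generate_series();
    # .get(..., 0) plays the role of COALESCE(count, 0) after the LEFT JOIN.
    return [(date(year, month, 1), counts.get((year, month), 0))
            for month in range(1, 13)]

# Hypothetical create_timestamp values
created = [datetime(2021, 8, 3, 10, 0), datetime(2021, 8, 20, 9, 30),
           datetime(2021, 9, 1, 0, 0)]
for month_start, count in monthly_counts(created, 2021):
    print(month_start, count)
```

Every month of the year appears in the result, and months without subscribers show 0 instead of being dropped.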

Find the first missing date in a column (Oracle)

I need to find the first missing date in a date column of the plan_table table. The date must not be in holiday_table and must not fall on a weekend.
holiday_table stores all the holiday dates.
plan_table contains dates; this is where we have to find the first missing date:
Plan_id Date
1 10/2/2016
2 10/3/2016
3 10/6/2016
4 10/9/2016
5 10/10/2016
6 10/12/2016
7 10/13/2016
8 10/16/2016
Here the first missing date is 10/4/2016, but if this date is in holiday_table then we have to show 10/5/2016 or the next available date.
Please help me to write a query for the same.
You can use the LEAD analytic function like this:
select d
from
(
select
date + 1 as d
from
(
select
date,
lead(date) over(order by date) as next_date
from
(
select date from plan_table
union
select date from holiday_table
)
order by date
)
where
trunc(date) + 1 < trunc(next_date)
order by d
)
where rownum = 1
;
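The intended result can be checked against a plain Python sketch. This is not a translation of the Oracle query: the function name `first_missing_date` is hypothetical, the sample dates are read as month/day/year, and skipping Saturdays and Sundays is an assumption based on the question's wording (the query above only excludes holidays).

```python
from datetime import date, timedelta

def first_missing_date(plan_dates, holidays, skip_weekends=True):
    """First date after the earliest plan date that is neither planned nor a holiday."""
    taken = set(plan_dates) | set(holidays)
    candidate = min(plan_dates) + timedelta(days=1)
    # weekday() returns 5 for Saturday and 6 for Sunday
    while candidate in taken or (skip_weekends and candidate.weekday() >= 5):
        candidate += timedelta(days=1)
    return candidate

# Dates from the question
plan = [date(2016, 10, d) for d in (2, 3, 6, 9, 10, 12, 13, 16)]
print(first_missing_date(plan, holidays=[]))
print(first_missing_date(plan, holidays=[date(2016, 10, 4)]))
```

With no holidays the first gap is 10/4/2016; once that date is listed as a holiday the result moves to 10/5/2016, as described in the question.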

Total Number of Records per Week

I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.
I have the following code used to generate the series:
select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
However, I am not sure how to join the counted records to the dates generated.
So, using the following records as an example:
Pt_ID exam_date
====== =========
1 2012-01-02
2 2012-01-02
3 2012-01-08
4 2012-01-08
1 2013-01-02
2 2013-01-02
3 2013-01-03
4 2013-01-04
1 2013-01-08
2 2013-01-10
3 2013-01-15
4 2013-01-24
I wanted to have the records return as:
series thisyr lastyr
=========== ===== =====
2013-01-01 4 2
2013-01-08 3 2
2013-01-15 1 0
2013-01-22 1 0
2013-01-29 0 0
Not sure how to reference the date range in the subsearch. Thanks for any assistance.
The simple approach would be to solve this with a CROSS JOIN, as demonstrated by @jpw. However, there are some hidden problems:
The performance of an unconditional CROSS JOIN deteriorates quickly with growing number of rows. The total number of rows is multiplied by the number of weeks you are testing for, before this huge derived table can be processed in the aggregation. Indexes can't help.
Starting weeks with January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.
All of the following queries make heavy use of an index on exam_date. Be sure to have one.
Only join to relevant rows
Should be much faster:
SELECT d.day, d.thisyr
, count(t.exam_date) AS lastyr
FROM (
SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0 -- for 2nd join
, count(t.exam_date) AS thisyr
FROM generate_series('2013-01-01'::date
, '2013-01-31'::date -- last week overlaps with Feb.
, '7 days'::interval) d(day) -- returns timestamp
LEFT JOIN tbl t ON t.exam_date >= d.day::date
AND t.exam_date < d.day::date + 7
GROUP BY d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.day, d.thisyr
ORDER BY d.day;
This is with weeks starting from Jan. 1st like in your original. As commented, this produces a couple of inconsistencies: Weeks start on a different day each year and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (leap year).
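To make the half-open week boundaries (`>= day` and `< day + 7`) concrete, here is a small Python sketch that applies the same rule to the sample rows from the question. The function name `weekly_counts` is hypothetical, and shifting a date back one year with `replace()` is a simplification that would fail for February 29.

```python
from datetime import date, timedelta

def weekly_counts(exam_dates, first_day, weeks):
    """Count rows per 7-day bucket for this year and the same bucket one year earlier."""
    week = timedelta(days=7)
    rows = []
    for i in range(weeks):
        day = first_day + i * week
        day0 = day.replace(year=day.year - 1)  # would raise ValueError for Feb 29
        this_yr = sum(day <= d < day + week for d in exam_dates)
        last_yr = sum(day0 <= d < day0 + week for d in exam_dates)
        rows.append((day, this_yr, last_yr))
    return rows

# exam_date values from the question
exams = [date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 8), date(2012, 1, 8),
         date(2013, 1, 2), date(2013, 1, 2), date(2013, 1, 3), date(2013, 1, 4),
         date(2013, 1, 8), date(2013, 1, 10), date(2013, 1, 15), date(2013, 1, 24)]
for row in weekly_counts(exams, date(2013, 1, 1), 5):
    print(row)
```

For the week starting 2013-01-08 this counts 2 rows for the current year, since only 2013-01-08 and 2013-01-10 fall inside that bucket.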
The same with ISO weeks
Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the border between years. Per documentation on EXTRACT():
week
The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a
year contains January 4 of that year. In other words, the first
Thursday of a year is in week 1 of that year.
In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for
late-December dates to be part of the first week of the next year. For
example, 2005-01-01 is part of the 53rd week of year 2004, and
2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is
part of the first week of 2013. It's recommended to use the isoyear
field together with week to get consistent results.
Above query rewritten with ISO weeks:
SELECT w AS isoweek
, day::text AS thisyr_monday, thisyr_ct
, day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM (
SELECT w, day
, date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
, count(t.exam_date) AS thisyr_ct
FROM (
SELECT w
, date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
FROM generate_series(0, 4) w
) d
LEFT JOIN tbl t ON t.exam_date >= d.day
AND t.exam_date < d.day + 7
GROUP BY d.w, d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.w, d.day, d.day0, d.thisyr_ct
ORDER BY d.w, d.day;
January 4th is always in the first ISO week of the year. So this expression gets the date of Monday of the first ISO week of the given year:
date_trunc('week', '2012-01-04'::date)::date
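The same rule can be verified in Python, whose `date.isocalendar()` also follows ISO 8601. The helper name `iso_week1_monday` is hypothetical; the sketch simply steps back from January 4th to the preceding (or same) Monday, which is what truncating to the week does.

```python
from datetime import date, timedelta

def iso_week1_monday(year):
    """Monday of ISO week 1: the Monday on or before January 4th of that year."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=jan4.weekday())  # weekday() is 0 for Monday

for year in (2012, 2013):
    monday = iso_week1_monday(year)
    # isocalendar() returns (ISO year, ISO week, ISO weekday)
    print(year, monday, tuple(monday.isocalendar()))
```

For 2013 the result is 2012-12-31, which agrees with the documentation excerpt quoted above: that date belongs to the first week of 2013.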
Simplify with EXTRACT()
Since ISO weeks coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short and simple form:
SELECT w AS isoweek
, COALESCE(thisyr_ct, 0) AS thisyr_ct
, COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM generate_series(1, 5) w
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2013
GROUP BY 1
) t13 USING (w)
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2012
GROUP BY 1
) t12 USING (w);
Optimized query
The same with more details and optimized for performance
WITH params AS ( -- enter parameters here, once
SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
, date_trunc('week', '2013-01-04'::date)::date AS this_start
, date_trunc('week', '2014-01-04'::date)::date AS next_start
, 1 AS week_1
, 5 AS week_n -- show weeks 1 - 5
)
SELECT w.w AS isoweek
, p.this_start + 7 * (w - 1) AS thisyr_monday
, COALESCE(t13.ct, 0) AS thisyr_ct
, p.last_start + 7 * (w - 1) AS lastyr_monday
, COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
, generate_series(p.week_1, p.week_n) w(w)
LEFT JOIN (
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.this_start -- only relevant dates
AND t.exam_date < p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.next_start -- don't cross over into next year
GROUP BY 1
) t13 USING (w)
LEFT JOIN ( -- same for last year
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.last_start
AND t.exam_date < p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.this_start
GROUP BY 1
) t12 USING (w);
This should be very fast with index support and can easily be adapted to intervals of choice.
The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3.
Using a cross join should work; I'm just going to paste the markdown output from SQL Fiddle below. It would seem that your sample output is incorrect for series 2013-01-08: thisyr should be 2, not 3. This might not be the best way to do this, though; my PostgreSQL knowledge leaves a lot to be desired.
SQL Fiddle
PostgreSQL 9.2.4 Schema Setup:
CREATE TABLE Table1
("Pt_ID" varchar(6), "exam_date" date);
INSERT INTO Table1
("Pt_ID", "exam_date")
VALUES
('1', '2012-01-02'),('2', '2012-01-02'),
('3', '2012-01-08'),('4', '2012-01-08'),
('1', '2013-01-02'),('2', '2013-01-02'),
('3', '2013-01-03'),('4', '2013-01-04'),
('1', '2013-01-08'),('2', '2013-01-10'),
('3', '2013-01-15'),('4', '2013-01-24');
Query 1:
select
series,
sum (
case
when exam_date
between series and series + '6 day'::interval
then 1
else 0
end
) as thisyr,
sum (
case
when exam_date + '1 year'::interval
between series and series + '6 day'::interval
then 1 else 0
end
) as lastyr
from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series
Results:
| SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 | 4 | 2 |
| January, 08 2013 00:00:00+0000 | 2 | 2 |
| January, 15 2013 00:00:00+0000 | 1 | 0 |
| January, 22 2013 00:00:00+0000 | 1 | 0 |
| January, 29 2013 00:00:00+0000 | 0 | 0 |

Calculating difference between daily sum and an average for the same day of the week in defined time range. SQL 10g Oracle

Hi I'm working with data depending mostly on the day of the week. Data is formatted in a table
Date - position - count/number.
There are multiple different positions.
I was able to sort my data for each day of the week using:
select MOD(to_char(time, 'J'),7),
sum(COUNT)
from TABLE
where time > sysdate -x
group by to_char(time, 'J')
order by to_char(time, 'J');
This outputs daily sums according to day of the week.
Now I'm able to get an average for a single day of a week in a year.
This code outputs an average for only Sunday
SELECT AVG(asset_sums)
FROM (
select MOD(to_char(time, 'J'),7),
sum(COUNT) as asset_sums
from table
where time > sysdate -365
and MOD(TO_CHAR(time, 'J'), 7) + 1 IN (7)
group by to_char(time, 'J')
order by to_char(time, 'J')
);
My goal is to be able to get a table with daily sum compared with yearly average for that particular day of the week.
For example yearly average number for Mondays is 57 , Tuesdays 60.
This week my Monday is 59 and Tuesday is 57. Output of the table is
Monday +2, Tuesday -3.
What is the easiest way / most efficient ?
Thanks for your help.
Edit : Format of my data
Date : yyyy-mm-dd | Place : xxxx | Number( of customers) 0 to 10000
2013-09-16 | AAAA | 1534
2013-09-16 | AAAB | 534
2013-09-17 | AAAA | 1434
2013-09-17 | AAAC | 834
2013-09-18 | AAAA | 134
2013-09-18 | AAAD | 183
Needed output
2013-09-16 | Day of the week | Sum | Average monday this year | Difference Sum-AVG
2013-09-16 | 1 (= Monday) | 2068 | 2015| 53
For clarity I will use subquery factoring. First, select the current week's data. Next, subquery the sum for each day of the current week. Then, subquery the sum for each day over the past year. Then, average those daily sums for each day of the week. Finally, join the two and display the difference.
with
-- rows from the last 7 days; "table" and "x" are placeholders
this_week as (
select
time,
count
from table
where time > x - 7
),
-- one sum per day of the current week
this_week_dly_sum as (
select
time,
to_char(time, 'd') day,
sum(count) sum
from this_week
group by time, to_char(time, 'd')
),
-- one sum per day over the past year
this_year_dly_sum as (
select
time,
sum(count) sum
from table
where time > x - 365
group by time
),
-- average of the daily sums for each day of the week
this_year_dly_avg as (
select
to_char(time, 'd') day,
avg(sum) avg
from this_year_dly_sum
group by to_char(time, 'd')
)
select
this_week_dly_sum.time,
to_char(this_week_dly_sum.time, 'day') day_of_week,
this_week_dly_sum.sum,
this_year_dly_avg.avg,
this_week_dly_sum.sum - this_year_dly_avg.avg difference
from this_week_dly_sum
inner join this_year_dly_avg
on this_week_dly_sum.day = this_year_dly_avg.day
;
You can use analytic functions for this.
select date1, to_char(date1, 'd'),
sum(val) over(partition by to_char(date1, 'd')),
avg(val) over(partition by to_char(date1, 'd')),
sum(val) over(partition by to_char(date1, 'd'))-
avg(val) over(partition by to_char(date1, 'd'))
from table1
where date1 > add_months(sysdate, -12);
This will give you daily counts for the last year:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
You can modify it to additionally return averages per day of the week for the specified range:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
At this point you have all the initial data you need but probably for more days than necessary. You can use the above query as a derived table to limit the rows to just those where date > SYSDATE - x:
WITH last_year_by_day AS
(
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
)
SELECT date,
TO_CHAR(date, 'D') AS day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
As some expressions are being repeated multiple times, it can be a good idea to re-factor the query to avoid the repetition. Here's one way:
WITH last_year AS
(
SELECT TRUNC(time, 'DD') AS date,
TO_CHAR(time, 'D') AS day_of_week,
count
FROM yourtable
WHERE time > SYSDATE - 365
),
last_year_by_day AS
(
SELECT date,
day_of_week,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER (PARTITION BY day_of_week) AS asset_sum_avg
FROM last_year
GROUP BY date, day_of_week
)
SELECT date,
day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
One last note is about TO_CHAR(..., 'D'), which is used to obtain the day_of_week values. Since you are using a different method for the same results, you may not be aware that the results of TO_CHAR(..., 'D') are affected by the NLS_TERRITORY setting. You may want to use an ALTER SESSION statement to set NLS_TERRITORY to a value that causes TO_CHAR(..., 'D') to return 1 for Monday, 2 for Tuesday, etc. The Oracle documentation lists the supported territories.
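The overall calculation (daily sum, average of daily sums per weekday, and their difference) can be sketched in Python to check expected numbers. The function name `daily_sum_vs_weekday_avg` is hypothetical, and the 2013-09-09 row is invented for illustration so that the Monday average comes out at the 2015 shown in the question; it is not real data. `isoweekday()` always returns 1 for Monday, so this sketch has no equivalent of the NLS_TERRITORY dependency.

```python
from collections import defaultdict
from datetime import date

def daily_sum_vs_weekday_avg(rows):
    """Map each date to (weekday, daily sum, weekday average, difference)."""
    daily = defaultdict(int)
    for day, _place, number in rows:
        daily[day] += number                           # sum over all places per day
    by_weekday = defaultdict(list)
    for day, total in daily.items():
        by_weekday[day.isoweekday()].append(total)     # 1 = Monday ... 7 = Sunday
    avg = {wd: sum(totals) / len(totals) for wd, totals in by_weekday.items()}
    return {day: (day.isoweekday(), total, avg[day.isoweekday()],
                  total - avg[day.isoweekday()])
            for day, total in daily.items()}

rows = [
    (date(2013, 9, 16), "AAAA", 1534),   # from the question
    (date(2013, 9, 16), "AAAB", 534),    # from the question
    (date(2013, 9, 9), "AAAA", 1962),    # hypothetical earlier Monday
]
print(daily_sum_vs_weekday_avg(rows)[date(2013, 9, 16)])
```

With these rows, Monday 2013-09-16 has a daily sum of 2068 against a Monday average of 2015, a difference of 53.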