SQL query to group dates and include the neighboring weeks in the aggregation

I have a table with two columns: dates and the number of searches on each date. I want to group by date and sum the number of searches for each date.
The trick is that each group should also include the searches from the date exactly one week later and from the date exactly one week earlier.
So if I have:
Date       Searches
2/3/2023   2
2/10/2023  4
2/17/2023  1
2/24/2023  5
I want the output for the 2/10/2023 and 2/17/2023 groups to be:
Date       Sum
2/10/2023  7
2/17/2023  10
How can I write a query for this?

You can use a correlated subquery for this:
select date, (
select sum(searches)
from t as x
where x.date between t.date - interval '7 day' and t.date + interval '7 day'
) as sum_win
from t
Replace interval '7 day' with the equivalent date arithmetic for your RDBMS (for example, DATEADD in SQL Server).
If your RDBMS supports interval offsets in RANGE window frames (PostgreSQL 11 and later do), a much better solution is:
select date, sum(searches) over (
order by date
range between interval '7 day' preceding and interval '7 day' following
) as sum_win
from t
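As a cross-check of what these window queries should return, here is a minimal Python sketch (not SQL) of the same inclusive +/- 7-day window over the sample data from the question; window_sums is a hypothetical helper name used only for illustration.

```python
from datetime import date, timedelta

def window_sums(rows, days=7):
    """For each row's date d, sum the searches of every row dated between
    d - days and d + days inclusive (like BETWEEN ... AND ... in the SQL)."""
    span = timedelta(days=days)
    return {
        d: sum(s for other, s in rows if d - span <= other <= d + span)
        for d, _ in rows
    }

rows = [
    (date(2023, 2, 3), 2),
    (date(2023, 2, 10), 4),
    (date(2023, 2, 17), 1),
    (date(2023, 2, 24), 5),
]
sums = window_sums(rows)
for d, total in sorted(sums.items()):
    print(d.isoformat(), total)
```

Unlike the expected output in the question, both queries also return rows for 2/3/2023 and 2/24/2023 (a sum of 6 each), because those dates just have one neighbor missing; filter them out if only complete windows are wanted.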

Assuming weekly rows (exactly one row per week, with no gaps), here is a SQL Server version using LAG and LEAD:
CREATE TABLE Table1
([Dates] date, [Searches] int)
;
INSERT INTO Table1
([Dates], [Searches])
VALUES
('2023-02-03 00:00:00', 2),
('2023-02-10 00:00:00', 4),
('2023-02-17 00:00:00', 1),
('2023-02-24 00:00:00', 5)
;
;with cte as (
select dates
, searches
+ lead(searches) over(order by dates)
+ lag(searches) over(order by dates) as sum_searches
from table1)
select * from cte
where sum_searches is not null;
dates       sum_searches
2023-02-10  7
2023-02-17  10
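The LAG/LEAD version relies on the rows being exactly one week apart. This Python sketch (an illustration only; lag_lead_sums is a hypothetical helper) mimics the expression and the IS NOT NULL filter, and shows what happens when a week is missing.

```python
from datetime import date

def lag_lead_sums(rows):
    """Mimic searches + LAG(searches) + LEAD(searches) OVER (ORDER BY dates).
    The first and last rows have no neighbor (NULL in SQL), so they are dropped,
    like WHERE sum_searches IS NOT NULL."""
    rows = sorted(rows)
    return [
        (rows[i][0], rows[i - 1][1] + rows[i][1] + rows[i + 1][1])
        for i in range(1, len(rows) - 1)
    ]

weekly = [(date(2023, 2, 3), 2), (date(2023, 2, 10), 4),
          (date(2023, 2, 17), 1), (date(2023, 2, 24), 5)]
print(lag_lead_sums(weekly))

# Drop the 2023-02-10 row: LAG for 2023-02-17 now returns the 2023-02-03 row,
# which is two weeks earlier, so the sum is no longer a +/- 1 week window.
gapped = [r for r in weekly if r[0] != date(2023, 2, 10)]
print(lag_lead_sums(gapped))
```

If gaps are possible, the date-based window (BETWEEN or RANGE) is the safer choice, because it looks at calendar distance rather than row position.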

Related

Comparing Selected Date Range With Current & Previous Month

I would like to give the user the option to select a date range in the current month; the result should compare that range in the current month with the same range in the previous month.
E.g. selected range: 1-12-2022 to 15-12-2022
Result:
Count X 1-11-2022 to 15-11-2022
Count X 1-12-2022 to 15-12-2022
Can this be achieved with the date_part function?
Suppose you have a column with type date named ts in your table:
SELECT
count(*) FILTER ( WHERE ts BETWEEN cast(:lower AS DATE) - INTERVAL '1 month' AND cast(:upper AS DATE) - INTERVAL '1 month') previous,
count(*) FILTER ( WHERE ts BETWEEN cast(:lower AS DATE) AND cast(:upper AS DATE) ) selected,
count(*) FILTER ( WHERE ts BETWEEN cast(:lower AS DATE) + INTERVAL '1 month' AND cast(:upper AS DATE) + INTERVAL '1 month' ) next
FROM your_table;
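Just supply the lower and upper bounds as dates, preferably in ISO format (e.g. '2022-12-01'), because PostgreSQL interprets an ambiguous literal such as '01-12-2022' according to its DateStyle setting.
This gives you three columns (previous, selected and next) with the corresponding row counts.
Note that BETWEEN is inclusive: both the lower and the upper bound are included in each count.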
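Here is a minimal Python sketch of the same three inclusive counts, assuming a list of date values and hypothetical sample rows; add_months and range_counts are illustrative helper names. add_months clamps the day to the end of shorter months, which is how PostgreSQL handles date +/- interval 'n month'.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Shift a date by n months, clamping the day to the target month's length."""
    month_index = d.month - 1 + n
    year, month = d.year + month_index // 12, month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def range_counts(ts_values, lower, upper):
    """Count values in [lower, upper] (BETWEEN is inclusive) for the previous,
    selected and next month."""
    def count(lo, hi):
        return sum(1 for ts in ts_values if lo <= ts <= hi)
    return {
        "previous": count(add_months(lower, -1), add_months(upper, -1)),
        "selected": count(lower, upper),
        "next": count(add_months(lower, 1), add_months(upper, 1)),
    }

# Hypothetical sample rows, for illustration only.
ts_values = [date(2022, 11, 1), date(2022, 11, 15), date(2022, 11, 16),
             date(2022, 12, 3), date(2022, 12, 15), date(2023, 1, 10)]
counts = range_counts(ts_values, date(2022, 12, 1), date(2022, 12, 15))
print(counts)
```

2022-11-16 is not counted in "previous" because the shifted range ends on 2022-11-15, while 2022-11-15 itself is included because the bounds are inclusive.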

Rewrite a PostgreSQL query using CTEs

I have the following code to pull records from a date range in PostgreSQL; it works as intended. The "end date" is the "date" column of the latest record, and the "start date" is the end date minus a 7-day interval.
SELECT date
FROM files
WHERE daterange((
(SELECT date FROM files ORDER BY date DESC LIMIT 1) - interval '7 day')::date, -- "start date"
(SELECT date FROM files ORDER BY date DESC LIMIT 1)::date, -- "end date"
'(]') @> date::date
ORDER BY date ASC
I'm trying to rewrite this query using CTEs, so I can replace those subqueries with values such as end_date and start_date. Is this possible using this method or should I look for other alternatives like variables? I'm still learning SQL.
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date FROM end_date - INTERVAL '7 day'
)
SELECT date
FROM files
WHERE daterange(
start_date::date,
end_date::date,
'(]') @> date::date
ORDER BY date ASC
Right now I'm getting the following error:
ERROR: syntax error at or near "-"
LINE 7: SELECT date FROM end_date - INTERVAL '7 day'
You do not need two CTEs; one is enough, and it can be joined to filter the data.
WITH RECURSIVE files AS (
SELECT CURRENT_DATE date, 1 some_value
UNION ALL
SELECT (date + interval '1 day')::date, some_value + 1 FROM files
WHERE date < (CURRENT_DATE + interval '1 month')::date
),
dates AS (
SELECT
(MAX(date) - interval '7 day')::date from_date,
MAX(date) to_date
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON daterange(d.from_date, d.to_date, '(]') @> f.date
You can even build the daterange directly in the CTE and use it later like this:
WITH dates AS (
SELECT
daterange((MAX(date) - interval '7 day')::date, MAX(date), '(]') range
FROM files
)
SELECT f.* FROM files f
JOIN dates d ON d.range @> f.date
Here the first CTE is used just to generate some data.
The query returns all rows of files dated within the last week, excluding from_date and including to_date:
date        some_value
2022-09-26  25
2022-09-27  26
2022-09-28  27
2022-09-29  28
2022-09-30  29
2022-10-01  30
2022-10-02  31
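The half-open '(]' filter can be checked with a small Python sketch. It rebuilds the generated sample data, assuming the generator started on 2022-09-02 (an assumption chosen so the values match the result table above), and keeps only the dates in (max_date - 7 days, max_date]. last_week is a hypothetical helper name.

```python
from datetime import date, timedelta

def last_week(rows):
    """Keep rows whose date lies in the half-open range (max_date - 7 days, max_date],
    the same test as daterange(from_date, to_date, '(]') @> date."""
    to_date = max(d for d, _ in rows)
    from_date = to_date - timedelta(days=7)
    return [(d, v) for d, v in rows if from_date < d <= to_date]

# Assumed sample: one row per day starting 2022-09-02, some_value counting up from 1.
start = date(2022, 9, 2)
rows = [(start + timedelta(days=i), i + 1) for i in range(31)]
recent = last_week(rows)
for d, v in recent:
    print(d.isoformat(), v)
```

Because the lower bound is exclusive, 2022-09-25 (exactly seven days before the maximum) is left out, which is why seven rows come back rather than eight.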
I think this is what you want:
WITH end_date AS
(
SELECT date FROM files ORDER BY date DESC LIMIT 1
),
start_date AS
(
SELECT date - INTERVAL '7 day' as date
FROM end_date
)
SELECT F.date, S.date startDate, E.date endDate
FROM files F
JOIN start_date S on F.date >= S.date
JOIN end_date E on F.date <= E.date
ORDER BY date ASC;
I hope I'm not repeating anything, but if I understand your problem correctly I think this will work:
with cte as (
select max (date)::date as max_date from files
)
select date
from files
cross join cte
where date > max_date - 7
Or perhaps even:
select date
from files
where date > (select max (date)::date - 7 from files)
Since the CTE already holds the maximum date, there is no need to bound the upper end with BETWEEN, <= or a range; you can simply keep every date after max_date minus 7 days.
The error in your code occurs because you want this:
SELECT date - INTERVAL '7 day' as date FROM end_date
and not this:
SELECT date FROM end_date - INTERVAL '7 day'
The second version tries to subtract an interval from a table (the end_date CTE) in the FROM clause, which is a syntax error; the arithmetic belongs in the SELECT list.

Create a table with dates

I have a query that can create a table with dates like below:
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (last_day(sysdate)::date - seq.num)::date as "Date"
from seq;
How could this be changed to generate only the dates of the current month?
Thanks
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
generate_series(first_day_of_month, last_day_of_month, interval '1 day')::date
FROM dates
date_trunc() truncates a timestamp (a date argument is converted to a timestamp first) to the given precision. date_trunc('month', ...) keeps the year and month and sets every smaller field to its lowest possible value, so the day becomes 1. That is why this gives you the first day of the month.
Adding a month returns the first day of the next month; subtracting a day from that gives the last day of the current month.
Finally, you can generate the series of dates between the start and end date with the generate_series() function.
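The same first-day / last-day arithmetic, sketched in Python with a fixed example date instead of CURRENT_DATE (days_of_month is a hypothetical helper, for illustration only):

```python
from datetime import date, timedelta

def days_of_month(any_day):
    """All dates of any_day's month: truncate to the 1st, jump to the 1st of the
    next month, subtract one day, then enumerate every day in between."""
    first = any_day.replace(day=1)                               # like date_trunc('month', ...)
    next_first = (first + timedelta(days=32)).replace(day=1)     # like + interval '1 month'
    last = next_first - timedelta(days=1)                        # ... then - 1 day
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]

days = days_of_month(date(2023, 2, 17))
print(days[0], days[-1], len(days))
```

Adding 32 days to the 1st always lands inside the following month, so truncating back to day 1 yields the first day of the next month regardless of month length.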
Edit: Redshift does not support generate_series() with date or timestamp arguments, only with integers. So we need to create an integer series instead and add the results to the first day of the month:
WITH dates AS (
SELECT
date_trunc('month', CURRENT_DATE) AS first_day_of_month,
date_trunc('month', CURRENT_DATE) + interval '1 month -1 day' AS last_day_of_month
)
SELECT
first_day_of_month::date + gs
FROM
dates,
generate_series(
date_part('day', first_day_of_month)::int - 1,
date_part('day', last_day_of_month)::int - 1
) as gs
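The integer-series workaround in plain Python, as an illustration only: offsets 0 .. days_in_month - 1 (the bounds passed to generate_series() above) are added to the first day of the month. days_via_integer_series is a hypothetical helper name.

```python
import calendar
from datetime import date, timedelta

def days_via_integer_series(any_day):
    """Add integer offsets 0 .. days_in_month - 1 to the first day of the month,
    mirroring first_day_of_month::date + gs over an integer series."""
    first = any_day.replace(day=1)
    days_in_month = calendar.monthrange(first.year, first.month)[1]
    return [first + timedelta(days=gs) for gs in range(days_in_month)]

days = days_via_integer_series(date(2024, 2, 10))
print(days[0], days[-1], len(days))
```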
This answers the original version of the question.
You would use generate_series():
select gs.dte
from generate_series(date_trunc('month', now()::date),
date_trunc('month', now()::date) + interval '1 month' - interval '1 day',
interval '1 day'
) gs(dte);

How do I set up a rolling 7-day 75th percentile in SQL?

I have a table that has the following columns:
Event Date
Location
Employee Id
Task Name
Volume Per Hour
Using PostgreSQL, I need to calculate the 75th percentile of Volume Per Hour for a given location and task name across all employee IDs and event dates, using a rolling 7-day window. For example, if the event date is 11/16/2020, I would take the 75th percentile of volume per hour for all individual dates and employee IDs between 11/09/2020 and 11/16/2020.
Can someone help me with this problem?
You should be able to achieve this by combining generate_series() (to produce one row per day) with the ordered-set aggregate percentile_disc():
with data_example as
(
SELECT * FROM (VALUES
(date '2020-11-16','ABC',1,'Inbound',10),
(date '2020-11-16','ABC',2,'Inbound',20),
(date '2020-11-15','ABC',1,'Inbound',30),
(date '2020-11-17','ABC',1,'Inbound',10)
) AS t (event_date,location,emp_id,task_name,volume)
)
,dates as
(
select generate_series(
(date '2020-11-10')::timestamp,
(date '2020-11-25')::timestamp,
interval '1 day'
) as event_date
)
select d.event_date
, d.event_date - INTERVAL '7 day' AS window_start
,location
,task_name
,percentile_disc(0.75) within group (order by de.volume) perc_volume
,count(1) cnt
from dates d
join data_example de
on de.event_date between d.event_date- INTERVAL '7 day' and d.event_date
group by 1,2,3,4
order by 1,2,3,4;
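To make the semantics concrete, here is a Python sketch of the same computation over data_example (an illustration, not a replacement for the SQL; the helper names are hypothetical). percentile_disc returns the first value, in sort order, whose cumulative position i/n reaches the requested fraction. The window d - 7 days .. d is inclusive, so it spans 8 calendar days, like 11/09/2020 to 11/16/2020 in the question, and days with no rows in the window produce no output, as with the inner join.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

def percentile_disc(values, p):
    """First value in sorted order whose cumulative position i/n is >= p."""
    ordered = sorted(values)
    i = max(1, math.ceil(p * len(ordered)))  # 1-based position
    return ordered[i - 1]

def rolling_percentile(rows, start, end, p=0.75, days=7):
    """For each day in [start, end], group rows dated in [day - days, day]
    by (location, task_name) and take percentile_disc(p) of the volume."""
    out = []
    day = start
    while day <= end:
        groups = defaultdict(list)
        for event_date, location, _emp_id, task_name, volume in rows:
            if day - timedelta(days=days) <= event_date <= day:
                groups[(location, task_name)].append(volume)
        for (location, task_name), vols in sorted(groups.items()):
            out.append((day, location, task_name, percentile_disc(vols, p), len(vols)))
        day += timedelta(days=1)
    return out

data_example = [
    (date(2020, 11, 16), "ABC", 1, "Inbound", 10),
    (date(2020, 11, 16), "ABC", 2, "Inbound", 20),
    (date(2020, 11, 15), "ABC", 1, "Inbound", 30),
    (date(2020, 11, 17), "ABC", 1, "Inbound", 10),
]
result = rolling_percentile(data_example, date(2020, 11, 10), date(2020, 11, 25))
for row in result:
    print(row)
```

percentile_cont(0.75) would interpolate between neighboring values instead of returning an actual input value; which one fits depends on whether a real observed volume is required.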

Get zero values when data is missing in PostgreSQL

I have a table employee in Postgres:
Query:
SELECT DISTINCT month_last_date,number_of_cases,reopens,csat
FROM employee
WHERE month_last_date >=(date('2017-01-31') - interval '6 month')
AND month_last_date <= date('2017-01-31')
AND agent_id='analyst'
AND name='SAM';
But if there is no data in the table for a given month, I want the column values to be 0.
Generate all dates you are interested in, LEFT JOIN to the table and default to 0 with COALESCE:
SELECT DISTINCT -- see below
i.month_last_date
, COALESCE(number_of_cases, 0) AS number_of_cases -- see below
, COALESCE(reopens, 0) AS reopens
, COALESCE(csat, 0) AS csat
FROM (
SELECT date '2017-01-31' - i * interval '1 mon' AS month_last_date
FROM generate_series(0, 5) i -- see below
) i
LEFT JOIN employee e ON e.month_last_date = i.month_last_date
AND e.agent_id = 'analyst' -- see below
AND e.name = 'SAM';
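A minimal Python sketch of the generate / LEFT JOIN / COALESCE pattern, using hypothetical sample rows for the 'analyst' / 'SAM' filter; month_ends, coalesce and left_join_zero are illustrative helper names. A missing month and a NULL column both come back as 0, which shows why COALESCE alone cannot tell the two cases apart.

```python
from datetime import date, timedelta

def month_ends(last_month_first, n):
    """Last day of n consecutive months, newest first, ending with the month that
    starts on last_month_first: first day of the following month minus one day."""
    ends = []
    year, month = last_month_first.year, last_month_first.month
    for _ in range(n):
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        ends.append(date(next_year, next_month, 1) - timedelta(days=1))
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return ends

def coalesce(value, default=0):
    """Python stand-in for COALESCE(value, default)."""
    return default if value is None else value

def left_join_zero(months, rows, columns=("number_of_cases", "reopens", "csat")):
    """LEFT JOIN the generated months to rows (month_last_date -> column dict):
    a missing month and a NULL column both come back as 0."""
    return [
        (month, *(coalesce(rows.get(month, {}).get(col)) for col in columns))
        for month in months
    ]

# Hypothetical sample data, for illustration only.
employee_rows = {
    date(2017, 1, 31): {"number_of_cases": 12, "reopens": 1, "csat": 90},
    date(2016, 11, 30): {"number_of_cases": 7, "reopens": None, "csat": 85},
}
result = left_join_zero(month_ends(date(2017, 1, 1), 6), employee_rows)
for row in result:
    print(row)
```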
Notes
If you add or subtract an interval of 1 month and the same day does not exist in the target month, Postgres falls back to the last existing day of that month. So this works as desired; you get the last day of each month:
SELECT date '2017-12-31' - i * interval '1 mon' -- note 31
FROM generate_series(0,11) i;
But this does not; you get the 28th of each month:
SELECT date '2017-02-28' - i * interval '1 mon' -- note 28
FROM generate_series(0,11) i;
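The month-clamping behavior can be reproduced in Python; minus_months is a hypothetical helper that clamps the day to the target month's last day, the same rule described for date - interval 'n mon'.

```python
import calendar
from datetime import date

def minus_months(d, n):
    """d minus n months, clamping the day to the last day of the target month."""
    year, month0 = divmod(d.year * 12 + d.month - 1 - n, 12)
    month = month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

from_31st = [minus_months(date(2017, 12, 31), i) for i in range(12)]
from_28th = [minus_months(date(2017, 2, 28), i) for i in range(12)]
print(from_31st[:3])
print({d.day for d in from_28th})
```

Starting from the 31st, every result is a month-end; starting from February 28, every result stays on the 28th, because 28 exists in every month and nothing needs clamping.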
The safe alternative is to subtract 1 day from the first day of the next month, as @Oto demonstrated. Related:
Daily average for the month (needs number of days in month)
Here are two optimized ways to generate a series of month-end dates, up to and including a given month:
1.
SELECT (timestamp '2017-01-01' - i * interval '1 month')::date - 1 AS month_last_date
FROM generate_series(-1, 10) i; -- generate 12 months, off-by-1
Input is the first day of the month; alternatively, calculate it from a given date or timestamp with date_trunc():
SELECT date_trunc('month', timestamp '2017-01-17')::date AS this_mon1
Subtracting an interval from a date produces a timestamp. After the cast back to date we can simply subtract an integer to subtract days.
2.
SELECT m::date - 1 AS month_last_date
FROM generate_series(timestamp '2017-02-01' - interval '11 month' -- for 12 months
, timestamp '2017-02-01'
, interval '1 mon') m;
Input is the first day of the next month; alternatively, calculate it from any given date or timestamp with:
SELECT date_trunc('month', timestamp '2017-01-17' + interval '1 month')::date AS next_mon1
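As a quick cross-check that both variants yield the same 12 month-ends (including 2016-02-29 in the leap year), here is a Python sketch using the dates from the two queries; first_of_month_shift is a hypothetical helper name.

```python
from datetime import date, timedelta

def first_of_month_shift(first, n):
    """Shift a first-of-month date by n months (day 1 never needs clamping)."""
    year, month0 = divmod(first.year * 12 + first.month - 1 + n, 12)
    return date(year, month0 + 1, 1)

# Variant 1: i = -1 .. 10 months subtracted from 2017-01-01, then minus one day.
v1 = [first_of_month_shift(date(2017, 1, 1), -i) - timedelta(days=1) for i in range(-1, 11)]
# Variant 2: month starts 2016-03-01 .. 2017-02-01, then minus one day.
v2 = [first_of_month_shift(date(2016, 3, 1), k) - timedelta(days=1) for k in range(12)]
print(sorted(v1) == sorted(v2), min(v1), max(v1))
```

The i = -1 offset in the first variant is what the "off-by-1" comment refers to: it steps forward to 2017-02-01 so that subtracting one day lands on 2017-01-31.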
Related:
How do I determine the last day of the previous month using PostgreSQL?
Create list with first and last day of month for given period
You may not need DISTINCT at all. Typically, (agent_id, month_last_date) would be defined UNIQUE; in that case, remove DISTINCT.
Be sure to use the LEFT JOIN correctly. Join conditions go into the join clause, not the WHERE clause:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
Finally, default to 0 with COALESCE where NULL values are filled in by the LEFT JOIN.
Note that COALESCE cannot distinguish between actual NULL values from the right table and NULL values filled in for missing rows. If your columns are not defined NOT NULL, there may be ambiguity to address.
As I understand it, you need to generate the last day of each of the last 6 months before a certain date ("2017-01-31" in this case).
If so, you can use this query, which generates all of these days:
SELECT (date_trunc('MONTH', mnth) + INTERVAL '1 MONTH - 1 day')::DATE
FROM
generate_series('2017-01-31'::date - interval '6 month', '2017-01-31'::date, '1 month') as mnth;
Then LEFT JOIN this query to your existing query to get the desired result.
Note that this returns 7 records (days), not 6.