I have a PostgreSQL table that contains, among others, the following columns:
…
estimate_close_date date,
duration_months int,
…
I have to create a view on the table that generates a row for every month between (estimate_close_date + 1) and (estimate_close_date + duration_months), counting in months. An example:
estimate_close_date = '2022-10-01' and duration_months = 2
The result should be 2 rows: one for '2022-11-01' and one for '2022-12-01'.
How can I implement this in a view?
Use generate_series() to generate the needed rows and a lateral join to correlate them with your table.
select t.*, gs::date
from the_table t
cross join lateral generate_series
     (
       estimate_close_date + interval '1 month',
       estimate_close_date + duration_months * interval '1 month',
       interval '1 month'
     ) gs
order by estimate_close_date, gs;
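To expose this as a view, wrap the query in CREATE VIEW (typically without the ORDER BY, which callers can add themselves). A minimal sketch; the view name monthly_rows and the column alias month_start are placeholders of mine:
create view monthly_rows as
select t.*, gs::date as month_start   -- one row per generated month
from the_table t
cross join lateral generate_series
     (
       t.estimate_close_date + interval '1 month',
       t.estimate_close_date + t.duration_months * interval '1 month',
       interval '1 month'
     ) gs;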
Related
I have a User table with the following fields:
| id | created_at | username |
I want to filter this table so that I can get the number of users created in a datetime range, separated into N intervals. E.g. for users with created_at between 2019-01-01T00:00:00 and 2019-01-02T00:00:00, separated into 2 intervals, I will get something like this:
| dt                  | count |
|---------------------|-------|
| 2019-01-01T00:00:00 |     6 |
| 2019-01-01T12:00:00 |     7 |
Is it possible to do so in one hit? I am currently using my Django ORM to create N date ranges and then making N queries, which isn't very efficient.
Generate the times you want and then use left join and aggregation:
select gs.ts, count(u.id)
from generate_series('2019-01-01T00:00:00'::timestamp,
                     '2019-01-01T12:00:00'::timestamp,
                     interval '12 hour'
                    ) gs(ts) left join
     users u
     on u.created_at >= gs.ts and
        u.created_at < gs.ts + interval '12 hour'
group by 1
order by 1;
EDIT:
If you want to specify the number of rows, you can use something similar:
select v.ts, count(u.id)
from generate_series(1, 10, 1) as gs(n) cross join lateral
     (values ('2019-01-01T00:00:00'::timestamp + (gs.n - 1) * interval '12 hour')
     ) v(ts) left join
     users u
     on u.created_at >= v.ts and
        u.created_at < v.ts + interval '12 hour'
group by 1
order by 1;
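Since the question asks for N intervals between two fixed endpoints, the step can also be derived from the bounds instead of being hard-coded. A sketch, assuming the bounds and N = 2 from the example; p is just an inline row of parameters:
with p(min_ts, max_ts, n) as (
     values (timestamp '2019-01-01T00:00:00', timestamp '2019-01-02T00:00:00', 2)
)
select p.min_ts + (p.max_ts - p.min_ts) * gs.i / p.n as ts,   -- lower bound of interval i
       count(u.id)
from p
cross join lateral generate_series(0, p.n - 1) as gs(i)
left join users u
       on u.created_at >= p.min_ts + (p.max_ts - p.min_ts) * gs.i / p.n
      and u.created_at <  p.min_ts + (p.max_ts - p.min_ts) * (gs.i + 1) / p.n
group by 1
order by 1;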
In Postgres, there is a dedicated function for this (several overloaded variants, really): width_bucket().
One additional difficulty: it does not work on type timestamp directly. But you can work with extracted epoch values like this:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1;
Empty buckets (intervals with no rows in them) are not returned at all. Your comment seems to suggest you want that.
Notably, this accesses the table once - as requested and as opposed to generating intervals first and then joining to the table (repeatedly).
See:
How to reduce result rows of SQL query equally in full range?
Aggregating (x,y) coordinate point clouds in PostgreSQL
That does not yet include effective bounds, just bucket numbers. Actual bounds can be added cheaply:
WITH cte(min_ts, max_ts, buckets) AS ( -- interval and nr of buckets here
SELECT timestamp '2019-01-01T00:00:00'
, timestamp '2019-01-02T00:00:00'
, 2
)
SELECT b.*
, min_ts + ((c.max_ts - c.min_ts) / c.buckets) * (bucket-1) AS lower_bound
FROM (
SELECT width_bucket(extract(epoch FROM t.created_at)
, extract(epoch FROM c.min_ts)
, extract(epoch FROM c.max_ts)
, c.buckets) AS bucket
, count(*) AS ct
FROM tbl t
JOIN cte c ON t.created_at >= min_ts -- incl. lower
AND t.created_at < max_ts -- excl. upper
GROUP BY 1
ORDER BY 1
) b, cte c;
Now you only change input values in the CTE to adjust results.
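To also report empty buckets (the caveat mentioned above), the possible bucket numbers can be generated and the aggregated counts left-joined to them. A sketch reusing the same CTE and table tbl:
WITH cte(min_ts, max_ts, buckets) AS (
   SELECT timestamp '2019-01-01T00:00:00'
        , timestamp '2019-01-02T00:00:00'
        , 2
   )
SELECT n.bucket, COALESCE(b.ct, 0) AS ct       -- 0 for buckets without rows
FROM   cte c
CROSS  JOIN LATERAL generate_series(1, c.buckets) n(bucket)
LEFT   JOIN (
   SELECT width_bucket(extract(epoch FROM t.created_at)
                     , extract(epoch FROM c.min_ts)
                     , extract(epoch FROM c.max_ts)
                     , c.buckets) AS bucket
        , count(*) AS ct
   FROM   tbl t
   JOIN   cte c ON t.created_at >= c.min_ts
               AND t.created_at <  c.max_ts
   GROUP  BY 1
   ) b USING (bucket)
ORDER  BY 1;
The base table is still accessed only once; the extra rows come from generate_series().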
I am trying to get data to populate a multi-line graph. The table jobs has the columns id, created_at, and partner_id. I would like to display the number of jobs for each partner_id each day. My current query has two problems: 1) it is missing a lot of jobs, and 2) it only contains an entry for a given day if there was a row on that day. This is my current query, where start is an integer denoting how many days back we are looking for data:
SELECT d.date, count(j.id), j.partner_id FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, #{start}, 1)
AS offs
) d
JOIN (
SELECT jobs.id, jobs.created_at, jobs.partner_id FROM jobs
WHERE jobs.created_at > now() - INTERVAL '#{start} days'
) j
ON (d.date=to_char(date_trunc('day', j.created_at), 'YYYY-MM-DD'))
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
This returns records like the following:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
What I want is something like this:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-22", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
That way, for every day in the query I have an entry for every partner, even if that count is 0. How can I adjust the query to populate data even when the count is 0?
Use a LEFT JOIN. You also don't need so many subqueries, and there is no need to translate a date to a string and then back to a date:
SELECT d.date, count(j.id), j.partner_id
FROM (SELECT to_char(dte, 'YYYY-MM-DD') AS date, dte
      FROM generate_series(current_date - #{start} * interval '1 day', current_date, interval '1 day') gs(dte)
     ) d LEFT JOIN
     jobs j
     ON DATE_TRUNC('day', j.created_at) = d.dte
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
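Note that with a plain LEFT JOIN an empty day yields a single row with a NULL partner_id, not a zero row for every partner. To get a zero count per partner per day, cross join the calendar with the partners before joining jobs. A sketch, assuming the set of partners can be derived from jobs itself:
SELECT d.dte::date AS date, p.partner_id, count(j.id)
FROM generate_series(current_date - #{start} * interval '1 day',
                     current_date, interval '1 day') d(dte)
CROSS JOIN (SELECT DISTINCT partner_id FROM jobs) p   -- or a partners table, if one exists
LEFT JOIN jobs j
       ON date_trunc('day', j.created_at) = d.dte
      AND j.partner_id = p.partner_id
GROUP BY 1, 2
ORDER BY 2, 1;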
I am trying to calculate the median response time for conversations on each date for the last X days.
I use the query below, but for some reason it generates multiple rows with the same date.
with grouping as (
SELECT a.id, d.date, extract(epoch from (first_response_at - started_at)) as response_time
FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
FROM generate_series(0, 2) AS offs
) d
LEFT OUTER JOIN apps a on true
LEFT OUTER JOIN conversations c ON (d.date=to_char(date_trunc('day'::varchar, c.started_at), 'YYYY-MM-DD')) and a.id = c.app_id
and c.app_id = a.id and c.first_response_at > (current_date - (2 || ' days')::interval)::date
)
select *
from grouping
where grouping.id = 'ASnYW1-RgCl0I'
Any ideas?
First, a number of issues with your query, assuming there aren't any parts you haven't shown us:
You don't need a CTE for this query.
From table apps you only use column id whose value is the same as c.app_id. You can remove the table apps and select c.app_id for the same result.
When you use to_char() you do not first have to date_trunc() to a date; to_char() handles that.
generate_series() also works with timestamps. Just enter day values with an interval and cast the end result to date before using it.
So, removing all the flotsam, we end up with this query, which does exactly the same as the one in your question, but now we can at least see what is going on:
SELECT c.app_id, d.date::date AS date,
       extract(epoch from (first_response_at - started_at)) AS response_time
FROM generate_series(CURRENT_DATE - 2, CURRENT_DATE, interval '1 day') d(date)
LEFT JOIN conversations c ON d.date::date = c.started_at::date
                         AND c.app_id = 'ASnYW1-RgCl0I'
                         AND c.first_response_at > CURRENT_DATE - 2;
You don't calculate the median response time anywhere, so that is a big problem you need to solve. This only requires data from table conversations and would look somewhat like this to calculate the median response time for the past 2 days:
SELECT app_id, started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = 'ASnYW1-RgCl0I'
AND first_response_at > CURRENT_DATE - 2
GROUP BY 1, 2;
When we fold the two queries and put the parameters handily in a single place, this is the final result:
SELECT p.id, d.date::date AS date,
extract(epoch from (c.median_response)) AS response_time
FROM (VALUES ('ASnYW1-RgCl0I', 2)) p(id, days)
JOIN generate_series(CURRENT_DATE - p.days, CURRENT_DATE, interval '1 day') d(date) ON true
LEFT JOIN LATERAL (
SELECT started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = p.id
AND first_response_at > CURRENT_DATE - p.days
GROUP BY 1) c ON d.date::date = c.start_date;
If you want to change the id of the app or the number of days to look back, you only have to change the VALUES clause accordingly. You can also wrap the whole thing in a SQL function and convert the VALUES clause into two parameters.
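Wrapped in a SQL function, as suggested, it could look like the sketch below; the function name median_response_times and the output column names are mine:
CREATE FUNCTION median_response_times(_app_id text, _days int)
  RETURNS TABLE (day date, median_seconds float8)
  LANGUAGE sql STABLE AS
$$
SELECT d.day::date,
       extract(epoch FROM c.median_response)::float8
FROM   generate_series(CURRENT_DATE - _days, CURRENT_DATE, interval '1 day') d(day)
LEFT   JOIN LATERAL (
   SELECT started_at::date AS start_date,
          percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
   FROM   conversations
   WHERE  app_id = _app_id
   AND    first_response_at > CURRENT_DATE - _days
   GROUP  BY 1
   ) c ON d.day::date = c.start_date
$$;
-- call it like this:
SELECT * FROM median_response_times('ASnYW1-RgCl0I', 2);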
I have a list of dates which I can generate using:
SELECT date from
generate_series(
'2016-05-09'::date,
CURRENT_DATE,
'1 day'::interval
) date
I want to perform another query on a table using each value in the above list. An example of what I want to achieve for one of the date values:
SELECT COUNT(*) FROM table
WHERE table.datecolumn > date
How do I perform the second query for all the values in the first query to get a final output in the form:
datecol      count
2016-07-09   100
2016-07-10   200
2016-07-11   100
I'd use a LATERAL join. See 7.2.1.5. LATERAL Subqueries in the Postgres docs.
SELECT
dates.dt, Counts.c
FROM
generate_series(
'2016-05-09'::date,
CURRENT_DATE,
'1 day'::interval
) AS dates(dt)
INNER JOIN LATERAL
(
SELECT COUNT(*) AS c
FROM table
WHERE table.datecolumn > dates.dt
) AS Counts ON true
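Since the aggregate subquery returns exactly one row even when nothing matches, INNER JOIN LATERAL ... ON true behaves the same as CROSS JOIN LATERAL here. The equivalent shorter form (keeping the question's placeholder names):
SELECT dates.dt, counts.c
FROM generate_series('2016-05-09'::date, CURRENT_DATE, '1 day'::interval) AS dates(dt)
CROSS JOIN LATERAL
(
    SELECT COUNT(*) AS c
    FROM table                       -- placeholder; substitute your real table name
    WHERE table.datecolumn > dates.dt
) AS counts;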
I want to count IDs per month using generate_series(). This query works in PostgreSQL 9.1:
SELECT (to_char(serie,'yyyy-mm')) AS year, sum(amount)::int AS eintraege FROM (
SELECT
COUNT(mytable.id) as amount,
generate_series::date as serie
FROM mytable
RIGHT JOIN generate_series(
(SELECT min(date_from) FROM mytable)::date,
(SELECT max(date_from) FROM mytable)::date,
interval '1 day') ON generate_series = date(date_from)
WHERE version = 1
GROUP BY generate_series
) AS foo
GROUP BY Year
ORDER BY Year ASC;
This is my output:
"2006-12" | 4
"2007-02" | 1
"2007-03" | 1
But what I want to get is this output ('0' value in January):
"2006-12" | 4
"2007-01" | 0
"2007-02" | 1
"2007-03" | 1
Months without ids should be listed nevertheless.
Any ideas how to solve this?
Sample data:
drop table if exists mytable;
create table mytable(id bigint, version smallint, date_from timestamp);
insert into mytable(id, version, date_from) values
(4084036, 1, '2006-12-22 22:46:35'),
(4084938, 1, '2006-12-23 16:19:13'),
(4084938, 2, '2006-12-23 16:20:23'),
(4084939, 1, '2006-12-23 16:29:14'),
(4084954, 1, '2006-12-23 16:28:28'),
(4250653, 1, '2007-02-12 21:58:53'),
(4250657, 1, '2007-03-12 21:58:53')
;
Untangled, simplified and fixed, it might look like this:
SELECT to_char(s.tag,'yyyy-mm') AS monat
, count(t.id) AS eintraege
FROM (
SELECT generate_series(min(date_from)::date
, max(date_from)::date
, interval '1 day'
)::date AS tag
FROM mytable t
) s
LEFT JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1
GROUP BY 1
ORDER BY 1;
Among all the noise, misleading identifiers, and unconventional formatting, the actual problem was hidden here:
WHERE version = 1
You made correct use of RIGHT [OUTER] JOIN. But adding a WHERE clause that requires an existing row from mytable effectively converts the RIGHT [OUTER] JOIN into an [INNER] JOIN.
Move that filter into the JOIN condition to make it work.
I simplified some other things while being at it.
Better yet:
SELECT to_char(mon, 'yyyy-mm') AS monat
, COALESCE(t.ct, 0) AS eintraege
FROM (
SELECT date_trunc('month', date_from)::date AS mon
, count(*) AS ct
FROM mytable
WHERE version = 1
GROUP BY 1
) t
RIGHT JOIN (
SELECT generate_series(date_trunc('month', min(date_from))
, max(date_from)
, interval '1 mon')::date
FROM mytable
) m(mon) USING (mon)
ORDER BY mon;
It's much cheaper to aggregate first and join later - joining one row per month instead of one row per day.
It's cheaper to base GROUP BY and ORDER BY on the date value instead of the rendered text.
count(*) is a bit faster than count(id), while equivalent in this query.
generate_series() is a bit faster and safer when based on timestamp instead of date. See:
Generating time series between two dates in PostgreSQL
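For example, a minimal sketch that generates one row per day over the sample data's range while letting generate_series() operate on timestamps:
SELECT g::date AS tag
FROM generate_series(timestamp '2006-12-01', timestamp '2007-03-31', interval '1 day') g;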