grouping by column but getting multiple results for each - sql

I am trying to calculate the median response time for conversations on each date for the last X days.
I use the query below, but for some reason it generates multiple rows with the same date.
with grouping as (
SELECT a.id, d.date, extract(epoch from (first_response_at - started_at)) as response_time
FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
FROM generate_series(0, 2) AS offs
) d
LEFT OUTER JOIN apps a on true
LEFT OUTER JOIN conversations c ON (d.date=to_char(date_trunc('day'::varchar, c.started_at), 'YYYY-MM-DD')) and a.id = c.app_id
and c.app_id = a.id and c.first_response_at > (current_date - (2 || ' days')::interval)::date
)
select
*
from grouping
where grouping.id = 'ASnYW1-RgCl0I'
Any ideas?

First a number of issues with your query, assuming there aren't any parts you haven't shown us:
You don't need a CTE for this query.
From table apps you only use column id whose value is the same as c.app_id. You can remove the table apps and select c.app_id for the same result.
When you use to_char() you do not first have to date_trunc() to a date, the to_char() function handles that.
generate_series() also works with timestamps. Pass date bounds with a step of interval '1 day' and cast the result to date before using it.
So, removing all the flotsam we end up with this which does exactly the same as the query in your question but now we can at least see what is going on.
SELECT c.app_id, d.date::date AS date,
extract(epoch from (first_response_at - started_at)) AS response_time
FROM generate_series(CURRENT_DATE - 2, CURRENT_DATE, interval '1 day') d(date)
LEFT JOIN conversations c ON d.date::date = c.started_at::date
AND c.app_id = 'ASnYW1-RgCl0I'
AND c.first_response_at > CURRENT_DATE - 2;
You don't calculate the median response time anywhere, so that is a big problem you need to solve. This only requires data from table conversations and would look somewhat like this to calculate the median response time for the past 2 days:
SELECT app_id, started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = 'ASnYW1-RgCl0I'
AND first_response_at > CURRENT_DATE - 2
GROUP BY 1, 2;
When we fold the two queries, and put the parameters handily in a single place, this is the final result:
SELECT p.id, d.date::date AS date,
extract(epoch from (c.median_response)) AS response_time
FROM (VALUES ('ASnYW1-RgCl0I', 2)) p(id, days)
JOIN generate_series(CURRENT_DATE - p.days, CURRENT_DATE, interval '1 day') d(date) ON true
LEFT JOIN LATERAL (
SELECT started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = p.id
AND first_response_at > CURRENT_DATE - p.days
GROUP BY 1) c ON d.date::date = c.start_date;
If you want to change the id of the app or the number of days to look back, you only have to change the VALUES clause accordingly. You can also wrap the whole thing in a SQL function and convert the VALUES clause into two parameters.
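As a rough sketch of the function wrapper mentioned above: the function name, the parameter names (p_app_id, p_days), and the output column names are made up for illustration, and app_id is assumed to be of type text. Because the parameters are plain values, the VALUES clause and LATERAL are no longer needed.

```sql
-- Hypothetical wrapper; names and the text type of app_id are assumptions.
CREATE FUNCTION median_response_per_day(p_app_id text, p_days integer)
RETURNS TABLE (app text, response_date date, response_seconds double precision)
LANGUAGE sql STABLE
AS $$
    SELECT p_app_id,
           d.day::date,
           extract(epoch FROM c.median_response)::double precision
    FROM generate_series(CURRENT_DATE - p_days, CURRENT_DATE, interval '1 day') AS d(day)
    LEFT JOIN (
        -- one row per day that has conversations, with the median response interval
        SELECT started_at::date AS start_date,
               percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
        FROM conversations
        WHERE app_id = p_app_id
          AND first_response_at > CURRENT_DATE - p_days
        GROUP BY 1
    ) c ON d.day::date = c.start_date
    ORDER BY 2;
$$;

-- Example call with the values from the question:
SELECT * FROM median_response_per_day('ASnYW1-RgCl0I', 2);
```

Days without any conversations still appear in the result, with response_seconds set to NULL.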

Related

Filling in empty dates

This query returns the number of alarms created by day between a specific date range.
SELECT CAST(created_at AS DATE) AS date, SUM(1) AS count
FROM ew_alarms
LEFT JOIN site ON site.id = ew_alarms.site_id
AND ew_alarms.created_at BETWEEN '12/22/2020' AND '01/22/2021' AND (CAST(EXTRACT(HOUR FROM ew_alarms.created_at) AS INT) BETWEEN 0 AND 23.99)
GROUP BY CAST(created_at AS DATE)
ORDER BY date DESC
Result: screenshot
What's the best way to fill in the missing dates (1/16, 1/17, 1/18, etc.)? Because no alarms were created on those days, the results throw off the daily average I'm ultimately trying to compute.
Would it be a generate_series query?
Yes, use generate_series(). I would suggest:
SELECT gs.dte::date AS date, COUNT(s.id) AS count
FROM GENERATE_SERIES('2020-12-22'::date, '2021-01-22'::date, INTERVAL '1 DAY') gs(dte) LEFT JOIN
ew_alarms a
ON a.created_at >= gs.dte AND
a.created_at < gs.dte + INTERVAL '1 DAY' LEFT JOIN
site s
ON s.id = a.site_id
GROUP BY gs.dte
ORDER BY gs.dte DESC;
I don't know what the hour comparison is supposed to be doing. The hour is always going to be between 0 and 23, so I removed that logic.
Note: Presumably, you want to count something from either site or ew_alarms. Counting a column from a LEFT JOINed table (rather than using SUM(1) or COUNT(*)) is what allows 0 to be returned for days with no matches.
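For the daily average the question is ultimately after, one possible sketch is to average over the gap-filled counts. It assumes the same ew_alarms table and date range as above; the alias names (daily, alarm_count, avg_alarms_per_day) are only illustrative.

```sql
-- Average alarms per day, including days with zero alarms.
SELECT avg(daily.alarm_count) AS avg_alarms_per_day
FROM (
    SELECT gs.dte,
           count(a.created_at) AS alarm_count   -- 0 when no alarm matched that day
    FROM generate_series('2020-12-22'::date, '2021-01-22'::date, interval '1 day') AS gs(dte)
    LEFT JOIN ew_alarms a
           ON a.created_at >= gs.dte
          AND a.created_at <  gs.dte + interval '1 day'
    GROUP BY gs.dte
) AS daily;
```

Without the generated series, days with no alarms would be missing from the inner result and the average would come out too high.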

How to get a count of data for every date in postgres

I am trying to get data to populate a multi-line graph. The table jobs has the columns id, created_at, and partner_id. I would like to display the sum of jobs for each partner_id each day. My current query has 2 problems. 1) It is missing a lot of jobs. 2) It only contains an entry for a given day if there was a row on that day. My current query is where start is an integer denoting how many days back we are looking for data:
SELECT d.date, count(j.id), j.partner_id FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, #{start}, 1)
AS offs
) d
JOIN (
SELECT jobs.id, jobs.created_at, jobs.partner_id FROM jobs
WHERE jobs.created_at > now() - INTERVAL '#{start} days'
) j
ON (d.date=to_char(date_trunc('day', j.created_at), 'YYYY-MM-DD'))
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
This returns records like the following:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
what I want is something like this:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-22", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
So that for every day in the query I have an entry for every partner even if that count is 0. How can I adjust the query to populate data even when the count is 0?
Use a LEFT JOIN. You also don't need so many subqueries, and there is no need to convert a date to a string and then back to a date:
SELECT d.date, count(j.id), j.partner_id
FROM (SELECT to_char(dte, 'YYYY-MM-DD') AS date , dte
FROM generate_series(current_date - #{start} * interval '1 day', current_date, interval '1 day') gs(dte)
) d LEFT JOIN
jobs j
ON DATE_TRUNC('day', j.created_at) = d.dte
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
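To get a row for every partner on every day, even when the count is 0, the dates can additionally be cross joined with the list of partners before the LEFT JOIN. This is only a sketch: it assumes the partner list can be derived from jobs itself, and it hard-codes 2 where the application would interpolate #{start}.

```sql
SELECT to_char(d.dte, 'YYYY-MM-DD') AS date,
       p.partner_id,
       count(j.id) AS count                       -- 0 for partner/day pairs without jobs
FROM generate_series(current_date - 2, current_date, interval '1 day') AS d(dte)
CROSS JOIN (SELECT DISTINCT partner_id FROM jobs) AS p
LEFT JOIN jobs j
       ON j.partner_id = p.partner_id
      AND j.created_at >= d.dte
      AND j.created_at <  d.dte + interval '1 day'
GROUP BY d.dte, p.partner_id
ORDER BY p.partner_id, d.dte;
```

If there is a separate partners table, selecting from it instead of the DISTINCT subquery would also cover partners that have no jobs at all.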

Postgres Query to get details before 2 months

I want to get the details of customers who have not visited in the last two months, using PostgreSQL. Below is my query. I am getting all the data from before 2 months ago, but I also need to check whether the customer visited during the last two months. If so, the result of the query must not show that customer.
select usr.name,usr.mobile,ihv.create_date,ihv.partner_id from
invoice_header_view ihv join user_store_mapper usm on ihv.partner_id =
usm. partner_id join public.user usr on usr.user_id = usm.user_id
where usm.store_id = '123' and cast(ihv.create_date as date) =
cast(now() as date) - interval '2 month' and (cast(ihv.create_date as
date) between cast(ihv.create_date as date) and cast(now() as
date) - interval '2 months')
One way to solve it is NOT EXISTS. You can create a correlated subquery which checks whether there are any newer rows in invoice_header_view.
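A minimal sketch of the NOT EXISTS approach, reusing the table and column names from the question. The extra EXISTS condition is an assumption on my part: it restricts the result to customers who have at least one invoice.

```sql
SELECT usr.name, usr.mobile, usm.partner_id
FROM user_store_mapper usm
JOIN public.user usr ON usr.user_id = usm.user_id
WHERE usm.store_id = '123'
  -- the customer has visited at some point ...
  AND EXISTS (
      SELECT 1
      FROM invoice_header_view ihv
      WHERE ihv.partner_id = usm.partner_id
  )
  -- ... but not within the last two months
  AND NOT EXISTS (
      SELECT 1
      FROM invoice_header_view ihv
      WHERE ihv.partner_id = usm.partner_id
        AND ihv.create_date > now() - interval '2 months'
  );
```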
Another way is to use GROUP BY, like
SELECT
usr.name, usr.mobile, ihv.partner_id,
max(ihv.create_date) AS max_create_date,
count(*) AS invoice_header_count
FROM invoice_header_view ihv
JOIN user_store_mapper usm ON ihv.partner_id = usm.partner_id
JOIN public.user usr ON usr.user_id = usm.user_id
WHERE
usm.store_id = '123'
GROUP BY
1,2,3
HAVING
max(ihv.create_date) <= now() - interval '2 months';
If you want customers whose most recent date is more than two months ago, then you can use:
SELECT u.name, u.mobile
FROM invoice_header_view ihv JOIN
user_store_mapper usm
ON ihv.partner_id = usm.partner_id JOIN
public.user u
ON u.user_id = usm.user_id
WHERE usm.store_id = 123 -- I'm guessing this is really a number
GROUP BY u.name, u.mobile
HAVING max(ihv.create_date) <= current_date - interval '2 months';

Summing Dates in Range Postgres

I have a table defined like so:
CREATE TABLE Items (
Barcode CHAR(50) PRIMARY KEY NOT NULL,
Location CHAR(15),
ManufacturedAt TIMESTAMP WITH TIME ZONE,
ShippedOutAt TIMESTAMP WITH TIME ZONE,
ReceivedAt TIMESTAMP WITH TIME ZONE,
SoldAt TIMESTAMP WITH TIME ZONE,
DiscardedAt TIMESTAMP WITH TIME ZONE
);
I am trying to get a sum of each date field for a location over the last twelve months.
So example results I am trying to get:
Date  NumManu  NumShip  NumRece  NumSold  NumDisc
DEC   5        3        3        2        1
NOV   3        5        5        3        2
I am no sql expert by any means, but I am unsure of how to do this without doing 12 different sql queries (one for each month), or is that the only way? Thanks in advance!
It can be done with a single query with either sub-selects (and with set-returning functions):
SELECT lo, hi, to_char(lo, 'MON') Date,
(SELECT count(*) FROM Items WHERE ManufacturedAt >= lo AND ManufacturedAt < hi) NumManu,
(SELECT count(*) FROM Items WHERE ShippedOutAt >= lo AND ShippedOutAt < hi) NumShip,
(SELECT count(*) FROM Items WHERE ReceivedAt >= lo AND ReceivedAt < hi) NumRece,
(SELECT count(*) FROM Items WHERE SoldAt >= lo AND SoldAt < hi) NumSold,
(SELECT count(*) FROM Items WHERE DiscardedAt >= lo AND DiscardedAt < hi) NumDisc
FROM generate_series(current_timestamp, current_timestamp - interval '11 mon', interval '-1 mon') ts,
LATERAL (select date_trunc('month', ts)) lo(lo),
LATERAL (select lo + interval '1 mon') hi(hi)
... or with multiple joins of the same table:
SELECT lo, hi, to_char(lo, 'MON') Date,
count(DISTINCT JManu) NumManu,
count(DISTINCT JShip) NumShip,
count(DISTINCT JRece) NumRece,
count(DISTINCT JSold) NumSold,
count(DISTINCT JDisc) NumDisc
FROM generate_series(current_timestamp, current_timestamp - interval '11 mon', interval '-1 mon') ts
CROSS JOIN LATERAL (select date_trunc('month', ts)) lo(lo)
CROSS JOIN LATERAL (select lo + interval '1 mon') hi(hi)
LEFT JOIN Items JManu ON JManu.ManufacturedAt >= lo AND JManu.ManufacturedAt < hi
LEFT JOIN Items JShip ON JShip.ShippedOutAt >= lo AND JShip.ShippedOutAt < hi
LEFT JOIN Items JRece ON JRece.ReceivedAt >= lo AND JRece.ReceivedAt < hi
LEFT JOIN Items JSold ON JSold.SoldAt >= lo AND JSold.SoldAt < hi
LEFT JOIN Items JDisc ON JDisc.DiscardedAt >= lo AND JDisc.DiscardedAt < hi
GROUP BY lo, hi
I may be missing something but it seems as though you could just use COUNT and then either ORDER BY or GROUP BY month.
Although I see now that you are only keeping track of the month as actual dates, so you would have to perform some sort of logic check to get dates that are BETWEEN the start and end of each month.
I believe you would have to manually account for each month in the where clause if this is the case.
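For what it's worth, the months do not have to be listed by hand. One possible sketch, not taken from the answers above, unpivots the five timestamp columns into (kind, ts) rows and groups by date_trunc('month', ts). The kind labels, the alias e, and the location value 'WAREHOUSE-1' are made up for illustration, and count(*) FILTER requires PostgreSQL 9.4 or later.

```sql
SELECT date_trunc('month', e.ts)                  AS month_start,
       to_char(date_trunc('month', e.ts), 'MON')  AS month_label,
       count(*) FILTER (WHERE e.kind = 'manu')    AS NumManu,
       count(*) FILTER (WHERE e.kind = 'ship')    AS NumShip,
       count(*) FILTER (WHERE e.kind = 'rece')    AS NumRece,
       count(*) FILTER (WHERE e.kind = 'sold')    AS NumSold,
       count(*) FILTER (WHERE e.kind = 'disc')    AS NumDisc
FROM Items i
CROSS JOIN LATERAL (VALUES
        ('manu', i.ManufacturedAt),
        ('ship', i.ShippedOutAt),
        ('rece', i.ReceivedAt),
        ('sold', i.SoldAt),
        ('disc', i.DiscardedAt)
     ) AS e(kind, ts)
WHERE i.Location = 'WAREHOUSE-1'                  -- hypothetical location value
  AND e.ts >= date_trunc('month', current_timestamp) - interval '11 months'
GROUP BY 1, 2
ORDER BY 1 DESC;
```

This reads Items once, but months in which nothing happened at that location are simply absent from the result. A LEFT JOIN from generate_series() would be needed to show them as zeros.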

Aggregates for today and the previous day depending on data

Having trouble putting together a query to pull the aggregate values of a given timestamp and the timestamp before it. Given the following schema:
name TEXT,
ts TIMESTAMP,
X NUMERIC,
Y NUMERIC
where there are gaps in the ts column due to gaps in data, I'm trying to construct a query to produce
name,
date_trunc('day', q1.ts),
avg(q1.X),
sum(q1.Y),
date_trunc('day', q2.ts),
avg(q2.X),
sum(q2.Y)
The first half is straightforward:
SELECT q1.name, date_trunc('day', q1.ts), avg(q1.X), sum(q1.Y)
FROM data as q1
GROUP BY 1, 2
ORDER BY 1, 2;
But not sure how to generate the relation to find the "day" before for each row. I'm trying to work an inner join like this:
SELECT q1.name, q1.day, q1.avg, q1.sum, q2.day, q2.avg, q2.sum
FROM (
SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
FROM data
GROUP BY 1,2
ORDER BY 1,2
) q1 INNER JOIN (
SELECT name, date_trunc('day', ts) AS day, avg(X) AS avg, sum(Y) as sum
FROM data
GROUP BY 1,2
ORDER BY 1,2
) q2 ON (
q1.name = q2.name
AND q2.day = q1.day - interval '1 day'
);
The problem with this is that it doesn't cover the cases where the previous "day" with data is more than 1 day before the current day.
The special difficulty here is that you need to number days after aggregating rows. You can do this in a single query level with the window function row_number(), since window functions are applied after aggregation by GROUP BY.
Also, use a CTE to avoid executing the same subquery multiple times:
WITH q AS (
SELECT name, ts::date AS day
,avg(x) AS avg_x, sum(y) AS sum_y
,row_number() OVER (PARTITION BY name ORDER BY ts::date) AS rn
FROM data
GROUP BY 1,2
)
SELECT q1.name, q1.day, q1.avg_x, q1.sum_y
,q2.day AS day2, q2.avg_x AS avg_x2, q2.sum_y AS sum_y2
FROM q q1
LEFT JOIN q q2 ON q1.name = q2.name
AND q1.rn = q2.rn + 1
ORDER BY 1,2;
This uses the simpler cast to date (ts::date) instead of date_trunc('day', ts) to get "days".
LEFT [OUTER] JOIN (as opposed to [INNER] JOIN) is instrumental in preserving the corner case of the first row, where there is no previous day.
ORDER BY should be applied to the outer query only.
The question isn't crystal clear, but it sounds like you're actually trying to fill gaps while keeping track of leading/lagging rows.
To fill the gaps, look into generate_series() and left join it with your table:
select d
from generate_series(timestamp '2013-12-01', timestamp '2013-12-31', interval '1 day') d;
http://www.postgresql.org/docs/current/static/functions-srf.html
For previous and next row values, look into lead() and lag() window functions:
select date_trunc('day', ts) as curr_row_day,
lag(date_trunc('day', ts)) over w as prev_row_day
from data
window w as (order by ts)
http://www.postgresql.org/docs/current/static/tutorial-window.html
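Since window functions are evaluated after GROUP BY, lag() can also be applied directly to the daily aggregates, which avoids a self-join. This is a sketch assuming the data table from the question; the output aliases are illustrative.

```sql
SELECT name,
       ts::date           AS day,
       avg(x)             AS avg_x,
       sum(y)             AS sum_y,
       lag(ts::date) OVER w AS prev_day,       -- previous day that has data, if any
       lag(avg(x))   OVER w AS prev_avg_x,
       lag(sum(y))   OVER w AS prev_sum_y
FROM data
GROUP BY name, ts::date
WINDOW w AS (PARTITION BY name ORDER BY ts::date)
ORDER BY name, day;
```

For the first day of each name, the three prev_ columns are NULL.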