Postgres Query to get details before 2 months - sql

I want to get the details of customers who have not visited in the last two months, using PostgreSQL. Below is my query. It returns all the data from before two months ago, but I also need to check whether the customer visited within the last two months. If so, the query result must not include that customer.
select usr.name,usr.mobile,ihv.create_date,ihv.partner_id from
invoice_header_view ihv join user_store_mapper usm on ihv.partner_id =
usm. partner_id join public.user usr on usr.user_id = usm.user_id
where usm.store_id = '123' and cast(ihv.create_date as date) =
cast(now() as date) - interval '2 month' and (cast(ihv.create_date as
date) between cast(ihv.create_date as date) and cast(now() as
date) - interval '2 months')

One way to solve this is NOT EXISTS: write a correlated subquery that checks whether there are any newer rows in invoice_header_view for the same partner.
Another way is to use GROUP BY, like
SELECT
usr.name, usr.mobile, ihv.partner_id,
max(ihv.create_date) AS max_create_date,
count(*) AS invoice_header_count
FROM invoice_header_view ihv
JOIN user_store_mapper usm ON ihv.partner_id = usm.partner_id
JOIN public.user usr ON usr.user_id = usm.user_id
WHERE
usm.store_id = '123'
GROUP BY
1,2,3
HAVING
max(ihv.create_date) <= now() - interval '2 months';
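
The NOT EXISTS approach mentioned above is not spelled out, so here is a minimal sketch of it. To keep it runnable on its own, it uses a hypothetical inline table `invoices` (built with VALUES) in place of invoice_header_view and leaves out the joins to user_store_mapper and public.user; the column names partner_id and create_date are taken from the question.

```sql
-- Hypothetical sample data standing in for invoice_header_view.
WITH invoices (partner_id, create_date) AS (
    VALUES
        (1, now() - interval '5 months'),  -- partner 1: an old visit only
        (2, now() - interval '4 months'),  -- partner 2: an old visit ...
        (2, now() - interval '10 days')    -- ... and a recent one
)
SELECT i.partner_id, i.create_date
FROM invoices i
-- keep invoices that are at least two months old ...
WHERE i.create_date <= now() - interval '2 months'
  -- ... but only for partners with no invoice inside the last two months
  AND NOT EXISTS (
      SELECT 1
      FROM invoices newer
      WHERE newer.partner_id = i.partner_id
        AND newer.create_date > now() - interval '2 months'
  );
-- Only partner 1 is returned; partner 2 is excluded by its recent visit.
```

Unlike the GROUP BY version, this returns every old invoice row of a qualifying partner rather than one row per partner.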

If you want customers whose most recent date is more than two months ago, then you can use:
SELECT u.name, u.mobile
FROM invoice_header_view ihv JOIN
user_store_mapper usm
ON ihv.partner_id = usm.partner_id JOIN
public.user u
ON u.user_id = usm.user_id
WHERE usm.store_id = 123 -- I'm guessing this is really a number
GROUP BY u.name, u.mobile
HAVING max(ihv.create_date) <= current_date - interval '2 months';

Related

In Postgres how do I write a SQL query to select distinct values overall but aggregated over a set time period

What I mean is this: I have a table called payments with a created_at column and a user_id column. I want to select the count of purchases aggregated weekly (it can be any interval I want), but counting only first-time purchases. For example, if a user purchased for the first time in week 1, that purchase is counted, but if the same user purchased again in week 2, it is not.
created_at | user_id
-----------+--------
timestamp  | 1
timestamp  | 1
This is the query I came up with. The issue is if the user purchases multiple times they are all included. How can I improve this?
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
SELECT
dates.date::DATE AS date,
COALESCE(COUNT(DISTINCT(user_id)), 0) AS registrations
FROM
dates
LEFT JOIN
payment ON created_at::DATE BETWEEN dates.date AND dates.date::date + '1 ${dateUnit}'::INTERVAL
GROUP BY
dates.date
ORDER BY
dates.date DESC;
You want to count only first purchases. So get those first purchases in the first step and work with these.
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
, first_purchases AS
(
SELECT user_id, MIN(created_at::DATE) AS purchase_date
FROM payment
GROUP BY user_id
)
SELECT
d.date,
COALESCE(COUNT(p.purchase_date), 0) AS registrations
FROM
dates d
LEFT JOIN
first_purchases p ON p.purchase_date >= d.date
AND p.purchase_date < d.date + '1 ${dateUnit}'::INTERVAL
GROUP BY
d.date
ORDER BY
d.date DESC;

Filling in empty dates

This query returns the number of alarms created by day between a specific date range.
SELECT CAST(created_at AS DATE) AS date, SUM(1) AS count
FROM ew_alarms
LEFT JOIN site ON site.id = ew_alarms.site_id
AND ew_alarms.created_at BETWEEN '12/22/2020' AND '01/22/2021' AND (CAST(EXTRACT(HOUR FROM ew_alarms.created_at) AS INT) BETWEEN 0 AND 23.99)
GROUP BY CAST(created_at AS DATE)
ORDER BY date DESC
Result: screenshot
What is the best way to fill in the missing dates (1/16, 1/17, 1/18, etc.)? Because no alarms were created on those days, they are absent from the results, which throws off the daily average I'm ultimately trying to compute.
Would it be a generate_series query?
Yes, use generate_series(). I would suggest:
SELECT gs.dte::date AS date, COUNT(s.id) AS count
FROM GENERATE_SERIES('2020-12-22'::date, '2021-01-22'::date, INTERVAL '1 DAY') gs(dte) LEFT JOIN
ew_alarms a
ON a.created_at >= gs.dte AND
a.created_at < gs.dte + INTERVAL '1 DAY' LEFT JOIN
site s
ON s.id = a.site_id
GROUP BY gs.dte
ORDER BY date DESC;
I don't know what the hour comparison is supposed to be doing. The hour is always going to be between 0 and 23, so I removed that logic.
Note: Presumably, you want to count something from either site or ew_alarms. Counting a column from one of the LEFT JOINed tables (rather than using COUNT(*)) is what allows 0 to be returned for days with no alarms.

How to get a count of data for every date in postgres

I am trying to get data to populate a multi-line graph. The table jobs has the columns id, created_at, and partner_id. I would like to display the number of jobs for each partner_id each day. My current query has two problems: 1) it is missing a lot of jobs, and 2) it only contains an entry for a given day if there was a row on that day. My current query is below, where start is an integer denoting how many days back we are looking for data:
SELECT d.date, count(j.id), j.partner_id FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, #{start}, 1)
AS offs
) d
JOIN (
SELECT jobs.id, jobs.created_at, jobs.partner_id FROM jobs
WHERE jobs.created_at > now() - INTERVAL '#{start} days'
) j
ON (d.date=to_char(date_trunc('day', j.created_at), 'YYYY-MM-DD'))
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
This returns records like the following:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
what I want is something like this:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-22", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
So that for every day in the query I have an entry for every partner even if that count is 0. How can I adjust the query to populate data even when the count is 0?
Use a LEFT JOIN. You also don't need so many subqueries, and there is no need to convert a date to a string and then back to a date:
SELECT d.date, count(j.id), j.partner_id
FROM (SELECT to_char(dte, 'YYYY-MM-DD') AS date , dte
FROM generate_series(current_date - #{start} * interval '1 day', current_date, interval '1 day') gs(dte)
) d LEFT JOIN
jobs j
ON DATE_TRUNC('day', j.created_at) = d.dte
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
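
With a plain LEFT JOIN, a day with no jobs yields a single row whose partner_id is NULL, not one zero row per partner as the question asks. One possible way to get a row for every partner on every day is to cross join the date series with the list of partners before joining the jobs. The sketch below is self-contained: the `jobs` CTE with its sample rows is hypothetical, and the look-back of 2 days is hard-coded where the question interpolates start.

```sql
-- Hypothetical sample data standing in for the jobs table.
WITH jobs (id, created_at, partner_id) AS (
    VALUES
        (1, current_date - 2 + time '09:00', '099'),
        (2, current_date - 2 + time '10:00', '075'),
        (3, current_date - 1 + time '11:00', '099'),
        (4, current_date     + time '08:00', '099')
)
SELECT d.dte::date AS date,
       p.partner_id,
       count(j.id) AS count          -- counts matched jobs only, so 0 when none
FROM generate_series(current_date - 2, current_date, interval '1 day') AS d(dte)
CROSS JOIN (SELECT DISTINCT partner_id FROM jobs) AS p   -- every partner on every day
LEFT JOIN jobs j
       ON j.partner_id = p.partner_id
      AND j.created_at >= d.dte
      AND j.created_at <  d.dte + interval '1 day'
GROUP BY d.dte, p.partner_id
ORDER BY p.partner_id, d.dte;
-- Two partners x three days = six rows, with count = 0 where a partner had no jobs.
```

If a separate partners table exists, selecting from it instead of `SELECT DISTINCT partner_id FROM jobs` would also include partners that have no jobs at all.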

grouping by column but getting multiple results for each

I am trying to calculate the median response time for conversations on each date for the last X days.
I use the query below, but for some reason it generates multiple rows with the same date.
with grouping as (
SELECT a.id, d.date, extract(epoch from (first_response_at - started_at)) as response_time
FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
FROM generate_series(0, 2) AS offs
) d
LEFT OUTER JOIN apps a on true
LEFT OUTER JOIN conversations c ON (d.date=to_char(date_trunc('day'::varchar, c.started_at), 'YYYY-MM-DD')) and a.id = c.app_id
and c.app_id = a.id and c.first_response_at > (current_date - (2 || ' days')::interval)::date
)
select
*
from grouping
where grouping.id = 'ASnYW1-RgCl0I'
Any ideas?
First a number of issues with your query, assuming there aren't any parts you haven't shown us:
You don't need a CTE for this query.
From table apps you only use column id whose value is the same as c.app_id. You can remove the table apps and select c.app_id for the same result.
When you use to_char() you do not first have to date_trunc() to a date, the to_char() function handles that.
generate_series() also works with timestamps. Just enter day values with an interval and cast the end result to date before using it.
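
The last two points can be checked in isolation. This is a small sketch with made-up literal timestamps, not part of the original query:

```sql
-- to_char() with a date-only format ignores the time part,
-- so truncating to the day first makes no difference.
SELECT to_char(ts, 'YYYY-MM-DD')                     AS direct,
       to_char(date_trunc('day', ts), 'YYYY-MM-DD')  AS truncated_first
FROM (VALUES (timestamp '2019-06-21 17:45:12')) AS v(ts);
-- both columns: 2019-06-21

-- generate_series() stepping over timestamps by an interval,
-- with the result cast to date at the end.
SELECT g.d::date AS day
FROM generate_series(timestamp '2019-06-21',
                     timestamp '2019-06-23',
                     interval '1 day') AS g(d);
-- three rows: 2019-06-21, 2019-06-22, 2019-06-23
```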
So, removing all the flotsam we end up with this which does exactly the same as the query in your question but now we can at least see what is going on.
SELECT c.app_id, d.date::date AS date,
extract(epoch from (first_response_at - started_at)) AS response_time
FROM generate_series(CURRENT_DATE - 2, CURRENT_DATE, interval '1 day') d(date)
LEFT JOIN conversations c ON d.date::date = c.started_at::date
AND c.app_id = 'ASnYW1-RgCl0I'
AND c.first_response_at > CURRENT_DATE - 2;
You don't calculate the median response time anywhere, so that is a big problem you need to solve. This only requires data from table conversations and would look somewhat like this to calculate the median response time for the past 2 days:
SELECT app_id, started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = 'ASnYW1-RgCl0I'
AND first_response_at > CURRENT_DATE - 2
GROUP BY 1, 2;
When we fold the two queries, and put the parameters handily in a single place, this is the final result:
SELECT p.id, d.date::date AS date,
extract(epoch from (c.median_response)) AS response_time
FROM (VALUES ('ASnYW1-RgCl0I', 2)) p(id, days)
JOIN generate_series(CURRENT_DATE - p.days, CURRENT_DATE, interval '1 day') d(date) ON true
LEFT JOIN LATERAL (
SELECT started_at::date AS start_date,
percentile_disc(0.5) WITHIN GROUP (ORDER BY first_response_at - started_at) AS median_response
FROM conversations
WHERE app_id = p.id
AND first_response_at > CURRENT_DATE - p.days
GROUP BY 1) c ON d.date::date = c.start_date;
If you want to change the id of the app or the number of days to look back, you only have to change the VALUES clause accordingly. You can also wrap the whole thing in a SQL function and convert the VALUES clause into two parameters.

Summing Dates in Range Postgres

I have a table defined like so:
CREATE TABLE Items (
Barcode CHAR(50) PRIMARY KEY NOT NULL,
Location CHAR(15),
ManufacturedAt TIMESTAMP WITH TIME ZONE,
ShippedOutAt TIMESTAMP WITH TIME ZONE,
ReceivedAt TIMESTAMP WITH TIME ZONE,
SoldAt TIMESTAMP WITH TIME ZONE,
DiscardedAt TIMESTAMP WITH TIME ZONE
);
I am trying to get a sum of each date field for a location over the last twelve months.
So example results I am trying to get:
Date NumManu NumShip NumRece NumSold NumDisc
DEC 5 3 3 2 1
NOV 3 5 5 3 2
I am no sql expert by any means, but I am unsure of how to do this without doing 12 different sql queries (one for each month), or is that the only way? Thanks in advance!
It can be done with a single query, either with sub-selects (together with a set-returning function):
SELECT lo, hi, to_char(lo, 'MON') Date,
(SELECT count(*) FROM Items WHERE ManufacturedAt BETWEEN lo AND hi) NumManu,
(SELECT count(*) FROM Items WHERE ShippedOutAt BETWEEN lo AND hi) NumShip,
(SELECT count(*) FROM Items WHERE ReceivedAt BETWEEN lo AND hi) NumRece,
(SELECT count(*) FROM Items WHERE SoldAt BETWEEN lo AND hi) NumSold,
(SELECT count(*) FROM Items WHERE DiscardedAt BETWEEN lo AND hi) NumDisc
FROM generate_series(current_timestamp, current_timestamp - interval '11 mon', interval '-1 mon') ts,
LATERAL (select date_trunc('month', ts)) lo(lo),
LATERAL (select lo + interval '1 mon') hi(hi)
... or with multiple joins of the same table:
SELECT lo, hi, to_char(lo, 'MON') Date,
count(DISTINCT JManu) NumManu,
count(DISTINCT JShip) NumShip,
count(DISTINCT JRece) NumRece,
count(DISTINCT JSold) NumSold,
count(DISTINCT JDisc) NumDisc
FROM generate_series(current_timestamp, current_timestamp - interval '11 mon', interval '-1 mon') ts
CROSS JOIN LATERAL (select date_trunc('month', ts)) lo(lo)
CROSS JOIN LATERAL (select lo + interval '1 mon') hi(hi)
LEFT JOIN Items JManu ON JManu.ManufacturedAt BETWEEN lo AND hi
LEFT JOIN Items JShip ON JShip.ShippedOutAt BETWEEN lo AND hi
LEFT JOIN Items JRece ON JRece.ReceivedAt BETWEEN lo AND hi
LEFT JOIN Items JSold ON JSold.SoldAt BETWEEN lo AND hi
LEFT JOIN Items JDisc ON JDisc.DiscardedAt BETWEEN lo AND hi
GROUP BY lo, hi
I may be missing something but it seems as though you could just use COUNT and then either ORDER BY or GROUP BY month.
Although I see now that you are only keeping track of the month as actual dates, so you would have to perform some sort of logic check to get dates that are BETWEEN the start and end of each month.
I believe you would have to manually account for each month in the where clause if this is the case.
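
For what it is worth, the "COUNT and GROUP BY month" idea can be made to work without listing each month by hand. One possible sketch: unpivot the timestamp columns into (kind, ts) pairs with a LATERAL VALUES list, truncate each timestamp to its month, and count per kind with FILTER. The inline `items` data below is hypothetical and only three of the five date columns are shown to keep it short; the Location filter from the question is also left out.

```sql
-- Hypothetical sample rows standing in for the Items table.
WITH items (barcode, manufacturedat, shippedoutat, soldat) AS (
    VALUES
        ('A', now() - interval '40 days', now() - interval '35 days', now() - interval '3 days'),
        ('B', now() - interval '5 days',  NULL,                       NULL)
)
SELECT date_trunc('month', e.ts)                  AS month,
       count(*) FILTER (WHERE e.kind = 'manu')    AS nummanu,
       count(*) FILTER (WHERE e.kind = 'ship')    AS numship,
       count(*) FILTER (WHERE e.kind = 'sold')    AS numsold
FROM items i
-- turn each item's date columns into one row per (kind, timestamp)
CROSS JOIN LATERAL (
    VALUES ('manu', i.manufacturedat),
           ('ship', i.shippedoutat),
           ('sold', i.soldat)
) AS e(kind, ts)
-- current month plus the eleven before it; NULL timestamps drop out here
WHERE e.ts >= date_trunc('month', now()) - interval '11 months'
GROUP BY 1
ORDER BY 1 DESC;
```

This scans the table once, but a month in which nothing happened produces no row at all; joining against a generate_series() of months, as in the answers above, would fill those gaps.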