Get count of subscribers for each month in current year even if count is 0 - SQL

I need to get the count of new subscribers each month of the current year.
DB Structure: Subscriber(subscriber_id, create_timestamp, ...)
Expected result:
date | count
-----------+------
2021-01-01 | 3
2021-02-01 | 12
2021-03-01 | 0
2021-04-01 | 8
2021-05-01 | 0
I wrote the following query:
SELECT
    DATE_TRUNC('month', create_timestamp) AS create_timestamp,
    COUNT(subscriber_id) AS count
FROM subscriber
GROUP BY DATE_TRUNC('month', create_timestamp);
This works, but it does not include months where the count is 0. It only returns the months that exist in the table, like:
"2021-09-01 00:00:00" 3
"2021-08-01 00:00:00" 9

The first subquery generates one row per month of the year; it is then LEFT JOINed with another subquery that retrieves the per-month total_count. COALESCE() replaces NULL with 0.
-- PostgreSQL (v11)
SELECT t.cdate
     , COALESCE(p.total_count, 0) AS total_count
FROM (SELECT generate_series('2021-01-01'::timestamp, '2021-12-15', '1 month') AS cdate) t
LEFT JOIN (SELECT DATE_TRUNC('month', create_timestamp) AS create_timestamp
                , COUNT(subscriber_id) AS total_count
           FROM subscriber
           GROUP BY DATE_TRUNC('month', create_timestamp)) p
       ON t.cdate = p.create_timestamp;
See the db<>fiddle here: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=20dcf6c1784ed0d9c5772f2487bcc221

get the count of new subscribers each month of the current year
SELECT month::date, COALESCE(s.count, 0) AS count
FROM generate_series(date_trunc('year', LOCALTIMESTAMP)
, date_trunc('year', LOCALTIMESTAMP) + interval '11 month'
, interval '1 month') m(month)
LEFT JOIN (
SELECT date_trunc('month', create_timestamp) AS month
, count(*) AS count
FROM subscriber
GROUP BY 1
) s USING (month);
db<>fiddle here
That's assuming every row is a "new subscriber". So count(*) is simplest and fastest.
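If only some rows should count as new subscribers, filtering in the subquery keeps everything else unchanged. A sketch, assuming a hypothetical status column that marks new-subscriber rows:
SELECT month::date, COALESCE(s.count, 0) AS count
FROM generate_series(date_trunc('year', LOCALTIMESTAMP)
    , date_trunc('year', LOCALTIMESTAMP) + interval '11 month'
    , interval '1 month') m(month)
LEFT JOIN (
   SELECT date_trunc('month', create_timestamp) AS month
        , count(*) AS count
   FROM subscriber
   WHERE status = 'new'   -- hypothetical column; adjust to your schema
   GROUP BY 1
   ) s USING (month);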
See:
Join a count query on generate_series() and retrieve Null values as '0'
Generating time series between two dates in PostgreSQL

Get days of the week from a date range in Postgres

So I have the following table:
id end_date name number_of_days start_date
1 "2022-01-01" holiday1 1 "2022-01-01"
2 "2022-03-20" holiday2 1 "2022-03-20"
3 "2022-04-09" holiday3 1 "2022-04-09"
4 "2022-05-01" holiday4 1 "2022-05-01"
5 "2022-05-04" holiday5 3 "2022-05-02"
6 "2022-07-20" holiday6 9 "2022-07-12"
I want to check if a week falls in a holiday range.
So far I can select the holidays that overlap with my chosen week (week_start_date, week_end_date), but I can't get the exact days on which the overlap happens.
This is the query I'm using; I want to add a mechanism to detect the days of the week on which the overlap happens:
SELECT * FROM holidays
where daterange(CAST(start_date AS date), CAST(end_date as date), '[]') && daterange('2022-07-18', '2022-07-26','[]')
The current query returns the overlapping holiday (id = 6); however, I'm trying to get the exact days of the week on which the overlap happens (in this case, it should be Monday, Tuesday, Wednesday).
You can use the * (intersection) operator with tsrange values, generate a series of dates between the lower and upper bounds of the overlap, and finally print the days of the week with to_char(), e.g.
SELECT id, name, start_date, end_date, array_agg(dow) AS days
FROM (
  SELECT *,
         trim(
           to_char(
             generate_series(lower(overlap), upper(overlap), '1 day'),
             'Day')) AS dow
  FROM holidays
  CROSS JOIN LATERAL (SELECT tsrange(start_date, end_date) *
                             tsrange('2022-07-18', '2022-07-26')) t (overlap)
  WHERE tsrange(start_date, end_date) && tsrange('2022-07-18', '2022-07-26')) j
GROUP BY id, name, start_date, end_date, number_of_days;
id | name | start_date | end_date | days
----+----------+------------+------------+----------------------------
6 | holiday6 | 2022-07-12 | 2022-07-20 | {Monday,Tuesday,Wednesday}
(1 row)
Demo: db<>fiddle
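If you prefer to stay with daterange and the inclusive '[]' bounds used in the question, the same idea works. A sketch (not the query from the fiddle above): an inclusive daterange is stored with an exclusive upper bound, so the last day of the overlap is upper() minus one day.
SELECT id, name, start_date, end_date,
       array_agg(trim(to_char(d, 'Day')) ORDER BY d) AS days
FROM holidays
CROSS JOIN LATERAL (
       SELECT daterange(start_date::date, end_date::date, '[]')
            * daterange('2022-07-18', '2022-07-26', '[]')) o(overlap)
CROSS JOIN LATERAL generate_series(lower(overlap), upper(overlap) - 1, interval '1 day') g(d)
WHERE daterange(start_date::date, end_date::date, '[]') && daterange('2022-07-18', '2022-07-26', '[]')
GROUP BY id, name, start_date, end_date;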

SQL - How to group/count items by age and status on every date of a year?

I am trying to build a query from a multi-year data set (tickets table) of support tickets, with the relevant columns ticket_id, status, created_on date and closed_on date for each ticket. There is also a generic dates table I can join/query to get a list of dates.
I'd like to create a "burn down" chart for this year that displays the number of open tickets that were at least a year old on any given date this year. I have been able to create tables that use a sum(case... statement to group by a date - for example, to show how many tickets were created in a given week - but I can't figure out how to count, for every day or week this year, the number of tickets that were open on that day and at least a year old.
Any help is appreciated.
Example Data:
ticket_id | status | created_on | closed_on
--------------------------------------------
1 open 1/5/2019
2 open 1/26/2019
3 closed 1/28/2019 2/1/2020
4 open 6/1/2019
5 closed 6/5/2019 1/1/2020
Example Results I Seek:
Date (2020) | Count of Year+ Aged Tickets
------------------------------------------------
1/1/2020 0
1/2/2020 0
1/3/2020 0
1/4/2020 0
1/5/2020 1
1/6/2020 1
... (skipping dates here but want all dates in results)...
1/25/2020 1
1/26/2020 2
1/27/2020 2
1/28/2020 3
1/29/2020 3
1/30/2020 3
1/31/2020 3
2/1/2020 2
... (skipping dates here but want all dates up to current date in results)...
ticket_id 1 reached one year of age on 1/5/2020 and is still open (remains in count)
ticket_id 2 reached one year of age on 1/26/2020 and is still open (remains in count)
ticket_id 3 reached one year of age on 1/28/2020 and was still open, adding to the count, but was closed on 2/1/2020, reducing the count
ticket_id 4 will only add to the count if it is still open on 6/1/2020, but not if it is closed before then
ticket_id 5 will never appear in the count because it never reached one year of age and is closed
One option is to build a sequential list of dates, then bring in the table with a `left join` and conditional logic, and finally aggregate.
This would give the results you want for year 2020.
select d.dt, count(t.ticket_id) no_tickets
from (
select date '2020-01-01' + i * interval '1 day' dt
from generate_series(0, 365) i
) d
left join mytable t
on t.created_on + interval '1 year' <= d.dt
and (
t.closed_on is null
or t.closed_on > d.dt
)
group by d.dt
If your version of Redshift does not support generate_series(), you can emulate it with a custom number table, or with row_number() against a large table (say mylargetable):
select d.dt, count(t.ticket_id) no_tickets
from (
select date '2020-01-01' + (row_number() over(order by 1) - 1) * interval '1 day' dt
from mylargetable
) d
left join mytable t
on t.created_on + interval '1 year' <= d.dt
and (
t.closed_on is null
or t.closed_on > d.dt
)
where d.dt < date '2021-01-01'
group by d.dt
If ticket_id is unique, then you can do this to get all tickets at least 1 year old:
select ticket_id, created_on, status
from tickets
where status = 'open' and created_on <= dateadd(year, -1, getdate())
If you want to count the number of tickets per month, then:
select count(ticket_id), month(created_on), status
from tickets
where status = 'open' and created_on <= dateadd(year, -1, getdate())
group by month(created_on), status

Repeating a query which returns multiple columns over a date range in postgresql

If I have a table like this called orders:
id date value client_name
1 01/02/2017 50 Name 1
2 02/02/2017 60 Name 2
3 02/02/2017 60 Name 2
4 03/02/2017 20 Name 2
And a query which gets me details about the orders for a given day:
Select
sum(value) as total_order_value,
count(id) as number_of_orders
from orders
where orders.date = '02/03/2017'::date
Is there a way to construct a single query which will execute that query for a range of dates, giving one row per date executed for?
I've used something like this in the past:
select date(a),
(SUB QUERY WHICH RETURNS A SINGLE VALUE)
FROM generate_series(
(CURRENT_DATE - interval '2 months')::date,
CURRENT_DATE,
'1 day'
) s(a)
But this only works if the subquery returns a single value. I'm looking for a way to do this when the subquery returns multiple columns (such as in the first example above).
So from the first example above, a query which would generate the result set:
date total_order_value number_of_orders
01/02/2017 50 1
02/02/2017 120 2
03/02/2017 20 1
04/02/2017 0 0
05/02/2017 0 0
etc
An outer join to generate_series will include the days for which there are no orders
select
d::date,
coalesce(sum(value), 0) as total_order_value,
count(id) as number_of_orders
from
orders
right join
generate_series(
(current_date - interval '2 months')::date,
current_date,
'1 day'
) gs (d) on d = orders.date
group by d
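Alternatively, if you want to keep the original per-day query intact and run it once per generated date, a LATERAL subquery works on PostgreSQL 9.3+. A sketch of the same result:
select
  s.a::date as date,
  o.total_order_value,
  o.number_of_orders
from generate_series(
  (current_date - interval '2 months')::date,
  current_date,
  '1 day'
) s(a)
cross join lateral (
  -- the per-day query from the question, run once for each generated date
  select
    coalesce(sum(value), 0) as total_order_value,
    count(id) as number_of_orders
  from orders
  where orders.date = s.a::date
) o
order by s.a;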

Add Missing monthly dates in a timeseries data in Postgresql

I have monthly time series data in a table where the dates are the last day of each month. Some of the dates are missing from the data. I want to insert those missing dates and put a zero value in the other attributes.
Table is as follows:
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-04-30 34
2 2014-05-31 45
2 2014-08-31 47
I want to convert this table to
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-03-31 0
1 2015-04-30 34
2 2014-05-31 45
2 2014-06-30 0
2 2014-07-31 0
2 2014-08-31 47
Is there any way we can do this in PostgreSQL?
Currently we are doing this in Python, but as our data grows day by day it's not efficient to handle the I/O just for this one task.
Thank you
You can do this using generate_series() to generate the dates and then left join to bring in the values:
with m as (
select id, min(report_date) as minrd, max(report_date) as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
from m
) m left join
t
on m.id = t.id and m.report_date = t.report_date;
EDIT:
Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.
This is easily fixed:
with t as (
select 1 as id, date '2012-01-31' as report_date, 10 as price union all
select 1 as id, date '2012-04-30', 20
), m as (
select id, min(report_date) - interval '1 day' as minrd, max(report_date) - interval '1 day' as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) + interval '1 day' as report_date
from m
) m left join
t
on m.id = t.id and m.report_date = t.report_date;
The first CTE is just to generate sample data.
This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.
Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.
with minmax as (
select id, min(report_date) as mindt, max(report_date) as maxdt
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select *,
generate_series(date_trunc('MONTH',mindt+interval '1' day),
date_trunc('MONTH',maxdt+interval '1' day),
interval '1' month) - interval '1 day' as report_date
from minmax
) m
left join t on m.id = t.id and m.report_date = t.report_date
Sample Demo

Need to find Average of top 3 records grouped by ID in SQL

I have a Postgres table with customer IDs, dates, and integers. I need to find the average of the top 3 records for each customer ID that have dates within the last year. I can do it for a single ID using the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).
One caveat: the maximum values are per month, meaning we're only looking at the highest value in a given month to create our dataset, thus why we're extracting month from the date.
SELECT
    id,
    round(avg(max), 0)
FROM
    (
        SELECT
            id,
            extract(month from weekending) as month,
            extract(year from weekending) as year,
            max(maxattached) as max
        FROM
            myTable
        WHERE
            weekending >= now() - interval '1 year'
            AND id = 110070
        GROUP BY id, month, year
        ORDER BY max desc
        LIMIT 3
    ) AS t
GROUP BY id;
How can I expand this query to include all IDs and a single averaged number for each one?
Here is some sample data:
ID | MaxAttached | Weekending
110070 | 5 | 2011-11-10
110070 | 6 | 2011-11-17
110071 | 4 | 2011-11-10
110071 | 7 | 2011-11-17
110070 | 3 | 2011-12-01
110071 | 8 | 2011-12-01
110070 | 5 | 2012-01-01
110071 | 9 | 2012-01-01
So, for this sample table, I would expect to receive the following results:
ID | MaxAttached
110070 | 5
110071 | 8
This averages the highest value in a given month for each ID (6,3,5 for 110070 and 7,8,9 for 110071)
Note: postgres version 8.1.15
First - get the max(maxattached) for every customer and month:
SELECT id,
max(maxattached) as max_att
FROM myTable
WHERE weekending >= now() - interval '1 year'
GROUP BY id, date_trunc('month',weekending);
Next - for every customer rank all his values:
SELECT id,
max_att,
row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;
Next - get the top 3 for every customer:
SELECT id,
max_att
FROM <previous select here>
WHERE max_att_rank <= 3;
Next - get the avg of the values for every customer:
SELECT id,
avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;
Next - just put all the queries together and rewrite/simplify them for your case.
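Put together, the combined query might look like the sketch below (row_number() needs PostgreSQL 8.4+, so this will not run on 8.1; see UPDATE2 below for an 8.1-friendly variant):
-- steps 1-4 combined: monthly maxima, ranked per customer, top 3 averaged
SELECT id,
       round(avg(max_att), 0) as avg_att
FROM (SELECT id,
             max_att,
             row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
      FROM (SELECT id,
                   max(maxattached) as max_att
            FROM myTable
            WHERE weekending >= now() - interval '1 year'
            GROUP BY id, date_trunc('month', weekending)) monthly) ranked
WHERE max_att_rank <= 3
GROUP BY id;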
UPDATE: Here is an SQLFiddle with your test data and the queries: SQLFiddle.
UPDATE2: Here is a query that will work on 8.1:
SELECT customer_id,
(SELECT round(avg(max_att),0)
FROM (SELECT max(maxattached) as max_att
FROM table1
WHERE weekending >= now() - interval '2 year'
AND id = ct.customer_id
GROUP BY date_trunc('month',weekending)
ORDER BY max_att DESC
LIMIT 3) sub
) as avg_att
FROM customer_table ct;
The idea is to take your initial query and run it for every customer (customer_table is a table with one row per unique customer id).
Here is SQLFiddle with this query: SQLFiddle.
Only tested on version 8.3 (8.1 is too old to be on SQLFiddle).
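If there is no separate customer table, a distinct list of ids from myTable itself can stand in for customer_table. A sketch of the same correlated-subquery idea against the question's table and one-year window:
SELECT ct.customer_id,
       (SELECT round(avg(max_att), 0)
        FROM (SELECT max(maxattached) as max_att
              FROM myTable
              WHERE weekending >= now() - interval '1 year'
                AND id = ct.customer_id
              GROUP BY date_trunc('month', weekending)
              ORDER BY max_att DESC
              LIMIT 3) sub
       ) as avg_att
FROM (SELECT DISTINCT id as customer_id FROM myTable) ct;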
8.3 version
8.3 is the oldest version I've got access to, so I can't guarantee it'll work in 8.1
I'm using a temporary table to work out the best three records.
CREATE TABLE temp_highest_per_month as
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month,
0 as priority
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year;
UPDATE temp_highest_per_month t
SET priority =
(select count(*) from temp_highest_per_month t2
where t2.id = t.id and
(t.max_in_month < t2.max_in_month or
(t.max_in_month = t2.max_in_month and
t.year * 12 + t.month > t2.year * 12 + t2.month)));
select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority < 3
group by id;
The year and month are included in working out the priority so that if two months have the same maximum, they'll still be numbered correctly. Note that this priority is zero-based (the count of strictly better records), so the top 3 are priorities 0 through 2.
9.1 version
Similar to Igor's answer, but I used the With clause to split the steps.
with highest_per_month as
( select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year),
prioritised as
( select id, month, year, max_in_month,
row_number() over (partition by id
order by max_in_month desc)
as priority
from highest_per_month
)
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;