Get last 12 months data from Db with year in Postgres - sql

I want to fetch the last 12 months of data from the db. I have written a query for that, but it only gives me the count and the month, not the year, so I can't tell which year each month belongs to.
My SQL:
Select count(B.id),date_part('month',revision_timestamp) from package AS
A INNER JOIN package_revision AS B ON A.revision_id=B.revision_id
WHERE revision_timestamp > (current_date - INTERVAL '12 months')
GROUP BY date_part('month',revision_timestamp)
It gives me output like this:
month | count
-------+-------
7 | 21
8 | 4
9 | 10
but I want the year with the month, like 7 - 2012, or the year in another column; either is fine.

I believe you wanted this:
SELECT to_char(revision_timestamp, 'YYYY-MM'),
count(b.id)
FROM package a
JOIN package_revision b ON a.revision_id = b.revision_id
WHERE revision_timestamp >
date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
GROUP BY 1

select
count(B.id),
date_part('year', revision_timestamp) as year,
date_part('month',revision_timestamp) as month
from package as A
inner join package_revision as B on A.revision_id=B.revision_id
where
revision_timestamp > (current_date - INTERVAL '12 months')
group by
date_part('year', revision_timestamp),
date_part('month', revision_timestamp)
or
select
count(B.id),
to_char(revision_timestamp, 'YYYY-MM') as month
from package as A
inner join package_revision as B on A.revision_id=B.revision_id
where
revision_timestamp > (current_date - INTERVAL '12 months')
group by
to_char(revision_timestamp, 'YYYY-MM')
Keep in mind that if you filter by revision_timestamp > (current_date - INTERVAL '12 months'), the range starts on today's date one year ago (so if today is '2013-09-04', the range starts at '2012-09-04'), which means the earliest month is only partially counted.
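To make the difference between the two lower bounds concrete, here is a small sketch that evaluates both for the example date above. The column aliases rolling_start and month_aligned_start are only illustrative names, and the date is hard-coded in place of current_date so the result is reproducible.

```sql
-- Compare the two lower bounds used in the answers, for "today" = 2013-09-04.
SELECT DATE '2013-09-04' - INTERVAL '12 months'
           AS rolling_start,        -- 2012-09-04 00:00:00 (September 2012 is cut off partway)
       date_trunc('month', TIMESTAMP '2013-09-04') - INTERVAL '1 year'
           AS month_aligned_start;  -- 2012-09-01 00:00:00 (September 2012 is counted in full)
```

With the month-aligned bound, every month in the result is a complete calendar month except the current one.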

Related

Get count of subscribers for each month in current year even if count is 0

I need to get the count of new subscribers each month of the current year.
DB Structure: Subscriber(subscriber_id, create_timestamp, ...)
Expected result:
date | count
-----------+------
2021-01-01 | 3
2021-02-01 | 12
2021-03-01 | 0
2021-04-01 | 8
2021-05-01 | 0
I wrote the following query:
SELECT
DATE_TRUNC('month',create_timestamp)
AS create_timestamp,
COUNT(subscriber_id) AS count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp);
This works but does not include months where the count is 0. It only returns the months that exist in the table, like:
"2021-09-01 00:00:00" 3
"2021-08-01 00:00:00" 9
The first subquery generates one row for each month of the year. It is LEFT JOINed to a second subquery that computes the total count per month. COALESCE() replaces NULL with 0 for months that have no subscribers.
-- PostgreSQL (v11)
SELECT t.cdate
, COALESCE(p.total_count, 0) total_count
FROM (select generate_series('2021-01-01'::timestamp, '2021-12-15', '1 month') as cdate) t
LEFT JOIN (SELECT DATE_TRUNC('month',create_timestamp) create_timestamp
, COUNT(subscriber_id) total_count
FROM subscriber
GROUP BY DATE_TRUNC('month',create_timestamp)) p
ON t.cdate = p.create_timestamp
See the demo at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=20dcf6c1784ed0d9c5772f2487bcc221
get the count of new subscribers each month of the current year
SELECT month::date, COALESCE(s.count, 0) AS count
FROM generate_series(date_trunc('year', LOCALTIMESTAMP)
, date_trunc('year', LOCALTIMESTAMP) + interval '11 month'
, interval '1 month') m(month)
LEFT JOIN (
SELECT date_trunc('month', create_timestamp) AS month
, count(*) AS count
FROM subscriber
GROUP BY 1
) s USING (month);
That's assuming every row is a "new subscriber". So count(*) is simplest and fastest.
See:
Join a count query on generate_series() and retrieve Null values as '0'
Generating time series between two dates in PostgreSQL

Calculating Aggregates on subset of data based on condition

I have a DB as follows:
| company | timestamp | value |
| ------- | ---------- | ----- |
| google | 2020-09-01 | 5 |
| google | 2020-08-01 | 4 |
| amazon | 2020-09-02 | 3 |
I'd like to calculate the average value for each company within the last year if there are >= 20 datapoints. If there are fewer than 20 datapoints, I'd like the average over the entire time span instead. I know I can run two separate queries to get the averages for each scenario. The question is how to merge them back into a single table based on that criterion.
select company, avg(value) from my_db GROUP BY company;
select company, avg(value) from my_db
where timestamp > (CURRENT_DATE - INTERVAL '12 months')
GROUP BY company;
WITH last_year AS (
SELECT company, avg(value), 'year' AS range -- optional tag
FROM tbl
WHERE timestamp >= now() - interval '1 year'
GROUP BY 1
HAVING count(*) >= 20 -- 20+ rows in range
)
SELECT company, avg(value), 'all' AS range
FROM tbl t
WHERE NOT EXISTS (SELECT FROM last_year WHERE company = t.company)
GROUP BY 1
UNION ALL TABLE last_year;
An index on (timestamp) will only be used if your table is big and holds many years.
If most companies have 20+ rows in range, an index on (company) will be used for the 2nd SELECT to retrieve the few outliers.
Use conditional aggregation:
select company,
case
when sum(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end) >= 20 then
avg(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end)
else avg(value)
end
from my_db
group by company
If by 20 datapoints you mean 20 rows in the last 12 months for each company, then:
select company,
case
when count(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end) >= 20 then
avg(case when timestamp > CURRENT_DATE - INTERVAL '12 months' then value end)
else avg(value)
end
from my_db
group by company
You can use window functions to provide the information for filtering:
select company, avg(value),
(count(*) = cnt_this_year) as only_this_year
from (select t.*,
count(*) filter (where date_trunc('year', datecol) = date_trunc('year', now())) over (partition by company) as cnt_this_year
from t
) t
where cnt_this_year >= 20 and date_trunc('year', datecol) = date_trunc('year', now()) or
cnt_this_year < 20
group by company, cnt_this_year;
The third column specifies if all the rows are from this year. By filtering in the where clause, it is simple to add other calculations as well (such as min(), max(), and so on).

For LOOP in PostgreSQL

I have a table with the following columns:
(client_id, start_contract_date, end_contract_date)
Every client has a start_contract_date but some clients have a NULL for end_contract_date since they may still be active today.
If we check for a certain date D, a client is active if D is between start_contract_date and end_contract_date (or start_contract_date <= D, if end_contract_date is NULL.)
I want to count, for each month of each year, over 2016 until today, how many customers are active. My problem is that I do not know how to LOOP on the months and years.
I have a partial solution. I can count how many active clients for a specific month of a specific year.
SELECT 2016 as year , 7 as month, count(id_client)
FROM table
WHERE
EXTRACT(year from start_contract_date)<=2016
AND EXTRACT(month from start_contract_date)<=7
AND (EXTRACT(year from end_contract_date)>=2016 OR end_contract_date IS NULL)
AND (EXTRACT(month from end_contract_date)>=7 OR end_contract_date IS NULL)
;
So, how can I run a nested for loop that would be something like
FOR y IN 2016..2017
FOR m IN 1..12
I want the output to be
Year , Month , Count
2016 , 1 , 234
2016 , 2 , 54
…
2017 , 12 , 543
Use the function generate_series() to generate arbitrary series of months, e.g.:
select extract(year from d) as year, extract(month from d) as month
from generate_series('2017-11-01'::date, '2018-02-01', '1 month') d
year | month
------+-------
2017 | 11
2017 | 12
2018 | 1
2018 | 2
(4 rows)
Use the above and the function date_trunc() to extract year-month value from dates:
select extract(year from d) as year, extract(month from d) as month, count(id_client)
from generate_series('2016-01-01'::date, '2019-03-01', '1 month') d
left join my_table
on date_trunc('month', start_contract_date) <= date_trunc('month', d)
and (end_contract_date is null or date_trunc('month', end_contract_date) >= date_trunc('month', d))
group by d
order by d
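Note also that the conditions in your query contain a logical error: they compare the year and the month separately. For example, a contract that started in December 2015 fails the test EXTRACT(month from start_contract_date)<=7 even though it began well before July 2016.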
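As a sketch of how the single-month count could be written without that flaw, the query below compares whole dates against the boundaries of July 2016 instead of comparing year and month parts separately. The contracts CTE and its three rows are hypothetical sample data standing in for the real table; the column names follow the question.

```sql
-- Hypothetical sample rows standing in for the real contracts table.
WITH contracts(client_id, start_contract_date, end_contract_date) AS (
    VALUES (1, DATE '2015-12-10', NULL::date),          -- still active today
           (2, DATE '2016-03-01', DATE '2016-06-30'),   -- ended before July 2016
           (3, DATE '2016-07-20', DATE '2017-01-15')    -- started during July 2016
)
SELECT 2016 AS year, 7 AS month, count(client_id) AS active_clients
FROM contracts
WHERE start_contract_date < DATE '2016-08-01'           -- started before the month ended
  AND (end_contract_date IS NULL
       OR end_contract_date >= DATE '2016-07-01');      -- ended on or after the month began
-- active_clients = 2 (clients 1 and 3)
```

Client 1 is the case the original conditions miss: its start month (12) is greater than 7, so the separate month comparison rejects it.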

How to combine two different counts from two different date ranges sql

I'll try to keep this simple.
I have two queries that work just fine. They both count how many users signed up each day within a specific date range.
Query 1 - counts the users that signed up each day during the month one year ago.
SELECT users.created::date,
count(users.id)
FROM users
WHERE users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month'
AND users.thirdpartyid = 100
GROUP BY users.created::date
ORDER BY users.created::date
Query 2 - counts the users that signed up each day during the past month.
SELECT users.created::date,
count(users.id)
FROM users
WHERE users.created::date > now() - interval '1 month'
AND users.thirdpartyid = 100
GROUP BY users.created::date
ORDER BY users.created::date
What I'm stuck on is how to combine these two queries so that I can create a stacked bar graph on my redash website. They obviously cover different years, but I'd like my X axis to be the day of the month and the Y axis to be the number of users. Thank you.
Edit:
Here is an example output that I think would work perfectly for me.
| Day of the month | Users signed up December 2017 | Users signed up December 2018 |
| ---------------- | ----------------------------- | ----------------------------- |
| 01               | 45                            | 56                            |
| 02               | 47                            | 32                            |
etc...
You could try using filters. I took the liberty to select the day of month as you seem to want that rather than the full date.
SELECT date_part('day', users.created::date) as day_of_month,
count(users.id) FILTER (
WHERE users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month') AS month_12,
count(users.id) FILTER (
WHERE users.created::date > now() - interval '1 month') AS month_1
FROM users
WHERE (
(
users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month'
) OR users.created::date > now() - interval '1 month'
)
AND users.thirdpartyid = 100
GROUP BY day_of_month
ORDER BY day_of_month

Dynamic column names in view (Postgres)

I am currently programming an SQL view which should provide a count of a populated field for a particular month.
This is how I would like the view to be constructed:
Country | (Current Month - 12) Eg Feb 2011 | (Current Month - 11) | (Current Month - 10)
----------|----------------------------------|----------------------|---------------------
UK | 10 | 11 | 23
The number under the month should be a count of all populated fields for a particular country. The field is named eldate and is a date (cast as a char) of format 10-12-2011. I want the count to only count dates which match the month.
So column "Current Month - 12" should only include a count of dates which fall within the month which is 12 months before now. Eg Current Month - 12 for UK should include a count of dates which fall within February-2011.
I would like the column headings to actually reflect the month it is looking at so:
Country | Feb 2011 | March 2011 | April 2011
--------|----------|------------|------------
UK | 4 | 12 | 0
So something like:
SELECT c.country_name,
(SELECT COUNT("C1".eldate) FROM "C1" WHERE "C1".eldate LIKE %NOW()-12 Months% AS NOW() - 12 Months
(SELECT COUNT("C1".eldate) FROM "C1" WHERE "C1".eldate LIKE %NOW()-11 Months% AS NOW() - 11 Months
FROM country AS c
INNER JOIN "site" AS s using (country_id)
INNER JOIN "subject_C1" AS "C1" ON "s"."site_id" = "C1"."site_id"
Obviously this doesn't work but just to give you an idea of what I am getting at.
Any ideas?
Thank you for your help, any more queries please ask.
My first inclination is to produce this table:
+---------+-------+--------+
| Country | Month | Amount |
+---------+-------+--------+
| UK | Jan | 4 |
+---------+-------+--------+
| UK | Feb | 12 |
+---------+-------+--------+
etc. and pivot it. So you'd start with (for example):
SELECT
c.country,
EXTRACT(MONTH FROM s.eldate) AS month,
COUNT(*) AS amount
FROM country AS c
JOIN site AS s ON s.country_id = c.id
WHERE
s.eldate > NOW() - INTERVAL '1 year'
GROUP BY c.country, EXTRACT(MONTH FROM s.eldate);
You could then plug that into one of the crosstab functions from the tablefunc module to achieve the pivot, doing something like this:
SELECT *
FROM crosstab('<query from above goes here>')
AS ct(country varchar, january integer, february integer, ... december integer);
You could truncate the dates to make them comparable:
WHERE date_trunc('month', eldate) = date_trunc('month', now()) - interval '12 months'
UPDATE
This kind of replacement for your query:
(SELECT COUNT("C1".eldate) FROM "C1" WHERE date_trunc('month', "C1".eldate) =
date_trunc('month', now()) - interval '12 months') AS TWELVE_MONTHS_AGO
But that would involve a scan of the table for each month, so you could do a single scan with something more along these lines:
SELECT SUM( CASE WHEN date_trunc('month', "C1".eldate) = date_trunc('month', now()) - interval '12 months' THEN 1 ELSE 0 END ) AS TWELVE_MONTHS_AGO
,SUM( CASE WHEN date_trunc('month', "C1".eldate) = date_trunc('month', now()) - interval '11 months' THEN 1 ELSE 0 END ) AS ELEVEN_MONTHS_AGO
...
or do a join with a table of months as others are showing.
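A rough sketch of the "join with a table of months" variant follows, using generate_series() to build the months instead of a stored table. The c1 CTE and its rows are hypothetical sample data standing in for "C1", and eldate is assumed here to be a real date rather than a char column. It returns one row per month, so a pivot would still be needed to turn the months into columns.

```sql
-- Hypothetical sample rows standing in for "C1"; eldate is assumed to be a date.
WITH c1(eldate) AS (
    VALUES ((now() - interval '12 months')::date),
           ((now() - interval '12 months')::date),
           ((now() - interval '3 months')::date)
)
SELECT to_char(m.month_start, 'Mon YYYY') AS month,
       count(c1.eldate)                   AS amount   -- 0 for months with no rows
FROM generate_series(date_trunc('month', now()) - interval '12 months',
                     date_trunc('month', now()) - interval '1 month',
                     interval '1 month') AS m(month_start)
LEFT JOIN c1 ON date_trunc('month', c1.eldate) = m.month_start
GROUP BY m.month_start
ORDER BY m.month_start;
```

Because the months come from generate_series(), this scans the data once and still lists months that have no matching rows.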
UPDATE2
Further to the comment on fixing the columns from Jan to Dec, I was thinking of something like this: filter on the last year's worth of records, then sum on the appropriate month. Perhaps like this:
SELECT SUM( CASE WHEN EXTRACT(MONTH FROM "C1".eldate) = 1 THEN 1 ELSE 0 END ) AS JAN
,SUM( CASE WHEN EXTRACT(MONTH FROM "C1".eldate) = 2 THEN 1 ELSE 0 END ) AS FEB
...
WHERE date_trunc('month', "C1".eldate) < date_trunc('month', now())
AND date_trunc('month', "C1".eldate) >= date_trunc('month', now()) - interval '12 months'