How to combine two different counts from two different date ranges in SQL

I'll try to keep this simple.
I have two queries that work just fine; both count how many users signed up on each day within a specific date range.
Query 1 - counts the users who signed up on each day between 13 and 12 months ago. Here is a picture of the outcome.
SELECT users.created::date,
count(users.id)
FROM users
WHERE users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month'
AND users.thirdpartyid = 100
GROUP BY users.created::date
ORDER BY users.created::date
Query 2 - counts the users who signed up on each day during the last month. Here is a picture of this outcome.
SELECT users.created::date,
count(users.id)
FROM users
WHERE users.created::date > now() - interval '1 month'
AND users.thirdpartyid = 100
GROUP BY users.created::date
ORDER BY users.created::date
What I'm stuck on is how to combine these two queries so that I can create a stacked bar graph on my Redash site. They obviously cover different years, but I'd like the X axis to be the day of the month and the Y axis to be the number of users. Thank you.
Edit:
Here is an example output that I think would work perfectly for me.
| Day of the month | Users signed up December 2017 | Users signed up December 2018 |
|------------------|-------------------------------|-------------------------------|
| 01 | 45 | 56 |
| 02 | 47 | 32 |
etc...

You could try using aggregate FILTER clauses. I took the liberty of selecting the day of the month, as you seem to want that rather than the full date.
SELECT date_part('day', users.created::date) as day_of_month,
count(users.id) FILTER (
WHERE users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month') AS month_12,
count(users.id) FILTER (
WHERE users.created::date > now() - interval '1 month') AS month_1
FROM users
WHERE (
(
users.created::date < now() - interval '12 month'
AND users.created::date > now() - interval '13 month'
) OR users.created::date > now() - interval '1 month'
)
AND users.thirdpartyid = 100
GROUP BY day_of_month
ORDER BY day_of_month
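To see what the two FILTER clauses do with each row, here is a plain-Python model of the same pivot. It is not PostgreSQL. The function name `pivot_by_day_of_month` and the sample dates are hypothetical. Each window is treated as an open interval, mirroring the strict `<` and `>` comparisons in the query.

```python
from collections import defaultdict
from datetime import date


def pivot_by_day_of_month(signups, window_a, window_b):
    """Count signups per day of month, with one column per date window.

    signups: iterable of datetime.date values.
    window_a, window_b: (start, end) tuples; both bounds are exclusive,
    like the strict < and > comparisons in the SQL query.
    """
    counts = defaultdict(lambda: [0, 0])
    for d in signups:
        # Equivalent of count(...) FILTER (WHERE <window a>)
        if window_a[0] < d < window_a[1]:
            counts[d.day][0] += 1
        # Equivalent of count(...) FILTER (WHERE <window b>)
        if window_b[0] < d < window_b[1]:
            counts[d.day][1] += 1
    # Days matching neither window never get a key, as with the WHERE clause;
    # sorting the keys plays the role of ORDER BY day_of_month.
    return [(day, a, b) for day, (a, b) in sorted(counts.items())]


signups = [date(2017, 12, 1), date(2017, 12, 1), date(2017, 12, 2),
           date(2018, 12, 1), date(2018, 12, 3), date(2018, 6, 15)]
print(pivot_by_day_of_month(
    signups,
    window_a=(date(2017, 11, 30), date(2018, 1, 1)),
    window_b=(date(2018, 11, 30), date(2019, 1, 1)),
))  # [(1, 2, 1), (2, 1, 0), (3, 0, 1)]
```

The June 2018 signup falls outside both windows, so it never produces a row. This matches how the outer WHERE clause discards it before grouping.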

Related

SQL: why does dateA - dateB <= '3 years' give a different result than dateA <= dateB + '3 years'?

I was doing a MODE.com SQL practice question about date format.
The practice question is: Write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.
It uses two tables:
tutorial.crunchbase_companies_clean_date table, which includes information about all the companies, like company name, founded year, etc.
tutorial.crunchbase_acquisitions_clean_date table, which includes information about all the acquired companies, like acquired company name, acquired date, etc.
My code is:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acq
ON companies.permalink = acq.company_permalink
WHERE companies.founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY total DESC
The result is:
[screenshot: My result]
The answer query is:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC
The result is:
[screenshot: The answer result]
You can see in the screenshots that the results are very similar, but some numbers are different.
The only difference I can see between my query and the answer is in the COUNT expressions, but I don't really see how they differ, for example, between acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= '3 years' and acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL '3 years'.
I tried adding INTERVAL in my SELECT statement:
SELECT companies.category_code,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '3 years' THEN 1 ELSE NULL END) AS less_than_3_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '5 years' THEN 1 ELSE NULL END) AS between_3_to_5_years,
COUNT(CASE WHEN acq.acquired_at_cleaned - companies.founded_at_clean:: timestamp <= INTERVAL '10 years' THEN 1 ELSE NULL END) AS within_10_years,
COUNT(1) AS total
and removing INTERVAL from the answer query:
SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '3 years'
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '5 years'
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + '10 years'
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
But the results did not change.
I also checked the raw difference between the acquired date and the founded date, to see whether that value can be compared with an INTERVAL. The result is in days, which looks promising to me.
[screenshot: The result]
I've tried to give all the relevant information for your consideration. I hope somebody can help. Thank you in advance!
My suggestion is to add/subtract the INTERVAL to/from one date/time and then compare the result with the other date/time. Don't subtract the date/times and then compare the difference to a string literal. When PostgreSQL compares intervals, it treats a month as 30 days, so '3 YEARS' (36 months) compares like 1080 days, regardless of the actual number of days between someDateTime and someDateTime +/- '3 YEARS'. That actual number is 1095 or 1096, because each year has 365 or 366 days depending on whether a leap day is crossed.
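A plain-Python sketch of the two predicates, not PostgreSQL itself, makes the difference concrete. It assumes PostgreSQL's rule that interval comparisons count a month as 30 days, so `'3 years'` compares like 1080 days. The helper `add_years` and the sample dates are hypothetical. `add_years` clamps Feb 29 to Feb 28, similar to how PostgreSQL's month arithmetic clamps to the end of the month.

```python
from datetime import date

# Assumption: PostgreSQL compares intervals with 1 month = 30 days,
# so interval '3 years' (36 months) compares like 1080 days.
NOMINAL_3_YEARS_DAYS = 36 * 30


def add_years(d, years):
    """Calendar-year addition; Feb 29 becomes Feb 28 in a non-leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def within_3_years_subtract(founded, acquired):
    # acquired - founded <= '3 years'  (day count vs. nominal interval)
    return (acquired - founded).days <= NOMINAL_3_YEARS_DAYS


def within_3_years_add(founded, acquired):
    # acquired <= founded + interval '3 years'  (calendar arithmetic)
    return acquired <= add_years(founded, 3)


founded = date(2018, 1, 1)
acquired = date(2020, 12, 31)  # 1095 days later, still inside 3 calendar years
print((acquired - founded).days)                   # 1095
print(within_3_years_subtract(founded, acquired))  # False
print(within_3_years_add(founded, acquired))       # True
```

Under these assumptions, the answer query counts an acquisition that lands after day 1080 but on or before the third calendar anniversary, while the subtract-and-compare version does not. That would be consistent with only some of the numbers differing.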
Here's a simple example of comparing with a specific interval, which also requires that we know whether, and how many, leap years were crossed.
The test case:
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = '1 YEAR' AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = '366 DAYS' AS b3
FROM dates
;
-- AND --
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '1' YEAR AS b1
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '365 DAYS' AS b2
, xdate - (xdate - INTERVAL '1' YEAR) = INTERVAL '366 DAYS' AS b3
FROM dates
;
Result:
| diff | b1 | b2 | b3 |
|------|----|----|----|
| 366 days | f | f | t |
WITH dates AS (
SELECT '2021-01-01'::date AS xdate
)
, diff AS (
SELECT xdate - (xdate - INTERVAL '1' YEAR) AS diff
FROM dates
)
SELECT diff
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare1
, 366*24*60*60 AS seconds
, CASE WHEN diff = (366*24*60*60 * INTERVAL '1' SECOND)
THEN 1
END AS compare2
, CASE WHEN diff = '31622400 SECONDS'
THEN 1
END AS compare3
FROM diff
;
The result:
| diff | compare1 | seconds | compare2 | compare3 |
|------|----------|---------|----------|----------|
| 366 days | 1 | 31622400 | 1 | 1 |
Original response:
The behavior shown below is similar to the behavior described in the question.
The problem is that the value generated isn't necessarily what you think.
Here's a test case in PostgreSQL that might be representative of your issue.
Maybe this is related to leap years, where the number of days in a year isn't constant.
So it's probably safer to compare the dates rather than assume some number of days, which is probably what the <= '3 years' comparison does.
The test SQL:
WITH test (acquired_at_cleaned, founded_at_clean, n) AS (
SELECT current_date, current_date - INTERVAL '4' YEAR, 4 UNION
SELECT current_date, current_date - INTERVAL '3' YEAR, 3 UNION
SELECT current_date, current_date - INTERVAL '2' YEAR, 2 UNION
SELECT current_date, current_date - INTERVAL '1' YEAR, 1
)
, cases AS (
SELECT test.*
, CASE WHEN acquired_at_cleaned <= founded_at_clean::timestamp + INTERVAL '3' year
THEN 1 ELSE NULL
END AS acquired_3_yrs_case1
, CASE WHEN acquired_at_cleaned - founded_at_clean::timestamp <= '3 year'
THEN 1 ELSE NULL
END AS acquired_3_yrs_case2
, acquired_at_cleaned - founded_at_clean::timestamp AS x1
, acquired_at_cleaned - (n * INTERVAL '1' YEAR) AS x2
FROM test
)
SELECT acquired_at_cleaned AS acquired
, founded_at_clean AS founded
, n
, acquired_3_yrs_case1 AS case1
, acquired_3_yrs_case2 AS case2
, x1, x2
FROM cases
ORDER BY founded_at_clean
;
The result:
| acquired | founded | n | case1 | case2 | x1 | x2 |
|----------|---------|---|-------|-------|----|----|
| 2021-12-25 | 2017-12-25 00:00:00 | 4 | null | null | 1461 days | 2017-12-26 00:00:00 |
| 2021-12-25 | 2018-12-25 00:00:00 | 3 | 1 | null | 1096 days | 2018-12-26 00:00:00 |
| 2021-12-25 | 2019-12-25 00:00:00 | 2 | 1 | 1 | 731 days | 2019-12-26 00:00:00 |
| 2021-12-25 | 2020-12-25 00:00:00 | 1 | 1 | 1 | 365 days | 2020-12-26 00:00:00 |
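The x1, case1 and case2 columns can be cross-checked with a plain-Python model. This is not PostgreSQL. It assumes the same 30-day-month rule when an interval of days is compared with `'3 year'`, and it fixes the acquisition date at 2021-12-25 to match the result above. The helper `shift_years` is hypothetical.

```python
from datetime import date

NOMINAL_3_YEARS_DAYS = 36 * 30  # assumption: 1 month = 30 days in interval comparisons


def shift_years(d, years):
    """Calendar-year shift; Feb 29 becomes Feb 28 in a non-leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


acquired = date(2021, 12, 25)
rows = []
for n in (4, 3, 2, 1):
    founded = shift_years(acquired, -n)
    x1 = (acquired - founded).days
    # case1: acquired <= founded + interval '3 year'
    case1 = acquired <= shift_years(founded, 3)
    # case2: acquired - founded <= '3 year'
    case2 = x1 <= NOMINAL_3_YEARS_DAYS
    rows.append((n, x1, case1, case2))

for row in rows:
    print(row)
# (4, 1461, False, False)
# (3, 1096, True, False)
# (2, 731, True, True)
# (1, 365, True, True)
```

The n = 3 row is where the two predicates disagree. Three calendar years from 2018-12-25 span 1096 days because 2020 contains a leap day, but 1096 is more than the 1080 days that `'3 year'` compares as.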
Interesting result.

Last 14 days vs. the 14 days before that

orders
| id | order_id | created_at | updated_at | total_amount |
|----|----------|------------|------------|--------------|
| 1 | abc123 | 2021-06-13 11:00:00 | 2021-06-13 11:00:00 | 230.5 |
| 2 | abc456 | 2021-06-01 07:00:00 | 2021-06-01 07:00:00 | 240 |
To get the number of purchases in the last 7 days vs. the 7 days before that, I wrote the following query:
select
date_trunc('week', created_at) as "Week",
count(*) "No of purchases"
from orders
group by 1
order by 1
How can I get the number of purchases in the last 14 days vs. the 14 days before that?
Is there a way I can pass something like '14 days' to the date_trunc function?
If not, how can I write this query?
Why not just use date comparisons?
select count(*) as num_purchases
from orders
where created_at >= current_date - interval '14 day'
Just subtract the respective intervals from now() (or any other function/variable that gives you the right time for the current moment) and compare that to the creation timestamp. Something along the lines of:
SELECT count(*)
FROM orders
WHERE created_at >= now() - '14 days'::interval
AND created_at < now() - '7 days'::interval;
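Both periods can also be counted in one pass by classifying each order according to how long before a reference time it was created. Here is a plain-Python sketch of that bucketing. It is not a PostgreSQL query. The `now` value and the function name `purchases_last_vs_previous` are hypothetical, and the bounds are half-open like `created_at >= now() - '14 days'` above, with an added upper bound at `now`.

```python
from datetime import datetime, timedelta


def purchases_last_vs_previous(created_ats, now, days=14):
    """Return (count in the last `days` days, count in the `days` days before that)."""
    period = timedelta(days=days)
    last = previous = 0
    for ts in created_ats:
        if now - period <= ts < now:
            last += 1        # now() - 14 days <= created_at < now()
        elif now - 2 * period <= ts < now - period:
            previous += 1    # the 14-day window just before that
    return last, previous


# created_at values from the orders table in the question
orders = [datetime(2021, 6, 13, 11, 0), datetime(2021, 6, 1, 7, 0)]
print(purchases_last_vs_previous(orders, now=datetime(2021, 6, 20, 12, 0)))  # (1, 1)
```

Changing `days` changes both windows together. This is the "pass '14 days'" flexibility asked about, without going through date_trunc.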

Sum results over a constant time range for each date in a table

I'm using a PostgreSQL database.
I have a table that contains test names, their results, and their report times:
| test_name | result | report_time |
|-----------|--------|-------------|
| A | error | 29/11/2020 |
| A | failure | 28/12/2020 |
| A | error | 29/12/2020 |
| B | passed | 30/12/2020 |
| C | failure | 31/12/2020 |
| A | error | 31/12/2020 |
I'd like to count, per date, how many tests failed or errored in the previous 30 days (limited to 5 days back from the current date), so the final result would be:
| date | sum | (notes) |
|------|-----|---------|
| 29/11/2020 | 1 | 1 failed/errored test in range (29/11 -> 29/10) |
| 28/12/2020 | 2 | 2 failed/errored tests in range (28/12 -> 28/11) |
| 29/12/2020 | 3 | 3 failed/errored tests in range (29/12 -> 29/11) |
| 30/12/2020 | 2 | 2 failed/errored tests in range (30/12 -> 30/11) |
| 31/12/2020 | 4 | 4 failed/errored tests in range (31/12 -> 31/11) |
I know how to count the results per date (i.e., how many failures/errors occurred on a specific date):
SELECT report_time::date AS "Report Time",
count(case when result in ('failure', 'error') then 1 else null end) AS failed_or_errored
from tests
where report_time::date = now()::date
GROUP BY report_time::date
But I'm struggling to compute, for each date, the sum over the 30 days before it.
You can generate the dates and then use window functions:
select gs.dte, num_failed_error, num_failed_error_30
from generate_series(current_date - interval '5 day', current_date, interval '1 day') gs(dte) left join
(select t.report_time, count(*) as num_failed_error,
sum(count(*)) over (order by report_time range between interval '30 day' preceding and current row) as num_failed_error_30
from t
where t.result in ('failure', 'error') and
t.report_time >= current_date - interval '35 day'
group by t.report_time
) t
on t.report_time = gs.dte ;
Note: This assumes that report_time is only the date with no time component. If it has a time component, use report_time::date.
If you have data on each day, then this can be simplified to:
select t.report_time, count(*) as num_failed_error,
sum(count(*)) over (order by report_time range between interval '30 day' preceding and current row) as num_failed_error_30
from t
where t.result in ('failure', 'error') and
t.report_time >= current_date - interval '35 day'
group by t.report_time
order by report_time desc
limit 5;
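The window frame `range between interval '30 day' preceding and current row` includes both endpoints. A plain-Python check against the sample rows from the question (not PostgreSQL) reproduces the expected sums. The function name `failed_in_window` is hypothetical, and the sketch evaluates every report date in the sample, including 30/12, which only has a passed test.

```python
from datetime import date, timedelta

# Sample rows from the question: (test_name, result, report_time)
rows = [
    ("A", "error",   date(2020, 11, 29)),
    ("A", "failure", date(2020, 12, 28)),
    ("A", "error",   date(2020, 12, 29)),
    ("B", "passed",  date(2020, 12, 30)),
    ("C", "failure", date(2020, 12, 31)),
    ("A", "error",   date(2020, 12, 31)),
]


def failed_in_window(rows, day, window_days=30):
    """Count failure/error rows with day - window_days <= report_time <= day.

    Both endpoints are included, like RANGE BETWEEN INTERVAL '30 day'
    PRECEDING AND CURRENT ROW.
    """
    start = day - timedelta(days=window_days)
    return sum(1 for _, result, t in rows
               if result in ("failure", "error") and start <= t <= day)


report_days = sorted({t for _, _, t in rows})  # the five report dates in the sample
for day in report_days:
    print(day.strftime("%d/%m/%Y"), failed_in_window(rows, day))
# 29/11/2020 1
# 28/12/2020 2
# 29/12/2020 3
# 30/12/2020 2
# 31/12/2020 4
```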
Since I'm using PostgreSQL 10.12 and upgrading is currently not an option (RANGE frames with an interval offset require PostgreSQL 11 or later), I took a different approach: I generate the dates of the last 30 days, and for each date I calculate the cumulative distinct sum for the past 30 days:
SELECT days_range::date, SUM(number_of_tests)
FROM generate_series (now() - interval '30 day', now()::timestamp , '1 day'::interval) days_range
CROSS JOIN LATERAL (
SELECT COUNT(DISTINCT(test_name)) as number_of_tests from tests
WHERE report_time > days_range - interval '30 day'
GROUP BY report_time::date
HAVING COUNT(case when result in ('failure', 'error') then 1 else null end) > 0
ORDER BY report_time::date asc
) as lateral_query
GROUP BY days_range
ORDER BY days_range desc
It is definitely not well optimized; it takes about a minute to compute.

Get the last 12 months of data, with the year, from a database in Postgres

I want to fetch the last 12 months of data from the database. I have written a query for that, but it only gives me the count and the month, not the year, so I can't tell which year each month belongs to.
My SQL:
Select count(B.id),date_part('month',revision_timestamp) from package AS
A INNER JOIN package_revision AS B ON A.revision_id=B.revision_id
WHERE revision_timestamp > (current_date - INTERVAL '12 months')
GROUP BY date_part('month',revision_timestamp)
It gives me output like this:
month | count
-------+-------
7 | 21
8 | 4
9 | 10
But I want the year along with the month, like 7 - 2012, or the year in another column; either is fine.
I believe you wanted this:
SELECT to_char(revision_timestamp, 'YYYY-MM'),
count(b.id)
FROM package a
JOIN package_revision b ON a.revision_id = b.revision_id
WHERE revision_timestamp >
date_trunc('month', CURRENT_DATE) - INTERVAL '1 year'
GROUP BY 1
select
count(B.id),
date_part('year', revision_timestamp) as year,
date_part('month',revision_timestamp) as month
from package as A
inner join package_revision as B on A.revision_id=B.revision_id
where
revision_timestamp > (current_date - INTERVAL '12 months')
group by
date_part('year', revision_timestamp),
date_part('month', revision_timestamp)
or
select
count(B.id),
to_char(revision_timestamp, 'YYYY-MM') as month
from package as A
inner join package_revision as B on A.revision_id=B.revision_id
where
revision_timestamp > (current_date - INTERVAL '12 months')
group by
to_char(revision_timestamp, 'YYYY-MM')
Keep in mind that if you filter by revision_timestamp > (current_date - INTERVAL '12 months'), the range starts at the current date one year ago (so if today is '2013-09-04', the range starts at '2012-09-04'), which means the oldest month is only partially counted.
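The effect of the two cutoffs can be modeled in plain Python by grouping on a 'YYYY-MM' key, as `to_char(revision_timestamp, 'YYYY-MM')` does. This is a sketch, not PostgreSQL. The revision dates and the function name `monthly_counts` are hypothetical, and today is fixed at 2013-09-04 to match the example above.

```python
from collections import Counter
from datetime import date


def monthly_counts(timestamps, cutoff):
    """Count rows per 'YYYY-MM' for timestamps strictly after cutoff."""
    counter = Counter(ts.strftime("%Y-%m") for ts in timestamps if ts > cutoff)
    return dict(sorted(counter.items()))


today = date(2013, 9, 4)
# current_date - INTERVAL '12 months'  -> same day last year
# (replace(year=...) is safe here because the example date is not Feb 29)
cutoff_same_day = today.replace(year=today.year - 1)
# date_trunc('month', CURRENT_DATE) - INTERVAL '1 year' -> first of that month last year
cutoff_month_start = today.replace(year=today.year - 1, day=1)

revisions = [date(2012, 9, 2), date(2012, 9, 10), date(2013, 1, 5)]  # hypothetical
print(monthly_counts(revisions, cutoff_same_day))     # {'2012-09': 1, '2013-01': 1}
print(monthly_counts(revisions, cutoff_month_start))  # {'2012-09': 2, '2013-01': 1}
```

The same-day cutoff drops the 2012-09-02 row, so September 2012 looks smaller than it really was. The month-start cutoff from the first answer keeps the whole month.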

Dynamic column names in view (Postgres)

I am currently programming an SQL view which should provide a count of a populated field for a particular month.
This is how I would like the view to be constructed:
Country | (Current Month - 12) Eg Feb 2011 | (Current Month - 11) | (Current Month - 10)
----------|----------------------------------|----------------------|---------------------
UK | 10 | 11 | 23
The number under the month should be a count of all populated fields for a particular country. The field is named eldate and is a date (cast as a char) of format 10-12-2011. I want the count to only count dates which match the month.
So column "Current Month - 12" should only include a count of dates that fall within the month 12 months before now. E.g., Current Month - 12 for UK should include a count of dates that fall within February 2011.
I would like the column headings to actually reflect the month it is looking at so:
Country | Feb 2011 | March 2011 | April 2011
--------|----------|------------|------------
UK | 4 | 12 | 0
So something like:
SELECT c.country_name,
(SELECT COUNT("C1".eldate) FROM "C1" WHERE "C1".eldate LIKE %NOW()-12 Months% AS NOW() - 12 Months
(SELECT COUNT("C1".eldate) FROM "C1" WHERE "C1".eldate LIKE %NOW()-11 Months% AS NOW() - 11 Months
FROM country AS c
INNER JOIN "site" AS s using (country_id)
INNER JOIN "subject_C1" AS "C1" ON "s"."site_id" = "C1"."site_id"
Obviously this doesn't work but just to give you an idea of what I am getting at.
Any ideas?
Thank you for your help; if you have any more questions, please ask.
My first inclination is to produce this table:
+---------+-------+--------+
| Country | Month | Amount |
+---------+-------+--------+
| UK | Jan | 4 |
+---------+-------+--------+
| UK | Feb | 12 |
+---------+-------+--------+
etc. and pivot it. So you'd start with (for example):
SELECT
c.country,
EXTRACT(MONTH FROM s.eldate) AS month,
COUNT(*) AS amount
FROM country AS c
JOIN site AS s ON s.country_id = c.id
WHERE
s.eldate > NOW() - INTERVAL '1 year'
GROUP BY c.country, EXTRACT(MONTH FROM s.eldate);
You could then plug that into one of the crosstab functions from the tablefunc module to achieve the pivot, doing something like this:
SELECT *
FROM crosstab('<query from above goes here>')
AS ct(country varchar, january integer, february integer, ... december integer);
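The pivot step can be modeled in plain Python. This is not tablefunc itself. The sketch places each amount by its month number and leaves missing months empty, which is what the two-argument form `crosstab(source_sql, category_sql)` does. The one-argument form used above fills columns left to right instead, so a missing month would shift later values. The function name `pivot_months` and the FR row are hypothetical.

```python
from collections import defaultdict


def pivot_months(rows):
    """Pivot (country, month, amount) rows into {country: [amount for Jan..Dec]}.

    Each amount is placed by its month number; months with no row stay None,
    as NULL does in the pivoted output.
    """
    table = defaultdict(lambda: [None] * 12)
    for country, month, amount in rows:
        table[country][month - 1] = amount
    return dict(table)


rows = [("UK", 1, 4), ("UK", 2, 12), ("FR", 3, 7)]  # hypothetical counts
pivoted = pivot_months(rows)
print(pivoted["UK"][:3])  # [4, 12, None]
print(pivoted["FR"][:3])  # [None, None, 7]
```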
You could truncate the dates to make them comparable:
WHERE date_trunc('month', eldate) = date_trunc('month', now()) - interval '12 months'
UPDATE
Something like this as a replacement for each of your subqueries:
(SELECT COUNT("C1".eldate) FROM "C1" WHERE date_trunc('month', "C1".eldate) =
date_trunc('month', now()) - interval '12 months') AS TWELVE_MONTHS_AGO
But that would involve a scan of the table for each month, so you could do a single scan with something more along these lines:
SELECT SUM( CASE WHEN date_trunc('month', "C1".eldate) = date_trunc('month', now()) - interval '12 months' THEN 1 ELSE 0 END ) AS TWELVE_MONTHS_AGO
,SUM( CASE WHEN date_trunc('month', "C1".eldate) = date_trunc('month', now()) - interval '11 months' THEN 1 ELSE 0 END ) AS ELEVEN_MONTHS_AGO
...
or do a join with a table of months as others are showing.
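A view's column names are fixed when the view is created. As an illustration only, here is a plain-Python sketch of the single-scan idea: one pass over the rows fills every "months ago" bucket, as one SUM(CASE WHEN ... THEN 1 ELSE 0 END) per column would, and the labels such as 'Feb 2011' are built from the current date outside SQL. The function names, the fixed "today" and the sample dates are hypothetical. The sketch assumes eldate has already been parsed into dates, although the question stores it as text.

```python
from datetime import date


def months_ago(d, today):
    """Number of calendar months between d's month and today's month."""
    return (today.year - d.year) * 12 + (today.month - d.month)


def counts_by_months_ago(eldates, today, max_back=12):
    """Count dates per month, 1..max_back months before today's month, in one pass.

    Returns {label: count}, oldest month first; labels such as 'Feb 2011'
    are derived from `today` (English abbreviations in the default C locale).
    """
    counts = [0] * (max_back + 1)
    for d in eldates:
        k = months_ago(d, today)
        if 1 <= k <= max_back:  # like one SUM(CASE WHEN ... THEN 1 ELSE 0 END) per column
            counts[k] += 1
    result = {}
    for k in range(max_back, 0, -1):
        index = today.year * 12 + (today.month - 1) - k  # months since year 0
        label = date(index // 12, index % 12 + 1, 1).strftime("%b %Y")
        result[label] = counts[k]
    return result


today = date(2012, 2, 15)  # hypothetical "now"
eldates = [date(2011, 2, 10), date(2011, 2, 20), date(2011, 3, 1), date(2012, 2, 1)]
result = counts_by_months_ago(eldates, today)
print(list(result.items())[:3])  # [('Feb 2011', 2), ('Mar 2011', 1), ('Apr 2011', 0)]
```

The date in the current month (2012-02-01) is zero months ago, so it is left out. This matches the question's columns, which run from Current Month - 12 up to Current Month - 1.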
UPDATE2
Further to the comment about fixing the columns as Jan to Dec, I was thinking of something like this: filter on the last year's worth of records, then sum per month. Perhaps like this:
SELECT SUM( CASE WHEN EXTRACT(MONTH FROM "C1".eldate) = 1 THEN 1 ELSE 0 END ) AS JAN
,SUM( CASE WHEN EXTRACT(MONTH FROM "C1".eldate) = 2 THEN 1 ELSE 0 END ) AS FEB
...
WHERE date_trunc('month', "C1".eldate) < date_trunc('month', now())
AND date_trunc('month', "C1".eldate) >= date_trunc('month', now()) - interval '12 months'