Subtraction of counts of 2 tables - sql

I have 2 different tables, A and B. A holds rows that were created and B holds rows that were removed.
I want to obtain the net difference of the counts per week in an SQL query.
Currently I have
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New"
FROM table_name.A
GROUP BY 1
ORDER BY 1
This gets me the count per week for table A only. How could I incorporate the logic of subtracting the same Count(id) from B, for the same timeframe?
Thanks! :)

The potential issue here is that any given week might have only additions or only removals. To align the counts from the 2 tables by week, one approach is a full outer join, like this:
SELECT COALESCE(a.week, b.week) AS week
, count_a
, count_b
, COALESCE(count_a,0) - COALESCE(count_b,0) AS net -- a week missing from one side counts as 0
FROM (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_a
FROM table_a
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) a
FULL OUTER JOIN (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_b
FROM table_b
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) b ON a.week = b.week
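The week-alignment concern above can also be handled without a join. The following is a minimal sketch, not the answer's own method: it stacks both tables with UNION ALL, tags each row +1 or -1, and sums per week. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, made-up sample rows, a hypothetical created_at column, and SQLite's date(x, 'weekday 0', '-6 days') in place of DATE_TRUNC('week', x).

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; tables, columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, created_at TEXT);  -- created rows
    CREATE TABLE table_b (id INTEGER, created_at TEXT);  -- removed rows
    INSERT INTO table_a VALUES (1, '2024-01-01'), (2, '2024-01-03'), (3, '2024-01-09');
    INSERT INTO table_b VALUES (1, '2024-01-04'), (2, '2024-01-16');
""")

# date(x, 'weekday 0', '-6 days') yields the Monday of x's week,
# playing the role of DATE_TRUNC('week', x).
rows = conn.execute("""
    SELECT week, SUM(delta) AS net
    FROM (
        SELECT date(created_at, 'weekday 0', '-6 days') AS week,  1 AS delta FROM table_a
        UNION ALL
        SELECT date(created_at, 'weekday 0', '-6 days') AS week, -1 AS delta FROM table_b
    ) AS deltas
    GROUP BY week
    ORDER BY week
""").fetchall()

for week, net in rows:
    print(week, net)
```

In this sample data the week of 2024-01-15 has only a removal, and it still appears in the result with a negative net value.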

The usual syntax for subtracting the values of 2 queries is as follows (FROM dual is Oracle syntax; PostgreSQL does not need it):
SELECT (Query1) - (Query2);
This works only when each query returns a single value, so it gives the overall difference rather than one value per week:
SELECT (SELECT Count(id) FROM table_name.A) - (SELECT Count(id) FROM table_name.B) AS net;
Or you can also try the following approach, which keeps the per-week grouping by joining the 2 queries on the week:
SELECT c1 - c2 FROM (Query1 with Count() AS c1) a JOIN (Query2 with Count() AS c2) b ON a.Week = b.Week;
So your query will be like
SELECT a.Week, c1 - c2 AS net FROM (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c1 FROM table_name.A GROUP BY 1) a JOIN (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c2 FROM table_name.B GROUP BY 1) b ON a.Week = b.Week ORDER BY 1;
Note that this assumes both tables have rows for the same weeks: the inner join drops any week that appears in only one of them.

Related

In Postgres how do I write a SQL query to select distinct values overall but aggregated over a set time period

What I mean by this: I have a table called payments with a created_at column and a user_id column. I want to select the count of purchases aggregated weekly (it can be any interval I want), but counting only first-time purchases. E.g. if a user purchased for the first time in week 1, that purchase is counted, but if the same user purchased again in week 2, that one is not counted.
created_at   user_id
timestamp    1
timestamp    1
This is the query I came up with. The issue is if the user purchases multiple times they are all included. How can I improve this?
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
SELECT
dates.date::DATE AS date,
COALESCE(COUNT(DISTINCT(user_id)), 0) AS registrations
FROM
dates
LEFT JOIN
payment ON created_at::DATE BETWEEN dates.date AND dates.date::date + '1 ${dateUnit}'::INTERVAL
GROUP BY
dates.date
ORDER BY
dates.date DESC;
You want to count only first purchases. So get those first purchases in the first step and work with these.
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
, first_purchases AS
(
SELECT user_id, MIN(created_at::DATE) AS purchase_date
FROM payment
GROUP BY user_id
)
SELECT
d.date,
COALESCE(COUNT(p.purchase_date), 0) AS registrations
FROM
dates d
LEFT JOIN
first_purchases p ON p.purchase_date >= d.date
AND p.purchase_date < d.date + '1 ${dateUnit}'::INTERVAL
GROUP BY
d.date
ORDER BY
d.date DESC;
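The "first purchases first" step can be tried on a few rows. The following is a minimal sketch under stated assumptions: Python's built-in sqlite3 stands in for PostgreSQL, the sample rows are made up, and weeks are derived with SQLite's date(x, 'weekday 0', '-6 days') instead of a generate_series calendar, so weeks without any first purchase do not appear at all.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (user_id INTEGER, created_at TEXT);
    INSERT INTO payment VALUES
        (1, '2022-07-25'), (1, '2022-08-02'),   -- user 1 buys in two different weeks
        (2, '2022-08-03'), (2, '2022-08-04'),   -- user 2 buys twice in the same week
        (3, '2022-08-10');
""")

rows = conn.execute("""
    WITH first_purchases AS (
        -- one row per user: the date of that user's earliest payment
        SELECT user_id, MIN(date(created_at)) AS purchase_date
        FROM payment
        GROUP BY user_id
    )
    SELECT date(purchase_date, 'weekday 0', '-6 days') AS week,  -- Monday of the week
           COUNT(*) AS registrations
    FROM first_purchases
    GROUP BY week
    ORDER BY week
""").fetchall()

for week, registrations in rows:
    print(week, registrations)
```

User 1's second purchase falls in the week of 2022-08-01, but only user 2 is counted there, because user 1 was already counted in the week of 2022-07-25.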

How can I calculate an "active users" aggregation from an activity log in SQL?

In PostgreSQL, I have a table that logs activity for all users, with an account ID and a timestamp field:
SELECT account_id, created FROM activity_log;
A single account_id can appear many times in a day, or not at all.
I would like a chart showing the number of "active users" each day, where "active users"
means "users who have done any activity within the previous X days".
If X is 1, then we can just truncate timestamp to 'day' and aggregate:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
GROUP BY date_trunc('day', created) ORDER BY date;
If X is exactly 7, then we could truncate to 'week' and aggregate - although this gives
me only one data point for a week, when I actually want one data point per day.
But I need to solve for the general case of different X, and give a distinct data point for each day.
One method is to generate the dates and then count using left join and group by or similar logic. The following uses a lateral join:
select gs.dte, al.num_accounts
from generate_series('2021-01-01'::date, '2021-01-31'::date, interval '1 day'
) gs(dte) left join lateral
(select count(distinct al.account_id) as num_accounts
from activity_log al
where al.created >= gs.dte - (<n - 1>) * interval '1 day' and
al.created < gs.dte + interval '1 day'
) al
on 1=1
order by gs.dte;
<n - 1> is one less than the number of days. So for one week, it would be 6.
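The rolling window can be checked on a handful of rows. This is a minimal sketch rather than the lateral-join query itself: it assumes Python's built-in sqlite3 as a stand-in for PostgreSQL, so a recursive CTE replaces generate_series and a correlated subquery replaces LEFT JOIN LATERAL. N_DAYS and the sample rows are made up for illustration.

```python
import sqlite3

N_DAYS = 3  # hypothetical window size X: "active within the previous 3 days"

# In-memory SQLite stands in for PostgreSQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity_log (account_id INTEGER, created TEXT);
    INSERT INTO activity_log VALUES
        (1, '2021-01-01 09:00:00'), (1, '2021-01-01 17:30:00'),
        (2, '2021-01-02 12:00:00'),
        (1, '2021-01-04 08:15:00');
""")

rows = conn.execute("""
    WITH RECURSIVE days(dte) AS (
        -- calendar of days, in place of generate_series
        SELECT '2021-01-01'
        UNION ALL
        SELECT date(dte, '+1 day') FROM days WHERE dte < '2021-01-05'
    )
    SELECT d.dte,
           (SELECT COUNT(DISTINCT al.account_id)
            FROM activity_log al
            WHERE al.created >= date(d.dte, ?)          -- window start: N_DAYS - 1 days back
              AND al.created <  date(d.dte, '+1 day')   -- window end: end of this day
           ) AS num_accounts
    FROM days d
    ORDER BY d.dte
""", (f"-{N_DAYS - 1} days",)).fetchall()

for dte, num_accounts in rows:
    print(dte, num_accounts)
```

Every day in the calendar gets its own data point, including 2021-01-03 and 2021-01-05, on which nothing was logged.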
If your goal is to get the distinct account_id count per day for the last X days, you can use the query below. Instead of 7 you can use any number you wish:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
where date_trunc('day', created)>=date_trunc('day',CURRENT_DATE) +interval '-7' day
GROUP BY date_trunc('day', created)
ORDER BY date
(If there is no activity in any given date then the date will not be in the output.)

How to get a count of data for every date in postgres

I am trying to get data to populate a multi-line graph. The table jobs has the columns id, created_at, and partner_id. I would like to display the number of jobs for each partner_id each day. My current query has 2 problems: 1) it is missing a lot of jobs, and 2) it only contains an entry for a given day if there was a row on that day. My current query is below, where start is an integer denoting how many days back we are looking for data:
SELECT d.date, count(j.id), j.partner_id FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, #{start}, 1)
AS offs
) d
JOIN (
SELECT jobs.id, jobs.created_at, jobs.partner_id FROM jobs
WHERE jobs.created_at > now() - INTERVAL '#{start} days'
) j
ON (d.date=to_char(date_trunc('day', j.created_at), 'YYYY-MM-DD'))
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
This returns records like the following:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
what I want is something like this:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-22", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
So that for every day in the query I have an entry for every partner even if that count is 0. How can I adjust the query to populate data even when the count is 0?
Use a LEFT JOIN. You also don't need so many subqueries, and there is no need to translate a date to a string and then back to a date:
SELECT d.date, count(j.id), j.partner_id
FROM (SELECT to_char(dte, 'YYYY-MM-DD') AS date , dte
FROM generate_series(current_date - #{start} * interval '1 day', current_date, interval '1 day') gs(dte)
) d LEFT JOIN
jobs j
ON DATE_TRUNC('day', j.created_at) = d.dte
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
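A LEFT JOIN from the calendar alone leaves partner_id NULL on days without jobs. To get a zero row for every partner on every day, one option is to cross join the days with the list of partners before the LEFT JOIN. The following is a minimal sketch of that idea, assuming Python's built-in sqlite3 as a stand-in for PostgreSQL (a recursive CTE replaces generate_series), made-up sample rows, and a partner list taken from the jobs table itself.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER, created_at TEXT, partner_id TEXT);
    INSERT INTO jobs VALUES
        (1, '2019-06-21 08:00:00', '099'), (2, '2019-06-21 09:00:00', '075'),
        (3, '2019-06-22 10:00:00', '099'), (4, '2019-06-23 11:00:00', '099');
""")

rows = conn.execute("""
    WITH RECURSIVE days(dte) AS (
        -- calendar of days, in place of generate_series
        SELECT '2019-06-21'
        UNION ALL
        SELECT date(dte, '+1 day') FROM days WHERE dte < '2019-06-23'
    ),
    partners AS (SELECT DISTINCT partner_id FROM jobs)
    SELECT d.dte, p.partner_id, COUNT(j.id) AS job_count
    FROM days d
    CROSS JOIN partners p              -- every (day, partner) pair exists up front
    LEFT JOIN jobs j
           ON date(j.created_at) = d.dte
          AND j.partner_id = p.partner_id
    GROUP BY d.dte, p.partner_id
    ORDER BY p.partner_id, d.dte
""").fetchall()

for dte, partner_id, job_count in rows:
    print(dte, partner_id, job_count)
```

COUNT(j.id) ignores the NULLs produced by unmatched pairs, so partner 075 gets a count of 0 for 2019-06-22 and 2019-06-23.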

Multiple SELECT on the same field in one statement

I got the following table:
**stats**
id INT FK
day INT
value INT
I would like to create an SQL query that sums the values in the value column for the last day, last week and last month, in one statement.
So far I have this:
select sum(value) from stats as A where A.day > now() - 1
union
select sum(value) from stats as B where B.day > now() - 7
union
select sum(value) from stats as C where C.day > now() - 30
This returns just the first sum(value); I was expecting 3 values to be returned.
Running select sum(value) from stats as A where A.day > now() - X (where X = 1, 7 or 30) as separate queries works as it should.
What's wrong with the query? Thanks!
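UNION implicitly removes duplicate rows (it behaves like SELECT DISTINCT), so when the three sums are equal they collapse into a single row. Use UNION ALL instead, like so: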
SELECT 'last day' ItemType, sum(value) FROM stats as A WHERE A.day > now() - 1
UNION ALL
SELECT 'last week', SUM(value) FROM stats as B WHERE B.day > now() - 7
UNION ALL
SELECT 'last month', SUM(value) FROM stats as C WHERE C.day > now() - 30
Note that I added a new column, ItemType, to indicate whether each sum is for the last day, last week or last month.
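The difference between the two operators is easy to reproduce. This is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in database, made-up rows, and the integer 100 as a hypothetical "today" in the day column, chosen so that all three ranges cover the same rows.

```python
import sqlite3

# In-memory SQLite as a stand-in database; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (id INTEGER, day INTEGER, value INTEGER);
    -- both rows fall inside all three ranges, so the three sums are equal
    INSERT INTO stats VALUES (1, 100, 5), (2, 100, 7);
""")

query = """
    SELECT SUM(value) FROM stats WHERE day > 100 - 1
    {op}
    SELECT SUM(value) FROM stats WHERE day > 100 - 7
    {op}
    SELECT SUM(value) FROM stats WHERE day > 100 - 30
"""

deduplicated = conn.execute(query.format(op="UNION")).fetchall()
all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(len(deduplicated), deduplicated)  # UNION keeps one copy of identical rows
print(len(all_rows), all_rows)          # UNION ALL keeps every row
```

With UNION the three identical sums come back as one row; with UNION ALL all three rows are returned.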

how to make this query also return rows with 0 count value?

I have written a small PostgreSQL query that totals the jobs executed per hourly interval between two given dates (e.g. all the jobs executed between February 2, 2012 and March 3, 2012, hour by hour, starting with the hour given on February 2 and ending with the hour given on March 3). I have noticed that this query doesn't print the rows with a 0 count, i.e. intervals in which no job was executed (e.g. February 21, 2012 between 5 and 6pm). How can I make it also return rows with a 0 count? The code is as below:
SELECT date_trunc('hour', executiontime), count(executiontime)
FROM mytable
WHERE executiontime BETWEEN '2011-2-2 0:00:00' AND '2012-3-2 5:00:00'
GROUP BY date_trunc('hour', executiontime)
ORDER BY date_trunc('hour', executiontime) ASC;
Thanks in advance.
-- CTE to the rescue!!!
WITH cal AS (
SELECT generate_series('2012-02-02 00:00:00'::timestamp , '2012-03-02 05:00:00'::timestamp , '1 hour'::interval) AS stamp
)
, qqq AS (
SELECT date_trunc('hour', executiontime) AS stamp
, count(*) AS zcount
FROM mytable
GROUP BY date_trunc('hour', executiontime)
)
SELECT cal.stamp
, COALESCE (qqq.zcount, 0) AS zcount
FROM cal
LEFT JOIN qqq ON cal.stamp = qqq.stamp
ORDER BY stamp ASC
;
Look at this. The idea is to generate an array or table with the dates in the period and join it with the job execution table.