I have 2 different tables, A and B. A is something like created and b is removed
I want to obtain the nett difference of the counts per week in an SQL query.
Currently I have
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New"
FROM table_name.A
GROUP BY 1
ORDER BY 1
This gets me the count per week for table A only. How could I incorporate the logic of subtracting the same Count(id) from B, for the same timeframe?
Thanks! :)
The potential issue here is that for any week you might only have additions or removals, so to align a count from the 2 tables - by week - an approach would be to use a full outer join, like this:
SELECT COALESECE(A.week, b.week) as week
, count_a
, count_b
, COALESECE(count_a,0) - COALESECE(count_b,0) net
FROM (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_A
FROM table_a
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) a
FUUL OUTER JOIN (
SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS week
, Count(*) AS count_b
FROM table_b
GROUP BY DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08')
) b on a.week = b.week
The usual syntex for substracting values from 2 queries is as follows
Select (Query1) - (Query2) from dual;
Assuming both the tables have same number of id in 'id' column and your given query works for tableA, following query will subtract the count(id) from both tables.
select(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New" FROM table_name.A GROUP BY 1 ORDER BY 1) - (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "B - New" FROM table_name.B GROUP BY 1 ORDER BY 1) from dual
Or you can also try the following approach
Select c1-c2 from(Query1 count()as c1),(Query2 count() as c2);
So your query will be like
Select c1-c2 from (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c1 FROM table_name.A GROUP BY 1 ORDER BY 1),(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c2 FROM table_name.B GROUP BY 1 ORDER BY 1);
Related
What I mean by this is if I have a table called payments with a created_at column and user_id column I want to select the count of purchases aggregated weekly (can be any interval I want) but only selecting first time purchases e.g. if a user purchased for the first time in week 1 it would be counted but if he purchased again in week 2 he would not be counted.
created_at
user_id
timestamp
1
timestamp
1
This is the query I came up with. The issue is if the user purchases multiple times they are all included. How can I improve this?
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
SELECT
dates.date::DATE AS date,
COALESCE(COUNT(DISTINCT(user_id)), 0) AS registrations
FROM
dates
LEFT JOIN
payment ON created_at::DATE BETWEEN dates.date AND dates.date::date + '1 ${dateUnit}'::INTERVAL
GROUP BY
dates.date
ORDER BY
dates.date DESC;
You want to count only first purchases. So get those first purchases in the first step and work with these.
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
, first_purchases AS
(
SELECT user_id, MIN(created_at:DATE) AS purchase_date
FROM payment
GROUP BY user_id
)
SELECT
d.date,
COALESCE(COUNT(p.purchase_date), 0) AS registrations
FROM
dates d
LEFT JOIN
first_purchases p ON p.purchase_date >= d.date
AND p.purchase_date < d.date + '1 ${dateUnit}'::INTERVAL
GROUP BY
d.date
ORDER BY
d.date DESC;
In PostgreSQL, I have a table that logs activity for all users, with an account ID and a timestamp field:
SELECT account_id, created FROM activity_log;
A single account_id can appear many times in a day, or not at all.
I would like a chart showing the number of "active users" each day, where "active users"
means "users who have done any activity within the previous X days".
If X is 1, then we can just truncate timestamp to 'day' and aggregate:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
GROUP BY date_trunc('day', created) ORDER BY date;
If X is exactly 7, then we could truncate to 'week' and aggregate - although this gives
me only one data point for a week, when I actually want one data point per day.
But I need to solve for the general case of different X, and give a distinct data point for each day.
One method is to generate the dates and then count using left join and group by or similar logic. The following uses a lateral join:
select gs.dte, al.num_accounts
from generate_series('2021-01-01'::date, '2021-01-31'::date, interval '1 day'
) gs(dte) left join lateral
(select count(distinct al.account_id) as num_accounts
from activity_log al
where al.created >= gs.dte - (<n - 1>) * interval '1 day' and
al.created < gs.dte + interval '1 day'
) al
on 1=1
order by gs.dte;
<n - 1> is one less than the number of days. So for one week, it would be 6.
If your goal is to get day wise distinct account_id for last X days you can use below query. Instead of 7 you can use any number as you wise:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
where date_trunc('day', created)>=date_trunc('day',CURRENT_DATE) +interval '-7' day
GROUP BY date_trunc('day', created)
ORDER BY date
(If there is no activity in any given date then the date will not be in the output.)
I am trying to get data to populate a multi-line graph. The table jobs has the columns id, created_at, and partner_id. I would like to display the sum of jobs for each partner_id each day. My current query has 2 problems. 1) It is missing a lot of jobs. 2) It only contains an entry for a given day if there was a row on that day. My current query is where start is an integer denoting how many days back we are looking for data:
SELECT d.date, count(j.id), j.partner_id FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, #{start}, 1)
AS offs
) d
JOIN (
SELECT jobs.id, jobs.created_at, jobs.partner_id FROM jobs
WHERE jobs.created_at > now() - INTERVAL '#{start} days'
) j
ON (d.date=to_char(date_trunc('day', j.created_at), 'YYYY-MM-DD'))
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
This returns records like the following:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
what I want is something like this:
[{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"099"},
{"date"=>"2019-06-22", "count"=>1, "partner_id"=>"099"},
{"date"=>"2019-06-21", "count"=>3, "partner_id"=>"075"},
{"date"=>"2019-06-22", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>0, "partner_id"=>"075"},
{"date"=>"2019-06-23", "count"=>1, "partner_id"=>"099"}]
So that for every day in the query I have an entry for every partner even if that count is 0. How can I adjust the query to populate data even when the count is 0?
Use a LEFT JOIN. You also don't need so many subqueries and there is no need to translate to a date to a string and then back to a date:
SELECT d.date, count(j.id), j.partner_id
FROM (SELECT to_char(dte, 'YYYY-MM-DD') AS date , dte
FROM generate_series(current_date - {start} * interval '1 day', current_date, interval '1 day') gs(dte)
) d LEFT JOIN
jobs j
ON DATE_TRUNC('day', j.created_at) = d.dte
GROUP BY d.date, j.partner_id
ORDER BY j.partner_id, d.date;
I got the following table:
**stats**
id INT FK
day INT
value INT
I would like to create an SQL query that will sum the values in value column in the last day, last week and last month, in one statement.
So Far i got this:
select sum(value) from stats as A where A.day > now() - 1
union
select sum(value) from stats as B where B.day > now() - 7
union
select sum(value) from stats as C where C.day > now() - 30
This returns just the first sum(value), i was expecting 3 values to return.
Running: select sum(value) from stats as A where A.day > now() - X ( Where x = 1/7/30) in different queries works as it should.
What's wrong with the query? Thanks!
UNION is implicit distinct. Use UNION ALL instead like so:
SELECT 'last day' ItemType, sum(value) FROM stats as A WHERE A.day > now() - 1
UNION ALL
SELECT 'last week', SUM(value) FROM stats as B WHERE B.day > now() - 7
UNION ALL
SELECT 'last month', SUM(value) FROM stats as C WHERE C.day > now() - 30
Note that: I added a new column ItemType to indicate what is the type of the sum value whether it is last day, last week or last month
I have written a small PostgreSQL query that helps me total amount of jobs executed per hourly intervals in every day within two certain dates -e.g. all the jobs executed between February 2, 2012 and March 3, 2012 hour by hour starting with the hour given in February 2 and ending with the hour given in March 3- I have noticed that this query doesn't print the rows with 0 count -no job executed within that time interval e.g. at February 21, 2012 between 5 and 6pm-. How can I make this also return results(rows) with 0 count? The code is as below:
SELECT date_trunc('hour', executiontime), count(executiontime)
FROM mytable
WHERE executiontime BETWEEN '2011-2-2 0:00:00' AND '2012-3-2 5:00:00'
GROUP BY date_trunc('hour', executiontime)
ORDER BY date_trunc('hour', executiontime) ASC;
Thanks in advance.
-- CTE to the rescue!!!
WITH cal AS (
SELECT generate_series('2012-02-02 00:00:00'::timestamp , '2012-03-02 05:00:00'::timestamp , '1 hour'::interval) AS stamp
)
, qqq AS (
SELECT date_trunc('hour', executiontime) AS stamp
, count(*) AS zcount
FROM mytable
GROUP BY date_trunc('hour', executiontime)
)
SELECT cal.stamp
, COALESCE (qqq.zcount, 0) AS zcount
FROM cal
LEFT JOIN qqq ON cal.stamp = qqq.stamp
ORDER BY stamp ASC
;
Look this. Idea is to generate array or table with dates in this period and join with job execution table.