Count of active user sessions per hour - sql

For each user login to our website, we insert a record into our user_session table with the user's login and logout timestamps. If I wanted to produce a graph of the number of logins per hour over time, it would be easy with the following SQL.
SELECT date_trunc('hour', login_time) AS "time",
       count(*)
FROM user_session
GROUP BY "time"
ORDER BY "time";
Time would be the X-axis and count would be the Y-axis.
But what I really need is the number of active sessions in each hour where "active" means
login_time <= foo and logout_time >= foo where foo is the particular time slot.
How can I do this in a single SELECT statement?

One brute force method generates the hours and then uses a lateral join or correlated subquery to do the calculation:
select gs.ts, us.num_active
from generate_series('2021-03-21'::timestamp, '2021-03-22'::timestamp, interval '1 hour') gs(ts)
left join lateral
    (select count(*) as num_active
     from user_session us
     where us.login_time <= gs.ts and
           us.logout_time > gs.ts
    ) us
    on 1=1;
A more efficient method -- particularly over longer periods of time -- is to pivot the times and keep an incremental count of ins and outs:
with cte as (
    select date_trunc('hour', login_time) as hh, count(*) as inc
    from user_session
    group by hh
    union all
    select date_trunc('hour', logout_time + interval '1 hour') as hh, - count(*) as inc
    from user_session
    group by hh
)
select hh, sum(inc) as net_in_hour,
       sum(sum(inc)) over (order by hh) as active_in_hour
from cte
group by hh;
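The ins-and-outs trick ports to any engine with window functions. Below is a runnable sketch using SQLite via Python's sqlite3, with hypothetical sample data; strftime stands in for Postgres's date_trunc('hour', ...), and the running total needs SQLite 3.25+ for window functions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_session (login_time TEXT, logout_time TEXT);
INSERT INTO user_session VALUES
  ('2021-03-21 09:15', '2021-03-21 11:30'),
  ('2021-03-21 09:45', '2021-03-21 10:10'),
  ('2021-03-21 10:05', '2021-03-21 10:50');
""")

rows = conn.execute("""
WITH cte AS (
  -- +1 in the hour a session starts ...
  SELECT strftime('%Y-%m-%d %H:00', login_time) AS hh, COUNT(*) AS inc
  FROM user_session GROUP BY hh
  UNION ALL
  -- ... and -1 in the hour after it ends
  SELECT strftime('%Y-%m-%d %H:00', logout_time, '+1 hour') AS hh, -COUNT(*) AS inc
  FROM user_session GROUP BY hh
),
net AS (
  SELECT hh, SUM(inc) AS net_in_hour FROM cte GROUP BY hh
)
SELECT hh, net_in_hour,
       SUM(net_in_hour) OVER (ORDER BY hh) AS active_in_hour  -- running total
FROM net ORDER BY hh
""").fetchall()
print(rows)
```

The running sum returns to 0 once every session has logged out, which is a quick sanity check on the sign convention.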

Related

In Postgres how do I write a SQL query to select distinct values overall but aggregated over a set time period

What I mean by this: if I have a table called payments with a created_at column and a user_id column, I want to select the count of purchases aggregated weekly (it can be any interval I want), but only counting first-time purchases. E.g. if a user purchased for the first time in week 1 they would be counted, but if they purchased again in week 2 they would not be counted.
created_at | user_id
-----------+---------
timestamp  | 1
timestamp  | 1
This is the query I came up with. The issue is if the user purchases multiple times they are all included. How can I improve this?
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
SELECT
dates.date::DATE AS date,
COALESCE(COUNT(DISTINCT(user_id)), 0) AS registrations
FROM
dates
LEFT JOIN
payment ON created_at::DATE BETWEEN dates.date AND dates.date::date + '1 ${dateUnit}'::INTERVAL
GROUP BY
dates.date
ORDER BY
dates.date DESC;
You want to count only first purchases. So get those first purchases in the first step and work with these.
WITH dates AS
(
SELECT *
FROM generate_series(
'2022-07-22T15:30:06.687Z'::DATE,
'2022-11-21T17:04:59.457Z'::DATE,
'1 week'
) date
)
, first_purchases AS
(
SELECT user_id, MIN(created_at::DATE) AS purchase_date
FROM payment
GROUP BY user_id
)
SELECT
d.date,
COALESCE(COUNT(p.purchase_date), 0) AS registrations
FROM
dates d
LEFT JOIN
first_purchases p ON p.purchase_date >= d.date
AND p.purchase_date < d.date + '1 ${dateUnit}'::INTERVAL
GROUP BY
d.date
ORDER BY
d.date DESC;
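The essential move -- reduce each user to their first purchase before bucketing -- can be sketched in SQLite via Python's sqlite3 (hypothetical sample data). Note the sketch skips the generate_series scaffolding, so weeks with no first purchases simply don't appear; the 'weekday 0', '-6 days' modifiers are an assumed Monday-start week bucket, mirroring Postgres's date_trunc('week', ...):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (user_id INTEGER, created_at TEXT);
INSERT INTO payment VALUES
  (1, '2022-07-25'),  -- user 1's first purchase
  (1, '2022-08-02'),  -- repeat purchase: must not be counted
  (2, '2022-08-03');  -- user 2's first purchase
""")

rows = conn.execute("""
WITH first_purchases AS (
  -- one row per user: their earliest purchase date
  SELECT user_id, MIN(DATE(created_at)) AS purchase_date
  FROM payment
  GROUP BY user_id
)
SELECT DATE(purchase_date, 'weekday 0', '-6 days') AS week_start,  -- Monday bucket
       COUNT(*) AS registrations
FROM first_purchases
GROUP BY week_start
ORDER BY week_start
""").fetchall()
print(rows)
```

User 1's second purchase lands in a later week but is dropped by the MIN(), which is exactly the behavior the question asks for.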

How can I calculate an "active users" aggregation from an activity log in SQL?

In PostgreSQL, I have a table that logs activity for all users, with an account ID and a timestamp field:
SELECT account_id, created FROM activity_log;
A single account_id can appear many times in a day, or not at all.
I would like a chart showing the number of "active users" each day, where "active users"
means "users who have done any activity within the previous X days".
If X is 1, then we can just truncate timestamp to 'day' and aggregate:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
GROUP BY date_trunc('day', created) ORDER BY date;
If X is exactly 7, then we could truncate to 'week' and aggregate - although this gives
me only one data point for a week, when I actually want one data point per day.
But I need to solve for the general case of different X, and give a distinct data point for each day.
One method is to generate the dates and then count with a left join and group by (or similar logic). The following uses a lateral join:
select gs.dte, al.num_accounts
from generate_series('2021-01-01'::date, '2021-01-31'::date, interval '1 day'
) gs(dte) left join lateral
(select count(distinct al.account_id) as num_accounts
from activity_log al
where al.created >= gs.dte - (<n - 1>) * interval '1 day' and
al.created < gs.dte + interval '1 day'
) al
on 1=1
order by gs.dte;
<n - 1> is one less than the number of days. So for one week, it would be 6.
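The same rolling-window count can be sketched without LATERAL, using a correlated scalar subquery; here in SQLite via Python's sqlite3 with hypothetical data and X = 3 days (so the <n - 1> offset is 2, passed as a bound parameter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_log (account_id INTEGER, created TEXT);
INSERT INTO activity_log VALUES
  (1, '2021-01-01'), (2, '2021-01-01'),
  (3, '2021-01-04');
""")

X = 3  # window size in days, so <n - 1> is 2
rows = conn.execute("""
WITH RECURSIVE dates(dte) AS (
  SELECT '2021-01-01'
  UNION ALL
  SELECT DATE(dte, '+1 day') FROM dates WHERE dte < '2021-01-05'
)
SELECT dte,
       (SELECT COUNT(DISTINCT account_id)
        FROM activity_log
        WHERE created >= DATE(dte, ?)         -- start of the X-day window
          AND created < DATE(dte, '+1 day'))  -- end of day dte
         AS num_accounts
FROM dates
""", (f'-{X - 1} days',)).fetchall()
print(rows)
```

Account 3's activity on 2021-01-04 pushes accounts 1 and 2 out of the 3-day window from that day on, so the count drops from 2 to 1.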
If your goal is to get the day-wise distinct account_id count for the last X days, you can use the query below. Instead of 7 you can use any number you wish:
SELECT date_trunc('day', created) AS date, count(DISTINCT account_id)
FROM activity_log
WHERE date_trunc('day', created) >= date_trunc('day', CURRENT_DATE) - interval '7 days'
GROUP BY date_trunc('day', created)
ORDER BY date;
(If there is no activity in any given date then the date will not be in the output.)

PostgreSQL - generating an hourly list

I have an API that counts events from a table, grouped by hour of day and severity, which I use to draw a graph. This is my current query:
SELECT
extract(hour FROM time) AS hours,
alarm."severity",
COUNT(*)
FROM
alarm
WHERE
date = '2019-06-12'
GROUP BY
extract(hour FROM time),
alarm."severity"
ORDER BY
extract(hour FROM time),
alarm."severity"
What I really want is a list of hours from 00 to 23 with the corresponding event counts, showing 0 for hours with no events. Is there a way to make Postgres generate such a structure?
Use generate_series() to generate the hours and a cross join for the severities:
SELECT gs.h, s.severity, COUNT(a.time)
FROM GENERATE_SERIES(0, 23, 1) gs(h) CROSS JOIN
(SELECT DISTINCT a.severity FROM alarm
) s LEFT JOIN
alarm a
ON extract(hour FROM a.time) = gs.h AND
a.severity = s.severity AND
a.date = '2019-06-12'
GROUP BY gs.h, s.severity
ORDER BY gs.h, s.severity;
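The hours-times-severities cross join can be sketched in SQLite via Python's sqlite3 (hypothetical data; a recursive CTE stands in for generate_series()). The key point is that the date filter must live in the LEFT JOIN's ON clause, not a WHERE clause, or the zero rows would be filtered away:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alarm (severity TEXT, time TEXT, date TEXT);
INSERT INTO alarm VALUES
  ('high', '01:15', '2019-06-12'),
  ('high', '01:45', '2019-06-12'),
  ('low',  '03:20', '2019-06-12');
""")

rows = conn.execute("""
WITH RECURSIVE hours(h) AS (
  SELECT 0 UNION ALL SELECT h + 1 FROM hours WHERE h < 23
)
SELECT hours.h, s.severity, COUNT(a.time) AS events
FROM hours
CROSS JOIN (SELECT DISTINCT severity FROM alarm) s   -- every hour x every severity
LEFT JOIN alarm a
  ON CAST(strftime('%H', a.time) AS INTEGER) = hours.h
 AND a.severity = s.severity
 AND a.date = '2019-06-12'                           -- filter in ON, keeping zero rows
GROUP BY hours.h, s.severity
ORDER BY hours.h, s.severity
""").fetchall()
```

With 2 distinct severities this yields 24 x 2 = 48 rows, zero-filled for empty hours.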

LEFT OUTER JOIN Error creating a subquery on bigquery

I'm trying to evaluate MAU, WAU and DAU from an event table in my BigQuery project.
I created a query to find DAU, intending to derive WAU and MAU from it,
but it does not work; I receive this error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
Here is my query:
WITH dau AS (
SELECT
date,
COUNT(DISTINCT(events.device_id)) as DAU_explorer
FROM `workspace.event_table` as events
GROUP BY 1
)
SELECT
date,
dau,
(SELECT
COUNT(DISTINCT(device_id))
FROM `workspace.event_table` as events
WHERE events.date BETWEEN DATE_ADD(dau.date, INTERVAL -30 DAY) AND dau.date
) AS mau,
(SELECT
COUNT(DISTINCT(device_id)) as DAU_explorer
FROM `workspace.event_table` as events
WHERE events.date BETWEEN DATE_ADD(dau.date, INTERVAL -7 DAY) AND dau.date
) AS wau
FROM dau
Where is my error? Is it not possible to run subqueries like this on BigQuery?
Try this instead:
#standardSQL
WITH data AS (
  SELECT DATE(creation_date) date, owner_user_id device_id
  FROM `bigquery-public-data.stackoverflow.posts_questions`
  WHERE EXTRACT(YEAR FROM creation_date)=2017
)
SELECT DATE_SUB(date, INTERVAL i DAY) date_grp
, COUNT(DISTINCT IF(i<31,device_id,null)) unique_30_day_users
, COUNT(DISTINCT IF(i<8,device_id,null)) unique_7_day_users
FROM `data`, UNNEST(GENERATE_ARRAY(1, 30)) i
GROUP BY 1
ORDER BY date_grp
LIMIT 100
OFFSET 30
And if you are looking for a more efficient solution, try approximate results.
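The fan-out trick above -- duplicate each event 30 times and group it under the 30 preceding dates -- is portable. A sketch in SQLite via Python's sqlite3 with hypothetical data, using a recursive CTE in place of UNNEST(GENERATE_ARRAY(1, 30)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (date TEXT, device_id INTEGER);
INSERT INTO events VALUES
  ('2021-01-01', 1),
  ('2021-01-05', 2),
  ('2021-01-05', 1);
""")

rows = conn.execute("""
WITH RECURSIVE seq(i) AS (
  SELECT 1 UNION ALL SELECT i + 1 FROM seq WHERE i < 30
)
-- each event row fans out to 30 group dates, one per offset i
SELECT DATE(e.date, '-' || seq.i || ' days') AS date_grp,
       COUNT(DISTINCT CASE WHEN seq.i < 8 THEN e.device_id END) AS unique_7_day_users,
       COUNT(DISTINCT e.device_id) AS unique_30_day_users
FROM events e CROSS JOIN seq
GROUP BY date_grp
ORDER BY date_grp
""").fetchall()
```

For 2020-12-28, device 1's event (4 days out) is within the 7-day window but device 2's (8 days out) only counts toward the 30-day total, so the row reads 1 and 2 respectively.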

Subtraction of counts of 2 tables

I have 2 different tables, A and B: rows in A represent items created, and rows in B represent items removed.
I want to obtain the net difference of the counts per week in a SQL query.
Currently I have
SELECT DATE_TRUNC('week', TIMESTAMP AT TIME ZONE '+08') AS Week,
       COUNT(id) AS "A - New"
FROM table_name.A
GROUP BY 1
ORDER BY 1
This gets me the count per week for table A only. How could I incorporate the logic of subtracting the same Count(id) from B, for the same timeframe?
Thanks! :)
The potential issue here is that in any given week you might have only additions or only removals. To align the counts from the 2 tables by week, an approach is to use a full outer join, like this:
SELECT COALESCE(a.week, b.week) AS week
     , count_a
     , count_b
     , COALESCE(count_a, 0) - COALESCE(count_b, 0) AS net
FROM (
    SELECT DATE_TRUNC('week', TIMESTAMP AT TIME ZONE '+08') AS week
         , COUNT(*) AS count_a
    FROM table_a
    GROUP BY DATE_TRUNC('week', TIMESTAMP AT TIME ZONE '+08')
) a
FULL OUTER JOIN (
    SELECT DATE_TRUNC('week', TIMESTAMP AT TIME ZONE '+08') AS week
         , COUNT(*) AS count_b
    FROM table_b
    GROUP BY DATE_TRUNC('week', TIMESTAMP AT TIME ZONE '+08')
) b ON a.week = b.week
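An alternative to the full outer join is to tag each row with a signed delta (+1 for created, -1 for removed), UNION ALL the two tables, and sum per week; it gives the same net number and works on engines that lack FULL OUTER JOIN (SQLite only gained it in 3.39). A sketch via Python's sqlite3 with hypothetical data, using 'weekday 0', '-6 days' as an assumed Monday-start week bucket:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER, ts TEXT);  -- created
CREATE TABLE table_b (id INTEGER, ts TEXT);  -- removed
INSERT INTO table_a VALUES (1, '2023-01-02'), (2, '2023-01-03'), (3, '2023-01-10');
INSERT INTO table_b VALUES (1, '2023-01-11');
""")

rows = conn.execute("""
SELECT week, SUM(delta) AS net
FROM (
  SELECT DATE(ts, 'weekday 0', '-6 days') AS week,  1 AS delta FROM table_a
  UNION ALL
  SELECT DATE(ts, 'weekday 0', '-6 days') AS week, -1 AS delta FROM table_b
)
GROUP BY week
ORDER BY week
""").fetchall()
```

The second week has one creation and one removal, so its net is 0 rather than the week being missing from one side of a join.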
The usual syntax for subtracting values from 2 queries is as follows:
Select (Query1) - (Query2) from dual;
Assuming both tables have the same number of ids in the 'id' column and your given query works for table A, the following query will subtract count(id) between the two tables.
select(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "A - New" FROM table_name.A GROUP BY 1 ORDER BY 1) - (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week,
Count(id) AS "B - New" FROM table_name.B GROUP BY 1 ORDER BY 1) from dual
Or you can also try the following approach
Select c1-c2 from(Query1 count()as c1),(Query2 count() as c2);
So your query will be like
Select c1-c2 from (SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c1 FROM table_name.A GROUP BY 1 ORDER BY 1),(SELECT DATE_TRUNC('week', TIMESTAMP AT time ZONE '+08') AS Week, Count(id) AS c2 FROM table_name.B GROUP BY 1 ORDER BY 1);