Daily Rolling Count of Distinct Users on Different time periods - sql

I am trying to find the most efficient way to run the following query, which I need to connect to Tableau and visualise. The idea is to count 7-day active users, 30-day active users and 90-day active users for each day. So for today I want to know who was active within each of those time frames, and likewise for yesterday, and so on.
To clarify, users can be active multiple times within my time frames.
A count of 7-day active users would be the distinct number of users who had a session in the period between today's date - 6 and today's date. I need to calculate this for every date within the last 6 months.
This is the query I have.
with dau as (
select date_trunc('day', created_date) created_at,
count(distinct customer_id) dau
from sessions
where created_date >= date_trunc('day', dateadd('month', -6, getdate()))
group by date_trunc('day', created_date)
)
select created_at,
dau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 6 and created_at) wau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 29 and created_at) as mau,
(select count(distinct customer_id)
from sessions
where date_trunc('day', created_date) between created_at - 89 and created_at) as three_mau
from dau
It takes 30 minutes to run, which seems crazy. Is there a better way to do it? I am also looking into materialised views as a faster way to use this in a dashboard. Would that work?
The result I am looking for is a table where each row is a date within the last 6 months and the columns are the counts of distinct users over the 7-, 30- and 90-day periods ending on that date.
Thanks in advance!
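One common way to cut the cost, sketched here as an illustration rather than a tested Redshift solution, is to reduce sessions to one row per customer per day first and then join that much smaller set to the list of report days once, instead of running three correlated subqueries for every day. The sketch runs against an in-memory SQLite database so that it is self-contained. The sample rows are made up, and SQLite's date() stands in for date_trunc/dateadd, so the date arithmetic would need translating for Redshift. Only days with at least one session appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (customer_id INTEGER, created_date TEXT);
-- Made-up sample data: customer 1 has two sessions on the same day.
INSERT INTO sessions VALUES
  (1, '2024-01-01 09:00:00'),
  (1, '2024-01-01 17:30:00'),
  (2, '2024-01-02 12:00:00'),
  (1, '2024-01-09 08:00:00'),
  (3, '2024-01-09 10:00:00');
""")

ROLLING_SQL = """
WITH activity AS (   -- one row per (day, customer), however many sessions
  SELECT DISTINCT date(created_date) AS day, customer_id
  FROM sessions
),
days AS (            -- the days to report on
  SELECT DISTINCT day FROM activity
)
SELECT d.day,
       COUNT(DISTINCT CASE WHEN a.day = d.day
                           THEN a.customer_id END) AS dau,
       COUNT(DISTINCT CASE WHEN a.day >= date(d.day, '-6 days')
                           THEN a.customer_id END) AS wau,
       COUNT(DISTINCT CASE WHEN a.day >= date(d.day, '-29 days')
                           THEN a.customer_id END) AS mau,
       COUNT(DISTINCT a.customer_id) AS three_mau
FROM days d
JOIN activity a
  ON a.day BETWEEN date(d.day, '-89 days') AND d.day
GROUP BY d.day
ORDER BY d.day
"""

rows = conn.execute(ROLLING_SQL).fetchall()
for row in rows:
    print(row)
```

Because the result has one row per day, it is small enough to store in a table or materialised view and refresh on a schedule, so the dashboard would never need to run the expensive part itself.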

Related

Day wise Rolling 30 day uniques user count bigquery

I am trying to generate a day-by-day rolling 30-day unique user count using this query, but the problem is that I have to run the query separately for each day. I need the day-by-day rolling 30-day count for the full month of August in one script. Please help.
-----------------------------------------
SELECT max(date),count(DISTINCT user_id) as MAU
FROM user_data
WHERE date between DATE_SUB('2020-08-31' ,INTERVAL 29 DAY) and '2020-08-31';
BigQuery doesn't support rolling windows for count(distinct). So, one approach is a brute force method:
select dte,
(select count(distinct ud.user_id)
from user_data ud
where ud.date between DATE_SUB(dte, INTERVAL 29 DAY) and dte
) as num_users
from unnest(generate_date_array(date('2020-08-01'), date('2020-08-31'))) dte
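To see what the brute-force query computes, here is a small runnable sketch of the same shape against an in-memory SQLite database. It is an illustration only: the sample rows are invented, a recursive CTE stands in for BigQuery's generate_date_array, and SQLite's date() replaces DATE_SUB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_data (user_id INTEGER, date TEXT);
-- Made-up sample data.
INSERT INTO user_data VALUES
  (1, '2020-07-05'),
  (2, '2020-07-20'),
  (2, '2020-08-01'),
  (3, '2020-08-03');
""")

BRUTE_FORCE_SQL = """
WITH RECURSIVE dates(dte) AS (   -- stand-in for generate_date_array
  SELECT '2020-08-01'
  UNION ALL
  SELECT date(dte, '+1 day') FROM dates WHERE dte < '2020-08-05'
)
SELECT dte,
       (SELECT COUNT(DISTINCT ud.user_id)
        FROM user_data ud
        WHERE ud.date BETWEEN date(dte, '-29 days') AND dte) AS num_users
FROM dates
ORDER BY dte
"""

rows = conn.execute(BRUTE_FORCE_SQL).fetchall()
for row in rows:
    print(row)
```

User 1 (last seen on 2020-07-05) is still inside the 30-day window on 2020-08-03 but drops out on 2020-08-04, which is why the count goes from 3 back to 2.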
Gordon's approach works great.
If you need to calculate more numbers, cross join the data.
SELECT
date_gen,
COUNT(DISTINCT IF(ud.date BETWEEN DATE_SUB(date_gen ,INTERVAL 29 DAY) AND date_gen,ud.user_id,NULL)) as MAU
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE_SUB('2020-08-31' ,INTERVAL 29 DAY), date('2020-08-31'))) date_gen,
(SELECT * FROM user_data WHERE date BETWEEN DATE_SUB('2020-08-31' ,INTERVAL 60 DAY) AND '2020-08-31') AS ud
GROUP BY 1
ORDER BY 1 DESC
With DECLARE and SET you can avoid repeating the date literal multiple times.
Below is for BigQuery Standard SQL
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM `project.dataset.user_data`
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t
The above assumes you have entries in the project.dataset.user_data table for every day in the time period of interest.
If this is not the case and you actually have some gaps in your data, you can use the query below:
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM UNNEST(GENERATE_DATE_ARRAY('2020-08-01', '2020-08-31')) AS date
LEFT JOIN `project.dataset.user_data`
USING(date)
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t
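The reason for joining against a generated calendar can be illustrated outside BigQuery. The following Python sketch (plain Python, not BigQuery code; the function name and sample data are hypothetical) gathers the user ids seen in each trailing 30-day window and counts the distinct ones, reporting every calendar day even when that day has no rows of its own.

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_distinct(events, start, end, window_days=30):
    """Return {day: distinct users seen in the window_days ending on day}.

    events is an iterable of (date, user_id) pairs; days without events
    are still reported, which is what the calendar join achieves in SQL.
    """
    users_by_day = defaultdict(set)
    for day, user_id in events:
        users_by_day[day].add(user_id)

    result = {}
    day = start
    while day <= end:
        window_users = set()
        for offset in range(window_days):  # day, day-1, ..., day-29
            window_users |= users_by_day.get(day - timedelta(days=offset), set())
        result[day] = len(window_users)
        day += timedelta(days=1)
    return result


# Made-up sample data with gaps (no rows on Aug 2, 4 or 5).
events = [
    (date(2020, 7, 5), 1),
    (date(2020, 7, 20), 2),
    (date(2020, 8, 1), 2),
    (date(2020, 8, 3), 3),
]
mau = rolling_distinct(events, date(2020, 8, 1), date(2020, 8, 5))
for day, count in mau.items():
    print(day, count)
```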

how to find consecutive user login across week

I'm fairly new to SQL, and maybe the complexity level of this report is above my pay grade.
I need help figuring out the list of users who log in to the app every consecutive week in the chosen time period (this logic eventually needs to be extended to a month, quarter and year, but a week is good for now).
Table structure for reference:
events: User_id int, login_date timestamp
The events table can have one or more entries per user, which means a user can log in to the app multiple times. To shed some light, if we focus on Jan 2020 to Mar 2020, then I need the following in the output:
- user_id of each user who logged into the app at least once every week from 2020Wk1 to 2020Wk14
- the week they logged in
- number of times they logged in that week
I'm also okay if the output of the query is just the user_id. The thing is, I can't make sense of the output I'm seeing after trying the following SQL code; perhaps working on this problem for so long is the reason.
SQL code tried so far:
SELECT DISTINCT user_id
,extract('year' FROM timestamp)||'Wk'|| extract('week' FROM timestamp)
,lead(extract('week' FROM timestamp)) over (partition by user_id, extract('week' FROM timestamp) order by extract('week' FROM timestamp))
FROM events
WHERE user_id = 'Anything that u wish to enter'
You can get the summary you want as:
select user_id, date_trunc('week', timestamp) as week, count(*)
from events
group by user_id, week;
But the filtering is trickier. It is better to go with dates rather than week numbers:
select user_id, date_trunc('week', timestamp) as week, count(*) as cnt,
count(*) over (partition by user_id) as num_weeks
from events
where timestamp >= ? and timestamp < ?
group by user_id, week;
Then you can use a subquery:
select uw.*
from (select user_id, date_trunc('week', timestamp) as week, count(*) as cnt,
count(*) over (partition by user_id) as num_weeks
from events
where timestamp >= ? and timestamp < ?
group by user_id, week
) uw
where num_weeks = ? -- 14 in your example
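As a rough, runnable illustration of this filter, the sketch below applies the same idea to an in-memory SQLite database. Several things are assumptions made for the example: the sample rows are invented, the column is named login_date as in the question's table description, and date(login_date, 'weekday 0', '-6 days') is used as a SQLite substitute for date_trunc('week', ...), giving the Monday that starts each week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, login_date TEXT);
-- Made-up sample data: user 1 logs in during all three weeks, user 2 skips one.
INSERT INTO events VALUES
  (1, '2020-01-06 10:00:00'),
  (1, '2020-01-07 11:00:00'),
  (1, '2020-01-15 09:00:00'),
  (1, '2020-01-26 09:00:00'),
  (2, '2020-01-08 14:00:00'),
  (2, '2020-01-21 16:00:00');
""")

EVERY_WEEK_SQL = """
WITH weekly AS (       -- logins per user per week inside the period
  SELECT user_id,
         date(login_date, 'weekday 0', '-6 days') AS week,
         COUNT(*) AS cnt
  FROM events
  WHERE login_date >= ? AND login_date < ?
  GROUP BY user_id, week
),
counted AS (           -- number of distinct weeks each user appears in
  SELECT weekly.*, COUNT(*) OVER (PARTITION BY user_id) AS num_weeks
  FROM weekly
)
SELECT user_id, week, cnt
FROM counted
WHERE num_weeks = ?
ORDER BY user_id, week
"""

# Three full weeks: Monday 2020-01-06 up to (not including) Monday 2020-01-27.
rows = conn.execute(EVERY_WEEK_SQL, ("2020-01-06", "2020-01-27", 3)).fetchall()
for row in rows:
    print(row)
```

A user who appears in every week of the period necessarily has no gaps, so comparing the number of weeks with the length of the period is enough to establish that the logins are consecutive.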

Unique new users per period

In psql, I've written a query that returns unique users per week with
COUNT(DISTINCT user_id)
However, I am also interested in counting the number of unique new users per week, in other words, users that have never been active before in any of the previous weeks.
How would one write this query in postgresql?
Current query:
SELECT TO_CHAR(date_trunc('week', start_time::date), 'YYYY-MM-DD')
AS weekly, COUNT(*) AS total_transactions, COUNT(DISTINCT user_id) AS unique_users
FROM transactions
GROUP BY weekly ORDER BY weekly
Use MIN to get the first appearance of each user_id, and use that to count the new unique users per week. You may also want to include the year in the grouping.
select
TO_CHAR(date_trunc('week', first_appearance), 'YYYY-MM-DD') AS weekly,
COUNT(*) AS total_transactions,
COUNT(DISTINCT user_id) AS unique_users
from (SELECT t.*,
MIN(start_time::date) OVER(PARTITION BY user_id) AS first_appearance
FROM transactions t
) t
GROUP BY weekly
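A small runnable sketch of the first-appearance idea follows, using an in-memory SQLite database and invented rows. It is a variation rather than a copy of the query above: it takes MIN per user with a GROUP BY instead of a window function, so each user contributes exactly one row and COUNT(*) is directly the number of new users. The weekly expression is a SQLite stand-in for date_trunc('week', ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (user_id INTEGER, start_time TEXT);
-- Made-up sample data: users 1 and 2 first appear in the week of Jan 6,
-- user 3 first appears in the week of Jan 13.
INSERT INTO transactions VALUES
  (1, '2020-01-06 10:00:00'),
  (1, '2020-01-14 12:00:00'),
  (2, '2020-01-08 09:30:00'),
  (3, '2020-01-15 18:00:00'),
  (3, '2020-01-16 08:00:00');
""")

NEW_USERS_SQL = """
SELECT date(first_appearance, 'weekday 0', '-6 days') AS weekly,
       COUNT(*) AS new_users
FROM (SELECT user_id, MIN(date(start_time)) AS first_appearance
      FROM transactions
      GROUP BY user_id)
GROUP BY weekly
ORDER BY weekly
"""

rows = conn.execute(NEW_USERS_SQL).fetchall()
for row in rows:
    print(row)
```

User 1's second transaction on 2020-01-14 does not count again, because only the first appearance decides which week a user belongs to.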

How to perform a query in PostgreSQL that returns a count of rows created, grouped by month?

In PostgreSQL, how do I perform a query that returns the number of rows created in a particular table, by month? I would like the result to be something like:
month: January
count: 67
month: February
count: 85
....
....
Let's suppose I have a table, users. This table has a primary key, id, and a created_at column with the time stored in ISO 8601 format. Last year some number n of users were created, and now I want to know how many were created in each month. I want the data returned in the above format: grouped by month, with an associated count of how many users were created that month.
Does anyone know how to perform the above SQL query in PostgreSQL?
The query would look something like this:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= '2016-01-01' and
created_at < '2017-01-01'
group by date_trunc('month', created_at);
I don't know where the constant '2017-03-20 13:38:46.688-04' is coming from.
Of course you can make the year comparison dynamic:
select date_trunc('month', created_at) as mm, count(*)
from users u
where subscribed = true and
created_at >= date_trunc('year', now()) - interval '1 year' and
created_at < date_trunc('year', now())
group by date_trunc('month', created_at);
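For a quick way to try the grouping locally, here is a minimal sketch against an in-memory SQLite database with invented rows. SQLite has no date_trunc, so strftime('%Y-%m', ...) is used as a stand-in, and the subscribed filter is left out because the question's table does not mention such a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
-- Made-up sample data in ISO 8601 format; ids 4 and 5 fall outside 2016.
INSERT INTO users VALUES
  (1, '2016-01-05T10:00:00'),
  (2, '2016-01-20T08:30:00'),
  (3, '2016-02-11T12:00:00'),
  (4, '2015-12-31T23:59:59'),
  (5, '2017-01-01T00:00:00');
""")

MONTHLY_SQL = """
SELECT strftime('%Y-%m', created_at) AS mm, COUNT(*) AS user_count
FROM users
WHERE created_at >= '2016-01-01'   -- half-open range: start included,
  AND created_at <  '2017-01-01'   -- end excluded
GROUP BY mm
ORDER BY mm
"""

rows = conn.execute(MONTHLY_SQL).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL itself, wrapping the truncated date in to_char(mm, 'FMMonth') should give month names such as January, as in the output format the question asks for.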

How many distinct active users did I have on a 90 day window? [duplicate]

I have a complex problem that seems to be trivial at first sight:
for a given 90 day window, how many distinct active users did I have?
The table I will use to query this is the login table (hosted in Redshift), and it has a timestamp with the logintime and usertoken as the user identifier.
Whenever I want to answer this for a single day, the query is easy and straightforward:
select count (distinct usertoken)
from logins
where datediff('d',logintime,getdate()) <= 90
The problem becomes complex because I want to have this in a table with the number for every given date.
07/07 100k
07/06 98k
07/05 99k
07/04 101k
(...)
Window functions do not help me because I need to count distinct, and this is not possible in a window function.
To my knowledge, there is no way to iterate in a SQL query.
How should I go about this?
Perhaps I am missing something, but from what I understand this should do:
-- In SQL Server
select cast(logintime As Date) , count (distinct usertoken) from logins
where datediff(D,logintime,getdate()) <= 90 Group by
cast(logintime As Date)
In PostgreSQL/Redshift:
Change cast(logintime As Date) to date_trunc('day', logintime)
and datediff(D,logintime,getdate()) to datediff('d',logintime,getdate())
I am assuming that if a day has zero users logging in you don't mind not showing it in the list.
First we get a set of all the days we care about and call that set "days".
with days as (
select date_trunc('day', date) as day from logins
where date > now() - '90 days'::interval
group by day
)
Then we join the days set with the logins.
select day, count(distinct userid)
from days
join logins on date_trunc('day', logins.date) = days.day
group by day
order by day
The trivial way is very computationally expensive:
select days.d, count(distinct l.userid)
from (select distinct date_trunc('day', logintime) as d
from logins l
) days left join
(select distinct userid, date_trunc('day', logintime) as d
from logins
) l
on datediff('d', l.d, days.d) between 0 and 89
group by days.d
order by days.d;
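If the join is too heavy, one alternative that is not taken from the answers above is to fetch only the distinct (day, user) pairs and slide the window in application code, adding the users of the newest day and removing those of the day that falls out. The Python sketch below illustrates that idea; the function name, the input shape and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def sliding_distinct(users_by_day, start, end, window_days=90):
    """Return [(day, distinct users in the window_days ending on day)].

    users_by_day maps a date to the set of user tokens active that day.
    in_window counts, per user, how many days inside the current window
    they were active; a user leaves the window when that count hits zero.
    """
    in_window = Counter()
    # Prime the window with the days that precede start.
    for offset in range(window_days - 1, 0, -1):
        in_window.update(users_by_day.get(start - timedelta(days=offset), ()))

    results = []
    day = start
    while day <= end:
        in_window.update(users_by_day.get(day, ()))
        results.append((day, len(in_window)))
        # Drop the oldest day before moving the window forward.
        oldest = day - timedelta(days=window_days - 1)
        for user in users_by_day.get(oldest, ()):
            in_window[user] -= 1
            if in_window[user] == 0:
                del in_window[user]
        day += timedelta(days=1)
    return results


# Made-up sample data, with a 3-day window to keep the example short.
users_by_day = {
    date(2024, 7, 1): {"a", "b"},
    date(2024, 7, 2): {"b"},
    date(2024, 7, 4): {"c"},
}
counts = sliding_distinct(users_by_day, date(2024, 7, 1), date(2024, 7, 5), window_days=3)
for day, n in counts:
    print(day, n)
```

Each day is added once and removed once, so the work grows with the number of (day, user) pairs rather than with pairs multiplied by the window length.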