Finding DAU/MAU ratio in SQL

There's a table with customerID, timestamp, and activity columns, and from it I found DAU (Daily Active Users) and MAU (Monthly Active Users). The problem is that I got DAU and MAU from two separate queries, since they need to be grouped by day and by month respectively.
Also, DAU would be a table since it's grouped by day, with one row per day of the month, while MAU is just a single number. How can I find DAU/MAU, which is apparently a ratio?
My query for DAU
select date, count(distinct customerID) as dau
from table
where extract(month from timestamp) = 1 and extract(year from timestamp) = 2020
and activity = 'opened_the_app'
group by date
This gives me the DAU for all 31 days in the month of January.
Similarly, I found MAU by grouping by month, which gives a single value for January.
How can I find the DAU/MAU ratio for January?

You can join them together:
select d.*, d.dau * 1.0 / m.mau
from (select date, count(distinct customerID) as dau
from table
where timestamp >= '2020-01-01' and
timestamp < '2020-02-01' and
activity = 'opened_the_app'
group by date
) d cross join
(select count(distinct customerID) as mau
from table
where timestamp >= '2020-01-01' and
timestamp < '2020-02-01' and
activity = 'opened_the_app'
) m
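If you want a single "stickiness" number for the whole month instead of one ratio per day, a common variant is average DAU divided by MAU. A sketch reusing the same two subqueries (avg_dau_over_mau is just an illustrative alias):
select avg(d.dau) * 1.0 / max(m.mau) as avg_dau_over_mau
from (select date, count(distinct customerID) as dau
from table
where timestamp >= '2020-01-01' and
timestamp < '2020-02-01' and
activity = 'opened_the_app'
group by date
) d cross join
(select count(distinct customerID) as mau
from table
where timestamp >= '2020-01-01' and
timestamp < '2020-02-01' and
activity = 'opened_the_app'
) m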

You can also compute it from the DAU table itself if you treat MAU as the sum of DAU (note that this only equals the true MAU when each customer is active on a single day; a customer active on several days is counted once per day in the sum):
select date, dau, dau * 1.0 / sum(dau) over () as result from (
select date, count(distinct customerID) as dau
from table
where extract(month from timestamp) = 1 and extract(year from timestamp) = 2020
and activity = 'opened_the_app'
group by date
) dau_table
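To see the difference on a tiny hypothetical example (one customer, active on two days), the sum of the daily DAU is 2 while the true MAU is 1:
with t (customerID, date) as (
values (1, date '2020-01-01'),
(1, date '2020-01-02')
)
select
(select sum(dau)
from (select date, count(distinct customerID) as dau from t group by date) d) as sum_of_dau,
(select count(distinct customerID) from t) as true_mau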

Related

How to Count Entries on Same Day and Sum Amount based on the Count?

I am attempting to produce Table2 below, which essentially counts the rows that fall on the same day and adds up the "amount" column for those rows.
I found a solution online that can count entries from the same day, which works:
SELECT
DATE_TRUNC('day', datetime) AS date,
COUNT(datetime) AS date1
FROM Table1
GROUP BY DATE_TRUNC('day', datetime);
It is partially what I am looking for, but I am having difficulty getting it to display all the other columns as well.
In my attempt, I have all the columns I want but the Accumulated Count is not accurate since it counts the rows with unique IDs (because I put "id" in GROUP BY):
SELECT *, count(id) OVER(ORDER BY DateTime) as accumulated_count,
SUM(Amount) OVER(ORDER BY DateTime) AS Accumulated_Amount
FROM Table1
GROUP BY date(datetime), id
I've been working on this for days and seemingly have come across every possible outcome that is not what I am looking for. Does anyone have an idea as to what I'm missing here?
Cumulative sum and count should be calculated for each day
with Table1 (id,datetime,client,product,amount) as(values
(1 ,to_timestamp('2020-07-08 07:30:10','YYYY-MM-DD HH24:MI:SS'),'Tom','Bill Payment',24),
(2 ,to_timestamp('2020-07-08 07:50:30','YYYY-MM-DD HH24:MI:SS'),'Tom','Bill Payment',27),
(3 ,to_timestamp('2020-07-09 08:20:10','YYYY-MM-DD HH24:MI:SS'),'Tom','Bill Payment',37)
)
SELECT
Table1.*,
count(*) over (partition by DATE_TRUNC('day', datetime)
order by datetime asc ) accumulated_count,
sum(amount) over (partition by DATE_TRUNC('day', datetime) order by datetime asc) accumulated_sum
FROM Table1;
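For reference, against the three sample rows this returns:
id datetime            client product      amount accumulated_count accumulated_sum
-- ------------------- ------ ------------ ------ ----------------- ---------------
1  2020-07-08 07:30:10 Tom    Bill Payment 24     1                 24
2  2020-07-08 07:50:30 Tom    Bill Payment 27     2                 51
3  2020-07-09 08:20:10 Tom    Bill Payment 37     1                 37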
Not too familiar with PostgreSQL, but this does what you ask for.
with data (id,date_time,client,product,amount) as(
select 1 ,to_timestamp('Jul 08 2020, 07:30:10','Mon DD YYYY, HH24:MI:SS'),'Tom','Bill',24 Union all
select 2 ,to_timestamp('Jul 08 2020, 07:50:30','Mon DD YYYY, HH24:MI:SS'),'Tom','Bill',27 Union all
select 3 ,to_timestamp('Jul 09 2020, 08:20:10','Mon DD YYYY, HH24:MI:SS'),'Tom','Bill',37
)
select d.id,d.date_time,d.client,d.product,d.amount,
(select count(*) from data d1
where d1.date_time <= d.date_time and date(d1.date_time) = date(d.date_time) ) acc_count,
(select sum(amount) from data d1
where d1.date_time <= d.date_time and date(d1.date_time) = date(d.date_time) ) acc_amount
from data d

Frequency of distinct values grouped by date

I am trying to get the frequency of unique ID values for each month of the last year. However, I don't get the expected outcome; instead I get the error message "SELECT list expression references column user_id which is neither grouped nor aggregated".
How can I get the count of unique IDs in each month and then group them by month?
What I tried:
SELECT
user_id,
EXTRACT(MONTH FROM date) as month
FROM
TABLE
WHERE
date >= '2020-09-01'
GROUP BY
month
I want something like this:
month   count of unique user_id
-----   -----------------------
1       300
2       200
...     ...
12      250
You would use GROUP BY and COUNT(DISTINCT):
SELECT EXTRACT(MONTH FROM date) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
I would advise you to include the year in the query. In BigQuery, this is simplest using DATE_TRUNC():
SELECT DATE_TRUNC(date, MONTH) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
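Note that DATE_TRUNC(date, MONTH) returns the first day of each month (for example 2020-09-01), so months from different years stay distinct in the output.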

Month over Month percent change in user registrations

I am trying to write a query to find the month over month percent change in user registrations.
The Users table has the logs for user registrations:
user_id - pk, integer
created_at - account created date, varchar
activated_at - account activated date, varchar
state - active or pending, varchar
I found the number of users for each year and month. How do I find month over month percent change in user registration? I think I need a window function?
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count(distinct user_id) as number_of_registration
FROM users
GROUP BY 1,2
ORDER BY 1,2
The above query gives me the number of registrations per month and year.
Then I wrote this to find the difference in user registrations from the previous year.
SELECT
*
,number_of_registration - lag(number_of_registration) over (partition by created_month) as difference_in_previous_year
FROM (
SELECT
EXTRACT(month from created_at::timestamp) as created_month
,EXTRACT(year from created_at::timestamp) as created_year
,count( user_id) as number_of_registration
FROM users as u
GROUP BY 1,2
ORDER BY 1,2) as temp
But the output is not what I expect.
You want an order by clause that contains created_year.
number_of_registration
- lag(number_of_registration) over (partition by created_month order by created_year) as difference_in_previous_year
Note that you don't actually need a subquery for this. You can do:
select
extract(year from created_at) as created_year,
extract(month from created_at) as created_month,
count(*) as number_of_registration,
count(*) - lag(count(*)) over(partition by extract(month from created_at) order by extract(year from created_at))
from users as u
group by created_year, created_month
order by created_year, created_month
I used count(*) instead of count(user_id), because I assume that user_id is not nullable (in which case count(*) is equivalent, and more efficient). Casting to a timestamp is also probably superfluous.
These queries work as long as you have data for every month. If you have gaps, then the problem should be addressed differently - but this is not the question you asked here.
I can get the registrations from each year as two tables and join them, but it is not that efficient:
SELECT
t1.created_year as year_2013
,t2.created_year as year_2014
,t1.created_month as month_of_year
,t1.number_of_registration_2013
,t2.number_of_registration_2014
,(t2.number_of_registration_2014 - t1.number_of_registration_2013) * 100.0 / t1.number_of_registration_2013 as percent_change_in_previous_year_month
FROM
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2013
from users
where extract(year from created_at) = '2013'
group by 1,2) t1
inner join
(select
extract(year from created_at) as created_year
,extract(month from created_at) as created_month
,count(*) as number_of_registration_2014
from users
where extract(year from created_at) = '2014'
group by 1,2) t2
on t1.created_month = t2.created_month
First off, why are you using strings to hold date/time values? Your first step should be to define created_at and activated_at as proper timestamps. In the resulting query I assume this correction. If that assumption is wrong (you do not correct it), then cast the string to timestamp in the CTE generating the date range. But keep in mind that if you leave it as text, you will at some point get a conversion exception.
To calculate month-over-month change, use the formula "100*(Nt - Nl)/Nl", where Nt is the number of users this month and Nl is the number of users last month (for example, 120 registrations this month against 100 last month gives 100*(120-100)/100 = 20%). There are 2 potential issues:
There are gaps in the data.
Nl is 0 (would incur divide by 0 exception)
The following handles this by first generating the months between the earliest and the latest date, then outer joining the monthly counts to the generated dates. When Nl = 0 the query returns NULL, indicating that the percent change could not be calculated.
with full_range(the_month) as
(select generate_series(low_month, high_month, interval '1 month')
from (select min(date_trunc('month',created_at)) low_month
, max(date_trunc('month',created_at)) high_month
from users
) m
)
select to_char(the_month,'yyyy-mm')
, users_this_month
, case when users_last_month = 0
then null::float
else round((100.00*(users_this_month-users_last_month)/users_last_month),2)
end percent_change
from (
select the_month, users_this_month , lag(users_this_month) over(order by the_month) users_last_month
from ( select f.the_month, count(u.created_at) users_this_month
from full_range f
left join users u on date_trunc('month',u.created_at) = f.the_month
group by f.the_month
) mc
) pc
order by the_month;
NOTE: There are several places where the above can be shortened, but the longer form is intentional to show how the final values are derived.

Day-wise rolling 30-day unique user count in BigQuery

I am trying to generate a day-on-day rolling 30-day unique user count using this query, but it only computes the value for a single day. I need the rolling 30-day count for every day of August in one script. Please help.
SELECT max(date),count(DISTINCT user_id) as MAU
FROM user_data
WHERE date between DATE_SUB('2020-08-31' ,INTERVAL 29 DAY) and '2020-08-31';
BigQuery doesn't support rolling windows for count(distinct). So, one approach is a brute force method:
select dte,
(select count(distinct ud.user_id)
from user_data ud
where ud.date between DATE_SUB(dte, INTERVAL 29 DAY) and dte
) as num_users
from unnest(generate_date_array(date('2020-08-01'), date('2020-08-31'))) dte
Gordon's approach works great.
If you need to calculate more numbers, cross join the data:
SELECT
date_gen,
COUNT(DISTINCT IF(ud.date BETWEEN DATE_SUB(date_gen ,INTERVAL 29 DAY) AND date_gen,ud.user_id,NULL)) as MAU
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE_SUB('2020-08-31' ,INTERVAL 29 DAY), date('2020-08-31'))) date_gen,
(SELECT * FROM user_data WHERE date BETWEEN DATE_SUB('2020-08-31' ,INTERVAL 60 DAY) AND '2020-08-31') AS ud
GROUP BY 1
ORDER BY 1 DESC
With DECLARE and SET you can avoid repeating the date literal in multiple places.
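For example, a sketch using BigQuery scripting, where end_date is an illustrative variable replacing the repeated literal:
DECLARE end_date DATE DEFAULT DATE '2020-08-31';
SELECT
date_gen,
COUNT(DISTINCT IF(ud.date BETWEEN DATE_SUB(date_gen, INTERVAL 29 DAY) AND date_gen, ud.user_id, NULL)) as MAU
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE_SUB(end_date, INTERVAL 29 DAY), end_date)) date_gen,
(SELECT * FROM user_data WHERE date BETWEEN DATE_SUB(end_date, INTERVAL 60 DAY) AND end_date) AS ud
GROUP BY 1
ORDER BY 1 DESC;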
Below is for BigQuery Standard SQL
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM `project.dataset.user_data`
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t
The above assumes you have entries in the project.dataset.user_data table for all days in the time period of interest.
If that is not the case, and you actually have some gaps in your data, you can use the below:
#standardSQL
SELECT date, (SELECT COUNT(DISTINCT id) FROM t.users AS id) AS MAU
FROM (
SELECT date, ARRAY_AGG(user_id) OVER(mau_win) users
FROM UNNEST(GENERATE_DATE_ARRAY('2020-08-01', '2020-08-31')) AS date
LEFT JOIN `project.dataset.user_data`
USING(date)
WINDOW mau_win AS (
ORDER BY UNIX_DATE(date) DESC RANGE BETWEEN CURRENT ROW AND 29 FOLLOWING
)
) t

SQL to find row for min date in each month

I have a table, let's say "Records", with this structure:
id date
-- ----
1 2012-08-30
2 2012-08-29
3 2012-07-25
I need to write an SQL query in PostgreSQL to get record_id for MIN date in each month.
month record_id
----- ---------
8 2
7 3
As we can see, 2012-08-29 < 2012-08-30 and it is in month 8, so we should show record_id = 2.
I tried something like this,
SELECT
EXTRACT(MONTH FROM date) as month,
record_id,
MIN(date)
FROM Records
GROUP BY 1,2
but it shows 3 records.
Can anybody help?
SELECT DISTINCT ON (EXTRACT(MONTH FROM date))
id,
date
FROM Records1
ORDER BY EXTRACT(MONTH FROM date),date
SQLFiddle http://sqlfiddle.com/#!12/76ca2/3
UPD: This query:
1) Orders the records by month and date
2) For every month picks the first record (the first record has MIN(date) because of ordering)
Details here http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
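With the sample data this returns:
id date
-- ----------
3  2012-07-25
2  2012-08-29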
This will return multiples if you have duplicate minimum dates:
Select
minbymonth.Month,
r.record_id
From (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
) minbymonth
Inner Join
records r
On minbymonth.date = r.date
Order By
1;
Or if you have CTEs
With MinByMonth As (
Select
Extract(Month From date) As Month,
Min(date) As Date
From
records
Group By
Extract(Month From date)
)
Select
m.Month,
r.record_id
From
MinByMonth m
Inner Join
Records r
On m.date = r.date
Order By
1;
http://sqlfiddle.com/#!1/2a054/3
select extract(month from date)
, record_id
, date
from
(
select
record_id
, date
, rank() over (partition by extract(month from date) order by date asc) r
from records
) x
where r=1
order by date
SQL Fiddle
select distinct on (date_trunc('month', date))
date_trunc('month', date) as month,
id,
date
from records
order by 1, 3
I think you need to use a sub-query, something like this:
SELECT
EXTRACT(MONTH FROM r.date) as month,
r.record_id
FROM Records as r
INNER JOIN (
SELECT
EXTRACT(MONTH FROM date) as month,
MIN(date) as mindate
FROM Records
GROUP BY EXTRACT(MONTH FROM date)
) as sub on EXTRACT(MONTH FROM r.date) = sub.month and r.date = sub.mindate
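With the sample data this produces the desired result (add an ORDER BY if you need a specific row order):
month record_id
----- ---------
7 3
8 2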