Rolling NEW active users in SQL (BigQuery)

I have already computed rolling active users (on a weekly basis) as follows:
SELECT
DATE_TRUNC(EXTRACT(DATE FROM tracks.timestamp), WEEK),
COUNT(DISTINCT tracks.user_id)
FROM `company.dataset.tracks` AS tracks
WHERE tracks.timestamp > TIMESTAMP('2020-01-01')
AND tracks.event = 'activation_event'
GROUP BY 1
ORDER BY 1
I am interested in knowing the number of distinct users who performed the activation event for the 1st time on a rolling weekly basis.

If I follow you correctly, you can use two levels of aggregation:
select
date_trunc(date(activation_timestamp), week) activation_week,
count(*) cnt_active_users
from (
select min(timestamp) activation_timestamp
from `company.dataset.tracks` t
where event = 'activation_event'
group by user_id
) t
where activation_timestamp > timestamp('2020-01-01')
group by 1
The subquery computes the date of the first activation event per user, then the outer query counts the number of such events per week.
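To make the two-level aggregation concrete, here is a minimal sketch that runs the same idea with Python's sqlite3 module instead of BigQuery. The table contents are made up, the column is called ts here, and the week truncation uses SQLite's date modifiers (Sunday on or before the date, which should match BigQuery's default WEEK) because SQLite has no DATE_TRUNC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (user_id TEXT, ts TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, 'activation_event')",
    [
        ("a", "2019-12-20 10:00:00"),  # first activation happened before 2020
        ("a", "2020-01-06 09:00:00"),
        ("b", "2020-01-07 12:00:00"),
        ("b", "2020-01-14 08:00:00"),
        ("c", "2020-01-15 18:00:00"),
    ],
)

query = """
SELECT date(first_ts, 'weekday 6', '-6 days') AS activation_week,  -- Sunday on or before first_ts
       COUNT(*) AS new_users
FROM (
    -- inner level: one row per user with the first activation timestamp
    SELECT user_id, MIN(ts) AS first_ts
    FROM tracks
    WHERE event = 'activation_event'
    GROUP BY user_id
)
WHERE first_ts > '2020-01-01'  -- filter after finding the first event
GROUP BY activation_week
ORDER BY activation_week
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2020-01-05', 1), ('2020-01-12', 1)]
```

User a is not counted in 2020 because the date filter is applied after the first activation has been found, not before.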

If you want both the active users and the new users in the same query:
SELECT week, COUNT(*) as users_in_week,
COUNTIF(seqnum = 1) as new_users
FROM (SELECT DATE_TRUNC(EXTRACT(DATE FROM t.timestamp), WEEK) as week,
t.user_id, COUNT(*) as cnt,
ROW_NUMBER() OVER (PARTITION BY t.user_id ORDER BY MIN(t.timestamp)) as seqnum
FROM `company.dataset.tracks` t
WHERE t.event = 'activation_event'
GROUP BY 1, 2
) t
WHERE week >= DATE '2020-01-01'  -- filter after ranking so seqnum reflects each user's full history
GROUP BY 1
ORDER BY 1;

Related

Select users which have at least one successful and one failed payment per month in PostgreSQL

How can I select users from this kind of table that have at least one successful and one failed payment per month?
For example, if a user has payments in March, April and May, and in May there were only ten successful payments, we don't want to show that user. But if a user has had payments for as long as 10 months and in each month there are both failed (false) and successful (true) payments, we want to show that user.
In this case we would only show the users with user id 1 and 3.
For now my query looks like this:
SELECT DISTINCT date_trunc('month', paydate)as uniquemonth
, success
,user_id
FROM payments order by user_id,uniquemonth
You can count the distinct months with failed and successful payments. Assuming t holds the result of your query above (one row per user, month and success value):
select user_id
from t
group by user_id
having count(*) filter (where success) = count(distinct uniquemonth) and
count(*) filter (where not success) = count(distinct uniquemonth);
You seem to have only one or two records per month. If you have more, or if the first column is not really the first of the month, you could use count(distinct):
select user_id
from t
group by user_id
having count(distinct date_trunc('month', uniquemonth)) filter (where success) = count(distinct date_trunc('month', uniquemonth)) and
count(distinct date_trunc('month', uniquemonth)) filter (where not success) = count(distinct date_trunc('month', uniquemonth))
Here is one way, which lists each user/month pair that has at least one successful and one failed payment:
select
to_char(uniquemonth, 'YYYY-MM')
, users
from tablename
group by to_char(uniquemonth, 'YYYY-MM'), users
having count(*) filter (where success) >= 1
and count(*) filter (where not success) >= 1;
Use MIN() and MAX() window functions to get the min and max values of success for each user/month and then use aggregation:
SELECT user_id
FROM (
SELECT *,
MIN(success::int) OVER (PARTITION BY user_id, to_char(paydate, 'YYYY-MM')) min_success,
MAX(success::int) OVER (PARTITION BY user_id, to_char(paydate, 'YYYY-MM')) max_success
FROM payments
) t
GROUP BY user_id
HAVING MAX(min_success) = 0 AND MIN(max_success) = 1
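As a quick check of the min/max idea, here is a sketch that runs it with Python's sqlite3 module on made-up rows. It is not the PostgreSQL query above: it uses two levels of GROUP BY instead of window functions, stores success as 0/1, and uses strftime in place of to_char.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (user_id INTEGER, paydate TEXT, success INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2020-03-05", 1), (1, "2020-03-20", 0),
        (1, "2020-04-02", 1), (1, "2020-04-18", 0),
        (2, "2020-03-07", 1), (2, "2020-03-09", 0),
        (2, "2020-04-11", 1),                        # April has no failed payment
        (3, "2020-05-01", 0), (3, "2020-05-15", 1),
    ],
)

query = """
SELECT user_id
FROM (
    -- per user and month: 0 if any payment failed, 1 if any succeeded
    SELECT user_id,
           strftime('%Y-%m', paydate) AS month,
           MIN(success) AS min_success,
           MAX(success) AS max_success
    FROM payments
    GROUP BY user_id, month
)
GROUP BY user_id
-- every month must contain a failure (min = 0) and a success (max = 1)
HAVING MAX(min_success) = 0 AND MIN(max_success) = 1
ORDER BY user_id
"""
qualifying_users = [row[0] for row in conn.execute(query)]
print(qualifying_users)  # [1, 3]
```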

How to pull a list of all visitor_ids that generated more than $500 combined in their first two sessions in the month of January 2020?

Tables:
Sessions: session_ts, visitor_id, vertical, session_id
Transactions: session_ts, session_id, rev_bucket, revenue
Currently have the following query (using SQLite):
SELECT
s.visitor_id,
sub.session_id,
month,
year,
total_rev,
CASE
WHEN (row_num IN (1,2) >= total_rev >= 500) THEN 'Yes'
ELSE 'No' END AS High_Value_Transactions,
sub.row_num
FROM
sessions s
JOIN
(
SELECT
s.visitor_id,
t.session_id,
strftime('%m',t.session_ts) as month,
strftime('%Y',t.session_ts) as year,
SUM(t.revenue) as total_rev,
row_number() OVER(PARTITION BY s.visitor_id ORDER BY s.session_ts) as row_num
FROM
Transactions t
JOIN
sessions s
ON
s.session_id = t.session_id
WHERE strftime('%m',t.session_ts) = '01'
AND strftime('%Y',t.session_ts) = '2020'
GROUP BY 1,2
) sub
ON
s.session_id = sub.session_id
WHERE sub.row_num IN (1,2)
ORDER BY 1
I'm having trouble identifying the first two sessions that combine for $500.
Open to any feedback and simplifying of query. Thanks!
You can use window functions and aggregation:
select visitor_id, sum(t.revenue) total_revenue
from (
select
s.visitor_id,
t.revenue,
row_number() over(partition by s.visitor_id order by t.session_ts) rn
from transactions t
inner join sessions s on s.session_id = t.session_id
where t.session_ts >= '2020-01-01' and t.session_ts < '2020-02-01'
) t
where rn <= 2
group by visitor_id
having sum(t.revenue) >= 500
The subquery joins the two tables, filters on the target month (note that using half-open interval predicates is more efficient than applying date functions on the date column), and ranks each row within groups of visits of the same customer.
Then, the outer query filters on the first two visits per visitor, aggregates by visitor, computes the corresponding revenue, and filters it with a having clause.
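One thing to watch: the query above ranks transaction rows, so it treats each transaction as a visit. If a session can have several transactions, a variant that first sums revenue per session may be closer to "first two sessions". The sketch below is that variant, run with Python's sqlite3 module (window functions need SQLite 3.25 or later) on made-up data. It assumes that sessions without transactions still count as sessions, and it uses a strict > 500 to match "more than $500".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (session_ts TEXT, visitor_id TEXT, session_id TEXT);
CREATE TABLE transactions (session_ts TEXT, session_id TEXT, revenue INTEGER);
""")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    ("2020-01-02 10:00:00", "v1", "s1"),
    ("2020-01-05 10:00:00", "v1", "s2"),
    ("2020-01-09 10:00:00", "v1", "s3"),
    ("2020-01-03 10:00:00", "v2", "s4"),
    ("2020-01-10 10:00:00", "v2", "s5"),
    ("2019-12-30 10:00:00", "v3", "s6"),  # outside January 2020
    ("2020-01-04 10:00:00", "v3", "s7"),
])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("2020-01-02 10:05:00", "s1", 300),
    ("2020-01-02 10:30:00", "s1", 100),   # second transaction in the same session
    ("2020-01-05 10:05:00", "s2", 200),
    ("2020-01-09 10:05:00", "s3", 1000),  # third session, ignored
    ("2020-01-03 10:05:00", "s4", 100),
    ("2020-01-10 10:05:00", "s5", 150),
    ("2019-12-30 10:05:00", "s6", 900),
    ("2020-01-04 10:05:00", "s7", 50),
])

query = """
WITH session_rev AS (
    -- one row per January session with its total revenue
    SELECT s.visitor_id, s.session_id, s.session_ts,
           COALESCE(SUM(t.revenue), 0) AS revenue
    FROM sessions s
    LEFT JOIN transactions t ON t.session_id = s.session_id
    WHERE s.session_ts >= '2020-01-01' AND s.session_ts < '2020-02-01'
    GROUP BY s.visitor_id, s.session_id, s.session_ts
),
ranked AS (
    SELECT visitor_id, revenue,
           ROW_NUMBER() OVER (PARTITION BY visitor_id ORDER BY session_ts) AS rn
    FROM session_rev
)
SELECT visitor_id, SUM(revenue) AS total_revenue
FROM ranked
WHERE rn <= 2
GROUP BY visitor_id
HAVING SUM(revenue) > 500
ORDER BY visitor_id
"""
high_value = conn.execute(query).fetchall()
print(high_value)  # [('v1', 600)]
```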

SQL order with equal group size

I have a table with columns month, name and transaction_id. I would like to count the number of transactions per month and name. However, for each month I want to have the top N names with the highest transaction counts.
The following query groups by month and name. However the LIMIT is applied to the complete result and not per month:
SELECT
month,
name,
COUNT(*) AS transaction_count
FROM my_table
GROUP BY month, name
ORDER BY month, transaction_count DESC
LIMIT N
Does anyone have an idea how I can get the top N results per month?
Use row_number():
SELECT month, name, transaction_count
FROM (SELECT month, name, COUNT(*) AS transaction_count,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY COUNT(*) DESC) as seqnum
FROM my_table
GROUP BY month, name
) mn
WHERE seqnum <= N
ORDER BY month, transaction_count DESC
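For illustration, here is the same pattern run with Python's sqlite3 module on made-up rows (window functions need SQLite 3.25 or later). The helper name top_n_per_month is hypothetical, and the counts are computed in a CTE first so the window function only ranks plain columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (transaction_id INTEGER PRIMARY KEY, month TEXT, name TEXT)"
)
# Made-up data: number of transactions per (month, name)
counts = {
    ("2020-01", "alice"): 3, ("2020-01", "bob"): 2, ("2020-01", "carol"): 1,
    ("2020-02", "bob"): 3, ("2020-02", "carol"): 2, ("2020-02", "alice"): 1,
}
conn.executemany(
    "INSERT INTO my_table (month, name) VALUES (?, ?)",
    [(month, name) for (month, name), c in counts.items() for _ in range(c)],
)

def top_n_per_month(conn, n):
    """Return the n names with the most transactions for each month."""
    query = """
    WITH mn AS (
        SELECT month, name, COUNT(*) AS transaction_count
        FROM my_table
        GROUP BY month, name
    )
    SELECT month, name, transaction_count
    FROM (
        SELECT mn.*,
               ROW_NUMBER() OVER (PARTITION BY month
                                  ORDER BY transaction_count DESC) AS seqnum
        FROM mn
    )
    WHERE seqnum <= ?
    ORDER BY month, transaction_count DESC
    """
    return conn.execute(query, (n,)).fetchall()

top2 = top_n_per_month(conn, 2)
print(top2)
```

With ROW_NUMBER(), ties at the cutoff are broken arbitrarily. RANK() or DENSE_RANK() would keep all tied names instead.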

How can I find groups of consecutive events using BigQuery?

I'm using Firebase Analytics with BigQuery. Assume I need to give a voucher to users who share a service every day for at least 7 consecutive days. If someone shares for 2 consecutive weeks, they get 2 vouchers, and so on.
How can I find the segments of consecutive events logged in Firebase Analytics?
Here is the query I use to find the individual days on which users shared. But I can't identify the continuous segments.
SELECT event.user_id, event.event_date,
MAX((SELECT p.value FROM UNNEST(user_properties) p WHERE p.key='name').string_value) as name,
MAX((SELECT p.value FROM UNNEST(user_properties) p WHERE p.key='email').string_value ) as email,
SUM((SELECT event_params.value.int_value from event.event_params where event_params.key = 'share_session_length')) as total_share_session_length
FROM `myProject.analytics_183565123.*` as event
where event_name like 'share_end'
group by user_id,event_date
having total_share_session_length >= 1
order by user_id desc
Below is an example for BigQuery Standard SQL. I hope you can adapt the approach to your specific use case.
#standardSQL
SELECT id, ARRAY_AGG(STRUCT(first_day, days) ORDER BY grp) continuous_groups
FROM (
SELECT id, grp, MIN(day) first_day, MAX(day) last_day, COUNT(1) days
FROM (
SELECT id, day,
COUNTIF(gap != 1) OVER(PARTITION BY id ORDER BY day) grp
FROM (
SELECT id, day,
DATE_DIFF(day,LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) gap
FROM (
SELECT DISTINCT fullVisitorId id, PARSE_DATE('%Y%m%d', t.date) day
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` t
)
)
)
GROUP BY id, grp
HAVING days >= 7
)
GROUP BY id
ORDER BY ARRAY_LENGTH(continuous_groups) DESC
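The same gap-then-running-count technique can be tried locally. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later) with a made-up shares table. SQLite has no COUNTIF or DATE_DIFF, so it uses SUM(CASE ...) and julianday() instead, and the vouchers column (one per full 7 days in a streak) is only my reading of the question.

```python
import sqlite3
from datetime import date, timedelta

def run(start, n):
    """Return n consecutive ISO dates beginning at start."""
    return [(start + timedelta(days=i)).isoformat() for i in range(n)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (user_id TEXT, day TEXT)")
data = (
    [("u1", d) for d in run(date(2020, 3, 1), 7) + run(date(2020, 3, 10), 2)]
    + [("u2", d) for d in run(date(2020, 3, 1), 3) + run(date(2020, 3, 5), 4)]
    + [("u3", d) for d in run(date(2020, 3, 1), 14)]
)
conn.executemany("INSERT INTO shares VALUES (?, ?)", data)

query = """
WITH gaps AS (
    -- days since the user's previous share day (NULL for the first one)
    SELECT user_id, day,
           julianday(day) - julianday(
               LAG(day) OVER (PARTITION BY user_id ORDER BY day)) AS gap
    FROM (SELECT DISTINCT user_id, day FROM shares)
),
islands AS (
    -- running count of breaks: rows of one streak share the same grp
    SELECT user_id, day,
           SUM(CASE WHEN gap <> 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id ORDER BY day) AS grp
    FROM gaps
)
SELECT user_id, MIN(day) AS first_day, COUNT(*) AS days,
       COUNT(*) / 7 AS vouchers  -- integer division
FROM islands
GROUP BY user_id, grp
HAVING COUNT(*) >= 7
ORDER BY user_id, first_day
"""
streaks = conn.execute(query).fetchall()
print(streaks)  # [('u1', '2020-03-01', 7, 1), ('u3', '2020-03-01', 14, 2)]
```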

Group By - select by a criteria that is met every month

The below query returns all USERS that have SUM(AMOUNT) > 10 in a given month. It includes Users in a month even if they don't meet the criteria in other months.
But I'd like to transform this query to return all USERS who must meet the criteria SUM(AMOUNT) > 10 every single month (i.e., from the first month in the table to the last one) across the entire data.
Put another way, exclude users who don't meet SUM(AMOUNT) > 10 every single month.
select USERS, to_char(transaction_date, 'YYYY-MM') as month
from Table
GROUP BY USERS, month
HAVING SUM(AMOUNT) > 10;
One approach uses a generated calendar table representing all months in your data set. We can left join this calendar table to your current query, and then aggregate over all months by user:
WITH months AS (
SELECT DISTINCT TO_CHAR(transaction_date, 'YYYY-MM') AS month
FROM yourTable
),
cte AS (
SELECT USERS, TO_CHAR(transaction_date, 'YYYY-MM') AS month
FROM yourTable
GROUP BY USERS, month
HAVING SUM(AMOUNT) > 10
)
SELECT
t.USERS
FROM months m
LEFT JOIN cte t
ON m.month = t.month
GROUP BY
t.USERS
HAVING
COUNT(t.USERS) = (SELECT COUNT(*) FROM months);
The HAVING clause above asserts that the number of months to which a user matches is in fact the total number of months. This would imply that the user meets the sum criteria for every month.
Perhaps you could use a correlated subquery, such as:
select t.*
from (select distinct users from yourTable) t
where not exists
(
select to_char(u.transaction_date, 'YYYY-MM') as month
from yourTable u
where u.users = t.users
group by month
having sum(u.amount) <= 10
)
One option is to compare sign(amount-10) with sign(amount):
SELECT q.users
FROM
(
with tab(users, transaction_date,amount) as
(
select 1,date'2018-11-24',8 union all
select 1,date'2018-11-24',18 union all
select 2,date'2018-10-24',13 union all
select 3,date'2018-11-24',18 union all
select 3,date'2018-10-24',28 union all
select 3,date'2018-09-24', 3 union all
select 4,date'2018-10-24',28
)
SELECT users, to_char(transaction_date, 'YYYY-MM') as month,
sum(sign(amount-10)) as cnt1,
sum(sign(amount)) as cnt2
FROM tab t
GROUP BY users, month
) q
GROUP BY q.users
HAVING sum(q.cnt1) = sum(q.cnt2)
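users
-----
2
4
Note that this compares each individual amount with 10 rather than the monthly sum, which is why user 1 (8 and 18 in the same month) is not returned.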
You need to compare the number of months whose sum exceeds 10 to the number of months between the min and the max date:
SELECT users, Count(flag) AS months, Min(mth), Max(mth)
FROM
(
SELECT users, date_trunc('month',transaction_date) AS mth,
CASE WHEN Sum(amount) > 10 THEN 1 end AS flag
FROM tab t
GROUP BY users, mth
) AS dt
GROUP BY users
HAVING -- adding the number of months > 10 to the min date and compare to max
Min(mth) + (INTERVAL '1' MONTH * (Count(flag)-1)) = Max(mth)
If missing months don't count, it is simply count(flag) = count(*).
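To see how the two readings differ, here is a small sketch using Python's sqlite3 module with made-up rows (strftime stands in for to_char). The strict query requires a qualifying sum in every month that appears anywhere in the table. The lenient one only looks at the months in which the user has transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (users INTEGER, transaction_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", [
    (1, "2018-09-03", 8), (1, "2018-09-20", 5),   # September sum is 13
    (1, "2018-10-24", 20), (1, "2018-11-24", 11),
    (2, "2018-09-24", 30), (2, "2018-11-24", 30),  # no October transactions
    (3, "2018-09-24", 3), (3, "2018-10-24", 28), (3, "2018-11-24", 18),
])

strict_query = """
WITH months AS (
    SELECT DISTINCT strftime('%Y-%m', transaction_date) AS month FROM tab
),
ok AS (
    -- user/month pairs whose monthly sum exceeds 10
    SELECT users, strftime('%Y-%m', transaction_date) AS month
    FROM tab
    GROUP BY users, month
    HAVING SUM(amount) > 10
)
SELECT users
FROM ok
GROUP BY users
HAVING COUNT(*) = (SELECT COUNT(*) FROM months)
ORDER BY users
"""

lenient_query = """
SELECT users
FROM (
    SELECT users, strftime('%Y-%m', transaction_date) AS month,
           SUM(amount) AS total
    FROM tab
    GROUP BY users, month
)
GROUP BY users
HAVING MIN(total) > 10   -- every month the user appears in qualifies
ORDER BY users
"""

strict = [r[0] for r in conn.execute(strict_query)]
lenient = [r[0] for r in conn.execute(lenient_query)]
print(strict, lenient)  # [1] [1, 2]
```

User 2 has no October rows, so it passes only the lenient check. User 3 fails both because of the September sum of 3.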