BigQuery Monthly Active Users?

I'm currently working off a query from this post. That query is written in Legacy SQL and will not work in my environment. I've modified the query to use Standard SQL functions and changed the SELECT date as date expression to derive the date from timestamp_micros.
I should also mention that the rows I'm trying to select are coming in from Firebase Analytics.
My Query:
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(event.timestamp_micros)) as date,
SUM(CASE WHEN period = 7 THEN users END) as days_07,
SUM(CASE WHEN period = 14 THEN users END) as days_14,
SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(event.timestamp_micros)) as date,
periods.period as period,
COUNT(DISTINCT user_dim.app_info.app_instance_id) as users
FROM `com_sidearm_fanapp_uiowa_IOS.*` as activity
CROSS JOIN
UNNEST(event_dim) as event
CROSS JOIN (
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(event.timestamp_micros)) as date
FROM `com_sidearm_fanapp_uiowa_IOS.*`
CROSS JOIN
UNNEST(event_dim) as event
GROUP BY event.timestamp_micros
) as dates
CROSS JOIN (
SELECT
period
FROM
(
SELECT 7 as period
UNION ALL
SELECT 14 as period
UNION ALL
SELECT 30 as period
)
) as periods
WHERE
dates.date >= activity.date
AND
SAFE_CAST(FLOOR(TIMESTAMP_DIFF(dates.date, activity.date, DAY)/periods.period) AS INT64) = 0
GROUP BY 1,2
)
CROSS JOIN
UNNEST(event_dim) as event
GROUP BY date
ORDER BY date DESC

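To fix the "Column name period is ambiguous at [24:13]" error, change the fragment below. In Legacy SQL a comma between tables means UNION ALL, but in Standard SQL it means CROSS JOIN, so the subquery ends up with three columns that are all named period: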
CROSS JOIN (
SELECT
period
FROM
(SELECT 7 as period),
(SELECT 14 as period),
(SELECT 30 as period)
) as periods
so it should look like:
CROSS JOIN (
SELECT
period
FROM
(SELECT 7 as period UNION ALL
SELECT 14 as period UNION ALL
SELECT 30 as period)
) as periods
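The difference between the two fragments can be reproduced outside BigQuery. The following is only a rough Python sketch of the two result shapes, not BigQuery itself; the helper names comma_join and union_all are made up for illustration. A cross join of three one-row subqueries yields a single row that carries three period columns, while UNION ALL stacks them into three rows with one period column.

```python
from itertools import product

# Each one-row subquery, e.g. (SELECT 7 AS period), modelled as a list of row dicts.
subqueries = [[{"period": 7}], [{"period": 14}], [{"period": 30}]]

def comma_join(tables):
    """Cross join: one output row per combination, keeping every input column."""
    return [tuple(row["period"] for row in combo) for combo in product(*tables)]

def union_all(tables):
    """UNION ALL: rows are stacked, so the result has a single period column."""
    return [row["period"] for table in tables for row in table]

cross_rows = comma_join(subqueries)
union_rows = union_all(subqueries)
print(cross_rows)  # one row with three values, i.e. three columns named period
print(union_rows)  # three rows, each with one period value
```

With three columns sharing the name period, a bare reference to period cannot be resolved, which is what the error message reports.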
Answer to your updated question:
Try the query below. I didn't have a chance to test it, but I hope it helps you fix your query.
SELECT
date,
SUM(CASE WHEN period = 7 THEN users END) as days_07,
SUM(CASE WHEN period = 14 THEN users END) as days_14,
SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
SELECT
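dates.date as date, -- group by the report date; grouping by activity.date would make every period return that day's users only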
periods.period as period,
COUNT(DISTINCT user) as users
FROM (
SELECT
event.timestamp_micros as date,
user_dim.app_info.app_instance_id as user
FROM `yourTable` CROSS JOIN UNNEST(event_dim) as event
) as activity
CROSS JOIN (
SELECT
event.timestamp_micros as date
FROM `yourTable` CROSS JOIN UNNEST(event_dim) as event
GROUP BY event.timestamp_micros
) as dates
CROSS JOIN (
SELECT period
FROM
(SELECT 7 as period UNION ALL
SELECT 14 as period UNION ALL
SELECT 30 as period)
) as periods
WHERE dates.date >= activity.date
AND SAFE_CAST(FLOOR(TIMESTAMP_DIFF(TIMESTAMP_MICROS(dates.date), TIMESTAMP_MICROS(activity.date), DAY)/periods.period) AS INT64) = 0
GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC
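The WHERE clause in the query above can be hard to read, so here is a small Python sketch of what it appears to select. It assumes the difference is modelled as a whole number of days, which is what TIMESTAMP_DIFF with the DAY part returns; the function names are illustrative only. Under that assumption, the pair of conditions is the same as asking for 0 <= diff < period.

```python
import math

def in_window_as_written(days_between: int, period: int) -> bool:
    # Mirrors: dates.date >= activity.date AND FLOOR(diff / period) = 0
    return days_between >= 0 and math.floor(days_between / period) == 0

def in_window_simplified(days_between: int, period: int) -> bool:
    # The same window written as a plain range check.
    return 0 <= days_between < period

# Compare the two forms over a range of day differences for each period.
mismatches = [
    (diff, period)
    for period in (7, 14, 30)
    for diff in range(-60, 61)
    if in_window_as_written(diff, period) != in_window_simplified(diff, period)
]
print(mismatches)  # an empty list means the two predicates agree on this range
```

In other words, an activity row counts toward a report date when it happened on that date or up to period - 1 days before it.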

Related

Column name is ambiguous in bigquery

I am implementing the following solution: https://stackoverflow.com/a/32663098/19903400
Here is the code which I copied from that accepted answer, and used my datasource instead:
SELECT
date,
SUM(CASE WHEN period = 7 THEN users END) as days_07,
SUM(CASE WHEN period = 14 THEN users END) as days_14,
SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
SELECT
dates.date as date,
periods.period as period,
EXACT_COUNT_DISTINCT(activity.user_pseudo_id) as users
FROM `rayn-deen-app.analytics_317927526.events_*` as activity
CROSS JOIN (SELECT DATE_TRUNC(EXTRACT(DATE from TIMESTAMP_MICROS(event_timestamp)), DAY) as date FROM `rayn-deen-app.analytics_317927526.events_*` GROUP BY date) as dates
CROSS JOIN (SELECT period FROM (SELECT 7 as period),
(SELECT 14 as period),(SELECT 30 as period)) as periods
WHERE dates.date >= activity.date
AND INTEGER(FLOOR(DATEDIFF(dates.date, activity.date)/periods.period)) = 0
GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC
But I am getting the following error:
Column name period is ambiguous at [13:22]
So it seems that this is the problematic code snippet:
CROSS JOIN (SELECT period FROM (SELECT 7 as period),
(SELECT 14 as period),(SELECT 30 as period)) as periods
If the goal is to have a fixed set of records, then you can replace this:
SELECT period FROM (SELECT 7 as period),
(SELECT 14 as period),(SELECT 30 as period)
with this:
SELECT period FROM (SELECT 7 as period UNION ALL
SELECT 14 UNION ALL
SELECT 30)

Want a SQL statement to count number

I have a table with three columns: id, open_time, and close_time. The data looks like this:
Then I want a SQL query that returns a result like this:
The rule is: if the date equals open_time, the status is New; if the date is after open_time and before close_time, it is Open; if the date equals close_time, it is Closed.
How can I write this SQL in Oracle?
First build a table on-the-fly containing all dates from the minimum date in the table until today. You need a recursive query for this.
Then build a table on-the-fly for the three statuses.
Now cross join the two to get all combinations. These are the rows you want.
The rest is counting per day and status, which can be achieved with a join and grouping or with one or more subqueries. I'm showing the join:
with days(day) as
(
select min(open_time) as day from opentimes
union all
select day + 1 from days where day < trunc(sysdate)
)
, statuses as
(
select 'New' as status, 1 as sortkey from dual
union all
select 'Open' as status, 2 as sortkey from dual
union all
select 'Close' as status, 3 as sortkey from dual
)
select
d.day,
s.status,
-- 'Open' covers the days strictly between opening and closing; 'Close' is the closing day itself
count(case when (s.status = 'New' and d.day = o.open_time)
or (s.status = 'Open' and d.day > o.open_time and d.day < o.close_time)
or (s.status = 'Close' and d.day = o.close_time)
then 1 end) as cnt
from days d
cross join statuses s
join opentimes o on d.day between o.open_time and o.close_time
group by d.day, s.status
order by d.day, max(s.sortkey);
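The counting rule from the question can also be checked on a small sample outside the database. The following Python sketch is only an illustration: the two ticket rows are invented (the original sample data is not shown), the day range stops at the latest close_time rather than today to keep the output fixed, and the status labels follow the wording of the rule in the question.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical rows of (id, open_time, close_time).
tickets = [
    (1, date(2024, 1, 1), date(2024, 1, 3)),
    (2, date(2024, 1, 2), date(2024, 1, 4)),
]
STATUSES = ("New", "Open", "Closed")

def matches(status, day, open_time, close_time):
    """Apply the rule from the question for one status on one day."""
    if status == "New":
        return day == open_time
    if status == "Open":
        return open_time < day < close_time
    return day == close_time  # "Closed"

# All days from the earliest open_time to the latest close_time (simplification).
first = min(t[1] for t in tickets)
last = max(t[2] for t in tickets)
days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

# Cross join days x statuses, then count the tickets matching each combination.
counts = Counter()
for day in days:
    for status in STATUSES:
        counts[(day, status)] = sum(
            matches(status, day, open_time, close_time)
            for _, open_time, close_time in tickets
        )

for day in days:
    print(day, {status: counts[(day, status)] for status in STATUSES})
```

Building every (day, status) combination first is what keeps the zero counts in the output, which is the same reason the query cross joins days with statuses.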

Billing tier issues with 30 day active user query within bigquery

Is there a way using bigquery that I can run this query and not have to use such a huge billing tier? It ranges anywhere from 11 - 20 on the billing tier. Is my only option to crank up the billing tier and let the charges flow?
WITH allTables AS (SELECT
app,
date,
SUM(CASE WHEN period = 1 THEN users END) as days_1
FROM (
SELECT
CONCAT(user_dim.app_info.app_id, ':', user_dim.app_info.app_platform) as app,
dates.date as date,
periods.period as period,
COUNT(DISTINCT user_dim.app_info.app_instance_id) as users
FROM `table.*` as activity
CROSS JOIN
UNNEST(event_dim) AS event
CROSS JOIN (
SELECT DISTINCT
TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event.timestamp_micros), DAY, 'UTC') as date
FROM `table.*`
CROSS JOIN
UNNEST(event_dim) as event) as dates
CROSS JOIN (
SELECT
period
FROM (
SELECT 1 as period
)
) as periods
WHERE
dates.date >= TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event.timestamp_micros), DAY, 'UTC')
AND
FLOOR(TIMESTAMP_DIFF(dates.date, TIMESTAMP_MICROS(event.timestamp_micros), DAY)/periods.period) = 0
GROUP BY 1,2,3
)
GROUP BY 1,2) SELECT
app as target,
UNIX_SECONDS(date) as datapoint_time,
SUM(days_1) as datapoint_value
FROM allTables
WHERE
date >= TIMESTAMP_ADD(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP, Day, 'UTC'), INTERVAL -1 DAY)
GROUP BY date,1
ORDER BY date ASC

BigQuery Tier 20 or higher required

I'm attempting to run the following query within BigQuery:
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(date)) as target,
SUM(CASE WHEN period = 7 THEN users END) as days_07,
SUM(CASE WHEN period = 14 THEN users END) as days_14,
SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
SELECT
activity.date as date,
periods.period as period,
COUNT(DISTINCT user) as users
FROM (
SELECT
event.timestamp_micros as date,
user_dim.app_info.app_instance_id as user
FROM `table.*`
CROSS JOIN
UNNEST(event_dim) as event
) as activity
CROSS JOIN (
SELECT
event.timestamp_micros as date
FROM `table.*`
CROSS JOIN
UNNEST(event_dim) as event
GROUP BY event.timestamp_micros
) as dates
CROSS JOIN (
SELECT period
FROM
(
SELECT 7 as period
UNION ALL
SELECT 14 as period
UNION ALL
SELECT 30 as period
)
) as periods
WHERE
dates.date >= activity.date
AND
SAFE_CAST(FLOOR(TIMESTAMP_DIFF(TIMESTAMP_MICROS(dates.date), TIMESTAMP_MICROS(activity.date), DAY)/periods.period) AS INT64) = 0
GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC
It works and returns the active users for the given time frames when I run it on a single table, but in my actual application I'm going to run it on all my datasets (40+). When I attempt to run it on a single dataset with all of its tables (dataset.*), I get this error:
Query exceeded resource limits for tier 1. Tier 20 or higher required.
I'm unsure what to do now. I'm thinking I may have to move this into code instead of SQL for performance's sake.
I think I see why this query is so CPU-expensive that it gets "promoted" to such a high billing tier.
The reason is that the sub-selects dates and activity have a huge number of rows: each row represents a timestamp in microseconds, so no pre-grouping is happening at all.
So I recommend transforming the fragment below
FROM (
SELECT
event.timestamp_micros as date,
user_dim.app_info.app_instance_id as user
FROM `table.*`
CROSS JOIN
UNNEST(event_dim) as event
) as activity
into
FROM (
SELECT DISTINCT
DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE,
user_dim.app_info.app_instance_id AS user
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
CROSS JOIN UNNEST(event_dim) AS event
) AS activity
and, correspondingly, the fragment below
CROSS JOIN (
SELECT
event.timestamp_micros as date
FROM `table.*`
CROSS JOIN
UNNEST(event_dim) as event
GROUP BY event.timestamp_micros
) as dates
into
CROSS JOIN (
SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
CROSS JOIN UNNEST(event_dim) AS event
GROUP BY 1
) AS dates
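The above change makes the number of rows much lower, so the CROSS JOIN will not be that expensive.
Of course, you then need to modify the other pieces of your query accordingly, because the date fields are now of DATE type and not microseconds anymore. For example, the TIMESTAMP_DIFF over TIMESTAMP_MICROS values in the WHERE clause becomes DATE_DIFF(dates.date, activity.date, DAY), and the outer SELECT can return date directly instead of wrapping it in FORMAT_TIMESTAMP and TIMESTAMP_MICROS.
Hope this helps!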
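To get a feel for why truncating to dates matters, here is a rough Python sketch that counts how many row pairs a cross join would have to consider before and after the change. The five events are invented for illustration, and the micros and day_of helpers are hypothetical names, not part of any BigQuery API.

```python
from datetime import datetime, timezone

def micros(year, month, day, hour, minute):
    """Build a UTC timestamp in microseconds, like event.timestamp_micros."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1_000_000)

def day_of(timestamp_micros):
    """Equivalent in spirit to DATE(TIMESTAMP_MICROS(...)) in UTC."""
    return datetime.fromtimestamp(timestamp_micros / 1_000_000, tz=timezone.utc).date()

# Hypothetical events: (timestamp_micros, app_instance_id).
events = [
    (micros(2016, 6, 7, 8, 0), "a"),
    (micros(2016, 6, 7, 9, 30), "a"),
    (micros(2016, 6, 7, 21, 15), "b"),
    (micros(2016, 6, 8, 7, 45), "a"),
    (micros(2016, 6, 8, 7, 46), "a"),
]

# Before: one activity row per event, one dates row per distinct microsecond.
raw_dates = {ts for ts, _ in events}
raw_pairs = len(events) * len(raw_dates)

# After: one activity row per (day, user), one dates row per distinct day.
daily_activity = {(day_of(ts), user) for ts, user in events}
daily_dates = {day_of(ts) for ts, _ in events}
daily_pairs = len(daily_activity) * len(daily_dates)

print(raw_pairs, daily_pairs)
```

The gap grows quickly with real data, since the raw form scales with the square of the event count while the daily form scales with users per day times the number of days.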

How to get the monthly 7-day active users?

In my database I have two fields that are used to identify a user, timestamp and instance_id. I want to be able to get the monthly 7-day active users from this data. I have tried the following query but it just returns the same timestamp and 1 for every row.
SELECT
FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(date)) as target,
SUM(CASE WHEN period = 7 THEN users END) as days_07
# SUM(CASE WHEN period = 14 THEN users END) as days_14,
# SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
SELECT
activity.date as date,
periods.period as period,
COUNT(DISTINCT user) as users
FROM (
SELECT
event.timestamp_micros as date,
user_dim.app_info.app_instance_id as user
FROM `hidden.*`
CROSS JOIN
UNNEST(event_dim) as event
) as activity
CROSS JOIN (
SELECT
event.timestamp_micros as date
FROM `hidden.*`
CROSS JOIN
UNNEST(event_dim) as event
GROUP BY event.timestamp_micros
) as dates
CROSS JOIN (
SELECT period
FROM
(
SELECT 7 as period
# UNION ALL
# SELECT 14 as period
# UNION ALL
# SELECT 30 as period
)
) as periods
WHERE
dates.date >= activity.date
AND
SAFE_CAST(FLOOR(TIMESTAMP_DIFF(TIMESTAMP_MICROS(dates.date), TIMESTAMP_MICROS(activity.date), DAY)/periods.period) AS INT64) = 0
GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC
I'm not too sure where to go from here and it's quite challenging to me because I'm not the best with SQL. Any assistance at all would be great. Thanks!
I should also mention that these queries are going to be run within BigQuery and the data is being exported to BigQuery from Firebase.
Try the query below:
SELECT
DATE,
SUM(CASE WHEN period = 7 THEN users END) AS days_07,
SUM(CASE WHEN period = 14 THEN users END) AS days_14,
SUM(CASE WHEN period = 30 THEN users END) AS days_30
FROM (
SELECT
dates.date AS DATE, -- group by the report date so each period counts users over the preceding days
periods.period AS period,
COUNT(DISTINCT user) AS users
FROM (
SELECT DISTINCT
DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE,
user_dim.app_info.app_instance_id AS user
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
CROSS JOIN UNNEST(event_dim) AS event
) AS activity
CROSS JOIN (
SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE
FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
CROSS JOIN UNNEST(event_dim) AS event
GROUP BY 1
) AS dates
CROSS JOIN (
SELECT period FROM
(SELECT 7 AS period UNION ALL
SELECT 14 AS period UNION ALL
SELECT 30 AS period)
) AS periods
WHERE dates.date >= activity.date
AND SAFE_CAST(FLOOR(DATE_DIFF(dates.date, activity.date, DAY)/periods.period) AS INT64) = 0
GROUP BY 1,2
)
GROUP BY DATE
ORDER BY DATE DESC
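When checking results on a small sample, it can help to have an independent reference for what a rolling N-day active-user count is usually expected to return. The following Python sketch is one such reference under stated assumptions: the activity pairs are invented, the function name rolling_active_users is hypothetical, and a user counts toward a report date when they were active on that date or within the preceding period - 1 days.

```python
from datetime import date

# Hypothetical (activity_date, user) pairs, already deduplicated per day.
activity = {
    (date(2016, 6, 1), "a"),
    (date(2016, 6, 5), "b"),
    (date(2016, 6, 8), "a"),
    (date(2016, 6, 8), "c"),
    (date(2016, 6, 20), "b"),
}

def rolling_active_users(pairs, periods=(7, 14, 30)):
    """For each date with activity, count distinct users seen in each trailing window."""
    report_dates = sorted({day for day, _ in pairs}, reverse=True)
    result = {}
    for report_date in report_dates:
        row = {}
        for period in periods:
            users = {
                user
                for day, user in pairs
                if 0 <= (report_date - day).days < period
            }
            row[period] = len(users)
        result[report_date] = row
    return result

result = rolling_active_users(activity)
for report_date, row in result.items():
    print(report_date, row)
```

A healthy result has days_07 <= days_14 <= days_30 on every row; if all three columns are always equal to the daily count, the query is probably grouping by the activity date instead of the report date.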