I am using BigQuery to query the events my app sends to Firebase.
My daily query runs a union between the events_* table and the events_intraday_xxx table.
Recently we added an integration with an ad-tracking framework, and my query broke. I found that a new column called 'privacy_info' was added to the events_* table in the last few days, and this is what breaks the union.
My guess is that this column was added by the tracking integration. What is the recommended way to resolve this?
Thanks
I encountered this same scenario three days ago and it broke my query as well. The fix was to explicitly list all the columns I was querying (i.e., don't do "select * from events union all select * from events_intraday"):
with intra_prev_table as (
SELECT
event_date, event_timestamp, event_name, event_params, event_previous_timestamp, event_value_in_usd, event_bundle_sequence_id, event_server_timestamp_offset, user_id, user_pseudo_id, privacy_info, user_properties, user_first_touch_timestamp, user_ltv, device, geo, app_info, traffic_source, stream_id, platform, event_dimensions, ecommerce, items
FROM `analytics_XXXXXXXXXX.events_intraday_*`
UNION ALL
SELECT
event_date, event_timestamp, event_name, event_params, event_previous_timestamp, event_value_in_usd, event_bundle_sequence_id, event_server_timestamp_offset, user_id, user_pseudo_id, privacy_info, user_properties, user_first_touch_timestamp, user_ltv, device, geo, app_info, traffic_source, stream_id, platform, event_dimensions, ecommerce, items
FROM `analytics_XXXXXXXXXX.events_*` WHERE _TABLE_SUFFIX = REPLACE(CAST(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS STRING), '-', '')
)
select * from intra_prev_table
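The REPLACE(CAST(DATE_SUB(...))) expression only builds yesterday's date as a YYYYMMDD string to match the table suffix. If you generate the query from a script, the same suffix can be computed client-side; a minimal sketch in Python (the function name is mine):

```python
from datetime import date, timedelta

def yesterday_suffix(today=None):
    """Build yesterday's date as a YYYYMMDD string, matching
    REPLACE(CAST(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS STRING), '-', '')."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime("%Y%m%d")

print(yesterday_suffix(date(2020, 3, 1)))  # → 20200229 (leap day handled for free)
```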
Related
I'm using Firebase Analytics with BigQuery. Assume I need to give a voucher to users who share a service every day for at least 7 consecutive days. If someone shares for 2 consecutive weeks, they get 2 vouchers, and so on.
How can I find the segments of consecutive events logged in Firebase Analytics?
Here is a query that finds the individual days on which a user shared, but I can't identify the consecutive segments.
SELECT event.user_id, event.event_date,
MAX((SELECT p.value FROM UNNEST(user_properties) p WHERE p.key='name').string_value) as name,
MAX((SELECT p.value FROM UNNEST(user_properties) p WHERE p.key='email').string_value ) as email,
SUM((SELECT event_params.value.int_value from event.event_params where event_params.key = 'share_session_length')) as total_share_session_length
FROM `myProject.analytics_183565123.*` as event
where event_name like 'share_end'
group by user_id,event_date
having total_share_session_length >= 1
order by user_id desc
And this is the output:
Below is an example for BigQuery Standard SQL - hope you can adapt the approach to your specific use case.
#standardSQL
SELECT id, ARRAY_AGG(STRUCT(first_day, days) ORDER BY grp) continuous_groups
FROM (
SELECT id, grp, MIN(day) first_day, MAX(day) last_day, COUNT(1) days
FROM (
SELECT id, day,
COUNTIF(gap != 1) OVER(PARTITION BY id ORDER BY day) grp
FROM (
SELECT id, day,
DATE_DIFF(day,LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) gap
FROM (
SELECT DISTINCT fullVisitorId id, PARSE_DATE('%Y%m%d', t.date) day
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` t
)
)
)
GROUP BY id, grp
HAVING days >= 7
)
GROUP BY id
ORDER BY ARRAY_LENGTH(continuous_groups) DESC
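To see why the COUNTIF-over-gaps trick groups consecutive days, the same logic can be restated procedurally and checked outside BigQuery. A small Python sketch of the grouping (the function name and the 7-day threshold mirror the query above; it assumes dates come in as date objects):

```python
from datetime import date, timedelta

def continuous_segments(days, min_len=7):
    """Group sorted dates into runs of consecutive days (the gaps-and-islands
    trick from the query above), keeping only runs of at least min_len days.
    Returns (first_day, run_length) per qualifying run."""
    days = sorted(set(days))
    segments = []
    for d in days:
        if segments and (d - segments[-1][-1]).days == 1:
            segments[-1].append(d)   # gap == 1: extend the current run
        else:
            segments.append([d])     # gap != 1 (or first day): start a new run
    return [(seg[0], len(seg)) for seg in segments if len(seg) >= min_len]

week = [date(2019, 1, 1) + timedelta(days=i) for i in range(7)]
print(continuous_segments(week + [date(2019, 2, 1)]))  # → [(datetime.date(2019, 1, 1), 7)]
```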
with this result:
I've been thinking about how to do this and searching for answers, but to no avail. I'm wondering whether what I want to achieve is actually possible.
What I want to do is...
I want to first find all users that have triggered the event 'app_remove' (thus ignoring all users that haven't triggered that event).
Once I find all these users, for each of those users, I want to check all events that got triggered in the past X minutes before the 'app_remove' event got triggered.
I basically want to know what the last thing a user did before they uninstalled an app.
Is it possible to do this for all users or can this only be done on a per user basis?
I didn't know where to even begin with the grouped version, so for now I opted to look at one user at a time. This is where I am currently; I got stuck after writing this incomplete IF statement:
SELECT
TIMESTAMP_MICROS(event_timestamp) as time_stamp, event_name, user_pseudo_id
FROM
`mana-monsters.analytics_182655472.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20180101' AND '20190121' AND
user_pseudo_id = '026e1dd2cfe2344cdf2acf6dab2a123c' AND
IF (event_name = 'app_remove')
GROUP BY
time_stamp, event_name, user_pseudo_id
ORDER BY
time_stamp DESC
I hope I gave enough information.
Thanks in advance for any help or leads as to how to approach this in theory.
I think using an INNER JOIN as a filter could be an option here. The code should look something like this:
SELECT
TIMESTAMP_MICROS(a.event_timestamp) as time_stamp, a.event_name, a.user_pseudo_id
FROM
`mana-monsters.analytics_182655472.events_*` a
INNER JOIN (
  SELECT user_pseudo_id, event_timestamp
  FROM `mana-monsters.analytics_182655472.events_*`
  WHERE event_name = 'app_remove'
) b
  ON a.user_pseudo_id = b.user_pseudo_id
  AND b.event_timestamp - a.event_timestamp < 1000
  AND b.event_timestamp - a.event_timestamp >= 0
ORDER BY
user_pseudo_id, time_stamp DESC
I have been playing with some dummy data and it worked for me:
#standardSQL
WITH my_table as(
select 1454911123456789 as event_timestamp, 'app_remove' as event_name, '1' as user_pseudo_id UNION ALL
select 1454911123456788 as event_timestamp, 'app_close' as event_name, '1' as user_pseudo_id UNION ALL
select 1454911123456778 as event_timestamp, 'connection_lost' as event_name, '1' as user_pseudo_id UNION ALL
select 1457911123451231 as event_timestamp, 'app_open' as event_name, '2' as user_pseudo_id UNION ALL
select 1457911123450123 as event_timestamp, 'app_close' as event_name, '2' as user_pseudo_id UNION ALL
select 1457911123450035 as event_timestamp, 'connection_lost' as event_name, '2' as user_pseudo_id UNION ALL
select 1459911123455664 as event_timestamp, 'app_remove' as event_name, '3' as user_pseudo_id UNION ALL
select 1459911123455456 as event_timestamp, 'app_close' as event_name, '3' as user_pseudo_id UNION ALL
select 1459911123455354 as event_timestamp, 'game_lost' as event_name, '3' as user_pseudo_id)
SELECT
TIMESTAMP_MICROS(a.event_timestamp) as time_stamp, a.event_name, a.user_pseudo_id
FROM
my_table a
INNER JOIN (
  SELECT user_pseudo_id, event_timestamp
  FROM my_table
  WHERE event_name = 'app_remove'
) b
  ON a.user_pseudo_id = b.user_pseudo_id
  AND b.event_timestamp - a.event_timestamp < 1000
  AND b.event_timestamp - a.event_timestamp >= 0
ORDER BY
user_pseudo_id, time_stamp DESC
The dummy data is very simple, so you may have to modify the query to fit your use case; this is just an example of a possible solution. Note that the window here is 1000 microseconds, since event_timestamp is in microseconds; for "the past X minutes" use X * 60 * 1000 * 1000 instead.
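The join keeps every event whose timestamp falls inside a window ending at the same user's app_remove timestamp. A Python sketch of the same filter over the dummy data above (it assumes one app_remove per user; timestamps in microseconds and a 1000 µs window, as in the query):

```python
def events_before_removal(events, window_micros=1000):
    """For each user, keep events whose timestamp falls within `window_micros`
    before (or at) that user's 'app_remove' event -- mirroring the INNER JOIN
    filter. Assumes at most one app_remove per user."""
    removals = {e["user"]: e["ts"] for e in events if e["name"] == "app_remove"}
    return [
        e for e in events
        if e["user"] in removals
        and 0 <= removals[e["user"]] - e["ts"] < window_micros
    ]

events = [
    {"user": "1", "ts": 1454911123456789, "name": "app_remove"},
    {"user": "1", "ts": 1454911123456788, "name": "app_close"},
    {"user": "1", "ts": 1454911123456778, "name": "connection_lost"},
    {"user": "2", "ts": 1457911123451231, "name": "app_open"},   # no app_remove: dropped
]
print(events_before_removal(events))  # keeps only user 1's three events
```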
I need to get the count of events that have a specific parameter. Say I have an event event_notification_received with params (type, title, code_id), and the code_id param holds the unique name of an advertisement. I need to count how many events were received with that parameter. I am using the UNNEST function to get access to the event's params, but it returns far too many rows after execution; I think that's because of UNNEST. How can I count the events correctly? Thanks.
Here is my standard SQL query:
#standardSQL
SELECT event_date, event_timestamp, event_name, user_id, app_info.version,
geo.country, geo.region, geo.city,
my_event_params,
user_prop,
platform
FROM
`myProject.analytics_199660162.events_201807*`,
UNNEST(event_params) as my_event_params,
UNNEST(user_properties) as user_prop
WHERE
_TABLE_SUFFIX BETWEEN '24' AND '30' AND
event_name = "event_notification_received"
AND
my_event_params.value.string_value = "my_adverticement_name"
AND
platform = "ANDROID"
ORDER BY event_timestamp DESC
Is this what you want?
SELECT . . .,
(SELECT COUNT(*)
FROM UNNEST(event_params) as my_event_params
WHERE my_event_params.value.string_value = 'my_adverticement_name'
) as event_count
FROM `myProject.analytics_199660162.events_201807*`,
UNNEST(user_properties) as user_prop
WHERE _TABLE_SUFFIX BETWEEN '24' AND '30' AND
event_name = 'event_notification_received' AND
platform = 'ANDROID'
ORDER BY event_timestamp DESC;
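The correlated subquery counts matches inside each row's event_params array instead of cross-joining it at the FROM level, so rows are not multiplied. A Python sketch of that per-row counting (the dict shapes are my own stand-ins for the nested schema):

```python
def count_matching_params(rows, target):
    """Count, per event row, how many event_params carry the target string --
    the per-row equivalent of the correlated (SELECT COUNT(*) FROM UNNEST(...))."""
    return [
        sum(1 for p in row["event_params"] if p.get("string_value") == target)
        for row in rows
    ]

rows = [
    {"event_params": [{"key": "code_id", "string_value": "my_adverticement_name"},
                      {"key": "type", "string_value": "push"}]},
    {"event_params": [{"key": "type", "string_value": "push"}]},
]
print(count_matching_params(rows, "my_adverticement_name"))  # → [1, 0]
```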
If you UNNEST() and CROSS JOIN more than one array at the FROM level, you'll get duplicated rows.
Instead, UNNEST() at the SELECT level, just to extract and COUNT the values you are looking for:
SELECT COUNT(DISTINCT (
SELECT value.string_value FROM UNNEST(user_properties) WHERE key='powers')
) AS distinct_powers
FROM `firebase-sample-for-bigquery.analytics_bingo_sample.events_20160607`
I'm trying to check whether a datetime falls on the current date, but I haven't been able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, COUNT(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but here is one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
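The inner SELECT above just converts an epoch timestamp to a DATE so it can be compared with CURRENT_DATE(). The same comparison sketched client-side in Python (UTC is assumed, matching BigQuery's default time zone for DATE(); the function name is mine):

```python
from datetime import datetime, timezone

def is_today(timestamp_secs, today=None):
    """Check whether an epoch timestamp (in seconds) falls on a given UTC date --
    the client-side analogue of DATE(TIMESTAMP_SECONDS(ts)) = CURRENT_DATE()."""
    event_date = datetime.fromtimestamp(timestamp_secs, tz=timezone.utc).date()
    today = today or datetime.now(tz=timezone.utc).date()
    return event_date == today

# 1494977678 is 2017-05-16 UTC, the sample timestamp used above
print(is_today(1494977678, today=datetime(2017, 5, 16, tzinfo=timezone.utc).date()))  # → True
```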
How do I calculate session duration in Firebase Analytics raw data linked to BigQuery?
I have used the following blog post to calculate users, using the FLATTEN command on the events nested within each record, but I would like to know how to calculate sessions and session duration by country and time.
(I have many apps configured, but if you could help me with the SQL query for calculating sessions and session duration, it would be of immense help.)
Google Blog on using Firebase and big query
First you need to define a session - in the following query I'm going to break a session whenever a user is inactive for more than 20 minutes.
Now, to find all sessions with SQL you can use a trick described at https://blog.modeanalytics.com/finding-user-sessions-sql/.
The following query finds all sessions and their lengths:
#standardSQL
SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
(ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
FROM (
SELECT *, IF(
previous IS null
OR (min_time-previous)>(20*60*1000*1000), # sessions broken by this inactivity
1, 0) session_start
#https://blog.modeanalytics.com/finding-user-sessions-sql/
FROM (
SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
FROM (
SELECT user_dim.app_info.app_instance_id
, (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
, (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`
)
)
)
)
GROUP BY 1, 2
ORDER BY 1, 2
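The LAG/IF/SUM pattern can be restated procedurally: walk one user's events in time order and start a new session whenever the inactivity gap exceeds 20 minutes. A Python sketch of that session definition (single user, timestamps in microseconds; the function name is mine):

```python
def sessionize(timestamps, gap_micros=20 * 60 * 1000 * 1000):
    """Split one user's event timestamps (microseconds) into sessions, starting
    a new session after more than `gap_micros` of inactivity -- the LAG/SUM
    trick above. Returns (start, end, length_seconds) per session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_micros:
            sessions[-1].append(ts)   # gap within threshold: same session
        else:
            sessions.append([ts])     # inactivity break: new session
    return [(s[0], s[-1], (s[-1] - s[0]) / 1e6) for s in sessions]

minute = 60 * 1000 * 1000
print(sessionize([0, 5 * minute, 60 * minute]))  # two sessions: 0-5 min, then one at 60 min
```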
With the new schema of Firebase in BigQuery, I found that the answer by @Maziar did not work for me, but I am not sure why.
Instead, I have used the following to calculate it, where a session is defined as a user engaging with the app for a minimum of 10 seconds, and where the session ends once the user hasn't engaged with the app for 30 minutes.
It provides total number of sessions and the session length in minutes, and it is based on this query: https://modeanalytics.com/modeanalytics/reports/5e7d902f82de/queries/2cf4af47dba4
SELECT COUNT(*) AS sessions,
AVG(length) AS average_session_length
FROM (
SELECT global_session_id,
(MAX(event_timestamp) - MIN(event_timestamp))/(60 * 1000 * 1000) AS length
FROM (
SELECT user_pseudo_id,
event_timestamp,
SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
FROM (
SELECT *,
CASE WHEN event_timestamp - last_event >= (30*60*1000*1000)
OR last_event IS NULL
THEN 1 ELSE 0 END AS is_new_session
FROM (
SELECT user_pseudo_id,
event_timestamp,
LAG(event_timestamp,1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
FROM `dataset.events_2019*`
) last
) final
) session
GROUP BY 1
) agg
WHERE length >= (10/60)
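The same sessionization with this answer's parameters (30-minute inactivity break, sessions shorter than 10 seconds discarded) can be sketched for one user in Python; the function name and return shape are mine:

```python
def session_stats(timestamps, gap_micros=30 * 60 * 1000 * 1000, min_secs=10):
    """Count sessions and average session length (minutes) for one user's event
    timestamps (microseconds): break on >= 30 min of inactivity, drop sessions
    shorter than `min_secs` -- mirroring WHERE length >= (10/60) above."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap_micros:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    lengths = [(s[-1] - s[0]) / (60 * 1e6) for s in sessions]  # minutes
    kept = [l for l in lengths if l >= min_secs / 60]
    return len(kept), (sum(kept) / len(kept) if kept else 0.0)

minute = 60 * 1000 * 1000
# One 2-minute session, then a 5-second session 38 minutes later (too short: dropped)
print(session_stats([0, 2 * minute, 40 * minute, 40 * minute + 5_000_000]))  # → (1, 2.0)
```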
As you know, Google has changed the schema of the Firebase datasets in BigQuery:
https://support.google.com/analytics/answer/7029846
Thanks to @Felipe's answer, for the new format the query becomes:
SELECT SUM(total_sessions) AS Total_Sessions, AVG(sess_length_seconds) AS Average_Session_Duration
FROM (
SELECT user_pseudo_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records,
MAX(sess_id) OVER(PARTITION BY user_pseudo_id) total_sessions,
(ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY min_time) sess_id
FROM (
SELECT *, IF(previous IS null OR (min_time-previous) > (20*60*1000*1000), 1, 0) session_start
FROM (
SELECT *, LAG(max_time, 1) OVER(PARTITION BY user_pseudo_id ORDER BY max_time) previous
FROM (SELECT user_pseudo_id, MIN(event_timestamp) AS min_time, MAX(event_timestamp) AS max_time
FROM `dataset_name.table_name` GROUP BY user_pseudo_id)
)
)
)
GROUP BY 1, 2
ORDER BY 1, 2
)
Note: change dataset_name and table_name based on your project info
Sample result:
With the recent change, each event row in the BigQuery table carries a ga_session_id, so you can calculate the number of sessions and the average session length much more easily.
The value of ga_session_id stays the same for the whole session, so you don't need to define the session yourself.
Take the MIN and MAX of the event_timestamp column, grouping by user_pseudo_id, ga_session_id and event_date, to get the duration of a particular user's session on a given date.
WITH
UserSessions AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    event_date,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "ga_session_id") AS session_id,
    event_name
  FROM `projectname.dataset_name.events_*`
),
SessionDuration AS (
  SELECT
    user_pseudo_id,
    session_id,
    COUNT(*) AS events,
    TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration,
    event_date
  FROM UserSessions
  WHERE session_id IS NOT NULL
  GROUP BY
    user_pseudo_id,
    session_id,
    event_date
)
SELECT COUNT(session_id) AS NumofSessions, AVG(session_duration) AS AverageSessionLength
FROM SessionDuration
Finally, count the session_id values to get the total number of sessions, and average the session durations to get the average session length.
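What the SessionDuration CTE computes per (user, session) can be sketched in Python; the field names in the event dicts are my own stand-ins for the BigQuery columns:

```python
def session_durations(events):
    """Group events by (user_pseudo_id, ga_session_id) and return each session's
    duration in whole seconds from min/max event_timestamp (microseconds) --
    what the SessionDuration GROUP BY computes. Events without a session_id
    are skipped (WHERE session_id IS NOT NULL)."""
    bounds = {}
    for e in events:
        if e["session_id"] is None:
            continue
        key = (e["user"], e["session_id"])
        lo, hi = bounds.get(key, (e["ts"], e["ts"]))
        bounds[key] = (min(lo, e["ts"]), max(hi, e["ts"]))
    return {k: (hi - lo) // 1_000_000 for k, (lo, hi) in bounds.items()}

events = [
    {"user": "u1", "session_id": 111, "ts": 0},
    {"user": "u1", "session_id": 111, "ts": 45_000_000},  # 45 s later, same session
    {"user": "u1", "session_id": None, "ts": 99},         # no ga_session_id: skipped
]
print(session_durations(events))  # → {('u1', 111): 45}
```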