How to get COUNT with UNNEST function in Google BigQuery? - sql

I need to count the events that carry a specific parameter. Let's say I have an event event_notification_received with params (type, title, code_id), and the code_id param holds the unique name of an advertisement. I need to count how many events were received with that parameter value. I am using the UNNEST function to get access to the event's params, but the query returns too many results, I think because of UNNEST. How can I count the events correctly? Thanks.
Here is my standard SQL query:
#standardSQL
SELECT event_date, event_timestamp, event_name, user_id, app_info.version,
geo.country, geo.region, geo.city,
my_event_params,
user_prop,
platform
FROM
`myProject.analytics_199660162.events_201807*`,
UNNEST(event_params) as my_event_params,
UNNEST(user_properties) as user_prop
WHERE
_TABLE_SUFFIX BETWEEN '24' AND '30' AND
event_name = "event_notification_received"
AND
my_event_params.value.string_value = "my_adverticement_name"
AND
platform = "ANDROID"
ORDER BY event_timestamp DESC

Is this what you want?
SELECT . . .,
(SELECT COUNT(*)
FROM UNNEST(event_params) as my_event_params
WHERE my_event_params.value.string_value = 'my_adverticement_name'
) as event_count
FROM `myProject.analytics_199660162.events_201807*`,
UNNEST(user_properties) as user_prop
WHERE _TABLE_SUFFIX BETWEEN '24' AND '30' AND
event_name = 'event_notification_received' AND
platform = 'ANDROID'
ORDER BY event_timestamp DESC;

If you UNNEST() and CROSS JOIN two or more repeated columns at the FROM level, you'll get duplicated rows: each event is repeated once for every combination of array elements.
Instead, UNNEST() at the SELECT level, just to extract and COUNT the values you are looking for:
SELECT COUNT(DISTINCT (
SELECT value.string_value FROM UNNEST(user_properties) WHERE key='powers')
) AS distinct_powers
FROM `firebase-sample-for-bigquery.analytics_bingo_sample.events_20160607`
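To see why the FROM-level cross join inflates the count, here is a minimal Python sketch of the two approaches. The event rows, keys, and values are made up for illustration (only the event name and the advertisement string come from the question), and the nested loops merely imitate what the comma join over two UNNEST calls does.

```python
# Hypothetical event rows shaped like the Firebase export: each event carries
# two repeated fields, event_params and user_properties.
events = [
    {
        "event_name": "event_notification_received",
        "event_params": [
            {"key": "type", "value": "push"},
            {"key": "title", "value": "Sale"},
            {"key": "code_id", "value": "my_adverticement_name"},
        ],
        "user_properties": [
            {"key": "plan", "value": "free"},
            {"key": "lang", "value": "en"},
        ],
    },
    {
        "event_name": "event_notification_received",
        "event_params": [{"key": "code_id", "value": "other_ad"}],
        "user_properties": [{"key": "plan", "value": "paid"}],
    },
]

TARGET = "my_adverticement_name"

# Imitates FROM events, UNNEST(event_params), UNNEST(user_properties):
# one output row per (param, property) pair, so a matching event is repeated
# once for every user property it has.
cross_join_rows = [
    (e["event_name"], p["value"], u["key"])
    for e in events
    for p in e["event_params"]
    for u in e["user_properties"]
    if p["value"] == TARGET
]

# Imitates the subquery approach: look inside event_params per event,
# without multiplying rows.
matching_events = sum(
    1 for e in events
    if any(p["value"] == TARGET for p in e["event_params"])
)

print(len(cross_join_rows))  # 2 rows for a single matching event
print(matching_events)       # 1
```

Only one event matches, but the cross join reports two rows because that event has two user properties.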

Related

Querying users that triggered 'app_remove' event then checking what event triggered before it

I've been thinking about how to do this and have been searching for answers but to no avail. I'm actually wondering if what I want to achieve is actually possible.
What I want to do is...
I want to first find all users that have triggered the event 'app_remove' (thus ignoring all users that haven't triggered that event).
Once I find all these users, for each of those users, I want to check all events that got triggered in the past X minutes before the 'app_remove' event got triggered.
I basically want to know what the last thing a user did before they uninstalled an app.
Is it possible to do this for all users or can this only be done on a per user basis?
I didn't know where to even begin to do the group thing so I currently opted to instead try and look at one user at a time. This is where I am at currently and I got stuck after writing that incomplete IF statement:
SELECT
TIMESTAMP_MICROS(event_timestamp) as time_stamp, event_name, user_pseudo_id
FROM
`mana-monsters.analytics_182655472.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20180101' AND '20190121' AND
user_pseudo_id = '026e1dd2cfe2344cdf2acf6dab2a123c' AND
IF (event_name = 'app_remove')
GROUP BY
time_stamp, event_name, user_pseudo_id
ORDER BY
time_stamp DESC
I hope I gave enough information.
Thanks in advance for any help or leads as to how to approach this in theory.
I think using an INNER JOIN as a filter could be an option here. The code should look something like this:
SELECT
TIMESTAMP_MICROS(a.event_timestamp) as time_stamp, a.event_name, a.user_pseudo_id
FROM
`mana-monsters.analytics_182655472.events_*` a INNER JOIN (SELECT user_pseudo_id, event_timestamp from `mana-monsters.analytics_182655472.events_*` where event_name = 'app_remove') b on
a.user_pseudo_id = b.user_pseudo_id and b.event_timestamp - a.event_timestamp < 1000 and b.event_timestamp - a.event_timestamp >= 0
ORDER BY
user_pseudo_id, time_stamp DESC
I have been playing with some dummy data and it worked for me:
#standardSQL
WITH my_table as(
select 1454911123456789 as event_timestamp, 'app_remove' as event_name, '1' as user_pseudo_id UNION ALL
select 1454911123456788 as event_timestamp, 'app_close' as event_name, '1' as user_pseudo_id UNION ALL
select 1454911123456778 as event_timestamp, 'connection_lost' as event_name, '1' as user_pseudo_id UNION ALL
select 1457911123451231 as event_timestamp, 'app_open' as event_name, '2' as user_pseudo_id UNION ALL
select 1457911123450123 as event_timestamp, 'app_close' as event_name, '2' as user_pseudo_id UNION ALL
select 1457911123450035 as event_timestamp, 'connection_lost' as event_name, '2' as user_pseudo_id UNION ALL
select 1459911123455664 as event_timestamp, 'app_remove' as event_name, '3' as user_pseudo_id UNION ALL
select 1459911123455456 as event_timestamp, 'app_close' as event_name, '3' as user_pseudo_id UNION ALL
select 1459911123455354 as event_timestamp, 'game_lost' as event_name, '3' as user_pseudo_id)
SELECT
TIMESTAMP_MICROS(a.event_timestamp) as time_stamp, a.event_name, a.user_pseudo_id
FROM
my_table a INNER JOIN (SELECT user_pseudo_id, event_timestamp from my_table where event_name = 'app_remove') b on
a.user_pseudo_id = b.user_pseudo_id and b.event_timestamp - a.event_timestamp < 1000 and b.event_timestamp - a.event_timestamp >= 0
ORDER BY
user_pseudo_id, time_stamp DESC
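The dummy data is very simple, so you may have to modify the query to fit your use case. In particular, event_timestamp is in microseconds, so the threshold of 1000 above is only one millisecond; for a window of X minutes use X * 60 * 1000 * 1000 instead. This is just an example to show a possible solution.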
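The join-as-filter logic can also be sketched in plain Python, with the window expressed in minutes rather than raw microseconds. The function name, the sample rows, and the 5-minute window are assumptions for illustration, not part of the original answer.

```python
MICROS_PER_MINUTE = 60 * 1000 * 1000

def events_before_removal(events, window_minutes):
    """events: list of (event_timestamp_micros, event_name, user_pseudo_id).

    Returns (user, timestamp, event_name) for every event that happened at
    most window_minutes before an app_remove by the same user. Like the SQL
    join above, this includes the app_remove event itself (difference 0).
    """
    window = window_minutes * MICROS_PER_MINUTE
    removals = [(ts, user) for ts, name, user in events if name == "app_remove"]
    result = []
    for ts, name, user in events:
        for removal_ts, removal_user in removals:
            if user == removal_user and 0 <= removal_ts - ts < window:
                result.append((user, ts, name))
    # Same ordering as ORDER BY user_pseudo_id, time_stamp DESC
    return sorted(result, key=lambda r: (r[0], -r[1]))

# Hypothetical data: user 1 removes the app at minute 10, user 2 never does.
M = MICROS_PER_MINUTE
sample = [
    (10 * M, "app_remove", "1"),
    (9 * M, "app_close", "1"),
    (2 * M, "app_open", "1"),
    (5 * M, "app_open", "2"),
]
recent = events_before_removal(sample, window_minutes=5)
print(recent)
```

User 2 drops out entirely because there is no app_remove to join against, and user 1's app_open at minute 2 falls outside the 5-minute window.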

Cannot query the cross product of repeated fields event_params.key and user_properties.key, with daily event tables

If I am understanding correctly, the issue is that I have two fields that need to be flattened for the query to work. The fields in question below are event_params.value.string_value and user_properties.value.string_value.
I am able to do this correctly with a single table, but I need to span a date range of daily tables and cannot get the syntax right. Any help?
SELECT
event_params.value.string_value,
event_timestamp,
event_name,
user_properties.value.string_value
FROM
(TABLE_DATE_RANGE([[kiehls-kinetic:analytics_180943775.events_],
DATE_ADD(CURRENT_TIMESTAMP(), -365, 'DAY'),
CURRENT_TIMESTAMP())),
WHERE
(event_params.key = 'session')
AND (user_properties.key = 'associate_name')
ORDER BY
event_params.value.string_value ASC,
event_timestamp ASC
You'd want something like this, using standard SQL:
SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'session') AS event_value,
event_timestamp,
event_name,
(SELECT value.string_value
FROM UNNEST(user_properties)
WHERE key = 'associate_name') AS user_value
FROM
`kiehls-kinetic.analytics_180943775.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)) AND
FORMAT_DATE('%Y%m%d', CURRENT_DATE())
ORDER BY
event_value ASC,
event_timestamp ASC
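The _TABLE_SUFFIX filter works because the daily suffixes are fixed-width YYYYMMDD strings, so comparing them as strings gives the same order as comparing them as dates. A small Python sketch of the same bounds computation; the helper name and the fixed "today" are assumptions chosen to keep the example reproducible.

```python
from datetime import date, timedelta

def suffix_bounds(today, days_back=365):
    """Mirror FORMAT_DATE('%Y%m%d', DATE_SUB(today, INTERVAL days_back DAY))
    and FORMAT_DATE('%Y%m%d', today)."""
    start = today - timedelta(days=days_back)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")

start_suffix, end_suffix = suffix_bounds(date(2019, 1, 21))
print(start_suffix, end_suffix)

# String comparison on fixed-width YYYYMMDD suffixes matches date order.
in_range = start_suffix <= "20180630" <= end_suffix
out_of_range = start_suffix <= "20171231" <= end_suffix
print(in_range, out_of_range)
```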

Check if timestamp is contained in date

I'm trying to check whether a timestamp falls on the current date, but I can't get it to work.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, count(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
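The same check can be sketched in Python. It assumes, as in the query above, that the stored value is a Unix timestamp in seconds, and it uses UTC for both sides of the comparison, which is also the default time zone for DATE() and CURRENT_DATE() in BigQuery. The function names are made up for this example.

```python
from datetime import datetime, timezone

def event_date_utc(timestamp_secs):
    """Equivalent of DATE(TIMESTAMP_SECONDS(timestamp_secs)) in UTC."""
    return datetime.fromtimestamp(timestamp_secs, tz=timezone.utc).date()

def is_today_utc(timestamp_secs, now=None):
    """True if the timestamp falls on the current UTC date.

    `now` can be passed explicitly (as a UTC datetime) to make the check
    reproducible; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    return event_date_utc(timestamp_secs) == now.date()

print(event_date_utc(1494977678))
```

If the data should be compared in a local time zone instead, both the conversion and the "current date" need to use that same zone, otherwise events near midnight land on the wrong day.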

How to calculate Session and Session duration in Firebase Analytics raw data?

How to calculate Session Duration in Firebase analytics raw data which is linked to BigQuery?
I have used the following blog to calculate the users by using the flatten command for the events which are nested within each record, but I would like to know how to proceed with in calculating the Session and Session duration by country and time.
(I have many apps configured, but if you could help me with the SQL query for calculating the session duration and session, It would be of immense help)
Google Blog on using Firebase and big query
First you need to define a session - in the following query I'm going to break a session whenever a user is inactive for more than 20 minutes.
Now, to find all sessions with SQL you can use a trick described at https://blog.modeanalytics.com/finding-user-sessions-sql/.
The following query finds all sessions and their lengths:
#standardSQL
SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
(ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
FROM (
SELECT *, IF(
previous IS null
OR (min_time-previous)>(20*60*1000*1000), # sessions broken by this inactivity
1, 0) session_start
#https://blog.modeanalytics.com/finding-user-sessions-sql/
FROM (
SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
FROM (
SELECT user_dim.app_info.app_instance_id
, (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
, (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`
)
)
)
)
GROUP BY 1, 2
ORDER BY 1, 2
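The LAG plus running SUM trick in the query above can be hard to read in nested SQL, so here is a rough Python sketch of the same idea. It works on individual (user, timestamp) events rather than on event bundles, and the function name and sample data are assumptions for illustration.

```python
MINUTE = 60 * 1000 * 1000  # microseconds

def sessionize(events, gap_micros):
    """Group (user_id, timestamp_micros) events into sessions.

    Returns {(user_id, sess_id): (sess_start, sess_end, records)}.
    A new session starts when a user has no previous event or has been
    inactive for more than gap_micros.
    """
    sessions = {}
    last_seen = {}   # user_id -> previous event timestamp (the LAG step)
    current_id = {}  # user_id -> running session number (the SUM step)
    for user, ts in sorted(events):
        previous = last_seen.get(user)
        if previous is None or ts - previous > gap_micros:
            current_id[user] = current_id.get(user, 0) + 1
            sessions[(user, current_id[user])] = (ts, ts, 0)
        start, _, records = sessions[(user, current_id[user])]
        sessions[(user, current_id[user])] = (start, ts, records + 1)
        last_seen[user] = ts
    return sessions

# Hypothetical events: user "a" is active at minutes 0, 5, 40 and 45.
sample = [("a", 0), ("a", 5 * MINUTE), ("a", 40 * MINUTE),
          ("a", 45 * MINUTE), ("b", 10 * MINUTE)]
result = sessionize(sample, gap_micros=20 * MINUTE)
for key, (start, end, records) in sorted(result.items()):
    print(key, (end - start) / 1_000_000, "seconds,", records, "records")
```

The 35-minute pause between minute 5 and minute 40 exceeds the 20-minute gap, so user "a" ends up with two sessions.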
With the new schema of Firebase in BigQuery, I found that the answer by @Maziar did not work for me, but I am not sure why.
Instead I have used the following to calculate it, where a session is defined as a user engaging with your app for a minimum of 10 seconds and where the session stops if a user doesn't engage with the app for 30 minutes.
It provides total number of sessions and the session length in minutes, and it is based on this query: https://modeanalytics.com/modeanalytics/reports/5e7d902f82de/queries/2cf4af47dba4
SELECT COUNT(*) AS sessions,
AVG(length) AS average_session_length
FROM (
SELECT global_session_id,
(MAX(event_timestamp) - MIN(event_timestamp))/(60 * 1000 * 1000) AS length
FROM (
SELECT user_pseudo_id,
event_timestamp,
SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
FROM (
SELECT *,
CASE WHEN event_timestamp - last_event >= (30*60*1000*1000)
OR last_event IS NULL
THEN 1 ELSE 0 END AS is_new_session
FROM (
SELECT user_pseudo_id,
event_timestamp,
LAG(event_timestamp,1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
FROM `dataset.events_2019*`
) last
) final
) session
GROUP BY 1
) agg
WHERE length >= (10/60)
As you know, Google has changed the schema of the Firebase export tables in BigQuery:
https://support.google.com/analytics/answer/7029846
Building on @Felipe's answer, the query for the new schema becomes:
SELECT COUNT(*) AS Total_Sessions, AVG(sess_length_seconds) AS Average_Session_Duration
FROM (
SELECT user_pseudo_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records,
MAX(sess_id) OVER(PARTITION BY user_pseudo_id) total_sessions,
(ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY min_time) sess_id
FROM (
SELECT *, IF(previous IS null OR (min_time-previous) > (20*60*1000*1000), 1, 0) session_start
FROM (
SELECT *, LAG(max_time, 1) OVER(PARTITION BY user_pseudo_id ORDER BY max_time) previous
# In the new schema each row is a single event, so its timestamp serves as both min_time and max_time.
# Grouping by user_pseudo_id here would collapse every user into a single session.
FROM (SELECT user_pseudo_id, event_timestamp AS min_time, event_timestamp AS max_time
FROM `dataset_name.table_name`)
)
)
)
GROUP BY 1, 2
)
Note: change dataset_name and table_name based on your project info
With the recent change in which we have ga_session_id with each event row in the BigQuery table you can calculate number of sessions and average session length much more easily.
The value of the ga_session_id would remain same for the whole session, so you don't need to define the session separately.
Take the MIN and MAX of the event_timestamp column, grouping by user_pseudo_id, ga_session_id and event_date, to get the duration of each session of each user on a given date.
WITH
UserSessions as (
SELECT
user_pseudo_id,
event_timestamp,
event_date,
(Select value.int_value from UNNEST(event_params) where key = "ga_session_id") as session_id,
event_name
FROM `projectname.dataset_name.events_*`
),
SessionDuration as (
SELECT
user_pseudo_id,
session_id,
COUNT(*) AS events,
TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration
,event_date
FROM
UserSessions
WHERE session_id is not null
GROUP BY
user_pseudo_id,
session_id
,event_date
)
Select count(session_id) as NumofSessions,avg(session_duration) as AverageSessionLength from SessionDuration
Finally, count the session_id values to get the total number of sessions, and average the session durations to get the average session length.
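As a cross-check of the grouping logic, here is a small Python sketch that does the same aggregation on in-memory rows. It is a simplification: it groups by user and session id only (not by event_date), and the row layout, function name, and sample values are assumptions for illustration.

```python
def session_stats(rows):
    """rows: (user_pseudo_id, ga_session_id or None, event_timestamp_micros).

    Returns (number_of_sessions, average_session_length_seconds).
    Rows without a session id are skipped, like WHERE session_id IS NOT NULL.
    """
    bounds = {}  # (user, session_id) -> (min_timestamp, max_timestamp)
    for user, session_id, ts in rows:
        if session_id is None:
            continue
        low, high = bounds.get((user, session_id), (ts, ts))
        bounds[(user, session_id)] = (min(low, ts), max(high, ts))
    # Whole seconds, like TIMESTAMP_DIFF(..., SECOND)
    durations = [(high - low) // 1_000_000 for low, high in bounds.values()]
    if not durations:
        return 0, 0.0
    return len(durations), sum(durations) / len(durations)

# Hypothetical rows: two sessions for u1 (30 s and 90 s), one for u2 (0 s).
sample = [
    ("u1", 1, 0), ("u1", 1, 30_000_000),
    ("u1", 2, 100_000_000), ("u1", 2, 190_000_000),
    ("u2", 1, 5_000_000), ("u2", None, 6_000_000),
]
stats = session_stats(sample)
print(stats)
```

Note that the key has to include the user as well as the session id, since the same ga_session_id value can appear for different users.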

PostgreSQL How to add WHERE where is count() and GROUP

How can I add a WHERE clause to my query?
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = 1 -- not working, I don't know why
GROUP BY txn_year;
Thanks for any opinion
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE column_active = 1
GROUP BY txn_year;
active is of type character varying, i.e. a string type. This should work:
SELECT count(*), date_trunc('year', "createdAt") AS txn_year
FROM tables
WHERE active = '1'
GROUP BY txn_year;