UPDATE table using SELECT IN with multiple output columns - sql

Query error: Subquery of type IN must have only one output column.
How can I update this table? Thank you!
UPDATE table
SET ViewItem = SessionStart
WHERE (event_date, user_pseudo_id)
IN
(SELECT event_date, user_pseudo_id
FROM table
WHERE ViewItem > SessionStart
GROUP BY event_date, user_pseudo_id
ORDER BY event_date, user_pseudo_id
)
;

Try below. In BigQuery, a multi-column IN comparison requires the subquery to return a single STRUCT value per row, which SELECT AS STRUCT provides:
UPDATE table
SET ViewItem = SessionStart
WHERE (event_date, user_pseudo_id)
IN
(SELECT AS STRUCT event_date, user_pseudo_id
FROM table
WHERE ViewItem > SessionStart
GROUP BY event_date, user_pseudo_id
ORDER BY event_date, user_pseudo_id
)
;
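If the goal is simply to update every (event_date, user_pseudo_id) group that contains at least one row with ViewItem > SessionStart, a correlated EXISTS is another way to sidestep the one-output-column restriction. A sketch only, assuming BigQuery DML accepts the correlated subquery against the same table, with table and column names taken from the question:

```sql
-- Sketch: EXISTS replaces the multi-column IN; the ORDER BY inside the
-- original subquery had no effect on the result anyway, so it is dropped.
UPDATE table t
SET ViewItem = SessionStart
WHERE EXISTS (
  SELECT 1
  FROM table s
  WHERE s.event_date = t.event_date
    AND s.user_pseudo_id = t.user_pseudo_id
    AND s.ViewItem > s.SessionStart
);
```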

Related

Bigquery resources exceeded during query execution - optimization

I have got a problem with this query.
SELECT event_date, country, COUNT(*) AS sessions,
       AVG(length) AS average_session_length
FROM (
  SELECT country, event_date, global_session_id,
         (MAX(event_timestamp) - MIN(event_timestamp)) / (60 * 1000 * 1000) AS length
  FROM (
    SELECT user_pseudo_id,
           event_timestamp,
           country,
           event_date,
           SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
           SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
    FROM (
      SELECT *,
             CASE WHEN event_timestamp - last_event >= (30 * 60 * 1000 * 1000)
                    OR last_event IS NULL
                  THEN 1 ELSE 0 END AS is_new_session
      FROM (
        SELECT user_pseudo_id,
               event_timestamp,
               geo.country,
               event_date,
               LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
        FROM `xxx.events*`
      ) last
    ) final
  ) session
  GROUP BY global_session_id, country, event_date
) agg
WHERE length >= (10/60)
GROUP BY country, event_date
Google Cloud Console gives this error:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
I know it is probably a problem with the OVER clauses, but I have no idea how to edit the query to get the same results.
I would be thankful for some help.
Thank you guys!
If I had to guess, it is this line:
SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
I would recommend changing the code so the "global" session id is really local to each user:
SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS global_session_id,
If you adjust the query and it basically works, then the resource problem is fixed. The next step is to figure out how to get the global id that you want; the simplest solution is to use a local id for each user.
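If a single key per session is still needed, the per-user id can be combined with the user id itself, which is just as unique as the global counter. A sketch using only column names already present in the question's query:

```sql
-- Sketch: replace the global_session_id line in the original query with a
-- composite key; (user_pseudo_id, user_session_id) identifies a session just
-- as uniquely as a global id, without the single-partition window.
SELECT user_pseudo_id,
       event_timestamp,
       country,
       event_date,
       CONCAT(user_pseudo_id, '-',
              CAST(SUM(is_new_session)
                     OVER (PARTITION BY user_pseudo_id
                           ORDER BY event_timestamp) AS STRING)) AS global_session_id
-- FROM (...) stays exactly as in the question
```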

Querying users that triggered 'app_remove' event then checking what event triggered before it

I've been thinking about how to do this and have been searching for answers but to no avail. I'm actually wondering if what I want to achieve is actually possible.
What I want to do is...
I want to first find all users that have triggered the event 'app_remove' (thus ignoring all users that haven't triggered that event).
Once I find all these users, for each of those users, I want to check all events that got triggered in the past X minutes before the 'app_remove' event got triggered.
I basically want to know what the last thing a user did before they uninstalled an app.
Is it possible to do this for all users or can this only be done on a per user basis?
I didn't know where to even begin to do the group thing so I currently opted to instead try and look at one user at a time. This is where I am at currently and I got stuck after writing that incomplete IF statement:
SELECT
TIMESTAMP_MICROS(event_timestamp) as time_stamp, event_name, user_pseudo_id
FROM
`mana-monsters.analytics_182655472.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20180101' AND '20190121' AND
user_pseudo_id = '026e1dd2cfe2344cdf2acf6dab2a123c' AND
IF (event_name = 'app_remove')
GROUP BY
time_stamp, event_name, user_pseudo_id
ORDER BY
time_stamp DESC
I hope I gave enough information.
Thanks in advance for any help or leads as to how to approach this in theory.
I think using an INNER JOIN as a filter could be an option here. The code should look something like this:
SELECT
  TIMESTAMP_MICROS(a.event_timestamp) AS time_stamp, a.event_name, a.user_pseudo_id
FROM
  `mana-monsters.analytics_182655472.events_*` a
INNER JOIN (
  SELECT user_pseudo_id, event_timestamp
  FROM `mana-monsters.analytics_182655472.events_*`
  WHERE event_name = 'app_remove'
) b
ON a.user_pseudo_id = b.user_pseudo_id
  AND b.event_timestamp - a.event_timestamp < 1000
  AND b.event_timestamp - a.event_timestamp >= 0
ORDER BY
  a.user_pseudo_id, time_stamp DESC
I have been playing with some dummy data and it worked for me:
#standardSQL
WITH my_table AS (
  SELECT 1454911123456789 AS event_timestamp, 'app_remove' AS event_name, '1' AS user_pseudo_id UNION ALL
  SELECT 1454911123456788 AS event_timestamp, 'app_close' AS event_name, '1' AS user_pseudo_id UNION ALL
  SELECT 1454911123456778 AS event_timestamp, 'connection_lost' AS event_name, '1' AS user_pseudo_id UNION ALL
  SELECT 1457911123451231 AS event_timestamp, 'app_open' AS event_name, '2' AS user_pseudo_id UNION ALL
  SELECT 1457911123450123 AS event_timestamp, 'app_close' AS event_name, '2' AS user_pseudo_id UNION ALL
  SELECT 1457911123450035 AS event_timestamp, 'connection_lost' AS event_name, '2' AS user_pseudo_id UNION ALL
  SELECT 1459911123455664 AS event_timestamp, 'app_remove' AS event_name, '3' AS user_pseudo_id UNION ALL
  SELECT 1459911123455456 AS event_timestamp, 'app_close' AS event_name, '3' AS user_pseudo_id UNION ALL
  SELECT 1459911123455354 AS event_timestamp, 'game_lost' AS event_name, '3' AS user_pseudo_id
)
SELECT
  TIMESTAMP_MICROS(a.event_timestamp) AS time_stamp, a.event_name, a.user_pseudo_id
FROM my_table a
INNER JOIN (
  SELECT user_pseudo_id, event_timestamp
  FROM my_table
  WHERE event_name = 'app_remove'
) b
ON a.user_pseudo_id = b.user_pseudo_id
  AND b.event_timestamp - a.event_timestamp < 1000
  AND b.event_timestamp - a.event_timestamp >= 0
ORDER BY
  a.user_pseudo_id, time_stamp DESC
The dummy data is quite simple, so you may have to make some modifications to fit this query to your use case. This is just an example to show a possible solution.
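One such modification: event_timestamp is in microseconds, so the join window above (< 1000, i.e. one millisecond, sized for the dummy data) needs to be widened for the question's "past X minutes". A sketch for a 5-minute lookback, reusing the table and columns from the question:

```sql
-- Sketch: same JOIN-as-filter pattern with a 5-minute lookback window.
-- 5 minutes = 5 * 60 seconds * 1,000,000 microseconds.
SELECT
  TIMESTAMP_MICROS(a.event_timestamp) AS time_stamp, a.event_name, a.user_pseudo_id
FROM `mana-monsters.analytics_182655472.events_*` a
INNER JOIN (
  SELECT user_pseudo_id, event_timestamp
  FROM `mana-monsters.analytics_182655472.events_*`
  WHERE event_name = 'app_remove'
) b
ON a.user_pseudo_id = b.user_pseudo_id
  AND b.event_timestamp - a.event_timestamp < 5 * 60 * 1000 * 1000
  AND b.event_timestamp - a.event_timestamp >= 0
ORDER BY a.user_pseudo_id, time_stamp DESC
```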

Cannot query the cross product of repeated fields event_params.key and user_properties.key, with daily event tables

If I am understanding correctly, the issue is that I have two fields that need to be flattened for the query to work. The fields in question below are event_params.value.string_value and user_properties.value.string_value.
I am able to do this correctly with a single table, but I need to span a date range of daily tables and cannot get the syntax right. Any help?
SELECT
event_params.value.string_value,
event_timestamp,
event_name,
user_properties.value.string_value
FROM
(TABLE_DATE_RANGE([kiehls-kinetic:analytics_180943775.events_],
DATE_ADD(CURRENT_TIMESTAMP(), -365, 'DAY'),
CURRENT_TIMESTAMP())),
WHERE
(event_params.key = 'session')
AND (user_properties.key = 'associate_name')
ORDER BY
event_params.value.string_value ASC,
event_timestamp ASC
You'd want something like this, using standard SQL:
SELECT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'session') AS event_value,
event_timestamp,
event_name,
(SELECT value.string_value
FROM UNNEST(user_properties)
WHERE key = 'associate_name') AS user_value
FROM
`kiehls-kinetic.analytics_180943775.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)) AND
FORMAT_DATE('%Y%m%d', CURRENT_DATE())
ORDER BY
event_value ASC,
event_timestamp ASC

How to get COUNT with UNNEST function in Google BigQuery?

I need to get a count of events which have a specific parameter. Let's say I have an event event_notification_received with params (type, title, code_id), and in the param code_id I have the unique name of an advertisement. I need to count how many events were received with that parameter. I am using the UNNEST function to get access to the params of the event, but it gives too many results after execution; I think it's because of the UNNEST function. How can I count the events correctly? Thanks.
Here is my standard SQL query:
#standardSQL
SELECT event_date, event_timestamp, event_name, user_id, app_info.version,
       geo.country, geo.region, geo.city,
       my_event_params,
       user_prop,
       platform
FROM
  `myProject.analytics_199660162.events_201807*`,
  UNNEST(event_params) AS my_event_params,
  UNNEST(user_properties) AS user_prop
WHERE
  _TABLE_SUFFIX BETWEEN '24' AND '30'
  AND event_name = "event_notification_received"
  AND my_event_params.value.string_value = "my_adverticement_name"
  AND platform = "ANDROID"
ORDER BY event_timestamp DESC
Is this what you want?
SELECT ...,
(SELECT COUNT(*)
FROM UNNEST(event_params) as my_event_params
WHERE my_event_params.value.string_value = 'my_adverticement_name'
) as event_count
FROM `myProject.analytics_199660162.events_201807*`,
UNNEST(user_properties) as user_prop
WHERE _TABLE_SUFFIX BETWEEN '24' AND '30' AND
event_name = 'event_notification_received' AND
platform = 'ANDROID'
ORDER BY event_timestamp DESC;
If you UNNEST() and CROSS JOIN more than one repeated column at the FROM level, you'll get duplicated rows - yup.
Instead UNNEST() at the SELECT level, just to extract and COUNT the values you are looking for:
SELECT COUNT(DISTINCT (
SELECT value.string_value FROM UNNEST(user_properties) WHERE key='powers')
) AS distinct_powers
FROM `firebase-sample-for-bigquery.analytics_bingo_sample.events_20160607`
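Applied to the original question, the same SELECT-level UNNEST gives a plain count without the row fan-out. A sketch using the question's table and filters; the key name 'code_id' is assumed from the question's description of the event params:

```sql
-- Sketch: count notifications for one advertisement without FROM-level UNNEST.
-- 'code_id' and 'my_adverticement_name' come from the question's description.
SELECT COUNT(*) AS event_count
FROM `myProject.analytics_199660162.events_201807*`
WHERE _TABLE_SUFFIX BETWEEN '24' AND '30'
  AND event_name = 'event_notification_received'
  AND platform = 'ANDROID'
  AND (SELECT value.string_value
       FROM UNNEST(event_params)
       WHERE key = 'code_id') = 'my_adverticement_name'
```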

How to calculate Session and Session duration in Firebase Analytics raw data?

How do I calculate session duration in Firebase Analytics raw data that is linked to BigQuery?
I have used the following blog to calculate users, using the flatten command for the events which are nested within each record, but I would like to know how to proceed with calculating sessions and session duration by country and time.
(I have many apps configured, but if you could help me with the SQL query for calculating sessions and session duration, it would be of immense help.)
Google Blog on using Firebase and big query
First you need to define a session - in the following query I'm going to break a session whenever a user is inactive for more than 20 minutes.
Now, to find all sessions with SQL you can use a trick described at https://blog.modeanalytics.com/finding-user-sessions-sql/.
The following query finds all sessions and their lengths:
#standardSQL
SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records,
       MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
       (ROUND((MAX(max_time) - MIN(min_time)) / (1000 * 1000), 1)) sess_length_seconds
FROM (
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    SELECT *, IF(
      previous IS NULL
      OR (min_time - previous) > (20 * 60 * 1000 * 1000), # sessions broken by this inactivity
      1, 0) session_start
    # https://blog.modeanalytics.com/finding-user-sessions-sql/
    FROM (
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
        FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`
      )
    )
  )
)
GROUP BY 1, 2
ORDER BY 1, 2
With the new schema of Firebase in BigQuery, I found that the answer by @Maziar did not work for me, but I am not sure why.
Instead I have used the following to calculate it, where a session is defined as a user engaging with your app for a minimum of 10 seconds, and where the session stops if the user doesn't engage with the app for 30 minutes.
It provides the total number of sessions and the session length in minutes, and it is based on this query: https://modeanalytics.com/modeanalytics/reports/5e7d902f82de/queries/2cf4af47dba4
SELECT COUNT(*) AS sessions,
       AVG(length) AS average_session_length
FROM (
  SELECT global_session_id,
         (MAX(event_timestamp) - MIN(event_timestamp)) / (60 * 1000 * 1000) AS length
  FROM (
    SELECT user_pseudo_id,
           event_timestamp,
           SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
           SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
    FROM (
      SELECT *,
             CASE WHEN event_timestamp - last_event >= (30 * 60 * 1000 * 1000)
                    OR last_event IS NULL
                  THEN 1 ELSE 0 END AS is_new_session
      FROM (
        SELECT user_pseudo_id,
               event_timestamp,
               LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
        FROM `dataset.events_2019*`
      ) last
    ) final
  ) session
  GROUP BY 1
) agg
WHERE length >= (10/60)
As you know, Google has changed the schema of BigQuery firebase databases:
https://support.google.com/analytics/answer/7029846
Thanks to @Felipe's answer, the query for the new format is as follows:
SELECT SUM(total_sessions) AS Total_Sessions, AVG(sess_length_seconds) AS Average_Session_Duration
FROM (
  SELECT user_pseudo_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records,
         MAX(sess_id) OVER(PARTITION BY user_pseudo_id) total_sessions,
         (ROUND((MAX(max_time) - MIN(min_time)) / (1000 * 1000), 1)) sess_length_seconds
  FROM (
    SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY min_time) sess_id
    FROM (
      SELECT *, IF(previous IS NULL OR (min_time - previous) > (20 * 60 * 1000 * 1000), 1, 0) session_start
      FROM (
        SELECT *, LAG(max_time, 1) OVER(PARTITION BY user_pseudo_id ORDER BY max_time) previous
        FROM (
          SELECT user_pseudo_id, MIN(event_timestamp) AS min_time, MAX(event_timestamp) AS max_time
          FROM `dataset_name.table_name`
          GROUP BY user_pseudo_id
        )
      )
    )
  )
  GROUP BY 1, 2
  ORDER BY 1, 2
)
Note: change dataset_name and table_name based on your project info
With the recent change in which we have a ga_session_id with each event row in the BigQuery table, you can calculate the number of sessions and the average session length much more easily.
The value of ga_session_id stays the same for the whole session, so you don't need to define the session separately.
You take the MIN and MAX of the event_timestamp column, grouping the result by user_pseudo_id, ga_session_id and event_date, so that you get the duration of a particular session of any user on any given date.
WITH
UserSessions AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    event_date,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "ga_session_id") AS session_id,
    event_name
  FROM `projectname.dataset_name.events_*`
),
SessionDuration AS (
  SELECT
    user_pseudo_id,
    session_id,
    COUNT(*) AS events,
    TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration,
    event_date
  FROM UserSessions
  WHERE session_id IS NOT NULL
  GROUP BY
    user_pseudo_id,
    session_id,
    event_date
)
SELECT COUNT(session_id) AS NumofSessions,
       AVG(session_duration) AS AverageSessionLength
FROM SessionDuration
Finally, you just count the session_id values to get the total number of sessions, and average the session durations to get the average session length.
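Since the original question also asked for a breakdown by time, the final SELECT can group on the event_date column that the SessionDuration CTE above already carries. A sketch:

```sql
-- Sketch: per-day session counts and average lengths,
-- using the SessionDuration CTE defined above.
SELECT event_date,
       COUNT(session_id) AS NumofSessions,
       AVG(session_duration) AS AverageSessionLength
FROM SessionDuration
GROUP BY event_date
ORDER BY event_date
```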