Firebase Analytics vs. BigQuery - Average engagement time per screen

The engagement time shown in Firebase Analytics, under Engagement -> Pages and Screens -> Page Title and Screen Name, differs from what is returned by the following BigQuery query.
SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = "firebase_screen") AS screen,
  AVG((SELECT value.int_value FROM UNNEST(event_params) WHERE key = "engagement_time_msec")) AS avg_engagement_time
FROM
  `ABC.events_20210923`
GROUP BY screen
ORDER BY avg_engagement_time DESC
However, the numbers shown in Firebase Analytics are completely different from the numbers returned by this query, and the descending order matches only about 65% of the time. Is the console showing a completely different metric, or is my query just wrong?
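One possibility worth testing: the console figure may not be a per-event average at all. Below is a minimal sketch, assuming (and this is my assumption, not a documented definition) that GA4 divides the total engagement time for a screen by the number of distinct users who viewed it:

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = "firebase_screen") AS screen,
  -- total engagement time divided by distinct users, instead of a per-event AVG
  SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = "engagement_time_msec"))
    / COUNT(DISTINCT user_pseudo_id) AS avg_engagement_msec_per_user
FROM `ABC.events_20210923`
GROUP BY screen
ORDER BY avg_engagement_msec_per_user DESC

If this lands closer to the console figures, the discrepancy is in the denominator rather than the data.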


Matching BigQuery data with Traffic Acquisition GA4 report

I'm new to BigQuery and I'm trying to replicate the Traffic Acquisition GA4 report, so far without much success: my results are not even remotely close to the GA4 view.
I understand that the source/medium/campaign fields are event-based rather than session-based in GA4 / BQ. My question is: why doesn't every event have source/medium/campaign as an event parameter key? It seems logical to me that the 'session_start' event would carry these parameters, but unfortunately that's not the case.
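As a quick sanity check on which events actually carry the parameter, here is a sketch against the same export (`project` is the placeholder table from the query below):

select
  event_name,
  count(*) as events,
  -- how many events of each name carry a medium parameter at all
  countif((select value.string_value from unnest(event_params) where key = 'medium') is not null) as events_with_medium
from `project`
group by event_name
order by events desc

This makes it easy to see which event names, if any, carry source/medium at all in your property.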
I tried the following options to replicate the Traffic Acquisition report:
2.1 To check the first medium for sessions:
with cte as (
  select
    PARSE_DATE("%Y%m%d", event_date) AS Date,
    user_pseudo_id,
    concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
    FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp) as first_medium
  FROM `project`
)
select Date, first_medium, count(distinct user_pseudo_id) as Users, count(distinct session_id) as Sessions
from cte
group by 1, 2;
The query returns 44k users with a 'null' medium and 1.8k organic users, while there are 17k users with the 'none' medium and 8k organic users in GA4. (A possible cause of the null bucket is sketched below, after this question.)
2.2 If I change the first medium to the last medium:
FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp desc) as last_medium
Organic medium increases to 9k users, though the results are still not matching the GA4 data.
2.3 I've also tried the code from https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/ - source / medium (based on session) - and still got completely different results compared to GA4.
Any help would be much appreciated!
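A possible cause of the 44k 'null' bucket in 2.1: most events in a session carry no 'medium' parameter at all, so FIRST_VALUE ordered by timestamp usually picks up a null from an early event. Here is a sketch of option 2.1 with IGNORE NULLS (which BigQuery navigation functions support) and an explicit window frame, assuming what you want is the first non-null medium in each session:

with cte as (
  select
    user_pseudo_id,
    concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
    -- first non-null medium anywhere in the session
    first_value((select value.string_value from unnest(event_params) where key = 'medium') ignore nulls)
      over (partition by concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id'))
            order by event_timestamp
            rows between unbounded preceding and unbounded following) as first_medium
  from `project`
)
select first_medium, count(distinct user_pseudo_id) as Users, count(distinct session_id) as Sessions
from cte
group by 1
order by Sessions desc;

Sessions with no medium event at all will still come out null; those are presumably the candidates for GA4's '(none)' / direct bucket.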
I have noticed the same thing. Looking deeper, I pulled one day's worth of data from BigQuery into Google Sheets and examined it.
Unsurprisingly, I could replicate the results of the ga4bigquery code you mentioned above, but they did not align with GA4: although close for high-traffic pages, they could be wildly off for the lower-traffic ones.
I then counted 'email' in the event_params source and ea_tracking_id parameters, as well as in traffic_source, and found all of them lower than the GA4 analytics figures.
I went to my dev site, where I know exactly how many sessions have a source of email: GA4 analytics agreed, but BigQuery did not. Google seems to be allocating some traffic to (not set) at random.
I have concluded the problem is not in the SQL and not in the tagging, but in the BigQuery GA4 data source. I have logged a query with Google and we will see what happens. Sorry it's not a solution.

GA4 traffic source data does not match BigQuery

I have tried to export traffic source data and event attribution from BigQuery and match it with GA4 (session_source and session_medium).
I am extracting the event params (source and medium) from BigQuery, but there is a big gap between the two data sources.
Any solution to solve it?
I have tried to use the SQL below:
with prep as (
select
user_pseudo_id,
(select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
max((select value.string_value from unnest(event_params) where key = 'source')) as source,
max((select value.string_value from unnest(event_params) where key = 'medium')) as medium,
max((select value.string_value from unnest(event_params) where key = 'name')) as campaign,
max((select value.string_value from unnest(event_params) where key = 'term')) as term,
max((select value.string_value from unnest(event_params) where key = 'content')) as content,
platform
FROM `XXX`
group by
user_pseudo_id,
session_id,
platform
)
select
-- session medium (dimension | the value of a medium associated with a session)
platform,
coalesce(source,'(none)') as source_session,
coalesce(medium,'(none)') as medium_session,
coalesce(campaign,'(none)') as campaign_session,
coalesce(content,'(none)') as content,
coalesce(term,'(none)') as term,
count(distinct concat(user_pseudo_id,session_id)) as sessions
from
prep
group by
platform,
source_session,
medium_session,
campaign_session,
content,
term
order by
sessions desc
I'm also trying to figure out why BigQuery can't correctly match the source and medium to the event. The first issue I found is that it assigns the source/medium as google/organic even when there is a gclid parameter in the link. The second issue is large gaps in recognizing the source as direct: in those cases the events carry no such parameters at all.
The values are valid, but only for the source and medium that acquired the user.
When I compare data in UA and GA4, session attribution is correct, so it looks like a problem with the export to BigQuery. I reported this to support and am waiting for a response.
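For what it's worth, the user-scoped acquisition values Justyna describes are also exposed directly in the export schema's traffic_source record (traffic_source.source, traffic_source.medium, traffic_source.name), so they can be checked without unnesting event_params. A minimal sketch, reusing the `XXX` placeholder from the question:

select
  traffic_source.source as user_source,
  traffic_source.medium as user_medium,
  traffic_source.name as user_campaign,
  count(distinct user_pseudo_id) as users
from `XXX`
group by 1, 2, 3
order by users desc

These are first-user acquisition values, so expect them to line up with GA4's user-scoped dimensions rather than the session-scoped ones.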
I have also noticed that source/medium does not align between BigQuery and GA4, and like Justyna commented, a lot of my source/medium values come through as google/organic even when they are not. I am hoping Justyna will post here when there is a solution.
Looking at your code, I can see two other areas that would cause discrepancies:
1)
count(distinct concat(user_pseudo_id,session_id)) as sessions
This will only capture events with a valid user_pseudo_id and session_id. That is the correct way to count, but in my data there tend to be a few events where the ids are null, so your session count excludes them while GA4 appears to include them. Use your preferred method of counting nulls to work out whether this is an issue for you.
2)
You are also doing an exact count, which again is correct, but GA4 does an approximate count; see the link below for details.
https://developers.google.com/analytics/blog/2022/hll#using_bigquery_hll_functions_with_google_analytics_event_data
Using the above two techniques I can get a lot closer to the GA4 number of sessions, but they are still not attributed correctly.
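Here is a sketch combining both checks: counting events with missing ids, and using APPROX_COUNT_DISTINCT, which is built on HyperLogLog++, the same estimator family the linked post applies via the HLL_COUNT functions. The `XXX` placeholder stands for the events table from the question:

select
  -- events the exact distinct count silently drops
  countif(user_pseudo_id is null) as events_missing_user_id,
  countif((select value.int_value from unnest(event_params) where key = 'ga_session_id') is null) as events_missing_session_id,
  -- GA4-style approximate session count
  approx_count_distinct(concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id'))) as approx_sessions
from `XXX`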

How to get Firebase console event details such as first_open, app_remove and Registration_Success using BigQuery for the last two weeks?

I'm creating a visualization of app download counts, app remove counts, and user registration counts from Firebase console data for the last two weeks. The console gives the total count for the selected period, but we need a date-wise count for each. For that, we plan to get the counts using BigQuery. How do we get all the metrics with a single query?
You can get all the metrics using a single query, as below:
SELECT event_date, COUNT(*) AS event_count, platform, event_name
FROM `apple-XYZ.analytics_XXXXXX.events_*`
WHERE event_name IN ("app_remove", "first_open", "Registration_Success")
  AND event_date BETWEEN "20200419" AND "20200502"
  AND stream_id IN ("XYZ", "ZYX")
  AND platform IN ("ANDROID", "IOS")
GROUP BY platform, event_date, event_name
ORDER BY event_date;
Result for two weeks (from 19-04-2020 to 02-05-2020):

BigQuery Firebase Average Coins Per Level In The Game

I developed a word game (using Firebase as my backend) with levels and coins.
Now I'm facing some difficulties trying to query my DB so that it outputs a table with every level in the game and the average user coins at each level. For example:
Level | Avg User Coins
------|---------------
0     | 50
1     | 12
2     | 2
Attached is a picture of my events table:
So as you can see, there is a 'level_end' event, and within it the 'user_coins' and 'level_num' parameters. What is the right way to do this?
This is what I managed to do so far, obviously the wrong way:
SELECT event_name,user_id
FROM `words-game-en.analytics_208527783.events_20191004`,
UNNEST(event_params) as event_param
WHERE event_name = "level_end"
AND event_param.key = "user_coins"
You seem to want something like this:
SELECT
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'level_num') AS level_num,
  AVG(event_param.value.int_value) AS avg_user_coins
FROM `words-game-en.analytics_208527783.events_20191004` CROSS JOIN
UNNEST(event_params) AS event_param
WHERE event_name = 'level_end' AND event_param.key = 'user_coins'
GROUP BY level_num
ORDER BY level_num;
I'm a little confused by what is in event_params and what is directly on the events table, so you might need to adjust which fields are pulled from event_params.
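If in doubt, one quick way to see exactly which keys live in event_params for this event (same table as above):

SELECT DISTINCT event_param.key AS param_key
FROM `words-game-en.analytics_208527783.events_20191004` CROSS JOIN
UNNEST(event_params) AS event_param
WHERE event_name = 'level_end'
ORDER BY param_key;

Top-level columns such as event_name and event_timestamp sit directly on the events table; everything event-specific arrives as key/value pairs inside event_params.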

Trying to determine screen views by region

I'm new to BigQuery and have limited experience with SQL, but have been making a few queries successfully.
One more complicated query, which I am a bit stuck on, breaks down the number of screen views by the user's region.
My SQL query looks like this:
SELECT
geo.region, COUNT(params.value.string_value) as count
FROM
`xxx`,
UNNEST(event_params) as params
WHERE
geo.country = "Australia" AND geo.region > "" AND event_name = "screen_view" AND params.key = "firebase_screen"
GROUP BY
geo.region
ORDER BY
count DESC
I get output which is significantly less than the total screen views the Firebase console reports for Australia.
Row | region          | count
----|-----------------|------
1   | Victoria        | 25613
2   | South Australia | 3557
...
Is there something wrong with my query?
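One thing worth ruling out (an assumption, not a confirmed cause): COUNT(params.value.string_value) combined with params.key = "firebase_screen" silently drops any screen_view event where that parameter is missing or its value is null, while the console counts the events themselves. A sketch counting events directly:

SELECT
  geo.region, COUNT(*) AS count  -- count events, not non-null parameter values
FROM
  `xxx`
WHERE
  geo.country = "Australia" AND geo.region > "" AND event_name = "screen_view"
GROUP BY
  geo.region
ORDER BY
  count DESC

If the gap persists, also check that the date range and time zone used in the console match the daily export tables you are querying.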