BigQuery Firebase Average Coins Per Level In The Game - sql

I developed a word game (using Firebase as my backend) with levels and coins.
I'm having trouble querying my DB so that it outputs a table with every level in the game and the average user coins for each level. For example:
Level   Avg User Coins
0       50
1       12
2       2
Attached is a picture of my events table:
As you can see, there is a 'level_end' event, and its parameters include 'user_coins' and 'level_num'. What is the right way to do that?
This is what I have managed so far, which is obviously wrong:
SELECT event_name,user_id
FROM `words-game-en.analytics_208527783.events_20191004`,
UNNEST(event_params) as event_param
WHERE event_name = "level_end"
AND event_param.key = "user_coins"

You seem to want something like this:
SELECT event_param.level_num, AVG(event_param.user_coins)
FROM `words-game-en.analytics_208527783.events_20191004` CROSS JOIN
UNNEST(event_params) as event_param
WHERE event_name = 'level_end' AND event_param.key = 'user_coins'
GROUP BY level_num
ORDER BY level_num;
I'm a little confused by what is in event_params and what is directly in events, so you might need to properly qualify the column references.
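In the standard Firebase export to BigQuery, event_params is an array of key/value structs, so level_num and user_coins are not direct fields of event_param. A minimal sketch, assuming both parameters are logged on the level_end event and stored in value.int_value (if they were logged as strings or floats, the value.* field would need to change):

```sql
-- Sketch only: assumes 'level_num' and 'user_coins' are integer params
-- attached to each 'level_end' event.
WITH level_end AS (
  SELECT
    -- Pull each parameter out of the event_params array by key.
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'level_num') AS level_num,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'user_coins') AS user_coins
  FROM `words-game-en.analytics_208527783.events_20191004`
  WHERE event_name = 'level_end'
)
SELECT
  level_num,
  AVG(user_coins) AS avg_user_coins
FROM level_end
GROUP BY level_num
ORDER BY level_num;
```

Extracting the parameters with scalar subqueries keeps one row per event, so the average is not affected by the row multiplication that a CROSS JOIN with UNNEST would introduce.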

Related

Matching BigQuery data with Traffic Acquisition GA4 report

I'm new to BigQuery and I'm trying to replicate the Traffic Acquisition GA4 report, so far without much success: my results are not even remotely close to the GA4 view.
I understand that the source/medium/campaign fields are event-based and not session-based in GA4 / BQ. My question is, why doesn't every event have source/medium/campaign as an event parameter key? It would seem logical to have these parameters on the 'session_start' event, but unfortunately that's not the case.
I tried the following options to replicate the Traffic Acquisition report:
2.1 To check the first medium for sessions:
with cte as ( select
PARSE_DATE("%Y%m%d", event_date) AS Date,
user_pseudo_id,
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp) as first_medium
FROM `project`)
select Date, first_medium, count(distinct user_pseudo_id) as Users, count (distinct session_id) as Sessions
from cte
group by 1,2;
The query returns 44k users with 'null' medium and 1.8k organic users while there are 17k users with the 'none' medium and 8k organic users in GA4.
2.2 If I change the first medium to the last medium:
FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp desc) as last_medium
Organic medium increases to 9k users, though the results are still not matching the GA4 data.
2.3 I've also tried this code - https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/ - source / medium (based on session), and still got completely different results compared to the GA4.
Any help would be much appreciated!
I have noticed the same thing. Looking deeper, I pulled one day's worth of data from BigQuery into Google Sheets and examined it.
Unsurprisingly, I could replicate the results from the ga4bigquery code you mentioned above, but they did not align with GA4: although close for high-traffic pages, they could be wildly out for the lower-traffic ones.
I then did a count for 'email' in the event params source and ea_tracking_id, as well as in traffic_source, and found they are all lower than the GA4 figures.
I went to my dev site, where I know exactly how many sessions have a source of email. GA4 agreed but BigQuery did not; Google seems to be allocating some traffic to "not set" at random.
I have concluded the problem is not in the SQL and not in the tagging, but in the BigQuery GA4 data source. I have logged a query with Google and we will see what happens. Sorry it's not a solution.
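Separately from the data-source issue described above, FIRST_VALUE ordered by event_timestamp returns the value from the earliest event in the session, which is NULL whenever that event carries no 'medium' parameter. A hedged sketch that skips NULLs instead, reusing the `project` placeholder from the question; it is not claimed to reproduce the GA4 report, only to reduce the 'null' bucket:

```sql
-- Sketch only: `project` is the placeholder table name from the question.
WITH events AS (
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_day,
    user_pseudo_id,
    event_timestamp,
    CONCAT(
      user_pseudo_id,
      CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
    ) AS session_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium
  FROM `project`
),
sessions AS (
  SELECT
    event_day,
    user_pseudo_id,
    session_id,
    -- First non-NULL medium in the session; the explicit frame makes every
    -- row of the session see the same value.
    FIRST_VALUE(medium IGNORE NULLS) OVER (
      PARTITION BY session_id
      ORDER BY event_timestamp
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_medium
  FROM events
)
SELECT
  event_day,
  first_medium,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNT(DISTINCT session_id) AS sessions
FROM sessions
GROUP BY 1, 2;
```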

How to get firebase console event details such as first_open, app_remove and Registration_Success using big query for last two weeks?

I'm creating a visualization of app download counts, app removal counts and user registration counts from Firebase console data for the last two weeks. The console gives us the total count for the selected period, but we need date-wise counts for each metric. For that, we plan to get the counts using BigQuery. How do we get all the metrics by writing a single query?
We can get all the metrics using a single query, as below:
SELECT event_date, COUNT(*) AS event_count, platform, event_name
FROM `apple-XYZ.analytics_XXXXXX.events_*`
WHERE event_name IN ("app_remove", "first_open", "Registration_Success")
AND event_date BETWEEN "20200419" AND "20200502"
AND stream_id IN ("XYZ", "ZYX")
AND platform IN ("ANDROID", "IOS")
GROUP BY platform, event_date, event_name
ORDER BY event_date;
Result: for two weeks (from 19-04-2020 to 02-05-2020)
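If one row per date and platform is preferred, with one column per metric, the counts can be pivoted with COUNTIF. The sketch below reuses the placeholder table, stream and event names from the query above. Filtering on _TABLE_SUFFIX is an added assumption that the daily tables are named events_YYYYMMDD; it limits the scan to the tables for those two weeks.

```sql
-- Sketch only: table, stream_id values and event names are the
-- placeholders used in the answer above.
SELECT
  event_date,
  platform,
  COUNTIF(event_name = 'first_open') AS first_opens,
  COUNTIF(event_name = 'app_remove') AS app_removes,
  COUNTIF(event_name = 'Registration_Success') AS registrations
FROM `apple-XYZ.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20200419' AND '20200502'  -- prune daily tables
  AND event_name IN ('app_remove', 'first_open', 'Registration_Success')
  AND stream_id IN ('XYZ', 'ZYX')
  AND platform IN ('ANDROID', 'IOS')
GROUP BY event_date, platform
ORDER BY event_date, platform;
```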

SQL Time Series Homework

Imagine you have these two tables.
a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on Twitch. The columns of the table are:
   username: Channel username
   timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
   game: Name of the game that the user was playing at that time
   viewers: Number of concurrent viewers that the user had at that time
   followers: Number of total followers that the channel had at that time
b) games_metadata: it contains information on all the games that have ever been broadcast on Twitch. The columns of the table are:
   game: Name of the game
   release_date: Timestamp, in seconds, corresponding to the date when the game was released
   publisher: Publisher of the game
   genre: Genre of the game
Now I want the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.
The problem is I don't have any database, I created one and inputted some values by hand.
I thought of this query, but I'm not sure if it is what I want. It may be right (I don't feel like it is ), but I'd like a second opinion
SELECT publisher,
(cast(strftime('%m', "timestamp") as integer) + 2) / 3 as quarter,
COUNT((strftime('%M',`timestamp`)/(60*1.0)) * viewers) as total_hours_watch
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE quarter = 3
GROUP BY publisher,quarter
ORDER BY total_hours_watch DESC
Looks about right to me. You don't need to include quarter in the GROUP BY, since the WHERE clause limits you to only one quarter. You can modify the query to get only the top 10 publishers in a couple of ways, depending on the database engine you've created it in.
For SQL Server / MS Access, modify your SELECT statement: SELECT TOP 10 publisher, ...
For MySQL (or SQLite, which the strftime calls suggest), add a LIMIT clause at the end of your query: ... LIMIT 10;
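A few details of the question's query may be worth checking against the task statement: timestamp is described as epoch seconds, the first quarter is quarter 1 rather than 3, no year is filtered, and COUNT counts rows rather than adding up viewers. A minimal sketch in SQLite (assumed because of strftime), treating each row as one minute of viewing per concurrent viewer and assuming the timestamps are in UTC:

```sql
-- Sketch only: assumes SQLite, UTC epoch-second timestamps, and one row
-- per channel per minute, so SUM(viewers) is viewer-minutes.
SELECT
  B.publisher,
  SUM(A.viewers) / 60.0 AS hours_watched   -- viewer-minutes to hours
FROM streamers AS A
INNER JOIN games_metadata AS B ON A.game = B.game
WHERE A."timestamp" >= CAST(strftime('%s', '2019-01-01') AS INTEGER)
  AND A."timestamp" <  CAST(strftime('%s', '2019-04-01') AS INTEGER)
GROUP BY B.publisher
ORDER BY hours_watched DESC
LIMIT 10;
```

Filtering on the raw epoch range avoids calling strftime on every row and pins the year to 2019 as well as the quarter.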

Trying to determine screen views by region

I'm new to BigQuery and have limited experience with SQL, but have been making a few queries successfully.
One complicated one which I am a bit stuck on is breaking down the number of screen views by a user's region.
My SQL query looks like this
SELECT
geo.region, COUNT(params.value.string_value) as count
FROM
`xxx`,
UNNEST(event_params) as params
WHERE
geo.country = "Australia" AND geo.region > "" AND event_name = "screen_view" AND params.key = "firebase_screen"
GROUP BY
geo.region
ORDER BY
count DESC
The output I get is significantly lower than what the Firebase console reports for total screen views in Australia.
Row region count
1 Victoria 25613
2 South Australia 3557
...
Is there something wrong with my query?
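One way to see where the difference might come from is to count how many screen_view events each filter in the query drops. This is only a diagnostic sketch using the `xxx` placeholder from the question. It assumes the gap is caused by events with an empty region or without a 'firebase_screen' parameter, which may not be the actual reason.

```sql
-- Diagnostic sketch: compares all Australian screen_view events with the
-- ones the original query would exclude.
SELECT
  COUNT(*) AS all_screen_views,
  COUNTIF(geo.region IS NULL OR geo.region = '') AS without_region,
  COUNTIF(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'firebase_screen') IS NULL
  ) AS without_firebase_screen
FROM `xxx`
WHERE geo.country = 'Australia'
  AND event_name = 'screen_view';
```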

BigQuery Session & Hit level understanding

I want to ask about the concept of events at two levels:
Hit level
Session level
How can I map this logic in BigQuery (standard SQL), and also compute the following?
Sessions
Events Per Session
Unique Events
Can somebody please guide me to understand these concepts?
totals.visitors is Session; sometimes visitId is taken as Session.
To achieve that, you need to grapple a little with a few different concepts. The first is what a session is in GA lingo; you can find that here. A session is a collection of hits. A hit is one of the following: pageview, event, social interaction or transaction.
Now to see how that is represented in the BQ schema, you can look here. visitId and visitorId will help you define a session (as opposed to a user).
Then you can count the number of totals.hits that are events of the type you want.
It could look something like:
select visitId,
-- hits is an array, so unnest it and count one per hit of type EVENT
sum(case when h.type = "EVENT" then 1 else 0 end) as events
from `dataset.table_*`, unnest(hits) as h
group by 1
That should work to get you an overview. If you need to slice and dice the event details (i.e. hits.eventInfo.*) then I suggest you make a query for all the visitId and one for all the relevant events and their respective visitId
I hope that works!
Cheers
You can think of these concepts like this:
every row is a session
technically every row with totals.visits=1 is a valid session
hits is an array containing structs which contain information for every hit
You can write subqueries on arrays, basically treating them as tables. I'd recommend studying Working with Arrays and applying every exercise directly to hits, if possible.
Example for subqueries on session level
SELECT
fullvisitorid,
visitStartTime,
(SELECT SUM(IF(type='EVENT',1,0)) FROM UNNEST(hits)) events,
(SELECT COUNT(DISTINCT CONCAT(eventInfo.eventCategory,eventInfo.eventAction,eventInfo.eventLabel) )
FROM UNNEST(hits) WHERE type='EVENT') uniqueEvents,
(SELECT SUM(IF(type='PAGE',1,0)) FROM UNNEST(hits)) pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
WHERE
totals.visits=1
LIMIT
1000
Example for Flattening to hit level
There's also the possibility to use fields in arrays for grouping if you cross join arrays with their parent row
SELECT
h.type,
COUNT(1) hits
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170801` AS t CROSS JOIN t.hits AS h
WHERE
totals.visits=1
GROUP BY
1
Regarding the relation between visitId and Sessions you can read this answer.
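The metrics named in the question (Sessions, Events Per Session) can be derived from the session-level pattern above by aggregating once more. A minimal sketch against the same public sample table, assuming a session is a row with totals.visits = 1 and an event is a hit with type 'EVENT':

```sql
-- Sketch only: one row per session first, then totals across sessions.
WITH per_session AS (
  SELECT
    fullVisitorId,
    visitStartTime,
    (SELECT COUNTIF(type = 'EVENT') FROM UNNEST(hits)) AS events
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
  WHERE totals.visits = 1
)
SELECT
  COUNT(*) AS sessions,
  SUM(events) AS total_events,
  SAFE_DIVIDE(SUM(events), COUNT(*)) AS events_per_session
FROM per_session;
```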