Trying to determine screen views by region - google-bigquery

I'm new to BigQuery and have limited experience with SQL, but have been making a few queries successfully.
One complicated query I'm a bit stuck on breaks down the number of screen views by the user's region.
My SQL query looks like this:
SELECT
geo.region, COUNT(params.value.string_value) as count
FROM
`xxx`,
UNNEST(event_params) as params
WHERE
geo.country = "Australia" AND geo.region > "" AND event_name = "screen_view" AND params.key = "firebase_screen"
GROUP BY
geo.region
ORDER BY
count DESC
I get output that is significantly lower than what the Firebase console reports for total screen views in Australia:
Row | region | count
1 | Victoria | 25613
2 | South Australia | 3557
...
Is there something wrong with my query?
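As written, the query only counts screen_view events that have a non-empty geo.region and carry a firebase_screen parameter, so any screen_view event missing either one drops out of the totals. That may explain part of the gap, although the console may also count differently. One way to check is to run the count with and without those filters. The sketch below does that in SQLite on a small hypothetical flattened layout (the events and event_params tables and their rows are made up for illustration), with a join standing in for BigQuery's UNNEST(event_params).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened stand-in for the Firebase export.
    CREATE TABLE events (event_id INTEGER, country TEXT, region TEXT, event_name TEXT);
    CREATE TABLE event_params (event_id INTEGER, key TEXT, string_value TEXT);
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "Australia", "Victoria", "screen_view"),
    (2, "Australia", "Victoria", "screen_view"),         # no firebase_screen param
    (3, "Australia", "", "screen_view"),                 # region unknown
    (4, "Australia", "South Australia", "screen_view"),
])
conn.executemany("INSERT INTO event_params VALUES (?, ?, ?)", [
    (1, "firebase_screen", "Home"),
    (3, "firebase_screen", "Home"),
    (4, "firebase_screen", "Settings"),
    (4, "firebase_screen_class", "SettingsActivity"),
])

# Every screen_view event in Australia, with no parameter or region filter.
all_views = conn.execute("""
    SELECT COUNT(*) FROM events
    WHERE country = 'Australia' AND event_name = 'screen_view'
""").fetchone()[0]

# Mirrors the question's filters: the join plus p.key = 'firebase_screen'
# plays the role of UNNEST(event_params) with params.key = "firebase_screen".
filtered = conn.execute("""
    SELECT COUNT(p.string_value) FROM events AS e
    JOIN event_params AS p ON p.event_id = e.event_id
    WHERE e.country = 'Australia' AND e.event_name = 'screen_view'
      AND e.region > '' AND p.key = 'firebase_screen'
""").fetchone()[0]

print(all_views, filtered)
```

Here events 2 and 3 are dropped by the parameter and region filters respectively. The same pair of counts against the real export would show how many events each filter removes.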

Related

How can I reduce Google BigQuery costs?

I have been using Google BigQuery to search the GDELT database of global news. I am repeating the same search 54 times, changing only the name of the African country.
Is it possible to include all 54 searches in the same query? As I understand the billing, the cost is based on the size of the database searched, not the number of query elements. Is that correct?
Here is an example of my queries for the country of Gabon, selecting themes appearing with ICT.
SELECT theme, COUNT(*) as count
FROM (
select UNIQUE(REGEXP_REPLACE(SPLIT(V2locations,';'), r',.*', '')) theme
from [gdelt-bq:gdeltv2.gkg]
where DATE>20150302000000 and DATE < 20200609000000 and V2locations like '%Gabon%'
AND V2themes like '%WB_133_INFORMATION_AND_COMMUNICATION_TECHNOLOGIES%'
)
group by theme
ORDER BY 2 DESC
LIMIT 300
The simplest way to do so without changing your query logic is to replace
V2locations like '%Gabon%'
with
REGEXP_MATCH(V2locations, r'Gabon|Angola|Zimbabwe')
Note: the query in question is written in BigQuery Legacy SQL, so I would recommend migrating to Standard SQL, where the equivalent condition is REGEXP_CONTAINS(V2locations, r'Gabon|Angola|Zimbabwe').
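The idea behind the answer is that a single regular expression with alternation (Gabon|Angola|Zimbabwe) selects the same rows as running a separate LIKE search per country, but in one pass over the table. Below is a minimal sketch in SQLite, which provides a REGEXP operator but leaves its implementation to a user-defined regexp() function. The gkg table and its location strings are hypothetical stand-ins, not real GDELT records.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as a call to the user function regexp(Y, X),
# so the function receives (pattern, value).
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

conn.execute("CREATE TABLE gkg (id INTEGER, V2locations TEXT)")
conn.executemany("INSERT INTO gkg VALUES (?, ?)", [
    (1, "Libreville, Gabon"),   # made-up location strings
    (2, "Luanda, Angola"),
    (3, "Harare, Zimbabwe"),
    (4, "Nairobi, Kenya"),
])
countries = ["Gabon", "Angola", "Zimbabwe"]

# One query per country, as in the question (one pass per country).
per_country = set()
for country in countries:
    per_country |= {row[0] for row in conn.execute(
        "SELECT id FROM gkg WHERE V2locations LIKE ?", (f"%{country}%",))}

# One query for all countries, using alternation in a single pattern.
pattern = "|".join(re.escape(c) for c in countries)
combined = {row[0] for row in conn.execute(
    "SELECT id FROM gkg WHERE V2locations REGEXP ?", (pattern,))}

print(pattern, sorted(combined))
```

Note that the combined query returns one merged set of themes; if results are needed per country, the query would also have to record which country matched.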

How to get Firebase console event details such as first_open, app_remove and Registration_Success using BigQuery for the last two weeks?

I'm creating a visualization of the app download count, app remove count and user registration count from Firebase console data for the last two weeks. The console gives us the total count for the selected period, but we need a date-wise count for each. For that, we plan to get the counts using BigQuery. How do we get all the metrics with a single query?
We can get all the metrics with a single query, as below:
SELECT event_date, COUNT(*) AS event_count, platform, event_name
FROM `apple-XYZ.analytics_XXXXXX.events_*`
WHERE event_name IN ("app_remove", "first_open", "Registration_Success")
  AND event_date BETWEEN "20200419" AND "20200502"
  AND stream_id IN ("XYZ", "ZYX")
  AND platform IN ("ANDROID", "IOS")
GROUP BY platform, event_date, event_name
ORDER BY event_date;
Result: for two weeks (from 19-04-2020 to 02-05-2020)
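If the visualization needs one row per date rather than one row per event name, the same filters can be combined with conditional aggregation so that each metric becomes its own column. Here is a sketch in SQLite on a hypothetical events table with made-up rows. In BigQuery, COUNTIF(event_name = 'first_open') is a shorter way to write each SUM(CASE ...) column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the columns the query needs.
conn.execute("CREATE TABLE events (event_date TEXT, platform TEXT, event_name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("20200419", "ANDROID", "first_open"),
    ("20200419", "ANDROID", "first_open"),
    ("20200419", "ANDROID", "app_remove"),
    ("20200419", "IOS", "Registration_Success"),
    ("20200420", "ANDROID", "first_open"),
    ("20200420", "ANDROID", "session_start"),   # not one of the three metrics
])

query = """
    SELECT event_date, platform,
           -- each CASE turns one event type into its own count column
           SUM(CASE WHEN event_name = 'first_open' THEN 1 ELSE 0 END) AS downloads,
           SUM(CASE WHEN event_name = 'app_remove' THEN 1 ELSE 0 END) AS removals,
           SUM(CASE WHEN event_name = 'Registration_Success' THEN 1 ELSE 0 END) AS registrations
    FROM events
    WHERE event_name IN ('first_open', 'app_remove', 'Registration_Success')
      AND event_date BETWEEN '20200419' AND '20200502'
    GROUP BY event_date, platform
    ORDER BY event_date, platform
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```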

BigQuery Firebase Average Coins Per Level In The Game

I developed a word game (using Firebase as my backend) with levels and coins.
Now I'm having some difficulty querying my DB so that it outputs a table with all levels in the game and the average user coins for each level. For example:
Level | Avg User Coins
0 | 50
1 | 12
2 | 2
Attached is a picture of my events table:
As you can see, there is a 'level_end' event, and with it the 'user_coins' and 'level_num' parameters. What is the right way to do this?
This is what I've managed so far, obviously the wrong way:
SELECT event_name,user_id
FROM `words-game-en.analytics_208527783.events_20191004`,
UNNEST(event_params) as event_param
WHERE event_name = "level_end"
AND event_param.key = "user_coins"
You seem to want something like this:
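SELECT level_num, AVG(user_coins) AS avg_user_coins
FROM (
  SELECT
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'level_num') AS level_num,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'user_coins') AS user_coins
  FROM `words-game-en.analytics_208527783.events_20191004`
  WHERE event_name = 'level_end'
)
GROUP BY level_num
ORDER BY level_num;
event_params is a repeated key/value record rather than a set of columns, so level_num and user_coins are not fields of event_param. Each parameter has to be pulled out of UNNEST(event_params) by its key, which the two scalar subqueries do for every level_end event. This assumes both parameters are logged as integers; if either is stored as a decimal, read value.double_value instead of value.int_value.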
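One way to get both parameters onto the same event row is a separate scalar subquery per key. This SQLite sketch uses a hypothetical flattened layout: an events table plus an event_params table with one row per event and key, loosely modeled on the Firebase export. It pulls level_num and user_coins onto each level_end event with correlated scalar subqueries and then averages. The sample rows are invented so that the output matches the first two rows of the example table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened layout: one row per event, plus one row
    -- per (event, parameter key) pair.
    CREATE TABLE events (event_id INTEGER, event_name TEXT);
    CREATE TABLE event_params (event_id INTEGER, key TEXT, int_value INTEGER);
""")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "level_end"), (2, "level_end"), (3, "level_end"), (4, "screen_view"),
])
conn.executemany("INSERT INTO event_params VALUES (?, ?, ?)", [
    (1, "level_num", 0), (1, "user_coins", 40),
    (2, "level_num", 0), (2, "user_coins", 60),
    (3, "level_num", 1), (3, "user_coins", 12),
    (4, "level_num", 1), (4, "user_coins", 999),   # not a level_end event
])

query = """
    SELECT level_num, AVG(user_coins) AS avg_user_coins
    FROM (
        -- one scalar subquery per key puts both values on the event's row
        SELECT
          (SELECT p.int_value FROM event_params AS p
           WHERE p.event_id = e.event_id AND p.key = 'level_num') AS level_num,
          (SELECT p.int_value FROM event_params AS p
           WHERE p.event_id = e.event_id AND p.key = 'user_coins') AS user_coins
        FROM events AS e
        WHERE e.event_name = 'level_end'
    )
    GROUP BY level_num
    ORDER BY level_num
"""
rows = conn.execute(query).fetchall()
print(rows)
```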

SQL Time Series Homework

Imagine you have these two tables.
a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on Twitch. The columns of the table are:
username: Channel username
timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
game: Name of the game that the user was playing at that time
viewers: Number of concurrent viewers that the user had at that time
followers: Number of total followers that the channel had at that time
b) games_metadata: it contains information on all the games that have ever been broadcast on Twitch.
The columns of the table are:
game: Name of the game
release_date: Timestamp, in seconds, corresponding to the date when the game was released
publisher: Publisher of the game
genre: Genre of the game
Now I want the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.
The problem is that I don't have the actual database, so I created one and entered some values by hand.
I came up with this query, but I'm not sure it does what I want. It may be right (I don't feel like it is), but I'd like a second opinion:
SELECT publisher,
(cast(strftime('%m', "timestamp") as integer) + 2) / 3 as quarter,
COUNT((strftime('%M',`timestamp`)/(60*1.0)) * viewers) as total_hours_watch
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE quarter = 3
GROUP BY publisher,quarter
ORDER BY total_hours_watch DESC
Looks about right to me. You don't need to include quarter in the GROUP BY, since the WHERE clause limits you to a single quarter. You can restrict the output to the top 10 publishers in a couple of ways, depending on the database engine you're using:
For SQL Server / MS Access, modify your SELECT statement: SELECT TOP 10 publisher, ...
For MySQL (and SQLite), add a LIMIT clause at the end of your query: ... LIMIT 10;
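Here is a sketch of the calculation in SQLite, since the strftime calls suggest that is the engine in use. Three details of the query are worth checking against the goal. The first quarter is quarter 1, not 3. SQLite's date functions only treat a number as epoch seconds when given the 'unixepoch' modifier. And COUNT(...) counts rows rather than adding up viewers. The sketch assumes each streamers row is one minute of viewing, so hours watched is SUM(viewers) / 60, and that timestamps are in UTC. The sample rows and publisher names are made up.

```python
import sqlite3

def top_publishers_q1_2019(conn, limit=10):
    """Top publishers by hours watched in Q1 2019 (UTC).

    Assumes one streamers row per channel per minute, so
    viewer-minutes = SUM(viewers) and hours = SUM(viewers) / 60.
    """
    query = """
        SELECT g.publisher,
               SUM(s.viewers) / 60.0 AS hours_watched
        FROM streamers AS s
        JOIN games_metadata AS g ON s.game = g.game
        -- 'unixepoch' makes SQLite read the integer as epoch seconds
        WHERE datetime(s."timestamp", 'unixepoch') >= '2019-01-01'
          AND datetime(s."timestamp", 'unixepoch') <  '2019-04-01'
        GROUP BY g.publisher
        ORDER BY hours_watched DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE streamers (username TEXT, "timestamp" INTEGER, game TEXT,
                            viewers INTEGER, followers INTEGER);
    CREATE TABLE games_metadata (game TEXT, release_date INTEGER,
                                 publisher TEXT, genre TEXT);
""")
conn.executemany("INSERT INTO games_metadata VALUES (?, ?, ?, ?)", [
    ("g1", 0, "Publisher A", "RPG"),
    ("g2", 0, "Publisher B", "FPS"),
])
conn.executemany("INSERT INTO streamers VALUES (?, ?, ?, ?, ?)", [
    ("u1", 1546300800, "g1", 120, 10),   # 2019-01-01 00:00 UTC
    ("u1", 1546300860, "g1", 60, 10),    # 2019-01-01 00:01 UTC
    ("u2", 1546300800, "g2", 30, 5),     # 2019-01-01 00:00 UTC
    ("u2", 1554076800, "g2", 6000, 5),   # 2019-04-01 00:00 UTC, outside Q1
    ("u1", 1546300740, "g1", 600, 10),   # 2018-12-31 23:59 UTC, outside Q1
])

rows = top_publishers_q1_2019(conn)
print(rows)
```

Filtering on an explicit date range also pins the year to 2019, which a quarter number on its own would not.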

How can I get Access SQL to return a dataset of the largest value in each category?

This has been driving me crazy all day, and I've gone through every solution I can find on here. This should be a very simple thing.
I have a table in Access that contains a list of applications:
ApplicantNumber | Region
There are many more columns, but those are the two I care about at the moment. Each row is a separate application, and each applicant can submit multiple applications.
I have a query in Access that finds the count per applicant of applications in each region:
ApplicantNumber | Region | CountOfApplications
How the ##&*!!! do I pull out of that the region with the most applications for each ApplicantNumber?
As far as I can tell, the following should work fine but it just provides the same output as the initial query with the full count per applicant:
SELECT myQry.ApplicantNumber, myQRY.Region, Max(myQRY.CountOfRegion)
FROM (SELECT AppliedCensusBlocks.ApplicantNumber, AppliedCensusBlocks.Region, Count(AppliedCensusBlocks.Region) AS CountOfRegion
FROM AppliedCensusBlocks
GROUP BY AppliedCensusBlocks.ApplicantNumber, AppliedCensusBlocks.Region) AS myQRY
GROUP BY myQry.ApplicantNumber, myQry.Region
What am I doing wrong? If I remove the Region field, Access works as I'd expect and just shows the ApplicantNumber and maximum count. But I'm really trying to get the region name associated with the maximum count.
This is a bit tricky, and MS Access is not well suited to this sort of query. But here is one way:
SELECT acb.ApplicantNumber, acb.Region, Count(*) AS CountOfRegion
FROM AppliedCensusBlocks as acb
GROUP BY acb.ApplicantNumber, acb.Region
HAVING COUNT(*) = (SELECT TOP 1 COUNT(*)
FROM AppliedCensusBlocks as acb2
WHERE acb2.ApplicantNumber = acb.ApplicantNumber
GROUP BY acb2.Region
ORDER BY COUNT(*) DESC, acb2.Region
);
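The correlated HAVING approach can be tried outside Access. This SQLite sketch uses the same AppliedCensusBlocks table and column names with made-up rows. SQLite has no TOP, so LIMIT 1 stands in for TOP 1. Because LIMIT always returns exactly one row, the Region tiebreaker in the inner ORDER BY is not needed here. If two regions tie for an applicant's highest count, both rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AppliedCensusBlocks (ApplicantNumber INTEGER, Region TEXT)")
conn.executemany("INSERT INTO AppliedCensusBlocks VALUES (?, ?)", [
    (1, "North"), (1, "North"), (1, "South"),
    (2, "East"), (2, "West"), (2, "West"), (2, "West"),
])

query = """
    SELECT acb.ApplicantNumber, acb.Region, COUNT(*) AS CountOfRegion
    FROM AppliedCensusBlocks AS acb
    GROUP BY acb.ApplicantNumber, acb.Region
    -- keep only the group whose count equals this applicant's highest count
    HAVING COUNT(*) = (
        SELECT COUNT(*)
        FROM AppliedCensusBlocks AS acb2
        WHERE acb2.ApplicantNumber = acb.ApplicantNumber
        GROUP BY acb2.Region
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )
    ORDER BY acb.ApplicantNumber
"""
rows = conn.execute(query).fetchall()
print(rows)
```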
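If you only need the ApplicantNumber/Region combination with the most applications overall (rather than one per applicant), a simpler query is:
SELECT TOP 1 ApplicantNumber, Region, COUNT(*) AS Applications
FROM AppliedCensusBlocks
GROUP BY ApplicantNumber, Region
ORDER BY COUNT(*) DESC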