Firebase Analytics App and BigQuery User Count - google-bigquery

How are users counted in Firebase Analytics?
I ran the query below, but the GA dashboard shows a higher number (9k users and 3.5k new users) than BigQuery does (7,190):
SELECT count(distinct user_pseudo_id) as user
FROM `pn.analytics_nid.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210302' AND '20210308'
And how do I calculate new users in BigQuery?

I am not sure exactly how the Firebase console calculates the number of users.
You can count new users like below:
SELECT count(distinct user_pseudo_id) as user
FROM `pn.analytics_nid.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210302' AND '20210308'
AND event_name = 'first_open'
By the way, strictly speaking it depends on your definition of "new user", so take a look at this link for the meaning of first_open:
https://support.google.com/firebase/answer/9234069?hl=en
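If you want both numbers from a single scan, here is a minimal sketch (assuming the same table and date range as the question, and approximating "new users" as users who fired first_open within the window):
-- Total users and new users in one pass over the export.
-- Table name and date range are taken from the question above.
SELECT
  count(distinct user_pseudo_id) AS total_users,
  -- "New" here means: fired first_open within the window,
  -- per the first_open-based definition linked above.
  count(distinct IF(event_name = 'first_open', user_pseudo_id, NULL)) AS new_users
FROM `pn.analytics_nid.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210302' AND '20210308'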

Related

Why are my GA4 UI total users not adding up with the ones in the BigQuery export?

I am trying to recreate the Google Analytics 4 data in BigQuery for reporting. I'm trying to count total users using the following query:
SELECT
count(distinct user_pseudo_id) as total_users
FROM `project.dataset.events_*`
WHERE _table_suffix = "20221201"
The problem is that for any given day, I get around 10-15% FEWER total users than I see in the GA4 interface. Why could this be?
I tried Googling but it didn't make me any wiser. Has anyone had the same problem?
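One thing worth ruling out: the GA4 interface estimates user counts with HyperLogLog++ (and its headline "Users" figure is Active users rather than Total users), whereas COUNT(DISTINCT ...) in BigQuery is exact. A minimal sketch comparing the exact count against BigQuery's own HLL-based approximation, using the question's placeholder table and date:
-- Exact vs. approximate distinct counts over the same day.
-- APPROX_COUNT_DISTINCT uses HyperLogLog++, the same family of
-- estimation the GA4 UI applies to its user metrics.
SELECT
  count(distinct user_pseudo_id) AS exact_users,
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approx_users
FROM `project.dataset.events_*`
WHERE _table_suffix = "20221201"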

How to query Total Events based on Experiment Variant?

I'm looking to query some data from GA through BQ for use in A/B-test analysis.
What I'd like to pull out is how many users were placed into each variant, and what the total number of add-to-cart completions was.
The following query doesn't quite match up with what I'm seeing in GA (I know there will/can be differences), so I just want to make sure that I've gotten it completely correct.
The following query very closely matches the 'Unique Events' metric in GA, but I want to make sure that it's showing me the 'Total Events' metric:
SELECT
  exp_.experimentVariant AS variant,
  COUNT(DISTINCT fullVisitorId) AS users,
  COUNTIF(hits_.eventInfo.eventAction = "add to cart") AS add_to_cart
FROM
  `XXXXX.YYYYY.ga_sessions_*`,
  UNNEST(hits) AS hits_,
  UNNEST(hits_.experiment) AS exp_
WHERE
  exp_.experimentId = "XXXYYYZZZ"
  AND _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
GROUP BY
  variant
ORDER BY
  variant
The reason I'm not sure this is quite right is that when I use the following query, the output completely matches the 'Total Events' metric in GA:
SELECT
  COUNT(DISTINCT fullVisitorId) AS users,
  COUNTIF(hits.eventInfo.eventAction = "add to cart") AS add_to_cart
FROM
  `XXXXX.YYYYY.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
The query will return all users that had a hit with the specified experimentVariant, and all add-to-cart events where the specified variant was sent together with the hit. In that sense it looks correct.
A user segment in GA of users exposed to the experiment works differently and returns a different result: the experiment-variant users can also have performed add-to-cart events that didn't have the experiment parameter sent with them. For example, an add-to-cart event could have been sent before the user was even exposed to the experiment. If those events fall within the timeframe, they will be included whenever the user qualifies for the segment.
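If the goal is to mimic that GA segment in BigQuery instead, a rough sketch (same placeholder table, experiment ID, and dates as above; it first collects every user exposed to the experiment, then counts all of their add-to-cart hits in the window, regardless of whether the hit carried the experiment parameter):
-- Segment-style count: ALL add-to-cart hits by exposed users,
-- not just hits that carried the experiment parameter.
WITH exposed_users AS (
  SELECT DISTINCT fullVisitorId
  FROM `XXXXX.YYYYY.ga_sessions_*`,
    UNNEST(hits) AS hits_,
    UNNEST(hits_.experiment) AS exp_
  WHERE exp_.experimentId = "XXXYYYZZZ"
    AND _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
)
SELECT
  COUNT(DISTINCT s.fullVisitorId) AS users,
  COUNTIF(hits_.eventInfo.eventAction = "add to cart") AS add_to_cart
FROM `XXXXX.YYYYY.ga_sessions_*` AS s,
  UNNEST(s.hits) AS hits_
WHERE s.fullVisitorId IN (SELECT fullVisitorId FROM exposed_users)
  AND _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"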

What exactly does Peak Concurrent Users mean?

What exactly does PCU refer to?
I have data like this. Can anyone tell me, from this data, what the result for peak concurrent users would be?
And how do I calculate PCU with a SQL query on this data?
PCU means the maximum number of users logged in at the same time during a given time period.
In your data you only have two users, so it can never be more than 2. We can see that at times both users are logged in simultaneously, so PCU = 2 for this data.
Assuming you have access to Window Functions / Analytic Functions...
SELECT
  MAX(concurrent_users) AS peak_concurrent_users
FROM
(
  SELECT
    -- +1 for every login, -1 for every logout, accumulated in
    -- chronological order: the running count of concurrent users.
    SUM(CASE WHEN event = 'logout' THEN -1 ELSE 1 END)
      OVER (ORDER BY __time, user_id, event)
      AS concurrent_users
  FROM
    yourTable
  WHERE
    event IN ('login', 'logout')
)
AS running_total
This just adds up how many people have logged in or out, in time order, and keeps a running total: when someone logs in the number goes up, and when someone logs out it goes down.
Where two events happen at exactly the same time, it assumes the lower user_id went first; and if a single user has a login and a logout at exactly the same time, it assumes the login went first (because 'login' sorts before 'logout' in the ORDER BY).
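To see the running total in action, here is a self-contained sketch with made-up rows; yourTable, __time, user_id and event mirror the hypothetical names above, and the two overlapping sessions should yield a peak of 2:
-- Demo with inline sample data: two users whose sessions overlap.
WITH yourTable AS (
  SELECT TIMESTAMP '2023-01-01 09:00:00' AS __time, 1 AS user_id, 'login' AS event UNION ALL
  SELECT TIMESTAMP '2023-01-01 09:05:00', 2, 'login' UNION ALL
  SELECT TIMESTAMP '2023-01-01 09:30:00', 1, 'logout' UNION ALL
  SELECT TIMESTAMP '2023-01-01 09:45:00', 2, 'logout'
)
SELECT
  MAX(concurrent_users) AS peak_concurrent_users  -- returns 2
FROM
(
  SELECT
    SUM(CASE WHEN event = 'logout' THEN -1 ELSE 1 END)
      OVER (ORDER BY __time, user_id, event) AS concurrent_users
  FROM yourTable
  WHERE event IN ('login', 'logout')
)
AS running_total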

Daily Retention with Filter in BigQuery

I am using a query to calculate daily retention on my Firebase Analytics data exported to BigQuery. It works well and the numbers match the numbers in Firebase, but when I try to filter the query by a cohort of users, the numbers don't add up.
I want to compare the results of an A/B test from Firebase, so I've looked at the user property "firebase_exp_2", which is my A/B test, and split the users into the two groups (0/1). The retention numbers do not match (at all) the numbers I can see in my A/B test results in Firebase; in fact, they show the opposite pattern.
The query is adapted from here: https://github.com/sagishporer/big-query-queries-for-firebase/wiki/Query:-Daily-retention
All I've changed is adding the following under the "WHERE" clause:
WHERE
event_name = 'user_engagement' AND user_pseudo_id IN
(SELECT user_pseudo_id
FROM `analytics_XXX.events_*`,
UNNEST (user_properties) user_properties
WHERE user_properties.key = 'firebase_exp_2' AND user_properties.value.string_value='1')
Firebase says that there are 6,043 users in the Control group and 6,127 in the Variant A group, but my numbers are 5,632 and 5,730, and the retained users are around 1,000 users more than what Firebase reports.
What am I doing wrong?
The export to BigQuery happens on a daily basis and each imported table is named events_YYYYMMDD. Additionally, a table is imported for events received throughout the current day; this table is named events_intraday_YYYYMMDD.
The additions you made query events_*, which is fine. The example query uses events_201812*, though, which would ignore the intraday table. That would explain why your numbers are lower: you are missing users added to the A/B test during the current day.
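If you want a date-bounded query to pick up today's users as well, one hedged option (dataset name is the placeholder from the question, dates from the example) is to match the intraday suffix explicitly alongside the daily ones:
-- events_* matches both daily and intraday tables.
-- For events_20181201, _TABLE_SUFFIX is '20181201';
-- for events_intraday_20181231, it is 'intraday_20181231'.
SELECT count(distinct user_pseudo_id) AS users_including_today
FROM `analytics_XXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20181201' AND '20181231'
   OR _TABLE_SUFFIX = CONCAT('intraday_', FORMAT_DATE('%Y%m%d', CURRENT_DATE()))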

Get all the values of a specific field of a datastore for each user using BigQuery

I need to get all the login times of each user from our data store.
The fields of the data store are:
users, logInTime, LogOutTime, ...
I know I can use count(logInTime) and group by user to see how many times a user logged in to our system, but how can I get all the login times in a list?
Thank you guys!
You can use the GROUP_CONCAT function (BigQuery legacy SQL), so something like this:
select
  userid, group_concat(logInTime, '|')
from table
group by userid
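In BigQuery standard SQL, where GROUP_CONCAT does not exist, the equivalent is STRING_AGG. A sketch using the question's field names (the table path is a hypothetical placeholder):
-- One pipe-separated list of login times per user (standard SQL).
SELECT
  users,
  STRING_AGG(CAST(logInTime AS STRING), '|') AS login_times
FROM `project.dataset.your_table`  -- hypothetical table path
GROUP BY users
If you literally want a list (an array) per user rather than a delimited string, ARRAY_AGG(logInTime) does that directly.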