Results in BigQuery do not match GA4 - google-bigquery

I'm running the query below in BigQuery to see how many users I had from August 1st to August 14th, but the number does not match what GA4 shows me.
WITH event AS (
  SELECT
    user_id,
    event_name,
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY TIMESTAMP_MICROS(event_timestamp) DESC) AS rn
  FROM
    `events_*`
  WHERE
    event_name = 'push_received')
SELECT COUNT(DISTINCT user_id)
FROM
  event
WHERE
  event_date >= '2022-08-01'
GA4 result
BigQuery result = 37024

There are quite a few reasons why the GA4 numbers in the web UI will not match the BigQuery export and the Data API.
In this case, I believe you are running into the time zone issue. event_date is the date the event was logged, in the registered time zone of your property. event_timestamp, however, is the time (in microseconds, UTC) at which the client logged the event.
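To resolve this, update your query to work with event_timestamp converted to your property's time zone, replacing the placeholder with your property's time zone name:
EXTRACT(DATETIME FROM TIMESTAMP_MICROS(event_timestamp) AT TIME ZONE 'TIMEZONE OF YOUR PROPERTY')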
Your data should then match the WebUI and the GA4 Data API. This post that I co-authored goes into more detail on this and other reasons why your data doesn't match: https://analyticscanvas.com/3-reasons-your-ga4-data-doesnt-match/
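A minimal Python sketch of the same conversion can help when spot-checking individual rows. It turns an event_timestamp (microseconds since the epoch, UTC) into a calendar date in another zone, which is what TIMESTAMP_MICROS(...) AT TIME ZONE does in BigQuery. The fixed UTC-07:00 offset is an assumption standing in for your property's time zone. The example timestamp is the user_first_touch_timestamp value quoted in a related question further down this page.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: a fixed UTC-07:00 offset stands in for the property's time zone.
# A real IANA zone (e.g. zoneinfo.ZoneInfo("America/Los_Angeles")) also handles DST.
PROPERTY_TZ = timezone(timedelta(hours=-7))


def event_datetime_utc(event_timestamp: int) -> datetime:
    """Like TIMESTAMP_MICROS(event_timestamp); integer math, so no float rounding."""
    return EPOCH + timedelta(microseconds=event_timestamp)


def event_date_in_zone(event_timestamp: int, tz: timezone = PROPERTY_TZ) -> str:
    """Calendar date of the event in `tz`, formatted like event_date (YYYYMMDD)."""
    return event_datetime_utc(event_timestamp).astimezone(tz).strftime("%Y%m%d")


ts = 1595912758378962  # 2020-07-28 05:05:58.378962 UTC
print(event_datetime_utc(ts).strftime("%Y%m%d"))  # 20200728 (UTC)
print(event_date_in_zone(ts))                      # 20200727 (UTC-07:00)
```

An event logged in the first hours of a UTC day lands on the previous local day in a zone behind UTC. Rows like that are exactly the ones that move between daily totals.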

You cannot simply compare totals. Break the comparison down by day and look at the details.
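Once both sides are exported as per-day counts, a small Python sketch can list only the days that differ. The input dictionaries and their numbers are hypothetical. A day missing on one side counts as 0, so dates reported by only one source (for example, dates outside the intended range) also show up.

```python
from __future__ import annotations

from datetime import date


def daily_differences(ga4: dict[date, int], bq: dict[date, int]) -> list[tuple[date, int, int, int]]:
    """Return (day, ga4_count, bq_count, ga4_count - bq_count) for each day that differs.

    A day missing on one side is treated as 0 there, so days that only one
    source reports (e.g. dates outside the intended range) surface too.
    """
    rows = []
    for day in sorted(set(ga4) | set(bq)):
        g, b = ga4.get(day, 0), bq.get(day, 0)
        if g != b:
            rows.append((day, g, b, g - b))
    return rows


# Hypothetical daily user counts, for illustration only.
ga4_daily = {date(2022, 8, 1): 2510, date(2022, 8, 2): 2480}
bq_daily = {date(2022, 8, 1): 2510, date(2022, 8, 2): 2395, date(2022, 8, 15): 130}

for row in daily_differences(ga4_daily, bq_daily):
    print(row)
```

Keep the per-day numbers separate. Daily distinct-user counts cannot be summed into a period total, because a user active on several days is counted once per day.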

Related

GA4 vs BigQuery - User counts don't match

I have extracted from BigQuery the active_users and totalusers on 31/12/2022, grouped by CampaignName and Country, using the following query:
SELECT
  COUNT(DISTINCT CASE
    WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0
      OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
    THEN user_pseudo_id
    ELSE NULL
  END) AS active_users,
  COUNT(DISTINCT user_pseudo_id) AS totalusers,
  traffic_source.name AS CampaignName,
  geo.country AS Country
FROM `independent-tea-354108.analytics_254831690.events_20221231`
GROUP BY
  traffic_source.name,
  geo.country
The result filtered by CampaignName='(organic)' was:
(https://i.stack.imgur.com/LMQAH.png)
But when I compare it with the GA4 data, it doesn't match, and the difference is huge (around 15,000 more active users in GA4 than in BigQuery). Note that this is for a single day; over a month the difference would be even larger:
(https://i.stack.imgur.com/8arYs.png)
I've tried filtering by other campaign names; not a single value matches, and the differences are always huge.
Here are two common reasons for differences between GA4 and BigQuery; you have probably looked at them already.
Check your source table for blank user_pseudo_id values. If you use consent mode on your website, those users may be counted in GA4 but not in BigQuery, and this can cause big differences.
Time zone is another area that can make a difference: event_timestamp in BigQuery is always in UTC, while your GA4 property may use a different time zone.
I hope these help
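To gauge how much blank IDs matter, compare the distinct-user count with the number of events that have no user_pseudo_id. In BigQuery that could be COUNTIF(user_pseudo_id IS NULL) placed next to COUNT(DISTINCT user_pseudo_id). The Python sketch below runs the same audit on in-memory rows. The dict-per-event row shape is an assumption for illustration, and only NULL (None) counts as blank, not empty strings.

```python
from __future__ import annotations


def user_id_audit(rows: list[dict]) -> tuple[int, int]:
    """Return (distinct user_pseudo_id values, rows whose user_pseudo_id is missing).

    Like COUNT(DISTINCT user_pseudo_id), the first number ignores NULL (None)
    IDs, so events from those rows add nothing to the user count.
    """
    distinct_ids = {r["user_pseudo_id"] for r in rows if r.get("user_pseudo_id") is not None}
    missing = sum(1 for r in rows if r.get("user_pseudo_id") is None)
    return len(distinct_ids), missing


# Hypothetical exported rows; a missing key or None models a NULL user_pseudo_id.
rows = [
    {"user_pseudo_id": "a"},
    {"user_pseudo_id": "a"},
    {"user_pseudo_id": "b"},
    {"user_pseudo_id": None},
    {},
]
print(user_id_audit(rows))  # (2, 2)
```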

How to calculate "Daily user engagement" in BigQuery and get the same result that Firebase shows in its dashboard?

I'm trying to calculate the amount of time the average user spent in my app on a particular day.
I just read this great post from Todd Kerpelman on the Firebase Blog that explains how to do it. With a small modification to his query, I arrived at the solution below:
SELECT AVG(user_daily_engagement_time / 1000 / 60)
FROM (
  SELECT user_pseudo_id, event_date, SUM(engagement_time) AS user_daily_engagement_time
  FROM (
    SELECT
      user_pseudo_id,
      event_date,
      (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "engagement_time_msec") AS engagement_time
    FROM `mydataset.events_20201104`
  )
  WHERE engagement_time > 0
  GROUP BY 1, 2
)
The problem is that when I compare it with Firebase Analytics for the same period, the BigQuery result is more than 2 minutes higher than the Firebase dashboard.
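To reason about where the gap might come from, it helps to spell out what the query averages. The Python sketch below reproduces its three steps on in-memory tuples; the tuple layout and the events are hypothetical. User-days with no positive engagement_time_msec never reach the average, so the denominator contains engaged user-days only. Whether the Firebase dashboard uses the same denominator is not something this query can confirm.

```python
from __future__ import annotations

from collections import defaultdict


def avg_daily_engagement_minutes(events: list[tuple[str, str, int | None]]) -> float | None:
    """events: (user_pseudo_id, event_date, engagement_time_msec or None).

    Mirrors the query step by step and returns None when nothing is left,
    as SQL AVG returns NULL over zero rows.
    """
    per_user_day: dict[tuple[str, str], int] = defaultdict(int)
    for user, day, msec in events:
        # WHERE engagement_time > 0 (NULLs are dropped as well)
        if msec is not None and msec > 0:
            # SUM(engagement_time) ... GROUP BY user_pseudo_id, event_date
            per_user_day[(user, day)] += msec
    if not per_user_day:
        return None
    # AVG(user_daily_engagement_time / 1000 / 60)
    return sum(ms / 1000 / 60 for ms in per_user_day.values()) / len(per_user_day)


# Hypothetical events: u2 has no positive engagement, so it is not in the average.
events = [
    ("u1", "20201104", 60000),
    ("u1", "20201104", 120000),
    ("u2", "20201104", 0),
    ("u3", "20201104", None),
    ("u3", "20201104", 300000),
]
print(avg_daily_engagement_minutes(events))  # 4.0
```

If u2 were kept in the denominator, the average would drop to 8/3 (about 2.67) minutes. Which users are averaged over matters as much as how their time is summed.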

Google Analytics to BigQuery data: what is the SQL for a custom dimension with transactions?

How can I see the data above in BigQuery? The tables have been there for a year.
What code should I use to see the above result?
User subscription status is a session-scoped dimension, and I want to see the transactions for each of its values.
I have enabled the BigQuery export, but how can I see exactly the same results in BigQuery?
Try the code below. Change the table name and date range to match your request.
#standardSQL
SELECT
  date,
  SUM(totals.visits) AS visits,
  SUM(totals.pageviews) AS pageviews,
  SUM(totals.transactions) AS transactions,
  SUM(totals.transactionRevenue) / 1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY date
ORDER BY date ASC
These documents could be useful before posting questions:
https://support.google.com/analytics/answer/4419694?hl=tr
https://support.google.com/analytics/answer/3437719?hl=tr
For session-scoped custom dimensions, write a subquery that runs on the unnested customDimensions array.
#standardSQL
SELECT
  date,
  -- select one value from the unnested array
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 4) AS cd4,
  SUM(totals.transactions) AS transactions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20160802'
GROUP BY
  date, cd4
ORDER BY
  date ASC
You need to change the index in the subquery's condition (index = 4) to your custom dimension's index.
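The scalar subquery picks one element of the repeated customDimensions field per session. A Python sketch of the same grouping makes the mechanics visible. The session dicts, loosely shaped like the ga_sessions rows, and the "subscriber"/"free" values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def transactions_by_cd(sessions: list[dict], index: int = 4) -> dict[tuple[str, str | None], int]:
    """Sum transactions per (date, custom dimension value) for one dimension index."""
    totals: dict[tuple[str, str | None], int] = defaultdict(int)
    for s in sessions:
        # (SELECT value FROM UNNEST(customDimensions) WHERE index = 4):
        # None when no element matches. BigQuery errors if several match;
        # next() simply takes the first.
        cd = next((d["value"] for d in s.get("customDimensions", []) if d["index"] == index), None)
        # SUM skips NULL transactions; here they count as 0 (SQL would give
        # NULL for a group whose values are all NULL).
        totals[(s["date"], cd)] += s.get("totals", {}).get("transactions") or 0
    return dict(totals)


# Hypothetical sessions, for illustration only.
sessions = [
    {"date": "20160801", "customDimensions": [{"index": 4, "value": "subscriber"}], "totals": {"transactions": 1}},
    {"date": "20160801", "customDimensions": [{"index": 4, "value": "subscriber"}], "totals": {"transactions": 2}},
    {"date": "20160801", "customDimensions": [], "totals": {"transactions": None}},
    {"date": "20160802", "customDimensions": [{"index": 1, "value": "x"}, {"index": 4, "value": "free"}], "totals": {}},
]
print(transactions_by_cd(sessions))
```

Sessions without the chosen index end up under a None key, just as they would fall into a NULL cd4 group in the SQL.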

BigQuery - COUNTIF where a Unix timestamp matches event_date in YYYYMMDD format?

I'm working with Google Analytics App plus Web data in BigQuery.
I want to count a user as new_user when the user_first_touch_timestamp value matches the table's event_date value. This would give a count of new users who visited the site on a particular day.
Example value in user_first_touch_timestamp
1595912758378962
Example value in event_date
20200809
How can I do this?
Thanks.
Below is for BigQuery Standard SQL
You should parse both values to the same DATE type, as below:
PARSE_DATE('%Y%m%d', event_date) AS event_date_day
and
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_timestamp_day
Once this is done, you can do whatever comparison you need.
For example, if you want to use it in a WHERE clause, it can look like this:
WHERE PARSE_DATE('%Y%m%d', event_date) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
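Here is a Python sketch of the same comparison that counts distinct users rather than events. In SQL that could be COUNT(DISTINCT IF(<condition>, user_pseudo_id, NULL)); a plain COUNTIF would count matching events. The rows and user IDs are hypothetical, built from the example values in the question. One caveat carries over: DATE() without a time zone argument uses UTC, while event_date is in the property's time zone. Passing the property's zone as DATE's second argument keeps both sides consistent.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def event_day(event_date: str) -> date:
    # PARSE_DATE('%Y%m%d', event_date)
    return datetime.strptime(event_date, "%Y%m%d").date()


def first_touch_day(user_first_touch_timestamp: int) -> date:
    # DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)), i.e. the UTC date
    return (EPOCH + timedelta(microseconds=user_first_touch_timestamp)).date()


def count_new_users(rows: list[tuple[str, str, int]]) -> int:
    """Distinct users whose first-touch day equals the event's day.

    rows: (user_pseudo_id, event_date, user_first_touch_timestamp).
    """
    return len({user for user, ed, ft in rows if event_day(ed) == first_touch_day(ft)})


# Hypothetical rows built from the example values in the question.
rows = [
    ("u1", "20200728", 1595912758378962),
    ("u1", "20200728", 1595912758378962),  # second event, same user
    ("u2", "20200809", 1595912758378962),
]
print(count_new_users(rows))  # 1
```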

Obtain the latest record for a given second in Postgres

I have data with millisecond-precision timestamps. I want to keep only the most recent timestamp within each second. I.e., the records (2020-07-13 5:05:38.009, event1) and (2020-07-13 5:05:38.012, event2) should retrieve only the latter.
I've tried the following:
SELECT
  timestamp AS time, event AS value, event_type AS metric
FROM
  table
GROUP BY
  date_trunc('second', time)
But then Postgres asks me to group by event as well, and when I do, I see all the data (as if there were no GROUP BY).
In Postgres, you can use DISTINCT ON:
select distinct on (date_trunc('second', time)) t.*
from t
order by date_trunc('second', time), time desc;
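DISTINCT ON keeps the first row of each group in ORDER BY order, so sorting by timestamp descending within each truncated second keeps the latest record. The Python sketch below shows the same "latest per second" selection. The first two records come from the question, and the third is a hypothetical extra row. On exact ties it keeps the first record seen, whereas Postgres makes no guarantee which tied row it returns.

```python
from __future__ import annotations

from datetime import datetime


def latest_per_second(records: list[tuple[datetime, str]]) -> list[tuple[datetime, str]]:
    """For each whole second, keep the record with the greatest timestamp.

    Same idea as DISTINCT ON (date_trunc('second', time)) with
    ORDER BY date_trunc('second', time), time DESC.
    """
    latest: dict[datetime, tuple[datetime, str]] = {}
    for ts, event in records:
        second = ts.replace(microsecond=0)  # date_trunc('second', ts)
        if second not in latest or ts > latest[second][0]:
            latest[second] = (ts, event)
    return [latest[s] for s in sorted(latest)]


records = [
    (datetime(2020, 7, 13, 5, 5, 38, 9000), "event1"),   # 05:05:38.009
    (datetime(2020, 7, 13, 5, 5, 38, 12000), "event2"),  # 05:05:38.012
    (datetime(2020, 7, 13, 5, 5, 39, 1000), "event3"),   # hypothetical extra row
]
for ts, event in latest_per_second(records):
    print(ts.isoformat(timespec="milliseconds"), event)
```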