BigQuery sessions count not matching Google Analytics

What's the best way to calculate a sessions count that comes as close as possible to what's in Google Analytics? (I'm still using GA360.)
This is my current query for page-level session figures:
SELECT
  date,
  COUNT(DISTINCT IF(totals.visits = 1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL)) AS Sessions,
  h.page.pageTitle,
  h.page.hostname,
  channelGrouping AS default_channel_grouping,
  CONCAT("https://", h.page.hostname, h.page.pagePathLevel1,
         REGEXP_REPLACE(h.page.pagePathLevel2, '^.', ''),
         REGEXP_REPLACE(h.page.pagePathLevel3, '^.', '')) AS page_path
FROM
  `dive-ga.projectnumber.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _table_suffix BETWEEN '20210901'
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
  date, h.page.pageTitle, h.page.hostname, channelGrouping,
  CONCAT("https://", h.page.hostname, h.page.pagePathLevel1,
         REGEXP_REPLACE(h.page.pagePathLevel2, '^.', ''),
         REGEXP_REPLACE(h.page.pagePathLevel3, '^.', ''))
I understand that Google Analytics only shows sessions in which an event was fired, and I've tried all the different session calculations mentioned in the top response in this post.
But the session figures are still so far off that year-over-year comparisons go in opposite directions: BigQuery shows an increase while GA shows a decline.
Anything I should be changing in this query?
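For what it's worth, one thing that makes page-level figures hard to reconcile: once hits are unnested and grouped by page, each session is counted once for every distinct page it contains, so the page-level numbers will never sum back to the property total. A minimal baseline sketch (same table path and suffix range as above) that counts sessions per day without any hit-level dimensions can at least confirm whether the top-line totals match GA before slicing by page:
-- Baseline sketch: date-level session totals only, no hit-level dimensions.
-- Uses the same table path and suffix range as the query above.
SELECT
  date,
  COUNT(DISTINCT IF(totals.visits = 1,
                    CONCAT(fullVisitorId, CAST(visitId AS STRING)),
                    NULL)) AS sessions
FROM `dive-ga.projectnumber.ga_sessions_*`
WHERE _table_suffix BETWEEN '20210901'
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date
If these daily totals already diverge from the GA UI, the gap is not coming from the page breakdown.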

Related

GA4 vs BigQuery - User Counts don't match

I have extracted active_users and totalusers for 31/12/2022 from BigQuery, grouped by CampaignName and Country, using the following query:
SELECT
  COUNT(DISTINCT CASE
    WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') > 0
      OR (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1'
    THEN user_pseudo_id
    ELSE NULL
  END) AS active_users,
  COUNT(DISTINCT user_pseudo_id) AS totalusers,
  traffic_source.name AS CampaignName,
  geo.country AS Country
FROM `independent-tea-354108.analytics_254831690.events_20221231`
GROUP BY
  traffic_source.name,
  geo.country
The result filtered by CampaignName='(organic)' was:
(https://i.stack.imgur.com/LMQAH.png)
But when I compare with the data from GA4, it doesn't match and the difference is huge (around 15000 more active_users in GA4 than in BigQuery). Please note that this is only for one day; for a month the difference would be even larger:
(https://i.stack.imgur.com/8arYs.png)
I've tried filtering by other CampaignNames; not a single value matches, and the differences are always huge.
Here are two common reasons for the GA4 vs BigQuery difference; you have probably already looked at them.
Check your source table for blank user_pseudo_id values: if you have consent mode on your website, those users may be counted in GA4 but not in BigQuery, and this can cause big differences.
Time zone is another area that can make a difference: BigQuery timestamps are always in UTC, while your GA4 property may not be.
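For example, a quick diagnostic along these lines can surface both issues (a sketch only; 'Europe/Lisbon' is just a placeholder for your property's reporting time zone):
-- Sketch: check for blank/missing user_pseudo_id values, and see how many distinct
-- local-time dates the event timestamps in this daily table actually cover.
SELECT
  COUNTIF(user_pseudo_id IS NULL OR user_pseudo_id = '') AS events_without_pseudo_id,
  COUNT(DISTINCT user_pseudo_id) AS distinct_pseudo_ids,
  COUNT(DISTINCT EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)
                         AT TIME ZONE 'Europe/Lisbon')) AS distinct_local_dates
FROM `independent-tea-354108.analytics_254831690.events_20221231`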
I hope these help.

Results within BigQuery do not match those in GA4

I'm running the query below inside BigQuery to see how many users I had from August 1st to August 14th, but the number doesn't match what GA4 shows me.
WITH event AS (
  SELECT
    user_id,
    event_name,
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY TIMESTAMP_MICROS(event_timestamp) DESC) AS rn
  FROM
    `events_*`
  WHERE
    event_name = 'push_received')
SELECT COUNT(DISTINCT user_id)
FROM
  event
WHERE
  event_date >= '2022-08-01'
GA4 result
Result BQ = 37024
There are quite a few reasons why your GA4 data in the web UI will not match the BigQuery export or the Data API.
In this case, I believe you are running into the time zone issue. event_date is the date the event was logged, in the registered time zone of your property. However, event_timestamp is the time the event was logged by the client, expressed in UTC.
To resolve this, simply update your query with:
EXTRACT(DATETIME FROM TIMESTAMP_MICROS(`event_timestamp`) AT TIME ZONE 'TIMEZONE OF YOUR PROPERTY')
Your data should then match the WebUI and the GA4 Data API. This post that I co-authored goes into more detail on this and other reasons why your data doesn't match: https://analyticscanvas.com/3-reasons-your-ga4-data-doesnt-match/
You cannot simply compare totals. Break it down into daily comparisons and look at the details.
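Putting the two suggestions together, a sketch of what that could look like for the query above ('America/Sao_Paulo' is only a placeholder; substitute your property's reporting time zone):
-- Sketch: derive the date in the property's time zone and break the user count
-- out by day, so individual days can be compared against the GA4 UI.
WITH pushes AS (
  SELECT
    user_id,
    EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)
            AT TIME ZONE 'America/Sao_Paulo') AS local_date
  FROM `events_*`  -- table reference kept as in the question
  WHERE event_name = 'push_received'
)
SELECT
  local_date,
  COUNT(DISTINCT user_id) AS users
FROM pushes
WHERE local_date >= DATE '2022-08-01'
GROUP BY local_date
ORDER BY local_date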

PSQL Filter query by time intervals

I have a query that counts the number of completed issuances from a specific network. The problem is that the DB has a lot of issuances, going back to 2019-2020, and it counts all of them, while I need the ones since last month (relative to the current time, not some fixed date), in a practical way. Examples:
This is the query that counts everything, which gives about 12k:
select count(*)
from issuances_extended
where network = 'ethereum'
and status = 'completed'
And this is the query I wrote that counts from a month ago to the current time, which gives about 100:
select count(*)
from issuances_extended
where network = 'ethereum'
and issued_at > now() - interval '1 month'
and status = 'completed'
But I have a lot to count (1, 2, 3, 4, 5 months ago, year to date) and different networks, so writing a separate query for each combination is ultimately a very inefficient way of solving this. Is there a better way? It seems like this could be done via JS transformers, but I couldn't figure it out.
Try using GROUP BY and DATE_TRUNC.
SELECT DATE_TRUNC('month', issued_at) as month, count(*) as issuances
FROM issuances_extended
WHERE network = 'ethereum'
AND status = 'completed'
GROUP BY DATE_TRUNC('month', issued_at)
How to Group by Month in PostgreSQL
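If you also need the breakdown per network, or only the most recent months, the same idea extends naturally; a sketch (the 5-month window is just an example):
-- Sketch: one row per network and month, limited to recent months.
SELECT
  network,
  DATE_TRUNC('month', issued_at) AS month,
  COUNT(*) AS issuances
FROM issuances_extended
WHERE status = 'completed'
  AND issued_at >= DATE_TRUNC('month', now()) - INTERVAL '5 months'
  -- for year to date, use: issued_at >= DATE_TRUNC('year', now())
GROUP BY network, DATE_TRUNC('month', issued_at)
ORDER BY network, month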

How to calculate "Daily user engagement" on Big Query and get the same result that Firebase shows in its dashboard?

I'm trying to calculate the amount of time the average user spent in my app on a particular day.
I just read this great post from Todd Kerpelman on the Firebase Blog that explains how to do it. With a little modification of his query I arrived at the solution below:
SELECT AVG(user_daily_engagement_time / 1000 / 60)
FROM (
  SELECT user_pseudo_id, event_date, SUM(engagement_time) AS user_daily_engagement_time
  FROM (
    SELECT user_pseudo_id, event_date,
      (SELECT value.int_value FROM UNNEST(event_params)
       WHERE key = "engagement_time_msec") AS engagement_time
    FROM `mydataset.events_20201104`
  )
  WHERE engagement_time > 0
  GROUP BY 1, 2)
The problem is that when I compare it with Firebase Analytics (for the same period) there's a difference of more than 2 minutes: the BigQuery result is more than 2 minutes higher than the Firebase dashboard.
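One way to narrow down where the gap comes from is to break the same calculation out per day and compare each day against the dashboard. A sketch extending the query above to a multi-day range (the wildcard table and suffix range are just examples):
-- Sketch: same calculation as above, but one row per day.
SELECT
  event_date,
  AVG(user_daily_engagement_time / 1000 / 60) AS avg_engagement_minutes
FROM (
  SELECT user_pseudo_id, event_date, SUM(engagement_time) AS user_daily_engagement_time
  FROM (
    SELECT user_pseudo_id, event_date,
      (SELECT value.int_value FROM UNNEST(event_params)
       WHERE key = "engagement_time_msec") AS engagement_time
    FROM `mydataset.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20201101' AND '20201107'
  )
  WHERE engagement_time > 0
  GROUP BY 1, 2
)
GROUP BY event_date
ORDER BY event_date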

Show transactions from a user who saw X page/s in their session

I am working with Google Analytics data in BigQuery.
I'd like to show a list of transaction IDs from users who visited a particular page on a website during their session. I've unnested hits.page.pagePath in order to identify a particular page, but since I don't know which row the actual transaction ID will occur on, I'm having trouble returning meaningful results.
My code looks like this, but it returns 0 results: all the transaction IDs are NULL, because they don't occur on the rows where the page path meets the hits.page.pagePath LIKE "%clear-out%" condition:
SELECT hits.transaction.transactionId AS orderid
FROM `xxx.xxx.ga_sessions_20*` AS t
CROSS JOIN UNNEST(hits) AS hits
WHERE PARSE_DATE('%y%m%d', _table_suffix) BETWEEN
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND totals.transactions > 0
  AND hits.page.pagePath LIKE "%clear-out%"
  AND hits.transaction.transactionId IS NOT NULL
How can I, for example, return the transaction IDs for all sessions in which the user viewed a page with hits.page.pagePath LIKE "%clear-out%"?
When cross joining, you're repeating the whole session row for each hit. Use the session's nested hits array to look for your page - not the cross-joined hit.
Unfortunately you're giving both the same name (hits). It's better to keep them separate - here's what it could look like:
SELECT
h.transaction.transactionId AS orderId
--,ARRAY( (SELECT AS STRUCT hitnumber, page.pagePath, transaction.transactionId FROM t.hits ) ) AS hitInfos -- test: show all hits in this session
FROM
`google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` AS t
CROSS JOIN t.hits AS h
WHERE
totals.transactions > 0 AND h.transaction.transactionId IS NOT NULL
AND
-- use the repeated hits nest (not the cross joined 'h') to check all pagePaths in the session
(SELECT LOGICAL_OR(page.pagePath LIKE "/helmets/%") FROM t.hits )
LOGICAL_OR() is an aggregate function for OR - so if any hit in the session matches the condition, it returns TRUE.
(This query uses the openly available GA data from Google. It's a bit old but good to play around with.)
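If you prefer, the same check can be written with EXISTS instead of LOGICAL_OR; a sketch against the same sample dataset:
SELECT
  h.transaction.transactionId AS orderId
FROM
  `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` AS t
CROSS JOIN t.hits AS h
WHERE
  totals.transactions > 0
  AND h.transaction.transactionId IS NOT NULL
  -- correlated subquery over the session's own hits array, as in the query above
  AND EXISTS (SELECT 1 FROM t.hits WHERE page.pagePath LIKE "/helmets/%")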