Bigquery Active User count not accurate (Google Analytics) - sql

I have Google Analytics integrated to Bigquery and I'm trying to write a query to fetch Active Users that should match with the number on GA Portal.
Here's the query I've written;
SELECT
date(date) as date,
EXACT_COUNT_DISTINCT(fullVisitorId) as daily_active_users,
FROM TABLE_DATE_RANGE([<project_id>:<dataset>.ga_sessions_],
TIMESTAMP('2018-01-01'),
TIMESTAMP(CURRENT_DATE()))
group by date
order by date desc
The numbers I get in response are somehow related to the ones Google Analytics shows me, but they aren't a 100% accurate.
The numebers I get in return are slightely higher than the ones on the portal and I assume I need to put a where clause to filter a property GA might be filtering on the portal.

Your query looks fine to me. Assuming that you're looking at the same GA view as the one linked to BigQuery, I think that the problem could be sampling.
Even if the GA UI says that "This report is based on 100% of sessions.", try to export it as an Unsampled Report and check the numbers (in my experience, the users metric sometimes doesn't match between unsampled reports and default reports without sampling).

Related

Google Storage Transfer service to BigQuery - GEO doesn't give correct data

I have an issue with the GEO and Google Storage Transfer service. I use it to pull Google Ads data in to BigQuery. It's not empty but the numbers don't match. E.g. a query like this
SELECT _DATA_DATE, sum(clicks) FROM project.dataset.GeoStats_XXX where LocationType='LOCATION_OF_PRESENCE' and IsTargetingLocation IN (true, false) group by 1
returns a smaller number than the real one, the one that I get if I just create the same report via Google Ads UI. I've tried different manipulations with the LocationType and IsTargetingLocation filters but nothing seems to work.
I see no errors in Storage Transfer service logs
The CampaignStat returns almost correct data but there is no GEO info there and I need the country breakdown. Not Target Location but GEO (location of presence)
I expect the numbers returned by my query to match the numbers that I see via Google Ads UI

BigQuery Google Ads Transfer - Duplicate data

We are using the Google Ads transfer in BigQuery to ingest our Google Ads data. One thing I have noticed when querying the results is that all of the metrics are exactly 156x of the values we would expect in the Google Ads UI (cost, clicks, etc.)
We have tested multiple transfers and each time we have this same issue. The transfer process seems pretty straight forward, but am I missing something? Has anyone else noticed a similar issue or have any ideas of what to look at to adjust in the data transfer?
For which tables do you notice this behavior?
The dimension tables such as Customer, Campaign, AdGroup are exported every day and so are partitioned by day.
This could cause your duplication?!
You only need the latest partition/day.
So this is for example how I get the latest account / customer data:
SELECT
-- main reason I cast all the id's to string is because BI reporting tool will not see it as a metric but as a dimension field
CAST(customer_id AS STRING) AS account_id, --globally unique, see also: https://developers.google.com/google-ads/api/docs/concepts/api-structure
customer_descriptive_name,
customer_auto_tagging_enabled,
customer_currency_code,
customer_manager,
customer_test_account,
customer_time_zone,
_DATA_DATE AS date, --source table is paritioned on date
_LATEST_DATE,
CASE WHEN _DATA_DATE = _LATEST_DATE THEN TRUE ELSE FALSE END is_most_recent_record
FROM
`YOURPROJECTID.google_ads.ads_Customer_YOURID`
WHERE
_DATA_DATE = _LATEST_DATE

Why BigQuery BI engine does not use all the reservation?

I have a dashboard connected to a BigQuery Table, BI engine works as expected as I am using a calendar filter and my table is partitioned per date.
when I select a longer date range, BI engine stop working with this message "The table or data volume was larger than BI Engine supports at this time", that's fair.
Please notice, I am already filtering by a partition, but sometimes, I need to see the whole data
to solve that, I created a BI reservation, and I notice regardless of the size 1,2,4 GB the memory used is always 600MB? and I get the same message, I attached a screenshot here, is this by design?
Bug Report here: https://issuetracker.google.com/issues/150633500
turn out the error is not related to reservation, but to the fact that BI engine support only 500 partition, my table has more
https://cloud.google.com/bi-engine/docs/overview#limitations
the solution is instead of partition per day, I will use something like week or month

Daily Retention with Filter in BigQuery

I am using a query to calculate daily retention on my Firebase Analytics data exported to BigQuery. It is working well and the numbers match with the numbers in Firebase, but when I try to filter the query by a cohort of users, the numbers don't add up.
I want to compare the results of an A/B test from Firebase, and so I've looked at the user_property "firebase_exp_2" which is my A/B test, and I've split up the users in each group (0/1). The retention numbers do not match (at all) the numbers that I can see in my A/B test results in Firebase - actually they show the opposite pattern.
The query is adapted from here: https://github.com/sagishporer/big-query-queries-for-firebase/wiki/Query:-Daily-retention
All I've changed is adding the following under the "WHERE" clause:
WHERE
event_name = 'user_engagement' AND user_pseudo_id IN
(SELECT user_pseudo_id
FROM `analytics_XXX.events_*`,
UNNEST (user_properties) user_properties
WHERE user_properties.key = 'firebase_exp_2' AND user_properties.value.string_value='1')
Firebase says that there are 6,043 users in the Control group and 6,127 in the Variant A group, but my numbers are 5,632 and 5,730, and the retained users are around 1,000 users more than what Firebase reports.
What am I doing wrong?
The export to BigQuery happens on a daily basis and each imported table is named events_YYYYMMDD. Additionally, a table is imported for events received throughout the current day. This table is named events_intraday_YYYYMMDD.
The additions you made are querying from events_* which is fine. The example uses events_201812* though which would ignore the intraday table. That would explain why your numbers a lower. You are missing users added to the A/B test during the current day.

Google AdWords Transfers in Big Query: Possible to Change Table Schema?

I started to test Google AdWords transfers for Big Query (https://cloud.google.com/bigquery/docs/adwords-transfer).
I have few questions for which I cannot find answers anywhere.
Is it possible to e.g. edit which columns are downloaded from AdWords to Big Query? E.g. Keyword report has only ad group ID column but not ad group text name.
Or is it possible to decide which tables=reports are downloaded? The transfer creates around 60 tables and I need just 5...
DZ
According to here, AdWords data transfer
store your AdWords data into a Dataset. So, the inputs are in terms of Adwords customer IDs (minimum one customer ID) and the output is a collection of Datasets.
I think, you need a modified version of PubSub to store special columns or tables in BigQuery.