We're trying to connect online and offline behaviour via the Measurement Protocol.
We've sent a hit to Google Analytics with the following parameters (among others):
eventCategory= offline_transaction
source= store
medium= offline
The data are correctly registered in Google Analytics and are available in the Reporting section.
I'm trying to get them in BigQuery this way:
SELECT
hits.eventInfo.eventCategory, trafficSource.source, trafficSource.medium
FROM [XXX:YYY.ga_sessions_20160827]
where hits.eventInfo.eventCategory="offline_transaction"
and trafficSource.source="store"
and trafficSource.medium="offline"
and the output is 'Query returned zero records'.
Any idea what I'm doing wrong? Is the data coming from the Measurement Protocol available in BigQuery?
Thanks in advance.
I believe what is happening is that the trafficSource.source/medium are being recorded at the session level and hits.eventCategory at the hit level, and thus they are never included in a single row together, so 0 rows match your query. Try something like the below:
SELECT
MAX(IF (hits.eventInfo.eventCategory = "offline_transaction", hits.eventInfo.eventCategory, NULL)) WITHIN RECORD AS eventCategory,
SUM(IF (hits.eventInfo.eventCategory = "offline_transaction", 1, NULL)) WITHIN RECORD AS eventCnt,
trafficSource.source,
trafficSource.medium
FROM [XXX:YYY.ga_sessions_20160827]
where hits.eventInfo.eventCategory="offline_transaction"
and trafficSource.source="store"
and trafficSource.medium="offline"
This should give you a count of how many times that event occurred within that session. Without knowing more about your use case/what you want to pull out of the table, I don't know how else to help.
I've had to use the aggregate_function() WITHIN RECORD syntax frequently to deal with these types of issues.
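If you are on Standard SQL rather than Legacy SQL, a rough equivalent of the idea above (a sketch only, keeping your placeholder table name) is to UNNEST the hits and count the matching events per source/medium:
SELECT
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  -- count the offline events across all sessions with this source/medium
  COUNTIF(h.eventInfo.eventCategory = "offline_transaction") AS eventCnt
FROM
  `XXX.YYY.ga_sessions_20160827` AS s,
  UNNEST(s.hits) AS h
WHERE
  trafficSource.source = "store"
  AND trafficSource.medium = "offline"
GROUP BY
  source,
  medium
Note this aggregates across sessions rather than within each session record; group by fullVisitorId and visitId as well if you need per-session counts.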
I'm looking to query some data from GA through BQ for use in A/B-test analysis.
What I'd like to pull out is how many users were placed into each variant, and the total number of add-to-cart completions.
The query below doesn't quite match up with what I'm seeing in GA (I know there will/can be differences), so I just want to make sure that I've gotten it completely right.
It very closely matches the 'Unique Events' metric in GA, but I want to make sure it's showing me the 'Total Events' metric:
SELECT
exp_.experimentVariant AS variant,
COUNT(DISTINCT fullVisitorId) AS users,
COUNTIF(hits_.eventinfo.eventAction = "add to cart") AS add_to_cart
FROM
`XXXXX.YYYYY.ga_sessions_*`,
UNNEST(hits) AS hits_,
UNNEST(hits_.experiment) AS exp_
WHERE
exp_.experimentid = "XXXYYYZZZ"
AND _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
GROUP BY
variant
ORDER BY
variant
The reason I'm not sure this is quite right is that when I use the following query, the output completely matches the 'Total Events' metric in GA:
SELECT
COUNT(DISTINCT fullVisitorId) AS users,
COUNTIF(hits.eventinfo.eventAction = "add to cart") AS add_to_cart
FROM
`XXXXX.YYYYY.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
The query will return all users that had a hit with the specified experimentVariant, and all add-to-cart events that had the specified variant sent together with the hit. In that sense it looks correct.
A user segment in GA of users exposed to the experiment works differently and will return a different result. Users in the experiment variant can also have performed add-to-cart events that didn't have the experiment parameter sent together with them; for example, an add-to-cart event could have been sent before the user was even exposed to the experiment. If those events fall within the timeframe, they will be included as long as the user qualifies for the segment.
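If you want numbers closer to that segment behaviour, a rough sketch (same placeholder dataset and experiment id as above) is to first collect every exposed user and then count all of their add-to-cart events in the date range, whether or not the hit carried the experiment parameter:
WITH exposed_users AS (
  SELECT DISTINCT
    fullVisitorId,
    exp_.experimentVariant AS variant
  FROM
    `XXXXX.YYYYY.ga_sessions_*`,
    UNNEST(hits) AS hits_,
    UNNEST(hits_.experiment) AS exp_
  WHERE
    exp_.experimentId = "XXXYYYZZZ"
    AND _TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
)
SELECT
  e.variant,
  COUNT(DISTINCT s.fullVisitorId) AS users,
  -- all add-to-cart events of exposed users, not only hits tagged with the experiment
  COUNTIF(hits_.eventInfo.eventAction = "add to cart") AS add_to_cart
FROM
  `XXXXX.YYYYY.ga_sessions_*` AS s,
  UNNEST(s.hits) AS hits_
JOIN
  exposed_users AS e
ON
  s.fullVisitorId = e.fullVisitorId
WHERE
  s._TABLE_SUFFIX BETWEEN "20220315" AND "20220405"
GROUP BY
  e.variant
ORDER BY
  e.variant
This still won't match the GA UI exactly (sampling and segment scoping differ), but it is closer in spirit to a user segment than filtering every hit on the experiment parameter.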
We are using the Google Ads transfer in BigQuery to ingest our Google Ads data. One thing I have noticed when querying the results is that all of the metrics are exactly 156x the values we would expect from the Google Ads UI (cost, clicks, etc.).
We have tested multiple transfers and each time we have the same issue. The transfer process seems pretty straightforward, but am I missing something? Has anyone else noticed a similar issue, or have any ideas of what to adjust in the data transfer?
For which tables do you notice this behavior?
The dimension tables such as Customer, Campaign, AdGroup are exported every day and so are partitioned by day.
Could this be the cause of your duplication?
You only need the latest partition/day.
So this is for example how I get the latest account / customer data:
SELECT
-- I cast all the IDs to STRING mainly so the BI reporting tool treats them as dimension fields rather than metrics
CAST(customer_id AS STRING) AS account_id, -- globally unique, see also: https://developers.google.com/google-ads/api/docs/concepts/api-structure
customer_descriptive_name,
customer_auto_tagging_enabled,
customer_currency_code,
customer_manager,
customer_test_account,
customer_time_zone,
_DATA_DATE AS date, -- source table is partitioned on date
_LATEST_DATE,
CASE WHEN _DATA_DATE = _LATEST_DATE THEN TRUE ELSE FALSE END is_most_recent_record
FROM
`YOURPROJECTID.google_ads.ads_Customer_YOURID`
WHERE
_DATA_DATE = _LATEST_DATE
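If you want to confirm that reading every daily snapshot is what inflates your numbers, a quick sanity check (the table name just follows the same naming pattern as above) is to count how many snapshot days a dimension table holds; if that count is close to the 156x factor you are seeing, the duplication comes from scanning all partitions instead of only the latest one:
SELECT
  COUNT(DISTINCT _DATA_DATE) AS snapshot_days -- one snapshot per day of transfer history
FROM
  `YOURPROJECTID.google_ads.ads_Campaign_YOURID`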
I have Google Analytics integrated with BigQuery and I'm trying to write a query to fetch Active Users that should match the number in the GA portal.
Here's the query I've written:
SELECT
date(date) as date,
EXACT_COUNT_DISTINCT(fullVisitorId) as daily_active_users,
FROM TABLE_DATE_RANGE([<project_id>:<dataset>.ga_sessions_],
TIMESTAMP('2018-01-01'),
TIMESTAMP(CURRENT_DATE()))
group by date
order by date desc
The numbers I get in response are somehow related to the ones Google Analytics shows me, but they aren't 100% accurate.
The numbers I get are slightly higher than the ones on the portal, and I assume I need to add a WHERE clause to filter on a property that GA might be filtering on the portal.
Your query looks fine to me. Assuming that you're looking at the same GA view as the one linked to BigQuery, I think that the problem could be sampling.
Even if the GA UI says that "This report is based on 100% of sessions.", try to export it as an Unsampled Report and check the numbers (in my experience, the users metric sometimes doesn't match between unsampled reports and default reports without sampling).
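For reference, a rough Standard SQL equivalent of that query (a sketch, keeping the same placeholders) looks like this:
SELECT
  PARSE_DATE('%Y%m%d', date) AS day, -- the export's date field is a YYYYMMDD string
  COUNT(DISTINCT fullVisitorId) AS daily_active_users
FROM
  `<project_id>.<dataset>.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20180101' AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY
  day
ORDER BY
  day DESC
The same caveat applies: counts from the raw export can legitimately differ from the Users metric shown in the GA UI.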
I am using the QuickBooks API to request transaction data from a QB database. However, due to the number of transactions, it takes a long time for the request to come back. Is there a way of requesting a summarised view in the XML, i.e. instead of getting data at the TxnID level, can I get it to just aggregate the 'amount' by account?
Thanks in advance
Is there a way of requesting a summarised view in the XML, i.e. instead of getting data at the TxnID level, can I get it to just aggregate the 'amount' by account?
Using a TransactionQueryRq? No.
If you're trying to get summary data, you should look into the reporting features of the SDK/qbXML instead -- they are likely closer to what you need.
I know that BigQuery offers the first "1 TB of data processed" per month for free, but I can't figure out where to look on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard, which had this info, but for the past couple of weeks the "old dashboard" hasn't been accessible.
From the Google Cloud Console overview page for your project, click on the "details" section at the top-right, next to the charge estimate.
You'll get an estimate of the charges for the current month for each service and item in the service, including BigQuery analysis.
If you want to track this usage, you can also export the data into CSV every day by going into the Billing settings and enabling the usage export feature. Don't worry that it only mentions Compute Engine; it actually works for other services as well.
You can also access the billing history directly by clicking on the billing account link.
You will get a detailed bill with the usage info.
Post GCP Console Redesign Answer
The GCP console was redesigned and now the other answer here no longer applies, but it is still possible to view your usage by going to IAM & Admin -> Quotas.
What you're looking for is "Big Query API: Query usage per day". It doesn't seem possible to view your usage over 30 days unfortunately, but you can see your current usage (per day) and your peak usage over the past 7 days. You can also set a daily quota. If you're just working infrequently or doing a lot in one day, you can set a quota to 1 TiB and prevent yourself from blowing your whole allocation in one day.
You can try sending feedback about these limitations, like I did, by clicking the question mark at the top right and then choosing "Send feedback".
Theo is correct that there is no way to view the number of bytes processed or billed since the start of the month (inside of the free tier) in the GCP Billing Console. However, you can extract the bytes processed and bytes billed data from logs in Cloud Logging and calculate the total bytes processed/billed since the start of the month inside of BigQuery.
Here are the steps to count total bytes billed in a month:
Under Cloud Logging, go to Logs Explorer (NOT the Legacy Logs Explorer) and run the following query in the query builder frame:
resource.type="bigquery_project" AND
protoPayload.metadata.jobChange.job.jobStats.queryStats.totalBilledBytes>1 AND
timestamp>="2021-04-01T00:00:00Z"
The timestamp clause is not actually necessary, but it will speed up the query. You can set timestamp >= <value> to any valid timestamp you want as long as it returns at least one result.
In the Query Results frame, click the "Action" button, and select "Create Sink".
In the window that opens, give your sink a name, click "Next", and in the "Select sink service" dropdown menu select "BigQuery dataset".
In the "Select BigQuery dataset" dropdown menu, either select an existing dataset where you would like to create your sink (which is a table containing logs) or if you prefer, choose "Create new BigQuery dataset.
Finally, you will likely want to check the box for Partition Table, since this will help you control costs whenever you query this sink. As of the time of this answer, however, Google limits partitioned tables to 4000 partitions, so you may find it necessary to clear out old logs eventually.
Click "Create Sink" (there is no need for any inclusion or exclusion filters).
Run a query in BigQuery that produces bytes billed (i.e. a query that does not return a previously cached result). This is necessary to instantiate the sink. Moments after your query runs, you should see a table called <your_bigquery_dataset>.cloudaudit_googleapis_com_data_access
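Any non-cached query that scans at least one column will do; for example (just an illustration, any table you can query works, and note that a bare COUNT(*) is answered from metadata and bills nothing):
SELECT
  word,
  SUM(word_count) AS total_occurrences -- scanning these columns produces billed bytes, so the job lands in the audit log
FROM
  `bigquery-public-data.samples.shakespeare`
GROUP BY
  word
ORDER BY
  total_occurrences DESC
LIMIT 10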
Enter the following Standard SQL query in the BigQuery query editor:
WITH
bytes_table AS (
SELECT
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.createTime') AS date_time,
JSON_VALUE(protopayload_auditlog.metadataJson,
'$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS billedbytes
FROM
`<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
WHERE
  EXTRACT(MONTH FROM timestamp) = 4
  AND EXTRACT(YEAR FROM timestamp) = 2021)
SELECT
(SUM(CAST(billedbytes AS INT64))/1073741824) AS total_GB
FROM
bytes_table;
You will want to change the month from 4 to whatever month you intend to query, and 2021 to whatever year you intend to query. Also, you may find it helpful to save this query as a view if you intend to rerun it periodically.
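If you would rather not edit the month and year each time, one option (the view name here is just an example) is to filter on the current month directly and save that as the view:
CREATE OR REPLACE VIEW
  `<your_project>.<your_bigquery_dataset>.current_month_billed_gb` AS
SELECT
  -- sum billed bytes for jobs logged since the start of the current month, converted to GB
  SUM(CAST(JSON_VALUE(protopayload_auditlog.metadataJson,
    '$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS INT64)) / 1073741824 AS total_GB
FROM
  `<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
WHERE
  DATE_TRUNC(DATE(timestamp), MONTH) = DATE_TRUNC(CURRENT_DATE(), MONTH)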
Be advised that your sink does not contain your past BigQuery logs, only BigQuery logs produced after you created the sink. Therefore, in the first month, the number of GB returned by this query will not be an accurate count of your bytes billed for the month unless you happened to create the sink before running any queries in BigQuery during the current month.
Might be related to How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?
If you are fine with using BigQuery itself to get that information (instead of using a UI), you can use something like this:
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email
FROM (
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
Explanation
Please consider the caveats I mentioned in my answer here