Google Analytics to BigQuery data: what is the SQL code for a custom dimension with transactions?

How can I see the data above in BigQuery? The tables have been there for a year.
What code should I use to reproduce the result above?
User subscription status is a session-scoped custom dimension, and I need it for sessions that made transactions.
I have enabled the BigQuery export, but how do I get exactly the same results in BQ?

Try the code below; change the table name and date interval to match your setup.
#standardSQL
SELECT
  date,
  SUM(totals.visits) AS visits,
  SUM(totals.pageviews) AS pageviews,
  SUM(totals.transactions) AS transactions,
  SUM(totals.transactionRevenue) / 1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY date
ORDER BY date ASC
These documents may be useful to read before posting further questions:
https://support.google.com/analytics/answer/4419694?hl=tr
https://support.google.com/analytics/answer/3437719?hl=tr

For session-scoped custom dimensions, write a subquery that runs on the unnested customDimensions array.
#standardSQL
SELECT
  date,
  -- select one value from the unnested array
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 4) AS cd4,
  SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20160802'
GROUP BY date, cd4
ORDER BY date ASC
You need to change the index condition in the subquery to your own custom dimension index.
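To get closer to the report in the question (subscription status broken down by transactions and revenue), the two queries can be combined. A minimal sketch, assuming the subscription-status dimension sits at index 4; replace the index, table and date range with your own:
#standardSQL
SELECT
  -- session-scoped custom dimension; index 4 is an assumption, use your own index
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 4) AS subscription_status,
  SUM(totals.visits) AS sessions,
  SUM(totals.transactions) AS transactions,
  SUM(totals.transactionRevenue) / 1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY subscription_status
ORDER BY transactions DESC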

Related

Results within BigQuery do not match those in GA4

I'm running the query below in BigQuery to see how many users I had from August 1st to August 14th, but the number does not match what GA4 shows me.
with event AS (
  SELECT
    user_id,
    event_name,
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY TIMESTAMP_MICROS(event_timestamp) DESC) AS rn
  FROM
    `events_*`
  WHERE
    event_name = 'push_received')
SELECT COUNT(DISTINCT user_id)
FROM
  event
WHERE
  event_date >= '2022-08-01'
GA4 result
BQ result = 37024
There are quite a few reasons why your GA4 data in the web will not match when compared to the BigQuery export and the Data API.
In this case, I believe you are running into the time zone issue. event_date is the date on which the event was logged, in the registered timezone of your property, whereas event_timestamp is the UTC time at which the event was logged by the client.
To resolve this, simply update your query with:
EXTRACT(DATETIME FROM TIMESTAMP_MICROS(`event_timestamp`) at TIME ZONE 'TIMEZONE OF YOUR PROPERTY' )
Your data should then match the WebUI and the GA4 Data API. This post that I co-authored goes into more detail on this and other reasons why your data doesn't match: https://analyticscanvas.com/3-reasons-your-ga4-data-doesnt-match/
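Putting the timezone fix into the original query, a minimal sketch; 'America/Sao_Paulo' is only an assumed property timezone, and the table reference is kept as in the question:
-- sketch only: derive the date from event_timestamp in the property's timezone
WITH event AS (
  SELECT
    user_id,
    -- 'America/Sao_Paulo' is an assumption; use your property's timezone
    EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp) AT TIME ZONE 'America/Sao_Paulo') AS event_date_local
  FROM `events_*`
  WHERE event_name = 'push_received'
)
SELECT COUNT(DISTINCT user_id) AS users
FROM event
WHERE event_date_local BETWEEN DATE '2022-08-01' AND DATE '2022-08-14'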
You cannot simply compare totals; break the comparison down by day and look at the details.

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports the total value of the balance column for all days up to and including the given date.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above to the entire dataset up to and including the desired day. For example, for 2009-01-31 the result is 97.13522530000000000000, and for 2009-01-15, filtering with time < '2009-01-16 00:00:00', it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The differences between the result sets of the queries were caused by the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. And since you aggregate with a single condition, the FILTER clause can be converted to a plain WHERE:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
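If several conditional sums are ever needed, the FILTER clause can instead stay in the inner aggregation. A sketch of that variant (the spent/unspent split is illustrative, not taken from the question):
SELECT daily,
       SUM(unspent_balance) OVER (ORDER BY daily) AS running_unspent,
       SUM(spent_balance)   OVER (ORDER BY daily) AS running_spent
FROM (
  SELECT
    DATE_TRUNC('DAY', time) AS daily,
    SUM(balance) FILTER (WHERE is_spent_column IS NULL)     AS unspent_balance,
    SUM(balance) FILTER (WHERE is_spent_column IS NOT NULL) AS spent_balance
  FROM tbl
  GROUP BY 1
) AS daily_agg
ORDER BY daily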

Select the date of a UserIDs first/most recent purchase

I am working with Google Analytics data in BigQuery, looking to aggregate the dates of the last visit and the first visit up to UserID level. However, my code currently returns the max visit date for that user, as long as they have purchased within the selected date range, because I am using MAX().
If I remove MAX() I have to GROUP BY date, which I don't want, as this then returns multiple rows per UserID.
Here is my code, which returns a series of dates per user. last_visit_date is currently working, as it's the only date that can simply look at the last date of user activity. Any advice on how I can get last_ord_date to select the date on which the order actually occurred?
SELECT
customDimension.value AS UserID,
# Last order date
IF(COUNT(DISTINCT hits.transaction.transactionId) > 0,
(MAX(DATE)),
"unknown") AS last_ord_date,
# first visit date
IF(SUM(totals.newvisits) IS NOT NULL,
(MAX(DATE)),
"unknown") AS first_visit_date,
# last visit date
MAX(DATE) AS last_visit_date,
# first order date
IF(COUNT(DISTINCT hits.transaction.transactionId) > 0,
(MIN(DATE)),
"unknown") AS first_ord_date
FROM
`XXX.XXX.ga_sessions_20*` AS t
CROSS JOIN
UNNEST (hits) AS hits
CROSS JOIN
UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
UNNEST(hits.product) AS hits_product
WHERE
parse_DATE('%y%m%d',
_table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 day)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND customDimension.index = 2
AND customDimension.value NOT LIKE "true"
AND customDimension.value NOT LIKE "false"
AND customDimension.value NOT LIKE "undefined"
AND customDimension.value IS NOT NULL
GROUP BY
UserID
The most efficient and clearest way to do this (and also the most portable) is to have a simple table/view with two columns, userid and last_purchase, and another with two columns, userid and first_visit.
Then you inner join them with the original raw table on userid and hit timestamp to get, say, the session IDs you're interested in. Three steps, but simple, readable and easy to maintain.
It's very easy to pile up too much complexity in a query that relies on the first or last purchase/action (just look at the unnest operations you have there); it becomes unusable and you'll spend far too much time trying to figure out what the output means.
Also keep in mind that using the wildcard in the query has a limit of 1000 tables, so your last and first visits sit in a rolling window of 1000 days.
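A minimal sketch of those helper views, assuming the session-scoped UserID lives in custom dimension index 2 as in the question; the project, dataset and view names are placeholders:
#standardSQL
-- hypothetical view: one row per user with the date of their last purchase
CREATE OR REPLACE VIEW `yourproject.yourdataset.user_last_purchase` AS
SELECT
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 2) AS userid,
  MAX(PARSE_DATE('%Y%m%d', date)) AS last_purchase
FROM `yourproject.yourdataset.ga_sessions_*`
WHERE totals.transactions > 0
GROUP BY userid;

-- hypothetical view: one row per user with their first visit date
CREATE OR REPLACE VIEW `yourproject.yourdataset.user_first_visit` AS
SELECT
  (SELECT value FROM UNNEST(customDimensions) WHERE index = 2) AS userid,
  MIN(PARSE_DATE('%Y%m%d', date)) AS first_visit
FROM `yourproject.yourdataset.ga_sessions_*`
GROUP BY userid;
Each view can then be inner joined back to the raw ga_sessions_* data on userid whenever session-level detail is needed.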

BigQuery Standard SQL using UNNEST duplicates the data

I am using BigQuery standard SQL to count Google Analytics data, but when I apply UNNEST to break up a repeated record field in the table, other columns like the hit counts become duplicated and show higher values than the actual ones.
SELECT
  date,
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  SUM(totals.hits) AS total_hit,
  MAX(hits.transaction.transactionid) AS transaction
FROM
  `test.test.session_streaming_*`, UNNEST(hits) hits
WHERE
  _table_suffix BETWEEN '20180401' AND '20180501'
GROUP BY
  date,
  trafficSource.source,
  trafficSource.medium
Could anyone tell me how to remove the duplicated data in this query?
It looks like you want to compute the max transaction ID within hits for each row, and then take the max across all rows. This should work:
SELECT
  date,
  trafficSource.source AS source,
  trafficSource.medium AS medium,
  SUM(totals.hits) AS total_hit,
  MAX((SELECT MAX(transaction.transactionid) FROM UNNEST(hits))) AS transaction
FROM
  `test.test.session_streaming_*`
WHERE
  _table_suffix BETWEEN '20180401' AND '20180501'
GROUP BY
  date,
  trafficSource.source,
  trafficSource.medium
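Because the (SELECT MAX(...) FROM UNNEST(hits)) subquery is evaluated inside each session row, the sessions are no longer fanned out by the UNNEST, so SUM(totals.hits) counts each session exactly once and the inflated totals disappear.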

Bigquery full date (year-month-day) to (year-month)

A noob question.
I want to query my database for the pageviews of a given page, and I wrote a query that returns the page and the number of pageviews daily. How should I change my query to get the same statistics monthly rather than daily?
So instead of:
page      pv   date
/mysite   10   2017-01-01
I want to get:
page      pv   date
/mysite   500  2017-01
my query:
select
date,
hits.page.pagePath as pagePath,
count(totals.pageviews) as pageViews
from Table_DATE_RANGE ([818251235.ga_sessions_] , Timestamp('2016-01-01'), Timestamp('2017-11-01'))
group by 1,2
It's not clear what you are trying to count in your original query, but here is a query that uses standard SQL and performs the grouping on a monthly basis:
#standardSQL
SELECT
  DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS month,
  hit.page.pagePath,
  COUNT(*)
FROM `818251235.ga_sessions_*`,
  UNNEST(hits) AS hit
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20171101'
GROUP BY 1, 2;
Edit: fixed to use DATE_TRUNC instead of EXTRACT(MONTH FROM ...) since both the year and month are relevant.
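If the goal is specifically monthly pageviews per page, as the sample output suggests, a sketch that counts only page-type hits (assuming the standard GA export schema, where hit.type distinguishes pageviews from events):
#standardSQL
SELECT
  DATE_TRUNC(PARSE_DATE('%Y%m%d', date), MONTH) AS month,
  hit.page.pagePath AS pagePath,
  -- count only pageview hits, not events or other hit types
  COUNTIF(hit.type = 'PAGE') AS pageViews
FROM `818251235.ga_sessions_*`,
  UNNEST(hits) AS hit
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20171101'
GROUP BY 1, 2
ORDER BY 1, 2;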
In legacy SQL you can use date functions like UTC_USEC_TO_MONTH, UTC_USEC_TO_WEEK and UTC_USEC_TO_DAY to normalize timestamps to the first day of the month, week or day:
select
date(UTC_USEC_TO_MONTH(date)) as monthly,
.....