How to select a Google Analytics segment in Google BigQuery? (SQL)

I created a Google Analytics data source in Tableau.
The data source is segmented by "new user".
Now I would like to push the Google Analytics data into Google BigQuery and build the same data source in Tableau from BigQuery.
After checking the GA data in the BigQuery project, I found that there is no segment there.
How do I query by the "new user" segment in Google BigQuery?

You can look at the BigQuery GA export schema to see all the fields that are exported.
The field totals.newVisits has what you are looking for:
SELECT
  hits.transaction.transactionid tid,
  date,
  totals.pageviews pageviews,
  hits.item.itemquantity item_qtd,
  hits.transaction.transactionrevenue / 1e6 rvn,
  totals.bounces bounces,
  fullvisitorid fv,
  visitid v,
  totals.timeonsite tos,
  totals.newVisits new_visit
FROM
  `project_id.dataset_id.ga_sessions*`,
  UNNEST(hits) hits
WHERE
  1 = 1
  AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)'))
    BETWEEN TIMESTAMP('2017-05-10') AND TIMESTAMP('2017-05-10')
GROUP BY
  tid, date, pageviews, item_qtd, rvn, bounces, fv, v, tos, new_visit
Note that this field is defined at the session level: it is 1 when the session is the visitor's first visit and NULL otherwise.
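To restrict a query to new-user sessions, a filter on totals.newVisits is probably enough. The sketch below is only an illustration: it assumes the public sample dataset bigquery-public-data.google_analytics_sample stands in for your own export, and the aliases new_users and new_user_sessions are made up here. It should roughly correspond to the "new user" segment, though segment definitions in GA may differ in edge cases.

```sql
-- Count new users and their sessions for one day.
-- Assumption: the public GA sample dataset is used in place of your own project.
SELECT
  date,
  COUNT(DISTINCT fullVisitorId) AS new_users,   -- visitors whose first session falls on this day
  COUNT(*) AS new_user_sessions                 -- one row per session at this level (no UNNEST)
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20170510' AND '20170510'
  AND totals.newVisits = 1                      -- NULL for returning visitors, so they are dropped
GROUP BY
  date;
```

Because the filter is applied before any UNNEST(hits), each row is still one session, so session-level totals are not duplicated per hit.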

Related

Matching BigQuery data with Traffic Acquisition GA4 report

I'm new to BigQuery and I'm trying to replicate the Traffic Acquisition GA4 report, so far without much success: my results are not even remotely close to the GA4 view.
1. I understand that the source/medium/campaign fields are event-based and not session-based in GA4 / BQ. My question is, why doesn't every event have source/medium/campaign as an event_params key? It would seem logical to me to have these parameters on the 'session_start' event, but unfortunately that's not the case.
2. I tried the following options to replicate the Traffic Acquisition report:
2.1 To check the first medium for sessions:
with cte as ( select
PARSE_DATE("%Y%m%d", event_date) AS Date,
user_pseudo_id,
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp) as first_medium
FROM `project`)
select Date, first_medium, count(distinct user_pseudo_id) as Users, count (distinct session_id) as Sessions
from cte
group by 1,2;
The query returns 44k users with a 'null' medium and 1.8k organic users, whereas GA4 shows 17k users with the 'none' medium and 8k organic users.
2.2 If I change the first medium to the last medium:
FIRST_VALUE((select value.string_value from unnest(event_params) where key = 'medium')) OVER (PARTITION BY concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) ORDER BY event_timestamp desc) as last_medium
Organic medium increases to 9k users, though the results are still not matching the GA4 data.
2.3 I've also tried this code - https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/ - source / medium (based on session), and still got completely different results compared to GA4.
Any help would be much appreciated!
I have noticed the same thing. Looking deeper, I pulled one day's worth of data from BigQuery into Google Sheets and examined it.
Unsurprisingly, I could replicate the results of the ga4bigquery code you mentioned above, but they did not align with GA4: although close for high-traffic pages, they could be wildly out for the lower-traffic ones.
I then counted occurrences of 'email' in the event params source and ea_tracking_id, as well as in traffic_source, and found that they are all lower than in GA4.
I went to my dev site, where I know exactly how many sessions have a source of email. GA4 agreed but BigQuery did not; Google seems to be allocating some traffic to "not set" at random.
I have concluded that the problem is not in the SQL and not in the tagging, but in the BigQuery GA4 data source. I have logged a query with Google and we will see what happens. Sorry, it's not a solution.
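One detail that may contribute to the large 'null' bucket in the queries from the question (it does not by itself explain the gap with GA4): FIRST_VALUE returns the value of the first row in the window even when that value is NULL, and the first event of a session often carries no medium parameter. A minimal sketch, using a made-up inline events table with a flattened medium column instead of the real export:

```sql
-- Hypothetical inline data: one session with three events, where only the
-- second event has a medium. Column names mimic the GA4 export loosely.
WITH events AS (
  SELECT 'u1' AS user_pseudo_id, 111 AS ga_session_id, 1 AS event_timestamp,
         CAST(NULL AS STRING) AS medium
  UNION ALL SELECT 'u1', 111, 2, 'organic'
  UNION ALL SELECT 'u1', 111, 3, CAST(NULL AS STRING)
)
SELECT DISTINCT
  CONCAT(user_pseudo_id, CAST(ga_session_id AS STRING)) AS session_id,
  -- Value of the very first event, which is NULL for this session.
  FIRST_VALUE(medium) OVER (
    PARTITION BY user_pseudo_id, ga_session_id
    ORDER BY event_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first_medium,
  -- First non-NULL value in the session, 'organic' for this session.
  FIRST_VALUE(medium IGNORE NULLS) OVER (
    PARTITION BY user_pseudo_id, ga_session_id
    ORDER BY event_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first_non_null_medium
FROM events;
```

The explicit ROWS frame matters: with the default frame, which ends at the current row, IGNORE NULLS would still yield NULL for events that precede the first non-NULL medium.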

Get List hits.eventInfo.eventAction in Google Analytics BigQuery

I want to get a list of hits.eventInfo.eventAction values from Google Analytics data via BigQuery using this code:
SELECT DISTINCT hits.eventInfo.eventAction FROM `ga_sessions_*`
But I get an error like this:
Cannot access field eventInfo on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64
I tried adding UNNEST(hits) but also got an error. Any suggestions?
Can you try the query below? It uses UNNEST:
SELECT DISTINCT hits.eventInfo.eventAction FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`, UNNEST(hits) as hits
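The error arises because hits is an ARRAY of STRUCTs, so a field such as eventInfo can only be read after the array has been flattened into rows. A small self-contained sketch with a made-up inline sessions table (the real export has many more fields per hit):

```sql
-- Hypothetical session row whose hits column is an array of structs.
WITH sessions AS (
  SELECT [
    STRUCT(1 AS hitNumber, STRUCT('click' AS eventAction) AS eventInfo),
    STRUCT(2 AS hitNumber, STRUCT('play' AS eventAction) AS eventInfo),
    STRUCT(3 AS hitNumber, STRUCT('click' AS eventAction) AS eventInfo)
  ] AS hits
)
-- The comma before UNNEST is an implicit CROSS JOIN: one output row per hit.
SELECT DISTINCT
  h.eventInfo.eventAction
FROM
  sessions,
  UNNEST(hits) AS h;
```

Using a different alias (h) than the column name (hits) avoids ambiguity about whether a reference means the array or the flattened element.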

BigQuery imports from GoogleAds show all CPM related Fields with value 0

Hello wonderful person!
I've followed this guide to import google ads campaign info to a BigQuery database.
My goal is to create a simple query that can be stored as a view and accessed from Data Studio to make a report. But some fields like AverageCpm are always set to 0.
I also have a data studio report made using google ads as source for reference and I can access all the campaigns from the google ads platform.
Here is the query I'm working on:
SELECT
c.ExternalCustomerId,
c.CampaignName as name,
c.CampaignStatus,
cs.date as dia,
SUM(cs.Impressions) AS Impressions,
SUM(cs.Interactions) AS Interactions,
AVG(cs.AverageCpm) AS CPM,
SUM(cs.Cost) AS Cost
FROM
`<DB>.google_ads.Campaign_<ACCOUNT_ID>` c
LEFT JOIN
`<DB>.google_ads.CampaignStats_<ACCOUNT_ID>` cs
ON
(c.CampaignId = cs.CampaignId
AND cs._DATA_DATE BETWEEN
DATE_ADD(CURRENT_DATE(), INTERVAL -80 DAY) AND DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))
WHERE
c._DATA_DATE = c._LATEST_DATE
and c.CampaignName = 'THE_NAME_OF_MY_CAMPAIGN'
GROUP BY
1, 2, 3 , 4
ORDER BY
CampaignName, dia
The Impressions field returns a value that is consistent with my reference Data Studio report and with what I see in the Google Ads stats, so I feel I'm on the right track.
My problem is that some fields, like CampaignStats.AverageCpm and CampaignStats.Cost, are always 0.
For example, the query:
Select * from `<DB>.google_ads.p_CampaignStats_<ACCOUNT_ID>` where AverageCpm >0;
Returns with no results.
I'm thinking permission problems? But I have administrator access to all the company's accounts.
Database is backfilled correctly.
I've tried generating a new dataset: Same problem and I don't see if there is a way to configure how google makes the imports.
What else could it be? What else can I do?
Thank you very very much!
Answer by Roman Petrochenkov (check his YouTube channel, he is the best).
AVG(cs.AverageCpm) AS CPM is not correct, since the average of averages is not the average of the total.
You need to calculate CPM manually. CPM is the cost per thousand impressions, so use SUM(Cost) / NULLIF(SUM(Impressions), 0) * 1000 AS CPM
That said, I would recommend against calculating it in BigQuery and would calculate it in the BI tool instead (Data Studio in this case).
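A small worked example of why averaging daily CPM values differs from computing CPM over the totals. The inline daily table and its numbers are invented purely for illustration, and cost is assumed to be in plain currency units; if your Cost column is stored in micros, divide it by 1e6 first.

```sql
-- Hypothetical two days of stats: a small cheap day and a large expensive day.
WITH daily AS (
  SELECT DATE '2024-01-01' AS day, 1000 AS impressions, 2.0 AS cost
  UNION ALL
  SELECT DATE '2024-01-02', 9000, 36.0
)
SELECT
  -- Daily CPMs are 2.0 and 4.0, so their unweighted mean is 3.0.
  AVG(cost * 1000 / impressions) AS avg_of_daily_cpm,
  -- Total cost 38.0 over 10000 impressions gives 3.8 per thousand.
  SUM(cost) * 1000 / NULLIF(SUM(impressions), 0) AS cpm_over_totals
FROM daily;
```

The second expression weights each day by its impressions, which is what a report over the whole period shows; NULLIF guards against division by zero when a group has no impressions.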

BigQuery - Transactions in internal promo report

Like in the question here (Replicate Internal Promotion report in BigQuery with transactions), I want to rebuild the Internal Promotion report from Google Analytics.
I was able to get PromoViews and PromoClicks, but I can't get the transactions.
My query looks like this:
SELECT
  clientid,
  fullvisitorid,
  visitstarttime,
  concat(fullvisitorid, cast(visitstarttime AS string)) AS sessionid,
  hp.promoid,
  hp.promoname,
  hp.promocreative,
  hp.promoposition,
  h.promotionActionInfo.promoIsView AS promoview,
  h.promotionActionInfo.promoIsClick AS promoclick
FROM [MYDATA], UNNEST (hits) AS h, UNNEST (h.promotion) AS hp
When I sum promoview and promoclick, I get exactly the same results as in Google Analytics.
The official Google documentation says:
>How transactions are attributed
>The Internal Promotion report attributes transactions to either an internal-promotion click or internal-promotion view.
>
>Each hit in an ecommerce session can have:
>
>0 or 1 internal-promotion clicks
>0 or more internal-promotion views
>
>Internal-promotion click attribution
>If a hit includes a single internal-promotion click, then that internal-promotion is credited for the transaction.
>
>If a session includes multiple internal-promotion clicks, then the last-clicked internal-promotion is credited for the transaction.
>
>If a hit includes zero internal-promotions clicks but one of that user’s previous hits does include an internal-promotion click, then the internal promotion from the previous click is credited for the transaction.
>
>Internal-promotion view attribution
>If none of the conditions above is true but a hit includes one or more internal-promotion views, then the transaction is credited to all promotional views within the session.
https://support.google.com/analytics/answer/6014872?hl=en
Keeping this in mind, my approach was to do a join with a separate table where I query the enhanced ecommerce data using sessionid as the join key
SELECT
clientid,
fullvisitorid,
visitstarttime,
concat(fullvisitorid, cast(visitstarttime AS string)) AS sessionid,
hp.v2ProductName AS ProductName,
h.transaction.transactionId AS TransactionId,
hp.productQuantity as Quantity,
FROM [MYDATA], UNNEST (hits) AS h, UNNEST (h.product) as hp
WHERE
h.eCommerceAction.action_type = "6" AND
(hp.isImpression IS NULL) AND
(
(h.promotionActionInfo.promoIsView is true) OR
(h.promotionActionInfo.promoIsClick is true)
)
But it seems that my WHERE clause (filtering on the promo views and clicks) is not working as I expect, because I get an empty table as a result.
Can anybody help me with this?
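One possible reading of the quoted attribution rules is that the promo click and the purchase sit on different hits of the same session, which would make a filter requiring both on one hit return nothing. The sketch below illustrates only the "last-clicked promotion in the session" rule on made-up, already flattened data; the table hits_flat and its columns are hypothetical, and view attribution and click-before-purchase ordering are deliberately left out.

```sql
-- Hypothetical flattened hits: two promo clicks, then a purchase hit
-- that carries no promotion information of its own.
WITH hits_flat AS (
  SELECT 's1' AS sessionid, 1 AS hitnumber, 'promoA' AS clicked_promoid,
         CAST(NULL AS STRING) AS transactionid
  UNION ALL SELECT 's1', 2, 'promoB', NULL
  UNION ALL SELECT 's1', 3, NULL, 'T100'
),
last_click AS (
  -- Latest non-NULL clicked promotion per session.
  SELECT
    sessionid,
    ARRAY_AGG(clicked_promoid IGNORE NULLS
              ORDER BY hitnumber DESC LIMIT 1)[SAFE_OFFSET(0)] AS promoid
  FROM hits_flat
  GROUP BY sessionid
)
-- Credit each transaction in the session to that promotion.
SELECT
  lc.promoid,
  COUNT(DISTINCT h.transactionid) AS transactions
FROM hits_flat AS h
JOIN last_click AS lc USING (sessionid)
WHERE h.transactionid IS NOT NULL
GROUP BY lc.promoid;
```

With this sample data the single transaction is credited to promoB, the last promotion clicked in the session.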

Google Analytics query: landing page and page paths

I am a newbie with SQL and BigQuery, so I hope you can help me with a standard SQL query I am working on.
The data set is from a Google Analytics roll-up property.
The objective of this query is to get, for each session: the date, the GA property, the number of transactions, the total revenue, some session-scoped custom dimensions, and two other elements I can't seem to grasp.
1) I would like to add the landing page for each session. There are some resources on the internet for that, but so far I haven't succeeded. Do you have any idea?
2) I would also like to add funnel steps based on page paths, for example a column "Step 1" that contains 1 or 0 depending on whether the session includes a page view on a certain page path. Do you have any idea how to do that?
This is my current query (sorry if it's not well formatted):
SELECT
date,
visitId,
hits.sourcePropertyInfo.sourcePropertyDisplayName AS service,
totals.transactions AS transactions,
totals.totalTransactionRevenue AS revenue,
ARRAY(
SELECT STRUCT(
MAX(IF(cd.index=3, cd.value, NULL)) AS endUserProvider,
MAX(IF(cd.index=2, cd.value, NULL)) AS connection,
MAX(IF(cd.index=10, cd.value, NULL)) AS sid,
MAX(IF(cd.index=11, cd.value, NULL)) AS price,
MAX(IF(cd.index=12, cd.value, NULL)) AS period,
MAX(IF(cd.index=13, cd.value, NULL)) AS serviceId,
MAX(IF(cd.index=14, cd.value, NULL)) AS promotion)
FROM UNNEST(hits.customDimensions) cd) result
FROM `wide-oasis-135923.126764585.ga_sessions_*`,
UNNEST(hits) hits
WHERE _TABLE_SUFFIX = '20171026'
LIMIT 100;
Thank you very much
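A possible approach to both points, sketched on a made-up inline session so it can be run as is: keep one row per session (no top-level UNNEST) and read the hits array through subqueries. The path '/checkout' and the aliases landing_page and step_1 are hypothetical; hitNumber, type and page.pagePath follow the GA export schema.

```sql
-- Hypothetical session with two page views, mimicking the hits array.
WITH sessions AS (
  SELECT
    1001 AS visitId,
    [
      STRUCT(1 AS hitNumber, 'PAGE' AS type, STRUCT('/home' AS pagePath) AS page),
      STRUCT(2 AS hitNumber, 'PAGE' AS type, STRUCT('/checkout' AS pagePath) AS page)
    ] AS hits
)
SELECT
  visitId,
  -- Landing page: path of the first page-view hit in the session.
  (SELECT h.page.pagePath
   FROM UNNEST(hits) AS h
   WHERE h.type = 'PAGE'
   ORDER BY h.hitNumber
   LIMIT 1) AS landing_page,
  -- Funnel step flag: 1 if any page view in the session matches the path.
  IF(EXISTS(
       SELECT 1
       FROM UNNEST(hits) AS h
       WHERE h.type = 'PAGE' AND h.page.pagePath = '/checkout'),
     1, 0) AS step_1
FROM sessions;
```

Keeping the session as the row grain also means session-level fields such as totals.transactions are not repeated once per hit.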