I'm trying to write queries that return the time an event was clicked and the time a page was viewed. Could you take a look and tell me whether they make sense?
Below is what I have for the event click time:
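select
-- || needs STRING operands, so the INT64 fields are cast; DISTINCT is redundant with GROUP BY
fullVisitorId||'.'||CAST(visitStartTime AS STRING)||'.'||CAST(visitNumber AS STRING) AS session_id,
hits.eventInfo.eventCategory,
-- hits.time is the offset in milliseconds from visitStartTime
MIN(DATETIME_ADD(DATETIME(TIMESTAMP_SECONDS(visitStartTime)), INTERVAL hits.time MILLISECOND)) AS Event_time
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
where
hits.type = "EVENT"
and date= '20220801' AND totals.visits = 1
group by 1,2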
This is what I have for the page view time:
SELECT
-- || needs STRING operands, so the INT64 fields are cast
fullVisitorId||'.'||CAST(visitStartTime AS STRING)||'.'||CAST(visitNumber AS STRING) AS session_id,
hits.page.pagePath,
-- hits.time is the offset in milliseconds from visitStartTime
MIN(DATETIME_ADD(DATETIME(TIMESTAMP_SECONDS(visitStartTime)), INTERVAL hits.time MILLISECOND)) AS PageVisit_time
from `xx.xx.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE hits.type = "PAGE"
and date= '20220801' AND totals.visits = 1
group by 1,2
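To sanity-check the timestamp arithmetic outside BigQuery, here is a minimal Python sketch of what both queries compute: session start (Unix seconds) plus the hit offset (milliseconds), then the earliest time per session and key. The sample rows and the function names are made up for illustration and are not part of the GA export.

```python
from datetime import datetime, timedelta, timezone


def hit_time(visit_start_time, hit_offset_ms):
    """Absolute UTC time of a hit: session start (Unix seconds) + offset (ms)."""
    start = datetime.fromtimestamp(visit_start_time, tz=timezone.utc)
    return start + timedelta(milliseconds=hit_offset_ms)


def first_hit_times(hits):
    """Earliest hit time per (session_id, key), like MIN(...) with GROUP BY 1, 2."""
    earliest = {}
    for session_id, key, visit_start_time, offset_ms in hits:
        ts = hit_time(visit_start_time, offset_ms)
        group = (session_id, key)
        if group not in earliest or ts < earliest[group]:
            earliest[group] = ts
    return earliest


# Hypothetical rows: (session_id, eventCategory, visitStartTime, hits.time)
sample_hits = [
    ("123.1659312000.1", "video", 1659312000, 5000),
    ("123.1659312000.1", "video", 1659312000, 1500),
    ("123.1659312000.1", "banner", 1659312000, 60000),
]
result = first_hit_times(sample_hits)
for group, ts in sorted(result.items()):
    print(group, ts.isoformat())
```

Only the earliest of the two "video" hits is kept for that session, which is the behaviour you would expect from MIN with the GROUP BY.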
In this task, the idea is to assign the sales credit (transactions and revenue) equally to the events that were clicked during the user's session. The output table would look like the one below, except that the revenue and transactions are split when the user had two events, three events, and so on.
The three scenarios below show how transactions and revenue should be shared between events. Does anyone have an idea how to adapt the code?
I've included code that assigns the sales, but it does not divide the revenue and transaction credit between events, so it would need to be modified.
three scenarios
output table
Grateful in advance for any help
with event_home_page as (select q.* except(isEntrance), if (isEntrance = true, 'true', 'false') isEntrance
from (
select
PARSE_DATE('%Y%m%d', CAST(date AS STRING)) as true_date,
hits.isEntrance,
hits.eventInfo.eventCategory,
hits.eventInfo.eventAction,
hits.eventInfo.eventLabel,
concat(fullvisitorid, cast(visitstarttime as string)) ID,
count(*) click
FROM `ga360.123456.ga_sessions_*`, unnest (hits) as hits
WHERE
_table_suffix = FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
and hits.page.pagePath in ('www.example.com/')
and regexp_contains(hits.eventInfo.eventCategory, '^clickable_element.*')
group by 1,2,3,4,5,6) q
),
transactions as (
select
PARSE_DATE('%Y%m%d', CAST(date AS STRING)) as true_date,
concat(fullvisitorid, cast(visitstarttime as string)) ID,
sum(totals.totalTransactionRevenue/1000000) as all_revenue,
sum(totals.transactions) all_transactions
FROM `ga360.123456.ga_sessions_*`
WHERE
_table_suffix = FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
group by 1,2
)
select hp.true_date, hp.isEntrance, hp.eventCategory, hp.eventAction, hp.eventLabel, hp.click, t.all_revenue revenue, t.all_transactions transactions
from event_home_page hp left join transactions t on hp.true_date=t.true_date and hp.id=t.id
order by revenue desc
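As a rough illustration of the equal split being asked for (not a drop-in change to the query), here is a small Python sketch: each session's revenue and transactions are divided by the number of events clicked in that session and then summed per event. The dictionaries and the function name are hypothetical. In the query itself, the divisor would presumably come from a window count of events per session ID.

```python
from collections import defaultdict


def split_credit_equally(session_events, session_totals):
    """Share each session's revenue and transactions equally among its events.

    session_events: {session_id: [event_label, ...]}  (distinct events clicked)
    session_totals: {session_id: (revenue, transactions)}
    Returns {event_label: [revenue_credit, transaction_credit]}.
    """
    credit = defaultdict(lambda: [0.0, 0.0])
    for session_id, events in session_events.items():
        if not events:
            continue  # nothing to credit in this session
        revenue, transactions = session_totals.get(session_id, (0.0, 0))
        share = 1 / len(events)
        for event in events:
            credit[event][0] += revenue * share
            credit[event][1] += transactions * share
    return dict(credit)


# Hypothetical sessions: s1 clicked two events, s2 clicked one
session_events = {"s1": ["banner", "carousel"], "s2": ["banner"]}
session_totals = {"s1": (100.0, 1), "s2": (60.0, 2)}
credit = split_credit_equally(session_events, session_totals)
print(credit)
```

With two events in s1, each gets half of that session's revenue and half a transaction, while the single event in s2 keeps the full credit.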
I'm trying to get a table with 3 columns:
date
total events for a specific event action
total sessions for the whole site
This is what I have at the moment:
SELECT
CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
COUNT(CASE WHEN hits.eventinfo.eventaction = 'Add to Cart' THEN hits.eventinfo.eventaction END) AS AddtoCart,
SUM(totals.visits) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits
WHERE
_table_suffix BETWEEN '20160801' AND '20160815'
GROUP BY date
ORDER BY date desc
The problem is that unnesting hits seems to multiply my number of sessions. This is what I get without the unnesting:
SELECT
CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
SUM(totals.visits) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_table_suffix BETWEEN '20160801' AND '20160815'
GROUP BY date
ORDER BY date desc
How can I have the 2 in the same table?
Thanks for your help!
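You can try creating a session ID and then counting the distinct session IDs. After UNNEST(hits), the session-level totals.visits value is repeated on every hit row, so summing it overcounts.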
SELECT
CAST(format_date('%Y-%m-%d', parse_date("%Y%m%d", date)) AS DATE) as DATE,
COUNT(CASE WHEN hits.eventinfo.eventaction = 'Add to Cart' THEN hits.eventinfo.eventaction END) AS AddtoCart,
COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS string),CAST(visitId AS string))) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits
WHERE
_table_suffix BETWEEN '20160801' AND '20160815'
GROUP BY date
ORDER BY date desc
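To see why the session count gets multiplied, here is a toy Python model of the flattening, under the assumption that UNNEST(hits) yields one row per hit with the session-level fields copied onto each row. The sample sessions are invented.

```python
# Hypothetical sessions: a session-level visits flag plus a list of hit actions
sessions = [
    {"fullVisitorId": "A", "visitId": 1, "visits": 1,
     "hits": ["Add to Cart", "Quickview Click", "Add to Cart"]},
    {"fullVisitorId": "B", "visitId": 2, "visits": 1,
     "hits": ["Quickview Click"]},
]

# Flatten like UNNEST(hits): one row per hit, session fields repeated on each row
rows = [
    {"session_id": f'{s["fullVisitorId"]}{s["visitId"]}',
     "visits": s["visits"],
     "action": action}
    for s in sessions
    for action in s["hits"]
]

inflated_visits = sum(r["visits"] for r in rows)                 # counts every hit row
add_to_cart = sum(r["action"] == "Add to Cart" for r in rows)    # event count is fine
distinct_sessions = len({r["session_id"] for r in rows})         # true session count

print(inflated_visits, add_to_cart, distinct_sessions)
```

Summing visits over the flattened rows gives 4 for only 2 sessions, while counting distinct session IDs gives 2, which is what the COUNT(DISTINCT ...) in the answer relies on.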
I have an event table (user_id, timestamp). I need to write a query that defines user sessions (every user can have more than one session, and every session has one or more events). A session ends after 30 minutes of user inactivity.
The output table should have the following format: (user_id, start_session, end_session). I wrote part of the query, but I have no idea what to do next.
select
t.user_id,
t.ts start_session,
t.next_ts
from ( select
user_id,
ts,
DATEDIFF(SECOND, lag(ts, 1) OVER(partition by user_id order by ts), ts) next_ts
from
events_tabl ) t
You want a cumulative sum to identify the sessions and then aggregation:
select user_id, session_id, min(ts) as start_session, max(ts) as end_session
from (select e.*,
-- a gap of 30 minutes or more (or no previous event) starts a new session
sum(case when prev_ts > dateadd(minute, -30, ts)
then 0 else 1
end) over (partition by user_id order by ts) as session_id
from (select e.*,
lag(ts) over (partition by user_id order by ts) as prev_ts
from events_tabl e
) e
) e
group by user_id, session_id;
Note that I changed the date/time logic from using datediff() to a direct comparison of the times. datediff() counts the number of "boundaries" between two times. So, there is 1 hour between 12:59 a.m. and 1:01 a.m. -- but zero hours between 1:01 a.m. and 1:59 a.m.
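The boundary behaviour is easy to reproduce outside the database. The sketch below mimics boundary counting for hours in Python (the helper name is hypothetical, it is not a real datediff implementation) and compares it with the elapsed time for the two examples above.

```python
from datetime import datetime


def hour_boundaries(start, end):
    """Count the hour boundaries crossed between start and end."""
    def truncate(t):
        return t.replace(minute=0, second=0, microsecond=0)
    return int((truncate(end) - truncate(start)).total_seconds() // 3600)


# 12:59 a.m. -> 1:01 a.m.: two minutes elapsed, one boundary crossed
a = hour_boundaries(datetime(2022, 1, 1, 0, 59), datetime(2022, 1, 1, 1, 1))
# 1:01 a.m. -> 1:59 a.m.: 58 minutes elapsed, no boundary crossed
b = hour_boundaries(datetime(2022, 1, 1, 1, 1), datetime(2022, 1, 1, 1, 59))
print(a, b)
```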
Although handling the diffs at the second level produces similar results, you can run into cases where you are working with seconds or milliseconds and the time span is too long to fit into an integer, which causes overflow errors. It is just easier to work directly with the date/time values.
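For anyone who wants to check the session logic on a few rows before running it on the real table, here is a minimal Python sketch of the same idea: compare each event directly with the user's previous event and start a new session when the gap is 30 minutes or more. The function name and sample events are made up.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Return [(user_id, start_session, end_session)] from (user_id, ts) events.

    An event joins the user's current session when it is less than `gap`
    after that session's latest event; otherwise it opens a new session.
    """
    sessions = []      # each entry: [user_id, start_session, end_session]
    current = {}       # user_id -> index of that user's open session
    for user_id, ts in sorted(events):
        idx = current.get(user_id)
        if idx is not None and ts - sessions[idx][2] < gap:
            sessions[idx][2] = ts
        else:
            sessions.append([user_id, ts, ts])
            current[user_id] = len(sessions) - 1
    return [tuple(s) for s in sessions]


# Hypothetical events
events = [
    ("u1", datetime(2022, 1, 1, 10, 0)),
    ("u1", datetime(2022, 1, 1, 10, 10)),
    ("u1", datetime(2022, 1, 1, 11, 0)),
    ("u2", datetime(2022, 1, 1, 10, 5)),
]
result = sessionize(events)
for row in result:
    print(row)
```

Here u1 ends up with two sessions (10:00 to 10:10, then a single-event session at 11:00) and u2 with one.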
Taking what has been described at https://webmasters.stackexchange.com/a/87523, together with my own understanding, I've come up with what I think would be considered "Returning Users".
1. First, a query to show users who had their latest "first visit" within a two-year period:
SELECT
parsedDate,
CASE
# return fullVisitorId when the first latest visit is between 2 years and today
WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
END fullVisitorId
FROM (
SELECT
# convert the date field from string to date and get the latest date
PARSE_DATE('%Y%m%d',
MAX(date)) parsedDate,
fullVisitorId
FROM
`project.dataset.ga_sessions_*`
WHERE
# only show fullVisitorId if first visit
totals.newVisits = 1
GROUP BY
fullVisitorId)
2. Then a separate query to select some fields within a specific date range:
SELECT
PARSE_DATE('%Y%m%d',
date) parsedDate,
fullVisitorId,
visitId,
totals.newVisits,
totals.visits,
totals.bounces,
device.deviceCategory
FROM
`project.dataset.ga_sessions_*`
WHERE
_TABLE_SUFFIX = "20180118"
3. Joining these two queries together to find "Returning Users":
SELECT
q1.parsedDate date,
COUNT(DISTINCT q1.fullVisitorId) users,
# Default way to determine New Users
SUM(q1.newVisits) newVisits,
# Number of "New Users" based on my queries (matches with default way above)
COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, NULL, q2.fullVisitorId)) newUsers,
# Number of "Returning Users" based on my queries
COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, q2.fullVisitorId, NULL)) returningUsers
FROM (
(SELECT
PARSE_DATE('%Y%m%d',
date) parsedDate,
fullVisitorId,
visitId,
totals.newVisits,
totals.visits,
totals.bounces,
device.deviceCategory
FROM
`project.dataset.ga_sessions_*`
WHERE
_TABLE_SUFFIX = "20180118") q1
LEFT JOIN (
SELECT
parsedDate,
CASE
# return fullVisitorId when the first latest visit is between 2 years and today
WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
END fullVisitorId
FROM (
SELECT
# convert the date field from string to date and get the latest date
PARSE_DATE('%Y%m%d',
MAX(date)) parsedDate,
fullVisitorId
FROM
`project.dataset.ga_sessions_*`
WHERE
# only show fullVisitorId if first visit
totals.newVisits = 1
GROUP BY
fullVisitorId)) q2
ON q1.fullVisitorId = q2.fullVisitorId)
GROUP BY
date
Results in BQ
Un-sampled new/returning visitors split by Users report for the same period in GA
Questions/Issues:
Given that newVisits (the default field) and newUsers (my calculation) give the same results, which is in line with the New Visitor users in the GA report, why is there a mismatch between GA's Returning Visitor users and my calculation of returningUsers in BQ? Can these two even be compared? What am I missing?
Is my approach the most efficient and least verbose way of going about this?
Is there a better way to get the figures, something I'm missing?
SOLUTION
Based on Martin's answer below, I managed to create the "Returning Users" metric/field within the context of the query I was running:
SELECT
date,
deviceCategory,
# newUsers - SUM result if it's a new user
SUM(IF(userType="New Visitor", 1, 0)) newUsers,
# returningUsers - COUNT DISTINCT fullvisitorId if it's a returning user
COUNT(DISTINCT IF(userType="Returning Visitor", fullvisitorid, NULL)) returningUsers,
COUNT(DISTINCT fullvisitorid) users,
SUM(visits) sessions
FROM (
SELECT
date,
fullVisitorId,
visitId,
totals.visits,
device.deviceCategory,
IF(totals.newVisits IS NOT NULL, "New Visitor", "Returning Visitor") userType
FROM
`project.dataset.ga_sessions_20180118` )
GROUP BY
deviceCategory,
date
Google Analytics uses approximations for users (fullvisitorid) - even if it says "based on 100%". You get better user numbers when using an unsampled report.
Another thing to mention: fullvisitorids are taken into consideration even if totals.visits != 1, while sessions are only counted where totals.visits = 1.
Also, users are double-counted if they were new and then returned. That means this should give you the correct numbers:
SELECT
totals.newVisits IS NOT NULL AS isNew,
COUNT(DISTINCT fullvisitorid) AS visitors,
SUM(totals.visits) AS sessions
FROM
`project.dataset.ga_sessions_20180214`
GROUP BY
1
If you want to avoid double counting you can use this, where a user is counted as new even if she returned:
WITH
visitors AS (
SELECT
fullvisitorid,
-- check if any visit of this visitor was new - will be used for grouping later
MAX(totals.newVisits ) isNew,
SUM(totals.visits) as sessions
FROM
`project.dataset.ga_sessions_20180214`
GROUP BY 1
)
SELECT
isNew IS NOT NULL AS isNew,
COUNT(1) AS visitors,
sum(sessions) as sessions
FROM
visitors
GROUP BY 1
Of course, these numbers match GA only in the totals.
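The double counting is easier to see on a handful of rows. Below is a small Python sketch with invented visits, where newVisits is 1 for a first visit and None otherwise, comparing the per-session grouping with the per-visitor grouping.

```python
# Hypothetical visits: (fullVisitorId, totals.newVisits)
visits = [("A", 1), ("A", None), ("B", None), ("C", 1)]

# Grouping by the session-level flag: visitor A lands in both groups
new_by_session = {v for v, n in visits if n is not None}
returning_by_session = {v for v, n in visits if n is None}
total_visitors = len({v for v, _ in visits})

# Grouping per visitor: new if any of the visitor's sessions was a first visit
is_new = {}
for visitor, new_visits in visits:
    is_new[visitor] = is_new.get(visitor, False) or new_visits is not None
new_visitors = sum(is_new.values())
returning_visitors = len(is_new) - new_visitors

print(len(new_by_session) + len(returning_by_session), total_visitors)
print(new_visitors, returning_visitors)
```

The per-session groups add up to 4 for 3 distinct visitors, while the per-visitor groups add up to exactly 3.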
I have a query that collects data for a dynamic date range (the last 7 days) from one dataset in BigQuery. My data source is Google Analytics, so I have other datasets connected with an identical schema. I'd like my query to also return data from the other datasets. Usually I would use a UNION ALL for this, but my query contains a complex categorization expression that needs to be updated regularly, and I'd rather not maintain it separately for each dataset.
Could you advise on how to query across datasets, or suggest a more elegant way to handle the UNION ALL approach?
SELECT
Date,
COUNT(DISTINCT VisitId) AS users,
COUNT(VisitId) AS sessions,
SUM(totals.transactions) AS orders,
CASE
# Organic Search - Google
WHEN ( channelGrouping LIKE "Organic Search"
OR trafficSource.source LIKE "com.google.android.googlequicksearchbox")
AND trafficSource.source LIKE "%google%" THEN "Organic Search - Google"
ELSE "Other"
END AS Channel,
hits.page.hostname AS site
FROM
`xxx.dataset1.ga_sessions_20*`
CROSS JOIN
UNNEST (hits) AS hits
WHERE
parse_DATE('%y%m%d',
_table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 day)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 day)
AND totals.visits = 1
AND hits.isEntrance IS TRUE
GROUP BY
Date,
Channel,
hits.isEntrance
ORDER BY
Users DESC
UPDATE: Thanks to the responses below, I have got as far as the following. The query reads all datasets in the UNION, but the date range is not applied; instead, all data is being queried. Any ideas why it's not picking it up?
SELECT
Date,
LOWER(hits.page.hostname) AS site,
IFNULL(COUNT(VisitId),0) AS sessions,
IFNULL(SUM(totals.transactions),0) AS orders,
IFNULL(ROUND(SUM(totals.transactions)/COUNT(VisitId),4),0) AS conv_rate,
# Channel definition starts here
CASE
# Organic Search - Google
WHEN ( channelGrouping LIKE "Organic Search"
OR trafficSource.source LIKE "com.google.android.googlequicksearchbox")
AND trafficSource.source LIKE "%google%" THEN "Organic Search - Google"
ELSE "Other"
END AS Channel
FROM (
SELECT * FROM `xxx.43786551.ga_sessions_20*` UNION ALL
SELECT * FROM `xxx.43786097.ga_sessions_20*` UNION ALL
SELECT * FROM `xxx.43786092.ga_sessions_20*`
WHERE PARSE_DATE('%Y%m%d',_TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)
CROSS JOIN UNNEST (hits) AS hits
WHERE totals.visits = 1
AND hits.isEntrance IS TRUE
GROUP BY
Date,
channel,
hits.isEntrance,
site
HAVING hits.isEntrance IS TRUE
#standardSQL
SELECT
DATE,
COUNT(DISTINCT VisitId) AS users,
COUNT(VisitId) AS sessions,
SUM(totals.transactions) AS orders,
CASE
# Organic Search - Google
WHEN ( channelGrouping LIKE "Organic Search"
OR trafficSource.source LIKE "com.google.android.googlequicksearchbox")
AND trafficSource.source LIKE "%google%" THEN "Organic Search - Google"
ELSE "Other"
END AS Channel,
hits.page.hostname AS site
FROM (
SELECT * FROM `xxx.dataset1.ga_sessions_20*` WHERE PARSE_DATE('%y%m%d',_TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL SELECT * FROM `xxx.dataset2.ga_sessions_20*` WHERE PARSE_DATE('%y%m%d',_TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL SELECT * FROM `xxx.dataset3.ga_sessions_20*` WHERE PARSE_DATE('%y%m%d',_TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)
CROSS JOIN UNNEST (hits) AS hits
WHERE totals.visits = 1
AND hits.isEntrance IS TRUE
GROUP BY
DATE,
Channel,
site
ORDER BY
Users DESC
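A likely reason the date filter in the UPDATE query had no effect on the first two datasets is that a WHERE clause belongs only to the SELECT it follows, not to the whole UNION ALL. The sketch below reproduces that scoping with Python's built-in sqlite3 module as a stand-in for BigQuery; the table names, the day column and the dates are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ds1 (day TEXT);
    CREATE TABLE ds2 (day TEXT);
    CREATE TABLE ds3 (day TEXT);
    INSERT INTO ds1 VALUES ('2018-01-01'), ('2018-01-18');
    INSERT INTO ds2 VALUES ('2018-01-01'), ('2018-01-18');
    INSERT INTO ds3 VALUES ('2018-01-01'), ('2018-01-18');
""")

# WHERE only on the last branch: ds1 and ds2 are read in full
last_branch_only = conn.execute("""
    SELECT day FROM ds1 UNION ALL
    SELECT day FROM ds2 UNION ALL
    SELECT day FROM ds3 WHERE day >= '2018-01-10'
""").fetchall()

# WHERE repeated on every branch: each dataset is filtered
every_branch = conn.execute("""
    SELECT day FROM ds1 WHERE day >= '2018-01-10' UNION ALL
    SELECT day FROM ds2 WHERE day >= '2018-01-10' UNION ALL
    SELECT day FROM ds3 WHERE day >= '2018-01-10'
""").fetchall()

print(len(last_branch_only), len(every_branch))
```

The first query returns 5 rows (only ds3 is filtered) and the second returns 3, which is why the query above repeats the WHERE on each SELECT.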