BigQuery - COUNTIF when Unix timestamp meets event_date in YYYYMMDD format?

I'm working with Google Analytics App plus Web data in BigQuery.
I want to count a user as new_user when the user_first_touch_timestamp value matches the table's event_date value. This would result in a count of new users who visited the site on a particular day.
Example value in user_first_touch_timestamp
1595912758378962
Example value in event_date
20200809
How can I do this?
Thanks.

Below is for BigQuery Standard SQL
You should parse both values to the same DATE type - as below
PARSE_DATE('%Y%m%d', event_date) AS event_date_day
and
DATE(TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_timestamp_day
After this is done, you can do whatever comparison you need.
For example, if you want to use it in a WHERE clause, it can look like below:
WHERE PARSE_DATE('%Y%m%d', event_date) = DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
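Putting the two expressions together, a per-day count of new users could look like the sketch below. This is an illustration, not a tested query: COUNTIF is standard BigQuery, but the `project.dataset.events_*` table name and the user_pseudo_id column are assumptions based on the usual GA4 / App + Web export schema, so adjust them to your table.

```sql
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_date_day,
  -- events where the user's first touch fell on the event's own day
  COUNTIF(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
          = PARSE_DATE('%Y%m%d', event_date)) AS new_user_events,
  -- distinct users whose first touch fell on that day (user_pseudo_id is assumed)
  COUNT(DISTINCT IF(DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))
                    = PARSE_DATE('%Y%m%d', event_date),
                    user_pseudo_id, NULL)) AS new_users
FROM `project.dataset.events_*`
GROUP BY event_date_day
```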

Related

With BigQuery variable @DS_END_DATE I am losing 24 hours

So, here is a piece of a query going from BigQuery to a Google Data Studio report:
WHERE creation_date IN (NULL, '1970-01-01T00:00:00') OR creation_date BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE) AND PARSE_DATE('%Y%m%d', @DS_END_DATE)
@DS_END_DATE lets users select a date range in the GDS report. However, when filtering the data, I am missing the last day. For example, if @DS_END_DATE is '2022-10-20', I can't find rows with creation_date = '2022-10-20T14:03:08' in the report; the report only contains data up until '2022-10-20'.
How can I query BigQuery to get everything from @DS_END_DATE as if it were '2022-10-20T23:59:59'?
P.S. Yeah, there were some troubles the first time, so I am using creation_date IN (NULL, '1970-01-01T00:00:00') to identify deleted IDs, but that doesn't matter here.
When BigQuery does the comparison, it normalizes the data types and casts your provided date as a timestamp. After this conversion it looks like 2022-10-20 00:00:00 UTC. Given that, you can see why it is dropping rows on 2022-10-20.
To alleviate this you can do something like:
SELECT
  creation_date
FROM sample_data
WHERE CAST(creation_date AS DATE) BETWEEN '2022-10-19' AND '2022-10-20'
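The same cast works with the Data Studio date parameters directly, so the report's end day is kept whole. A sketch, reusing the sample_data table from the answer above and the @DS_START_DATE / @DS_END_DATE parameters from the Data Studio BigQuery connector:

```sql
SELECT creation_date
FROM sample_data
WHERE CAST(creation_date AS DATE)
      -- both parameters arrive as 'YYYYMMDD' strings, so parse them to DATE
      BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
          AND PARSE_DATE('%Y%m%d', @DS_END_DATE)
```

Because both sides of the comparison are now plain DATEs, a creation_date of '2022-10-20T14:03:08' falls inside an end date of '2022-10-20'.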

Results within BigQuery do not match those in GA4

I'm inside BigQuery running the query below to see how many users I had from August 1st to August 14th, but the number does not match what GA4 presents me.
WITH event AS (
  SELECT
    user_id,
    event_name,
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    TIMESTAMP_MICROS(event_timestamp) AS event_timestamp,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY TIMESTAMP_MICROS(event_timestamp) DESC) AS rn
  FROM
    `events_*`
  WHERE
    event_name = 'push_received')
SELECT COUNT(DISTINCT user_id)
FROM
  event
WHERE
  event_date >= '2022-08-01'
GA4 result
BQ result = 37024
There are quite a few reasons why your GA4 data in the web UI will not match the BigQuery export and the Data API.
In this case, I believe you are running into the Time Zone issue. event_date is the date that the event was logged in the registered timezone of your Property. However, event_timestamp is a time in UTC that the event was logged by the client.
To resolve this, simply update your query with:
EXTRACT(DATETIME FROM TIMESTAMP_MICROS(`event_timestamp`) at TIME ZONE 'TIMEZONE OF YOUR PROPERTY' )
Your data should then match the WebUI and the GA4 Data API. This post that I co-authored goes into more detail on this and other reasons why your data doesn't match: https://analyticscanvas.com/3-reasons-your-ga4-data-doesnt-match/
You cannot simply compare totals. Divide it into daily comparisons and look at details.
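To make the fix concrete, the asker's query could derive the date from event_timestamp in the property's time zone instead of relying on event_date. A sketch only: 'America/Sao_Paulo' stands in for the property's actual time zone, and the August 1st–14th range from the question is applied on both ends.

```sql
WITH event AS (
  SELECT
    user_id,
    -- convert the UTC microsecond timestamp into the property's local calendar day
    DATE(EXTRACT(DATETIME FROM TIMESTAMP_MICROS(event_timestamp)
         AT TIME ZONE 'America/Sao_Paulo')) AS event_date_local
  FROM `events_*`
  WHERE event_name = 'push_received')
SELECT COUNT(DISTINCT user_id)
FROM event
WHERE event_date_local BETWEEN '2022-08-01' AND '2022-08-14'
```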

Use DataStudio to specify the date range for a custom query in BigQuery, where the date range influences operators in the query

I currently have a DataStudio dashboard connected to a BigQuery custom query.
That BQ query has a hardcoded date range and the status of one of the columns (New_or_Relicensed) can change dynamically for a row, based on the dates specified in the range. I would like to be able to alter that range from DataStudio.
I have tried:
simply connecting the DS dashboard to the custom query in BQ and then introducing a date range filter, but as you can imagine - that does not work because it's operating on an already hard-coded date range.
reviewing similar answers, but their problem doesn't appear to be quite the same, e.g. BigQuery Data Studio Custom Query
Here is the query I have in BQ:
SELECT t0.New_Or_Relicensed, t0.Title_Category FROM (WITH
report_range AS
(
SELECT
TIMESTAMP '2019-06-24 00:00:00' AS start_date,
TIMESTAMP '2019-06-30 00:00:00' AS end_date
)
SELECT
schedules.schedule_entry_id AS Schedule_Entry_ID,
schedules.schedule_entry_starts_at AS Put_Up,
schedules.schedule_entry_ends_at AS Take_Down,
schedule_entries_metadata.contract AS Schedule_Entry_Contract,
schedules.platform_id AS Platform_ID,
platforms.platform_name AS Platform_Name,
titles_metadata.title_id AS Title_ID,
titles_metadata.name AS Title_Name,
titles_metadata.category AS Title_Category,
IF (other_schedules.schedule_entry_id IS NULL, "new", "relicensed") AS New_Or_Relicensed
FROM
report_range, client.schedule_entries AS schedules
JOIN client.schedule_entries_metadata
ON schedule_entries_metadata.schedule_entry_id = schedules.schedule_entry_id
JOIN
client.platforms
ON schedules.platform_id = platforms.platform_id
JOIN
client.titles_metadata
ON schedules.title_id = titles_metadata.title_id
LEFT OUTER JOIN
client.schedule_entries AS other_schedules
ON schedules.platform_id = other_schedules.platform_id
AND other_schedules.schedule_entry_ends_at < report_range.start_date
AND schedules.title_id = other_schedules.title_id
WHERE
((schedules.schedule_entry_starts_at >= report_range.start_date AND
schedules.schedule_entry_starts_at <= report_range.end_date) OR
(schedules.schedule_entry_ends_at >= report_range.start_date AND
schedules.schedule_entry_ends_at <= report_range.end_date))
) AS t0 LIMIT 100;
Essentially - I would like to be able to set the start_date and end_date from google data studio, and have those dates incorporated into the report_range that then influences the operations in the rest of the query (that assign a schedule entry as new or relicensed).
Have you looked at using the Custom Query interface of the BigQuery connector in Data Studio to define start_date and end_date as parameters as part of a filter?
Your query would need a little re-work...
The following example custom query uses the @DS_START_DATE and @DS_END_DATE parameters as part of a filter on the creation date column of a table. The records produced by the query are limited to the date range selected by the report user, reducing the number of records returned and resulting in a faster query:
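The example query itself is missing from the post; a minimal sketch of the pattern, assuming a hypothetical table with a creation_date DATE column, would be:

```sql
SELECT *
FROM `project.dataset.table`
WHERE creation_date BETWEEN PARSE_DATE('%Y%m%d', @DS_START_DATE)
                        AND PARSE_DATE('%Y%m%d', @DS_END_DATE)
```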
Resources:
Introducing BigQuery parameters in Data Studio
https://www.blog.google/products/marketingplatform/analytics/introducing-bigquery-parameters-data-studio/
Running parameterized queries
https://cloud.google.com/bigquery/docs/parameterized-queries
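Applied to the query in the question, the rework largely amounts to replacing the hard-coded report_range CTE with the Data Studio parameters, and keeping the rest of the query as it is. A sketch under that assumption:

```sql
WITH report_range AS (
  SELECT
    -- @DS_START_DATE / @DS_END_DATE arrive as 'YYYYMMDD' strings
    TIMESTAMP(PARSE_DATE('%Y%m%d', @DS_START_DATE)) AS start_date,
    TIMESTAMP(PARSE_DATE('%Y%m%d', @DS_END_DATE)) AS end_date
)
-- ... rest of the original query unchanged ...
SELECT 1
```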
I had a similar issue where I wanted to incorporate a 30-day look-back before the start date (@DS_START_DATE). In this case I was using Google Analytics UA session data and a table suffix in my WHERE clause. I was able to calculate a date relative to the built-in Data Studio "string" dates by using the following:
...
WHERE
  _table_suffix BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(PARSE_DATE('%Y%m%d', @DS_START_DATE), INTERVAL 30 DAY))
  AND
    FORMAT_DATE('%Y%m%d', PARSE_DATE('%Y%m%d', @DS_END_DATE))

I need to add a dynamic date variable in a BigQuery query via Klipfolio

I've got user data in BigQuery, from a Firebase app, and I am using Klipfolio to extract the data. I want to extract engaged-user data for a time range selected by the user, so I need to add dynamic date variables to my SQL query. Klipfolio supports using dynamic date variables in a query; it's the syntax for introducing start and end date variables I'm not sure about.
I can already extract the data by date - but such a table does not work for engaged users as the same users will be counted multiple times.
#standardSQL
SELECT
  event_date,
  COUNT(DISTINCT user_pseudo_id) AS engagedUsers
FROM
  `dataTable`
WHERE
  event_name = 'user_engagement'
GROUP BY
  event_date
ORDER BY
  event_date
I'm looking for the number of active users between a start and end date variable.
Assuming you want to filter user data by an event_date field in your table, using start and end dates coming from date pickers in Klipfolio, and you have used the variable names "start_date" and "end_date", your SQL query can look like this:
SELECT
  event_date,
  COUNT(DISTINCT user_pseudo_id) AS engagedUsers
FROM
  `dataTable`
WHERE
  event_name = 'user_engagement' AND
  event_date >= '{props.start_date}' AND
  event_date <= '{props.end_date}'
GROUP BY
  event_date
ORDER BY
  event_date
"props" calls the variable value in Klipfolio defined after the dot and swaps that in before sending the query to the specified service.
Ensure the output format of your date pickers is yyyy-MM-dd to match the valid format for your SQL query.

Bigquery: Group timestamp by month

I am new to BigQuery, and I can show the timestamp like this:
select event_timestamp as timestamp1
FROM `alive-ios.analytics_160092165.events_201810*`
GROUP BY timestamp1
The output looks like this. How can I group those by month? Is it like this?
https://www.pascallandau.com/bigquery-snippets/convert-timestamp-date-datetime-to-different-timezone/
I tried with TO_CHAR, DATE, etc., and it did not work.
It sounds like you want the TIMESTAMP_TRUNC function, e.g.
select TIMESTAMP_TRUNC(event_timestamp, MONTH) as timestamp1
FROM `alive-ios.analytics_160092165.events_201810*`
GROUP BY timestamp1
Below is for BigQuery Standard SQL
SELECT
FORMAT_TIMESTAMP('%Y-%m', TIMESTAMP_MICROS(event_timestamp)) month,
COUNT(1) events
FROM `project.dataset.table`
GROUP BY month
Note: most likely you want to count events for each month, so I added COUNT(1), but you can add whatever you need - like SUM(amount), for example, if you want to calculate some metric.
Also, your wildcard expression is built in such a way that it will only have events for the month of October 2018 (assuming the table name represents the time of the event) - so you will need to relax your wildcard expression a little, to (for example) alive-ios.analytics_160092165.events_2018*, so you will have events for all months of 2018.
The above assumes your event_timestamp is represented in microseconds.
If in reality it is of TIMESTAMP type - just remove the use of the TIMESTAMP_MICROS() function.
Building on Elliott's example, I think you need to convert the value to a timestamp first. From your example data, I think you need TIMESTAMP_MICROS:
select TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), MONTH) as timestamp1
FROM `alive-ios.analytics_160092165.events_201810*`
GROUP BY timestamp1