BigQuery StandardSQL: Last 7 Days using _TABLE_SUFFIX

Question: I want to pull data from multiple Google Analytics sessions tables using _TABLE_SUFFIX, but I want the suffix range to run from "seven days ago" to "one day ago" (i.e. pull data for the last 7 days).
My current query (which doesn't work):
#StandardSQL
SELECT
date,
SUM (totals.visits) AS visits
FROM
`projectname.123456789.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN
'DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -7 DAY)' AND
'DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)'
GROUP BY
date
ORDER BY
date ASC
This gives me the message "Valid: This query will process 0 B when run." To my eyes there is no syntax error, but BigQuery seems unable to read my date functions and therefore unable to append them as suffixes to the ga_sessions_* wildcard.
Inspiration:
The BigQuery Cookbook has a legacy SQL example that I have been basing this on (https://support.google.com/analytics/answer/4419694?hl=en#7days):
#LegacySQL
SELECT
date,
SUM (totals.visits) AS visits
FROM
(TABLE_DATE_RANGE([73156703.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC
Things I've tried (none of which work):
Using DATE_SUB instead of DATE_ADD and using CURRENT_DATE instead of CURRENT_TIMESTAMP:
WHERE
_TABLE_SUFFIX BETWEEN
'DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)' AND
'DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)'
Resulting in "Valid: This query will process 0 B when run."
Using FORMAT_DATE around DATE_SUB and CURRENT_DATE to get the dates without dashes:
WHERE
_TABLE_SUFFIX BETWEEN
'FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))' AND
'FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))'
Resulting in "Valid: This query will process 0 B when run."
Removing the quotes ('') around the DATE_SUB expressions:
WHERE
_TABLE_SUFFIX BETWEEN
DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
Resulting in the following error message "Error: No matching signature for operator BETWEEN for argument types: STRING, DATE, DATE. Supported signature: (ANY) BETWEEN (ANY) AND (ANY)"
Thanks in advance,

Elliott's answer is correct, but if you want the best performance out of BigQuery for this kind of query, then instead of converting _TABLE_SUFFIX to a DATE, you should convert the CURRENT_DATE expressions to strings:
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
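This works because zero-padded YYYYMMDD strings sort in the same order as the dates they represent, so a plain string BETWEEN selects the right tables. A small Python model of that filter (not BigQuery code; the table suffixes are made up) also shows why the quoted versions in the question match nothing: quoted, the function call is just literal text.

```python
from datetime import date, timedelta


def suffix_bounds(today: date) -> tuple:
    """(start, end) suffixes for 'seven days ago' to 'one day ago',
    formatted like FORMAT_DATE("%Y%m%d", DATE_SUB(...))."""
    start = (today - timedelta(days=7)).strftime("%Y%m%d")
    end = (today - timedelta(days=1)).strftime("%Y%m%d")
    return start, end


def tables_in_range(suffixes, start, end):
    # Plain string comparison, as when BETWEEN compares STRING values.
    return [s for s in suffixes if start <= s <= end]


# Hypothetical daily table suffixes for 1-14 March 2024.
suffixes = [(date(2024, 3, 1) + timedelta(days=i)).strftime("%Y%m%d") for i in range(14)]

start, end = suffix_bounds(date(2024, 3, 10))
print(start, end)                                   # 20240303 20240309
print(len(tables_in_range(suffixes, start, end)))   # 7

# Quoting the function call makes it literal text; no suffix falls in range.
print(tables_in_range(suffixes,
                      "DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)",
                      "DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)"))  # []
```

The ordering also holds across month boundaries, since every component is zero-padded to a fixed width.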

You were almost there with your last attempt. _TABLE_SUFFIX is a STRING, so you need to convert it to a DATE before comparing it with DATE values:
WHERE
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) BETWEEN
DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
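The same filter with the suffix parsed first can be modelled in Python as below. This is a sketch, not BigQuery code: the suffix list is hypothetical, and skipping suffixes that do not parse as dates is this sketch's own choice.

```python
from datetime import date, datetime, timedelta


def tables_in_range_parsed(suffixes, today):
    """Parse each suffix into a date first (like PARSE_DATE('%Y%m%d', _TABLE_SUFFIX)),
    then compare it with the DATE bounds."""
    start = today - timedelta(days=7)
    end = today - timedelta(days=1)
    selected = []
    for s in suffixes:
        try:
            d = datetime.strptime(s, "%Y%m%d").date()
        except ValueError:
            # Sketch-only choice: ignore suffixes that are not plain dates.
            continue
        if start <= d <= end:
            selected.append(s)
    return selected


suffixes = ["20240302", "20240303", "20240309", "20240310", "intraday_20240310"]
print(tables_in_range_parsed(suffixes, date(2024, 3, 10)))  # ['20240303', '20240309']
```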

This works for anyone just looking to segment out the last week of data in BigQuery. It works for any dataset, as long as the table has a timestamp column!
where TIMESTAMPFIELD >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
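Note that this condition is a rolling window measured from the current instant (7 × 24 hours), not 7 whole calendar days, so the oldest day is only partly included. A Python sketch of the same comparison, with made-up event times:

```python
from datetime import datetime, timedelta, timezone


def within_last_7_days(timestamps, now):
    """Mirror TIMESTAMPFIELD >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY):
    a rolling window of 7 x 24 hours ending at `now`."""
    cutoff = now - timedelta(days=7)
    return [ts for ts in timestamps if ts >= cutoff]


now = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)
events = [
    datetime(2024, 3, 3, 11, 59, tzinfo=timezone.utc),  # one minute before the cutoff
    datetime(2024, 3, 3, 12, 0, tzinfo=timezone.utc),   # exactly on the cutoff
    datetime(2024, 3, 9, 8, 30, tzinfo=timezone.utc),
]
print(within_last_7_days(events, now))  # keeps the last two events
```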

If you also want to include today's data, use the intraday table:
Google Analytics:
SELECT *
FROM `myproject.xxxxxxx.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
FORMAT_DATE("intraday_%Y%m%d", CURRENT_DATE())
Google Analytics for Firebase:
SELECT *
FROM `myproject.analytics_xxxxxxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
FORMAT_DATE("intraday_%Y%m%d", CURRENT_DATE())
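These queries rely on string ordering: digits sort before letters, so every plain YYYYMMDD suffix from the lower bound onward sorts below any "intraday_" suffix, and the "intraday_" upper bound then admits intraday tables up to today's. A Python sketch of that ordering, using hypothetical suffixes:

```python
from datetime import date, timedelta

today = date(2024, 3, 10)
lower = (today - timedelta(days=7)).strftime("%Y%m%d")  # '20240303'
upper = today.strftime("intraday_%Y%m%d")               # 'intraday_20240310'

suffixes = ["20240302", "20240303", "20240309", "intraday_20240310"]
selected = [s for s in suffixes if lower <= s <= upper]
print(selected)  # ['20240303', '20240309', 'intraday_20240310']
```

By the same ordering, an older "intraday_" suffix (for example "intraday_20240301"), if such a table were still present, would also fall inside the range.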

Related

Presto TIMESTAMP get data from 2 days ago without inputting year month date?

My goal is to have the query grab data from 2 days ago. I don't want to have to keep inputting the date like this:
WHERE usage_start_date
BETWEEN TIMESTAMP '2020-09-09 00:00:00.000' and TIMESTAMP '2020-09-09 23:59:59.999'
but instead something like:
usage_start_date = current_date - interval '2' day
The above works for my Athena Presto SQL query, but for some reason it does not return all the data from those 24 hours; it only returns about half the day. Is there a way to write a statement like the following so that it returns ALL the data for that day?
WHERE current_date - interval '2' day AND
BETWEEN TIMESTAMP '00:00:00.000' and TIMESTAMP '23:59:59.999'
without inputting the year, month, day? It seems like TIMESTAMP needs the y/m/d but what about doing a LIKE so it picks up the hour, minute, second but no need to put the y/m/d?
To get a timestamp for the start of the day two days ago, you can use:
DATE_TRUNC('day', NOW() - INTERVAL '2' DAY)
For example:
WHERE usage_start_date >= DATE_TRUNC('day', NOW() - INTERVAL '2' DAY)
AND usage_start_date < DATE_TRUNC('day', NOW() - INTERVAL '1' DAY)
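The half-open range (>= start of the day, < start of the next day) covers every instant of the day without needing a hard-coded end-of-day time. A Python sketch with a hypothetical helper `day_window` and made-up timestamps:

```python
from datetime import datetime, timedelta


def day_window(now, days_ago):
    """Hypothetical helper: [start, end) covering the whole calendar day
    `days_ago` days before `now`, like DATE_TRUNC('day', NOW() - INTERVAL '2' DAY)."""
    start = (now - timedelta(days=days_ago)).replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)  # start of the following day, excluded
    return start, end


now = datetime(2020, 9, 11, 15, 30)
start, end = day_window(now, 2)
usage = [
    datetime(2020, 9, 9, 0, 0),                # first instant of the day: kept
    datetime(2020, 9, 9, 23, 59, 59, 999999),  # last instant of the day: kept
    datetime(2020, 9, 10, 0, 0),               # next midnight: excluded
]
print([u for u in usage if start <= u < end])
```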
You can use the query below, which extracts the hour and the date from usage_start_date:
select * from table where hour(usage_start_date) between 0 and 23 and current_date - interval '2' day = date(usage_start_date)
(The hour() condition always holds, since hours run from 0 to 23; the date() comparison is what selects the day.)
I would suggest:
WHERE usage_start_date >= CURRENT_DATE - INTERVAL '2' DAY AND
usage_start_date < CURRENT_DATE - INTERVAL '1' DAY

Remove the Last 31 Days SQL Query in BigQuery not working due to date format

In a BigQuery table I have a date column in the following format:
Date
2020-07-15
I'm trying to use this query to remove the last 31 days:
SELECT
DISTINCT*
FROM
`dataset.Raw_.Data`
WHERE
DATE(Date) <= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY)
Unfortunately it is not working, and I believe the reason is the format of the date in the BigQuery table. This is the error I get:
No matching signature for function DATE for argument types: DATE.
Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME);
DATE(INT64, INT64, INT64) at [6:2]
Is there any way to modify the SQL query to remove the last 31 days without modifying the tables?
If you are storing dates as strings, you should be able to just cast():
WHERE CAST(Date as date) <= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY)
If you have a date already, then no need to cast():
WHERE Date <= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY)
As the error message shows, your Date field is already of the DATE data type, so you can just use the query below:
SELECT
DISTINCT *
FROM
`dataset.Raw_.Data`
WHERE
Date <= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY)
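With `<=` and a 31-day offset, the rows that drop out are today plus the 30 days before it (31 dates in total). A Python sketch of the filter, including the string-to-date conversion for the case where dates are stored as 'YYYY-MM-DD' strings; the function name and sample dates are hypothetical:

```python
from datetime import date, timedelta


def keep_older_than(rows, today, days=31):
    """Keep rows dated on or before today - `days`, mirroring
    WHERE Date <= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY)."""
    cutoff = today - timedelta(days=days)
    return [r for r in rows if r <= cutoff]


# Dates stored as 'YYYY-MM-DD' strings are converted first, like CAST(Date AS DATE).
raw = ["2020-06-14", "2020-06-15", "2020-06-16", "2020-07-15"]
rows = [date.fromisoformat(s) for s in raw]
print(keep_older_than(rows, date(2020, 7, 16)))  # June 14 and June 15 remain
```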

Between Operator Big Query Standard SQL

Using Standard SQL in BQ, as part of a task I want to search for records created between 2pm the previous day and 2pm on the current day.
I have found that
SELECT DATETIME_SUB(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 10 hour)
gives me 2pm yesterday, and
SELECT DATETIME_ADD(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 14 hour)
gives me 2pm today.
So I assumed I could use this in my query:
Select * from
TableA
where CreatedDate Between
DATETIME_SUB(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 10 hour) and DATETIME_ADD(DATETIME_TRUNC(CURRENT_DATETIME(), DAY), INTERVAL 14 hour)
However I get the following
No matching signature for operator BETWEEN for argument types:
TIMESTAMP, DATETIME, DATETIME. Supported signature: (ANY) BETWEEN
(ANY) AND (ANY)
Where am I going wrong?
Your issue is that CreatedDate is a TIMESTAMP, so you need to convert it to a DATETIME, for example:
where DATETIME(CreatedDate) Between ...
Alternatively, you can write the equivalent expressions directly with the TIMESTAMP functions, for example 2pm yesterday:
SELECT TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY), INTERVAL 10 HOUR)
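As a rough analogy only, a TIMESTAMP behaves like a timezone-aware Python datetime (an absolute instant) and a DATETIME like a naive one, and Python likewise refuses to compare the two. The sketch below builds the 2pm-to-2pm window the way the question does and converts the instant to a naive UTC value before comparing; the helper name and dates are hypothetical, and treating the bounds as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def two_pm_window(now):
    """Hypothetical helper: 2pm yesterday to 2pm today, built like the
    question's DATETIME_SUB/DATETIME_ADD over DATETIME_TRUNC(..., DAY)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(hours=10), midnight + timedelta(hours=14)


now = datetime(2024, 3, 10, 9, 0)                            # naive: DATETIME-like
start, end = two_pm_window(now)
created = datetime(2024, 3, 9, 20, 0, tzinfo=timezone.utc)  # aware: TIMESTAMP-like

try:
    in_window = start <= created <= end
except TypeError as err:
    print("type mismatch:", err)  # naive vs aware cannot be ordered

# Convert the instant to a naive UTC value first (assumes UTC bounds).
created_naive = created.astimezone(timezone.utc).replace(tzinfo=None)
print(start <= created_naive <= end)  # True
```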

How to run a query for every date for the last 3 months

I have a table (pkg_data) in Redshift. I want to fetch some data for every date in the last 3 months.
Here is my query:
select * from pkg_data where scan_date < current_date;
How can I use current_date as a variable in the query itself and run this query for every date from April 1?
I have set up a cron job that runs every hour, and each run should use a different current_date.
SELECT *
FROM pkg_data
WHERE scan_date > CURRENT_DATE - INTERVAL '3 months'
Be careful: Redshift works in UTC, so depending on your time zone, CURRENT_DATE may sometimes be a day earlier or later than you expect.
SELECT
CURRENT_DATE,
(CURRENT_DATE - INTERVAL '3 months')::date
Returns:
2018-06-21 2018-03-21
Also be careful with strange lengths of months!
SELECT DATE '2018-05-31' - INTERVAL '3 months'
returns:
2018-02-28 00:00:00
Notice that, because February 2018 has no 31st, the result is clamped to the last day of that month (the 28th instead of the 31st).
By the way, you can use DATE '2018-05-31' or '2018-05-31'::DATE, and also INTERVAL '3 months' or '3 months'::INTERVAL to convert types.
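The month arithmetic shown above, including the clamping to the last valid day, can be modelled with a small hypothetical Python helper. It reproduces the two results shown above, but it is a sketch of the behavior, not Redshift's implementation:

```python
import calendar
from datetime import date


def subtract_months(d: date, months: int) -> date:
    """Hypothetical helper: move back `months` calendar months,
    clamping the day to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


print(subtract_months(date(2018, 6, 21), 3))  # 2018-03-21
print(subtract_months(date(2018, 5, 31), 3))  # 2018-02-28
```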
Use dateadd() to get the date 3 months ago and GETDATE() to get the current date, i.e. the code will look like:
select * from pkg_data where scan_date >= dateadd(month,-3,GETDATE());
For the cron part, see "How to execute scheduled SQL script on Amazon Redshift?"

BigQuery SQL WHERE Date Between Current Date and -15 Days

I am trying to code the following condition in the WHERE clause of SQL in BigQuery, but I am having difficulty with the syntax, specifically date math:
WHERE date_column between current_date() and current_date() - 15 days
This seems easy in MySQL, but I can't get it to work with BigQuery SQL.
Use DATE_SUB
select *
from TableA
where Date_Column between DATE_SUB(current_date(), INTERVAL 15 DAY) and current_date()
Remember, BETWEEN needs the earliest date first.
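`x BETWEEN low AND high` means `low <= x AND x <= high`, which is why the order matters: with the bounds reversed, no value can satisfy both sides. A quick Python illustration with hypothetical dates:

```python
from datetime import date, timedelta


def between(x, low, high):
    # SQL's `x BETWEEN low AND high` is `low <= x AND x <= high`.
    return low <= x <= high


today = date(2024, 3, 10)
fifteen_ago = today - timedelta(days=15)  # 2024-02-24
d = date(2024, 3, 1)
print(between(d, fifteen_ago, today))  # True
print(between(d, today, fifteen_ago))  # False: reversed bounds match nothing
```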
You should probably switch the two around. In legacy SQL, the syntax is the following:
WHERE date_column BETWEEN DATE_ADD(CURRENT_DATE(), -15, 'DAY') AND CURRENT_DATE()
This works for me.
WHERE DATE(date_column) BETWEEN DATE(DATE_ADD(CURRENT_DATE(), -15, 'DAY'))
AND CURRENT_DATE()