getting error in bigquery while creating a model - google-bigquery

I'm getting this error after running a model creation query:
All time series failed to fit, likely because they are all invalid. Please run auto-arima on each time series to find out the root cause based on the returned error message.
I have used the following code:
CREATE OR REPLACE MODEL gs_analytics_dataset.arima_model
OPTIONS(
MODEL_TYPE='ARIMA_PLUS',
TIME_SERIES_TIMESTAMP_COL='date',
TIME_SERIES_DATA_COL='total_amount',
TIME_SERIES_ID_COL='store_id',
HOLIDAY_REGION='US'
) AS
SELECT
store_id,
date,
total_amount
FROM
gs_analytics_dataset.revenue

I haven't used this type of model before, but the error seems to point to an issue with the time series: you are feeding it a date, while the option name TIME_SERIES_TIMESTAMP_COL implies it wants a timestamp.
Try changing your statement to provide a timestamp:
CREATE OR REPLACE MODEL gs_analytics_dataset.arima_model
OPTIONS(
MODEL_TYPE='ARIMA_PLUS',
TIME_SERIES_TIMESTAMP_COL='ts',
TIME_SERIES_DATA_COL='total_amount',
TIME_SERIES_ID_COL='store_id',
HOLIDAY_REGION='US'
) AS
SELECT
store_id,
CAST(date AS TIMESTAMP) ts,
total_amount
FROM
gs_analytics_dataset.revenue
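If the model builds successfully after that change, you can sanity-check the fit for each store_id with BigQuery ML's evaluation and forecasting functions. A minimal sketch (the 30-day horizon and 0.9 confidence level are just example values):
-- Inspect the fitted ARIMA parameters and quality metrics per store_id
SELECT *
FROM ML.ARIMA_EVALUATE(MODEL gs_analytics_dataset.arima_model);
-- Forecast the next 30 days per store_id
SELECT *
FROM ML.FORECAST(MODEL gs_analytics_dataset.arima_model, STRUCT(30 AS horizon, 0.9 AS confidence_level));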

Related

I want to create a new column and use cast function for typecasting on it in SQL

I want to create a new column that has data in the DateTime format, but I only want the time part from it. I tried the following in BigQuery:
SELECT *,
(start-end) as ride_length
cast(ride_length as time)
FROM xyz
It won't run; it shows the error that ride_length is an unknown name. Please can someone provide a solution with an explanation? Thank you.
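For what it's worth, the error usually means BigQuery cannot resolve an alias inside the same SELECT list where it is defined (the two expressions above are also missing a comma between them), which is why ride_length comes back as an unknown name. A minimal sketch of one way around it, assuming start and end are DATETIME columns (end needs backticks because it is a reserved keyword):
SELECT *,
-- end minus start gives the duration in seconds; the alias can only be referenced from an outer query
DATETIME_DIFF(`end`, start, SECOND) AS ride_length_seconds
FROM xyz
Wrapping this in an outer query (or a WITH clause) then lets you reference ride_length_seconds and convert it to whatever time format you need.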

Extract date from timestamp containing time zone in Big Query

I have data containing dates of the form
2020-12-14T18:58:10+01:00[Europe/Stockholm]
but I really only need the date 2020-12-14.
So, I tried:
DATE(Timestamp) as LastUpdateDate
which returned Error: Invalid time zone: +02:00[Europe/Stockholm]
So, thinking that the problem came from the time zone, I tried this instead:
TIMESTAMP(FORMAT_TIMESTAMP("%Y-%m-%d", PARSE_TIMESTAMP("%Y%m%d", Timestamp)))
which magically returned a new error, namely
Error: Failed to parse input string "2021-10-04T09:24:20+02:00[Europe/Stockholm]"
How do I solve this?
Just substring the date part from the string. Try one of these:
select left(Timestamp, 10)
select date(left(Timestamp, 10))
You should clean your data first.
select date("2020-12-14T18:58:10+01:00") as LastUpdateDate
This will work as expected.
Any chance of cleaning your data before using it in a query? I think that +01:00[Europe/Stockholm] is simply not a supported format.
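If you need the time portion (or the local date) as well, another option is to strip the bracketed zone name and let BigQuery parse the remaining ISO8601 value. Just a sketch, assuming the [Europe/Stockholm] part always sits at the end of the string:
-- Drop the trailing "[...]" region name, then parse what is left as a TIMESTAMP
select date(timestamp(regexp_replace(Timestamp, r'\[.*\]$', ''))) as LastUpdateDate
-- DATE() on a TIMESTAMP defaults to UTC; pass a zone if you want the local date
select date(timestamp(regexp_replace(Timestamp, r'\[.*\]$', '')), 'Europe/Stockholm') as LastUpdateDate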

BigQuery @run_date used as different types

I have a scheduled query using the @run_date parameter in BigQuery.
SELECT
@run_date AS run_date,
timestamp,
event
FROM
`ops-data.usage.full_user_dataset`
WHERE
DATE(timestamp) < @run_date
timestamp is of type TIMESTAMP
I am unable to schedule it: the schedule option is greyed out in the new UI and unavailable in the classic UI (it says it requires valid SQL). If I try to run the query, I receive the error message Undeclared parameter 'run_date' is used assuming different types (DATE vs INT64) at [2:3]
After trying various things I was able to schedule the query below. The idea was to force BigQuery to treat @run_date as a date without changing its value:
SELECT
DATE_SUB(@run_date, INTERVAL 0 DAY) AS run_date,
timestamp,
event
FROM
`ops-data.usage.full_user_dataset`
WHERE
DATE(timestamp) < @run_date
Why does this error occur and why does the fix work?
I think it is a bug around @run_date; the workaround below should work for you until it is fixed.
DECLARE run_date DATE DEFAULT @run_date;
SELECT
run_date,
timestamp,
event
FROM
`ops-data.usage.full_user_dataset`
WHERE
DATE(timestamp) < run_date
BTW, since the workaround uses scripting, you can't set a destination table on the scheduled query itself; if you do need a destination table, it has to be written as:
CREATE OR REPLACE TABLE <yourDestinationTable>
AS SELECT ... -- your query
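Putting the two pieces together, the full scheduled script would look roughly like this (a sketch; my_dataset.destination_table is a placeholder name):
DECLARE run_date DATE DEFAULT @run_date;
CREATE OR REPLACE TABLE my_dataset.destination_table AS
SELECT
run_date,
timestamp,
event
FROM
`ops-data.usage.full_user_dataset`
WHERE
DATE(timestamp) < run_date;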

Azure stream analytics query - how to set event timestamp based on a javascript udf function?

I am timestamping data stream input events by a property "TS" in the message. However, before I timestamp an event using TS, I want to ensure that TS is ISO8601 compliant. If TS is not ISO8601 compliant, I want to use EventEnqueuedUtcTime, the arrival time of the message, as the timestamp.
My query looks something like this
SELECT
T.*
FROM
input PARTITION BY PartitionId TIMESTAMP BY udf.getEventTimestamp(T)
Here udf.getEventTimestamp(T) returns the TS property of the message (T) if it is ISO8601-compliant; otherwise it returns EventEnqueuedUtcTime (the arrival time of the message in IoT Hub).
Running this script locally gives me the exception -
Error : Unexpected hosted function call
I also tried to use CASE construct to accomplish this
SELECT
T.*
FROM
input PARTITION BY PartitionId TIMESTAMP BY
CASE
WHEN udf.isValid(T.TS) THEN T.TS
ELSE T.EventEnqueuedUtcTime
END
where udf.isValid(T.TS) returns true if the property TS is a valid ISO8601 compliant timestamp.
Again running this locally returns - Error : Unexpected hosted function call
As per the Microsoft Azure docs, "After you add a JavaScript user-defined function to a job, you can use the function anywhere in the query, like a built-in scalar function."
Does this mean that we cannot use udfs in TIMESTAMP BY and CASE constructs?
Can you suggest any workaround?
At this time a UDF can't be used within the TIMESTAMP BY clause.
However, we can use TRY_CAST to solve your requirement.
Here's the query with the workaround:
SELECT
T.*
FROM
input PARTITION BY PartitionId TIMESTAMP BY
CASE
WHEN TRY_CAST(T.TS AS DateTime) is not null THEN T.TS
ELSE T.EventEnqueuedUtcTime
END
Let me know if you have any further questions.
Thanks,
JS
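A small follow-up that can help when debugging (just a sketch using the same TRY_CAST idea; the 5-minute window is arbitrary): count how many incoming events carry a TS value that cannot be cast to DateTime, so you can see how often the fallback to EventEnqueuedUtcTime would actually be taken.
SELECT
System.Timestamp() AS WindowEnd,
COUNT(*) AS InvalidTsCount
FROM input
WHERE TRY_CAST(TS AS DateTime) IS NULL
GROUP BY TumblingWindow(minute, 5)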

Cannot use calculated offset in BigQuery's DATE_ADD function

I'm trying to create a custom query in Tableau to use on Google's BigQuery. The goal is to have an offset parameter in Tableau that changes the offsets used in a date-based WHERE clause.
In Tableau it would look like this:
SELECT
DATE_ADD(UTC_USEC_TO_MONTH(CURRENT_DATE()),<Parameters.Offset>-1,"MONTH") as month_index,
COUNT(DISTINCT user_id, 1000000) as distinct_count
FROM
[Orders]
WHERE
order_date >= DATE_ADD(UTC_USEC_TO_MONTH(CURRENT_DATE()),<Parameters.Offset>-12,"MONTH")
AND
order_date < DATE_ADD(UTC_USEC_TO_MONTH(CURRENT_DATE()),<Parameters.Offset>-1,"MONTH")
However, BigQuery always returns an error:
Error: DATE_ADD 2nd argument must have INT32 type.
When I try the same query in the BigQuery editor using simple arithmetic it fails with the same error.
SELECT
DATE_ADD(UTC_USEC_TO_MONTH(CURRENT_DATE()),5-3,"MONTH") as month_index,
FROM [Orders]
Any workaround for this? My only option so far is to make multiple offsets in Tableau, it seems.
Thanks for the help!
I acknowledge that this is a hole in the functionality of DATE_ADD. It can be fixed, but it will take some time until the fix is rolled into production.
Here is a possible workaround. It seems to work if the first argument to DATE_ADD is a string. Then you can truncate the result to a month boundary and convert it from a timestamp to a string.
SELECT
FORMAT_UTC_USEC(UTC_USEC_TO_MONTH(DATE_ADD(CURRENT_DATE(),5-3,"MONTH"))) as month_index;
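Applied to the Tableau custom query, the same workaround would look roughly like this (just a sketch: it assumes <Parameters.Offset> is substituted by Tableau as an integer literal, and only the month_index expression is shown rewritten, since how the WHERE comparisons behave depends on the type of order_date):
SELECT
FORMAT_UTC_USEC(UTC_USEC_TO_MONTH(DATE_ADD(CURRENT_DATE(),<Parameters.Offset>-1,"MONTH"))) as month_index,
COUNT(DISTINCT user_id, 1000000) as distinct_count
FROM [Orders]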