BigQuery query doesn't work inside a view - google-bigquery

When I select from a view I've created I receive the following error from time to time, but not always:
query: Timestamp evaluation: connection error. (error code: invalidQuery)
Job ID vex-custom:bquijob_59705b02_155966ddc5f
Start Time Jun 28, 2016, 11:53:50 AM
End Time Jun 28, 2016, 11:53:50 AM
Running the query by itself works perfectly well.
There are two special things about this query:
It uses TABLE_DATE_RANGE
It references tables from a project other than the one where the view resides. But we've done this many times without issues.
Can someone from Google perhaps check the job id?

I checked the internal details for your query. The view that your failed query references makes a few problematic calls to TIMESTAMP functions. Here's one example:
SELECT * FROM TABLE_DATE_RANGE([...], TIMESTAMP(DATE_ADD(UTC_USEC_TO_DAY(CURRENT_DATE()), -15, "day")), CURRENT_TIMESTAMP())
Specifically, the call to TIMESTAMP(DATE_ADD(UTC_USEC_TO_DAY(CURRENT_DATE()), -15, "day")) is erroring because:
UTC_USEC_TO_DAY returns an INTEGER, not a TIMESTAMP.
DATE_ADD expects an argument of type TIMESTAMP.
You can wrap the call to UTC_USEC_TO_DAY with USEC_TO_TIMESTAMP to convert the argument to type TIMESTAMP, like so:
TIMESTAMP(DATE_ADD(USEC_TO_TIMESTAMP(UTC_USEC_TO_DAY(CURRENT_DATE())), -15, "day"))
We're in the process of rolling out a release that more closely checks the expected input types of many timestamp functions, which is why you are currently seeing inconsistent behavior. These fixes prevent issues where some functions can return malformatted TIMESTAMPs, and also bring our behavior more in line with our documentation on timestamp functions.
Separately, we need to work on making sure errors that occur within the evaluation of timestamps for TABLE_DATE_RANGE return more useful errors than "connection error".
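To make the type chain above concrete, here is a rough Python analogue. It is only a sketch: the helper names mimic the legacy SQL functions but are hypothetical and are not part of any BigQuery client library. The day-truncation step yields a plain integer (microseconds since the epoch), so it has to be converted to a timestamp before any date arithmetic is applied.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
USEC_PER_DAY = 24 * 60 * 60 * 1_000_000


def utc_usec_to_day(usec: int) -> int:
    """Truncate microseconds-since-epoch to the start of the UTC day (still an int)."""
    return usec - (usec % USEC_PER_DAY)


def usec_to_timestamp(usec: int) -> datetime:
    """Convert microseconds-since-epoch to a timezone-aware datetime."""
    return EPOCH + timedelta(microseconds=usec)


def date_add_days(ts: datetime, days: int) -> datetime:
    """Strictly typed date arithmetic: rejects anything that is not a datetime."""
    if not isinstance(ts, datetime):
        raise TypeError("date_add_days expects a datetime, got " + type(ts).__name__)
    return ts + timedelta(days=days)


def range_start(now: datetime, days_back: int = 15) -> datetime:
    """Start of the UTC day `days_back` days before `now` (which must be tz-aware)."""
    now_usec = (now - EPOCH) // timedelta(microseconds=1)
    day_usec = utc_usec_to_day(now_usec)
    # Convert the integer to a timestamp first, then do the date arithmetic.
    return date_add_days(usec_to_timestamp(day_usec), -days_back)
```

Passing the integer from `utc_usec_to_day` straight into `date_add_days` raises a `TypeError`, which is the same kind of mismatch the stricter type checks now reject.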

Related

SQL Server : best practice query for date manipulation

Long time listener, first time caller.
At work we have the date columns for most tables stored as simple "string" (varchar) values, such as yyyymmdd (e.g. 20220625) or yyyymm (202206).
Now for a lot of queries that are time based we need to compare to current date, or some fixed offset from current date.
Now two obvious versions that I know of to get current utc date into either of those formats are the following (for yyyymm as example):
SELECT LEFT(CONVERT(VARCHAR, GETUTCDATE(), 112), 6) ...
SELECT CONVERT(VARCHAR(6), GETUTCDATE(), 112) ...
I'm wondering if anyone knows of a better way, either idiomatically or performance-wise, to convert those, and whether there is anything about the second one to be worried about versus the first in regards to security or reliability. The second one definitely satisfies my code golf sensibilities, but not if it's at the expense of something I'm unaware of.
Also, for some extra context, the majority of our code runs in SQL Server (T-SQL), BUT we also need to be as platform agnostic as possible, as there are customers on Oracle and/or MySQL.
Any insight/help would be highly appreciated.
There is no problem with either approach. Both work just fine. It is a matter of personal preference which to choose. The first looks more explicit; the second is shorter and thus maybe easier to read. As to performance: you want to get the current day or month only once in a query, so the call doesn't really affect query runtime.
Getting this platform agnostic is quite a different story. SQL dialects differ, especially when it comes to date/time handling. You have already noticed that SQL Server's date functions are quite restricted. In Oracle and MySQL you would simply state the format you want (TO_CHAR(SYSDATE, 'YYYYMM') in Oracle and DATE_FORMAT(CURRENT_DATE, '%Y%m') in MySQL). But you also see that the function calls differ.
Now, you could write a user defined function GET_CURRENT_MONTH_FORMATTED for this which would return the string for the current month, e.g. '202206'. Then you'd have the different codes hidden in that function and the SQL queries would all look the same. The problem, though, is how to tell the DBMS that the function result is deterministic for a particular timestamp? If you run the query on December 31, 2022 at 23:50 and it runs until January 1, 2023 at 0:20, you want the DBMS to call this function only once for the query resulting in '202212' and not being called again, suddenly resulting in another string '202301'. I don't even know whether this is possible. I guess it is not.
I think you cannot write a query that does what you want and looks the same in all mentioned DBMS.
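One way to sidestep both the dialect differences and the "evaluate only once" concern is to compute the month key in the application and pass it to the query as a bind parameter. This is a workaround rather than a dialect-neutral query, and it only applies when the queries are issued from application code. The sketch below uses Python's built-in sqlite3 module purely for illustration. The table `orders` and column `order_month` are hypothetical, and the `?` placeholder is sqlite3's parameter style; other database drivers use different placeholders.

```python
import sqlite3
from datetime import datetime, timezone


def current_month_key(now=None):
    """Return the UTC month as a 'yyyymm' string, e.g. '202206'."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m")


def rows_for_month(conn, month_key):
    """Select ids for one month; the key is computed once, outside the query."""
    cur = conn.execute(
        "SELECT id FROM orders WHERE order_month = ? ORDER BY id",  # hypothetical table
        (month_key,),
    )
    return [row[0] for row in cur.fetchall()]
```

Because the key is fixed before the query starts, a query that runs across midnight on December 31 keeps comparing against the same string throughout.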

Where in my query to place the CONVERT to convert DateTime to Date

Just learning SQL and I've searched many options about converting a DateTime into a Date, and I do not want current date. It's a super simple query from this website: https://sqlzoo.net/wiki/Guest_House_Assessment_Easy
SELECT booking_date, nights
FROM booking
WHERE guest_id=1183
But the output is with the timestamp and I just want the date. I've searched so many forums and tried all their suggestions, including this:
SELECT CONVERT(varchar(10), <col>, 101)
So I've done:
SELECT CONVERT(varchar(10), booking_date,101), nights
FROM booking
WHERE guest_id=1183
But I'm getting syntax errors. This is probably so simple and you'll all think me an idiot, but I'd greatly appreciate help. It's driving me nuts.
When I fiddled about at your sqlzoo link I got the error
execute command denied to user 'scott'@'localhost' for routine 'gisq.to_date'
When I googled gisq.to_date I got this link https://sqlzoo.net/wiki/Format_a_date_and_time
Which has examples of how this dialect represents dates. See if you can work it out. Something like this:
SELECT date_format(booking_date,'%d/%m/%Y')
FROM booking
You didn't post the error in your question which is a big mistake. When you get an error message, you actually have something to work from.
It is also very important to note that the query above returns a string, not a date. It's only good for display, not for date arithmetic.
TBH that seems like a terrible site to learn on as it gives no clues about the dialect. It looks like Oracle, but to_date and trunc don't work.
The use of convert() suggests that you think you are using SQL Server. If you only want the date component of a date/time data type, then you can use:
SELECT CONVERT(DATE, booking_date), nights
FROM booking
WHERE guest_id = 1183;
The syntax error suggests that you are not using SQL Server.
CONVERT() is bespoke syntax for SQL Server. Examples of similar functionality in other databases are:
DATE(booking_date)
TRUNC(booking_date)
DATE_TRUNC('day', booking_date)
In addition, what you see also depends on the user-interface.
In your case, the data is being stored as a date with no time component, but the UI is showing the time. For that, you want to convert to a string. That site uses MariaDB, which is really a flavor of MySQL, and you would use:
DATE_FORMAT(booking_date, '%Y-%m-%d')
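To try the difference between "date part only" and "formatted for display" locally, here is a small sketch using Python's built-in sqlite3 module. SQLite is used only because it ships with Python; it is not the engine behind sqlzoo, its function names differ from MariaDB's (`date()` and `strftime()` instead of `DATE()` and `DATE_FORMAT()`), and the sample row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booking (booking_date TEXT, nights INTEGER, guest_id INTEGER)")
# Hypothetical sample row, stored with a midnight time component.
conn.execute("INSERT INTO booking VALUES ('2016-11-27 00:00:00', 5, 1183)")


def booking_dates(conn, guest_id):
    """Return (date part only, display-formatted string, nights) per booking."""
    return conn.execute(
        "SELECT date(booking_date), strftime('%d/%m/%Y', booking_date), nights "
        "FROM booking WHERE guest_id = ?",
        (guest_id,),
    ).fetchall()


print(booking_dates(conn, 1183))
```

The second column is for display only: once reformatted as day/month/year text, it no longer sorts or compares as a date.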

Airflow - Bigquery operator not working as intended

I'm new with Airflow and I'm currently stuck on an issue with the Bigquery operator.
I'm trying to execute a simple query on a table from a given dataset and copy the result on a new table in the same dataset. I'm using the bigquery operator to do so, since according to the doc the 'destination_dataset_table' parameter is supposed to do exactly what I'm looking for (source:https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/bigquery_operator/index.html#airflow.contrib.operators.bigquery_operator.BigQueryOperator).
But instead of copying the data, all I get is a new empty table with the schema of the one I'm querying from.
Here's my code
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 1),
    'end_date': datetime(2019, 1, 3),
    'retries': 10,
    'retry_delay': timedelta(minutes=1),
}
dag = DAG(
    dag_id='my_dag',
    default_args=default_args,
    schedule_interval=timedelta(days=1)
)
# Run the query and write its result to dataset_d.table_u
copyData = BigQueryOperator(
    task_id='copyData',
    dag=dag,
    sql="SELECT some_columns,x,y,z FROM dataset_d.table_t WHERE some_columns=some_value",
    destination_dataset_table='dataset_d.table_u',
    bigquery_conn_id='something',
)
I don't get any warnings or errors, the code is running and the tasks are marked as success. It does create the table I wanted, with the columns I specified, but totally empty.
Any idea what I'm doing wrong?
EDIT: I tried the same code on a much smaller table (from 10Gb to a few Kb), performing a query with a much smaller result (from 500Mb to a few Kb), and it did work this time. Do you think the size of the table/the query result matters? Is it limited? Or does performing a too large query cause a lag of some sort?
EDIT2: After a few more tests I can confirm that this issue is not related to the size of the query or the table. It seems to have something to do with the Date format. In my code the WHERE condition is actually checking if a date_column = 'YYYY-MM-DD'. When I replace this condition with an int or string comparison it works perfectly. Do you guys know if Bigquery uses a particular date format or requires a particular syntax?
EDIT3: Finally getting somewhere: When I cast my date_column as a date (CAST(date_column AS DATE)) to force its type to DATE, I get an error that says that my field is actually an int-32 (Argument type mismatch). But I'm SURE that this field is a date, so that implies that either Bigquery stores it as an int while displaying it as a date, or that the Bigquery operator does some kind of hidden type conversion while loading tables. Any idea on how to fix this?
I had a similar issue when transferring data from data sources other than BigQuery.
I suggest casting the date_column as follows: to_char(date_column, 'YYYY-MM-DD') as date
In general, I have seen that BigQuery's schema auto-detection is often problematic. The safest way is to always specify the schema before executing the corresponding query, or to use operators that support schema definition.

Getting Conversion failed when converting date and/or time from character string just trying to do a COUNT(*)

This is a quickie:
I'm confused. This is my SQL:
select COUNT(*) as rowCnt
from fy2015View
It's VERY SIMPLE and yet, DOESN'T WORK!
This is the ERROR I get:
Msg 241, Level 16, State 1, Line 1
Conversion failed when converting date and/or time from character string.
NO MATTER WHAT I DO, it will not work.
But, if I do that same call on another table, I get the number of rows back.
Now, this fy2015View is just that: A VIEW.
Does that have anything to do with it?
Is there something I needed to do during the creation of the view or something I "shouldn't" have done?
I can't have a long discussion to keep in compliance with Stacks rules... so please a quick answer would be best.
As I mentioned in my comment above, the issue lies in the view's definition.
Without knowing how that view is defined, we can't give you a simple answer. COUNT(*) fails because the view's query has to be executed to count its rows, and the number of records of a view that runs into execution errors is undefined.
I created a sample sqlfiddle to show why this would occur.
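As a loose analogy only (this is Python, not SQL Server, and it is not the sqlfiddle mentioned above), think of a view as a lazy filter that converts a string column to a date in its definition. Counting the rows forces that conversion to run for every row, so a single malformed value makes the count fail. The function name and the `posted` column are hypothetical.

```python
from datetime import datetime


def fy2015_view(rows):
    """Lazily yield rows from 2015; the string-to-date conversion runs per row."""
    for row in rows:
        # Raises ValueError if the string is not a valid yyyymmdd date.
        posted = datetime.strptime(row["posted"], "%Y%m%d")
        if posted.year == 2015:
            yield row


def count_rows(view):
    """Counting has to evaluate the view, so conversion errors surface here."""
    return sum(1 for _ in view)
```

With clean data, `count_rows(fy2015_view(rows))` returns a number; add one row whose `posted` value is not a valid date and the same call raises instead, even though the count itself never looks at the date.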

DATEDIFF command won't work as 'day' is not a recognised column

I'm relatively new to SQL and have been attempting to run a script wherein I can bring up the number of days that have passed between two points in time. I understand how this should look based on your website, but for some reason when I input the values, my database is returning the following error:
ProgrammingError: ERROR: column "day" does not exist
The code I'm using is:
select datediff(day, '2014-01-01', '2014-02-01')
I assume I'm missing something very simple (this is a hugely basic query I'm sure), but would be appreciative of any assistance. I've variously tried pointing it towards the specific table I want to draw from, but it keeps stumbling on this error.
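If you are doing this in PostgreSQL, there is no datediff function; subtract the earlier timestamp from the later one and extract the days instead:
select DATE_PART('day', '2014-02-01'::timestamp - '2014-01-01'::timestamp)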
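As a quick sanity check on the expected value, the same day difference can be computed in plain Python. The helper name is hypothetical; the argument order follows the datediff call in the question (start first, then end).

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Whole days from start to end; negative if end is before start."""
    return (end - start).days


print(days_between(date(2014, 1, 1), date(2014, 2, 1)))  # 31
```

Swapping the two arguments flips the sign, which is worth keeping in mind when writing the subtraction in SQL.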