I'm using PySpark SQL (version 2) and executing the SQL from a file:
Snippet of the sql:
WHERE TO_DATE(mac.StartDatetime) <= date_add(ec.AdmissionDatetime, INTERVAL 2 HOUR)
Error:
cannot resolve 'date_add(CAST(ec.AdmissionDatetime AS DATE), interval 2 hours)' due to data type mismatch: argument 2 requires int type, however, 'interval 2 hours' is of calendarinterval type.;
What am I missing? All the documentation seems to say this is implemented and should work, at least from the API standpoint. I've seen closed tickets for 1.4 that implemented date_add and time_add.
I've tried time_add but get an unrecognized function error. We keep our SQL in separate files and have built a framework to execute the SQL for our process.
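For sub-day offsets in Spark SQL 2.x, date_add is the wrong tool: its second argument is an integer number of days. Interval arithmetic is instead expressed with the + operator and an interval literal. A sketch of the predicate rewritten that way (untested against your schema; note that both sides of the comparison should end up as compatible types, so the TO_DATE cast on the left may also need revisiting):

```sql
-- date_add(col, n) shifts by n whole days; for an offset in hours,
-- add an interval literal to the timestamp directly:
WHERE mac.StartDatetime <= ec.AdmissionDatetime + INTERVAL 2 HOURS
```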
Related
I'm trying to set up a generated column which will also take null checks into consideration when subtracting values. In PostgreSQL I did:
ALTER TABLE session ADD COLUMN
duration INTERVAL GENERATED ALWAYS AS age(time_ended, time_started) STORED;
H2 doesn't support the age function, so I added another patch to create an alias for the function:
CREATE ALIAS age FOR "net.agileb.config.H2Functions.age";
and corresponding java code:
package net.agileb.config;

import java.time.Duration;
import java.time.LocalDateTime;

public class H2Functions {
    public static Duration age(LocalDateTime endDate, LocalDateTime startDate) {
        // Duration.between(start, end) computes end - start, so pass the
        // start time first to mimic PostgreSQL's age(end, start) result.
        return Duration.between(startDate, endDate);
    }
}
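Note that Duration.between is directional: Duration.between(a, b) computes b minus a, so the argument order matters for age semantics (a finished session should yield a positive duration). A quick check:

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class AgeOrderCheck {
    public static void main(String[] args) {
        LocalDateTime started = LocalDateTime.of(2021, 1, 1, 10, 0);
        LocalDateTime ended   = LocalDateTime.of(2021, 1, 1, 12, 30);
        // Duration.between(start, end) returns end - start, so passing
        // the earlier time first gives a positive 2h30m duration.
        Duration d = Duration.between(started, ended);
        System.out.println(d.toMinutes());
    }
}
```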
I run H2 in PostgreSQL compatibility mode:
public:
type: com.zaxxer.hikari.HikariDataSource
url: jdbc:h2:mem:agileb;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE;MODE=PostgreSQL;
driverClassName: org.h2.Driver
but h2 still doesn't like the syntax of the generated column:
SQL State : 42001
Error Code : 42001
Message : Syntax error in SQL statement "ALTER TABLE SESSION ADD COLUMN
DURATION INTERVAL GENERATED[*] ALWAYS AS AGE(TIME_ENDED, TIME_STARTED) STORED"; expected "YEAR, MONTH, DAY, HOUR, MINUTE, SECOND"; SQL statement:
ALTER TABLE session ADD COLUMN
duration INTERVAL GENERATED ALWAYS AS age(time_ended, time_started) STORED [42001-200]
Location : db/migration/V1606395529__topic_calculated_duration_column.sql (/home/agilob/Projects/springowy/build/resources/main/db/migration/V1606395529__topic_calculated_duration_column.sql)
Line : 3
Statement : ALTER TABLE session ADD COLUMN
duration INTERVAL GENERATED ALWAYS AS age(time_ended, time_started) STORED
I understand H2 wants me to use a specific interval type like INTERVAL SECOND, generated as identity, and the STORED keyword doesn't seem to be supported.
Is there a way to make this query work in PostgreSQL and H2?
There is no way to use the same syntax for generated columns in PostgreSQL and H2.
The INTERVAL data type without an interval qualifier is a PostgreSQL feature. Other DBMSs, including H2, support only standard-compliant intervals such as INTERVAL YEAR, INTERVAL YEAR(3) TO MONTH, INTERVAL DAY TO SECOND, etc. Fortunately, you can use standard-compliant interval data types in PostgreSQL too; they are also supported. But all these types are either year-month intervals or day-time intervals: an interval with YEAR and/or MONTH fields can't have DAY, HOUR, MINUTE, or SECOND fields, and vice versa. If you really need a mixed interval with all these fields, you can use only PostgreSQL and its forks.
H2 1.4.200 supports only a non-standard syntax for generated columns with the AS keyword (the upcoming H2 2.0 also supports the standard GENERATED ALWAYS AS syntax). PostgreSQL doesn't support H2's non-standard syntax. You can build H2 from its current sources to be able to use the same standard syntax here.
The biggest problem is that PostgreSQL, for some odd reason, requires a non-standard STORED clause at the end of a generated column definition and doesn't accept standard-compliant column definitions. H2 and other databases don't have this clause and don't accept it.
So the only solution here is to use different SQL for PostgreSQL and for H2.
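If the migrations are managed by Flyway under Spring Boot (which the db/migration path and the Hikari YAML above suggest), one way to maintain different SQL per database is Spring Boot's {vendor} placeholder in the Flyway locations property. A sketch, assuming Spring Boot's Flyway auto-configuration is in use:

```yaml
# application.yml: {vendor} expands to the database vendor id
# (e.g. "postgresql" or "h2"), so each vendor gets its own
# migration directory such as db/migration/postgresql.
spring:
  flyway:
    locations: classpath:db/migration/{vendor}
```

This keeps the shared migrations out of vendor directories only if you list multiple locations; the vendor-specific files can then use each database's own generated-column syntax.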
I'm working on integration tests for a JPA project. The tests run on an embedded h2 database. However, I'm getting an error from h2 during hibernate schema generation when I use
@Column(columnDefinition = "INTERVAL HOUR TO MINUTE")
The error is org.h2.jdbc.JdbcSQLException: Unknown data type: "INTERVAL";
The h2 documentation indicates that INTERVAL is supported:
http://www.h2database.com/html/datatypes.html#interval_type
I am using h2 version 1.4.197
Stepping away from JPA and working directly in the h2 console, I have tried the following script which also generates the Unknown data type error:
CREATE TABLE test_interval (id INTEGER, test_hours INTERVAL HOUR TO MINUTE);
I have tried other variations of the INTERVAL type, all of which generate the same error
I cannot find any discussion of this issue anywhere.
You need to use a more recent version of H2. H2 has supported the standard INTERVAL data type since 1.4.198, but 1.4.198 is a beta-quality version; use a more recent one, such as 1.4.199 or 1.4.200.
The online documentation applies only to the latest release, currently 1.4.200. If you use an older version, you have to use the documentation bundled with its distribution.
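When in doubt about which H2 version is actually on the classpath, it can be queried from SQL; H2 provides a built-in H2VERSION() function for this:

```sql
-- Returns the version string of the running H2 instance;
-- INTERVAL types need 1.4.198 or later.
SELECT H2VERSION();
```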
Short version:
How can I get the difference in seconds between 2 timestamps, via the ODBC driver?
Long version:
Using ODBC for a simple query (note that I use cast(... as timestamp) to have a standalone line; the actual query runs against a table with timestamp data):
select unix_timestamp(cast('2019-02-01 01:02:03' as timestamp)) as tto
I got the error message:
unix_timestamp is not a valid scalar function or procedure call
I could not find any configuration option that would change this. Native queries are disabled (because I am using prepared statements) and other functions work fine. My guess is that unix_timestamp() (without a parameter) is deprecated, and the driver is a bit overenthusiastic about preventing use of the function.
I tried to work around the problem, and I cast the timestamp as bigint instead of using the unix_timestamp function:
select cast(cast('2019-02-01 01:02:03' as timestamp) as bigint)
This works fine! But when I try to get the diff of 2 timestamps:
select cast(cast('2019-02-01 01:02:03' as timestamp) as bigint) - cast(cast('2019-02-01 01:02:03' as timestamp) as bigint)
I got the message
Operand types SQL_WCHAR and SQL_WCHAR are incompatible for the binary minus operator
(but then only for complex queries, not if the query consists only of this select).
The driver will accept a diff between 2 timestamps, but then I end up with an interval type, which I cannot convert back to seconds.
I would consider that those are bugs in the ODBC driver, but I cannot contact Hortonworks because I am not a paying customer, and I cannot contact Simba either because I am not a paying customer.
On a side note, if I try to use the floor function, I get the message:
‘floor’ is a reserved keyword.
Yes, I know it's reserved and I am actually trying to use it.
Any idea how I could get around this?
In short, the official Hive ODBC driver is really, really bad if you cannot use native statements (i.e. if you need parameterised queries).
My suggested workarounds are to either get a paid one (e.g. https://www.progress.com/datadirect-connectors, which I tried and it works very well) or to just use a JDBC one if your application can support it. All the ODBC drivers I found for Hive are wrappers around the JDBC one anyway, bundling a JRE.
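If swapping drivers is not an option, a pragmatic fallback is to select the two raw timestamp columns and compute the difference in seconds client-side. A minimal sketch in Python (the row tuple and format string are illustrative; adjust to whatever the driver actually returns):

```python
from datetime import datetime

# A row as the driver might return it: timestamps as ISO-formatted strings.
row = ("2019-02-01 01:02:03", "2019-02-01 02:02:03")

fmt = "%Y-%m-%d %H:%M:%S"
start, end = (datetime.strptime(v, fmt) for v in row)

# total_seconds() gives the signed difference in seconds.
diff_seconds = (end - start).total_seconds()
print(diff_seconds)
```

This trades one round trip of extra data transfer for independence from the driver's scalar-function support.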
I want to find the difference between two dates in Azure ML using the Apply SQL Transformation module. After a lot of searching I found that DateDiff should do the job. Unfortunately, it's not working: it always treats the datepart as a column and errors out saying there is no such column in the database. How can I resolve this?
SQL query
SELECT datediff(month,Dispatch_Date,Order_Date) as Month_Diff
from t1;
Error :- is not correct: SQL logic error or missing database no such column: month
Use the abbreviation for the date part instead of using month directly.
SELECT datediff(mm,Dispatch_Date,Order_Date) as Month_Diff
from t1;
Refer to the SQL Server documentation for more details: SQL Server DATEPART Documentation
datediff won't work here because the module is not backed by SQL Server but by SQLite.
You should be using a SQLite function to get the difference.
For example, to get the day difference use
Cast((JulianDay(EndDate) - JulianDay(StartDate)) As Integer)
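Since the Apply SQL Transformation module is backed by SQLite, the julianday approach can be verified against a plain in-memory SQLite database; a minimal sketch with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (Order_Date TEXT, Dispatch_Date TEXT)")
conn.execute("INSERT INTO t1 VALUES ('2020-01-10', '2020-01-15')")

# julianday() turns a date into a fractional day number, so the
# difference of two julianday() values is a count of days.
day_diff = conn.execute(
    "SELECT CAST(julianday(Dispatch_Date) - julianday(Order_Date) AS INTEGER) "
    "FROM t1"
).fetchone()[0]
print(day_diff)
```

For an approximate month difference, divide the day count by 30, or compare the strftime('%Y') and strftime('%m') parts of the two dates.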
The latest version of Tableau has started using standard SQL when it connects to Google's BigQuery.
I recently tried to update a large table but found that there appeared to be errors when trying to parse datetimes. The table originates as a CSV which is loaded into BigQuery, where further manipulation happens. The datetime column in the original CSV contains strings in ISO standard date time format (basically yyyy-mm-dd hh:mm). This saves a lot of annoying manipulation later.
But on trying to convert the datetime strings in Tableau into dates or datetimes I got a bunch of errors. On investigation they seemed to come from BigQuery and looked like this:
Error: Invalid timestamp: '2015-06-28 02:01'
I thought at first this might be a Tableau issue, so I loaded a chunk of the original CSV into Tableau directly, where the conversion of the string to a date worked perfectly well.
I then tried simpler versions of the conversion (to a year rather than a full datetime) and they still failed. The generated SQL for the simplest conversion looks like this:
SELECT
EXTRACT(YEAR
FROM
CAST(`Arrival_Date` AS TIMESTAMP)) AS `yr_Arrival_Date_ok`
FROM
`some_dataset`.`some_table` `some_table`
GROUP BY
1
The invalid timestamp in the error message always looks to me like a perfectly valid timestamp. And further analysis suggests it doesn't happen for all the rows in the source table, just occasional ones.
This error did not appear in older versions of Tableau/BigQuery where legacy SQL was the default for Tableau. So I'm presuming it is a consequence of standard SQL.
So is there an intermittent problem with casting to timestamps in BigQuery? Or is this a Tableau problem which causes the SQL to be incorrectly formatted? And what can I do about it?
The seconds part of the canonical timestamp representation is required if the hour and minute are also present. Try this instead with PARSE_TIMESTAMP and see if it works:
SELECT
EXTRACT(YEAR
FROM
PARSE_TIMESTAMP('%F %R', `Arrival_Date`)) AS `yr_Arrival_Date_ok`
FROM
`some_dataset`.`some_table` `some_table`
GROUP BY
1
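In PARSE_TIMESTAMP's format string, %F is shorthand for %Y-%m-%d and %R for %H:%M, which is why the pattern matches a timestamp without a seconds part. The equivalence can be sanity-checked outside BigQuery with Python's strptime using the expanded directives (%F and %R themselves are platform-dependent in Python's strptime):

```python
from datetime import datetime

# '%F %R' in BigQuery expands to '%Y-%m-%d %H:%M', which matches
# the seconds-less strings from the CSV, e.g. '2015-06-28 02:01'.
ts = datetime.strptime("2015-06-28 02:01", "%Y-%m-%d %H:%M")
print(ts.hour, ts.minute)
```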