Apache Drill query on Parquet file not working for timestamp field - apache-spark-sql

I am trying to query Parquet files that are generated from a PySpark job. The data in the timestamp field comes back as a hex string. I tried to use the CAST function, but it did not work.
Is there any setting required to fix this issue?
Appreciate your help.
Thanks

I got it resolved by adding the following setting to the Spark session. This makes the Parquet output compatible with other platforms.
.config("spark.sql.parquet.outputTimestampType","INT96")

If the timestamp is coming back as a hex string, you might try Drill's CONVERT_FROM() function.
Basically, this can convert a hex/binary value into other data types, including timestamps.
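A minimal sketch, assuming the Parquet column is called ts_col, that it was written as INT96 (Impala-style) timestamps, and that the file is reachable through the dfs storage plugin:
SELECT CONVERT_FROM(ts_col, 'TIMESTAMP_IMPALA') AS ts
FROM dfs.`/path/to/file.parquet`;
Depending on your Drill version, enabling the store.parquet.reader.int96_as_timestamp session option may also let Drill decode INT96 timestamps directly, without CONVERT_FROM().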

Related

AsterixDB unable to import datetime when importing from CSV file (SQL)

I am attempting to load a database from a CSV file using AsterixDB. Currently, it works using only string, int, and double fields. However, I have a column in the CSV file that is in DateTime format. Currently I am importing them as strings, which works fine, but I would like to import them as the SQL DateTime data type. When I try changing my schema and reimporting I get the following error:
ERROR: Code: 1 "org.apache.hyracks.algebricks.common.exceptions.NotImplementedException: No value parser factory for fields of type datetime"
All entries are in this format 02/20/2010 12:00:00 AM.
I know this isn't exactly in line with the format specified by the Asterix Data Model; however, I tried a test line with the proper format and the error persisted.
Does this mean AsterixDB can't parse DateTime when doing mass imports? And if so, how can I get around this issue?
Any help is much appreciated.
Alright, after discussing with some colleagues, we believe that AsterixDB does not currently support DateTime parsing when mass importing. Our solution was to upsert every entry in the dataset with the parsing built into the query.
We used the following query:
upsert into csv_set (
SELECT parse_datetime(c.Date_Rptd, "M/D/Y h:m:s a") as Datetime_Rptd,
parse_datetime(c.Date_OCC, "M/D/Y h:m:s a") as Datetime_OCC,
c.*
FROM csv_set c
);
As you can see, we parse the strings using the parse_datetime function from the AsterixDB temporal functions library. This query intentionally doesn't erase the columns that hold the DateTimes in string format, although that would be very simple to do if your application requires it. If anyone has a better or more elegant solution, please feel free to add to this thread!
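As a quick sanity check of the format string, you can evaluate parse_datetime on the sample value from the question before running the upsert (a standalone SQL++ query using nothing beyond the thread's own format string):
SELECT VALUE parse_datetime("02/20/2010 12:00:00 AM", "M/D/Y h:m:s a");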

converting a DateTime column to string in ADF

I am trying to build a fully parametrised pipeline template in ADF. With the work I have done so far, I can do a full load without any issues, but when it comes to the delta load, it seems like my queries are not working. I believe the reason for this is that my "where" statement looks somewhat like this:
SELECT #{item().source_columns} FROM #{item().source_schema}.#{item().source_table}
WHERE #{item().source_watermarkcolumn} > #{item().max_watermarkcolumn_loaded} AND #{item().source_watermarkcolumn} <= #{activity('Watermarkvalue').output.firstRow.max_watermarkcolumn_loaded}
where 'max_watermarkcolumn_loaded' is in datetime format and the 'activity' output is obviously a string.
Please correct me if my assumption is wrong, and let me know what I can do to fix it.
EDIT:
[Screenshot of the error]
ADF is picking up the date from the SQL column 'max_watermarkcolumn_loaded' in this format '"2021-09-29T06:11:16.333Z"', and I think that's where the problem is.
I tried to repro this error. I gave the parameter without single quotes to a sample query.
Wrap the date parameters in single quotes.
Corrected query:
SELECT #{item().source_columns}
FROM #{item().source_schema}.#{item().source_table}
WHERE #{item().source_watermarkcolumn} > '#{item().max_watermarkcolumn_loaded}'
  AND #{item().source_watermarkcolumn} <= '#{activity('Watermarkvalue').output.firstRow.max_watermarkcolumn_loaded}'
With this query, the pipeline runs successfully.

PostgreSQL: How do I sum after using a text function and cast? (ERROR SQL state: 22P02)

I am probably missing something obvious and asking a silly thing, but I am unable to do a simple sum.
My data was imported with the '€' character, so I had to import the data as text:
original data sample:
"€31.51"
"€0.10"
"€24.23"
I tried to use a string function to remove the €. I was then hoping to convert to numeric and sum.
SELECT sum(coalesce(CAST(split_part(revenue_eur, '€', 2) as NUMERIC),'0'))
FROM revenue_test;
The only piece that runs is:
SELECT coalesce(split_part(revenue_eur, '€', 2),'0')
FROM revenue_test;
What I need is just a sum. Could someone please kindly help me figure it out?
I tried doing a subquery but failed miserably.
Maybe there is a way to import the data without the € and into numeric?
Thank you!!!!!!
EDIT: I imported the CSV via pgAdmin 4 and am using Postgres 12 (the file has about 85k rows).
To import the data I tried COPY with the pgAdmin 4 query tool, but I got a 'permission denied' error. I checked all the permissions on my file but was clearly missing something; the most likely solution I found was to connect to Postgres via the terminal on my Mac and use \copy, but I didn't manage to do that.
So I ended up using the right-click 'Import' feature in pgAdmin.
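For reference, this is roughly what the \copy route would look like from a psql terminal; the file path and CSV options are assumptions about the setup, and \copy reads the file with the local user's permissions, which sidesteps the server-side 'permission denied' error:
\copy revenue_test FROM '/path/to/revenue.csv' CSV HEADER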
EDIT 2: I found the problem: during the import a ',' (thousands separator) was inserted into one value, so I am unable to cast without taking it away.
I found regular-expression examples for removing a character at a specific position, but the ',' appears at random positions.
Code works!
WITH test AS (
    SELECT translate(revenue_eur, '€,', '')::float AS eur
    FROM revenue_test
)
SELECT sum(coalesce(eur, 0))
FROM test;
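If exact currency arithmetic matters, the same cleanup can be sketched with NUMERIC instead of float, plus a NULLIF guard so empty strings become NULL and the coalesce has something to do (an alternative sketch, not what was actually run):
WITH test AS (
    SELECT nullif(translate(revenue_eur, '€,', ''), '')::numeric AS eur
    FROM revenue_test
)
SELECT sum(coalesce(eur, 0))
FROM test;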

Oracle convert blob to string

I would like to convert a BLOB in an Oracle DB to a readable string.
I have tried some functions, but none of them worked for me.
In the end I tried to convert it via a SQL statement like:
SELECT CONVERT(CAST(blob as BINARY) USING utf8) as blob FROM tablewithblob
Can anyone tell me what I am doing wrong? The error from SQL Developer is "missing right parenthesis". Thanks in advance!
The CONVERT(value USING charset) function is a MySQL function, not an Oracle one:
https://www.w3schools.com/sql/func_mysql_convert.asp
Take a look at this instead:
https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions027.htm
But it looks like DBMS_LOB is a better way to do what you're doing in Oracle. Go check out "How do I get textual contents from BLOB in Oracle SQL".
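For example, a minimal sketch using DBMS_LOB.SUBSTR together with UTL_RAW.CAST_TO_VARCHAR2; it assumes the BLOB actually holds text in the database character set, reads only the first 2000 bytes, and uses an illustrative column name blob_col:
SELECT UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(blob_col, 2000, 1)) AS blob_text
FROM tablewithblob;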

BigQuery timestamp field in Data Studio error

I have data in a BigQuery instance with some date fields in epoch/timestamp format. I'm trying to convert them to a YYYYMMDD format or similar in order to create a report in Data Studio. I have tried the following solutions so far:
Change the format to Date in the Edit Connection menu when creating the Data Source in Data Studio. Not working: I get configuration errors when I add the field to the Data Studio report.
Create a new field using the TODATE() function. I always get an invalid formula error (even when I follow the documentation for this function). I have tried changing the field type prior to using the TODATE() function. Not working in any case.
Am I doing something wrong? Why do I always get errors?
Thanks!
The function for TODATE() is actually CURRENT_DATE(). Change a TIMESTAMP to a DATE using EXTRACT(DATE FROM variableName).
Make sure not to use Legacy SQL!
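For instance, in standard (non-legacy) SQL, assuming a TIMESTAMP column named actual_delivery_date (the table path is a placeholder):
SELECT EXTRACT(DATE FROM actual_delivery_date) AS actual_delivery_day
FROM `project.dataset.table`;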
The issue stayed, but changing the name of the variable from actual_delivery_date to ADelDate made it work. So I presume there's a bug, and short(er) names may help to avoid it.
As commented by Elliott Brossard, the solution would be, instead of using Data Studio for the conversion, to use PARSE_DATE or PARSE_TIMESTAMP in BigQuery and convert it there.
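A sketch of that approach; the column names, table path, and the assumption that the raw field is either epoch seconds or a formatted string are mine, not from the thread:
-- if the field is epoch seconds
SELECT FORMAT_DATE('%Y%m%d', DATE(TIMESTAMP_SECONDS(actual_delivery_date))) AS delivery_yyyymmdd
FROM `project.dataset.table`;
-- if the field is a string such as '2021-09-29 06:11:16'
SELECT PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S', actual_delivery_date) AS delivery_ts
FROM `project.dataset.table`;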