AsterixDB unable to import datetime when importing from CSV file (SQL)

I am attempting to load a database from a CSV file using AsterixDB. It works so far using only string, int, and double fields. However, one column in the CSV file is in DateTime format. Currently I am importing those values as strings, which works fine, but I would like to import them as the SQL DateTime data type. When I change my schema and reimport, I get the following error:
ERROR: Code: 1 "org.apache.hyracks.algebricks.common.exceptions.NotImplementedException: No value parser factory for fields of type datetime"
All entries are in this format: 02/20/2010 12:00:00 AM.
I know this isn't exactly in line with the format specified by the Asterix Data Model; however, I tried a test line with the proper format and the error persisted.
Does this mean AsterixDB can't parse DateTime when doing mass imports? If so, how can I get around this issue?
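For context, a minimal sketch of the failing setup (the type, dataset, field names, and file path here are illustrative, not from the original post):
-- declaring the column as datetime and bulk loading triggers the error
CREATE TYPE csv_type AS CLOSED {
    id: int64,
    Date_Rptd: datetime
};
CREATE DATASET csv_set(csv_type) PRIMARY KEY id;
LOAD DATASET csv_set USING localfs
    (("path"="localhost:///data/crime.csv"), ("format"="delimited-text"));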
Any help is much appreciated.

Alright, after discussing with some colleagues, we believe that AsterixDB does not currently support DateTime parsing when mass importing. Our solution was to upsert every entry in the dataset with the parsing built into the query.
We used the following query:
UPSERT INTO csv_set (
    SELECT parse_datetime(c.Date_Rptd, "M/D/Y h:m:s a") AS Datetime_Rptd,
           parse_datetime(c.Date_OCC, "M/D/Y h:m:s a") AS Datetime_OCC,
           c.*
    FROM csv_set c
);
As you can see, we parse the strings using the parse_datetime function from AsterixDB's temporal functions library. This query intentionally does not erase the column holding the DateTimes in string format, although that would be simple to do if your application requires it. If anyone has a better or more elegant solution, please feel free to add to this thread!
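For reference, this works because the fields were loaded as strings and the datatype is declared open (unlike the closed type sketched above), so the upsert can attach the new datetime fields; a minimal sketch, again with illustrative names:
CREATE TYPE csv_type AS OPEN {
    id: int64,
    Date_Rptd: string,
    Date_OCC: string
};
CREATE DATASET csv_set(csv_type) PRIMARY KEY id;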

Related

Apache Drill query on Parquet file not working for timestamp field

I am trying to query Parquet files which are generated from a PySpark job. The data in the Timestamp field comes back as a hex string. I tried to use the CAST function, but it did not work.
Is there any setting required to fix this issue?
Appreciate your help.
Thanks
I got it resolved by adding the following setting to the Spark session, which makes the output compatible with other platforms:
.config("spark.sql.parquet.outputTimestampType","INT96")
If the timestamp is coming back as a hex string, you might try Drill's CONVERT_FROM() function, which can decode binary data into native Drill types.
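For instance, a minimal sketch of decoding an INT96 Parquet timestamp in Drill (the file path and column name are assumptions):
-- TIMESTAMP_IMPALA decodes INT96 timestamps as written by Spark/Impala/Hive
SELECT CONVERT_FROM(ts_col, 'TIMESTAMP_IMPALA') AS ts
FROM dfs.`/data/events.parquet`;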

String To DateTime Error in Azure Data Factory

I'm trying to import data from a CSV to Azure SQL, and there seems to be an issue with ADF importing a datetime column. I'm using ADF V2, and all the online help seems to show fixes for ADF V1.
The date column is in dd/mm/yyyy hh:mm:ss format in the source CSV and the destination Azure SQL column is of datetime type, so it should work perfectly, but it doesn't.
ADF seems to collect all data in the CSV as strings and then throws an error saying it can't convert string to datetime.
ErrorCode=TypeConversionFailure,Exception occurred when converting value '04-Apr-22 00:00:00' for column name 'DateTime' from type 'String' (precision:, scale:) to type 'DateTime' (precision:23, scale:3). Additional info: String was not recognized as a valid DateTime.
I have tried using the type conversion settings in the Mapping, but that doesn't work. I've tried every datetime format I can think of.
Any help solving this would be much appreciated.
Have you tried it without setting a DateTime format in the mapping settings?
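Failing that, one workaround (an addition, not from the original thread) is to land the column as a string in a staging table and convert it on the Azure SQL side, since T-SQL's CONVERT style 103 parses dd/mm/yyyy values; a sketch with illustrative table and column names:
-- style 103 = British/French dd/mm/yyyy
SELECT CONVERT(datetime, LoadDate, 103) AS LoadDateTime
FROM dbo.Staging;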

BigQuery timestamp field in Data Studio error

I have data in a BigQuery instance with some date fields in epoch/timestamp format. I'm trying to convert them to a YYYYMMDD format or similar in order to create a report in Data Studio. I have tried the following solutions so far:
Change the format to Date in the Edit Connection menu when creating the Data Source in Data Studio. Not working: I get configuration errors when I add the field to the Data Studio report.
Create a new field using the TODATE() function. I always get an invalid formula error (even when I follow the documentation for this function). I have tried changing the field type prior to using the TODATE() function. Not working in any case.
Am I doing something wrong? Why do I always get errors?
Thanks!
The function for TODATE() is actually CURRENT_DATE(). Change the timestamp to a DATE using EXTRACT(DATE FROM variableName).
Make sure not to use Legacy SQL!
The issue stayed, but changing the name of the variable from actual_delivery_date to ADelDate made it work. So I presume there's a bug, and short(er) names may help to avoid it.
As commented by Elliott Brossard, instead of using Data Studio for the conversion, use PARSE_DATE or PARSE_TIMESTAMP in BigQuery and convert it there.
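If the field is a numeric epoch rather than a string, TIMESTAMP_SECONDS applies instead of PARSE_TIMESTAMP; a minimal sketch of the BigQuery-side conversion (the table and column names are assumptions, and the field is assumed to hold epoch seconds):
SELECT FORMAT_DATE('%Y%m%d', DATE(TIMESTAMP_SECONDS(epoch_col))) AS yyyymmdd
FROM `project.dataset.table`;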

Odd error with casting to timestamp in standard SQL/Tableau

The latest version of Tableau has started using standard SQL when it connects to Google's BigQuery.
I recently tried to update a large table but found that there appeared to be errors when trying to parse datetimes. The table originates as a CSV which is loaded into BigQuery, where further manipulations happen. The datetime column in the original CSV contains strings in ISO standard datetime format (basically yyyy-mm-dd hh:mm). This saves a lot of annoying manipulation later.
But on trying to convert the datetime strings in Tableau into dates or datetimes I got a bunch of errors. On investigation they seemed to come from BigQuery and looked like this:
Error: Invalid timestamp: '2015-06-28 02:01'
I thought at first this might be a Tableau issue, so I loaded a chunk of the original CSV into Tableau directly, where the conversion of the string to a date worked perfectly well.
I then tried simpler versions of the conversion (to a year rather than a full datetime) and they still failed. The generated SQL for the simplest conversion looks like this:
SELECT
  EXTRACT(YEAR FROM CAST(`Arrival_Date` AS TIMESTAMP)) AS `yr_Arrival_Date_ok`
FROM
  `some_dataset`.`some_table` `some_table`
GROUP BY
  1
The invalid timestamp in the error message always looks to me like a perfectly valid timestamp. And further analysis suggests it doesn't happen for all the rows in the source table, just occasional ones.
This error did not appear in older versions of Tableau/BigQuery where legacy SQL was the default for Tableau. So I'm presuming it is a consequence of standard SQL.
So is there an intermittent problem with casting to timestamps in BigQuery? Or is this a Tableau problem which causes the SQL to be incorrectly formatted? And what can I do about it?
The seconds part in the canonical timestamp representation is required if the hour and minute are also present. Try this instead with PARSE_TIMESTAMP and see if it works:
SELECT
  EXTRACT(YEAR FROM PARSE_TIMESTAMP('%F %R', `Arrival_Date`)) AS `yr_Arrival_Date_ok`
FROM
  `some_dataset`.`some_table` `some_table`
GROUP BY
  1
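As a follow-up (an addition, not part of the original answer), SAFE.PARSE_TIMESTAMP returns NULL instead of raising an error, which makes it easy to locate the occasional offending rows:
SELECT `Arrival_Date`
FROM `some_dataset`.`some_table`
WHERE SAFE.PARSE_TIMESTAMP('%F %R', `Arrival_Date`) IS NULL;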

Convert String to smalldatetime object using SSIS

I am trying to use SSIS to import a CSV to a database. I am having issues with a column of the smalldatetime data type that does not have null values. The string associated with this column is formatted MMddYYYY and has no null values either.
Currently I am trying to use a Derived Column transformation to convert the string to DT_DBTIMESTAMP.
I am getting the error message: [Derived Column [36]] Error: An error occurred while attempting to perform a type cast.
In my Expression field I have: (DT_DBTIMESTAMP)(SUBSTRING([Derived Column 5],5,4) + "/" + SUBSTRING([Derived Column 5],1,2) + "/" + SUBSTRING([Derived Column 5],3,2))
Thank you in advance for any help!
I started in IT over 25 years ago, and find it somewhat depressing that we are still struggling to get dates from two different systems to integrate ... sigh ...
I would abandon SSIS expressions for this requirement and use a Script Task instead. The .NET DateTime.TryParse method is an elegant solution that can easily be extended for varying date formats.
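If you would rather keep the conversion in SQL instead of a Script Task (a different approach from the one suggested above), one sketch is to land the raw MMddYYYY string in a staging table and convert it in T-SQL; the table and column names are illustrative:
-- rearranges MMddYYYY into YYYYMMDD, which T-SQL parses unambiguously
SELECT CONVERT(smalldatetime,
       SUBSTRING(RawDate, 5, 4) + SUBSTRING(RawDate, 1, 2) + SUBSTRING(RawDate, 3, 2)) AS LoadDate
FROM dbo.Staging;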