Athena date_parse for date with optional millisecond field - sql

I have data in S3 from which I created an Athena table. Some date entries in S3 are in JSON format, and Athena does not accept them as either date or timestamp when I run queries.
I am using AWS Athena, which uses Presto as its query engine.
Example JSON:
{"creationdate":"2018-09-12T15:49:07.269Z", "otherfield":"value1"}
{"creationdate":"2018-09-12T15:49:07Z", "otherfield":"value2"}
AWS Glue infers both fields as string. When I change them to timestamp and date respectively, queries on the timestamp field fail with a ValidationError.
I found the Presto date_parse function, but it does not work either, because some values have milliseconds and others do not.
date_parse(creationdate, '%Y-%m-%dT%H:%i:%s.%fZ')
date_parse(creationdate, '%Y-%m-%dT%H:%i:%sZ')
Each one fails on part of the data, because some entries carry the fractional-second part (%f) and others do not.
Is there a way to provide a parser or regex so that I can convert these strings into dates during SQL query execution?

Instead of providing the timestamp format, you can use the from_iso8601_timestamp function.
This way, all timestamps get parsed.
select from_iso8601_timestamp(creationdate) from table1;
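As a minimal sketch of this approach, the query below runs the two sample values from the question through from_iso8601_timestamp and also casts the result to a date. The inline VALUES list and the column aliases creation_ts and creation_date are illustrative assumptions, not part of the original table.

```sql
-- Sketch: parse both sample strings, with and without milliseconds.
-- The VALUES list stands in for the real table; aliases are hypothetical.
SELECT
  creationdate,
  from_iso8601_timestamp(creationdate) AS creation_ts,
  CAST(from_iso8601_timestamp(creationdate) AS date) AS creation_date
FROM (
  VALUES
    ('2018-09-12T15:49:07.269Z'),
    ('2018-09-12T15:49:07Z')
) AS t(creationdate);
```

Note that from_iso8601_timestamp returns a timestamp with time zone, so a cast may be needed if a plain timestamp or date is expected downstream.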

Do you only need the date?
If so, you could use date_parse(string, format) on the date portion of the string:
date_parse(substr(creationdate, 1, 10), '%Y-%m-%d')
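If you would rather keep explicit formats, one possible sketch is to attempt both patterns from the question and take the first that succeeds. This assumes the Presto try and coalesce functions are available in your Athena engine version; table1 is the table name used in another answer here, and creation_ts is a hypothetical alias.

```sql
-- Sketch: try the millisecond format first, then fall back to the
-- format without fractional seconds. try() turns a parse error into NULL.
SELECT
  creationdate,
  coalesce(
    try(date_parse(creationdate, '%Y-%m-%dT%H:%i:%s.%fZ')),
    try(date_parse(creationdate, '%Y-%m-%dT%H:%i:%sZ'))
  ) AS creation_ts
FROM table1;
```

Rows that match neither pattern come back as NULL instead of failing the whole query, which makes malformed values easy to find with a WHERE creation_ts IS NULL filter.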

Use this:
SELECT requestdatetime, remoteip, requester, key
FROM MYDB.TABLE
WHERE parse_datetime(requestdatetime,'dd/MMM/yyyy:HH:mm:ss Z')
BETWEEN parse_datetime('2020-10-14:00:00:00','yyyy-MM-dd:HH:mm:ss')
AND parse_datetime('2020-10-14:23:59:59','yyyy-MM-dd:HH:mm:ss');

Related

Converting a STRING to DATE in Big Query [duplicate]

Been struggling with some datasets I want to use which have a problem with the date format.
Bigquery could not load the files and returned the following error:
Could not parse '4/12/2016 2:47:30 AM' as TIMESTAMP for field date (position 1) starting at location 21 with message 'Invalid time zone: AM'
I have been able to upload the file manually, but with the columns as strings, and now I would like to set the fields back to the proper format. However, I could not find a way to change the date column from string to a proper DateTime format.
I would love to know if this is possible, as the file is too long to be formatted in Excel or Sheets (as I have done with the smaller files from this dataset).
now would like to set the fields back to the proper format ... from string to proper DateTime format
Use parse_datetime('%m/%d/%Y %r', string_col) to parse a datetime out of the string.
Applied to the sample string in your question, this yields a DATETIME value for April 12, 2016 at 02:47:30.
As @Mikhail Berlyant rightly said, the parse_datetime('%m/%d/%Y %r', string_col) function will convert your badly formatted dates to a standard ISO 8601 format accepted by Google BigQuery. The best option is then to save the query results to a new table in your BigQuery project.
I had a similar issue.
Below is an image of my table, which I uploaded with all columns in string format.
Next, I applied the following settings to the query below.
The settings below stored the query output in a new table called heartrateSeconds_clean in the same dataset.
The "Write if empty" option is a good way to avoid overwriting the existing raw data or arbitrarily writing output to a temporary table, unless you are sure you want to do so. Save the settings and run your query.
As seen below, the output schema of the new table is updated automatically.
Below is the new preview of the resulting table.
NB: I did not apply an ORDER BY clause to the results, so the data is not ordered by any specific column in either version of the table.
This dataset has over 2M rows.
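The same result can probably be reached without the console settings by using a CREATE TABLE ... AS SELECT statement. The sketch below is an assumption-laden illustration: the project, dataset, and source table names and the date_str column are hypothetical, and only heartrateSeconds_clean comes from the answer above.

```sql
-- Sketch: write the parsed column to a new table and leave the raw one untouched.
-- my_project, my_dataset, heartrateSeconds_raw and date_str are hypothetical names.
CREATE TABLE IF NOT EXISTS `my_project.my_dataset.heartrateSeconds_clean` AS
SELECT
  * EXCEPT (date_str),
  PARSE_DATETIME('%m/%d/%Y %r', date_str) AS date_parsed
FROM `my_project.my_dataset.heartrateSeconds_raw`;
```

IF NOT EXISTS plays a role similar to the "Write if empty" setting, in that an existing destination table is not overwritten.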

Converting a Time to 24 hour time in DB2

I am running on DB2 and I am trying to convert a H:MI:SS AM/PM format, like this '3:33:38 PM' into 24HH:MI:SS format, like this '15:33:38'
This is frequently asked. Different methods exist; you can use TO_DATE (also known as TIMESTAMP_FORMAT) combined with TIME or similar.
For example, to create a time result:
time(to_date('3:33:38 PM', 'HH12:MI:SS AM'))
which yields
15:33:38
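If a string in 24-hour form is needed instead of a TIME value, one possible sketch is to format the parsed timestamp with VARCHAR_FORMAT. The aliases time_value and time_text are hypothetical, and this assumes a Db2 version that supports the HH12, HH24 and AM format elements.

```sql
-- Sketch: parse the 12-hour string once as a TIME and once as formatted text.
SELECT
  TIME(TIMESTAMP_FORMAT('3:33:38 PM', 'HH12:MI:SS AM')) AS time_value,
  VARCHAR_FORMAT(
    TIMESTAMP_FORMAT('3:33:38 PM', 'HH12:MI:SS AM'),
    'HH24:MI:SS'
  ) AS time_text
FROM SYSIBM.SYSDUMMY1;
```

The first column keeps a real TIME type, which is usually preferable for storage and comparison; the second is only useful when the consumer needs text.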
It would be unusual to store a time in Db2 as a string.
select timefld
from mytable
This might indeed return 3:33:38 PM, but if timefld is an actual time data type, then the value you are seeing is a function of whatever tool you are using to query Db2.
Look in your client's configuration for an option to change the format used for dates and times.
Note that this only affects how the UI displays the data stored in the database.
It does not affect the internal format used to store the time, nor the external format used to return the data to clients.

Formatting a string to time on BigQuery?

I've got a huge (1.5GB) CSV file, with dates in it in the format 2014-12-25. I have managed to upload it to BigQuery with the format string for this column. I'm wondering if I can transform this in situ to a datetime format, without having to download the data, parse it and send it back?
I have used the BigQuery GUI (newbie) but am happy to use the CLI if this will make it easier.
You can use some of the date and time functions to "transform" a date represented as a string into a datetime.
For example:
SELECT '2014-12-25', TIMESTAMP('2014-12-25')
Added:
If you feel that you really need the date in timestamp format rather than string, and the string data is already in BigQuery, you can run something like the query below and write the result to a new table.
SELECT
TIMESTAMP(date_string) as date_timestamp,
< list all the rest of the fields >
FROM original_table

Custom date format for loading data into BigQuery, using bq?

I'm uploading a CSV file to Google BigQuery using bq load on the command line. It's working great, but I've got a question about converting timestamps on the fly.
In my source data, my timestamps are formatted as YYYYMM, e.g. 201303 meaning March 2013.
However, Google BigQuery's timestamp fields are documented as only supporting Unix timestamps and YYYY-MM-DD HH:MM:SS format strings. So unsurprisingly, when I load the data, these fields don't convert to the correct date.
Is there any way I can convey to BigQuery that these are YYYYMM strings?
If not I can convert them before loading, but I have about 1TB of source data, so I'm keen to avoid that if possible :)
Another alternative is to load this field as STRING and convert it to TIMESTAMP inside BigQuery itself, copying the data into another table (and deleting the original one afterwards) with the following transformation:
SELECT TIMESTAMP(your_ts_str + "01") AS ts
An alternative to Mosha's answer can be achieved by:
SELECT DATE(CONCAT(your_ts_str, "01")) as ts
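In BigQuery standard SQL, a similar conversion can be sketched with PARSE_DATE and an explicit format string. The inline UNNEST array of sample values and the aliases month_start and ts are assumptions for illustration; your_ts_str is the column name used in the answers above.

```sql
-- Sketch: append a day of "01" to the YYYYMM string, then parse it explicitly.
-- The UNNEST array stands in for the real table.
SELECT
  your_ts_str,
  PARSE_DATE('%Y%m%d', CONCAT(your_ts_str, '01')) AS month_start,
  TIMESTAMP(PARSE_DATE('%Y%m%d', CONCAT(your_ts_str, '01'))) AS ts
FROM UNNEST(['201303', '201312']) AS your_ts_str;
```

Under these assumptions, 201303 maps to the first day of March 2013. Spelling out the format avoids relying on implicit string-to-date coercion.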