I am trying to change the format of a timestamp in AWS Athena but I can't get it right. Could someone please help?
The value (Data format: string (Partitioned)) of the column I am trying to change is
20220826T073200Z
and I would like the output to be
2022-08-26 07:32:00
You need to parse the date first, for example with date_parse:
select date_parse('20220826T073200Z', '%Y%m%dT%H%i%sZ');
Output:
_col0
2022-08-26 07:32:00.000
If this is not good enough you can format it with date_format:
select date_format(date_parse('20220826T073200Z', '%Y%m%dT%H%i%sZ'), '%Y-%m-%d %H:%i:%s');
_col0
2022-08-26 07:32:00
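For anyone who wants to sanity-check the format strings outside Athena, the same parse-then-format idea can be sketched in Python with the standard library. This is only an illustration, not Athena code: Python writes minutes as %M where date_parse uses %i, and the variable names are made up for the example.

```python
from datetime import datetime

raw = "20220826T073200Z"  # sample value from the question

# Parse first: the literal T and Z are matched as plain characters.
parsed = datetime.strptime(raw, "%Y%m%dT%H%M%SZ")

# Then format: the counterpart of date_format(..., '%Y-%m-%d %H:%i:%s').
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
```

The two steps map one-to-one onto the nested date_format(date_parse(...)) call above.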
I imported data from a marketing source in my db and it changed the format of the timestamp.
This is how the timestamp stored in string/text format appears - 20220725115427 ---1
This is what this string actually means if you manually space it out - 2022-07-25 11:54:27 ---2
How do I go from 1 to 2? I want 2 in timestamp format.
Any help is appreciated.
I'm on Amazon Redshift Database - redshift SQL
Have you tried the TO_TIMESTAMP function?
Usage:
SELECT '20220725115427' AS str, TO_TIMESTAMP('20220725115427', 'YYYYMMDDHH24MISS') AS ts;
Please see this for a formatting reference.
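To check what the format string should produce before running it in Redshift, here is a rough Python equivalent of that TO_TIMESTAMP call. It is a sketch using only the standard library, on the assumption that YYYYMMDDHH24MISS corresponds to %Y%m%d%H%M%S in Python's notation; the variable names are illustrative.

```python
from datetime import datetime

s = "20220725115427"  # string value from the question

# Year, month, day, 24-hour hour, minute, second, with no separators.
ts = datetime.strptime(s, "%Y%m%d%H%M%S")
print(ts)
```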
I'm processing CSV files, outputting parquet files using Pandas in an AWS Lambda function, saving the data to an S3 bucket to query with Athena. The RAW input format to the Lambda function is CSV, with a unix timestamp in UTC that looks like:
Timestamp,DeviceName,DeviceUUID,SignalName,SignalValueRaw,SignalValueScaled,SignalType,Valid
1605074410110,F2016B1E.CAP.0 - 41840982B40192,323da038-bb49-4f3a-a045-925194364e5b,X.ALM.FLG,0,0,INTEGER,true
I parse the Timestamp like:
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
df.head()
Timestamp DeviceName DeviceUUID SignalName SignalValueRaw SignalValueScaled SignalType SubstationId StationBankId FeederId year month day hour DeviceNameClean DeviceType
0 2020-11-11 06:00:10.110 F2016B2W.MLR.0 - 41841005000073 3c4839b1-ab99-4164-b415-4653948360ef CVR_X_ENGAGED_A 0 0 BOOLEAN Kenton FR2016B2 F2016B2W 2020 11 11 6 MLR.0 - 41841005000073 MLR
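As a cross-check on what unit='ms' does, the Timestamp from the sample CSV row can be converted by hand with the standard library. This is a minimal sketch with illustrative names; it adds integer milliseconds to the epoch to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

raw_ms = 1605074410110  # Timestamp value from the sample CSV row

# Unix time counts from 1970-01-01 00:00:00 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
ts = epoch + timedelta(milliseconds=raw_ms)
print(ts.isoformat(timespec="milliseconds"))
```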
I process the data further in the Lambda function, then output a parquet file.
I then run a Glue crawler against the parquet files that this script outputs, and in S3 I can query the data fine:
2020-11-14T05:00:43.609Z,02703ee8-b08a-4c49-9581-706f905aa192,FR22607.REG.0,REG,REG.0,ROSS,FR22607,,0,0,0,0,0,0,0,0,,0.0,,,,0.0,,,,1.0,,
The Glue crawler correctly identifies the column as timestamp:
CREATE EXTERNAL TABLE `cvr_event_log`(
`timestamp` timestamp,
`deviceuuid` string,
`devicename` string,
`devicetype` string,
...
But when I then query the table in Athena, I get this for the date:
"timestamp","deviceuuid","devicename","devicetype",
"+52840-11-19 16:56:55.000","0ca4ed37-930d-4778-b3a8-f49d9b498364","FR22606.REG.0","REG",
What has Athena so confused about the timestamp?
For a TIMESTAMP column to work in Athena you need to use a specific format, which unfortunately is not ISO 8601. It looks like this: "2020-11-14 20:33:42".
You can use from_iso8601_timestamp(ts) to parse ISO 8601 timestamps in queries.
Glue crawlers sadly misinterpret things quite often and create tables that don't work properly with Athena.
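If the conversion is done before the data reaches Athena rather than in the query, the reshaping from ISO 8601 to the space-separated form might look like the Python sketch below. The helper name to_athena_format is hypothetical, and the sketch assumes UTC timestamps that end in Z and carry fractional seconds, as in the sample row from the question.

```python
from datetime import datetime

def to_athena_format(iso_ts: str) -> str:
    """Turn '2020-11-14T05:00:43.609Z' into '2020-11-14 05:00:43.609'."""
    parsed = datetime.strptime(iso_ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    # strftime's %f yields six digits (microseconds); trim to milliseconds.
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

print(to_athena_format("2020-11-14T05:00:43.609Z"))
```

The sketch keeps the milliseconds; drop the fractional part if you need exactly the form shown above.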
"Date" data from GA in BQ is "yyyymmdd" which is not able to convert to "date" data set.
Is there any way to make BQ recognize it as "date"?
Thank you,
According to the documentation, the date field is exported as String from your GA data.
However, it is possible to change that after you export your data to BigQuery. You can overwrite your current table or create a new one with the date format you want. To achieve this, we will use the built-in PARSE_DATE() function. It takes a format string and the string to parse, and returns a DATE. Below is the Standard SQL syntax in BigQuery:
SELECT PARSE_DATE("%Y%m%d", date) as date FROM `project.dataset.table`
The date will be output as YYYY-MM-DD. In addition, if you want a different display format, you can use the built-in FORMAT_DATE() function with any of the supported format elements.
In your case, where you want to replace the whole table so that the date column has the desired type, you could use the following syntax:
CREATE OR REPLACE TABLE `project.dataset.table` AS
( SELECT * REPLACE(PARSE_DATE("%Y%m%d",date) as date) FROM `project.dataset.table`)
As a result, your table will keep all the same columns, but the date field will have the DATE type.
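To see what the "%Y%m%d" format does to a single value, here is a small Python sketch of the same parse. It relies on PARSE_DATE and Python's strptime sharing these three format elements; the sample value 20200518 is made up for illustration and does not come from the GA export.

```python
from datetime import datetime

ga_date = "20200518"  # hypothetical GA-style date string

# Same format string as in PARSE_DATE("%Y%m%d", date).
parsed = datetime.strptime(ga_date, "%Y%m%d").date()
print(parsed.isoformat())
```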
I would appreciate your help; I have been trying to solve this without success.
I have a string column in Athena that I want to convert to a timestamp.
I have used this query:
select date_parse(timestamp,'%Y-%m-%dT%H:%i:%s.%fZ') from wqmparquetformat ;
But I am getting this error:
INVALID_FUNCTION_ARGUMENT: Invalid format: "1589832352" is malformed at "832352"
I have tried every combination of Presto timestamp formats.
When I run the query below:
select to_iso8601(from_unixtime(1589832352));
I receive the below output:
2020-05-18T20:05:52.000Z
The date_parse() function expects (string, format) as parameters and returns a timestamp. Your column holds Unix epoch seconds rather than a formatted date string, so convert it first and pass the result as shown below:
select date_parse(to_iso8601(from_unixtime(1589832352)),'%Y-%m-%dT%H:%i:%s.%fZ')
which gave me the output below:
2020-05-18 20:05:52.000
In your case, pass the name of the column that contains the value 1589832352:
select date_parse(to_iso8601(from_unixtime(timestamp)),'%Y-%m-%dT%H:%i:%s.%fZ')
Since your column is a string, you should cast timestamp to double for it to work, as shown below:
select date_parse(to_iso8601(from_unixtime(cast(timestamp as double))),'%Y-%m-%dT%H:%i:%s.%fZ')
To test, run the query below, which works fine:
select date_parse(to_iso8601(from_unixtime(cast('1589832352' as double))),'%Y-%m-%dT%H:%i:%s.%fZ')
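The error in the question and the fix can both be reproduced outside Athena. The Python sketch below is an analogy, not Athena code: strptime stands in for date_parse and fromtimestamp for from_unixtime, and the variable names are illustrative.

```python
from datetime import datetime, timezone

raw = "1589832352"  # value from the error message, stored as a string
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"

# Parsing epoch seconds with a date format fails, much like date_parse did.
try:
    datetime.strptime(raw, fmt)
    parse_failed = False
except ValueError:
    parse_failed = True

# Convert the string to a number first, then to a UTC timestamp.
ts = datetime.fromtimestamp(int(raw), tz=timezone.utc)
print(parse_failed, ts.strftime("%Y-%m-%d %H:%M:%S"))
```

The int(raw) step plays the same role as the cast in the Athena query: the epoch value has to become a number before it can be turned into a timestamp.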
For me, date_format works great in AWS Athena:
SELECT date_format(from_iso8601_timestamp(datetime), '%m-%d-%Y %H:%i') AS myDateTime FROM <table>;
OR
select date_format(from_iso8601_timestamp(timestamp),'%Y-%m-%dT%H:%i:%s.%fZ') from wqmparquetformat ;