Equivalent of Hive's date_format function in Impala?

Is there an equivalent of Hive's date_format function in Impala?
I need to convert a date column to the first day of its month (e.g., '2020-09-29' to '2020-09-01'). In Hive I had used: date_format(LOG_DATE,'yyyy-MM-01') as FIRST_DAY_MONTH

You can use to_timestamp().
For example, to_timestamp('20200901','yyyyMMdd') returns a timestamp for 2020-09-01.
A generic form, assuming data_col holds 'yyyy-MM-dd' strings, is: to_timestamp(concat(substr(data_col,1,7),'-01'),'yyyy-MM-dd')
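A minimal Python sketch of the same string-based idea, assuming the column holds strings that start with a 'yyyy-MM-dd' date: keep the first seven characters (year and month), append '-01', and parse the result. The function name first_day_of_month is hypothetical and only mirrors the SQL expression; it is not an Impala API.

```python
from datetime import datetime


def first_day_of_month(date_str: str) -> datetime:
    """Mirror to_timestamp(concat(substr(col, 1, 7), '-01'), 'yyyy-MM-dd').

    Assumes date_str starts with a 'yyyy-MM-dd' date. SQL substr(col, 1, 7)
    is 1-based, which corresponds to the Python slice [:7].
    """
    year_month = date_str[:7]  # e.g. '2020-09'
    return datetime.strptime(year_month + "-01", "%Y-%m-%d")


print(first_day_of_month("2020-09-29"))  # 2020-09-01 00:00:00
```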

Related

Remove ".000000" from a field like this: "2386.000000"

How do I remove the ".000000" part of the "2386.000000" field? I want to keep only the numeric part before the dot in Databricks.
You can use cast():
select cast(2386.000000 as int) i
There are many ways to convert a float to an int; cast() is just one of them.
See the following link for all supported Spark SQL functions:
https://spark.apache.org/docs/2.3.0/api/sql/index.html#round
In my solution, I use the round() function. Either approach works as long as it gives the correct result; note, however, that cast() truncates the fractional part while round() rounds to the nearest integer, so the two differ when the fraction is not zero.
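For 2386.000000 both answers give 2386, but cast() and round() diverge once the fractional part is non-zero: casting to int truncates toward zero, while Spark's round() uses HALF_UP rounding. The Python sketch below illustrates the two behaviours with decimal.Decimal so the rounding mode can be set explicitly (Python's built-in round() rounds half to even, so it is deliberately not used). The helper names are hypothetical, and this runs locally, not in Databricks.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def cast_to_int(value: str) -> int:
    """Like CAST(x AS INT): drop the fractional part (truncate toward zero)."""
    return int(Decimal(value).to_integral_value(rounding=ROUND_DOWN))


def round_half_up(value: str) -> int:
    """Like Spark's round(x): nearest integer, ties rounded away from zero."""
    return int(Decimal(value).to_integral_value(rounding=ROUND_HALF_UP))


for v in ["2386.000000", "2386.5", "2386.7", "-2386.7"]:
    print(v, cast_to_int(v), round_half_up(v))
```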

What is the alternative of datefromparts in Presto (AWS Athena)

What is the best-performing alternative to the datefromparts SQL function in AWS Athena (Presto DB)?
The use case is:
I have the date parts (i.e., the day, month, and year) and need to build a date from them.
You would typically use date_parse() with the proper format specifiers. If your date is in ISO 8601 format, you can use from_iso8601_date() (or from_iso8601_timestamp()) directly.
Conversely, if you need to extract date parts, you can use extract(), for example:
extract(hour from current_timestamp)
Note that Presto also offers a full range of short function names that correspond to the possible extraction parts: year(), quarter(), month(), ...
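When the parts are separate integers, one way to use the ISO route is to format them as a zero-padded 'YYYY-MM-DD' string first and then parse it; extraction goes the other way. A Python sketch of that round trip, where date.fromisoformat stands in for from_iso8601_date() and attribute access stands in for year()/month()/day(). The helper name is hypothetical and this is an illustration, not Athena code.

```python
from datetime import date


def date_from_parts(year: int, month: int, day: int) -> date:
    """Build a zero-padded ISO 8601 date string, then parse it."""
    iso = f"{year:04d}-{month:02d}-{day:02d}"  # e.g. '2020-09-01'
    return date.fromisoformat(iso)  # raises ValueError for impossible dates


d = date_from_parts(2020, 9, 1)
# Extraction in the opposite direction, like year(), month() and day().
print(d.year, d.month, d.day)
```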

Can BigQuery's PARSE_DATE function deal with ISOWEEK?

When doing year-over-year comparisons, it's handy to be able to compare ISO weeks. BigQuery's DATE_ADD and DATE_SUB functions can't deal with ISOWEEK, so my idea was to simply change the year (+/- 1) and then get back the start date of that ISO week number via the PARSE_DATE function. However,
this works:
SELECT FORMAT_DATE("%G-%V", DATE('2019-04-15')) -> 2019-16
this does not work:
SELECT PARSE_DATE("%G-%V", "2018-16") -> 1970-01-01
There is also a DATE_TRUNC function that does return the start date of an ISOWEEK for any given date, so I expected PARSE_DATE to behave the same way when parsing a string with an ISOYEAR and an ISOWEEK.
The documentation explicitly lists the ISOYEAR %G and the ISOWEEK %V as supported arguments. Am I missing something here?
Google Cloud Platform Support here!
I have been investigating, and there is an issue with the %V argument in the PARSE_DATE function. You can follow the status of the issue at the following link while it is being investigated.
If you have further information to add, please feel free to post it there.
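For reference, the value expected from PARSE_DATE("%G-%V", "2018-16") is the Monday that starts ISO week 16 of ISO year 2018. The Python sketch below computes that with date.fromisocalendar (Python 3.8+) and also shows the year-over-year idea from the question: keep the ISO week number and shift the ISO year. It illustrates the expected semantics only; it is not a workaround inside BigQuery, and the helper name is hypothetical.

```python
from datetime import date


def iso_week_start(iso_year: int, iso_week: int) -> date:
    """Monday (ISO weekday 1) of the given ISO year and week."""
    return date.fromisocalendar(iso_year, iso_week, 1)


d = date(2019, 4, 15)
iso_year, iso_week, _ = d.isocalendar()  # (2019, 16, 1), like FORMAT_DATE("%G-%V")
print(iso_week_start(iso_year, iso_week))      # 2019-04-15
print(iso_week_start(iso_year - 1, iso_week))  # 2018-04-16, same ISO week a year earlier
```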

Athena date_parse for date with optional millisecond field

I have data in S3 from which I created an Athena table. Some date entries in the JSON files are not accepted by Athena as either DATE or TIMESTAMP when I run queries.
I am using AWS Athena, which uses PrestoDB as its query engine.
Example JSON:
{"creationdate":"2018-09-12T15:49:07.269Z", "otherfield":"value1"}
{"creationdate":"2018-09-12T15:49:07Z", "otherfield":"value2"}
AWS Glue types both values as string, and when I change them to timestamp and date respectively, queries involving the timestamp fail with a ValidationError on the timestamp field.
I then tried Presto's date_parse function, but it doesn't work either, because some values have milliseconds and others don't:
date_parse(creationdate, '%Y-%m-%dT%H:%i:%s.%fZ')
date_parse(creationdate, '%Y-%m-%dT%H:%i:%sZ')
Each pattern fails on one kind of entry: the first requires the fractional seconds (%f) and the second does not allow them.
Is there a way to provide a parser or a regex so that I can convert these strings into dates during SQL query execution?
Instead of providing a timestamp format, you can use the from_iso8601_timestamp function.
This way, both variants are parsed, since each is a valid ISO 8601 timestamp.
select from_iso8601_timestamp(creationdate) from table1;
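The question's fallback idea, trying one pattern and then the other, can also be expressed directly; in Presto that would typically mean wrapping each attempt in try() and combining the results with coalesce(). The Python sketch below shows the same fallback logic locally (strptime's %f accepts the three-digit millisecond part). The function name and the choice to return None when nothing matches are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional

# Patterns tried in order: with fractional seconds, then without.
FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_creationdate(value: str) -> Optional[datetime]:
    """Return the first successful parse as a UTC datetime, or None."""
    for fmt in FORMATS:
        try:
            # 'Z' is matched as a literal, so attach UTC explicitly.
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this pattern does not fit; try the next one
    return None


print(parse_creationdate("2018-09-12T15:49:07.269Z"))
print(parse_creationdate("2018-09-12T15:49:07Z"))
```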
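Do you just need the date?
If so, you could use date_parse(string, format). Because date_parse has to match the whole input string, apply it to the leading date part only:
date_parse(substr(creationdate, 1, 10), '%Y-%m-%d')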
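Both variants in the question share the same ten-character 'YYYY-MM-DD' prefix, so the date-only case does not need to care about milliseconds at all. A Python sketch of that idea (slice the prefix, then parse it); the helper name is hypothetical.

```python
from datetime import date


def creation_day(value: str) -> date:
    """Parse only the leading 'YYYY-MM-DD' part, ignoring time and fraction."""
    return date.fromisoformat(value[:10])


print(creation_day("2018-09-12T15:49:07.269Z"))
print(creation_day("2018-09-12T15:49:07Z"))
```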
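For strings in a different layout, such as 'dd/MMM/yyyy:HH:mm:ss Z', parse_datetime() with Joda-Time patterns works, for example: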
SELECT requestdatetime, remoteip, requester, key
FROM MYDB.TABLE
WHERE parse_datetime(requestdatetime,'dd/MMM/yyyy:HH:mm:ss Z')
BETWEEN parse_datetime('2020-10-14:00:00:00','yyyy-MM-dd:HH:mm:ss')
AND parse_datetime('2020-10-14:23:59:59','yyyy-MM-dd:HH:mm:ss');
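The pattern 'dd/MMM/yyyy:HH:mm:ss Z' is Joda-Time syntax and matches strings such as '14/Oct/2020:10:15:30 +0000'. The Python sketch below applies the same inclusive BETWEEN filter using the equivalent strptime directives (%b for the abbreviated month name, %z for the numeric offset). The sample value and helper name are assumptions, %b assumes an English (C) locale, and the bounds are assumed to be UTC.

```python
from datetime import datetime, timezone

LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # like Joda 'dd/MMM/yyyy:HH:mm:ss Z'
BOUND_FORMAT = "%Y-%m-%d:%H:%M:%S"   # like Joda 'yyyy-MM-dd:HH:mm:ss'


def in_window(requestdatetime: str, start: str, end: str) -> bool:
    """Inclusive range check, mirroring SQL BETWEEN; bounds assumed UTC."""
    ts = datetime.strptime(requestdatetime, LOG_FORMAT)
    lo = datetime.strptime(start, BOUND_FORMAT).replace(tzinfo=timezone.utc)
    hi = datetime.strptime(end, BOUND_FORMAT).replace(tzinfo=timezone.utc)
    return lo <= ts <= hi


print(in_window("14/Oct/2020:10:15:30 +0000",
                "2020-10-14:00:00:00", "2020-10-14:23:59:59"))  # True
```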

How to use locale for strftime in BigQuery

I am using BigQuery to output a formatted timestamp value with the STRFTIME_UTC_USEC function. The documentation points me to the C++ strftime reference,
which specifies modifiers such as %b (abbreviated month name) that are locale-specific.
Is there a way to get locale-specific month names using STRFTIME_UTC_USEC?
The only alternative I see is to write my own UDF and do a lookup using a map.
Even though the STRFTIME_UTC_USEC function is based on C++'s strftime, there is no provision for supplying a locale.
We usually recommend using Standard SQL, which has the FORMAT_TIMESTAMP function, but it does not allow changing the locale either.
You probably don't have to write a complex UDF; a simple REPLACE or REGEXP_REPLACE can be enough. Alternatively, you can keep an array of localized month names, such as ["Январь", "Февраль", "Март", "Апрель", ...], and pick the element based on EXTRACT(MONTH FROM date).
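The array approach boils down to a lookup indexed by month number. Because months run from 1 to 12, the index has to be shifted by one for a zero-based array (in BigQuery, ORDINAL() gives one-based array access, as opposed to OFFSET()). A Python sketch with the full list of Russian month names; the helper name is hypothetical.

```python
from datetime import date

# Russian month names (nominative case), January through December.
RU_MONTHS = [
    "Январь", "Февраль", "Март", "Апрель", "Май", "Июнь",
    "Июль", "Август", "Сентябрь", "Октябрь", "Ноябрь", "Декабрь",
]


def localized_month(d: date) -> str:
    """Look up the month name; d.month is 1-based, the list is 0-based."""
    return RU_MONTHS[d.month - 1]


print(localized_month(date(2020, 3, 15)))  # Март
```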