Sanitize a string column with a mix of datetime and timestamps

I have a column on BigQuery of String datatype that has a mix of date and timestamps as strings.
https://www.evernote.com/l/AmOc0thoaMRLJ7y1IFnLMxwLAeREujUtGRc
I tried SAFE_CAST and DATE_PARSE, but neither works.
I want to be able to query this column uniformly as timestamps.

The example below is for BigQuery Standard SQL:
#standardSQL
SELECT close_date,
  COALESCE(
    TIMESTAMP_MILLIS(SAFE_CAST(close_date AS INT64)),
    SAFE.PARSE_TIMESTAMP('%m/%d/%y', close_date)
  ) AS close_date_as_timestamp
FROM `project.dataset.table`
Applied to your sample data, the result is:
Row  close_date     close_date_as_timestamp
1    1556064000000  2019-04-24 00:00:00 UTC
2    01/24/19       2019-01-24 00:00:00 UTC
3    1548892800000  2019-01-31 00:00:00 UTC
4    11/27/18       2018-11-27 00:00:00 UTC
Note: You can add as many patterns to COALESCE as you expect in your data.
For example, add the line below to support values like 2019-01-01:
SAFE.PARSE_TIMESTAMP('%Y-%m-%d', close_date)
And so on.
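
The fallback chain in the COALESCE above can also be sketched outside BigQuery. The Python below is only an illustration of the same idea, not the answer's code: the names to_timestamp and PATTERNS are hypothetical, and interpreting every parsed value as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical list of text patterns, tried in order (mirrors the COALESCE arguments).
PATTERNS = ["%m/%d/%y", "%Y-%m-%d"]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp(value):
    """Return a UTC datetime for an epoch-millisecond or date string, else None."""
    text = value.strip()
    # First branch: an integer is treated as epoch milliseconds.
    try:
        return EPOCH + timedelta(milliseconds=int(text))
    except ValueError:
        pass
    # Remaining branches: try each date pattern until one parses.
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None  # nothing matched, like a NULL from COALESCE


for sample in ["1556064000000", "01/24/19", "2019-01-01", "not a date"]:
    print(sample, "->", to_timestamp(sample))
```

As in the SQL, supporting another format only means appending one more pattern to the list.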

Related

how to round timestamp to day in HIVE?

I have a string value like '2020-10-01T02:02:50.918+03:00'. How can I get value like this: 2020-10-01 00:00:00.000 in timestamp datatype in Hive?

Use substr to extract the yyyy-MM-dd part, then convert it with timestamp().
Demo:
select timestamp(substr('2020-10-01T02:02:50.918+03:00',1,10))
Result:
2020-10-01 00:00:00.0
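
For comparison, the same substring-then-parse idea can be sketched in Python. The function name truncate_to_day is hypothetical. Note that the date is taken as written in the string (here in the +03:00 offset); the same instant in UTC falls on 2020-09-30.

```python
from datetime import datetime


def truncate_to_day(iso_text):
    """Keep the first 10 characters (yyyy-MM-dd) and parse them as midnight of that date."""
    return datetime.strptime(iso_text[:10], "%Y-%m-%d")


print(truncate_to_day("2020-10-01T02:02:50.918+03:00"))
```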

How to convert UTC time to local timezones based on timezone column in bigquery?

I've been trying to convert each UTC time back to the appropriate local time zone using standard SQL in BigQuery, but I couldn't find a good way to do it dynamically, because the database might contain many different time zone names. Does anyone have an idea?
The table I have contains 2 different columns (see screenshot)

The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT 'Pacific/Honolulu' timezone, TIMESTAMP '2020-03-01 03:41:27 UTC' UTC_timestamp UNION ALL
  SELECT 'America/Los_Angeles', '2020-03-01 03:41:27 UTC'
)
SELECT *,
  DATETIME(UTC_timestamp, timezone) AS local_time
FROM `project.dataset.yourtable`
Output:
Row  timezone             UTC_timestamp            local_time
1    Pacific/Honolulu     2020-03-01 03:41:27 UTC  2020-02-29T17:41:27
2    America/Los_Angeles  2020-03-01 03:41:27 UTC  2020-02-29T19:41:27
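
The per-row conversion can be cross-checked with a small Python sketch using the standard zoneinfo module (Python 3.9+). It assumes the system has an IANA time zone database installed; to_local is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(utc_dt, tz_name):
    """Convert an aware UTC datetime to naive wall-clock time in the named zone."""
    return utc_dt.astimezone(ZoneInfo(tz_name)).replace(tzinfo=None)


utc_ts = datetime(2020, 3, 1, 3, 41, 27, tzinfo=timezone.utc)
for tz_name in ["Pacific/Honolulu", "America/Los_Angeles"]:
    # Each row carries its own zone name, so the lookup is dynamic per row.
    print(tz_name, to_local(utc_ts, tz_name).isoformat())
```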

Change Date Format from an array in SQL SELECT Statement

I have a column updated_at that returns an array
["2019-01-05T17:28:32.506-05:00","2019-06-15T13:22:02.625-04:00"]
But I want the output as dates in a format like 2019-01-03.
How can I accomplish this in Databricks SQL?
Thanks!

Try unnest and cast that as a date:
with ts_array as
(select array['2019-01-05T17:28:32.506-05:00','2019-06-15T13:22:02.625-04:00'] as tsa)
select unnest(tsa)::date from ts_array ;
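
To make the intended result concrete, here is a minimal Python sketch of the same per-element conversion. The helper name to_dates is hypothetical, and each date is taken in the offset written in the string, with no conversion to UTC.

```python
from datetime import datetime

updated_at = ["2019-01-05T17:28:32.506-05:00", "2019-06-15T13:22:02.625-04:00"]


def to_dates(values):
    """Parse each ISO 8601 string and keep only its date part as yyyy-MM-dd text."""
    return [datetime.fromisoformat(v).date().isoformat() for v in values]


print(to_dates(updated_at))
```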

You can use the date_trunc SQL function to truncate each timestamp to the day. Note that it returns a timestamp at midnight (for example 2015-03-05 00:00:00), not a bare date.
date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]
Examples:
> SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
2015-01-01 00:00:00
> SELECT date_trunc('MM', '2015-03-05T09:32:05.359');
2015-03-01 00:00:00
> SELECT date_trunc('DD', '2015-03-05T09:32:05.359');
2015-03-05 00:00:00
> SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
2015-03-05 09:00:00
Reference: Databricks - SQL Functions.
Hope this helps.

Calculate time difference between two columns of string type in hive without changing the data type string

I am trying to calculate the time difference between two columns of a row which are of string data type. If the time difference between them is less than 2 hours then select the first column of that row else if the time difference is greater than 2 hours then select the second column of that row. It can be done by converting the columns to datetime format, but I want the result to be in string only. How can I do that? The data looks like this:
col1 (string type)     col2 (string type)
2018-07-16 02:23:00    2018-07-16 02:36:00
2018-07-26 12:26:00    2018-07-26 14:29:00
2018-07-26 15:32:00    2018-07-27 15:38:00

I think you don't need to convert the columns to datetime format, since your data is already in a sortable layout (yyyy-MM-dd hh:mm:ss). Strip everything except the digits to get one number per value (yyyyMMddhhmmss), then compare the difference against 20000, which stands for 2 hours because the hour digits are followed by mmss. Going by your example (and assuming col2 > col1), this query would work as long as both values fall on the same calendar date; across midnight the digit difference no longer tracks the elapsed time:
SELECT case when regexp_replace(col2,'[^0-9]', '')-regexp_replace(col1,'[^0-9]', '') < 20000 then col1 else col2 end as col3 from your_table;

Use unix_timestamp() to convert string timestamp to seconds.
The difference in hours will be:
hive> select (unix_timestamp('2018-07-16 02:23:00')- unix_timestamp('2018-07-16 02:36:00'))/60/60;
OK
-0.21666666666666667
Important update: this method works correctly only if the time zone is configured as UTC, because in time zones with DST Hive adjusts the time during timestamp operations in some edge cases. Consider this example for the PDT time zone:
hive> select hour('2018-03-11 02:00:00');
OK
3
Note the hour is 3, not 2. This is because 2018-03-11 02:00:00 does not exist in the PDT time zone: at exactly 02:00:00 on that date the clock jumps forward to 03:00:00.
The same happens when converting with unix_timestamp. In the PDT time zone, unix_timestamp('2018-03-11 03:00:00') and unix_timestamp('2018-03-11 02:00:00') return the same value:
hive> select unix_timestamp('2018-03-11 03:00:00');
OK
1520762400
hive> select unix_timestamp('2018-03-11 02:00:00');
OK
1520762400
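
The same collision can be reproduced outside Hive. The Python sketch below uses zoneinfo (Python 3.9+, assuming a system time zone database) and the America/Los_Angeles zone as a stand-in for "PDT"; epoch_seconds is a hypothetical helper. Python resolves the nonexistent 02:00 with the pre-transition offset, which lands on the same instant as 03:00, while in UTC the two strings stay an hour apart.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

FMT = "%Y-%m-%d %H:%M:%S"


def epoch_seconds(text, tz):
    """Interpret a wall-clock string in the given zone and return Unix seconds."""
    naive = datetime.strptime(text, FMT)
    return int(naive.replace(tzinfo=tz).timestamp())


pacific = ZoneInfo("America/Los_Angeles")
for text in ["2018-03-11 02:00:00", "2018-03-11 03:00:00"]:
    # In the DST zone both strings map to one instant; in UTC they differ by 3600 s.
    print(text, epoch_seconds(text, pacific), epoch_seconds(text, timezone.utc))
```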
A few links for reference:
https://community.hortonworks.com/questions/82511/change-default-timezone-for-hive.html
http://boristyukin.com/watch-out-for-timezones-with-sqoop-hive-impala-and-spark-2/
Also have a look at this Hive JIRA issue: Hive should carry out timestamp computations in UTC
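
The selection rule from the question (keep col1 when the gap is under 2 hours, otherwise col2) can be sketched in Python while keeping the inputs and the result as strings. The name pick is hypothetical, the strings are parsed as naive values with no time zone or DST handling, and sending a gap of exactly 2 hours to col2 is an assumption, since the question does not cover that case.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def pick(col1, col2, limit=timedelta(hours=2)):
    """Return col1 if the two timestamps are less than `limit` apart, else col2."""
    gap = datetime.strptime(col2, FMT) - datetime.strptime(col1, FMT)
    return col1 if abs(gap) < limit else col2


rows = [
    ("2018-07-16 02:23:00", "2018-07-16 02:36:00"),
    ("2018-07-26 12:26:00", "2018-07-26 14:29:00"),
    ("2018-07-26 15:32:00", "2018-07-27 15:38:00"),
]
for col1, col2 in rows:
    print(pick(col1, col2))
```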

select values of a specific month from table contain timestamp column

For example, I have the following table (tbl_trans):
transaction_id | transaction_dte
integer        | timestamp without time zone
---------------+----------------------------
45             | 2014-07-17 00:00:00
56             | 2014-07-17 00:00:00
78             | 2014-04-17 00:00:00
How can I find the total number of transactions in the 7th month from tbl_trans?
The expected output is:
tot_tran | month
---------+-------
2        | July

select count(transaction_id) tot_tran
,to_char(max(transaction_dte),'Month') month from tbl_trans
where extract (month from transaction_dte)=7
PostgreSQL Extract function explained here
Reference : Date/Time Functions and Operators
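
As a sanity check of the expected result, the filter-and-count logic can be sketched in Python over the sample rows. The names transactions and count_in_month are hypothetical, and the month name assumes the default (English) locale.

```python
import calendar
from datetime import datetime

# Sample rows from the question: (transaction_id, transaction_dte).
transactions = [
    (45, datetime(2014, 7, 17)),
    (56, datetime(2014, 7, 17)),
    (78, datetime(2014, 4, 17)),
]


def count_in_month(rows, month):
    """Count rows whose timestamp falls in the given month number (any year)."""
    total = sum(1 for _id, ts in rows if ts.month == month)
    return total, calendar.month_name[month]


print(count_in_month(transactions, 7))
```

Like the SQL filter on extract(month ...), this matches the month in every year; add a year condition if only one year should count.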

select count(transaction_id),date_part('month',transaction_dte)
from tbl_trans where date_part('month',transaction_dte)=7
group by date_part('month',transaction_dte)

EXTRACT(MONTH FROM transaction_dte)
OR
date_part('month', transaction_dte)
Add the keyword TIMESTAMP only when the value is a string literal rather than a timestamp column, for example EXTRACT(MONTH FROM TIMESTAMP '2014-07-17 00:00:00').
On the difference between the two, the PostgreSQL documentation notes that extract is primarily intended for computational processing, while formatting date/time values for display is the job of the formatting functions such as to_char. The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function extract.

Use the DATEPART function (this is SQL Server syntax; PostgreSQL uses date_part instead):
where datepart(mm, transaction_dte) = 7
