I have a field in a BigQuery table that is a STRING but is actually a timestamp, in the format "2019-06-14T11:31:07". So I am using CAST(sign_up_date AS TIMESTAMP) to convert it to a usable TIMESTAMP.
This works perfectly in Legacy SQL; however, in Standard SQL it raises errors when the STRING is of the format "2019-06-14T09:09" (on the exact minute, missing ":00") or "2019-05-25T05:31:22.7263555" (as some values come through with decimal seconds).
Any idea how I can get it to work in Standard SQL? Obviously I could just use Legacy SQL, but I want to write in Standard as there are other functions that work better in that one.
Below is an example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT "2019-06-14T11:31:07" sign_up_date UNION ALL
  SELECT "2019-05-25T05:31:22.7263555" UNION ALL
  SELECT "2019-06-14T09:09"
)
SELECT
  sign_up_date,
  COALESCE(
    SAFE.PARSE_TIMESTAMP('%FT%R', sign_up_date),
    SAFE.PARSE_TIMESTAMP('%FT%R:%E*S', sign_up_date)
  ) AS sign_up_date_as_timestamp
FROM `project.dataset.table`
with the result:
Row  sign_up_date                  sign_up_date_as_timestamp
1    2019-06-14T11:31:07           2019-06-14 11:31:07 UTC
2    2019-05-25T05:31:22.7263555   2019-05-25 05:31:22.726355 UTC
3    2019-06-14T09:09              2019-06-14 09:09:00 UTC
As you can see, this covers all three patterns you presented in your question.
If you find more, you can add the respective SAFE.PARSE_TIMESTAMP inside the COALESCE.
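The same fall-through-the-formats idea can be sketched outside BigQuery. Below is a small Python illustration (not BigQuery code; the helper name parse_sign_up_date is just for this sketch) of trying each format in turn and returning a NULL-like None when nothing matches:

```python
from datetime import datetime

# Formats tried in order, mirroring the COALESCE over SAFE.PARSE_TIMESTAMP:
# seconds with a fraction, whole seconds, then minute precision.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M",
)

def parse_sign_up_date(s):
    # Python's %f accepts at most 6 fractional digits, so trim any extras
    # (BigQuery's %E*S element is more lenient about precision).
    if "." in s:
        head, frac = s.split(".", 1)
        s = head + "." + frac[:6]
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None  # analogous to the SAFE. parse yielding NULL

for v in ("2019-06-14T11:31:07",
          "2019-05-25T05:31:22.7263555",
          "2019-06-14T09:09"):
    print(v, "->", parse_sign_up_date(v))
```

Supporting a new pattern is then one more entry in the tuple, just as it would be one more SAFE.PARSE_TIMESTAMP inside the COALESCE.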
Related
I'm using PostgreSQL, but this question applies to any modern DBMS.
I want to convert a datetime column holding yyyy/mm/dd values into just yyyy/mm.
I tried extracting the month and year separately and concatenating them, but the month comes out as a single-digit integer for values < 10, and that breaks ordering:
select *,
concat(date_part('year' , date_old), '/', date_part('month' , date_old)) as date_new
from table
date_old     date_new
2010-01-20   2010/1
2010-01-22   2010/1
2010-11-22   2010/11
You can use to_char():
to_char(date_old, 'yyyy/mm')
If you want to display your date in the format YYYY/MM, then:
In PostgreSQL (db<>fiddle) and Oracle (db<>fiddle), use TO_CHAR:
SELECT TO_CHAR(date_old, 'YYYY/MM') FROM table_name;
In MySQL (db<>fiddle), use DATE_FORMAT:
SELECT DATE_FORMAT(date_old, '%Y/%m') FROM table_name;
In SQL Server (db<>fiddle), use CONVERT or, if you are using SQL Server 2012 or later, FORMAT:
SELECT CONVERT(varchar(7), date_old, 111) FROM table_name;
SELECT FORMAT(date_old,'yyyy/MM') FROM table_name;
Don't do this.
If you're able to use the date_part() function, what you have is not actually stored in the yyyy/mm/dd format you describe. Instead, it's a binary value that isn't human-readable, and what you see is a convenience rendering provided by your tooling.
You should leave this binary value in place!
If you convert to yyyy/mm, you will lose the ability to directly call functions like date_part(), and you will lose the ability to index the column properly.
What you'll have left is a varchar column that only pretends to be a date value. Schemas that do this are considered BROKEN.
I want to convert this number '20210412070422' to the date format '2021-04-12' in Hive.
I am trying this, but it returns a null value:
from_unixtime(unix_timestamp(eap_as_of_dt, 'MM/dd/yyyy'))
The best method is to do this without unix_timestamp/from_unixtime if possible, and in your case it is possible. date() can even be removed, since a string in yyyy-MM-dd format is compatible with the date type:
select date(concat_ws('-', substr(ts,1,4), substr(ts,5,2), substr(ts,7,2)))
from
(
  select '20210412070422' as ts
) s
Result:
2021-04-12
Another efficient method using regexp_replace:
select regexp_replace(ts,'^(\\d{4})(\\d{2})(\\d{2}).*','$1-$2-$3')
If you prefer using unix_timestamp/from_unixtime:
select date(from_unixtime(unix_timestamp(ts, 'yyyyMMddHHmmss')))
from
(
select '20210412070422' as ts
)s
But it is more complex, slower (the SimpleDateFormat class is involved), and error-prone, because it will not work if the data is not exactly in the expected format, for example '202104120700'.
Of course, you can make it more reliable by taking a substring of the required length and using the yyyyMMdd template:
select date(from_unixtime(unix_timestamp(substr(ts,1,8), 'yyyyMMdd')))
from
(
select '20210412070422' as ts
)s
But that makes it even more complex.
Use unix_timestamp/from_unixtime only if a simple substr or regexp_replace does not work for the data format, e.g. '2021Apr12blabla'.
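For reference, the three routes above (substring assembly, regexp_replace, and a strict timestamp parse) can be mimicked in Python; this is only an illustration of the string logic, not Hive code:

```python
import re
from datetime import datetime

ts = "20210412070422"

# Route 1: substring assembly, like concat_ws('-', substr(...), ...).
by_substr = ts[0:4] + "-" + ts[4:6] + "-" + ts[6:8]

# Route 2: regex capture groups, like
# regexp_replace(ts, '^(\\d{4})(\\d{2})(\\d{2}).*', '$1-$2-$3').
by_regex = re.sub(r"^(\d{4})(\d{2})(\d{2}).*", r"\1-\2-\3", ts)

# Route 3: strict parse, like unix_timestamp(ts, 'yyyyMMddHHmmss');
# this one raises on malformed input instead of degrading gracefully.
by_parse = datetime.strptime(ts, "%Y%m%d%H%M%S").date().isoformat()

print(by_substr, by_regex, by_parse)
```

All three produce 2021-04-12 here; only the third fails outright on truncated input such as '202104120700', which matches the fragility noted above.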
I have a column called 'created_date' whose data type is string. It contains records that follow a date-and-time pattern. I want to create another column called 'modified_date' that takes just the date from the 'created_date' column, so that I can do some date arithmetic later. I want to do this using the SQL CAST operator.
Below is how I expect the output to look:
ID created_date modified_date
1 2017-11-01 16:30:40 2017-11-01
2 2017-11-23 15:30:40 2017-11-23
3 2017-11-16 14:30:40 2017-11-16
Any suggestions on how to do this?
I am going to assume that you are using BigQuery.
You can use:
select date(created_date)
You could also be more specific:
select date(substr(created_date, 1, 10))
Or convert to a datetime and then to a date:
select date(cast(created_date as datetime))
You could use a simple date_format() and str_to_date() (this is MySQL syntax):
select date_format(str_to_date(created_date, '%Y-%m-%d %T'), '%Y-%m-%d') modified_date
Below is for BigQuery Standard SQL
#standardSQL
SELECT *, DATE(TIMESTAMP(created_date)) modified_date
FROM `project.dataset.table`
I want to do this using the SQL CAST operator.
Note: I do not recommend using a generic CAST for the DATE, DATETIME, or TIMESTAMP data types. Instead, you should use the respective functions, as in this answer. Or, if the string does not directly represent such a data type, you can use the respective PARSE_ function, which lets you set the format in which the date/datetime/timestamp is represented in the string.
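As a cross-check of the two "take the leading date part" approaches above, here is the same idea in Python (illustrative only; not warehouse SQL):

```python
from datetime import datetime

created_date = "2017-11-01 16:30:40"

# Equivalent of date(substr(created_date, 1, 10)): the first 10 characters
# of a 'YYYY-MM-DD HH:MM:SS' string are exactly the date.
modified_by_substr = created_date[:10]

# Equivalent of casting to a datetime and then taking the date part.
modified_by_parse = datetime.strptime(created_date, "%Y-%m-%d %H:%M:%S").date()

print(modified_by_substr, modified_by_parse.isoformat())
```

The parse route has the advantage of rejecting malformed rows instead of silently slicing them, which mirrors the difference between PARSE_ functions and plain substring tricks.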
I have timestamp strings that look something like this example:
2017-07-12T01:51:12.732-0600. Is there any function, or combination of functions, I can use to convert this to UTC accurately?
The output should be 2017-07-12 07:51:12.732000. I've tried using to_timestamp and convert_timezone. Obviously, the latter failed, but so did the former, and I'm at my wit's end. Help?
You can convert the string directly to timestamp and then set the source timezone in the convert_timezone function like this (note: the offset sign is the opposite of the timezone):
select convert_timezone('UTC+06', 'utc', '2017-07-12T01:51:12.732-0600'::timestamp)
If the -0600 part varies, you can construct the 'UTC+06' part dynamically, like this:
with times as (
  select '2017-07-12T01:51:12.732-0600'::varchar(28) as ts_col
)
select convert_timezone('utc' || (substring(ts_col from 24 for 3)::integer * (-1))::varchar(3), 'utc', ts_col::timestamp)
from times
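To sanity-check the expected answer, the same conversion can be done in Python, where the %z directive consumes the -0600 offset directly (illustrative only; not Redshift SQL):

```python
from datetime import datetime, timezone

s = "2017-07-12T01:51:12.732-0600"

# %z parses the trailing -0600 offset as part of the timestamp, so no manual
# sign-flipping is needed, unlike convert_timezone's source-timezone argument.
dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")
utc = dt.astimezone(timezone.utc)

print(utc.strftime("%Y-%m-%d %H:%M:%S.%f"))
```

This prints 2017-07-12 07:51:12.732000, the output the question asks for, which confirms the -0600 input is six hours behind UTC.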
Our SQL query, shown below, converts strings to timestamp fields but fails on some dates and not on others. What is causing the conversion to fail?
SELECT birthdate, TIMESTAMP(REGEXP_REPLACE(birthdate, r'(..)/(..)/(....)', r'\3-\2-\1')) ts
FROM [our_project:our_table] LIMIT 1000
In the results, BigQuery gives NULL for many of the dates. Why is the regex failing? Is there something we can add to make it more robust?
Here is a second conversion query we tried.
SELECT birthdate, TIMESTAMP(year + '-' + month + '-' + day) as output_timestamp
FROM (
SELECT
birthdate,
REGEXP_EXTRACT(birthdate, '.*/([0-9]{4})$') as year,
REGEXP_EXTRACT(birthdate, '^([0-9]{2}).*') as day,
REGEXP_EXTRACT(birthdate, '.*/([0-9]{2})/.*') AS month
FROM
[our_project:our_table]
)
LIMIT 1000
Notice that nulls appeared in these results as well.
How might we fix what is going wrong?
Is there a reason you're not using the supported TIMESTAMP data type?
From the docs:
You can describe TIMESTAMP data types as either UNIX timestamps or calendar datetimes.
Datetimes need to be in a specific format:
A date and time string in the format YYYY-MM-DD HH:MM:SS. The UTC and Z specifiers are supported.
This would also make it easier to query this particular column, as it would allow you to leverage functions such as HOUR, DAYOFWEEK, and DAYOFYEAR.
Here's an example query using one of BQ's public datasets to find the most popular pickup hour using a timestamp field:
SELECT
  HOUR(pickup_datetime) AS pickup_hour,
  COUNT(*) AS pickup_count
FROM
  [nyc-tlc:green.trips_2014]
GROUP BY
  1
ORDER BY
  pickup_count DESC
will yield:
Row  pickup_hour  pickup_count
1    19           1059068
2    18           1051326
3    20           985664
4    17           957583
5    21           938378
6    22           908296
It turns out that the month and the day were swapped (international versus U.S. order). As a result, the values were out of range for a valid timestamp. Once we swapped the day and the month, the conversions went through without problems.
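The failure mode is easy to reproduce: parsing a day-first string with a month-first pattern (or vice versa) puts a value like 25 in the month position, which is out of range and yields NULL. A minimal Python sketch of this (parse_birthdate is a hypothetical helper, not BigQuery code):

```python
from datetime import datetime, date

def parse_birthdate(s):
    # Try day-first (international), then month-first (U.S.). A value like
    # 25 in the month position is out of range, so the wrong order fails --
    # the same reason the TIMESTAMP() calls above returned NULL.
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None

print(parse_birthdate("25/06/2017"), parse_birthdate("06/25/2017"))
```

Note that genuinely ambiguous values such as 01/02/2017 parse under both patterns, so you still need to know which convention the feed uses; a fallback like this only rescues unambiguous rows.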
If your data has custom timestamp formatting, you can always use the PARSE_TIMESTAMP function in Standard (non-legacy) SQL - https://cloud.google.com/bigquery/sql-reference/functions-and-operators#parse_timestamp
E.g., both of the following queries
select parse_timestamp("%Y-%d-%m", x) from
unnest(["2016-31-12", "1999-01-02"]) x
select parse_timestamp("%Y-%b-%d", x) from
unnest(["2016-Dec-31", "1999-Feb-01"]) x
result in
Row  f0_
1    2016-12-31 00:00:00 UTC
2    1999-02-01 00:00:00 UTC