BigQuery: Validate that all dates are formatted as yyyy-mm-dd

Using Google BigQuery, I need to check that the values in a column called birth_day_col have the correct, desired date format: YYYY-MM-DD. The column is defined as STRING, and its values currently appear to be in YYYY-MM-DD format.
After a lot of searching online, I found what looked like an interesting workaround, the following query:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]'
AND country_code = 'country1'
But the result is: "This query returned no results."
I then checked with NOT, using the following code:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE NOT(birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]')
AND country_code = 'country1'
Surprisingly, it returned all the values in birth_day_col, which I have verified are in the correct date format, but this result could very well be a coincidence.
It is very strange (wrong) that a query meant to return only the wrongly formatted dates actually returns the correct ones. The two queries seem to have swapped roles.
The expected result of any query for this business case is a count of all incorrectly formatted dates (even if that count is currently 0).
Thank you for your help!
Robert

A couple of things here:
Read the documentation for the LIKE operator if you want to understand how to use it. It looks like you're trying to use regular expression syntax, but the LIKE operator does not take a regular expression as input.
The standard format for BigQuery's dates is YYYY-MM-DD, so you can just try casting and see if the result is a valid date, e.g.:
SELECT SAFE_CAST(birth_day_col AS DATE) AS birth_day_col
FROM `project`.dataset.table
This will return null for any values that don't have the correct format. If you want to find all of the ones that don't have the correct format, you can use SAFE_CAST inside a filter:
SELECT DISTINCT birth_day_col AS invalid_date
FROM `project`.dataset.table
WHERE SAFE_CAST(birth_day_col AS DATE) IS NULL
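The business case in the question is a count of badly formatted values; in BigQuery that would be SELECT COUNT(*) ... WHERE SAFE_CAST(birth_day_col AS DATE) IS NULL. To make the "NULL on failure" idea concrete outside BigQuery, here is a small Python sketch. The helper safe_cast_date is hypothetical and uses datetime.strptime as a stand-in for SAFE_CAST; it is not BigQuery's parser, so edge cases may differ. As in the SQL version, a missing value (None, i.e. NULL) also ends up counted as invalid.

```python
from datetime import datetime

def safe_cast_date(value):
    """Stand-in for SAFE_CAST(value AS DATE): a date on success, None on failure."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        # TypeError covers None (a NULL column value); ValueError covers bad strings.
        return None

def count_invalid(values):
    """Count values that do not convert, like COUNT(*) ... WHERE SAFE_CAST(...) IS NULL."""
    return sum(1 for v in values if safe_cast_date(v) is None)

rows = ["1980-08-10", "1980/08/10", "08/10/1980", "2019-02-31", None]
print(count_invalid(rows))  # 4
```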
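The result of this query will be all of the distinct strings that cannot be cast to a DATE, which covers wrong formats as well as impossible dates such as 2019-02-31. (The cast also accepts single-digit months and days such as 2019-5-1, so add a regular expression check if zero-padding matters.) If you want to check for slashes instead, you can use REGEXP_CONTAINS, e.g. try this: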
SELECT
date,
REGEXP_CONTAINS(date, r'^[0-9]{4}/[0-9]{2}/[0-9]{2}$')
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
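For readers who want to experiment with the pattern outside BigQuery, the same check can be sketched with Python's re module, using the sample values from the query above. re.fullmatch is used instead of ^...$ because in Python $ also matches just before a trailing newline, and a whole-string match is what the query intends.

```python
import re

# Slash-separated YYYY/MM/DD; fullmatch requires the entire string to match.
SLASH_DATE = re.compile(r"[0-9]{4}/[0-9]{2}/[0-9]{2}")

samples = ["2019/05/10", "2019-05-10", "05/10/2019"]
for s in samples:
    print(s, SLASH_DATE.fullmatch(s) is not None)
# 2019/05/10 True
# 2019-05-10 False
# 05/10/2019 False
```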
If you want to find all dates with either YYYY-MM-DD format or YYYY/MM/DD format, you can use a query like this:
SELECT
DISTINCT date
FROM `project`.dataset.table
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
For example:
SELECT
DISTINCT date
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
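One thing to watch: because each separator is its own character class, the pattern above also accepts mixed separators such as 2019/05-10. A backreference could force both separators to be equal in Python, but RE2, the engine behind REGEXP_CONTAINS, does not support backreferences, so an alternation of the two complete shapes is the portable fix. The Python sketch below compares both patterns; in BigQuery the stricter one would be written r'^([0-9]{4}/[0-9]{2}/[0-9]{2}|[0-9]{4}-[0-9]{2}-[0-9]{2})$'.

```python
import re

# Character-class version from the query above: each separator is checked independently.
LOOSE = re.compile(r"[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}")
# Alternation version: both separators must be the same character.
STRICT = re.compile(r"(?:[0-9]{4}/[0-9]{2}/[0-9]{2}|[0-9]{4}-[0-9]{2}-[0-9]{2})")

for s in ["2019/05/10", "2019-05-10", "2019/05-10", "05/10/2019"]:
    print(s, LOOSE.fullmatch(s) is not None, STRICT.fullmatch(s) is not None)
# 2019/05/10 True True
# 2019-05-10 True True
# 2019/05-10 True False
# 05/10/2019 False False
```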

Yet another example for BigQuery Standard SQL, using SAFE.PARSE_DATE:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '1980/08/10' AS birth_day_col UNION ALL
SELECT '1980-08-10' UNION ALL
SELECT '08/10/1980'
)
SELECT birth_day_col
FROM `project.dataset.table`
WHERE SAFE.PARSE_DATE('%Y-%m-%d', birth_day_col) IS NULL
The result is a list of all dates that are not formatted as yyyy-mm-dd:
Row birth_day_col
1 1980/08/10
2 08/10/1980

Google BigQuery's LIKE operator does not support matching digits, nor does it use the [ character in its syntax (I don't think ISO standard SQL does either; LIKE is nowhere near as powerful as regex).
X [NOT] LIKE Y
Checks if the STRING in the first operand X matches a pattern specified by the second operand Y. Expressions can contain these characters:
A percent sign "%" matches any number of characters or bytes
An underscore "_" matches a single character or byte
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If you are using raw strings, only a single backslash is required. For example, r"\%".
You should use REGEXP_CONTAINS instead.
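These rules also explain the "inverted" results in the question: with only % and _ as wildcards, '[1-2][0-9]...' is compared as literal text, so LIKE matches no row and NOT LIKE matches every row. Below is a simplified Python sketch of the documented rules. like_to_regex is a hypothetical helper written for illustration; it only models %, _ and backslash escapes and ignores the character-versus-byte distinction.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern (only %, _ and backslash escapes) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any number of characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else, including '[', is literal
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    return like_to_regex(pattern).fullmatch(value) is not None

question_pattern = "[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]"
print(like("1980-08-10", question_pattern))      # False
print(like(question_pattern, question_pattern))  # True: only the literal text matches
print(like("1980-08-10", "____-__-__"))          # True
```

A pattern such as '____-__-__' is the closest LIKE gets to a format check, but it also accepts strings like 'abcd-ef-gh', which is why REGEXP_CONTAINS or a date conversion is the better tool.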
Note, however, that string format tests won't tell you whether a date is valid. For example, 2019-02-31 has a valid date format but an invalid date value. I suggest using a data type conversion function (to convert the STRING to a DATE value) instead.
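A short Python sketch of that point, assuming zero-padded YYYY-MM-DD is the required format: a regular-expression shape test and a real date parse catch different problems, so a strict check needs both. datetime.strptime stands in for a conversion function here; note that it happily parses the unpadded 2019-2-28, which the shape test rejects.

```python
import re
from datetime import datetime

SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def has_iso_shape(s):
    """True if s looks like zero-padded YYYY-MM-DD (says nothing about validity)."""
    return SHAPE.fullmatch(s) is not None

def is_real_date(s):
    """True if s parses as an actual calendar date."""
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False

for s in ["2019-02-28", "2019-02-31", "2019-2-28"]:
    print(s, has_iso_shape(s), is_real_date(s))
# 2019-02-28 True True
# 2019-02-31 True False
# 2019-2-28 False True
```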

Related

Want to convert timestamp to date format in hive

I want to convert this number '20210412070422' to the date format '2021-04-12' in Hive.
I am trying this, but it returns a null value:
from_unixtime(unix_timestamp(eap_as_of_dt, 'MM/dd/yyyy'))
The best method is to avoid unix_timestamp/from_unixtime if possible, and in your case it is possible. The date() call can even be removed, because a string in yyyy-MM-dd format is compatible with the date type:
select date(concat_ws('-',substr(ts,1,4),substr(ts,5,2),substr(ts,7,2)))
from
(
select '20210412070422' as ts
)s
Result:
2021-04-12
Another efficient method using regexp_replace:
select regexp_replace(ts,'^(\\d{4})(\\d{2})(\\d{2}).*','$1-$2-$3')
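Both string-only approaches translate directly to other languages. As an illustration in Python (the function names are hypothetical), slicing mirrors the substr() version and re.sub mirrors regexp_replace(). Python writes the back-references as \1, \2, \3 rather than $1, $2, $3, and Python slices are 0-based whereas Hive's substr() is 1-based.

```python
import re
from datetime import date

def ts_to_date(ts):
    """Slicing version: like concat_ws('-', substr(ts,1,4), substr(ts,5,2), substr(ts,7,2))."""
    return date(int(ts[0:4]), int(ts[4:6]), int(ts[6:8]))

def ts_to_iso(ts):
    """Regex version: like regexp_replace(ts, '^(\\d{4})(\\d{2})(\\d{2}).*', '$1-$2-$3')."""
    return re.sub(r"^(\d{4})(\d{2})(\d{2}).*", r"\1-\2-\3", ts)

print(ts_to_date("20210412070422").isoformat())  # 2021-04-12
print(ts_to_iso("20210412070422"))               # 2021-04-12
print(ts_to_iso("202104120700"))                 # 2021-04-12 (shorter input still works)
```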
If you prefer using unix_timestamp/from_unixtime
select date(from_unixtime(unix_timestamp(ts, 'yyyyMMddHHmmss')))
from
(
select '20210412070422' as ts
)s
But it is more complex, slower (the SimpleDateFormat class is involved) and error-prone, because it will not work if the data is not exactly in the expected format, for example '202104120700'.
Of course, you can make it more reliable by taking a substring of the required length and using the yyyyMMdd template:
select date(from_unixtime(unix_timestamp(substr(ts,1,8), 'yyyyMMdd')))
from
(
select '20210412070422' as ts
)s
It makes it even more complex.
Use unix_timestamp/from_unixtime only if simple substr or regexp_replace do not work for data format like '2021Apr12blabla'.

How to retrieve data from MariaDB where any dates are in the format YYYY-MM-DD?

I'm retrieving data from MariaDB using:
SELECT * FROM table_name
Three columns in this data contain dates (and are formatted as dates in the form YYYY-MM-DD). When I receive them on the client side, they appear as "2021-07-11T14:00:00.000Z" but I instead want "2021-07-11". I've tried lots of things, including:
SELECT * FROM table_name DATE_FORMAT(date_column_one,'dd/mm/yyyy')
which doesn't work, as well as
SELECT *, DATE_FORMAT(date_column_one,'dd/mm/yyyy') FROM table_name
but this simply adds another column of data - I still get the other dates in the wrong format.
There is a lot of info about this stuff online but I can't find anywhere where they actually combine the formatting of dates with a basic select statement.
Your call to DATE_FORMAT is not using the format mask you seem to want here, which is yyyy-mm-dd. MariaDB's DATE_FORMAT uses %-style specifiers, so that mask is written '%Y-%m-%d'. Use it, and also don't select the date columns via SELECT *:
SELECT col1, col2, col3, ..., -- excluding date_column_one etc.
       DATE_FORMAT(date_column_one, '%Y-%m-%d') AS date_column_one
FROM table_name;
If you just want to return a date, then you can use the date function:
SELECT . . ., -- the other columns
DATE(date_column_one)
FROM table_name;
However, this returns a column with the type of date and you are at the mercy of your application to display it. Some applications might do you the "favor" of deciding that you want to see the time and timezone.
You can control this by converting the value to a string using DATE_FORMAT():
SELECT . . ., -- the other columns
DATE_FORMAT(date_column_one, '%d/%m/%Y')
FROM table_name;
Now the value is a string and its format will not be changed. The format '%Y-%m-%d' is the standard YYYY-MM-DD format, and I much prefer that.
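For comparison, Python's strftime uses the same three specifiers for this purpose (%Y four-digit year, %m zero-padded month, %d zero-padded day), so both masks above can be tried quickly outside the database. Be careful when copying other specifiers between the two: in MariaDB %i is minutes and %M is the month name, while in Python %M is minutes. The timestamp below is the example value from the question.

```python
from datetime import datetime

value = datetime(2021, 7, 11, 14, 0, 0)  # 2021-07-11T14:00:00.000Z from the question

print(value.strftime("%Y-%m-%d"))  # 2021-07-11
print(value.strftime("%d/%m/%Y"))  # 11/07/2021
```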

SQL number Wildcard issue

I'm trying to add a wildcard to my date select query so that I only match on the day, not the time, e.g. 2021-03-11 17:54:30.123. I thought # could be used as a placeholder for a digit.
select AID, LocalCoAltIn,LocalCoAltOut,EventTime
from EXCDS.dbo.WKS_LOG_VIEW
where EventTime like '2021-03-11 ##:##:##:###';
My query is returning no values even though they are in the table. Thanks.
No! Don't use strings! One method is to convert to a date:
select AID, LocalCoAltIn,LocalCoAltOut,EventTime
from EXCDS.dbo.WKS_LOG_VIEW
where convert(date, EventTime) = '2021-03-11';
Another method is to use a range:
where EventTime >= '2021-03-11' and
EventTime < '2021-03-12'
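The range version works because the upper bound is exclusive: every time on 2021-03-11, including fractional seconds such as 17:54:30.123, is below midnight of the 12th, and nothing from the 12th slips in. A minimal Python sketch of the same half-open comparison (the event times are made-up sample values):

```python
from datetime import date, datetime, timedelta

def on_day(ts, day):
    """Half-open range test: start of day <= ts < start of next day."""
    start = datetime.combine(day, datetime.min.time())
    return start <= ts < start + timedelta(days=1)

events = [
    datetime(2021, 3, 10, 23, 59, 59, 999000),  # previous day, excluded
    datetime(2021, 3, 11, 0, 0, 0),             # midnight, included
    datetime(2021, 3, 11, 17, 54, 30, 123000),  # the question's example time, included
    datetime(2021, 3, 12, 0, 0, 0),             # next midnight, excluded by the strict <
]
matches = [e for e in events if on_day(e, date(2021, 3, 11))]
print(len(matches))  # 2
```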
The LIKE operator in most flavors of SQL only supports the _ and % wildcards (matching exactly one character, or any number of characters). Gordon has given you a better approach, but if you want to fix your current query on SQL Server, you could try:
SELECT AID, LocalCoAltIn, LocalCoAltOut, EventTime
FROM EXCDS.dbo.WKS_LOG_VIEW
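WHERE EventTime LIKE '2021-03-11 [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]';
SQL Server extends the LIKE operator to accept a few extra things, such as character classes. Here [0-9] inside LIKE matches any single digit. This only helps if EventTime is stored as text; a datetime column is implicitly converted to a string in a different default format, so the pattern would not match.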
I'm not sure the LIKE operator will work for dates the way you want, but you still have a few options.
Use the DATEPART function to retrieve the year/month/day and compare each with the exact value you need:
SELECT AID, LocalCoAltIn, LocalCoAltOut, EventTime
FROM EXCDS.dbo.WKS_LOG_VIEW
WHERE DATEPART(year, EventTime) = 2021 AND DATEPART(month, EventTime) = 3 AND DATEPART(day, EventTime) = 11;
Or use Gordon Linoff's suggestion if you don't care about the individual date parts and only need to compare the entire date without the time.

Select a date in a string and convert it to datetime

I have a string like:
'SPY US 03/20/20 P45'
I want to select just the date from the string.
My current query is:
Select Ticker, SUBSTRING(Ticker, PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',o.Ticker),8) AS 'myDate' FROM TABLE
This returns: 'SPY US 03/20/20 P45', 03/20/20
I want myDate to be in datetime.
I have tried various forms of CAST and CONVERT, but they fail, presumably because they expect the format YYYY-MM-DD rather than MM/DD/YY.
My "smartest" attempt to convert was this:
CONVERT(DATETIME, SUBSTRING(o.Ticker, PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9]%',o.Ticker),8),1)
After reading style guidelines here: https://www.w3schools.com/sql/func_sqlserver_convert.asp
but it still failed.
The ideal end-format for the date would be YYYY-MM-DD
Edited to add:
I have been fiddling with it and realized that I oversimplified my question. The conversion works if I test it on a single string, but the full query involves several joins.
As I understand it, you are looking for something like this.
You can use the STRING_SPLIT() function to split the string on spaces, and then use TRY_CAST() to check whether each value is a date.
DECLARE @string AS varchar(120) = 'SPY US 03/20/20 P4';
WITH cte AS (
    SELECT value
    FROM STRING_SPLIT(@string, ' ')
)
SELECT value FROM cte WHERE TRY_CAST(value AS datetime) IS NOT NULL;
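The same split-and-try idea can be sketched in Python, with datetime.strptime standing in for TRY_CAST. The format '%m/%d/%y' is an assumption based on the example ticker (SQL Server's TRY_CAST instead depends on the session's date settings), and the helper name is hypothetical.

```python
from datetime import datetime

def first_date_token(text, fmt="%m/%d/%y"):
    """Split on spaces and return the first token that parses with fmt, else None."""
    for token in text.split(" "):
        try:
            return datetime.strptime(token, fmt).date()
        except ValueError:
            continue  # not a date; TRY_CAST would give NULL here
    return None

print(first_date_token("SPY US 03/20/20 P45"))  # 2020-03-20
```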
So it turns out that a few entries in the column didn't have a friendly date format, so the PATINDEX clause was returning nonsense for them.
That caused the entire operation to fail, rather than just returning NULL for the few failing entries, because CONVERT raises an error on any row it cannot convert.
Once I selected the result of the entire (ultimately more complicated) join statement into a temp table, I was able to TRY_CONVERT the substring into a date and run my operations.
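The fix in that last answer is to convert row by row and let failures become NULL instead of aborting the query (PATINDEX returns 0 when the pattern is absent, so such rows never contained a usable date). A Python sketch of that per-row behavior, using a regular expression with the same shape as the PATINDEX pattern; the helper name and the extra ticker strings are hypothetical examples of bad rows.

```python
import re
from datetime import datetime

DATE_TOKEN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")  # same shape as the PATINDEX pattern

def try_extract_date(ticker):
    """Return the embedded mm/dd/yy date as a date, or None if absent or invalid."""
    match = DATE_TOKEN.search(ticker)
    if match is None:
        return None  # no date-like text at all
    try:
        return datetime.strptime(match.group(), "%m/%d/%y").date()
    except ValueError:
        return None  # right shape but impossible date, e.g. 13/45/20

tickers = ["SPY US 03/20/20 P45", "SPY US P45", "XYZ 13/45/20 C10"]
print([try_extract_date(t) for t in tickers])
# [datetime.date(2020, 3, 20), None, None]
```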

SQL: how to perform a SELECT on only some characters of a column

I wanted to know how to perform the SQL SELECT operation on only a particular range of characters.
For example,I've got an SQL query:
SELECT date,score from feedback GROUP BY date
The date is in yyyy/mm/dd format.
So I want to strip the days and make it yyyy/mm, thereby selecting only the first 7 characters of the date.
I've searched everywhere for the answer but could not find anything. Could I maybe do something like this?
SELECT date(7),score from feedback GROUP BY date(7)
In MySQL (and SQL Server) you can use LEFT; in other databases use SUBSTRING:
SELECT LEFT('abcdefg', 2); -- returns 'ab'
So in your case
SELECT LEFT(date,7), score
FROM feedback
GROUP BY LEFT(date,7)
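In Python terms the grouping key is simply the first seven characters of the string, as in this sketch with made-up rows. In SQL, the non-grouped column usually has to be aggregated, for example AVG(score): SQL Server, and MySQL with ONLY_FULL_GROUP_BY enabled, reject a bare score in a GROUP BY query.

```python
from collections import defaultdict

# Hypothetical (date string in yyyy/mm/dd, score) rows.
feedback = [("2016/01/05", 3), ("2016/01/20", 5), ("2016/02/01", 4)]

scores_by_month = defaultdict(list)
for day, score in feedback:
    scores_by_month[day[:7]].append(score)  # same key as LEFT(date, 7): 'yyyy/mm'

print(dict(scores_by_month))  # {'2016/01': [3, 5], '2016/02': [4]}
```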
But again, things will be much easier if you use a date field instead of a text field.
Assuming that your date field is an actual date field (or even a datetime field), the following solution would work in SQL Server:
select left(convert(date ,getdate()),7) as Year_Month
Replacing getdate() with your date field, it would look like this:
select left(convert(date , feedback.date),7) as Year_Month
Both queries return a year-month string like the following:
2016-01