Issue with splitting string in bigquery - google-bigquery

I can't seem to split this string. I would like to get the last 8 digits as 'Date' in the format YYYY-MM-DD, and I would like to feed the _filename through to generate the date.
select
split(split("gs://mmmm_ssss_count_detail/mmm_ssss_count_detail_20220125.csv",'/')[offset(2)],'_')[offset(3)]
This just gives me 'detail' which is not what I need, it should at least give me '20220125.csv' where I can then remove the .csv part and parsedate it to 'Date' in my main select query.
Help please.

Instead of a split, can you use a regex to find the date value? If so try the following:
select
parse_date("%Y%m%d",regexp_extract(path, r'_(\d+)\.csv'))
from sample_data
The regex string above is looking for a pattern where a set of digits (\d+) exists between an _ and .csv
With the sample string you provided it yields the date 2022-01-25.
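For anyone who wants to check the pattern outside BigQuery, here is a minimal Python sketch of the same idea. The helper name file_date is made up for illustration, and Python's re and strptime are only stand-ins for regexp_extract and parse_date, so treat it as a rough cross-check rather than an exact equivalent.

```python
import re
from datetime import date, datetime
from typing import Optional

# Same pattern as in the query: a run of digits between "_" and ".csv".
DATE_IN_NAME = re.compile(r"_(\d+)\.csv")

def file_date(path: str) -> Optional[date]:
    """Return the date embedded in the file name, or None if no match (hypothetical helper)."""
    match = DATE_IN_NAME.search(path)
    if match is None:
        return None
    # Raises ValueError if the digits are not a valid YYYYMMDD date.
    return datetime.strptime(match.group(1), "%Y%m%d").date()

path = "gs://mmmm_ssss_count_detail/mmm_ssss_count_detail_20220125.csv"
print(file_date(path))
```

Returning None when nothing matches mirrors how regexp_extract gives NULL for a non-matching string.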

Consider the approach below:
select _filename,
parse_date('%Y%m%d', array_reverse(split(replace(_filename, '.', '_'), '_'))[offset(1)]) file_date
from your_table
If applied to the sample data in your question, the output is a file_date of 2022-01-25.
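To trace the steps, here is a small Python sketch of the same replace / split / take-second-from-the-end logic. It also shows why the original query returned 'detail': splitting the path on '/' leaves an empty piece for the '//' after 'gs:', so index 2 is the bucket name rather than the file name. The function name file_date_from_split is hypothetical.

```python
from datetime import date, datetime

path = "gs://mmmm_ssss_count_detail/mmm_ssss_count_detail_20220125.csv"

# Why the original query gave 'detail': index 2 is the bucket, not the file.
pieces = path.split("/")
print(pieces)                     # index 1 is the empty string between the two slashes
print(pieces[2].split("_")[3])    # the 4th piece of the bucket name

def file_date_from_split(filename: str) -> date:
    """Parse the date that sits just before the extension (hypothetical helper)."""
    # Turn ".csv" into "_csv" so the extension becomes its own piece.
    parts = filename.replace(".", "_").split("_")
    # Counting from the end: parts[-1] is "csv", parts[-2] holds the date digits.
    return datetime.strptime(parts[-2], "%Y%m%d").date()

print(file_date_from_split(path))
```

Counting from the end keeps working if the bucket or file prefix gains or loses underscores, which is the main appeal of the array_reverse version.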

Related

I have a column starttimeGMT which contains a date in the format 2021-03-31T14:30:27+00:00. I need to extract the portion 14:30:27 in SQL. How can I get it?

I have tried using parse_timestamp but was not able to get it. Please help me find a query for this.
To get a piece of a string you can use SUBSTRING. The time portion starts at character 12 and is 8 characters long, for example:
SELECT SUBSTRING(starttimeGMT, 12, 8) FROM tableName
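As a quick sanity check of the positions, here is a Python sketch using the sample value from the question. It compares a fixed-position slice with parsing the whole timestamp first; the variable names are illustrative only.

```python
from datetime import datetime

raw = "2021-03-31T14:30:27+00:00"

# Fixed-position slice: characters 12 through 19 (1-based) hold HH:MM:SS.
by_position = raw[11:19]

# Parsing the full timestamp first is sturdier if the layout ever varies.
by_parsing = datetime.fromisoformat(raw).strftime("%H:%M:%S")

print(by_position, by_parsing)
```

Both give the same answer here; the slice only stays correct as long as every value has exactly this layout.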

BigQuery: Validate that all dates are formatted as yyyy-mm-dd

Using Google BIGQUERY, I need to check that the values in a column called birth_day_col are the correct and desired date format: YYYY-MM-DD. The values in this column are defined as STRING. Also the values in this column are currently of the following format: YYYY-MM-DD.
I researched a lot on the internet and found an interesting workaround. The following query:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]'
AND country_code = 'country1'
But the result is: "This query returned no results."
I then checked with NOT, using the following code:
SELECT
DISTINCT birth_day_col
FROM `project.dataset.datatable`
WHERE NOT(birth_day_col LIKE '[1-2][0-9][0-9][0-9]/[0-1][0-9]/[0-3][0-9]')
AND country_code = 'country1'
Surprisingly it gave all the values in birth_day_col, which I have verified and are of the correct date format, but this result could very much be a coincidence.
And it is very strange (wrong) that I used a query that should return only the wrongly formatted dates, but it actually gives me the correct ones. Everything about these two queries seems like an inversion of each one's role.
The expected result of any query for this business case is to make a count of all incorrect formatted dates (even if currently this is 0).
Thank you for your help!
Robert
A couple of things here:
Read the documentation for the LIKE operator if you want to understand how to use it. It looks like you're trying to use regular expression syntax, but the LIKE operator does not take a regular expression as input.
The standard format for BigQuery's dates is YYYY-MM-DD, so you can just try casting and see if the result is a valid date, e.g.:
SELECT SAFE_CAST(birth_day_col AS DATE) AS birth_day_col
FROM `project`.dataset.table
This will return null for any values that don't have the correct format. If you want to find all of the ones that don't have the correct format, you can use SAFE_CAST inside a filter:
SELECT DISTINCT birth_day_col AS invalid_date
FROM `project`.dataset.table
WHERE SAFE_CAST(birth_day_col AS DATE) IS NULL
The result of this query will be all of the date strings that don't use YYYY-MM-DD format. If you want to check for slashes instead, you can use REGEXP_CONTAINS, e.g. try this:
SELECT
date,
REGEXP_CONTAINS(date, r'^[0-9]{4}/[0-9]{2}/[0-9]{2}$')
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
If you want to find all dates with either YYYY-MM-DD format or YYYY/MM/DD format, you can use a query like this:
SELECT
DISTINCT date
FROM `project`.dataset.table
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
For example:
SELECT
DISTINCT date
FROM (
SELECT '2019/05/10' AS date UNION ALL
SELECT '2019-05-10' UNION ALL
SELECT '05/10/2019'
)
WHERE REGEXP_CONTAINS(date, r'^[0-9]{4}[/\-][0-9]{2}[/\-][0-9]{2}$')
Yet another example for BigQuery Standard SQL, using SAFE.PARSE_DATE:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '1980/08/10' AS birth_day_col UNION ALL
SELECT '1980-08-10' UNION ALL
SELECT '08/10/1980'
)
SELECT birth_day_col
FROM `project.dataset.table`
WHERE SAFE.PARSE_DATE('%Y-%m-%d', birth_day_col) IS NULL
The result is the list of all dates which are not formatted as yyyy-mm-dd:
Row  birth_day_col
1    1980/08/10
2    08/10/1980
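Since the question asks for a count of badly formatted dates, here is a rough Python sketch of the same "parse or get nothing back" idea. safe_parse_date is a hypothetical helper that imitates the NULL-on-failure behaviour; it is not a BigQuery API, and Python's strptime may accept or reject edge cases differently.

```python
from datetime import date, datetime
from typing import Optional

def safe_parse_date(value: str, fmt: str = "%Y-%m-%d") -> Optional[date]:
    """Return a date, or None when the string does not parse (hypothetical helper)."""
    try:
        return datetime.strptime(value, fmt).date()
    except (ValueError, TypeError):
        return None

# Same three sample values as in the query above.
values = ["1980/08/10", "1980-08-10", "08/10/1980"]

# Keep only the values that fail to parse, then count them.
invalid = [v for v in values if safe_parse_date(v) is None]
print(len(invalid), invalid)
```

In BigQuery itself the count would presumably come from wrapping the filter in COUNT(*) or COUNTIF rather than listing the rows.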
Google BigQuery's LIKE operator does not support matching digit ranges, nor does it use the [ character in its syntax (I don't think ISO standard SQL does either; LIKE is nowhere near as powerful as regex).
X [NOT] LIKE Y
Checks if the STRING in the first operand X matches a pattern specified by the second operand Y. Expressions can contain these characters:
A percent sign "%" matches any number of characters or bytes
An underscore "_" matches a single character or byte
You can escape "\", "_", or "%" using two backslashes. For example, "\\%". If you are using raw strings, only a single backslash is required. For example, r"\%".
You should use REGEXP_CONTAINS instead.
I note that string format tests won't tell you if a date is valid or not, however. Consider that 2019-02-31 has a valid date format, but an invalid date value. I suggest using a datatype conversion function (to convert the STRING to a DATE value) instead.
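The difference between "looks like a date" and "is a date" can be seen in a short Python sketch. Both function names are invented for illustration, and the pattern is the YYYY-MM-DD shape discussed above.

```python
import re
from datetime import datetime

# Shape check only: four digits, dash, two digits, dash, two digits.
DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def looks_like_date(value: str) -> bool:
    """True when the string has the YYYY-MM-DD shape (hypothetical helper)."""
    return DATE_SHAPE.fullmatch(value) is not None

def is_real_date(value: str) -> bool:
    """True only when the string converts to an actual calendar date (hypothetical helper)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

for candidate in ["2019-02-28", "2019-02-31", "2019/02/28"]:
    print(candidate, looks_like_date(candidate), is_real_date(candidate))
```

The middle value passes the shape check but fails the conversion, which is the case a pure pattern test cannot catch.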

Converting a number to a date (DD-MON-YY) in SQL

I'm selecting a series of columns to place in a new table. One of these columns (EPISTART_RAW) is a date variable that is currently listed as a raw number, such as 13042015 and 01010216. What I'd like to do is convert these numbers according to a sensible date format, such as 13-APR-15 and 01-JAN-16.
I'm working through various examples I'd found online, but struggling.
I believe you just want to_date():
select to_date(EPISTART_RAW, 'DDMMYYYY')
If it is actually stored as a number, you need to restore the leading zeros (the FM modifier suppresses the leading space that to_char otherwise reserves for the sign), so:
select to_date(to_char(EPISTART_RAW, 'FM00000000'), 'DDMMYYYY')
The date value should be within quotes:
select to_date(to_char('01010216'),'DDMMYYYY') dt from dual
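To illustrate the leading-zero point, here is a Python sketch that pads a raw number back to eight digits, parses it as DDMMYYYY, and prints it in a DD-MON-YY style. raw_to_display is a hypothetical name, the second input assumes a value whose leading zero was lost when stored as a number, and the month abbreviation depends on the locale (English is assumed here).

```python
from datetime import datetime

def raw_to_display(raw: int) -> str:
    """Convert a DDMMYYYY number into a DD-MON-YY string (hypothetical helper)."""
    # Restore any leading zero that was dropped when the value was stored as a number.
    padded = f"{raw:08d}"
    parsed = datetime.strptime(padded, "%d%m%Y")
    # %b is the locale's abbreviated month name; upper-case it to match the DD-MON-YY style.
    return parsed.strftime("%d-%b-%y").upper()

print(raw_to_display(13042015))
print(raw_to_display(1012016))  # stored without its leading zero
```

Without the padding step the second value would be seven digits long and would not line up with the DDMMYYYY format.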

Remove Square brackets in postgresql (SQL)

I have a date column that is stored as a string value like [2018/04/09]. I want to read it as a date column, 2018/04/09. How can I do this in PostgreSQL?
Use the function to_date with the date formatter.
Include additional string characters that appear on all dates as part of the format string, and the date can be parsed correctly.
WITH example (dt) AS (
VALUES ('[2018/04/09]')
)
SELECT to_date(dt, '[YYYY/MM/DD]') FROM example
Alternatively, if the goal is also to clean up the data (e.g. some dates have the square brackets while other dates don't), it is better to strip the unwanted characters first and then parse the result as a date.
Example:
WITH example (dt) AS (
VALUES ('[2018/04/09]')
)
SELECT to_date(trim(both '[]' from dt), 'YYYY/MM/DD') FROM example
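The strip-then-parse idea is easy to try outside the database. Below is a minimal Python sketch, assuming the only unwanted characters are square brackets at the ends of the value; parse_bracketed is a made-up name.

```python
from datetime import date, datetime

def parse_bracketed(value: str) -> date:
    """Parse 'YYYY/MM/DD' with or without surrounding square brackets (hypothetical helper)."""
    # strip() removes any of the listed characters from both ends only.
    cleaned = value.strip("[]")
    return datetime.strptime(cleaned, "%Y/%m/%d").date()

print(parse_bracketed("[2018/04/09]"))
print(parse_bracketed("2018/04/09"))
```

Like trim(both '[]' from dt), this handles a mix of bracketed and unbracketed values with one code path.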
Try this:
SELECT TO_DATE(date_column, 'YYYY/MM/DD') as date
FROM tablename

SQL Convert String to Date

I am trying to find a way of extracting the first part of a line of text and converting it to a date. The following is an example of some of the data.
17/10/12 lskell Still waiting for one more signature on the
I have tried casting the whole field as a date, and converting it to a date, but both fail.
Would anyone have any ideas?
Try this for MySql
DATE_FORMAT(STR_TO_DATE(SUBSTRING_INDEX(columnname,' ',1), '%d/%m/%y'), '%Y-%m-%d')
If you know that
the first "word" in your string will always be a date
the date will always be separated from the rest of the string by a space
the date will have a consistent format
you are working in MySQL
then try this:
SELECT CAST(SUBSTRING_INDEX(myfield, ' ', 1) AS DATE) adate FROM mytable;
Try:
SELECT SUBSTRING('10/17/12 lskell Still waiting for one more signature on the', 1, 8)
(assuming date comes in this format)
-- MySQL
SELECT STR_TO_DATE(LEFT(yourstring, 8), '%d/%m/%y') FROM yourtable;
-- Oracle (there is no LEFT function, so use SUBSTR)
SELECT TO_DATE(SUBSTR(yourstring, 1, 8), 'dd/mm/yy') FROM yourtable;
-- SQL Server (style 3 is dd/mm/yy; style 120 expects yyyy-mm-dd hh:mi:ss)
SELECT CONVERT(DATETIME, LEFT(yourstring, 8), 3) FROM yourtable;
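All of these answers follow the same two steps: take the first space-separated word, then parse it with an explicit day/month/two-digit-year format. Here is a Python sketch of that logic using the sample line from the question; leading_date is a hypothetical name, and it assumes the date always comes first and is always dd/mm/yy.

```python
from datetime import date, datetime

def leading_date(line: str) -> date:
    """Parse the dd/mm/yy date that starts the line (hypothetical helper)."""
    # Everything before the first space is treated as the date.
    first_word = line.split(" ", 1)[0]
    # An explicit format avoids any guessing about day/month order.
    return datetime.strptime(first_word, "%d/%m/%y").date()

line = "17/10/12 lskell Still waiting for one more signature on the"
print(leading_date(line).isoformat())
```

Stating the format explicitly matters here because a value like 17/10/12 could otherwise be read as year/month/day.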