Extract a substring and take the second value in a BigQuery column - google-bigquery

I have this data:
id val
1 ajkdks - jkdj
2 djs - djsd
I want to take only the second value, which is:
id val
1 jkdj
2 djsd
I know the query when using MySQL:
SUBSTRING_INDEX(SUBSTRING_INDEX(val, " - ", 2)," - ",-1)
But what is the query if I am using BigQuery?

Use the query below:
select id, split(val, ' - ')[safe_offset(1)] val
from your_table
If applied to the sample data in your question, the output is:
id val
1 jkdj
2 djsd

We could phrase this using REGEXP_EXTRACT:
SELECT id, REGEXP_EXTRACT(val, r'[^ -]+$') AS val
FROM yourTable
ORDER BY id;
Note that the above regex returns the trailing run of characters that are neither spaces nor hyphens. It is therefore also robust to the case where val has no hyphen separator: the entire value is returned, provided val itself contains no spaces or hyphens.
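To compare the two answers outside BigQuery, here is a minimal Python sketch that mimics both expressions. The function names and the extra sample rows ("nosep", "a - b - c") are made up for illustration, and Python's re module stands in for BigQuery's RE2 engine, so treat it as a logic check rather than a BigQuery implementation.

```python
import re

def second_by_split(val, sep=" - "):
    """Mimic split(val, ' - ')[safe_offset(1)]: None when there is no second part."""
    parts = val.split(sep)
    return parts[1] if len(parts) > 1 else None

def last_by_regex(val):
    """Mimic REGEXP_EXTRACT(val, r'[^ -]+$'): trailing run without spaces or hyphens."""
    match = re.search(r"[^ -]+$", val)
    return match.group(0) if match else None

# Two rows from the question plus two hypothetical edge cases.
for val in ["ajkdks - jkdj", "djs - djsd", "nosep", "a - b - c"]:
    print(val, "|", second_by_split(val), "|", last_by_regex(val))
```

The two approaches agree on the question's data but differ on the edge cases: without a separator the split version yields no value while the regex returns the whole string, and with three parts the split version returns the second part while the regex returns the last one.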

Related

ORACLE TO_CHAR SPECIFY OUTPUT DATA TYPE

I have column with data such as '123456789012'
I want to split the data into groups of 3 characters with a '/' in between, so that the output looks like: "123/456/789/012"
I tried "SELECT TO_CHAR(DATA, '999/999/999/999') FROM TABLE 1", but it does not print the output I wanted. Previously I ran "SELECT TO_CHAR(DATA, '$999,999,999,999.99') FROM TABLE 1" and it printed "$123,456,789,012.00", so I thought I could do the same here, but I guess that's not the case.
There is also a case where I want to put '#' in front of the data, so the output would be something like: #12345678901234. Can I use TO_CHAR for this too?
Is this possible? When I went through the Oracle documentation for TO_CHAR, it listed the formats that can be used with the function, and the format I want is not among them.
Thank you in advance. :D
Here is one option with varchar2 datatype:
with test as (
select '123456789012' a from dual
)
select listagg(substr(a,(level-1)*3+1,3),'/') within group (order by rownum) num
from test
connect by level <=length(a)
or
with test as (
select '123456789012.23' a from dual
)
select '$'||listagg(substr((regexp_substr(a,'[0-9]{1,}')),(level-1)*3+1,3),',') within group (order by rownum)||regexp_substr(a,'[.][0-9]{1,}') num
from test
connect by level <=length(a)
output:
1st query
123/456/789/012
2nd query
$123,456,789,012.23
If you want groups of three then you can use the group separator G, and specify the character to use:
SELECT TO_CHAR(DATA, 'FM999G999G999G999', 'NLS_NUMERIC_CHARACTERS=./') FROM TABLE_1
123/456/789/012
If you want a leading # then you can use the currency indicator L, and again specify the character to use:
SELECT TO_CHAR(DATA, 'FML999999999999', 'NLS_CURRENCY=#') FROM TABLE_1
#123456789012
Or combine both:
SELECT TO_CHAR(DATA, 'FML999G999G999G999', 'NLS_CURRENCY=# NLS_NUMERIC_CHARACTERS=./') FROM TABLE_1
#123/456/789/012
The data type is always a string; only the format changes.
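As a rough cross-check of the expected strings, the grouping can be sketched in Python. The helper name group_digits and its parameters are hypothetical. It chunks from the left like the substr/listagg answer, whereas the G format element groups from the right like a thousands separator; the two only coincide when the length is a multiple of three, as in the sample value.

```python
def group_digits(digits, size=3, sep="/", prefix=""):
    """Split a digit string into chunks of `size` from the left and join them with `sep`."""
    chunks = [digits[i:i + size] for i in range(0, len(digits), size)]
    return prefix + sep.join(chunks)

print(group_digits("123456789012"))               # groups of three with '/'
print(group_digits("123456789012", prefix="#"))   # same, with a leading '#'
print("#" + "123456789012")                       # leading '#' only, no grouping
```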

Google Big Query SQL to extract numeric ID from string

How do I write a SQL Query in Google Big Query to extract numeric ID from a string like these:
Example 1:
Column Value: "http://www.google.com/abc/eeq/entity/32132"
Desired Extraction: 32132
Example 2:
Column Value: "http://www.google.com/abc/eeq/entity/32132/ABC/2138"
Desired Extraction: 32132
Example 3:
Column Value: "http://www.google.com/abc/eeq/entity/32132http://www.google.com/abc/eeq/entity/32132"
Desired Extraction: 32132
Below is an example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT "http://www.google.com/abc/eeq/entity/32132" url UNION ALL
SELECT "http://www.google.com/abc/eeq/entity/32132/ABC/2138" UNION ALL
SELECT "http://www.google.com/abc/eeq/entity/32132http://www.google.com/abc/eeq/entity/32132"
)
SELECT url, REGEXP_EXTRACT(url, r'\d+') extracted_id
FROM `project.dataset.table`
with output
Row url extracted_id
1 http://www.google.com/abc/eeq/entity/32132 32132
2 http://www.google.com/abc/eeq/entity/32132/ABC/2138 32132
3 http://www.google.com/abc/eeq/entity/32132http://www.google.com/abc/eeq/entity/32132 32132
You can use regexp_extract(). To get the first series of digits in the string:
select regexp_extract(col, '[0-9]+')
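The "first run of digits" logic is easy to check in Python against the three sample URLs. This is only a sketch: the function names are invented, and the second function, which anchors on the literal "/entity/" segment, is an assumption about the URL shape that neither answer makes. It may be worth considering if other parts of the URL could contain digits.

```python
import re

def first_digit_run(url):
    """Mimic REGEXP_EXTRACT(url, r'[0-9]+'): the first run of digits, or None."""
    match = re.search(r"[0-9]+", url)
    return match.group(0) if match else None

def id_after_entity(url):
    """Hypothetical stricter variant: the digits right after '/entity/'."""
    match = re.search(r"/entity/([0-9]+)", url)
    return match.group(1) if match else None

urls = [
    "http://www.google.com/abc/eeq/entity/32132",
    "http://www.google.com/abc/eeq/entity/32132/ABC/2138",
    "http://www.google.com/abc/eeq/entity/32132http://www.google.com/abc/eeq/entity/32132",
]
for url in urls:
    print(first_digit_run(url), id_after_entity(url))
```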

How can I count repeated values in the string in BigQuery?

Example:
I have the following string:
201904,BLANK,201902,BLANK,BLANK,201811,201810,201809
How can I count the number of consecutive repetitions of the value "BLANK"?
In the example above the answer is 2, but what is the query?
Thanks for your help in advance!
Below is for BigQuery Standard SQL (with quick simplified example)
Corrected Version
#standardSQL
WITH `project.dataset.table` AS (
SELECT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK' value UNION ALL
SELECT '201904,BLANK,201902,BLANK,BLANK,BLANK,201811' UNION ALL
SELECT '201904,BLANK,201902,BLANK,201811,201902,BLANK,201811'
)
SELECT value,
(
SELECT MAX(ARRAY_LENGTH(SPLIT(list))) - 1
FROM UNNEST(REGEXP_EXTRACT_ALL(value || ',', r'(?:BLANK,){1,}')) list
) max_repeated_count
FROM `project.dataset.table`
The idea here is to:
extract all runs of consecutive BLANK values
split each such run into an array of BLANK elements
and finally take the max length of those arrays as the result
This is just a quick approach that came to mind.
Refactored Version
#standardSQL
WITH `project.dataset.table` AS (
SELECT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK' value UNION ALL
SELECT '201904,BLANK,201902,BLANK,BLANK,BLANK,201811' UNION ALL
SELECT '201904,BLANK,201902,BLANK,201811,201902,BLANK,201811'
)
SELECT value,
(
SELECT MAX(LENGTH(element) - 1)
FROM UNNEST(REGEXP_EXTRACT_ALL(REPLACE(value || ',', 'BLANK', ''), r',+')) element
) max_repeated_count
FROM `project.dataset.table`
Both with output
Row value max_repeated_count
1 201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK 3
2 201904,BLANK,201902,BLANK,BLANK,BLANK,201811 3
3 201904,BLANK,201902,BLANK,201811,201902,BLANK,201811 1
The refactored version is slightly different (but the main idea is the same):
it removes all BLANKs (assuming BLANK cannot be part of another element; if it can, the code can easily be adjusted)
then extracts all runs of consecutive commas into an array
and calculates the max length of those runs of commas
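Both versions can be mirrored in Python to confirm they agree on the sample rows. This is a sketch under the same assumption as above (BLANK never appears inside another element); the function names are invented, and Python's re module stands in for BigQuery's regex functions.

```python
import re

def max_run_by_groups(value, token="BLANK"):
    """First version: find runs of 'BLANK,' and count the elements in each run."""
    runs = re.findall(r"(?:%s,)+" % re.escape(token), value + ",")
    if not runs:
        return None  # MAX over an empty set is NULL in SQL
    return max(len(run.split(",")) - 1 for run in runs)

def max_run_by_commas(value, token="BLANK"):
    """Refactored version: drop the token, then measure runs of adjacent commas."""
    stripped = (value + ",").replace(token, "")
    runs = re.findall(r",+", stripped)  # never empty: the appended comma always matches
    return max(len(run) - 1 for run in runs)

samples = [
    "201904,BLANK,201902,BLANK,BLANK,201811,201810,201809,BLANK,BLANK,BLANK",
    "201904,BLANK,201902,BLANK,BLANK,BLANK,201811",
    "201904,BLANK,201902,BLANK,201811,201902,BLANK,201811",
]
for value in samples:
    print(max_run_by_groups(value), max_run_by_commas(value))
```

In this sketch the two versions diverge when a run sits at the very start of the string, since there is no leading comma to count: for 'BLANK,BLANK,2019' the first gives 2 and the second gives 1. They also differ when there is no BLANK at all (no value versus 0).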
Maybe I misunderstood, but can't you simply split by the value you're looking for and subtract 2 (1 for the first element and 1 for counting elements after splitting):
declare t DEFAULT '201904,BLANK,201902,BLANK,BLANK,201811,201810,201809';
SELECT
t as theString,
split(t,'BLANK') as theSplittedString,
array_length(split(t,'BLANK'))-2 as theAmount
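The result reads as follows: n > 0 means BLANK is repeated, 0 means no repetition, and -1 means the element was not found. Note that this counts every occurrence of BLANK in the string, not only consecutive ones.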

Search PostgreSQL column for substring

I have a DB column that has entries like this:
"56/45/34"
"78/34/145"
"45"
"" (i.e. NULL)
I want to search for the rows that match a certain number. For example, "45" should return the first and third rows but not the second.
We can try using a regex approach here with word boundaries:
select col
from your_table
where col ~* '\y45\y';
You can convert the delimited string to an array and then test the array
select *
from the_table
where '45' = any(string_to_array(the_column, '/'))
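The two PostgreSQL answers can be compared on the question's rows with a small Python sketch. The function names are invented, Python's \b plays the role of PostgreSQL's \y word boundary, and a NULL column is modelled as None and treated as "no match".

```python
import re

def matches_by_array(col, target="45"):
    """Mimic target = any(string_to_array(col, '/'))."""
    if col is None:
        return False
    return target in col.split("/")

def matches_by_regex(col, target="45"):
    """Mimic col ~* '\\y45\\y' using Python's word boundary."""
    if col is None:
        return False
    return re.search(r"\b%s\b" % re.escape(target), col) is not None

rows = ["56/45/34", "78/34/145", "45", None]
print([matches_by_array(r) for r in rows])
print([matches_by_regex(r) for r in rows])
```

Both return the first and third rows here. The array version compares whole elements, so it does not depend on what counts as a word character.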

Finding the second last occurrence of a string (date) in Regex

I got the following strings:
(1640.31; 08/19/2016; 09/13/2016;); (250000.0; 09/30/2016; 02/17/2018;); (100000.0; 03/12/2018; 12/31/2025;);
Or
(1000000.0; 05/30/2018; 06/03/2028;);
I need to return the second-to-last date: 03/12/2018 for example 1 and 05/30/2018 for example 2.
Because many parts of the string end with ;, I can't quite figure out how to get the second-to-last date.
Below is an example for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '(1640.31; 08/19/2016; 09/13/2016;); (250000.0; 09/30/2016; 02/17/2018;); (100000.0; 03/12/2018; 12/31/2025;);' AS str UNION ALL
SELECT '(1000000.0; 05/30/2018; 06/03/2028;);'
)
SELECT ARRAY_REVERSE(REGEXP_EXTRACT_ALL(str, r'\d\d/\d\d/\d\d\d\d'))[SAFE_OFFSET(1)] dt
FROM `project.dataset.table`
with result:
Row dt
1 03/12/2018
2 05/30/2018
Note: the above assumes that dates are always in mm/dd/yyyy or dd/mm/yyyy format, but it can be adjusted if they differ.
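The same "extract all dates, then take the second from the end" idea can be checked quickly in Python. This sketch uses an invented function name and the same pattern as the query, so it shares the assumption about the date format.

```python
import re

DATE_RE = re.compile(r"\d\d/\d\d/\d\d\d\d")

def second_to_last_date(text):
    """Collect every date-like token and return the second from the end, or None."""
    dates = DATE_RE.findall(text)
    return dates[-2] if len(dates) >= 2 else None

examples = [
    "(1640.31; 08/19/2016; 09/13/2016;); (250000.0; 09/30/2016; 02/17/2018;); (100000.0; 03/12/2018; 12/31/2025;);",
    "(1000000.0; 05/30/2018; 06/03/2028;);",
]
for text in examples:
    print(second_to_last_date(text))
```

Returning None for fewer than two dates plays the role of SAFE_OFFSET, which yields NULL instead of raising an error.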
I think this does what you want:
select (select array_agg(val order by o desc limit 2) -- the limit is just for efficiency
from unnest(split(str, ';')) val with offset o
where val like '%/%/%'
)[ordinal(2)] a
from (select '1640.31; 08/19/2016; 09/13/2016;' as str) x;
Note that this also (happens to) work with parentheses, if they are really part of the strings.