Snowflake SQL Regex - sql

I am trying to identify a value that is nested in a string using Snowflake's regexp_substr().
The value that I want to access is in quotes:
...
Type:
value: "CategoryA"
...
Edit: This text is nested in a much larger portion of text.
I want to extract CategoryA for all columns using regexp_substr. But I am unsure how.
I have tried:
regexp_substr(col, 'Type\\W+(\\w+)\\W+\\w.+')
and while that gives the portion of the string, I just want what is in quotes and can't figure out how to do so.

You could use regexp_replace() instead:
regexp_replace(col, '(^[^"]*")|("[^"]*$)', '')
The regexp matches either of the following, and replaces each matching part with the empty string:
^[^"]*": everything from the beginning of the string to the first double quote
"[^"]*$: everything from the last double quote to the end of the string
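To sanity-check that pattern outside the database, here is a minimal sketch using Python's re module. It assumes the pattern behaves the same way in Python as in Snowflake for this input, and the sample text is a hypothetical stand-in for one value of col.

```python
import re

# Same pattern as in the regexp_replace() call above.
PATTERN = r'(^[^"]*")|("[^"]*$)'

def between_quotes(s: str) -> str:
    """Strip everything up to the first double quote and from the last one on."""
    return re.sub(PATTERN, '', s)

# Hypothetical sample value with a single quoted token.
single = 'Type:\nvalue: "CategoryA"\n'
print(between_quotes(single))  # CategoryA

# Caveat: with more than one quoted token, everything between the FIRST
# and the LAST double quote is kept.
several = 'a "X" b "Y" c'
print(between_quotes(several))  # X" b "Y
```

So the approach only works as-is when the quoted value is the only quoted token in the column; otherwise the pattern has to be anchored on the surrounding Type/value text.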

Related

regex extract big query with numeric data

how would I be able to grab the number 2627995 from this string
"hellotest/2627995?hl=en"
I want to grab the number 2627995, here is my current regex but it does not work when I use regex extract from big query
(\/)\d{7,7}
SELECT
REGEXP_EXTRACT(DESC, r"(\/)\d{7,7}")
AS number
FROM
`string table`
here is the output
Thank you!!
I think you just want to match all digits coming after the last path separator, before either the start of the query parameter, or the end of the URL.
SELECT REGEXP_EXTRACT(DESC, r"/(\d+)(?:\?|$)") AS number
FROM `string table`
Try this one: r"\/(\d+)"
Your code returns the slash because you captured it (see the parentheses in (\/)\d{7,7}). REGEXP_EXTRACT only returns the captured substring.
Thus, you could just wrap the other part of your regex with the parentheses:
SELECT
REGEXP_EXTRACT(DESC, r"/(\d{7})")
AS number
FROM
`string table`
NOTE:
In BigQuery, the regex is given as a string literal, not a regex literal (which is usually delimited with forward slashes). That is why you do not need to escape the / char: it is not a special regex metacharacter.
{7,7} is equivalent to the {7} limiting quantifier, meaning exactly seven occurrences.
Also, if you are sure the number is at the end of string or is followed with a query string, you can enhance it as
REGEXP_EXTRACT(DESC, r"/(\d+)(?:[?#]|$)")
where the regex means
/ - a / char
(\d+) - Group 1 (the actual output): one or more digits
(?:[?#]|$) - either ? or # char, or end of string.
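The capture-group behaviour described above can be reproduced with Python's re module. This is only a sketch: BigQuery uses RE2 rather than Python's engine, and extract_group is a hypothetical helper that mimics "return capture group 1", not a BigQuery function.

```python
import re
from typing import Optional

def extract_group(pattern: str, s: str) -> Optional[str]:
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, s)
    return m.group(1) if m else None

url = "hellotest/2627995?hl=en"

# Original pattern: only the slash is inside the parentheses.
print(extract_group(r"(\/)\d{7,7}", url))        # /

# Parentheses moved around the digits.
print(extract_group(r"/(\d{7})", url))           # 2627995

# Variant that requires ?, # or end of string after the digits.
print(extract_group(r"/(\d+)(?:[?#]|$)", url))   # 2627995
print(extract_group(r"/(\d+)(?:[?#]|$)", "hellotest/2627995"))  # 2627995
```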

Remove all characters between two substrings in Hive SQL query

I have a column of strings that looks like this:
STRING:SECTION1/SECTION2/0000123456789/SECTION3/SECTION4
STRING:SECTION1/SECTION2/0000987654321/SECTION3/SECTION4
STRING:SECTION1/SECTION2/00005552121X/SECTION3/SECTION4
STRING:SECTION1/SECTION2/00005552222:ID/SECTION3/SECTION4
I am trying to use REGEXP_REPLACE to swap the variable-length string in the middle (letters, digits and special characters) for something generic, so that they all look like this:
STRING:SECTION1/SECTION2/id_number_removed/SECTION3/SECTION4
I have been trying all morning to try to find the right regex expression to replace everything between '/SECTION2/' and '/SECTION3/' but have had no success.
Replace regex pattern 'SECTION2/[^/]+/SECTION3' with 'SECTION2/id_number_removed/SECTION3'. [^/]+ means 1 or more characters that are not slashes.
select regexp_replace(
'STRING:SECTION1/SECTION2/00005552222:ID/SECTION3/SECTION4',
'SECTION2/[^/]+/SECTION3',
'SECTION2/id_number_removed/SECTION3');
which gives
STRING:SECTION1/SECTION2/id_number_removed/SECTION3/SECTION4
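To check that the same pattern handles all four sample rows from the question, here is a small sketch in Python. It assumes the pattern behaves identically in Python's re and in Hive's Java regex engine, which should hold for this simple character class; mask_id is a hypothetical helper name.

```python
import re

PATTERN = r'SECTION2/[^/]+/SECTION3'
REPLACEMENT = 'SECTION2/id_number_removed/SECTION3'

def mask_id(s: str) -> str:
    """Replace the segment between SECTION2/ and /SECTION3 with a placeholder."""
    return re.sub(PATTERN, REPLACEMENT, s)

# The four sample rows from the question.
rows = [
    'STRING:SECTION1/SECTION2/0000123456789/SECTION3/SECTION4',
    'STRING:SECTION1/SECTION2/0000987654321/SECTION3/SECTION4',
    'STRING:SECTION1/SECTION2/00005552121X/SECTION3/SECTION4',
    'STRING:SECTION1/SECTION2/00005552222:ID/SECTION3/SECTION4',
]

masked = [mask_id(r) for r in rows]
for m in masked:
    print(m)
```

Note that [^/]+ only covers a single path segment, so a row whose id itself contains a slash would be left unchanged.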

How can I find two extra characters in DB2 and list those in a column?

I have written this expression for checking extra characters and I am counting the occurrence of those extra characters.
REGEXP_COUNT('Mr.John® Êlite', regexp_extract ('Mr.John® Êlite','[^\x00-\x7F]'))
It's working fine if the string has only one extra character, e.g.
Mr. John®
It will take out ® and give me a count of 1.
But if my string has two extra characters, it will only pick the first one and ignore the second character, e.g.
Mr.John® Êlite
My function will extract ® and ignore Ê.
I have tried a subquery as well, but it is not working. Need help.
As noted by Wiktor Stribiżew, REGEXP_COUNT needs just a source string and a regexp:
db2 "values REGEXP_COUNT('Mr.John® Êlite', '[^\x00-\x7F]')"
1
-----------
2
Because you used REGEXP_EXTRACT, only the first occurrence is extracted:
The REGEXP_EXTRACT scalar function returns one occurrence of a substring of a string that matches the regular expression pattern.
Only then is the count performed, so REGEXP_COUNT counts occurrences of that single extracted character (®) and returns 1.
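The difference between the two calls can be illustrated with a short Python sketch. It is only an analogy: re.search and re.findall stand in for REGEXP_EXTRACT and REGEXP_COUNT here, and the variable names are hypothetical.

```python
import re

source = 'Mr.John® Êlite'
NON_ASCII = r'[^\x00-\x7F]'

# Counting directly with the character class finds both extra characters.
direct_count = len(re.findall(NON_ASCII, source))
print(direct_count)  # 2

# Extracting first returns only the first non-ASCII character ...
first = re.search(NON_ASCII, source).group(0)
print(first)  # ®

# ... so counting matches of that extracted character gives 1.
nested_count = len(re.findall(re.escape(first), source))
print(nested_count)  # 1

# Listing every extra character, as asked in the title.
extras = re.findall(NON_ASCII, source)
print(extras)  # ['®', 'Ê']
```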

How can I replace a string pattern with blank in hive?

I have a string as:
https://maps.googleapis.com/maps/api/staticmap?center=41.892532+-87.63811&zoom=11&scale=2&size=280x320&maptype=roadmap&format=png&visual_refresh=true%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:1%7C2413+S+State+St++Chicago+IL+60616%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:2%7C3000+N+Halsted+St++Chicago+IL+60657%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&key=AIzaSyBNEAQcC5niAEeiP3zkA_nuWGvtl0IOEs4
I want to replace the '++++' pattern at the end with blank and not the single occurrence of '+'. Tried using regexp_replace and translate functions in hive but that replaces all the single occurrences of '+' as well.
Use
regexp_replace(string,'[+]{4}','')
Pattern '[+]{4}' means the + character four times in a row.
Test:
select regexp_replace('++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&','[+]{4}','');
Result:
OK
++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
Did you try this?
replace(string, '++++', '')
Admittedly, this will replace all occurrences of '++++', but your string only has one of them.
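Both suggestions can be compared quickly in Python. This is a sketch that assumes '[+]{4}' means the same thing in Python's re as in Hive, with str.replace standing in for the literal replace() call.

```python
import re

# Test string from the answer above.
s = '++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&'

# Regex version: exactly four consecutive + characters are removed.
via_regex = re.sub(r'[+]{4}', '', s)

# Literal version: no regex involved, so + needs no escaping.
via_literal = s.replace('++++', '')

print(via_regex)
print(via_literal)
```

Both leave the leading '++' untouched, because a run of two plus signs never matches four in a row.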

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.
Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
regexp_replace() may be overkill for what you need, but it also gives the additional benefit of regex, for instance if strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
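For comparison, here is a rough Python sketch of the two approaches above on the strings from the question. The helper names are hypothetical, and it assumes single-line input, since '.' does not match a newline in Python's re by default.

```python
import re

def keep_first_part(s: str, delimiter: str = ':') -> str:
    """Like split_part(s, ':', 1): text before the first delimiter, or all of s."""
    return s.split(delimiter, 1)[0]

def keep_first_part_regex(s: str) -> str:
    """Like regexp_replace(s, ':.*', ''): drop the first colon and what follows."""
    return re.sub(r':.*', '', s)

samples = ['keepme:cutme', 'string-without-separator', 'first:last']

for s in samples:
    print(keep_first_part(s), keep_first_part_regex(s))
```

The regex variant is the one that extends to multiple delimiters, e.g. by changing the pattern to '[:;].*'.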
SQL Select to pick everything after the last occurrence of a character
select right('first:last', charindex(':', reverse('first:last')) - 1)