REGEXP_EXTRACT after a pattern - sql

I'm really blind about regex. I want to extract after the word 'page='
page=course; uri=%2Fcourse
page=homepage
in this case it would be "course" and "homepage". I've tried
REGEXP_EXTRACT(context, r"(?<=page=)+[a-zA-Z]*")
but it said "Cannot parse regular expression: invalid perl operator: (?<" in Google BigQuery.
Any suggestions? Thank you!

Google's regex library doesn't support lookbehinds, but you can use a capture group to get REGEXP_EXTRACT to return that substring instead of the entire match:
REGEXP_EXTRACT(context, r"page=([a-zA-Z]+)")

Related

TERADATA REGEXP_SUBSTR Get string between two values

I am fairly new to teradata, but I was trying to understand how to use REGEXP_SUBSTR
For example I have the following cell value = ABCD^1234567890^1
How can I extract 1234567890
What I attempted to do is the following:
REGEXP_SUBSTR(x, '(?<=^).*?(?=^)')
But this didnt seem to work.
Can anyone help?
It might (or might not) be possible to use REGEXP_SUBSTR() to handle this, but you would need to use a capture group. An alternative here would be to do a regex replacement instead:
SELECT x, REGEXP_REPLACE(x, '^.*?\^|\^.*$', '') AS output
FROM yourTable;
The regex pattern used here matches:
^.*?\^ everything from the start to the first ^
| OR
\^.*$ everything from the second ^ to the end
We then replace with empty string to remove the content being matched.

Extract characters between a string and the first occurrence of something in BigQuery

I want to extract a set of characters between "u1=" and the first semi-colon using a regex. For instance, given the following string: id=1w54;name=nick;u1=blue;u2=male;u3=ohio;u5=
The desired regex output should be just blue.
I tested (?<=u1=)[^;]* on https://regex101.com and it works. However, when I run this in BigQuery, using regexp_extract(string, '(?<=u1=)[^;]*') , I get an error that reads "Cannot parse regular expression: invalid perl operator: (?<"
I'm confused why this isn't working in BQ. Any help would be appreciated.
You can use regexp_extract() like this:
regexp_extract(string, 'u1=([^;]+)')

Regexp_Extract BigQuery anything up to "|"

I'm fairly new to coding and I was wondering if you could give me a hand writing some regular expression for BigQuery SQL.
Basically I would like to extract everything before the bar sign "|" for one of my column.
Example:
Source string:
bla-BLABLA-cid=123456_sept1220_blabla--potato-Blah|someMore_string_stuff-IDontNeed
Desired output:
bla-BLABLA-cid=123456_sept1220_blabla--potato-Blah
I thought about using the REGEXP_EXTRACT(string, delimiter) function but I'm totally unable to write some regex (LOL). Therefore I had a look over Stack, and have found stuff like:
SELECT REGEXP_EXTRACT( String_Name , "\S*\s*\|" ) ,
# or
SELECT REGEXP_EXTRACT( String_Name , '.+?(?=|)')
But every time I get error messages like " invalid perl operator: (?= " or "Illegal escape space"
Would you have any suggestions on why I get these messages and/or how could I proceed to extract these strings?
Many many thanks in advance <3
You can use SPLIT instead:
SELECT SPLIT("bla-BLABLA-cid=123456_sept1220_blabla--potato-Blah|someMore_string_stuff-IDontNeed", "|")[OFFSET(0)]
Prefix the pattern string with r:
SELECT REGEXP_EXTRACT(String_Name, r'\S*\s*\|')
This is the syntax for a raw string constant. You can review what this means in the documentation.

What's the regular expression to replace (., space, -, ) in a string using Labview

I am trying to replace some unnecessary characters in a string in LabView. For example in the following string, "he llo-hOW-are.You?" I want to replace space, . and - with an Underscore _.
The output shall be "he_llo_hoW_are_You?"
I tried "[\ \.\-]" and many more things but nothing seem to work for a regular expression.
I am using Search and Replace function and I am putting the Regular expression in the Search String field.
Thanks.
Did you make sure to right click the function and check the Regular Expression option? The function won't work with regexes otherwise (as the help for it says).
This seems to work just fine here:
Make sure you set the Regular Expression in the "Search and Replace String" right click menu.
It should work with the regular expression you are trying to use. Or you can use [\ .-] in the Regex as well.
This logic will apply all special Characters.

Regular expression for date in selenium ide

I tried to create a regular expression for the below date format, where as I am unable to justify that, please help me out.
04-Apr-2013 [10:58:13 GMT+05:30]
This is what I came up with:
\\d{2}-\\w{3}-d{4} [\\d{2}:d{2}:d{2} \\w{3}+\\d{2}:d{2}]
Correct me where i have gone wrong.
Thanks
You have to escape the square brackets as they have a special meaning in regular expressions and also some of your digit indicators doesn't have backslash prefix. I've tested the following regex and it worked for me:
regexp:\d{2}-\w{3}-\d{4} \[\d{2}:\d{2}:\d{2} \w{3}\+\d{2}:\d{2}\]