REGEXP_SUBSTR with a URL - sql

I have a string from which I'm trying to extract a URL. When I test the pattern on a regex testing site, it works fine.
The Regex Pattern is: http:\/\/GNTXN.US\/\S+
The message I'm extracting from is below, and lives in a column called body in my SQL database.
Test Message: We want to hear from you! Take our 2022 survey & tell us what matters most to you this year: http://GNTXN.US/qsx Text STOP 2 stop/HELP 4 help
But when I run the following in SQL:
SELECT
body,
REGEXP_SUBSTR(body, 'http:\/\/GNTXN.US\/\S+') new_body
FROM
table.test
It returns no value. I have to imagine it's something to do with the backslashes in the URL, but I've tried everything.
The new_body output should read as http://GNTXN.US/qsx

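In MySQL, a backslash inside a string literal is itself an escape character, so each backslash in the pattern has to be doubled for it to reach the regex engine: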
select body, REGEXP_SUBSTR(body, 'http:\\/\\/GNTXN.US\\/\\S+') as new_body
from table.test;
new_body output:
http://GNTXN.US/qsx
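
The effect of the doubled backslashes can be checked outside the database. Below is a minimal sketch in Python, whose re module stands in for MySQL's regex engine (an assumption; the two engines differ in details). It relies on MySQL's rule that an unrecognized escape sequence in a string literal simply loses its backslash, so \S reaches the regex engine as a plain S. The helper regexp_substr is hypothetical and only mimics what the SQL function returns.

```python
import re

message = (
    "Test Message: We want to hear from you! Take our 2022 survey & tell us "
    "what matters most to you this year: http://GNTXN.US/qsx "
    "Text STOP 2 stop/HELP 4 help"
)

# Pattern as the regex engine sees it when the SQL literal uses single
# backslashes: the literal parser drops them, so \/ becomes / and \S becomes S.
single_backslash_pattern = "http://GNTXN.US/S+"

# Pattern as the regex engine sees it when every backslash is doubled in SQL.
double_backslash_pattern = r"http:\/\/GNTXN.US\/\S+"


def regexp_substr(text, pattern):
    """Hypothetical stand-in for REGEXP_SUBSTR: first match or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


no_match = regexp_substr(message, single_backslash_pattern)
url = regexp_substr(message, double_backslash_pattern)
print(no_match)  # None
print(url)       # http://GNTXN.US/qsx
```

The forward slashes need no escaping in either engine; only \S depends on the backslash surviving the string literal.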

Related

Extract characters between a string and the first occurrence of something in BigQuery

I want to extract a set of characters between "u1=" and the first semi-colon using a regex. For instance, given the following string: id=1w54;name=nick;u1=blue;u2=male;u3=ohio;u5=
The desired regex output should be just blue.
I tested (?<=u1=)[^;]* on https://regex101.com and it works. However, when I run this in BigQuery, using regexp_extract(string, '(?<=u1=)[^;]*') , I get an error that reads "Cannot parse regular expression: invalid perl operator: (?<"
I'm confused why this isn't working in BQ. Any help would be appreciated.
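BigQuery uses the RE2 regex library, which does not support lookbehind such as (?<=...). You can use regexp_extract() with a capturing group instead; it returns the text matched by the group: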
regexp_extract(string, 'u1=([^;]+)')
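
As a quick check of the capturing-group approach, here is a small Python sketch. Python's re module is only a stand-in for BigQuery's regex engine (an assumption), and the helper regexp_extract is hypothetical; it mimics returning the single capture group.

```python
import re

row = "id=1w54;name=nick;u1=blue;u2=male;u3=ohio;u5="


def regexp_extract(text, pattern):
    """Hypothetical stand-in: return capture group 1 of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# The literal prefix u1= anchors the match; the group stops at the first semicolon.
value = regexp_extract(row, r"u1=([^;]+)")
print(value)  # blue
```

Because the prefix sits outside the parentheses, it is matched but not returned, which is the job the lookbehind was doing.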

TRIM or REPLACE in Netsuite Saved Search

I've looked at lots of examples for TRIM and REPLACE on the internet and for some reason I keep getting errors when I try.
I need to strip suffixes from my Netsuite item record names in a saved item search. There are three possible suffixes: -T, -D, -S. So I need to turn 24335-D into 24335, and 24335-S into 24335, and 24335-T into 24335.
Here's what I've tried and the errors I get:
Can you help me please? Note: I can't assume a specific character length of the starting string.
Use case: We already have a field on item records called Nickname with the suffixes stripped. But I've run into cases where Nickname is incorrect compared to Name. Ex: Name is 24335-D but Nickname is 24331-D. I'm trying to build a saved search alert that tells me any time the Nickname does not equal the suffix-stripped Name.
PS: is there anywhere I can pay for quick a la carte Netsuite saved search questions like this? I feel bad relying on free technical internet advice but I greatly appreciate any help you can give me!
You are including too much SQL. A formula is a single result-field expression, not a full statement, so it takes no FROM or AS; the result column/field name is set in another place. One option here is REGEXP_REPLACE():
REGEXP_REPLACE({name},'\-[TDS]$', '')
Regex meaning:
\- : a literal -
[TDS] : one of T D or S
$ : end of line/string
To compare fields, a Formula (Numeric) using a CASE statement can be useful, as it makes it easy to compare the result to a number in a filter (a simple "equal to 1", for example):
CASE WHEN {custitem_nickname} <> REGEXP_REPLACE({name},'\-[TDS]$', '') then 1 else 0 end
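
The same logic can be tried on sample names before building the search. This is a minimal Python sketch, assuming Python's re behaves like the saved search's regex engine for this simple pattern; strip_suffix and mismatch_flag are hypothetical names that mirror the REGEXP_REPLACE and CASE formulas above.

```python
import re

# A literal hyphen, then one of T, D or S, anchored at the end of the string.
SUFFIX = re.compile(r"-[TDS]$")


def strip_suffix(name):
    """Mirror of REGEXP_REPLACE({name}, '\\-[TDS]$', '')."""
    return SUFFIX.sub("", name)


def mismatch_flag(name, nickname):
    """Mirror of the CASE formula: 1 when nickname differs from the stripped name."""
    return 1 if nickname != strip_suffix(name) else 0


stripped = [strip_suffix(n) for n in ("24335-D", "24335-S", "24335-T")]
print(stripped)                           # ['24335', '24335', '24335']
print(mismatch_flag("24335-D", "24331"))  # 1
```

Names without one of the three suffixes pass through unchanged, so no assumption about string length is needed.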
You are getting an error because TRIM can trim only one character; see the Oracle doc:
https://docs.oracle.com/javadb/10.8.3.0/ref/rreftrimfunc.html (last example).
So try nesting TRIM calls, something like this (shown here for the -D suffix only):
TRIM(TRAILING '-' FROM TRIM(TRAILING 'D' FROM {entityid}))
And always keep in mind that saved searches run as Oracle SQL queries, so the Oracle SQL documentation can help you understand how to use the available functions.

how to get specific part from string in sql

I want to retrieve file names from urls in sql.
for example:
Input:
url:
https://www.google.co.in/root/subdir/file.extension?p1=v1&p2=v2
https://www.abxdhcak.com/sitemap-companies.xml
then Output should be:
file.extension
sitemap-companies.xml
To match your expected output you can use REGEXP_REPLACE (in dialects that require a replacement argument, pass '' as the third argument):
REGEXP_REPLACE(txt, '^.*/|\?.*$') as rg
This does 2 things:
'^.*/'
This removes all characters up to and including the last forward-slash in the string.
'\?.*$'
This removes all characters after and including a question mark.
This may not work for all cases, but it works for the examples provided.
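
To see the two alternatives at work, here is a minimal Python sketch of the same pattern. Python's re.sub stands in for REGEXP_REPLACE (an assumption about the dialect, which the question does not name), and file_name is a hypothetical helper.

```python
import re


def file_name(url):
    """Remove everything up to the last slash, and everything from a ? onward."""
    return re.sub(r"^.*/|\?.*$", "", url)


first = file_name("https://www.google.co.in/root/subdir/file.extension?p1=v1&p2=v2")
second = file_name("https://www.abxdhcak.com/sitemap-companies.xml")
print(first)   # file.extension
print(second)  # sitemap-companies.xml
```

One case it does not handle: if the query string itself contains a slash, the greedy ^.*/ runs through it, so https://a.com/x.html?next=/home yields home instead of x.html.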

Hive regexp_extract for values in bracket

This is probably a simple problem but unfortunately I wasn't able to get the results I wanted.
I have the following input line
A[C1234/3/4]b[123/0]C[123/0]d[123/0]E[123/0]d[http://google.com]AD[M/1/2]g[ab]
I want to retrieve the numbers using regexp_extract in Hive
1/2
which follow "AD[M/" in each case.
I am currently using
'\(AD([^)]+)\)' which gives output AD[M/1/2]g[ab]
Trying anything else, such as (//d*), gives a code 2 error. Please suggest possible replacements.
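Try this regex:
AD\[M\/([^\]]*)\]
The group uses [^\]]* rather than a greedy (.*), because (.*) would run on to the last ] in the line and capture 1/2]g[ab. In a Hive string literal each backslash has to be doubled.
By the way, () is the capturing bracket pair; \(\) matches literal parentheses.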
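
The difference between a greedy capture and one bounded by a negated character class is easy to check on the sample line. This is a minimal Python sketch; Python's re stands in for Hive's Java regex engine (an assumption), and both patterns are written without the extra backslash doubling a Hive string literal needs.

```python
import re

line = (
    "A[C1234/3/4]b[123/0]C[123/0]d[123/0]E[123/0]"
    "d[http://google.com]AD[M/1/2]g[ab]"
)

# Greedy: (.*) backtracks only as far as the LAST closing bracket in the line.
greedy = re.search(r"AD\[M/(.*)\]", line).group(1)

# Bounded: [^\]]* cannot cross a closing bracket, so it stops at the first one.
bounded = re.search(r"AD\[M/([^\]]*)\]", line).group(1)

print(greedy)   # 1/2]g[ab
print(bounded)  # 1/2
```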

Extract substring from character A to character B or EOL using Regexp_extract of Big Query / Google Analytics

I'm working with Google Big Query and trying to extract some information from a string column into another column using Regexp_extract. In short:
Data in myVariable:
yippie/eggs-spam/?portlet:hungry=1234
yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890
I want a column with:
1234
456
My command:
SELECT Regexp_extract(myVariable, r'SOME_MAGIC') as result
FROM table
I tried for SOME_MAGIC:
hungry=(.*)[&$] - null, 456 (I learned that $ is interpreted as is)
hungry=(.*)(&|$) - Error: Exactly one capturing group must be specified
hungry=(.*)^& - null, null
hungry=(&.*)?$ - null, null
I read this, but there the number has a fixed length. Also looked at this, but "?=" is no known command for perl.
Does anybody have an idea? Thank you in advance!
I just found an answer to how I can solve my problem differently:
hungry=([0-9]+) - 1234, 456
It isn't an answer to my abstract question (regex for selecting Character A to [Character B or EOL]), so it's not that satisfying. E.g. it won't work with
yippie/eggs-spam/?portlet:hungry=12AB34
However my original problem is solved. I leave the question open for a while in case somebody has a better answer.
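
For the abstract version (from character A up to character B or end of line), one common approach is a negated character class: capture everything that is not character B, which stops at B or at the end of the string, whichever comes first. Below is a minimal sketch in Python, whose re module stands in for BigQuery's engine (an assumption); the pattern hungry=([^&]*) is a suggestion, not something taken from the thread.

```python
import re

rows = [
    "yippie/eggs-spam/?portlet:hungry=1234",
    "yippie/eggs-spam/?portlet:hungry=456&portlet:hungrier=7890",
    "yippie/eggs-spam/?portlet:hungry=12AB34",
]

# Exactly one capturing group: any run of characters other than &.
PATTERN = re.compile(r"hungry=([^&]*)")

results = [PATTERN.search(row).group(1) for row in rows]
print(results)  # ['1234', '456', '12AB34']
```

No alternation with $ is needed, so the pattern keeps the single capturing group that Regexp_extract requires.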
I think I had a similar problem where I was trying to select the last 6 characters in a string (link_id) as a new column.
I kept getting this error:
Exactly one capturing group must be specified
My code originally was:
SELECT
...
REGEXP_EXTRACT(link_id, r'......$') AS updated_link_id
FROM sometable;
To get rid of the error and retrieve the correct substring as a column, I had to add parentheses around my regex string.
SELECT
...
REGEXP_EXTRACT(link_id, r'(......$)') AS updated_link_id
FROM sometable;