Regexp_extract everything after appearance of '-q_' - sql

I have strings, some of which contain 'q_', and I want to extract everything that comes after it. Example values in the column are:
prod-q_cat_trait_cat_social_issue
_prod-q_body_modification_graffiti
event_tickets
dappled_grey
_prod-q_cat_tech_support
What is wrong with my regular expression? I'm trying to drop everything up to and including the '_' after q.
REGEXP_EXTRACT(queue_id, '[^q_]+$')
Is just returning
issue
I've also tried the split method:
SPLIT(queue_id, 'q_')[OFFSET(2)]
But this returns
Array index 2 is out of bounds (overflow)
Any suggestions? Thanks! (I am using Google Cloud SQL.)

Using a capturing group, you may extract all after the first q_ with:
REGEXP_EXTRACT(queue_id, 'q_(.*)')
You may extract all after the last q_ with:
REGEXP_EXTRACT(queue_id, '.*q_(.*)')
See the regex demo #1 and regex demo #2.
Here, q_ matches the first occurrence of q_, and (.*) captures the rest of the line into Group 1, which is the value REGEXP_EXTRACT returns. In the second pattern, the leading .* matches as many characters as possible (zero or more, excluding line breaks), so the engine backtracks only as far as the last q_, and the capture starts after that last occurrence.
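As a rough illustration of the two patterns, here is a minimal sketch in Python. It assumes Python's re module behaves like the engine behind REGEXP_EXTRACT for these simple patterns; the extract helper and the sample value "a-q_b_q_c" are hypothetical and not from the question. It also shows why the original attempts failed: [^q_] is a character class (any single character that is not q or _), and splitting on 'q_' yields only two elements, so index 2 does not exist.

```python
import re

rows = [
    "prod-q_cat_trait_cat_social_issue",
    "_prod-q_body_modification_graffiti",
    "event_tickets",
    "dappled_grey",
    "_prod-q_cat_tech_support",
]

def extract(pattern, value):
    # Hypothetical helper: return capture group 1, or None when there is no match.
    m = re.search(pattern, value)
    return m.group(1) if m else None

after_first = [extract(r"q_(.*)", r) for r in rows]
after_last = [extract(r".*q_(.*)", r) for r in rows]

# The two patterns only differ when q_ occurs more than once (hypothetical value).
twice = "a-q_b_q_c"
first_of_twice = extract(r"q_(.*)", twice)
last_of_twice = extract(r".*q_(.*)", twice)

# The original attempt: the longest trailing run of characters that are neither q nor _.
original_attempt = re.search(r"[^q_]+$", rows[0]).group()

# Splitting on 'q_' gives two parts, so a zero-based index of 2 is out of range.
parts = rows[0].split("q_")

print(after_first)
print(first_of_twice, last_of_twice, original_attempt, parts)
```

Rows without q_ come back as None here, which corresponds to a NULL result in SQL.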

Google Cloud SQL can run MySQL. If that is your engine, I think the simplest method is substring_index():
select substring_index(queue_id, '-q_', -1)
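To see what that call would return for the sample rows, here is a small Python sketch. It assumes MySQL's substring_index() with a count of -1 returns everything to the right of the last delimiter, and the whole string when the delimiter is absent; the helper name substring_index_last is hypothetical.

```python
def substring_index_last(value, delim):
    # Mimics substring_index(value, delim, -1): text after the last delimiter,
    # or the whole value when the delimiter does not occur.
    return value.rsplit(delim, 1)[-1]

rows = [
    "prod-q_cat_trait_cat_social_issue",
    "_prod-q_body_modification_graffiti",
    "event_tickets",
    "dappled_grey",
    "_prod-q_cat_tech_support",
]

results = [substring_index_last(r, "-q_") for r in rows]
print(results)
```

Note that rows without '-q_' come back unchanged rather than empty, which may or may not be what you want.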

Can you try this: q_([^q_]+)$? You'll have what you want in the first group.
Edit: this one matches all the cases: (?(?<=-q_).*|^((?!-q_).)*$)

Related

Postgres - substring from the beginning to the second last occurrence of a char within a string

I need to retrieve the leading section (SEALS_LME_TRADES_MBL) of the string below. This value is in a column within my Postgres database table.
SEALS_LME_TRADES_MBL_20220919_00212.csv
I tried the functions substring, reverse, and strpos, but they all have limitations. It seems like regex is the best option; however, I was not able to make it work.
Essentially I need the substring from the beginning up to the second-to-last '_'. I do not want the date and sequence number along with the file extension at the end.
The closest regex I managed to get is: ^(([^]*){4})
https://regex101.com/
This looks a little wonky, but how about this?
select substring ('SEALS_LME_TRADES_MBL_20220919_00212.csv', '^(.+)_[^_]+_[^_]+')
Translation
^ from the beginning
(.+) any characters (capture and return this value), followed by
_ an underscore, followed by
[^_]+ one or more non-underscores, followed by
_ an underscore, followed by
[^_]+ one or more non-underscores
Regex greediness will cause any incidental underscores to be captured in the initial string.
Technically speaking the last portion (one or more non-underscores) can probably be omitted.
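The translation above can be checked with a short Python sketch. It assumes Python's re module treats this pattern the same way Postgres does, and it relies on Postgres substring(string, pattern) returning the first parenthesized group. The variable names are hypothetical.

```python
import re

filename = "SEALS_LME_TRADES_MBL_20220919_00212.csv"

# Full pattern from the answer: greedy (.+), then two underscore-delimited tails.
full = re.match(r"^(.+)_[^_]+_[^_]+", filename)

# Shortened variant with the final "one or more non-underscores" omitted.
short = re.match(r"^(.+)_[^_]+_", filename)

prefix_full = full.group(1)
prefix_short = short.group(1)
print(prefix_full, prefix_short)
```

Both variants give the same prefix for this value, because the greedy (.+) backs off only until exactly two underscore-delimited pieces remain.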

REGEX to search for and remove all characters up to and including the last hyphen

I am looking for a way to search for and remove everything up to and including the - from my strings below. I have tried variations and none works exactly how I want it to. I tried regex_replace, but it did not catch all of them, and I found myself creating individual regexp statements for each scenario, which did not seem any better than hard-coding. I am hoping someone has a solution. I would very much appreciate it.
POLY GON - HOME
POLY-GON-HOME
POLY - GON - HOME
POLY - GON HOME
PG - HOME
PG-HOME
I want to show everything after the last hyphen. So, HOME is what I want to display.
I tried
regexp_replace(string,\A[^-]+-[^-]+)
but it removes everything except for the second hyphen. Otherwise it works.
Use
SELECT regexp_replace(string, '^.*-', '')
^ matches the beginning of the string.
.* matches any sequence of characters
- matches hyphen
Since * is greedy, this will match everything up to the last hyphen. It then gets replaced with an empty string.
I am thinking something like:
select regexp_substr(string, '[^-]+$')
This basically keeps the last string of characters that are not hyphens.
Not all databases that support regular expressions support regexp_substr(), but most have a similar function.
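Both suggestions can be compared on the sample rows with a quick Python sketch. It assumes Python's re module handles these patterns like the database engine would. The third pattern, with \s* appended, is an addition that is not in either answer; it also drops the whitespace that follows the last hyphen.

```python
import re

values = [
    "POLY GON - HOME",
    "POLY-GON-HOME",
    "POLY - GON - HOME",
    "POLY - GON HOME",
    "PG - HOME",
    "PG-HOME",
]

# Approach 1: delete everything up to and including the last hyphen.
replaced = [re.sub(r"^.*-", "", v) for v in values]

# Approach 2: keep the trailing run of non-hyphen characters.
tails = [re.search(r"[^-]+$", v).group() for v in values]

# Assumed variant (not from the answers): also strip whitespace after the hyphen.
trimmed = [re.sub(r"^.*-\s*", "", v) for v in values]

print(replaced)
print(trimmed)
```

The first two approaches agree, but both keep the space after the hyphen. Also note that "POLY - GON HOME" yields "GON HOME", since its last hyphen comes before GON.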

Big Query Regex Extraction

I am trying to extract an item_subtype field from a URL.
This regex works fine to get the first item, item_type:
SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_type=(\w+)')
But what is the correct regex to get everything starting from 'chocolate' up to, but not including, '&page=1'?
I have tried this, but can't seem to get it to match any further:
SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_subtype=(\w+[^Z])')
Basically, I want to extract 'chocolate/cookies%20cream,vanilla'.
In your case, \w+ only matches one or more letters, digits or underscores. Your expected values may contain other characters, too.
You may use
SELECT REGEXP_EXTRACT('info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1', r'item_subtype=([^&]+)')
See the regex demo.
Notes:
item_subtype= - this string is matched as a literal char sequence
([^&]+) - Capturing group 1: matches one or more characters other than & and captures them; this captured text is what the REGEXP_EXTRACT function returns.
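The difference between the three patterns can be seen in a short Python sketch. It assumes Python's re module matches these patterns the same way REGEXP_EXTRACT does; the variable names are hypothetical.

```python
import re

url = "info?item_type=icecream&item_subtype=chocolate/cookies%20cream,vanilla&page=1"

# Works for item_type because its value contains only word characters.
item_type = re.search(r"item_type=(\w+)", url).group(1)

# The attempt from the question: \w+ stops at '/', then [^Z] consumes one more character.
attempt = re.search(r"item_subtype=(\w+[^Z])", url).group(1)

# The suggested pattern: everything up to the next '&'.
item_subtype = re.search(r"item_subtype=([^&]+)", url).group(1)

print(item_type, attempt, item_subtype)
```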

How to get the part of a string after the last occurrence of certain character?

I would like to have the substring after the last occurrence of a certain character.
Now I found here how to get the first, second or so parts, but I need only the last part.
The input data is a list of file directories:
c:\dir\subdir\subdir\file.txt
c:\dir\subdir\subdir\file2.dat
c:\dir\subdir\file3.png
c:\dir\subdir\subdir\subdir\file4.txt
Unfortunately this is the data I have to work with, otherwise I could list it using the command prompt.
The problem is that the number of directories varies from row to row.
My code based on the previous link is:
select (regexp_split_to_array(BTRIM(path),'\\'))[1] from myschema.mytable
So far I've tried some things in the brackets that came in to my mind. For example [end], [-1] etc.
None of them work. Is there a way to get the last part without reversing my strings, taking the first part, and then reversing it back?
You can use regexp_matches():
select (regexp_matches(path, '[^\\]+$'))[1]
Here is a db<>fiddle.
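To illustrate what that pattern picks out, here is a minimal Python sketch. It assumes Python's re module treats [^\\]+$ (a trailing run of non-backslash characters) the same way Postgres does, and it adds a plain string split as a hypothetical cross-check that is not part of the answer.

```python
import re

paths = [
    r"c:\dir\subdir\subdir\file.txt",
    r"c:\dir\subdir\subdir\file2.dat",
    r"c:\dir\subdir\file3.png",
    r"c:\dir\subdir\subdir\subdir\file4.txt",
]

# Trailing run of characters that are not a backslash, anchored at the end.
names = [re.search(r"[^\\]+$", p).group() for p in paths]

# Cross-check without regex: split once from the right on the backslash.
names_split = [p.rsplit("\\", 1)[-1] for p in paths]

print(names)
```

Because the pattern is anchored at the end of the string, the directory depth does not matter.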

SQL substring non greedy regex

I have data like
http://www.linz.at/politik_verwaltung/32386.asp
stored in a text column. I thought a non-greedy extraction with
select substring(turl from '\..*?$') as ext from tdata
would give me .asp, but instead it still greedily returns
.linz.at/politik_verwaltung/32386.asp
How can I match only the last occurrence of the dot (.)?
Using Postgresql 9.3
\.[^.]*$ matches . followed by any number of non-dot characters followed by end-of-string:
# select substring('http://www.linz.at/politik_verwaltung/32386.asp'
from '\.[^.]*$');
substring
-----------
.asp
(1 row)
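As for why the non-greedy quantifier does not help here: the engine still starts matching at the earliest possible position (the first dot), and only from there tries to match as little as possible. Since the pattern must reach the end of the string, the match runs from the first dot to the end.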
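The leftmost-match behaviour can be reproduced with a short Python sketch. It assumes Python's re module; Postgres has its own rules for mixing greedy and non-greedy quantifiers, but for these patterns the results match what is reported above. The variable names are hypothetical.

```python
import re

url = "http://www.linz.at/politik_verwaltung/32386.asp"

# Non-greedy, but the match still begins at the first dot and must reach $.
lazy = re.search(r"\..*?$", url).group()

# Excluding dots from the tail forces the match to start at the last dot.
last_dot = re.search(r"\.[^.]*$", url).group()

print(lazy)
print(last_dot)
```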
Try this:
\.[\w]*$
Here is how it works:
It matches the last dot (\.) followed by any number (*) of word characters (\w), up to the end of the string ($).
Note: I updated the answer; it now also matches strings that end with a dot.