Get second match from regexp_matches results - sql

I have a name column which looks like this:
'1234567 - 7654321 - some - more - text'
I need to get a string "7654321". I am stuck with the following:
SELECT regexp_matches('1234567 - 7654321 - some - more - text', '\d+', 'g');
regexp_matches
----------------
{1234567}
{7654321}
(2 rows)
How do I get what I want? Maybe there's a better option than regexp_matches; I will gladly consider it. Thanks!

You could use REGEXP_REPLACE:
SELECT REGEXP_REPLACE('1234567 - 7654321 - some - more - text', '^\d+[^\d]+(\d+).*$', '\1');
Output
7654321
This regexp looks for a string starting with some number of digits (^\d+) followed by some non-digit characters ([^\d]+) and then another group of digits ((\d+)) followed by some number of characters until the end of the string (.*$). The () around the second group of digit characters makes that a capturing group, which we can then refer to in the replacement string with \1. Since REGEXP_REPLACE only replaces the parts of the string that match the regex, it is necessary to have a regex that matches the whole string in order to replace it with just the desired data.
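To try the capture-group idea outside the database, here is a minimal Python sketch. It assumes Python's re module as a stand-in for PostgreSQL's regex engine (they agree on this pattern, but are not identical in general), and the function name second_number is made up for illustration.

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
PATTERN = r'^\d+[^\d]+(\d+).*$'

def second_number(text: str) -> str:
    # The pattern spans the whole string, so the substitution keeps only group 1.
    # Input that does not match is returned unchanged, as REGEXP_REPLACE does.
    return re.sub(PATTERN, r'\1', text)

result = second_number('1234567 - 7654321 - some - more - text')
print(result)
```

Because the pattern is anchored at both ends, a string that does not start with digits comes back untouched rather than partially replaced.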
Update
If there are potentially characters before the first set of digits, you should change the regex to
^[^\d]*\d+[^\d]+(\d+).*$
Update 2
If it's possible that there is only one set of numbers at the beginning, we must make matching the first part optional. We can do that with a non-capturing group:
^[^\d]*(?:\d+[^\d]+)?(\d+).*$
This makes the match on the first set of digits optional so that if it doesn't exist (i.e. there is only one set of digits) the regex will still match. By using a non-capturing group (adding the ?: to the beginning of the group), we don't need to change the replacement string from \1.
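The difference between the three patterns is easiest to see side by side. The sketch below runs them through Python's re module, used here only as an approximation of PostgreSQL's engine; the labels, the sample strings and the extract helper are invented for the comparison.

```python
import re

PATTERNS = {
    'original': r'^\d+[^\d]+(\d+).*$',
    'leading text allowed': r'^[^\d]*\d+[^\d]+(\d+).*$',
    'first number optional': r'^[^\d]*(?:\d+[^\d]+)?(\d+).*$',
}

SAMPLES = [
    '1234567 - 7654321 - some - more - text',
    'id: 1234567 - 7654321 - text',
    'id: 1234567 only',
]

def extract(pattern: str, text: str) -> str:
    # Unmatched input comes back unchanged, as with REGEXP_REPLACE.
    return re.sub(pattern, r'\1', text)

for label, pattern in PATTERNS.items():
    for sample in SAMPLES:
        print(f'{label!r:26} {sample!r:45} -> {extract(pattern, sample)!r}')
```

Only the last pattern handles the sample with a single number; the other two leave that string as it was because they cannot match it at all.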

regexp_matches() returns a table, so you can use that in the from clause together with the with ordinality option:
SELECT t.value
from regexp_matches('1234567 - 7654321 - some - more - text', '\d+', 'g') with ordinality as t(value,idx)
where t.idx = 2;
Note that value is still an array. To get the actual array element, you can use:
SELECT t.value[1]
from regexp_matches('1234567 - 7654321 - some - more - text', '\d+', 'g') with ordinality as t(value,idx)
where t.idx = 2;
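For readers who want to see the "number the matches, then pick one" logic in isolation, here is a rough Python analogue. It is not PostgreSQL code: enumerate merely plays the role of WITH ORDINALITY, and nth_match is a hypothetical helper name.

```python
import re

def nth_match(text: str, pattern: str, n: int):
    # enumerate(..., start=1) numbers the matches from 1, like WITH ORDINALITY.
    for idx, match in enumerate(re.finditer(pattern, text), start=1):
        if idx == n:
            return match.group(0)
    # Fewer than n matches: comparable to the query returning no rows.
    return None

second = nth_match('1234567 - 7654321 - some - more - text', r'\d+', 2)
print(second)
```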

Related

SQL Query to select a string after last delimiter

I want to retrieve the string after the last appearance of the ~ delimiter. I have a whole string like "Attachments:Attachments~Attachment" and I want to take the substring after the last ~ character, so the output will be Attachment. How can this be done in an SQL/Oracle select statement?
Use REGEXP_SUBSTR
select regexp_substr('Attachments:Attachments~Attachment','[^~]+$') from dual;
[^ ] - Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
+ - Matches one or more occurrences
$ - Matches the end of a string
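The three pieces of the pattern can be checked quickly with Python's re module, which treats this particular expression the same way. This is only an illustration of the regex, not Oracle code, and last_token is a name chosen for the sketch.

```python
import re

def last_token(text: str):
    # [^~]+ is a run of one or more non-tilde characters; $ pins it to the end.
    match = re.search(r'[^~]+$', text)
    return match.group(0) if match else None

print(last_token('Attachments:Attachments~Attachment'))
```

If the string ends with ~ there is nothing left to match and the helper returns None, which corresponds to REGEXP_SUBSTR returning NULL.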
You can use substr and instr for such simple pattern-matching requirements, as a regexp will be costly compared to the substr and instr combination.
You can try the following:
substr(str,instr(str,'~',-1) + 1)
Example:
SQL> select substr('Attachments:Attachments~Attachment1~Attachment2',
2 instr('Attachments:Attachments~Attachment1~Attachment2','~',-1) + 1)
3 from dual;
SUBSTR('ATT
-----------
Attachment2
SQL>
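The same "find the last delimiter, then slice" logic looks like this in Python. It is an analogue rather than a translation: str.rfind takes the place of INSTR with a negative position, and tail_after_tilde is a hypothetical name.

```python
def tail_after_tilde(text: str) -> str:
    # rfind returns -1 when there is no '~', so the slice starts at 0 and yields
    # the whole string. Oracle's INSTR returns 0 in that case, and
    # SUBSTR(str, 1) likewise returns the whole string.
    return text[text.rfind('~') + 1:]

print(tail_after_tilde('Attachments:Attachments~Attachment1~Attachment2'))
```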

Get rows which contain exactly one special character

I have a SQL query which returns some rows having the below format:
DB_host
DB_host_instance
How can I filter to get only the rows that have the format 'DB_host' (that is, place a condition to return values with only one occurrence of '_')?
I tried using [0-9a-zA-Z_0-9a-zA-Z], but it seems that's not right. Please suggest.
One option is REGEXP_COUNT. If at most one underscore is allowed, use
WHERE REGEXP_COUNT( col, '_' ) <= 1
or, if exactly one underscore must exist, use
WHERE REGEXP_COUNT( col, '_' ) = 1
A simple method is a regular expression:
where regexp_like(col, '^[^_]+_[^_]+$')
This matches the full string when there is a string with no underscores followed by an underscore followed by another string with no underscores.
You could also do this with LIKE, but it is more complicated:
where col like '%\_%' and col not like '%\_%\_%'
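That is, has one underscore but not two. The \ is needed because _ is a wildcard for LIKE patterns. Oracle has no default escape character, so there you must append ESCAPE '\' to each LIKE condition.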
You can remove the underscores from the string and check that the length of the result is just one character less than the original:
where len(replace(col, '_', '')) = len(col) - 1
(len is the SQL Server spelling; Oracle, PostgreSQL and MySQL call the same function length.)
I wonder how this method would compare to a regex or two likes in terms of efficiency on a large dataset. I would not be surprised if it was more efficient.
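The suggested conditions are not quite equivalent, which a small Python sketch makes visible. The three helper functions are illustrative stand-ins for the SQL predicates (count for REGEXP_COUNT, re.fullmatch for the anchored REGEXP_LIKE pattern), not database code, and the sample values other than the two from the question are invented.

```python
import re

def by_count(value: str) -> bool:
    # Mirrors REGEXP_COUNT(col, '_') = 1.
    return value.count('_') == 1

def by_regex(value: str) -> bool:
    # Mirrors '^[^_]+_[^_]+$': also demands text on both sides of the underscore.
    return re.fullmatch(r'[^_]+_[^_]+', value) is not None

def by_length(value: str) -> bool:
    # Mirrors the replace-and-compare-lengths condition.
    return len(value.replace('_', '')) == len(value) - 1

for value in ['DB_host', 'DB_host_instance', 'DBhost', '_host']:
    print(value, by_count(value), by_regex(value), by_length(value))
```

All three accept DB_host and reject DB_host_instance, but only the regex version rejects a value such as _host, where the underscore has nothing in front of it.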

Masking a query string param value using Postgres regexp_replace

I want to mask movie names with XXXXXXXX in a PostgreSQL table column. The content of the column is something like
hollywood_genre_movieTitle0=The watergate&categorey=blabla&hollywood_genre_movieTitle1=Terminator&hollywood_genre_movieTitle2=Spartacus&hollywood_genre_movieTitle3=John Wayne and the Indians&categorey=blabla&hollywood_genre_movieTitle4=Start Trek&hollywood_genre_movieTitle5=ET&categorey=blabla
And I would like to mask the titles (behind the pattern hollywood_genre_movieTitle\d) using the regexp_replace function
regexp_replace('(hollywood_genre_movieTitle\d+=)(.*?)(&?)', '\1XXXXXXXX\3', 'g')
This just replaces the first occurrence of a title and cuts the string. In short, this expression does not do what I want. What I would like is for all movie names to be replaced with XXXXXXXX.
Can someone help me solve that?
The regex does not work because (.*?)(&?) can match an empty string, and a & lands in Group 3 only if it immediately follows the hollywood_genre_movieTitle\d+= pattern.
You need to use a negated character class [^&] and a + quantifier to match any 1 or more chars other than & after the hollywood_genre_movieTitle\d+= pattern.
SELECT regexp_replace(
'hollywood_genre_movieTitle0=The watergate&categorey=blabla&hollywood_genre_movieTitle1=Terminator&hollywood_genre_movieTitle2=Spartacus&hollywood_genre_movieTitle3=John Wayne and the Indians&categorey=blabla&hollywood_genre_movieTitle4=Start Trek&hollywood_genre_movieTitle5=ET&categorey=blabla',
'(hollywood_genre_movieTitle\d+=)[^&]+',
'\1XXXXXXXX',
'g')
Details
(hollywood_genre_movieTitle\d+=) - Capturing group 1:
hollywood_genre_movieTitle - a substring
\d+= - 1 or more digits and a = after them
[^&]+ - 1 or more chars other than &.
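The negated character class behaves the same way in Python's re module, so the fix can be sanity-checked there. This is a sketch on a shortened version of the sample string, not the PostgreSQL call itself; the variable names are arbitrary.

```python
import re

# Shortened sample in the same shape as the column content from the question.
query = ('hollywood_genre_movieTitle0=The watergate&categorey=blabla'
         '&hollywood_genre_movieTitle1=Terminator'
         '&hollywood_genre_movieTitle5=ET&categorey=blabla')

# [^&]+ consumes the title up to, but not including, the next '&'.
masked = re.sub(r'(hollywood_genre_movieTitle\d+=)[^&]+', r'\1XXXXXXXX', query)
print(masked)
```

Every title is replaced, while the categorey parameters are untouched because they never match the prefix in group 1.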

Oracle regexp to match only digits after certain combination of signs

I have a string which roughly looks like: XXXXXXXXX - 1234567 XXXXXXXX,
where X can be either digit, string or sign (<,>,. or space).
I need to extract only these numbers after ' - '.
I have tried following:
select regexp_substr('17.12.12 <XXXXXXXXXX> - 1234567 <XXXXXXXXXX>','(- )[0-9]{1,7}') from dual
I end up with - 1234567.
How do I get rid of '- '?
Thank you in advance
This should work with Oracle 11g.
Place the capturing group around the pattern part you are interested in first. Since you need the digits, wrap the [0-9]{1,7} with the capturing parentheses.
Then, pass all 6 arguments to the REGEXP_SUBSTR function, where the 6th one indicates the number of the capturing group you want to extract:
select regexp_substr('17.12.12 <XXXXXXXXXX> - 1234567 <XXXXXXXXXX>',' - ([0-9]{1,7})', 1,1,NULL,1) from dual
Here, 1,1,NULL,1 means: start looking for a pattern match from Position 1, just for the first match, with no specific regex options, and return the contents of Group 1.
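The idea of matching the prefix but returning only the group can be illustrated in Python, where match.group(1) plays the role of the 6th REGEXP_SUBSTR argument. This is an analogy under that assumption, not Oracle syntax.

```python
import re

text = '17.12.12 <XXXXXXXXXX> - 1234567 <XXXXXXXXXX>'

# ' - ' must be present for a match, but only the parenthesised digits are returned.
match = re.search(r' - ([0-9]{1,7})', text)
digits = match.group(1) if match else None
print(digits)
```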
What @Gordon Linoff was trying to say was:
select substr(regexp_substr('17.12.12 <XXXXXXXXXX> - 1234567 <XXXXXXXXXX>','(- )[0-9]{1,7}'), 3)
from dual
Substr the remaining "- " off of your result.

Oracle SQL - select parts of a string

How can I select abcdef.txt from the following string?
abcdef.123.txt
I only know how to select abcdef by doing select substr('abcdef.123.txt',1,6) from dual;
You can use || to concatenate and substr with -3 for the right part:
select substr('abcdef.123.txt',1,6) || '.' ||substr('abcdef.123.txt',-3) from dual;
or with one concatenation fewer (as suggested by Luc M):
select substr('abcdef.123.txt',1,7) || substr('abcdef.123.txt',-3) from dual;
Here is a general solution, assuming the input string has exactly two periods and you want to extract the first and third tokens, joined by a single period. The length of the "tokens" in the input string can be arbitrary (including zero!) and they can contain any characters other than a period.
select regexp_replace('abcde.123.xyz', '([^.]*)\.([^.]*)\.([^.]*)', '\1.\3') as result
from dual;
RESULT
---------
abcde.xyz
Explanation:
[ ] means match any of the characters between the brackets.
^ means do NOT match the characters in the brackets, so...
[^.] means match any character OTHER THAN .
* means match zero or more occurrences, as many as possible ("greedy" match)
( ... ) is called a subexpression... see below
'\1.\3' means replace the original string with the first subexpression, followed by ., followed by the THIRD subexpression.
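A quick way to experiment with the subexpression approach is Python's re module, which uses the same \1 and \3 backreferences. In this sketch the separators are written as \. so that they match only a literal period; first_and_third is a hypothetical helper name.

```python
import re

def first_and_third(name: str) -> str:
    # Three runs of non-dot characters separated by two literal dots;
    # keep run 1 and run 3, joined by a single dot.
    return re.sub(r'([^.]*)\.([^.]*)\.([^.]*)', r'\1.\3', name)

print(first_and_third('abcde.123.xyz'))
print(first_and_third('abcdef.123.txt'))
```

A string with fewer than two periods does not match and is returned as it is.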
Replace the substring of anything surrounded by dots (inclusive) with a single dot. No dependence on lengths of components of the string:
SQL> select regexp_replace('abcdef.123.txt', '\..*\.', '.') fixed
from dual;
FIXED
----------
abcdef.txt
SQL>
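Because .* is greedy, this pattern removes everything between the first and the last period, which matters when there are more than two. A small Python check shows the effect; it relies on Python's re module handling this pattern like Oracle does, and collapse_middle is an invented name.

```python
import re

def collapse_middle(name: str) -> str:
    # \. matches a literal dot; the greedy .* stretches to the LAST dot.
    return re.sub(r'\..*\.', '.', name)

print(collapse_middle('abcdef.123.txt'))
print(collapse_middle('abcdef.123.456.txt'))
```

With a single period in the input there is nothing to collapse, so the string is left alone.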