SQL regex expression for text before pipe - sql

I need an Oracle regex to fetch the data before the first pipe and after the last slash in the text that precedes that pipe.
For example, from the string:
test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin|type
the data to be fetched is:
2abc123-7test-1edf-9xyz-12345678.bin

This works in Oracle:
select regexp_substr(col,'[^|/]+\.\w+',1,1,'i')
from (
select 'test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin|type=app/href|size=1234|encoding=|locale=en_|foo.bar' as col
from dual
) q
MySQL and Teradata also have a REGEXP_SUBSTR function, but I haven't tested this on those.
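As a quick sanity check of the pattern outside the database, here is a minimal Python sketch. It assumes that Python's re module treats this particular pattern the same way Oracle does, which should hold for the constructs used here but is not guaranteed for regexes in general. The helper name first_file_name is hypothetical.

```python
import re

# Same pattern as in the Oracle query: a run of characters that are neither
# '|' nor '/', followed by a dot and a word-character "extension".
PATTERN = re.compile(r"[^|/]+\.\w+", re.IGNORECASE)

def first_file_name(text):
    """Return the first match, or None (where REGEXP_SUBSTR would give NULL)."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

sample = ("test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin"
          "|type=app/href|size=1234|encoding=|locale=en_|foo.bar")
print(first_file_name(sample))
```

Note that the pattern returns the first dotted segment it finds, so a directory name containing a dot earlier in the path would be returned instead of the file name.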

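The pattern ^.+?/([^/]+?)\| starts at the beginning of the string, lazily skips characters up to a slash, then captures in group 1 the run of non-slash characters between that slash and the next pipe. Because the captured run cannot contain a slash, the match succeeds only at the last slash before the first pipe.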
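A small Python sketch of how this anchored, lazy pattern behaves. It is only an illustration under the assumption that Python's re handles the lazy quantifiers here the same way Oracle does; the function name name_before_first_pipe is made up for the example. The wanted text is in capture group 1, not in the whole match.

```python
import re

# Anchored at the start; '.+?' grows one character at a time until the rest
# of the pattern (slash, non-slash run, pipe) can match.
PATTERN = re.compile(r"^.+?/([^/]+?)\|")

def name_before_first_pipe(text):
    """Return capture group 1, or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

sample = ("test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin"
          "|type=app/href|size=1234")
print(name_before_first_pipe(sample))
```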

You may use:
REGEXP_SUBSTR(column, '/([^/|]+)\|', 1, 1, NULL, 1)
Regex breakdown:
/ Match a slash literally
( Start of capturing group #1
[^/|]+ Match anything except slash and pipe, at least one character
) End of CG #1
\| Match a pipe
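The breakdown above can be tried out with a short Python sketch. This assumes Python's re is an acceptable stand-in for Oracle on this pattern; after_last_slash is a hypothetical helper name. Returning group 1 mirrors passing 1 as the last argument of REGEXP_SUBSTR.

```python
import re

# A slash, then a captured run without '/' or '|', then a pipe.
PATTERN = re.compile(r"/([^/|]+)\|")

def after_last_slash(text):
    """Return the text between the last slash before a pipe and that pipe."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

sample = "test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin|type"
print(after_last_slash(sample))
```

Because the captured run excludes both the slash and the pipe, no lazy quantifier or anchor is needed.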

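[^\/]*?(?=\|)
[^\/]*? - matches as few characters as possible that are not a forward slash
(?=\|) - positive lookahead that requires a pipe to follow, without consuming it
Note that Oracle regular expressions do not support lookahead, so this pattern only works in engines with PCRE-style syntax.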
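Since Oracle has no lookahead, this pattern can only be tried in an engine that supports it. The sketch below uses Python's re as one such engine; before_pipe is a hypothetical name, and the slash is written unescaped because Python does not need the escape.

```python
import re

# Lazy run of non-slash characters that must be followed by a pipe.
# The lookahead checks for the pipe but leaves it out of the match.
PATTERN = re.compile(r"[^/]*?(?=\|)")

def before_pipe(text):
    """Return the first match of the lookahead pattern, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

sample = "test=file://2019/13/40/9/53/2abc123-7test-1edf-9xyz-12345678.bin|type"
print(before_pipe(sample))
```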

Related

How to strip ending date from string using Regex? - SQL

I want to format the strings in a table column, in a specific format.
Input table:
file_paths
my-file-path/wefw/wefw/2022-03-20
my-file-path/wefw/2022-01-02
my-file-path/wef/wfe/wefw/wef/2021-02-03
my-file-path/wef/wfe/wef/
I want to remove everything after the last / sign, if the only thing after it resembles a date (i.e. YYYY-MM-dd or ####-##-##).
Output:
file_paths
my-file-path/wefw/wefw/
my-file-path/wefw/
my-file-path/wef/wfe/wefw/wef/
my-file-path/wef/wfe/wef/
I'm thinking of doing something like:
SELECT regexp_replace(file_paths, 'regex_here', '', 1, 'i')
FROM my_table
I'm unsure of how to write the RegEx for this though. I'm also open to easier methods of string manipulation, if there are any. Thanks in advance!
You can use
REGEXP_REPLACE ( file_paths, '/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$', '/' )
See the regex demo.
The /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$ is a POSIX ERE compliant pattern matching
/ - a slash
[0-9]{4} - four digits
- - a hyphen
[0-9]{1,2} - one or two digits
-[0-9]{1,2} - a hyphen and one or two digits
$ - end of string.
If your values can contain trailing whitespace, insert [[:space:]]* before $: /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}[[:space:]]*$.
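To see the effect on the sample rows, here is a minimal Python sketch of the same replacement. It assumes Python's re as a stand-in for the database engine; strip_trailing_date is a hypothetical helper name. One difference to keep in mind is that Python's $ also matches just before a trailing newline.

```python
import re

# A slash followed by ####-#(#)-#(#) at the very end of the value.
DATE_SUFFIX = re.compile(r"/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$")

def strip_trailing_date(path):
    """Replace a trailing '/<date>' with '/', leaving other paths untouched."""
    return DATE_SUFFIX.sub("/", path)

paths = [
    "my-file-path/wefw/wefw/2022-03-20",
    "my-file-path/wefw/2022-01-02",
    "my-file-path/wef/wfe/wefw/wef/2021-02-03",
    "my-file-path/wef/wfe/wef/",
]
for p in paths:
    print(strip_trailing_date(p))
```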
You may use this regex:
^(.*?\/)\d{4}-\d{2}-\d{2}$
You may try this query:
select regexp_replace(file_paths, '^(.*?\/)\\d{4}-\\d{2}-\\d{2}$','$1')

regex extract big query with numeric data

How can I grab the number 2627995 from this string?
"hellotest/2627995?hl=en"
Here is my current regex, but it does not work when I use REGEXP_EXTRACT in BigQuery:
(\/)\d{7,7}
SELECT
REGEXP_EXTRACT(DESC, r"(\/)\d{7,7}")
AS number
FROM
`string table`
here is the output
Thank you!!
I think you just want to match all digits coming after the last path separator, before either the start of the query parameter, or the end of the URL.
SELECT REGEXP_EXTRACT(DESC, r"/(\d+)(?:\?|$)") AS number
FROM `string table`
Try this one: r"\/(\d+)"
Your code returns the slash because you captured it (see the parentheses in (\/)\d{7,7}). REGEXP_EXTRACT only returns the captured substring.
Thus, you could just wrap the other part of your regex with the parentheses:
SELECT
REGEXP_EXTRACT(DESC, r"/(\d{7})")
AS number
FROM
`string table`
NOTE:
In BigQuery, a regex is specified with a string literal, not a regex literal (which is usually delimited with forward slashes). That is why you do not need to escape the / char: it is not a special regex metacharacter.
{7,7} is equivalent to the {7} limiting quantifier, meaning exactly seven occurrences.
Also, if you are sure the number is at the end of string or is followed with a query string, you can enhance it as
REGEXP_EXTRACT(DESC, r"/(\d+)(?:[?#]|$)")
where the regex means
/ - a / char
(\d+) - Group 1 (the actual output): one or more digits
(?:[?#]|$) - either ? or # char, or end of string.
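The pattern can be checked with a small Python sketch. It assumes Python's re agrees with BigQuery's engine on this simple pattern; extract_id is a hypothetical name. Returning group 1 mirrors the way REGEXP_EXTRACT returns the captured substring.

```python
import re

# Slash, captured digits, then '?', '#', or the end of the string.
PATTERN = re.compile(r"/(\d+)(?:[?#]|$)")

def extract_id(url):
    """Return the digits captured by group 1, or None if nothing matches."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

print(extract_id("hellotest/2627995?hl=en"))
```

Digits that are followed by another path segment, as in "a/12/b", are not returned, because neither ?, # nor the end of the string follows them.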

Oracle REGEXP_SUBSTR from CLOB

I'm trying to find a substring from a CLOB-field in my database.
Consider the following string:
someothertext 2. Grad Dekubitus (Druckgeschwür) mit
Abschürfung/Blase/Hautverlust someothertext
I only want to extract the "2. Grad" from the string, but my Regexp doesn't seem to work - I tested it on the string in some online regexp checkers, where it does actually work (Fiddle)
This is my regular expression:
REGEXP_SUBSTR(DBMS_LOB.SUBSTR(cf.TEXT, 4000), '\b[0-9]\.\sGrad$') AS "Grad"
Currently, it returns NULL, but I'm not sure why.
Any ideas on how to get this working?
Oracle does not support word boundaries \b in regular expressions.
Either remove the \b or replace it with (^|\s) if you are expecting white space before the digit.
You also need to remove the trailing $ as you are not trying to match the end of the string at that point.
REGEXP_SUBSTR( DBMS_LOB.SUBSTR(cf.TEXT, 4000), '(^|\s)[0-9]\.\sGrad' ) AS "Grad"
Also, if you can have multi-digit numbers then you may want to use [0-9]+.
If you do not want the leading white space then you can wrap the second part of your expression in a capturing group and then extract that capturing group's value with the 6th argument of REGEXP_SUBSTR:
REGEXP_SUBSTR(
DBMS_LOB.SUBSTR(cf.TEXT, 4000),
'(^|\s)([0-9]\.\sGrad)',
1, -- Start from the 1st character
1, -- Find the 1st occurrence
NULL, -- No flags
2 -- Return the 2nd capturing group
) AS "Grad"
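The capture-group approach can be illustrated with a Python sketch. Python's re does support \b, so this only demonstrates the (^|\s) workaround and the selection of group 2, not Oracle's behavior itself; extract_grade is a hypothetical helper name.

```python
import re

# Group 1: start of string or a whitespace character.
# Group 2: a digit, a dot, a whitespace character and the word "Grad".
PATTERN = re.compile(r"(^|\s)([0-9]\.\sGrad)")

def extract_grade(text):
    """Return group 2 (without the leading whitespace), or None."""
    match = PATTERN.search(text)
    return match.group(2) if match else None

text = ("someothertext 2. Grad Dekubitus (Druckgeschwür) mit\n"
        "Abschürfung/Blase/Hautverlust someothertext")
print(extract_grade(text))
```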
Oracle regex does not support word boundaries. Also, the $ is redundant in your pattern (note you do not use it in your regex demo).
You can use
REGEXP_SUBSTR(
'someothertext 2. Grad Dekubitus (Druckgeschwür) mit Abschürfung/Blase/Hautverlust someothertext',
'(^|\D)([0-9]\.\sGrad)', 1, 1, NULL, 2
) AS "Grad"
where
(^|\D) - Group 1: start of string or a non-digit
([0-9]\.\sGrad) - Group 2: a digit, a dot, a whitespace and Grad
If the digit matched with [0-9] should be preceded with whitespace, you may replace (^|\D) with (\s|^).

Extract string between different special symbols

I have the following string in my query
.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt
beginning with a period, from which I need to extract the segment between the final \ and the file extension period, giving the following expected result
ABC__123_123_123_ABC123
I am fairly new to REGEXP and couldn't arrive at an elegant (or workable) solution from the Q&A here or elsewhere. In all queries the pattern is the same in quantity and order, but to grow my knowledge I'd prefer not to just count and cut.
You can use the REGEXP_REPLACE function, such as
REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')
to extract the piece from the last backslash up to the last dot. The preceding backslashes in \\ and \. act as escape characters, so that the pattern matches a literal \ and a literal . rather than treating them as special characters.
You need just regexp_substr and a simple regexp: ([^\]+)\.[^.]*$
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'([^\]+)\.[^.]*$',
1, -- position
1, -- occurrence
null, -- match_parameter
1 -- subexpr
) substring
from dual;
([^\]+)\.[^.]*$ means:
([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it), captured as group \1 (subexpression #1)
\. - then a literal dot (. is a special character meaning any character, so we need to escape it with \, the escape character)
[^.]* - zero or more characters other than .
$ - end of string
So this regexp means: find a substring consisting of one or more characters other than a backslash, followed by a dot, followed by zero or more characters other than a dot, all at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first matched group in (...)).
Other parameters you can find in the doc.
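For comparison, here is the same idea as a Python sketch. One difference is an assumption worth flagging: inside a bracket expression Oracle treats the backslash as a literal character, while Python's re requires it to be escaped, so [^\] becomes [^\\]. The name base_name is hypothetical.

```python
import re

# Captured run without backslashes, a literal dot, then an extension
# without dots, anchored at the end of the string.
PATTERN = re.compile(r"([^\\]+)\.[^.]*$")

def base_name(path):
    """Return the last path segment without its extension, or None."""
    match = PATTERN.search(path)
    return match.group(1) if match else None

path = r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"
print(base_name(path))
```

If the file name contains several dots, only the part after the last dot is treated as the extension.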
Here is a simple example that is compatible with Oracle 11g R2, PCRE2 and some other engines.
Oracle 11g R2 using the function regexp_substr (Reference documentation)
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
1,
1
) substring
from dual;
Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}
Result: ABC__123_123_123_ABC123
This is about as simple as it can be, and the pattern is also portable across those engines, in case someone else wants to go this way.
Hopefully, this will help you out!

Oracle sql REGEXP_REPLACE expression to replace a number in a string matching a pattern

I have a string 'ABC.1.2.3'
I wish to replace the middle number with 1.
Input 'ABC.1.2.3'
Output 'ABC.1.1.3'
Input 'XYZ.2.2.1'
Output 'XYZ.2.1.1'
That is, replace the number after the second occurrence of '.' with 1.
I know my pattern is wrong; the SQL that I have at the moment is:
select REGEXP_REPLACE ('ABC.1.2.8', '(\.)', '.1.') from dual;
You can use capturing groups to refer to surrounding numbers in replacement string later:
select REGEXP_REPLACE ('ABC.1.2.8', '([0-9])\.[0-9]+\.([0-9])', '\1.1.\2') from dual;
You could use
^([^.]*\.[^.]*\.)\d+(.*)
See a demo on regex101.com.
This is:
^ # start of the string
([^.]*\.[^.]*\.) # capture anything including the second dot
\d+ # 1+ digits
(.*) # the rest of the string up to the end
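This is replaced by
\11\2
in Oracle (backreference \1, a literal 1, then \2; Oracle backreferences are a single digit). In PCRE-style engines such as the one on regex101 the equivalent is ${1}1$2, because $11 would be read as a reference to group 11.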
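The replacement can be tried with a short Python sketch. It assumes Python's re as a stand-in engine; set_middle_to_one is a hypothetical name. Python has the same ambiguity between "group 1 followed by the digit 1" and "group 11", so the sketch uses the explicit \g<1> form.

```python
import re

# Group 1: everything up to and including the second dot.
# Then the digits to replace, and group 2: the rest of the string.
PATTERN = re.compile(r"^([^.]*\.[^.]*\.)\d+(.*)")

def set_middle_to_one(value):
    """Replace the number after the second dot with 1."""
    # \g<1> is group 1; a bare \11 would be read as group 11.
    return PATTERN.sub(r"\g<1>1\2", value)

for value in ("ABC.1.2.3", "XYZ.2.2.1"):
    print(value, "->", set_middle_to_one(value))
```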