Regular Expression in Oracle with REGEXP_SUBSTR - sql

I want to get the email address part of a string.
For example if the string is
"your name(your#name.com)" aaa#bbb.com
then I want to get only
aaa#bbb.com
Basically, removing the double-quoted part of the string would do the trick. I am using the regular expression below with REGEXP_SUBSTR:
REGEXP_SUBSTR('"your name(abc#dd.com)" aaa#bbb.com',
'([a-zA-Z0-9_.\-])+#(([a-zA-Z0-9-])+[.])+([a-zA-Z0-9]{2,4})+')
kindly help.

You can simply require that the match occur at the end of the string, using the $ anchor.
with t1(col) as(
select '"your name(your#name.com)" aaa#bbb.com' from dual
)
select regexp_substr(col, '[[:alnum:]._%-]+#[[:alnum:]._%-]+\.com$') as res
from t1
Result:
RES
-----------
aaa#bbb.com

You probably need something more along the lines of:
REGEXP_SUBSTR('"your name(abc#dd.com)" aaa#bbb.com','[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}', 1, 1, 'i')
The character classes list only uppercase letters, so the 'i' match parameter is needed to make the match case-insensitive. Note that [.] in your pattern is not wrong: inside square brackets the dot loses its "any character" meaning and matches a literal dot, so [.] and \. are equivalent. Oracle takes the backslash in a string literal as-is, so no double escaping is needed. As written, this returns the first address in the string (abc#dd.com); ask for occurrence 2 or anchor the pattern with $ to get the trailing one.

SELECT REGEXP_SUBSTR(email, '[A-Za-z0-9_.-]+#\w+\.\w+', 1, 2) AS cleaned_email
FROM
(
SELECT '"your name(your#name.com)" aaa#bbb.com' AS email FROM DUAL UNION ALL
SELECT '"your name(your.name#name.com)" aaa#bbb.com' AS email FROM DUAL
)
;
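
The question's own idea, dropping the quoted display name and keeping what is left, can also be sketched without matching the address at all. This assumes the display name is always wrapped in double quotes and never contains an embedded double quote; the inline rows are just the test strings used above.

```sql
-- Sketch: remove the first "..." section, then trim the leftover blank.
-- Assumes the display name never contains an embedded double quote.
select trim(regexp_replace(email, '"[^"]*"', '', 1, 1)) as cleaned_email
from (
  select '"your name(your#name.com)" aaa#bbb.com' as email from dual union all
  select '"your name(your.name#name.com)" aaa#bbb.com' from dual
);
```

Both rows should come back as aaa#bbb.com. Unlike the occurrence-based queries, this does not depend on how many addresses appear inside the quotes.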

Related

ORACLE: How to use regexp_like to find a string with single quotes between two characters?

I need to query the DB for all records that have two single quotes between characters. Example: We've, who's.
I have the regex https://regex101.com/r/6MtB9j/1 but it doesn't work with REGEXP_LIKE.
Tried this
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '(?<=[a-zA-Z])''(?=[a-zA-Z])')
Appreciate the help!
Oracle regex does not support lookarounds.
You do not actually need lookaround in this case, you can use
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '[a-zA-Z]''[a-zA-Z]')
This will work since REGEXP_LIKE only attempts one match, and if there is a match, it returns true, otherwise, false (eventually, fetching a record or not).
Lookarounds are useful in case you need to replace or extract values, when matches may overlap.
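
When a replacement is needed, a lookaround can often be emulated by capturing the neighbouring characters and writing them back with backreferences. The query below is only a sketch with made-up sample rows, and it also shows the overlap caveat: a letter consumed by one match cannot serve as the left neighbour of the next.

```sql
-- Sketch: emulate (?<=[a-zA-Z])'(?=[a-zA-Z]) by capturing both neighbours
-- and putting them back with \1 and \2.
select content,
       regexp_replace(content, '([a-zA-Z])''([a-zA-Z])', '\1\2') as stripped
from (
  select q'[who's]' as content from dual union all
  select q'[a'b'c]' from dual
);
-- who's  -> whos
-- a'b'c  -> ab'c   (the b was consumed by the first match, so 'c is left alone)
```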
If you just need a single quote in a string, you can use:
where content like '%''%'
If they specifically need to be letters, then you need a regular expression:
regexp_like(content, '[a-zA-Z][''][a-zA-Z]')
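or, using the alternative quoting syntax so the quote does not have to be doubled (a backslash does not escape a quote in an Oracle string literal):
regexp_like(content, q'{[a-zA-Z]'[a-zA-Z]}')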
If I understand well, you may need something like
regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2.
For example, this
with myTable(content) as
(
select q'[what's]' from dual union all
select q'[who's, what's]' from dual union all
select q'[who's, what's, I'm]' from dual
)
select *
from myTable
where regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
gives
CONTENT
------------------
who's, what's

How to remove leftmost group of numbers from string in Oracle SQL?

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost grouping on consecutive numbers from the string?
EDIT:
regexp_replace is unavailable, but I have LTRIM, REPLACE, SUBSTR, etc.
Would this fit the bill? I am assuming there are alphanumeric characters, then an underscore, and then the digits you want to remove, followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before underscore are matched and stored in first group, then underscore followed by 0-many digits (it will match as many digits as possible) followed by anything else that is stored in second group.
If we have a match, the string will be replaced by content of the first group followed by underscore and content of the second group.
If there is no match, the string will not be changed.
It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S         RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).
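
If the digits to remove are not guaranteed to follow an underscore, the same standard functions can still locate the leftmost digit. The sketch below is one possible way, not part of the answer above: TRANSLATE maps every digit to '0', so INSTR on '0' gives the position of the first digit, and LTRIM then strips the run of digits starting there. The third test row is an invented string with no digits.

```sql
-- Sketch: remove the leftmost run of digits using only standard functions.
select s,
       case
         when p = 0 then s   -- no digit anywhere: leave the string alone
         else substr(s, 1, p - 1) || ltrim(substr(s, p), '0123456789')
       end as result
from (
  select s,
         -- map 1..9 to 0 so that a single INSTR finds the first digit
         instr(translate(s, '123456789', '000000000'), '0') as p
  from (
    select 'T_44B56T4' s from dual union all
    select 'TE_2BMT' from dual union all
    select 'NODIGITS' from dual
  )
);
-- T_44B56T4 -> T_B56T4
-- TE_2BMT   -> TE_BMT
-- NODIGITS  -> NODIGITS
```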
If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_') from dual
Note: Your question says the leftmost grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t

get everything before a string including itself oracle

I need to get everything before a string including itself and replace it with something else after that. For example, if I have a value in column as 28/29/81/732536/1496071 then I want to select everything before 81 including itself, i.e I want 28/29/81 from it and replace it with some other string. I have tried the below, but I am getting only 28/29.
SELECT SUBSTR(eda.ATTRIBUTE_VALUE, 0, INSTR(eda.ATTRIBUTE_VALUE, '81')-2) AS output, ATTRIBUTE_VALUE
FROM EVENT_DYNAMIC_ATTRIBUTE eda
The solution will have to work when the "token" (the '81' in your example) appears between two slashes, or right at the beginning of the string and before a slash, or right after the last slash at the end of the string. It should not match if '81' appears as part of a longer "token" (between slashes or before the first or after the last slash), such as 530813. Also, if the "token" appears more than once, it should be replaced (with everything before it) only once, and if it doesn't appear at all, then the original string should be unchanged.
If these are the rules, then you can do something like I show below. If any of the rules are different, the solution can be modified to accommodate.
I created a few input strings to test all these cases in a WITH clause. I also created the "search token" and the "replacement text" in a second subquery in the WITH clause. The entire WITH clause should be replaced: it is not part of the solution, it is only for my own testing. In the main query you should use your actual table and column names (and/or hardcoded text).
I use REGEXP_REPLACE to find the token and replace it and everything that comes before it (but not the slash after it, if there is one) with the replacement text. I must be careful with that slash after the search token; I use a backreference in the replacement string in REGEXP_REPLACE for that purpose.
with
event_dynamic_attribute ( attribute_value ) as (
select '28/29/81/732536/1496071' from dual union all
select '29/33/530813/340042/88' from dual union all
select '81/6883/3902/81/993' from dual union all
select '123/45/6789/81' from dual
),
substitution ( token, replacement ) as (
select '81', 'mathguy is great' from dual
)
select attribute_value,
regexp_replace (attribute_value, '(^|.*?/)' || token || '(/|$)',
replacement || '\2', 1, 1) new_attrib_value
from event_dynamic_attribute cross join substitution
;
ATTRIBUTE_VALUE         NEW_ATTRIB_VALUE
----------------------- ----------------------------------------
28/29/81/732536/1496071 mathguy is great/732536/1496071
29/33/530813/340042/88  29/33/530813/340042/88
81/6883/3902/81/993     mathguy is great/6883/3902/81/993
123/45/6789/81          mathguy is great
You can use something like this:
SELECT 'STRING_TO_REPLACE_WITH' || SUBSTR(eda.ATTRIBUTE_VALUE, INSTR(eda.ATTRIBUTE_VALUE, '81') + 2) AS output
FROM EVENT_DYNAMIC_ATTRIBUTE eda;
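
A plain INSTR on '81' also finds the 81 inside a longer token such as 530813. If regular expressions are to be avoided, one possible workaround is to pad the value with slashes and search for '/81/' instead. This is only a sketch: the token '81' is hard-coded, 'NEW_PREFIX' is a placeholder replacement, and the WITH clause stands in for the real table.

```sql
-- Sketch: token-aware version using INSTR only.
with event_dynamic_attribute (attribute_value) as (
  select '28/29/81/732536/1496071' from dual union all
  select '29/33/530813/340042/88' from dual union all
  select '123/45/6789/81' from dual
)
select attribute_value,
       case
         when p = 0 then attribute_value   -- token not present: unchanged
         else 'NEW_PREFIX' || substr(attribute_value, p + length('81'))
       end as output
from (
  select attribute_value,
         -- in the padded string '/81/' starts one character before the token,
         -- so p is the position of the token in the original value
         instr('/' || attribute_value || '/', '/81/') as p
  from event_dynamic_attribute
);
-- 28/29/81/732536/1496071 -> NEW_PREFIX/732536/1496071
-- 29/33/530813/340042/88  -> 29/33/530813/340042/88
-- 123/45/6789/81          -> NEW_PREFIX
```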

Extract substring that has special character - Oracle

In my column, there is a string that contains a word which starts with character '=' How can I extract those words? I found REGEXP_SUBSTR but I couldn't find out particular regular expression to do this? I appreciate any help. Thanks.
EDIT :
I have such a string :
"What a =lovely day!"
I want to get "=lovely"
You can use regexp_substr for this:
select regexp_substr(col, '=\S+')
from your_table;
=\S+:
= - match literal =
\S+ - match one or more non space characters
Try
REGEXP_SUBSTR('What a =lovely day!', '=\w+')
or
REGEXP_SUBSTR('What a =lovely day!', '=\S+')
depending on your needs.
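
The difference between the two only shows up when punctuation follows the word directly. A small sketch with a slightly altered sample string (the exclamation mark is moved next to the word for illustration):

```sql
-- \w+ stops at the first non-word character; \S+ runs up to the next whitespace.
select regexp_substr('What a =lovely! day', '=\w+') as word_chars,  -- =lovely
       regexp_substr('What a =lovely! day', '=\S+') as non_space    -- =lovely!
from dual;
```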
If you want to match on a list of special characters, use something like the query below.
This example matches words starting with = or #; add more special characters to the bracket list as needed.
Use \S* if a lone = (with nothing after it) should still be returned; otherwise use \S+.
Strings that contain no such word return null.
select regexp_substr(col1,'[=#]\S*') from
(select 'what a =lovely day' as col1 from dual union all
select 'some other #word as' from dual union all
select 'a normal string' from dual)

RegEx: Repeated identical vowels in a string - Oracle SQL

I need to only display those strings (name of manufacturers) that contain 2 or more identical vowels in Oracle11g. I am using a RegEx to find this.
SELECT manuf_name "Manufacturer", REGEXP_LIKE(manuf_name,'([aeiou])\2') Counter FROM manufacturer;
For example:
The RegEx accepts
OtterBox
Abca
abcA
The RegEx rejects
Samsung
Apple
I am not sure how to proceed ahead.
I think you want something like this:
WITH mydata AS (
SELECT 'OtterBox' AS manuf_name FROM dual
UNION ALL
SELECT 'Apple' FROM dual
UNION ALL
SELECT 'Samsung' FROM dual
)
SELECT * FROM mydata
WHERE REGEXP_LIKE(manuf_name, '([aeiou]).*\1', 'i');
I am not sure why you used \2 as a backreference instead of \1 -- \2 doesn't refer to anything in this regex. Also, note the wildcard and quantifier .* to indicate that there can be any number of any character between the first occurrence of the vowel and the second. Third, note the 'i' parameter to indicate a case-insensitive search (which I think is what you want since you say that the regex should match "OtterBox").
David, yours wasn't quite working for me. What about this?
\w*([aeiou])\w*\1+\w*
https://regex101.com/r/eE3iC2/3
EDIT: updated one per suggestions:
.*([aeiou]).*\1.*
https://regex101.com/r/eE3iC2/5
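
In Oracle itself, REGEXP_LIKE already matches anywhere in the string, so the leading and trailing .* are not needed there. The sketch below lowercases the name first, so the result does not depend on how the backreference interacts with the 'i' flag; the sample rows are taken from the question.

```sql
-- Sketch: repeated identical vowel, compared case-insensitively via LOWER.
with mydata as (
  select 'OtterBox' as manuf_name from dual union all
  select 'abcA' from dual union all
  select 'Apple' from dual union all
  select 'Samsung' from dual
)
select manuf_name
from mydata
where regexp_like(lower(manuf_name), '([aeiou]).*\1');
-- returns OtterBox and abcA; Apple and Samsung have no repeated vowel
```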