Get everything before a string, including the string itself, in Oracle SQL

I need to get everything before a string, including the string itself, and then replace it with something else. For example, if a column contains 28/29/81/732536/1496071, I want to select everything up to and including 81, i.e. 28/29/81, and replace it with some other string. I have tried the query below, but it returns only 28/29.
SELECT SUBSTR(eda.ATTRIBUTE_VALUE, 0, INSTR(eda.ATTRIBUTE_VALUE, '81')-2) AS output, ATTRIBUTE_VALUE
FROM EVENT_DYNAMIC_ATTRIBUTE eda

The solution will have to work when the "token" (the '81' in your example) appears between two slashes, or right at the beginning of the string and before a slash, or right after the last slash at the end of the string. It should not match if '81' appears only as part of a longer token, such as 530813. Also, if the token appears more than once, it should be replaced (with everything before it) only once, and if it doesn't appear at all, then the original string should be unchanged.
If these are the rules, then you can do something like I show below. If any of the rules are different, the solution can be modified to accommodate.
I created a few input strings to test all these cases in a WITH clause. I also created the "search token" and the "replacement text" in a second subquery in the WITH clause. The entire WITH clause should be replaced - it is not part of the solution, it is only for my own testing. In the main query you should use your actual table and column names (and/or hardcoded text).
I use REGEXP_REPLACE to find the token and replace it and everything that comes before it (but not the slash after it, if there is one) with the replacement text. I must be careful with that slash after the search token; I use a backreference in the replacement string in REGEXP_REPLACE for that purpose.
with
  event_dynamic_attribute ( attribute_value ) as (
    select '28/29/81/732536/1496071' from dual union all
    select '29/33/530813/340042/88' from dual union all
    select '81/6883/3902/81/993' from dual union all
    select '123/45/6789/81' from dual
  ),
  substitution ( token, replacement ) as (
    select '81', 'mathguy is great' from dual
  )
select attribute_value,
       regexp_replace (attribute_value, '(^|.*?/)' || token || '(/|$)',
                       replacement || '\2', 1, 1) new_attrib_value
from   event_dynamic_attribute cross join substitution
;
ATTRIBUTE_VALUE         NEW_ATTRIB_VALUE
----------------------- ----------------------------------------
28/29/81/732536/1496071 mathguy is great/732536/1496071
29/33/530813/340042/88  29/33/530813/340042/88
81/6883/3902/81/993     mathguy is great/6883/3902/81/993
123/45/6789/81          mathguy is great

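You can use something like this. Note that INSTR finds the first occurrence of '81' anywhere in the string, even inside a longer token, and that the expression assumes '81' is present (if it is not, INSTR returns 0 and the result is the replacement text followed by the original value minus its first character):
SELECT 'STRING_TO_REPLACE_WITH' || SUBSTR(eda.ATTRIBUTE_VALUE, INSTR(eda.ATTRIBUTE_VALUE, '81') + 2) AS output
FROM EVENT_DYNAMIC_ATTRIBUTE eda;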
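A sketch of how the SUBSTR/INSTR approach could be made to respect token boundaries, assuming '/' is always the delimiter. Wrapping the value in slashes and searching for '/81/' means INSTR can only hit a complete token. The inline test rows and the replacement text 'NEW' are placeholders for illustration, not part of the original answers.

```sql
-- Assumption: tokens are separated by '/'; the sample rows are for testing only.
with event_dynamic_attribute ( attribute_value ) as (
  select '28/29/81/732536/1496071' from dual union all
  select '29/33/530813/340042/88'  from dual union all
  select '81/6883/3902/81/993'     from dual union all
  select '123/45/6789/81'          from dual
)
select attribute_value,
       case
         -- pad with slashes so that '/81/' matches only a complete token
         when instr('/' || attribute_value || '/', '/81/') > 0
         then 'NEW' ||
              -- position in the padded string + token length (2)
              -- = first character after the token in the original string
              substr(attribute_value,
                     instr('/' || attribute_value || '/', '/81/') + 2)
         else attribute_value   -- token absent: leave the value unchanged
       end as new_attrib_value
from   event_dynamic_attribute;
```

With these rows the query should return NEW/732536/1496071, the unchanged 29/33/530813/340042/88, NEW/6883/3902/81/993 and NEW. Only the first occurrence is replaced, because INSTR returns the first position by default.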

Related

ORACLE: How to use regexp_like to find a string with single quotes between two characters?

I need to query the DB for all records that have two single quotes between characters. Example: We've, who's.
I have the regex https://regex101.com/r/6MtB9j/1 but it doesn't work with REGEXP_LIKE.
Tried this
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '(?<=[a-zA-Z])''(?=[a-zA-Z])')
Appreciate the help!
Oracle regex does not support lookarounds.
You do not actually need lookaround in this case, you can use
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '[a-zA-Z]''[a-zA-Z]')
This will work since REGEXP_LIKE only attempts one match, and if there is a match, it returns true, otherwise, false (eventually, fetching a record or not).
Lookarounds are useful in case you need to replace or extract values, when matches may overlap.
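Since lookarounds are unavailable, one common workaround when replacing is to capture the neighbouring characters and put them back with backreferences. The following is a minimal sketch with made-up sample rows. The last row is there to illustrate the overlap caveat: a letter consumed by one match cannot serve as the left-hand neighbour of the next match, so the second apostrophe in that row should survive.

```sql
-- Hypothetical test data; only the REGEXP_REPLACE call illustrates the technique.
with myTable ( content ) as (
  select q'[We've]' from dual union all
  select q'[who's]' from dual union all
  select q'[a'b'c]' from dual
)
select content,
       -- \1 and \2 restore the letters that a lookbehind/lookahead
       -- would have left unconsumed
       regexp_replace(content, '([a-zA-Z])''([a-zA-Z])', '\1\2') as no_apostrophe
from   myTable;
```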
If you just need a single quote in a string, you can use:
where content like '%''%'
If they specifically need to be letters, then you need a regular expression:
regexp_like(content, '[a-zA-Z][''][a-zA-Z]')
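or, using Oracle's q-quote syntax so that the quote needs no doubling (a backslash does not escape a quote inside an Oracle string literal):
regexp_like(content, q'{[a-zA-Z]'[a-zA-Z]}')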
If I understand well, you may need something like
regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2.
For example, this
with myTable(content) as
(
select q'[what's]' from dual union all
select q'[who's, what's]' from dual union all
select q'[who's, what's, I'm]' from dual
)
select *
from myTable
where regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
gives
CONTENT
------------------
who's, what's

How to remove leftmost group of numbers from string in Oracle SQL?

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost grouping on consecutive numbers from the string?
EDIT:
regexp_replace is unavailable, but I have LTRIM, REPLACE, SUBSTR, etc.
Would this fit the bill? I am assuming there are alphanumeric characters, then an underscore, and then the numbers you want to remove, followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before underscore are matched and stored in first group, then underscore followed by 0-many digits (it will match as many digits as possible) followed by anything else that is stored in second group.
If we have a match, the string will be replaced by content of the first group followed by underscore and content of the second group.
If there is no match, the string will not be changed.
It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S         RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).
If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_') from dual
Here is a db<>fiddle with your two examples.
Note: Your question says the leftmost grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t
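Because the question's EDIT rules out regular expressions, here is a sketch of the same "first run of digits, wherever it is" logic with standard string functions only. It assumes that only the characters 0-9 count as numbers. The CTE names and the third test row are made up for illustration.

```sql
-- Assumption: only the digits 0-9 count as "numbers"; sample rows are for testing.
with t ( s ) as (
  select 'T_44B56T4' from dual union all
  select 'TE_2BMT'   from dual union all
  select 'NODIGITS'  from dual
),
p ( s, first_digit ) as (
  -- map every digit to '0', then locate the first '0'
  select s, instr(translate(s, '0123456789', '0000000000'), '0')
  from   t
)
select s,
       case
         when first_digit = 0 then s            -- no digits at all
         else substr(s, 1, first_digit - 1) ||  -- text before the first digit
              ltrim(substr(s, first_digit), '0123456789')  -- drop the leading run of digits
       end as result
from   p;
```

For the three rows this should give T_B56T4, TE_BMT and NODIGITS. TRANSLATE is used only to find the position of the first digit; the actual removal is the same LTRIM trick as in the underscore-based answer.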

REGEXP to insert special characters, not remove

How would I put double quotes around the two fields that are missing them? Would I be able to use something like INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.
This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.
This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string, and remember that too. The answer is then a " followed by one of the fields followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is any character that is not a quote and * means repeated zero or more times. This is complicated by the fact that the not-a-quote character could also be a pair of quotes, so we need an OR construct (|), i.e. "([^"]|"")*". However, we must remember just the field inside the quotes, so add brackets so we can later back-reference just that, i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted, then an extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However, regexp processing is probably done by a standard library routine, so it may be my interpretation of the regexp rules. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to @Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).
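To check the end cases listed above, the two REGEXP_REPLACE calls can be run against a handful of short rows. This is only a test harness sketch: the sample strings are made up, and the nested call is the same pair of expressions as in the answer, just written in one SELECT. Every row ends with a quoted field, the situation the workaround is described for.

```sql
-- Hypothetical edge-case rows; the two REGEXP_REPLACE calls are the ones from the answer.
WITH orig(str) AS (
  SELECT '"a",1,"b"'             FROM dual UNION ALL  -- unquoted field in the middle
  SELECT '1,"a","b"'             FROM dual UNION ALL  -- unquoted first field
  SELECT '"a",,"b"'              FROM dual UNION ALL  -- empty unquoted field
  SELECT '"say ""hi""",2,"x, y"' FROM dual            -- doubled quotes and an embedded comma
)
SELECT str,
       REGEXP_REPLACE(
         REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)', '"\2\4"\5'),
         '"""$', '"') AS fixed_string
FROM   orig;
```

Comparing str and fixed_string side by side makes it easy to see whether a given Oracle version needs the second replace at all.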
This method makes two passes over the string. First, look for a double-quote followed by a comma, followed by a character that is not a double-quote. Replace the match with the first group ('\1'), the missing double-quote, and the second group ('\2'). Then do it again, but the other way around. Sure, you could nest the regexp_replace calls and end up with one big ugly statement, but keeping two statements makes maintenance easier. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
from dual
),
rpl_first(str) as (
select regexp_replace(str, '(",)([^"])', '\1"\2')
from orig
)
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"
SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
from dual union
select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
select '"ES26653","ABCBEVERAGES",861526999728' from dual union
select '1S26653,"ABCBEVERAGES",861526999728' from dual union
select '"ES26653",,861526999728' from dual
),
rpl_empty(str) as (
select regexp_replace(str, ',,', ',"",')
from orig
),
rpl_first(str) as (
select regexp_replace(str, '(",|^)([^"])', '\1"\2')
from rpl_empty
)
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"
SQL>

Regular Expression in Oracle with REGEXP_SUBSTR

I want to get the email address part of a string.
For example, if the string is
"your name(your@name.com)" aaa@bbb.com
then I want to get only
aaa@bbb.com
Basically, if I can remove the string within
""
then it does the trick. I am using the regular expression below with REGEXP_SUBSTR:
REGEXP_SUBSTR('"your name(abc@dd.com)" aaa@bbb.com',
'([a-zA-Z0-9_.\-])+@(([a-zA-Z0-9-])+[.])+([a-zA-Z0-9]{2,4})+')
Kindly help.
You can simply indicate that the match must occur at the end of the string, using the $ anchor.
with t1(col) as(
select '"your name(your@name.com)" aaa@bbb.com' from dual
)
select regexp_substr(col, '[[:alnum:]._%-]+@[[:alnum:]._%-]+\.com$') as res
from t1
Result:
RES
-----------
aaa@bbb.com
You probably need something more along the lines of the following, where the 'i' match parameter lets the uppercase-only classes match lowercase input and the occurrence argument 2 selects the second address:
REGEXP_SUBSTR('"your name(abc@dd.com)" aaa@bbb.com','[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 1, 2, 'i')
A bracket expression such as [.] does match a literal dot, because inside square brackets the dot loses its special meaning. Outside brackets a dot matches any character, so there you must escape it as \. to match a literal dot. Oracle string literals do not treat the backslash specially, so no double escaping is needed.
SELECT REGEXP_SUBSTR(email, '[A-Za-z0-9_.-]+@\w+\.\w+', 1, 2) AS cleaned_email
FROM
(
SELECT '"your name(your@name.com)" aaa@bbb.com' AS email FROM DUAL UNION ALL
SELECT '"your name(your.name@name.com)" aaa@bbb.com' AS email FROM DUAL
)
;
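The answers above either anchor the match at the end of the string or hardcode occurrence 2. A hedged alternative, assuming the wanted address is always the last one in the string, is to let REGEXP_COUNT supply the occurrence number. The pattern below is a deliberately loose approximation of an e-mail address, and the second test row is made up to show a string with a single address.

```sql
-- Assumptions: addresses use '@', and the pattern is a loose approximation
-- of an e-mail address, not a full validator.
with t1 ( col ) as (
  select '"your name(your@name.com)" aaa@bbb.com' from dual union all
  select 'only.one@example.org'                   from dual
)
select col,
       regexp_substr(
         col,
         '[[:alnum:]._%-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}',
         1,
         -- occurrence = number of matches, i.e. the last one;
         -- GREATEST avoids passing 0 when nothing matches
         greatest(regexp_count(col, '[[:alnum:]._%-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}'), 1)
       ) as last_email
from   t1;
```

For these two rows the query should return aaa@bbb.com and only.one@example.org. REGEXP_COUNT requires Oracle 11g or later.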

Oracle: How to use regexp_substr in this case

I have a table in Oracle where one of the columns contains user IDs made of a prefix and a user name separated by a backslash, e.g. "fin\george", "sales\andy", etc. How can I use the REGEXP_SUBSTR function to get only the user name from the user IDs, i.e. I want to fetch only "george", "andy", etc.? I have achieved the desired result using the SUBSTR function, but I want to use REGEXP_SUBSTR in this case.
I tried doing this:
SELECT REGEXP_SUBSTR('fin\george','\[^\]+,') "UserName" FROM DUAL;
but it didn't help. Can anyone please point out my mistake?
I believe you want to use regexp_replace with a backreference. I'm assuming that all the characters before and after the \ are alphabetic. If you allow numbers, you'd want to use [[:alnum:]] rather than [[:alpha:]].
SQL> SELECT REGEXP_REPLACE('fin\george',
'([[:alpha:]]+\\)([[:alpha:]]+)$',
'\2') "UserName"
FROM DUAL;
UserNa
------
george
SQL> SELECT REGEXP_SUBSTR('fin\george', '[^\]+', 1, 2) AS userId from dual;
USERID
------
george
See this Oracle Base article
select regexp_replace( 'fin\george', '.*\\', null ) from dual;
returns george.
The pattern matches any sequence of characters followed by a \ (which is escaped). Because .* is greedy, it matches everything up to the final \.
The matched text is then replaced with null.
A null replacement is the default, so
select regexp_replace('fin\george', '.*\\' ) from dual;
does the same thing.
The same expression can extract the file name from the end of a path name, e.g.
select regexp_replace ('fin\fin2\fin3\fin4\george', '.*\\' ) from dual;
will also return george.
You have to use escaping: \\ instead of \
The easiest way (IMHO) to do this is the following:
SELECT REGEXP_SUBSTR('fin\george', '[^\\]+$') AS "UserName" FROM DUAL;
The issues with your original query were (a) that the \ character was not escaped and (b) there was an extraneous comma in the regular expression. I've used the end-of-string anchor $ here, assuming that there are not more than two elements delimited by \. If there are more than two, and you need only the second one, you can use the following:
SELECT REGEXP_SUBSTR('fin\george\ringo', '[^\\]+', 1, 2) AS "UserName"
FROM DUAL;
This tells Oracle to start looking at the first character of the string and return the second match.
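To see how the variants differ when there are more than two elements, the following sketch runs them side by side, together with a plain SUBSTR/INSTR version for comparison. The sample values and column aliases are made up for illustration.

```sql
-- Hypothetical sample values; '\' needs no escaping in an Oracle string literal.
with users ( user_id ) as (
  select 'fin\george'       from dual union all
  select 'sales\andy'       from dual union all
  select 'fin\george\ringo' from dual
)
select user_id,
       regexp_substr(user_id, '[^\\]+$')       as last_part,    -- text after the final \
       regexp_substr(user_id, '[^\\]+', 1, 2)  as second_part,  -- second \-delimited element
       -- non-regex equivalent of last_part: search for \ from the end
       substr(user_id, instr(user_id, '\', -1) + 1) as substr_last
from   users;
```

For the first two rows all three columns should agree (george, andy). For 'fin\george\ringo' the anchored pattern and the SUBSTR version should return ringo, while the occurrence-based call returns george.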