How to remove leftmost group of numbers from string in Oracle SQL? - sql

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost grouping on consecutive numbers from the string?
EDIT:
regex_replace is unavailable but I have LTRIM,REPLACE,SUBSTR, etc.

would this fit the bill? I am assuming there are alphanumeric characters, then underscore, and then the numbers you want to remove followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before underscore are matched and stored in first group, then underscore followed by 0-many digits (it will match as many digits as possible) followed by anything else that is stored in second group.
If we have a match, the string will be replaced by content of the first group followed by underscore and content of the second group.
if there is no match, the string will not be changed.

It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).

If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_')
Here is a db<>fiddle with your two examples.
Note: Your questions says the left most grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t

Related

How do I extract consonants from a string field?

How do I extract only the consonants from a field in records that contain names?
For example, if I had the following record in the People table:
Field
Value
Name
Richard
How could I extract only the consonants in "Richard" to get "R,c,r,d"?
If you mean "how can I remove all vowels from the input" so that 'Richard' becomes 'Rchrd', then you can use the translate function as Boneist has shown, but with a couple more subtle additions.
First, you can completely remove a character with translate, if it appears in the second argument and it doesn't have a corresponding "translate to" character in the third argument.
Second, alas, if the third (and last) argument to translate is null the function returns null (and the same if the last argument is the empty string; there is a very small number of instances where Oracle does not treat the empty string as null, but this is not one of them). So, to make the whole thing work, you need to add an extra character to both the second and the third argument - a character you do NOT want to remove. It may be anything (it doesn't even need to appear in the input string), just not one of the characters to remove. In the illustration below I use the period character (.) but you can use any other character - just not a vowel.
Pay attention too to upper vs lower case letters. Ending up with:
with
sample_inputs (name) as (
select 'Richard' from dual union all
select 'Aliosha' from dual union all
select 'Ai' from dual union all
select 'Ng' from dual
)
select name, translate(name, '.aeiouAEIOU', '.') as consonants
from sample_inputs
;
NAME CONSONANTS
------- ----------
Richard Rchrd
Aliosha lsh
Ai
Ng Ng
Should be able to string a couple replace functions together
Select replace(replace(Value, 'A', ''), 'E', '')),...etc
You can easily do this with the translate() function, e.g.:
WITH people AS (SELECT 'Name' field, 'Richard' val FROM dual UNION ALL
SELECT 'Name' field, 'Siobhan' val FROM dual)
SELECT field, val, TRANSLATE(val, 'aeiou', ',,,,,') updated_val
FROM people;
FIELD VAL UPDATED_VAL
----- ------- -----------
Name Richard R,ch,rd
Name Siobhan S,,bh,n
The translate function simply takes a list of characters and - based on the second list of characters, which defines the translation - translates the input string.
So in the above example, the a (first character in the first list) becomes a , (first character in the second list), the e (second character in the first list) becomes a , (second character in the second list), etc.
N.B. I really, really hope your key-value table is just a made-up example for the situation you're trying to solve, and not an actual production table; in general, key-value tables are a terrible idea in a relational database!

REGEXP to insert special characters, not remove

How would i put double quotes around the two fields that are missing it? Would i be able to use like a INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","**606.32**","2017-01-26","2017-01-27","","","**77910467**","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.
This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.
This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string and remember that. The answers is then a " followed by one of the fields followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is a any character that is not a quote and * is repeated zero or more times. This is complicated by the fact the not-a-quote character could also be a pair of quotes so we need an OR construct (|) i.e. "([^"]|"")*". However we must remember just the field inside the quotes so add brackets so we can later back reference just that i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted then a extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However regexp processing is probably done by a standard library routine so it may be my interpretation of the regexp rules. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to #Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).
This method makes 2 passes on the string. First look for a grouping of a double-quote followed by a comma, followed by a character that is not a double-quote. Replace them by referring to them with the shorthand of their group, the first group, '\1', the missing double-quote, the second group '\2'. Then do it again, but the other way around. Sure you could nest the regex_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017
-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA
","NE","68144"'
from dual
),
rpl_first(str) as (
select regexp_replace(str, '(",)([^"])', '\1"\2')
from orig
)
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2
017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OM
AHA","NE","68144"'
from dual union
select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
select '"ES26653","ABCBEVERAGES",861526999728' from dual union
select '1S26653,"ABCBEVERAGES",861526999728' from dual union
select '"ES26653",,861526999728' from dual
),
rpl_empty(str) as (
select regexp_replace(str, ',,', ',"",')
from orig
),
rpl_first(str) as (
select regexp_replace(str, '(",|^)([^"])', '\1"\2')
from rpl_empty
)
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"
SQL>

Extracting specific part of column values in Oracle SQL

I want to extract a specific part of column values.
The target column and its values look like
TEMP_COL
---------------
DESCOL 10MG
TEGRAL 200MG 50S
COLOSPAS 135MG 30S
The resultant column should look like
RESULT_COL
---------------
10MG
200MG
135MG
This can be done using a regular expression:
SELECT regexp_substr(TEMP_COL, '[0-9]+MG')
FROM the_table;
Note that this is case sensitive and it always returns the first match.
I would probably approach this using REGEXP_SUBSTR() rather than base functions, because the structure of the prescription text varies from record to record.
SELECT TRIM(REGEXP_SUBSTR(TEMP_COL, '(\s)(\S*)', 1, 1))
FROM yourTable
The pattern (\s)(\S*) will match a single space followed by any number of non-space characters. This should match the second term in all cases. We use TRIM() to remove a leading space which is matched and returned.
how do you know what is the part you want to extract? how do you know where it begins and where it ends? using the white-spaces?
if so, you can use substr for cutting the data and instr for finding the white-spaces.
example:
select substr(tempcol, -- string
instr(tempcol, ' ', 1), -- location of first white-space
instr(tempcol, ' ', 1, 2) - instr(tempcol, ' ', 1)) -- length until next space
from dual
another solution is using regexp_substr (but it might be harder on performance if you have a lot of rows):
SELECT REGEXP_SUBSTR (tempcol, '(\S*)(\s*)', 1, 2)
FROM dual;
edit: fixed the regular expression to include expressions that don't have space after the parsed text. sorry about that.. ;)

How to match and replace sections of a string in SQL

I'm pulling a list of popular sites from my database, but I want to combine results that are from the same domain. I've been able to do this partially by using :
REGEXP_REPLACE(site, '%|^www([123])?\.|^m\.|^mobile\.|^desktop\.')) as site
so that "www.facebook.com" and "facebook.com" or "m.facebook.com"
- all of which appear in the database - are treated as the same when I do a select distinct.
However, I want to take this a step further by writing an expression that looks at each string between periods. If a match is found consecutively in three or more strings between periods, then I want to treat those as the same. I simply can't predict every possible string that could come before "facebook.com", or any other site.
So for example:
"my.careerone.com.au" and
"careerone.com.au" match in three places.
Or "yahoo.realestate.com.au" and "rs.realestate.com.au" match in three places.
Any ideas on how to achieve this?
#David code will work in Vertica as well but not so well performance wise maybe.
You can use Vertica's own internal functions such as TRIM & REGEXP_REPLACE.
After borrowing #David Faber reg exp i endend-up with this.
select TRIM(LEADING '.' from REGEXP_REPLACE(col_name,'^.*((\.[^.]+){3})$', '\1')) AS fixed_dn from table_name;
I don't have Vertica available so I tested this in Oracle SQL (which does have REGEXP_REPLACE() that is similar to Vertica's). Not sure what the CTE syntax would be in Vertica but you'll be querying against a table anyway:
WITH d1 AS (
SELECT 'my.careerone.com.au' AS domain_nm FROM dual
UNION ALL
SELECT 'careerone.com.au' FROM dual
UNION ALL
SELECT 'yahoo.realestate.com.au' FROM dual
UNION ALL
SELECT 'rs.realestate.com.au' FROM dual
)
SELECT domain_nm, TRIM('.' FROM REGEXP_REPLACE(domain_nm, '^.*((\.[^.]+){3})$', '\1')) AS domain_nm_fix
FROM d1;
What REGEXP_REPLACE() does here is trim the highest level subdomains from the domain name, if it exists and if there are more than 3 levels. If there are only three levels then nothing will be replaced as the regex won't match -- that is why the leading . character then has to be trimmed. So, for example, careerone.com.au will be unaltered, while my.careerone.com.au will be changed to .careerone.com.au by the REGEXP_REPLACE(), from which the leading . then has to be trimmed.

RegEx: Repeated identical vowels in a string - Oracle SQL

I need to only display those strings (name of manufacturers) that contain 2 or more identical vowels in Oracle11g. I am using a RegEx to find this.
SELECT manuf_name "Manufacturer", REGEXP_LIKE(manuf_name,'([aeiou])\2') Counter FROM manufacturer;
For example:
The RegEx accepts
OtterBox
Abca
abcA
The RegEx rejects
Samsung
Apple
I am not sure how to proceed ahead.
I think you want something like this:
WITH mydata AS (
SELECT 'OtterBox' AS manuf_name FROM dual
UNION ALL
SELECT 'Apple' FROM dual
UNION ALL
SELECT 'Samsung' FROM dual
)
SELECT * FROM mydata
WHERE REGEXP_LIKE(manuf_name, '([aeiou]).*\1', 'i');
I am not sure why you used \2 as a backreference instead of \1 -- \2 doesn't refer to anything in this regex. Also, note the wildcard and quantifier .* to indicate that there can be any number of any character between the first occurrence of the vowel and the second. Third, note the 'i' parameter to indicate a case-insensitive search (which I think is what you want since you say that the regex should match "OtterBox").
SQL Fiddle here.
David yours wasn't quite working for me. What about this?
\w*([aeiou])\w*\1+\w*
https://regex101.com/r/eE3iC2/3
EDIT: updated one per suggestions:
.*([aeiou]).*\1.*
https://regex101.com/r/eE3iC2/5