I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'. I am trying to remove the _1_2_3_4_5 & the 1_ portion from the string. Which is essentially the numeric portion in the string. any special characters after the last alphabet must also be removed. In this example the _ after the character E must also not be present.
and the Query I am trying is as below
SELECT
REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+',1,1)
from dual
The Data I get from the above query is as below: -
_A_B_C_D_E_
I am trying to figure a way to remove the underscore towards the end. Any other way to approach this?
Assuming the "letters" come first and then the "digits", you could do something like this:
select regexp_substr('A_B_C_D_E_1_2_3_4_5','.*[A-Z]') from dual;
This will pull all the characters from the beginning of the string, up to the last upper-case letter in the string (.* is greedy, it will extend as far as possible while still allowing for one more upper-case letter to complete the match).
I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'
Use REGEXP_REPLACE:
SQL> SELECT trim(BOTH '_' FROM
2 (REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[0-9]+', ''))) str
3 FROM dual;
STR
---------
A_B_C_D_E
How it works:
REGEXP_REPLACE will replace all numeric occurrences '[0-9]+' from the string. Alternatively, you could also use POSIX character class '[^[:digit:]]+'
TRIM BOTH '_' will remove any leading and lagging _ from the string.
Also using REGEXP_SUBSTR:
SELECT trim(BOTH '_' FROM
(REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+'))) str
FROM dual;
STR
---------
A_B_C_D_E
How would i put double quotes around the two fields that are missing it? Would i be able to use like a INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","**606.32**","2017-01-26","2017-01-27","","","**77910467**","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.
This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.
This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string and remember that. The answers is then a " followed by one of the fields followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is a any character that is not a quote and * is repeated zero or more times. This is complicated by the fact the not-a-quote character could also be a pair of quotes so we need an OR construct (|) i.e. "([^"]|"")*". However we must remember just the field inside the quotes so add brackets so we can later back reference just that i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted then a extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However regexp processing is probably done by a standard library routine so it may be my interpretation of the regexp rules. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to #Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).
This method makes 2 passes on the string. First look for a grouping of a double-quote followed by a comma, followed by a character that is not a double-quote. Replace them by referring to them with the shorthand of their group, the first group, '\1', the missing double-quote, the second group '\2'. Then do it again, but the other way around. Sure you could nest the regex_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017
-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA
","NE","68144"'
from dual
),
rpl_first(str) as (
select regexp_replace(str, '(",)([^"])', '\1"\2')
from orig
)
select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2
017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OM
AHA","NE","68144"'
from dual union
select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
select '"ES26653","ABCBEVERAGES",861526999728' from dual union
select '1S26653,"ABCBEVERAGES",861526999728' from dual union
select '"ES26653",,861526999728' from dual
),
rpl_empty(str) as (
select regexp_replace(str, ',,', ',"",')
from orig
),
rpl_first(str) as (
select regexp_replace(str, '(",|^)([^"])', '\1"\2')
from rpl_empty
)
select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
from rpl_first;
FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","",""
,"77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","681
44"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"
SQL>
How can I select abcdef.txt from the following string?
abcdef.123.txt
I only know how to select abcdef by doing select substr('abcdef.123.txt',1,6) from dual;
You can using || for concat and substr -3 for right part
select substr('abcdef.123.txt',1,6) || '.' ||substr('abcdef.123.txt',-3) from dual;
or avoiding a concat (like suggested by Luc M)
select substr('abcdef.123.txt',1,7) || substr('abcdef.123.txt',-3) from dual;
A general solution, assuming the input string has exactly two periods . and you want to extract the first and third tokens, separated by one . The length of the "tokens" in the input string can be arbitrary (including zero!) and they can contain any characters other than .
select regexp_replace('abcde.123.xyz', '([^.]*).([^.]*).([^.]*)', '\1.\3') as result
from dual;
RESULT
---------
abcde.xyz
Explanation:
[ ] means match any of the characters between brackets.
^
means do NOT match the characters in the brackets - so...
[^.]
means match any character OTHER THAN .
* means match zero or
more occurrences, as many as possible ("greedy" match)
( ... ) is called a subexpression... see below
'\1.\3 means replace the original string
with the first subexpression, followed by ., followed by the THIRD
subexpression.
Replace the substring of anything surrounded by dots (inclusive) with a single dot. No dependence on lengths of components of the string:
SQL> select regexp_replace('abcdef.123.txt', '\..*\.', '.') fixed
from dual;
FIXED
----------
abcdef.txt
SQL>