Extract Specific Set of data from a String in Oracle - sql

I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract 'A_B_C_D_E'. That is, I want to remove the leading 1_ and the trailing _1_2_3_4_5, which are essentially the numeric portions of the string. Any special characters after the last letter must also be removed; in this example, the _ after the E must not be present.
The query I am trying is:
SELECT
REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+',1,1)
from dual
The result of the above query is:
_A_B_C_D_E_
I am trying to figure out a way to remove the underscore at the end. Is there another way to approach this?

Assuming the "letters" come first and then the "digits", you could do something like this:
select regexp_substr('A_B_C_D_E_1_2_3_4_5','.*[A-Z]') from dual;
This will pull all the characters from the beginning of the string, up to the last upper-case letter in the string (.* is greedy, it will extend as far as possible while still allowing for one more upper-case letter to complete the match).
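If the string can also begin with digits, as in the question's '1_A_B_C_D_E_1_2_3_4_5', one possible variation is to anchor both ends of the match on a letter. This is only a sketch: it assumes the wanted part starts and ends with an upper-case ASCII letter and contains at least two of them, and that the session uses the default binary sort, since the [A-Z] range can behave differently under linguistic NLS_SORT settings.

```sql
-- Greedy match from the first upper-case letter to the last one.
-- Assumption: the wanted part starts and ends with an upper-case ASCII letter.
SELECT REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5', '[A-Z].*[A-Z]') AS str
FROM dual;
```

Under those assumptions this should return A_B_C_D_E. A string with only one letter would not match this pattern and would yield NULL.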

I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'
Use REGEXP_REPLACE:
SELECT TRIM(BOTH '_' FROM
REGEXP_REPLACE('1_A_B_C_D_E_1_2_3_4_5', '[0-9]+', '')) str
FROM dual;
STR
---------
A_B_C_D_E
How it works:
REGEXP_REPLACE removes every run of digits matched by '[0-9]+' from the string. Alternatively, you could use the POSIX character class '[[:digit:]]+'.
TRIM BOTH '_' removes any leading and trailing _ from the string.
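To make the two steps visible, the following sketch shows the intermediate value next to the final one. The inline view and the column aliases are illustrative names, not part of the original answer.

```sql
-- Step 1: strip every run of digits; step 2: trim the leftover underscores.
SELECT REGEXP_REPLACE(s, '[[:digit:]]+')                     AS digits_removed,
       TRIM(BOTH '_' FROM REGEXP_REPLACE(s, '[[:digit:]]+')) AS trimmed
FROM (SELECT '1_A_B_C_D_E_1_2_3_4_5' AS s FROM dual);
```

The first column should come back as _A_B_C_D_E_____ (the separators between the removed digits remain), and the second as A_B_C_D_E.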
Also using REGEXP_SUBSTR:
SELECT trim(BOTH '_' FROM
(REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+'))) str
FROM dual;
STR
---------
A_B_C_D_E

Related

Remove template text on regexp_replace in Oracle's SQL

I am trying to remove template text like &#x; or &#xx; or &#xxx; from a long string.
Note: x / xx / xxx is a number whose length is unknown, and the column type is CLOB.
For example:
SELECT 'H&#39;ello wor&#177;ld' FROM dual
The desired result:
Hello world
I know that regexp_replace should be used, but how do I use this function to remove this text?
You can use
SELECT REGEXP_REPLACE(col,'&&#\d+;')
FROM t
where
& is put twice to provide escaping for the substitution character
\d represents digits and the following + provides the multiple occurrences of them
ending the pattern with ;
Alternatively, just use a single ampersand ('&#\d+;') for the pattern, as in the demo. Because the ampersand is the substitution-variable prefix in client tools such as SQL*Plus, using it in a literal can be a bit problematic there.
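One way to try the pattern without running into the client's ampersand handling is to build the ampersand with CHR(38). This is a sketch, not part of the original answer; it assumes an ASCII-based database character set (where CHR(38) is '&') and uses a made-up sample literal in place of the CLOB column.

```sql
-- CHR(38) is '&' on ASCII-based character sets; building it this way
-- avoids substitution-variable prompts in tools such as SQL*Plus.
SELECT REGEXP_REPLACE(
         'H' || CHR(38) || '#39;ello wor' || CHR(38) || '#177;ld',  -- sample input
         CHR(38) || '#\d+;'                                         -- pattern: &#<digits>;
       ) AS cleaned
FROM dual;
```

With that sample input the result should be Hello world.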
In case you wanted to remove the entities because you don't know how to replace them by their character values, here is a solution:
UTL_I18N.UNESCAPE_REFERENCE( xmlquery( 'the_double_quoted_original_string' RETURNING content).getStringVal() )
In other words, the original 'H&#39;ello wor&#177;ld' should be passed to XMLQUERY as '"H&#39;ello wor&#177;ld"'.
And the result will be 'H'ello wor±ld'.

How to handle string with only space in oracle sql?

I have a case where I am getting data from the DB and converting a string to a number using TO_NUMBER, but this fails when the string is empty or contains only a space or some unknown whitespace character, like
columnA
------
4444
333333
The strings '4444' and '333333' are converted to numbers, but there is an error "ORA-01722: invalid number" for the 2nd string.
Can this be handled with DECODE or CAST in some way? I need to use TO_NUMBER anyway for further processing.
I hope this gives some insight into your issue.
select
TO_NUMBER(trim(colA)),
TO_NUMBER(REGEXP_REPLACE(colA,'(^[[:space:]]*|[[:space:]]*$)')),
regexp_instr(colA, '[0-9.]')
from
(
select ' 123' colA from dual
union all
select ' ' colA from dual
union all
select '.456' colA from dual
)
This is a similar issue: Trim Whitespaces (New Line and Tab space) in a String in Oracle
If all the data in that column consists of integers, integers with leading and/or trailing spaces, NULL values, and space-only values, then the TRIM() function alone will suffice:
SELECT TRIM(columnA)
FROM t
and that would perform better than the regular expression functions.
But if the data contains decimal numbers, letters, punctuation, and special characters along with whitespace and NULL values, then use
SELECT TRIM('.' FROM REGEXP_REPLACE(columnA,'[^[:digit:].]'))
FROM t
Here it is assumed that there is at most one dot between the first and last digits. Any leading and trailing dots are trimmed at the end of the operation; all other characters have already been removed by the regular expression.
If you're sure that there are no leading or trailing dots, then
SELECT REGEXP_REPLACE(columnA,'[^[:digit:].]')
FROM t
would be enough.
You can wrap any of these expressions in TO_NUMBER() at the end, depending on your case.
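As a rough illustration of that last step, the sketch below wraps the regular-expression variant in TO_NUMBER() over a few made-up sample rows. It assumes the session's decimal separator is a dot (NLS_NUMERIC_CHARACTERS); note also that the pattern strips minus signs, so negative values would lose their sign.

```sql
-- Sample rows are hypothetical; columnA is the column name from the question.
SELECT columnA,
       TO_NUMBER(TRIM('.' FROM REGEXP_REPLACE(columnA, '[^[:digit:].]'))) AS num
FROM (
  SELECT ' 12.5 ' AS columnA FROM dual UNION ALL
  SELECT 'abc 7'             FROM dual UNION ALL
  SELECT '   '               FROM dual
);
```

Under those assumptions the rows should convert to 12.5, 7 and NULL respectively: the whitespace-only value becomes an empty string, which Oracle treats as NULL, so TO_NUMBER no longer raises ORA-01722.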

Extract character between the first two characters

I have a table in BigQuery:
ab_col_jfsfhfd_ggg_sdf
arfd_am_fdsf_fddg_fg
d_fdf_fdddg_ffddd_f
I would like to extract the characters between the first and second _ characters. I want to get the following:
col
am
fdf
I used the following regular expression to extract the characters but it does not work as intended:
^.*\_(\D+)\_.*$
regexp_replace(id,'^.*\\_(\\D+)\\_.*$' , '\\1')
Please help!
If I follow you correctly, you can use split():
(split(col, '_'))[safe_ordinal(2)]
split() turns the string column to an array of values, given a separator (here, we use _). Then we can just grab second array element.
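Put into a complete BigQuery query, the idea might look like the sketch below. The CTE name t, the alias second_token, and the last sample row are illustrative additions; the column is called id as in the question.

```sql
WITH t AS (
  SELECT 'ab_col_jfsfhfd_ggg_sdf' AS id UNION ALL
  SELECT 'arfd_am_fdsf_fddg_fg' UNION ALL
  SELECT 'nounderscore'   -- hypothetical row with no separator at all
)
SELECT id,
       -- SAFE_ORDINAL returns NULL instead of raising an error
       -- when the array has fewer than two elements.
       SPLIT(id, '_')[SAFE_ORDINAL(2)] AS second_token
FROM t;
```

The first two rows should give col and am; the row without an underscore gives NULL, which is the reason for using SAFE_ORDINAL rather than ORDINAL.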
split() is a very simple way of solving this. But regular expressions are also quite simple:
with t as (
select 'ab_col_jfsfhfd_ggg_sdf' as id union all
select 'arfd_am_fdsf_fddg_fg' union all
select 'd_fdf_fdddg_ffddd_f'
)
select id, regexp_extract(id, '[^_]+', 1, 2)
from t;
The logic for the pattern is: "Look for any string of characters that is not an underscore. Then take the second one in the string."
Use regexp_extract:
regexp_extract(id,'^[^_]+_([^_]+)')
Explanation
^        the beginning of the string
[^_]+    any character except '_' (1 or more times, matching as many as possible)
_        a literal '_'
(        group and capture to \1:
[^_]+    any character except '_' (1 or more times, matching as many as possible)
)        end of \1

Oracle SQL - select parts of a string

How can I select abcdef.txt from the following string?
abcdef.123.txt
I only know how to select abcdef by doing select substr('abcdef.123.txt',1,6) from dual;
You can use || to concatenate, and substr with -3 for the right-hand part:
select substr('abcdef.123.txt',1,6) || '.' ||substr('abcdef.123.txt',-3) from dual;
or, avoiding the extra '.' literal (as suggested by Luc M):
select substr('abcdef.123.txt',1,7) || substr('abcdef.123.txt',-3) from dual;
A general solution, assuming the input string has exactly two periods . and you want to extract the first and third tokens, separated by one . The length of the "tokens" in the input string can be arbitrary (including zero!) and they can contain any characters other than .
select regexp_replace('abcde.123.xyz', '([^.]*).([^.]*).([^.]*)', '\1.\3') as result
from dual;
RESULT
---------
abcde.xyz
Explanation:
[ ] means match any of the characters between the brackets.
^ means do NOT match the characters in the brackets, so...
[^.] means match any character OTHER THAN .
* means match zero or more occurrences, as many as possible ("greedy" match).
( ... ) is called a subexpression; see below.
'\1.\3' means replace the original string with the first subexpression, followed by ., followed by the THIRD subexpression.
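To see how the numbered subexpressions behave, here is the same pattern applied to a different, made-up input, once with the replacement from the answer and once with the references swapped. The input string and the column aliases are illustrative only.

```sql
-- \1, \2 and \3 refer to the first, second and third parenthesised groups.
SELECT REGEXP_REPLACE(s, '([^.]*).([^.]*).([^.]*)', '\1.\3') AS first_and_third,
       REGEXP_REPLACE(s, '([^.]*).([^.]*).([^.]*)', '\3.\1') AS swapped
FROM (SELECT 'report.2024.csv' AS s FROM dual);
```

For this input the first column should be report.csv and the second csv.report, since the groups capture report, 2024 and csv.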
Replace everything from the first dot to the last dot (inclusive) with a single dot. This does not depend on the lengths of the components of the string:
SQL> select regexp_replace('abcdef.123.txt', '\..*\.', '.') fixed
from dual;
FIXED
----------
abcdef.txt
SQL>

Oracle SQL: Remove specific number from the initial part of the string

I have an alphanumeric string. I also have a number. The string will always start with this number. How do I separate this number from the string and get the remaining part of the string?
e.g.
string => 21fgggg21.lkkk and number=> 21
result=> fgggg21.lkkk
or
string=> 215699898.55fff and number=> 2
result=> 15699898.55fff
Any hint would be appreciated.
Thanks.
substr(string, length(number)+1)
or
regexp_replace(string, '^'||number)
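Both expressions can be tried side by side on the question's two examples. In the sketch below, str and num are placeholder column names, and the number is implicitly converted to a string by LENGTH and by the || operator.

```sql
SELECT str,
       num,
       SUBSTR(str, LENGTH(num) + 1)    AS via_substr,  -- skip as many characters as num has digits
       REGEXP_REPLACE(str, '^' || num) AS via_regexp   -- remove num only at the start of the string
FROM (
  SELECT '21fgggg21.lkkk' AS str, 21 AS num FROM dual UNION ALL
  SELECT '215699898.55fff',       2         FROM dual
);
```

Both columns should return fgggg21.lkkk for the first row and 15699898.55fff for the second. The regular-expression form treats the number as a pattern, so it is best suited to plain integers; a value containing a decimal point would have the point interpreted as "any character".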
You could also use REGEXP_REPLACE. To remove '21' from the beginning of the string:
SELECT REGEXP_REPLACE('21fgggg21.lkkk', '^21') FROM DUAL;
REGEXP_REPLA
------------
fgggg21.lkkk
To remove '2' from the beginning of the string:
SELECT REGEXP_REPLACE('215699898.55fff', '^2') FROM DUAL;
REGEXP_REPLACE
--------------
15699898.55fff
By way of explanation...
The caret (^) means "anchor to the beginning of the string".
^21 means "match 21 at the beginning of the string".
REGEXP_REPLACE has an optional third parameter of what to replace the matched string with. Because you just want to remove the matched string you can omit the parameter, which replaces it with nothing.
If you are just looking to select it, you can use a combination of substr, instr and length:
substr(string, instr(string, number) + length(number))
The result is the part of the string that starts right after the number.