I have VP3 - Art & Design and HS5 - Health & Social Care, I need to get string after '-' in Oracle. Can this be achieved using substring?
For a string operation as simple as this, I might just use the base INSTR() and SUBSTR() functions. In the query below, we take the substring of your column beginning at two positions after the hyphen.
SELECT
SUBSTR(col, INSTR(col, '-') + 2) AS subject
FROM yourTable
We could also use REGEXP_SUBSTR() here (see Gordon's answer), but it would be a bit more complex and the performance might not be as good as the above query.
You can use regexp_substr():
select regexp_substr(col, '[^-]+', 1, 2)
If you want to remove an optional space, then you can use trim():
select trim(leading ' ' from regexp_substr(col, '[^-]+', 1, 2))
The non-obvious parameters mean:
1 -- search from the first character of the source. 1 is the default, but you have to set it anyway in order to supply the next parameter.
2 -- take the second match as the result substring. The default would be 1.
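To see how the position and occurrence arguments interact, here is a minimal sketch against DUAL. The literal 'a-b-c' is made up for illustration and does not come from the question.

```sql
-- Illustrative literal only; substitute your own column.
SELECT REGEXP_SUBSTR('a-b-c', '[^-]+', 1, 1) AS first_match,   -- 'a'
       REGEXP_SUBSTR('a-b-c', '[^-]+', 1, 2) AS second_match,  -- 'b'
       REGEXP_SUBSTR('a-b-c', '[^-]+', 3, 1) AS from_pos_3     -- 'b' (search starts at the 3rd character)
FROM DUAL;
```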
You can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN SUBSTR(value, INSTR(value, '-') + 1)
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'VP3 - Art & Design and HS5 - Health & Social Care' FROM DUAL UNION ALL
SELECT '1-2-3' FROM DUAL UNION ALL
SELECT '123456' FROM DUAL
Both output:
| SUBJECT |
| :------------------------------------------- |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Trimming leading white-space:
If you want to trim the leading white-space then you can use:
SELECT CASE
WHEN INSTR(value, '-') > 0
THEN LTRIM(SUBSTR(value, INSTR(value, '-') + 1))
END AS subject
FROM table_name
or
SELECT REGEXP_SUBSTR( value, '-\s*(.*)$', 1, 1, NULL, 1 ) AS subject
FROM table_name
Which both output:
| SUBJECT |
| :------------------------------------------ |
| Art & Design and HS5 - Health & Social Care |
| 2-3 |
| null |
Why the naive solutions don't always work:
SELECT SUBSTR(value, INSTR(value, '-') + 2) AS subject
FROM table_name
Does not work in 2 cases:
It finds the index of the - character and then skips 2 characters (the - character and then the assumed white-space character); if the second character is not a white-space character then it will miss the first character of the substring (i.e. if the input is 1-2-3 then the output would be -3 rather than 2-3).
It assumes that there will always be a - character in the string; if this is not the case then it will erroneously return the substring starting from the second character rather than returning NULL (i.e. if the input is 123456 then the output is 23456 rather than NULL).
Using the regular expression:
SELECT REGEXP_SUBSTR(value, '[^-]+', 1, 2)
FROM table_name
Does not find the substring after the 1st - character; it will find the substring between the 1st and 2nd - characters and strip any characters outside that range (inclusive of the - characters). So if the input is VP3 - Art & Design and HS5 - Health & Social Care then the output is Art & Design and HS5 rather than the expected Art & Design and HS5 - Health & Social Care.
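As a rough side-by-side check, the sketch below runs the two naive expressions and the capture-group version over the same three sample strings. The CTE name samples and the column aliases are assumptions made for this illustration.

```sql
-- samples and the aliases below are hypothetical names for this sketch.
WITH samples (value) AS (
  SELECT 'VP3 - Art & Design and HS5 - Health & Social Care' FROM DUAL UNION ALL
  SELECT '1-2-3'  FROM DUAL UNION ALL
  SELECT '123456' FROM DUAL
)
SELECT value,
       SUBSTR(value, INSTR(value, '-') + 2)             AS naive_substr,  -- '1-2-3' gives '-3'; '123456' gives '23456'
       REGEXP_SUBSTR(value, '[^-]+', 1, 2)              AS naive_regexp,  -- '1-2-3' gives '2'; '123456' gives NULL
       REGEXP_SUBSTR(value, '-\s*(.*)$', 1, 1, NULL, 1) AS after_first    -- '1-2-3' gives '2-3'; '123456' gives NULL
FROM samples;
```

Only the last column returns everything after the first hyphen for all three rows.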
I am trying to mask the data for the below String :
This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .
In the above data I want to mask the text starting from ADHAR NUMBER, up to a length of 60 characters.
OUTPUT :
This is the new *********************************************************Customer Name like 345678 to a String .
Can anyone please help?
A little bit of substr + instr does the job (sample data in the first 2 lines; query begins at line #3):
SQL> with test (col) as
2 (select 'This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .' from dual)
3 select substr(col, 1, instr(col, 'ADHAR NUMBER') - 1) ||
4 lpad('*', 60, '*') ||
5 substr(col, instr(col, 'ADHAR NUMBER') + 60) result
6 from test;
RESULT
--------------------------------------------------------------------------------
This is the new ************************************************************ Cus
tomer Name like 345678 to a String .
SQL>
Here is a solution that covers all possibilities (I think). Notice the different inputs in the WITH clause (which is not part of the solution - remove it, and use your actual table and column names in the query). This is how one should test their solutions - consider all possible cases, including NULL input, non-NULL input string that doesn't contain the "magic words", string that has the "magic words" right at the beginning, etc.
There is one important situation the solution does NOT address, namely when the exact substring 'ADHAR NUMBER' is not two full words, but it is part of longer words - for example 'BHADHAR NUMBERS'. In this case the output will look like 'BH****************' masking ADHAR NUMBER and the S after NUMBER and more characters, up to 60 total.
Note that the output string has the same length as the input. This is generally part of the definition of "masking".
with
test (col) as (
select 'This is the new ADHAR NUMBER 123456789989 this is the string ' ||
'3456798983 from Customer Name like 345678 to a String.'
from dual union all
select 'This string does not contain the magic words' from dual union all
select 'ADHAR NUMBER 12345' from dual union all
select 'Blah blah ADHAR NUMBER 1234' from dual union all
select null from dual union all
select 'Another blah ADHAR NUMBER' from dual
)
select case when pos > 0
then
substr(col, 1, pos - 1) ||
rpad('*', least(60, length(col) - pos + 1), '*') ||
substr(col, pos + 60)
else col end as masked
from (
select col, instr(col, 'ADHAR NUMBER') as pos
from test
)
;
MASKED
--------------------------------------------------------------------------------------------------------------------
This is the new ************************************************************ Customer Name like 345678 to a String.
This string does not contain the magic words
******************
Blah blah *****************
Another blah ************
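To make the length arithmetic in that query concrete, here is a small sketch that exposes the intermediate values for one of the test rows. The aliases len and mask_len are names introduced only for this illustration.

```sql
-- Intermediate values for one sample row; len and mask_len are illustrative aliases.
SELECT pos,
       LENGTH(col)                      AS len,       -- 27
       LEAST(60, LENGTH(col) - pos + 1) AS mask_len   -- 17: fewer than 60 characters remain
FROM (
  SELECT col, INSTR(col, 'ADHAR NUMBER') AS pos       -- 11
  FROM (SELECT 'Blah blah ADHAR NUMBER 1234' AS col FROM DUAL)
);
```

Because pos + 60 lies beyond the end of this string, the final SUBSTR in the masking query returns NULL and nothing is appended after the asterisks.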
I have a nvarchar string from which I need to extract certain text from between characters.
Example: 1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000
What I need:
| Name | Phone Number |
| -------- | -------------- |
| FName    | 18001233321    |
My Attempt:
SELECT SUBSTRING(col, LEN(LEFT(col, CHARINDEX ('-', col))) + 1, LEN(col) - LEN(LEFT(col, CHARINDEX ('-', col))) - LEN(RIGHT(col, LEN(col) - CHARINDEX ('-', col))) - 1);
One way:
Use patindex to find "FName-"
Remove the start of the string up until and including "FName-"
Use patindex to find "--"
Remove the rest of the string from and including "--"
You can consolidate the query down to one line, but you'll find yourself repeating parts of the logic - which I like to avoid. And calculating one thing at a time makes it easier to debug.
select
A.Col
, B.StringStart
, C.NewString
, patindex('%--%',C.NewString) NewStringEnd
, substring(C.NewString,1,patindex('%--%',C.NewString)-1) -- <- Required Result
from (
values
(N'1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000')
) A (Col)
cross apply (
values
(patindex('%FName-%',Col))
) B (StringStart)
cross apply (
values
(substring(A.Col,B.StringStart+6,len(A.Col)-B.StringStart-6))
) C (NewString);
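For comparison, the consolidated one-line form mentioned above might look like the following T-SQL sketch. The shortened literal is illustrative only, and unlike the step-by-step query it searches for '--' from the start of the whole string, so it assumes no '--' occurs before 'FName-'.

```sql
-- T-SQL (SQL Server); the shortened literal is for illustration only.
DECLARE @col nvarchar(200) = N'822-FName-18001233321--2021-09-23';

SELECT SUBSTRING(
         @col,
         PATINDEX('%FName-%', @col) + 6,                          -- first character after 'FName-'
         PATINDEX('%--%', @col) - PATINDEX('%FName-%', @col) - 6  -- length up to, but not including, '--'
       ) AS phone_number;  -- 18001233321
```

Note how PATINDEX('%FName-%', @col) has to be written twice, which is the repetition the CROSS APPLY version avoids.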
I have a column named Concatenated Segments which has 12 segment values, and I'm looking to edit the formula on the column to only show the 5th segment. The segments are separated by periods.
How would I need to edit the formula to do this?
Would using a substring work?
Alternatively, use the good old SUBSTR + INSTR combination, which is possibly faster on large data sets and doesn't care what the segments contain (anything can appear between the dots):
WITH
  -- thank you for typing, @marcothesane
  indata(s) AS (
    SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
  )
select substr(s, instr(s, '.', 1, 4) + 1,
              instr(s, '.', 1, 5) - instr(s, '.', 1, 4) - 1
             ) result
from indata;
RESULT
------
211003
Use REGEXP_SUBSTR(), searching for the 5th uninterrupted string of digits (\d+), or the 5th uninterrupted string of anything but a dot ([^\.]+), starting from position 1 of the input string:
WITH
-- your input ... paste it as text next time, so I don't have to manually re-type it ....
indata(s) AS (
SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
)
SELECT
REGEXP_SUBSTR(s,'\d+',1,5) AS just_digits
, REGEXP_SUBSTR(s,'[^\.]+',1,5) AS between_dots
FROM indata;
-- out just_digits | between_dots
-- out -------------+--------------
-- out 211003 | 211003
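The two approaches can disagree when a segment is empty, because a pattern such as [^\.]+ never matches an empty string and so skips that segment. A minimal sketch, using a made-up input whose second segment is empty:

```sql
-- 'A..C.D.E.F' is an invented input: segments are A, (empty), C, D, E, F.
WITH indata (s) AS (
  SELECT 'A..C.D.E.F' FROM DUAL
)
SELECT SUBSTR(s, INSTR(s, '.', 1, 4) + 1,
              INSTR(s, '.', 1, 5) - INSTR(s, '.', 1, 4) - 1) AS by_position,  -- 'E' (the true 5th segment)
       REGEXP_SUBSTR(s, '[^\.]+', 1, 5)                      AS by_regexp     -- 'F' (5th non-empty segment)
FROM indata;
```

If the data can contain empty segments, the position-based version is the safer choice of the two.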
I have a string that has this format "number - name" I'm using REGEXP_SUBSTR to split it in two separate columns one for name and one for number.
SELECT
REGEXP_SUBSTR('123 - ABC','[^-]+',1,1) AS NUM,
REGEXP_SUBSTR('123 - ABC','[^-]+',1,2) AS NAME
from dual;
But it doesn't work if the name includes a hyphen. For example, with ABC-Corp the name is shown only as 'ABC' instead of 'ABC-Corp'. How can I write a regular expression that ignores everything before the first hyphen and includes everything after it?
You want to split the string on the first occurrence of ' - '. It is a simple enough task to be efficiently performed by string functions rather than regexes:
select
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
Demo on DB Fiddle:
with mytable as (
select '123 - ABC' mycol from dual
union all select '123 - ABC - Corp' from dual
)
select
mycol,
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
MYCOL | NUM | NAME
:--------------- | :-- | :---------
123 - ABC | 123 | ABC
123 - ABC - Corp | 123 | ABC - Corp
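One case the demo does not cover is a value with no ' - ' separator at all: INSTR then returns 0, and the name expression would start at the third character. A possible guard, offered as a sketch rather than as part of the original answer, is to turn that 0 into NULL with NULLIF so that both columns come back NULL:

```sql
-- The second sample row is an invented value with no separator.
WITH mytable (mycol) AS (
  SELECT '123 - ABC - Corp' FROM DUAL UNION ALL
  SELECT '123456' FROM DUAL
)
SELECT mycol,
       SUBSTR(mycol, 1, NULLIF(INSTR(mycol, ' - '), 0) - 1) AS num,   -- NULL when there is no separator
       SUBSTR(mycol, NULLIF(INSTR(mycol, ' - '), 0) + 3)    AS name   -- NULL instead of '3456'
FROM mytable;
```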
NB: @GMB's solution is much better in your simple case. Regular expressions are overkill for that.
TL;DR:
It is usually easier and more readable to use the subexpr parameter instead of occurrence for fixed masks like this one. You can specify the full mask: \d+\s*-\s*\S+
i.e. digits, then zero or more whitespace characters, then -, then again zero or more whitespace characters, and finally one or more non-whitespace characters.
Then we add () to mark subexpressions: since we need only the digits and the trailing non-whitespace characters, we put those in ():
'(\d+)\s*-\s*(\S+)'
Then we just specify which subexpression we need, 1 or 2:
SELECT
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,1) AS NUM,
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,2) AS NAME
from table(sys.odcivarchar2list('123 - ABC', '123 - ABC-Corp'));
Result:
NUM NAME
---------- ----------
123 ABC
123 ABC-Corp
https://docs.oracle.com/database/121/SQLRF/functions164.htm#SQLRF06303
https://docs.oracle.com/database/121/SQLRF/ap_posix003.htm#SQLRF55544
I have a string like this: first, last (123456). The expected result is 123456. Could someone point me in the right direction for doing this in Oracle?
It will depend on the actual pattern you care about (I assume "first" and "last" aren't literal hard-coded strings), but you will probably want to use regexp_substr.
For example, this matches anything between two brackets (which will work for your example), but you might need more sophisticated criteria if your actual examples have multiple brackets or something.
SELECT regexp_substr(COLUMN_NAME, '\(([^\)]*)\)', 1, 1, 'i', 1)
FROM TABLE_NAME
Your question is ambiguous and needs clarification. Based on your comment it appears you want to select the six digits after the left bracket. You can use the Oracle instr function to find the position of a character in a string, and then feed that into the substr to select your text.
select substr(mycol, instr(mycol, '(') + 1, 6) from mytable
Or if there are a varying number of digits between the brackets:
select substr(mycol, instr(mycol, '(') + 1, instr(mycol, ')') - instr(mycol, '(') - 1) from mytable
Find the last ( and get the sub-string after without the trailing ) and convert that to a number:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE test ( str ) AS
SELECT 'first, last (123456)' FROM DUAL UNION ALL
SELECT 'john, doe (jr) (987654321)' FROM DUAL;
Query 1:
SELECT TO_NUMBER(
TRIM(
TRAILING ')' FROM
SUBSTR(
str,
INSTR( str, '(', -1 ) + 1
)
)
) AS value
FROM test
Results:
| VALUE |
|-----------|
| 123456 |
| 987654321 |