Retrieve the characters before a matching pattern - sql

135 ;1111776698 ;AB555678765
I have the above string and what I am looking for is to retrieve all the digits before the first occurrence of ;.
But the number of characters before the first occurrence of ; varies i.e. it may be a 4 digit number or 3 digit number.
I have played with regex_instr and instr, but I unable to figure this out.
The query should return all the digits before the first occurrence of ;

This answer assumes that you are using Oracle database. I don't know of way to do this using REGEX_INSTR alone, but we can do with REGEXP_REPLACE using capture groups:
SELECT REGEXP_REPLACE('135 ;1111776698 ;AB555678765', '^\s*(\d{3,4})\s*;.*', '\1')
FROM dual;
Demo
Here is the regex pattern being used:
^\s*(\d{3,4})\s*;.*
This allows, from the start of the string, any amount of leading whitespace, followed by a 3 or 4 digit number, followed again by any amount of whitespace, then a semicolon. The .* at the end of the pattern just consumes whatever remains in your string. Note (\d{3,4}), which captures the 3-4 digit number, which is then available in the replacement as \1.

Using INSTR,SUBTSR and TRIM should work ( based on your comment that there are "just white spaces and digits" )
select TRIM(SUBSTR(s,1, INSTR(s,';')-1)) FROM t;
Demo

The following using regexp_substr() should work:
SELECT s, REGEXP_SUBSTR(s, '^[^;]*')

Make sure you try all possible values in that first position, even those you don't expect and make sure they are handled as you want them to be. Always expect the unexpected! This regex matches the first subgroup of zero or more optional digits (allows a NULL to be returned) when followed by an optional space then a semi-colon, or the end of the line. You may need to tighten (or loosen) up the matching rules for your situation, just make sure to test even for incorrect values, especially if the input comes from user-entered data.
with tbl(id, str) as (
select 1, '135 ;1111776698 ;AB555678765' from dual union all
select 2, ' 135 ;1111776698 ;AB555678765' from dual union all
select 3, '135;1111776698 ;AB555678765' from dual union all
select 4, ';1111776698 ;AB555678765' from dual union all
select 5, ';135 ;1111776698 ;AB555678765' from dual union all
select 6, ';;1111776698 ;AB555678765' from dual union all
select 7, 'xx135 ;1111776698 ;AB555678765' from dual union all
select 8, '135;1111776698 ;AB555678765' from dual union all
select 9, '135xx;1111776698 ;AB555678765' from dual
)
select id, regexp_substr(str, '(\d*?)( ?;|$)', 1, 1, NULL, 1) element_1
from tbl
order by id;
ID ELEMENT_1
---------- ------------------------------
1 135
2 135
3 135
4
5
6
7 135
8 135
9
9 rows selected.

To get the desired result, you should use REGEX_SUBSTR as it will substring your desired data from the string you give. Here is the example of the Query.
Solution to your example data:
SELECT REGEXP_SUBSTR('135 ;1111776698 ;AB555678765','[^;]+',1,1) FROM DUAL;
So what it does, Regex splits the string on the basis of ; separator. You needed the first occurrence so I gave arguments as 1,1.
So if you need the second string 1111776698 as your output you can give an argument as 1,2.
The syntax for Regexp_substr is as following:
REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, sub_expression ] ] ] ] )
Here is the link for more examples:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
Let me know if this works for you. Best luck.

Related

Issues with SUBSTR function Oracle_SQL

I used the SUBSTR function for the similar purposes, but I encountered the following issue:
I am extracting 6 characters from the right, but the data in column is inconsistent and for some rows it has characters less than 6, i.e. 5 or 4. So for such rows, the function returns blanks. How can I fix this?
Example Scenario 1:
SUBSTR('0000123456',-6,6)
Output: 123456
Scenario 2 (how do I fix this?, I need it to return '23456'):
SUBSTR('23456',-6,6)
Output: ""
You can use a case expression: if the string length is strictly greater than 6 then return just the last 6 characters; otherwise return the string itself. This way you don't need to call substr unless it is really needed.
Alternatively, if speed is not the biggest issue and you are allowed to use regular expressions, you can write this more compactly - select between 0 and 6 characters - as many as possible - at the end of the string.
Finally, if you don't mind using undocumented functions, you can use reverse and standard substr (starting from character 1 and extracting the first 6 characters; that will work as expected even if the string has length less than 6). So: reverse the string, extract first (up to) 6 characters, and then reverse again to restore the order. WARNING: This is shown only for fun; DO NOT USE THIS METHOD!
with
test_data (str) as (
select '0123449389' from dual union all
select '00000000' from dual union all
select null from dual union all
select 'abcd' from dual
)
select str,
case when length(str) > 6 then substr(str, -6) else str end as case_substr,
regexp_substr(str, '.{0,6}$') as regexp_substr,
reverse(substr(reverse(str), 1, 6)) as rev_substr
from test_data
;
STR CASE_SUBSTR REGEXP_SUBSTR REV_SUBSTR
---------- ------------- ------------- --------------
0123449389 449389 449389 449389
00000000 000000 000000 000000
abcd abcd abcd abcd
One method uses coalesce():
select coalesce(substr('23456', -6, 6), '23456')
Another tweaks the length:
select substr('23456', greatest(- length('23456'), -6), 6)

How to get first string after character Oracle SQL

I'm trying to get first string after a character.
Example is like
ABCDEF||GHJ||WERT
I need only
GHJ
I tried to use REGEXP but i couldnt do it.
Can anyone help me with please?
Thank you
Somewhat simpler:
SQL> select regexp_substr('ABCDEF||GHJ||WERT', '\w+', 1, 2) result from dual;
^
RES |
--- give me the 2nd "word"
GHJ
SQL>
which reads as: give me the 2nd word out of that string. Won't work properly if GHJ consists of several words (but that's not what your example suggests).
Something like I interpret with a separator in place, In this case it is || or | example is with oracle database
-- pattern -- > [^] represents non-matching character and + for says one or more character followed by ||
-- 3rd parameter --> starting position
-- 4th parameter --> nth occurrence
WITH tbl(str) AS
(SELECT 'ABCDEF||GHJ||WERT' str FROM dual)
SELECT regexp_substr(str
,'[^||]+'
,1
,2) output
FROM tbl;
I think the most general solution is:
WITH tbl(str) AS (
SELECT 'ABCDEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABC|DEF||GHJ||WERT' str FROM dual UNION ALL
SELECT 'ABClDEF||GHJ||WERT' str FROM dual
)
SELECT regexp_replace(str, '^.*\|\|(.*)\|\|.*', '\1')
FROM tbl;
Note that this works even if the individual elements contain punctuation or a single vertical bar -- which the other solutions do not. Here is a comparison.
Presumably, the double vertical bar is being used for maximum flexibility.
You should use regexp_substr function
select regexp_substr('ABCDEF||GHJ||WERT ', '\|{2}([^|]+)', 1, 1, 'i', 1) str
from dual;
STR
---
GHJ

Remove 2 characters in oracle sql

I have a column that contains 12 digits but user wants only to generate a 10 digits.
I tried the trim, ltrim function but nothing work. Below are the queries I tried.
ltrim('10', 'column_name')
ltrim('10', column_name)
ltrim(10, column_name)
For example I have a column that contains a 12 digit number
100000000123
100000000456
100000000789
and the expected result I want is
0000000123
0000000456
0000000789
To extract the last 10 characters of an input string, regardless of how long the string is (so this will work if some inputs have 10 characters, some 12, and some 15 characters), you could use negative starting position in substr:
substr(column_name, -10)
For example:
with
my_table(column_name) as (
select '0123401234' from dual union all
select '0001112223334' from dual union all
select '12345' from dual union all
select '012345012345' from dual
)
select column_name, substr(column_name, -10) as substr
from my_table;
COLUMN_NAME SUBSTR
------------- ----------
0123401234 0123401234
0001112223334 1112223334
12345
012345012345 2345012345
Note in particular the third example. The input has only 5 digits, so obviously you can't get a 10 digit number from it. The result is NULL (undefined).
Note also that if you use something like substr(column_name, 3) you will get just '345' in that case; most likely not the desired result.
try to use SUBSTR(column_name, 2)

How to extract value between 2 slashes

I have a string like "1490/2334/5166400411000434" from which I need to derive value after second slash. I tried below logic
select REGEXP_SUBSTR('1490/2334/5166400411000434','[^/]+',1,3) from dual;
it is working fine. But when i dont have value between first and second slash it is returining blank.
For example my string is "1490//5166400411000434" and am trying
select REGEXP_SUBSTR('1490//5166400411000434','[^/]+',1,3) from dual;
it is returning blank. Please suggest me what i am missing.
If I understand well, you may need
regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3)
This handles the first 2 parts like 'xxx/' and then checks for a sequence of non / characters; the parameter 3 is used to get the 3rd matching subexpression, which is what you want.
For example:
with test(t) as (
select '1490/2334/5166400411000434' from dual union all
select '1490//5166400411000434' from dual union all
select '1490//5166400411000434/ramesh/3344' from dual
)
select t, regexp_substr(t, '(([^/]*/){2})([^/]*)', 1, 1, 'i', 3) as substr
from test
gives:
T SUBSTR
---------------------------------- ----------------------------------
1490/2334/5166400411000434 5166400411000434
1490//5166400411000434 5166400411000434
1490//5166400411000434/ramesh/3344 5166400411000434
You can REVERSE() your string and take the value before the first slash. And then reverse again to obtain the desired output.
select reverse(regexp_substr(reverse('1490//5166400411000434'), '[^/]+', 1, 1)) from dual;
It can also be done with basic substring and instr function:
select reverse(SUBSTR(reverse('1490//5166400411000434'), 0, INSTR(reverse('1490//5166400411000434'), '/')-1)) from dual;
Use other options in REGEXP_SUBSTR to match a pattren
select REGEXP_SUBSTR('1490//5166400411000434','(/\d*)/(\d+)',1,1,'x',2) from dual
Basically it is finding the pattren of two / including digits starting from 1 with 1 appearance and ignoring whitespaces ('x') then outputting 2nd subexpression that is in second expression within ()
... pattern,1,1,'x',subexp2)

Insert character between string Oracle SQL

I need to insert character string after each character in Oracle SQL.
Example:
ABC will A,B,C
DEFG will be D,E,F,G
This question gives only one character in string
Oracle insert character into a string
Edit: As some fellows have mentioned, Oracle does not admit this regex. So my approach would be to do a regex to match all characters, add them a comma after the character and then removing the last comma.
WITH regex AS (SELECT REGEXP_REPLACE('ABC', '(.)', '\1,') as reg FROM dual) SELECT SUBSTR(reg, 1, length(reg)-1) FROM regex;
Note that with the solution of rtrim there could be errors if the string you want to parse has a final ending comma and you don't want to remove it.
Previous solution: (Not working on Oracle)
Check if this does the trick:
SELECT REGEXP_REPLACE('ABC', '(.)(?!$)', '\1,') FROM dual;
It does a regexp_replace of every character, but the last one for the same character followed by a ,
To see how regexp_replace works I recommend you: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
SELECT rtrim(REGEXP_REPLACE('ABC', '(.)', '\1,'),',') "REGEXP_REPLACE" FROM dual;
You could do it using:
REGEXP_REPLACE
RTRIM
For example,
SQL> WITH sample_data AS(
2 SELECT 'ABC' str FROM dual UNION ALL
3 SELECT 'DEFG' str FROM dual UNION ALL
4 SELECT 'XYZ' str FROM dual
5 )
6 -- end of sample_data mimicking a real table
7 SELECT str,
8 rtrim(regexp_replace(str, '(\w?)', '\1,'),',') new_str
9 FROM sample_data;
STR NEW_STR
---- ----------
ABC A,B,C
DEFG D,E,F,G
XYZ X,Y,Z
Since there is no way to negate the end of string in an Oracle regex (that does not support lookarounds), you may use
SELECT REGEXP_REPLACE(
REGEXP_REPLACE('ABC', '([^,])([^,])','\1,\2'),
'([^,])([^,])',
'\1,\2')
AS Result from dual
See the DB Fiddle. The point here is to use REGEXP_REPLACE with ([^,])([^,]) pattern twice to cater for consecutive matches.
The ([^,])([^,]) pattern matches any non-comma char into Group 1 (\1) and then any non-comma char into Group 2 (\2), and inserts a comma in between them.