Extract a number from a column that contains a string - SQL

So I have a function that returns a combination of strings (multiple values). I need to extract everything that follows the characters "DL:", but only that.
So, before extraction:

| **pck_import.GETdocnumber(XML_DATA)** |
|---|
| DL:2212200090001 Pr:8222046017 |
| Obj:020220215541 DL:1099089729 |
| DL:DST22017260 |
| DL:22122000123964 Pr:8222062485 |
| DL:22122000108599 |
| Obj:0202200015539 DL:2100001688 |

In every case, I need the "number" after "DL:". The "DL:" value can appear alone, first among multiple values, or last. In some cases the "DL:" value also contains letters.
So, the expected output:

| **OUTPUT** |
|---|
| 2212200090001 |
| 1099089729 |
| DST22017260 |
| 22122000123964 |
| 22122000108599 |
| 2100001688 |

I tried:
substr(pck_import.GETdocnumber(XML_DATA),
instr(pck_import.GETdocnumber(XML_DATA), 'DL:') + 3)
but that also returns the "Pr:" part.

with s as (
select 'DL:2212200090001 Pr:8222046017' str from dual union all
select 'Obj:020220215541 DL:1099089729' str from dual union all
select 'DL:DST22017260' str from dual union all
select 'DL:22122000123964 Pr:8222062485' str from dual union all
select 'DL:22122000108599' str from dual union all
select 'Obj:0202200015539 DL:2100001688' str from dual)
select str, regexp_substr(str, 'DL:(\S+)', 1, 1, null, 1) rs
from s;
STR RS
------------------------------- -------------------------------
DL:2212200090001 Pr:8222046017 2212200090001
Obj:020220215541 DL:1099089729 1099089729
DL:DST22017260 DST22017260
DL:22122000123964 Pr:8222062485 22122000123964
DL:22122000108599 22122000108599
Obj:0202200015539 DL:2100001688 2100001688
6 rows selected

Something like this?
Sample data:
with test (col) as
(select
'________________________________________
DL:2212200090001 Pr:8222046017
________________________________________
Obj:020220215541 DL:1099089729
________________________________________
DL:DST22017260
________________________________________
DL:22122000123964 Pr:8222062485
________________________________________
DL:22122000108599
________________________________________
Obj:0202200015539 DL:2100001688'
from dual)
-- Query:
select replace(regexp_substr(col, 'DL:\w+', 1, level), 'DL:') result
from test
connect by level <= regexp_count(col, 'DL:');
RESULT
--------------------------------------------------------------------------------
2212200090001
1099089729
DST22017260
22122000123964
22122000108599
2100001688
6 rows selected.
(Note that the query might need to be modified if you are dealing with more than a single row of data.)
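A possible multi-row variant, sketched under the assumption of a hypothetical `id` key column (not part of the original answer): the two extra CONNECT BY conditions keep each hierarchy inside its own source row, and `prior sys_guid() is not null` is the usual trick that stops Oracle from raising a CONNECT BY loop error when a row "connects" to itself.

```sql
-- Hypothetical multi-row sample: one DL: value per row, keyed by id
with test (id, col) as (
  select 1, 'DL:2212200090001 Pr:8222046017'  from dual union all
  select 2, 'Obj:020220215541 DL:1099089729'  from dual union all
  select 3, 'DL:DST22017260'                  from dual union all
  select 4, 'Obj:0202200015539 DL:2100001688' from dual
)
select id,
       replace(regexp_substr(col, 'DL:\w+', 1, level), 'DL:') as result
from test
-- one level per DL: occurrence in the current row
connect by level <= regexp_count(col, 'DL:')
       -- stay within the same source row
       and prior id = id
       -- avoids ORA-01436 (CONNECT BY loop in user data)
       and prior sys_guid() is not null
order by id, level;
```

Each input row produces one output row per "DL:" occurrence, so with this sample data every id appears once.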

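You could achieve this with a regular expression. In engines that support lookaround, the regex (?<=DL:)\d*(?=\s) matches all digits after DL: up to a whitespace character. Oracle's regular expressions do not support lookbehind or lookahead, though, so in Oracle SQL (as you tagged this question) use REGEXP_SUBSTR with a capture group and its subexpression argument instead:
SELECT
REGEXP_SUBSTR(my_column,
'DL:(\d+)', 1, 1, NULL, 1) "DL field"
FROM my_table;
\d+ stops at the first non-digit, so no lookahead is needed, and the match also works when the DL: value is the last item in the string.
If you want to match substrings like DST22017260 as well, use \w+ (letters, digits and underscore) instead of \d+: 'DL:(\w+)'. A greedy .* (any character) would not work here, because it runs on past the next space to the end of the line.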
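To see how the choice of character class changes the result, this sketch runs three capture-group variants side by side on rows taken from the question (Oracle has no lookbehind, so the value is returned through the sixth, subexpression, argument of REGEXP_SUBSTR, available since 11g). The column aliases are illustrative only.

```sql
with s (str) as (
  select 'DL:2212200090001 Pr:8222046017'  from dual union all
  select 'DL:DST22017260'                  from dual union all
  select 'Obj:0202200015539 DL:2100001688' from dual
)
select str,
       -- digits only: NULL for DL:DST22017260, because 'D' is not a digit
       regexp_substr(str, 'DL:(\d+)', 1, 1, null, 1) as digits_only,
       -- letters, digits and underscore: also returns DST22017260
       regexp_substr(str, 'DL:(\w+)', 1, 1, null, 1) as word_chars,
       -- greedy "any character": runs on past the space into " Pr:..."
       regexp_substr(str, 'DL:(.*)', 1, 1, null, 1) as greedy_any
from s;
```

For the first row, greedy_any returns 2212200090001 Pr:8222046017, which is why a bounded class such as \w+ or \S+ is the safer choice.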

Related

Oracle replace all characters before "." dot

I need to remove all characters before the . character and also remove all [ and ] characters.
Please see the examples below:

| from | to |
|---|---|
| [PINWHEEL_ASSET].[MX5530] | MX5530 |
| [PINWHEEL_TRADE].[AR5403] | AR5403 |

The parts before and after the dot vary.
with
sample_data (my_string) as (
select '[PINWHEEL_ASSET].[MX5530]' from dual
)
select rtrim(substr(my_string, instr(my_string, '.') + 2), ']') as second_part
from sample_data
;
SECOND_PART
-----------
MX5530
This assumes that the input string looks exactly like this: [first].[second], where "first" and "second" are (possibly empty) strings that do not contain periods or closing brackets.
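If the second part is not always wrapped in brackets, a sketch that follows the question's two steps literally may be more forgiving: take everything after the first '.', then strip any [ and ] characters. It relies on REPLACE removing every occurrence of the search string when the replacement argument is omitted.

```sql
with sample_data (my_string) as (
  select '[PINWHEEL_ASSET].[MX5530]' from dual union all
  select '[PINWHEEL_TRADE].[AR5403]' from dual
)
select my_string,
       replace(
         replace(
           substr(my_string, instr(my_string, '.') + 1),  -- everything after the first '.'
           '['),                                          -- drop every '['
         ']')                                             -- drop every ']'
       as second_part
from sample_data;
```

If a string contains no '.', INSTR returns 0 and the whole string is kept (minus its brackets), which may or may not be what you want.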
Yet another option is to use regular expressions (see line #6).
Sample data:
SQL> with test (col) as
2 (select '[PINWHEEL_ASSET].[MX5530]' from dual union all
3 select '[PINWHEEL_TRADE].[AR5403]' from dual
4 )
Query begins here:
5 select col,
6 regexp_substr(col, '\w+', 1, 2) result
7 from test;
COL RESULT
------------------------- --------------------
[PINWHEEL_ASSET].[MX5530] MX5530
[PINWHEEL_TRADE].[AR5403] AR5403

ORACLE SQL - REGEXP_LIKE Contains First Character As a Number and Second Character as an Alphabet

I am trying to write a query in Oracle that returns records whose first character is 3 or 4 AND whose second character is a letter. The rest can be anything.
Something like this:
SELECT COL1 FROM TABLE
WHERE REGEXP_LIKE (COL1, '3[A-Za-z]')
OR REGEXP_LIKE (COL1, '4[A-Za-z]')
I do get output, but for a few records the data doesn't start with 3 or 4: the query selects records that have 3 followed by a letter anywhere in the column, e.g. 10573T2. I need records that start with either 3 or 4, where the next character is a letter.
Any help would be great.
with test (col) as
(select '10573T2' from dual union all
select '3A1234F' from dual union all
select '23XXX02' from dual union all
select '4GABC23' from dual union all
select '31234FX' from dual
)
select col
from test
where regexp_like(col, '(^3|^4)[[:alpha:]]');
COL
-------
3A1234F
4GABC23
The pattern means: the string begins (^) with 3 or (|) 4, and is followed by a letter ([[:alpha:]]).
As for your doubts about ^: that character has two roles:
- [^ ... ]: Non-Matching Character List, matches any character not in the list ...
- ^: Beginning of Line Anchor, matches the subsequent expression only when it occurs at the beginning of a line.
You need to anchor the pattern at the beginning of the string:
REGEXP_LIKE(COL1, '^[34][A-Za-z]')
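To see why the anchor matters, this sketch evaluates the question's unanchored pattern and the anchored one side by side on the sample data from the previous answer. The Y/N flag columns are illustrative and not part of either answer.

```sql
with test (col) as (
  select '10573T2' from dual union all
  select '3A1234F' from dual union all
  select '23XXX02' from dual union all
  select '4GABC23' from dual union all
  select '31234FX' from dual
)
select col,
       -- "3 or 4 followed by a letter" anywhere in the string
       case when regexp_like(col, '3[A-Za-z]|4[A-Za-z]') then 'Y' else 'N' end as unanchored,
       -- the same test, but only at the start of the string
       case when regexp_like(col, '^[34][A-Za-z]') then 'Y' else 'N' end as anchored
from test;
```

Every row passes the unanchored test (31234FX, for instance, contains 4F), while only 3A1234F and 4GABC23 pass the anchored one.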

Retrieve the characters before a matching pattern

135 ;1111776698 ;AB555678765
I have the above string, and I want to retrieve all the digits before the first occurrence of ;.
The number of characters before the first ; varies, i.e. it may be a 4-digit or a 3-digit number.
I have played with regexp_instr and instr, but I was unable to figure this out.
The query should return all the digits before the first occurrence of ;.
This answer assumes that you are using Oracle Database. I don't know of a way to do this using REGEXP_INSTR alone, but we can do it with REGEXP_REPLACE using capture groups:
SELECT REGEXP_REPLACE('135 ;1111776698 ;AB555678765', '^\s*(\d{3,4})\s*;.*', '\1')
FROM dual;
Here is the regex pattern being used:
^\s*(\d{3,4})\s*;.*
This allows, from the start of the string, any amount of leading whitespace, followed by a 3 or 4 digit number, followed again by any amount of whitespace, then a semicolon. The .* at the end of the pattern just consumes whatever remains in your string. Note the group (\d{3,4}), which captures the 3-4 digit number; it is then available in the replacement as \1.
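One caveat worth checking: when the pattern does not match, REGEXP_REPLACE returns the source string unchanged rather than NULL. This sketch, using made-up test strings, guards the replacement with REGEXP_LIKE so that non-matching input yields NULL instead.

```sql
with t (s) as (
  select '135 ;1111776698 ;AB555678765' from dual union all
  select '1234;1111776698'              from dual union all
  select 'xx135 ;1111776698'            from dual
)
select s,
       -- unguarded: the third row comes back unchanged
       regexp_replace(s, '^\s*(\d{3,4})\s*;.*', '\1') as raw_result,
       -- guarded: NULL unless the string really starts with a 3-4 digit number
       case when regexp_like(s, '^\s*\d{3,4}\s*;')
            then regexp_replace(s, '^\s*(\d{3,4})\s*;.*', '\1')
       end as guarded_result
from t;
```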
Using INSTR, SUBSTR and TRIM should work (based on your comment that there are "just white spaces and digits"):
select TRIM(SUBSTR(s,1, INSTR(s,';')-1)) FROM t;
The following, using regexp_substr(), should work (TRIM removes the space before the semicolon):
SELECT s, TRIM(REGEXP_SUBSTR(s, '^[^;]*')) FROM t;
Make sure you try all possible values in that first position, even those you don't expect, and make sure they are handled the way you want. Always expect the unexpected! The regex below matches the first group of zero or more digits (so a NULL can be returned) that is followed by an optional space and then a semicolon, or by the end of the string. You may need to tighten (or loosen) the matching rules for your situation; just make sure to test incorrect values too, especially if the input comes from user-entered data.
with tbl(id, str) as (
select 1, '135 ;1111776698 ;AB555678765' from dual union all
select 2, ' 135 ;1111776698 ;AB555678765' from dual union all
select 3, '135;1111776698 ;AB555678765' from dual union all
select 4, ';1111776698 ;AB555678765' from dual union all
select 5, ';135 ;1111776698 ;AB555678765' from dual union all
select 6, ';;1111776698 ;AB555678765' from dual union all
select 7, 'xx135 ;1111776698 ;AB555678765' from dual union all
select 8, '135;1111776698 ;AB555678765' from dual union all
select 9, '135xx;1111776698 ;AB555678765' from dual
)
select id, regexp_substr(str, '(\d*?)( ?;|$)', 1, 1, NULL, 1) element_1
from tbl
order by id;
ID ELEMENT_1
---------- ------------------------------
1 135
2 135
3 135
4
5
6
7 135
8 135
9
9 rows selected.
To get the desired result, you can use REGEXP_SUBSTR, which extracts the desired substring from the string you give it. Here is an example query.
Solution to your example data:
SELECT REGEXP_SUBSTR('135 ;1111776698 ;AB555678765','[^;]+',1,1) FROM DUAL;
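What it does: the pattern [^;]+ matches runs of characters other than ;, so the string is effectively split on the ; separator. You need the first occurrence, so the arguments are 1,1 (start at position 1, return the 1st occurrence). Note that the result keeps the space before the ;, so wrap it in TRIM if that matters.
If you need the second value, 1111776698, use 1,2 instead.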
The syntax for Regexp_substr is as following:
REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, sub_expression ] ] ] ] )
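Building on the occurrence argument, this sketch lists every ;-separated element of the sample string by letting CONNECT BY generate the occurrence numbers; TRIM removes the spaces that sit before each ; in this data. One caveat: [^;]+ skips empty elements, so for input such as ';;x' the occurrence numbers no longer line up with field positions.

```sql
select level as occurrence,
       trim(regexp_substr('135 ;1111776698 ;AB555678765', '[^;]+', 1, level)) as element
from dual
-- number of elements = number of separators + 1
connect by level <= regexp_count('135 ;1111776698 ;AB555678765', ';') + 1;
```

For this string the three rows are 135, 1111776698 and AB555678765.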
Here is the link for more examples:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
Let me know if this works for you. Good luck.

How to extract e-mail from string

I'd like to extract an e-mail address from a string.
I have the string abc defg email#email.com and I would like to get email#email.com.
How could I do it in PL/SQL?
Something like this will work for many situations, but it is far from perfect. I added one string that demonstrates two different ways in which it may fail; you will notice them. It will not be easy to write a query that catches ALL possible situations; how far you take the further refinement of the "match pattern" depends on how out-of-the-ordinary the emails in your input data may be.
In the regular expression, note that the dot (.) must be escaped with a backslash, and within matching lists (lists of characters in square brackets) the hyphen - must be either the first or the last character in the list; anywhere else it is a metacharacter that denotes a range.
In the output, notice the last row; the input string is empty, so the output is null as well.
with
input_strings ( str ) as (
select 'sdss abc#gmail.com sdsda sdsds ' from dual union all
select 'pele#1-futbol.br may not work' from dual union all
select 'sql#oracle.com, sam#att.net,solo#violin.com' from dual union all
select '' from dual union all
select 'this string contains no email addresses' from dual union all
select '-this:email#address.illegal_domain' from dual union all
select 'alpha#123.34.23.1 talk#radio#mike.com' from dual
)
select str as original_string,
level as idx,
regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
as email_address
from input_strings
connect by regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
is not null
and prior str = str
and prior sys_guid() is not null
;
ORIGINAL_STRING IDX EMAIL_ADDRESS
------------------------------------------- ---------- --------------------------------
-this:email#address.illegal_domain 1 email#address.illegal_domain
alpha#123.34.23.1 talk#radio#mike.com 1 alpha#123.34
alpha#123.34.23.1 talk#radio#mike.com 2 radio#mike.com
pele#1-futbol.br may not work 1 pele#1-futbol.br
sdss abc#gmail.com sdsda sdsds 1 abc#gmail.com
sql#oracle.com, sam#att.net,solo#violin.com 1 sql#oracle.com
sql#oracle.com, sam#att.net,solo#violin.com 2 sam#att.net
sql#oracle.com, sam#att.net,solo#violin.com 3 solo#violin.com
this string contains no email addresses 1
1
10 rows selected.
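Another way to bound the recursion, as in the CONNECT BY answer to the "DL:" question, is to count the matches up front with REGEXP_COUNT (Oracle 11g and later). This sketch reuses the answer's pattern on three of its sample strings; like the original, it still returns one row with a NULL address for a string that contains none. `prior str = str` assumes the strings are unique; with a real key column, compare the key instead.

```sql
with input_strings (str) as (
  select 'sdss abc#gmail.com sdsda sdsds '             from dual union all
  select 'sql#oracle.com, sam#att.net,solo#violin.com' from dual union all
  select 'this string contains no email addresses'     from dual
)
select str as original_string,
       level as idx,
       regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
         as email_address
from input_strings
-- one level per address found in the current string
connect by level <= regexp_count(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+')
       and prior str = str
       and prior sys_guid() is not null;
```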
Try this regular expression:
select regexp_substr ('sdss abc#gmail.com sdsda sdsds ','[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') email from dual

"Truncate" String Column in another column?

I have a table with a column containing string values. The string values always end with the letter "T", followed by a space and a number:
StringColumn
"asdjadhasdT 32 asjashudT 2"
"tytweytwe aweriuhfT 23"
"ajkjsdT 6 asdajkdjkjT 1445"
"kjkasd aaassT 980"
I would like to get the number in another column.
In other words:
| StringColumn | ColumnValues |
|---|---|
| "asdjadhasdT 32 asjashudT 2" | 2 |
| "tytweytwe aweriuhfT 23" | 23 |
| "ajkjsdT 6 asdajkdjkjT 1445" | 1445 |
| "kjkasd aaassT 980" | 980 |
It looks like you also have a space after the 'T'. Here is one approach:
select StringColumn, substr(StringColumn, 2 - instr(reverse(StringColumn), 'T')) as Values
from . . .
This finds the position of 'T' in the reversed string, and then takes that many characters minus two from the end of the string.
EDIT:
with t as (
select 'asdjadhasdT 32 asjashudT 2' as StringColumn from dual union all
select 'tytweytwe aweriuhfT 23' as StringColumn from dual union all
select 'ajkjsdT 6 asdajkdjkjT 1445' as StringColumn from dual union all
select 'kjkasd aaassT 980' as StringColumn from dual
)
select StringColumn,
substr(StringColumn, 2-instr(reverse(StringColumn), 'T')) as "Values"
from t;
The problem with the first version is that Values is a reserved word in Oracle, so the query fails to compile.
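Since the number is always the last thing in the string, a hedged alternative that avoids REVERSE (which is not listed among Oracle's documented SQL functions) is to anchor a run of digits at the end of the string. This is a sketch against the same sample rows, not the answerer's method.

```sql
with t as (
  select 'asdjadhasdT 32 asjashudT 2' as StringColumn from dual union all
  select 'tytweytwe aweriuhfT 23'     as StringColumn from dual union all
  select 'ajkjsdT 6 asdajkdjkjT 1445' as StringColumn from dual union all
  select 'kjkasd aaassT 980'          as StringColumn from dual
)
select StringColumn,
       -- \d+$ : one or more digits immediately before the end of the string
       regexp_substr(StringColumn, '\d+$') as "Values"
from t;
```

The result is a string (2, 23, 1445, 980); wrap it in TO_NUMBER if you need a numeric column. Trailing spaces after the number would make it return NULL, so trim the input first if that can happen.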
I took the input strings provided, got the position of the last occurrence of 'T', and then took a substring. Below is the query:
SELECT StringColumn,Trim(SubStr(StringColumn,INSTR(StringColumn,'T',-1 )+1)) FROM test;
Thanks,
Koustubh Avadhani