How to extract e-mail from string - sql

I'd like to extract an e-mail address from a string.
I have the string abc defg email#email.com and I would like to get the string email#email.com.
How can I do it in PL/SQL?

Something like this will work for many situations, but it is far from perfect. I added one string (the last input) that demonstrates two different ways in which this may fail; you will notice them in the output. It will not be easy to write a query that catches ALL possible situations; how far you take the further refinement of the "match pattern" depends on how out-of-the-ordinary the emails in your input data may be.
In the regular expression, note that the dot (.) must be escaped with a backslash, and within matching lists (lists of characters in square brackets) the hyphen (-) must be either the first or the last character in the list; anywhere else it is a metacharacter that denotes a range.
In the output, notice the last row; the input string is empty, so the output is null as well.
with
input_strings ( str ) as (
select 'sdss abc#gmail.com sdsda sdsds ' from dual union all
select 'pele#1-futbol.br may not work' from dual union all
select 'sql#oracle.com, sam#att.net,solo#violin.com' from dual union all
select '' from dual union all
select 'this string contains no email addresses' from dual union all
select '-this:email#address.illegal_domain' from dual union all
select 'alpha#123.34.23.1 talk#radio#mike.com' from dual
)
select str as original_string,
level as idx,
regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
as email_address
from input_strings
connect by regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
is not null
and prior str = str
and prior sys_guid() is not null
;
ORIGINAL_STRING IDX EMAIL_ADDRESS
------------------------------------------- ---------- --------------------------------
-this:email#address.illegal_domain 1 email#address.illegal_domain
alpha#123.34.23.1 talk#radio#mike.com 1 alpha#123.34
alpha#123.34.23.1 talk#radio#mike.com 2 radio#mike.com
pele#1-futbol.br may not work 1 pele#1-futbol.br
sdss abc#gmail.com sdsda sdsds 1 abc#gmail.com
sql#oracle.com, sam#att.net,solo#violin.com 1 sql#oracle.com
sql#oracle.com, sam#att.net,solo#violin.com 2 sam#att.net
sql#oracle.com, sam#att.net,solo#violin.com 3 solo#violin.com
this string contains no email addresses 1
1
10 rows selected.

Try this regular expression:
select regexp_substr ('sdss abc#gmail.com sdsda sdsds ','[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') email from dual
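The one-liner above returns only the first match. As a minimal sketch, assuming Oracle 11g or later (for REGEXP_COUNT) and a single input string, the same pattern can be combined with CONNECT BY LEVEL to list every match. The sample string is taken from the first answer; the column aliases are illustrative only.

```sql
-- Sketch: list every match of the pattern in ONE input string.
-- The '#' stands in for the at-sign, as in the question.
select level as idx,
       regexp_substr(s, '[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', 1, level) as email
from (select 'sql#oracle.com, sam#att.net,solo#violin.com' as s from dual)
connect by level <= regexp_count(s, '[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}');
```

Keep in mind that {2,4} limits the last part of the domain to two to four letters, so longer top-level domains would be cut short; widen the interval if your data needs it.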

Related

Extract a number from column which contains a string

So I have a function that returns a combination of strings (multiple values). I need to extract everything that follows the string "DL:", and only that.
So before extraction:
**pck_import.GETdocnumber(XML_DATA)**
________________________________________
DL:2212200090001 Pr:8222046017
________________________________________
Obj:020220215541 DL:1099089729
________________________________________
DL:DST22017260
________________________________________
DL:22122000123964 Pr:8222062485
________________________________________
DL:22122000108599
________________________________________
Obj:0202200015539 DL:2100001688
In every case, I need the "number" after "DL:". "DL:" can be the only value, the first of several values, or the last one. In some cases, the "DL:" value contains letters, too.
So, output:
**OUTPUT**
______________
2212200090001
______________
1099089729
______________
DST22017260
______________
22122000123964
______________
22122000108599
______________
2100001688
I tried:
substr(pck_import.GETdocnumber(XML_DATA),
instr(pck_import.GETdocnumber(XML_DATA),
'DL:') + 3)
That returns the "Pr:" part, too.
with s as (
select 'DL:2212200090001 Pr:8222046017' str from dual union all
select 'Obj:020220215541 DL:1099089729' str from dual union all
select 'DL:DST22017260' str from dual union all
select 'DL:22122000123964 Pr:8222062485' str from dual union all
select 'DL:22122000108599' str from dual union all
select 'Obj:0202200015539 DL:2100001688' str from dual)
select str, regexp_substr(str, 'DL:(\S+)', 1, 1, null, 1) rs
from s;
STR RS
------------------------------- -------------------------------
DL:2212200090001 Pr:8222046017 2212200090001
Obj:020220215541 DL:1099089729 1099089729
DL:DST22017260 DST22017260
DL:22122000123964 Pr:8222062485 22122000123964
DL:22122000108599 22122000108599
Obj:0202200015539 DL:2100001688 2100001688
6 rows selected
Something like this?
Sample data:
SQL> with test (col) as
(select
'________________________________________
DL:2212200090001 Pr:8222046017
________________________________________
Obj:020220215541 DL:1099089729
________________________________________
DL:DST22017260
________________________________________
DL:22122000123964 Pr:8222062485
________________________________________
DL:22122000108599
________________________________________
Obj:0202200015539 DL:2100001688'
from dual)
--
Query:
select replace(regexp_substr(col, 'DL:\w+', 1, level), 'DL:') result
from test
connect by level <= regexp_count(col, 'DL:');
RESULT
--------------------------------------------------------------------------------
2212200090001
1099089729
DST22017260
22122000123964
22122000108599
2100001688
6 rows selected.
SQL>
(note that query might need to be modified if you'll be dealing with more than a single row of data)
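For more than one row, one possible adaptation is sketched below. It borrows the "prior ... and prior sys_guid() is not null" technique that is commonly used to keep CONNECT BY from cross-joining rows. The id column and the two sample rows are made up for illustration, and the capture group replaces the REPLACE call.

```sql
-- Sketch for multi-row input; "id" is a hypothetical key column.
with test (id, col) as (
  select 1, 'DL:2212200090001 Pr:8222046017' from dual union all
  select 2, 'Obj:020220215541 DL:1099089729 DL:DST22017260' from dual
)
select id,
       level as idx,
       -- 6th argument = 1: return only what the parentheses captured
       regexp_substr(col, 'DL:(\w+)', 1, level, null, 1) as result
from test
connect by level <= regexp_count(col, 'DL:\w+')
       and prior id = id                 -- stay within the same row
       and prior sys_guid() is not null  -- avoids a cycle error
order by id, idx;
```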
In regex engines that support lookaround, you could achieve this with a positive lookbehind and lookahead.
The regex (?<=DL:)\d*(?=\s) matches all digits after DL: up to a whitespace character. Note that it requires trailing whitespace, so it would not match a DL: value at the very end of the string.
Oracle's regular expression functions do not support lookbehind or lookahead, however. With REGEXP_SUBSTR (as you tagged this question with Oracle SQL), use a capture group and return it through the sub-expression argument instead:
SELECT
REGEXP_SUBSTR(my_column,
'DL:(\d+)', 1, 1, NULL, 1) "DL field"
FROM my_table;
If you want to match substrings like DST22017260 as well, use \S (any non-whitespace character) instead of \d: DL:(\S+).

We need to mask data for the String up to fixed length in Oracle

I am trying to mask the data in the string below:
This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .
In the above data, I want to mask 60 characters, starting from ADHAR NUMBER.
OUTPUT :
This is the new *********************************************************Customer Name like 345678 to a String .
Can anyone please help?
A little bit of substr + instr does the job (sample data in the first 2 lines; query begins at line #3):
SQL> with test (col) as
2 (select 'This is the new ADHAR NUMBER 123456789989 this is the string 3456798983 from Customer Name like 345678 to a String .' from dual)
3 select substr(col, 1, instr(col, 'ADHAR NUMBER') - 1) ||
4 lpad('*', 60, '*') ||
5 substr(col, instr(col, 'ADHAR NUMBER') + 60) result
6 from test;
RESULT
--------------------------------------------------------------------------------
This is the new ************************************************************ Customer Name like 345678 to a String .
SQL>
Here is a solution that covers all possibilities (I think). Notice the different inputs in the WITH clause (which is not part of the solution - remove it, and use your actual table and column names in the query). This is how one should test their solutions - consider all possible cases, including NULL input, non-NULL input string that doesn't contain the "magic words", string that has the "magic words" right at the beginning, etc.
There is one important situation the solution does NOT address, namely when the exact substring 'ADHAR NUMBER' is not two full words, but it is part of longer words - for example 'BHADHAR NUMBERS'. In this case the output will look like 'BH****************' masking ADHAR NUMBER and the S after NUMBER and more characters, up to 60 total.
Note that the output string has the same length as the input. This is generally part of the definition of "masking".
with
test (col) as (
select 'This is the new ADHAR NUMBER 123456789989 this is the string ' ||
'3456798983 from Customer Name like 345678 to a String.'
from dual union all
select 'This string does not contain the magic words' from dual union all
select 'ADHAR NUMBER 12345' from dual union all
select 'Blah blah ADHAR NUMBER 1234' from dual union all
select null from dual union all
select 'Another blah ADHAR NUMBER' from dual
)
select case when pos > 0
then
substr(col, 1, pos - 1) ||
rpad('*', least(60, length(col) - pos + 1), '*') ||
substr(col, pos + 60)
else col end as masked
from (
select col, instr(col, 'ADHAR NUMBER') as pos
from test
)
;
MASKED
-------------------------------------------------------------------------------------------------------------------
This is the new ************************************************************ Customer Name like 345678 to a String.
This string does not contain the magic words
******************
Blah blah *****************
Another blah ************
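The whole-word case that the answer leaves open could be handled along the following lines. This is only a sketch: it assumes Oracle 11g or later, where REGEXP_INSTR accepts a sub-expression argument, and it treats any non-word character (or the start or end of the string) as a word boundary. The two sample rows are illustrative.

```sql
-- Sketch: only mask when 'ADHAR NUMBER' appears as whole words.
with test (col) as (
  select 'Blah blah ADHAR NUMBER 1234' from dual union all
  select 'BHADHAR NUMBERS 1234' from dual
)
select case when pos > 0
            then substr(col, 1, pos - 1) ||
                 rpad('*', least(60, length(col) - pos + 1), '*') ||
                 substr(col, pos + 60)
            else col
       end as masked
from (
  select col,
         -- 7th argument = 2: position of the second parenthesized group,
         -- i.e. of 'ADHAR NUMBER' itself, not of the preceding delimiter
         regexp_instr(col, '(^|\W)(ADHAR NUMBER)(\W|$)', 1, 1, 0, null, 2) as pos
  from test
);
```

With this pattern the second row has no match, so pos is 0 and the string is returned unchanged.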

Retrieve the characters before a matching pattern

135 ;1111776698 ;AB555678765
I have the above string and what I am looking for is to retrieve all the digits before the first occurrence of ;.
But the number of characters before the first occurrence of ; varies i.e. it may be a 4 digit number or 3 digit number.
I have played with regexp_instr and instr, but I am unable to figure this out.
The query should return all the digits before the first occurrence of ;
This answer assumes that you are using an Oracle database. I don't know of a way to do this using REGEXP_INSTR alone, but we can do it with REGEXP_REPLACE using capture groups:
SELECT REGEXP_REPLACE('135 ;1111776698 ;AB555678765', '^\s*(\d{3,4})\s*;.*', '\1')
FROM dual;
Here is the regex pattern being used:
^\s*(\d{3,4})\s*;.*
This allows, from the start of the string, any amount of leading whitespace, followed by a 3 or 4 digit number, followed again by any amount of whitespace, then a semicolon. The .* at the end of the pattern just consumes whatever remains in your string. Note (\d{3,4}), which captures the 3-4 digit number, which is then available in the replacement as \1.
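One thing to keep in mind with this approach: when the pattern does not match at all, REGEXP_REPLACE returns the input string unchanged. The sketch below compares it with a REGEXP_SUBSTR call that returns the capture group and yields NULL on a non-match. The second sample row and the column aliases are made up for illustration.

```sql
-- Sketch: behaviour on a matching and a non-matching input.
with t (s) as (
  select '135 ;1111776698 ;AB555678765' from dual union all
  select 'no digits here' from dual
)
select s,
       regexp_replace(s, '^\s*(\d{3,4})\s*;.*', '\1') as via_replace,
       -- 6th argument = 1: return only the captured digits
       regexp_substr(s, '^\s*(\d+)\s*;', 1, 1, null, 1) as via_substr
from t;
```

Which behaviour is preferable depends on whether you would rather see the original value or a NULL for unexpected input.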
Using INSTR, SUBSTR and TRIM should work (based on your comment that there are "just white spaces and digits"):
select TRIM(SUBSTR(s,1, INSTR(s,';')-1)) FROM t;
The following, using REGEXP_SUBSTR(), should work:
SELECT s, REGEXP_SUBSTR(s, '^[^;]*')
FROM t;
Make sure you try all possible values in that first position, even those you don't expect and make sure they are handled as you want them to be. Always expect the unexpected! This regex matches the first subgroup of zero or more optional digits (allows a NULL to be returned) when followed by an optional space then a semi-colon, or the end of the line. You may need to tighten (or loosen) up the matching rules for your situation, just make sure to test even for incorrect values, especially if the input comes from user-entered data.
with tbl(id, str) as (
select 1, '135 ;1111776698 ;AB555678765' from dual union all
select 2, ' 135 ;1111776698 ;AB555678765' from dual union all
select 3, '135;1111776698 ;AB555678765' from dual union all
select 4, ';1111776698 ;AB555678765' from dual union all
select 5, ';135 ;1111776698 ;AB555678765' from dual union all
select 6, ';;1111776698 ;AB555678765' from dual union all
select 7, 'xx135 ;1111776698 ;AB555678765' from dual union all
select 8, '135;1111776698 ;AB555678765' from dual union all
select 9, '135xx;1111776698 ;AB555678765' from dual
)
select id, regexp_substr(str, '(\d*?)( ?;|$)', 1, 1, NULL, 1) element_1
from tbl
order by id;
ID ELEMENT_1
---------- ------------------------------
1 135
2 135
3 135
4
5
6
7 135
8 135
9
9 rows selected.
To get the desired result, you should use REGEXP_SUBSTR, which returns the part of the string that matches a pattern. Here is an example query.
Solution to your example data:
SELECT REGEXP_SUBSTR('135 ;1111776698 ;AB555678765','[^;]+',1,1) FROM DUAL;
The pattern [^;]+ matches a run of characters that are not semicolons, which in effect splits the string on the ; separator. You need the first occurrence, so the start position and occurrence arguments are 1,1. Note that the result keeps the space before the semicolon; wrap the call in TRIM if you don't want it.
So if you need the second string, 1111776698, as your output, pass 1,2 instead.
The syntax for REGEXP_SUBSTR is as follows:
REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, sub_expression ] ] ] ] )
Here is the link for more examples:
https://www.techonthenet.com/oracle/functions/regexp_substr.php
Let me know if this works for you. Best luck.
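As a small sketch of the occurrence argument described above, the query below pulls the first and second fields from the sample string and trims the surrounding spaces. The column aliases are illustrative only.

```sql
-- Sketch: 4th argument selects which run of non-semicolon characters to return.
select trim(regexp_substr(s, '[^;]+', 1, 1)) as first_field,
       trim(regexp_substr(s, '[^;]+', 1, 2)) as second_field
from (select '135 ;1111776698 ;AB555678765' as s from dual);
```

Be aware that [^;]+ needs at least one character, so empty fields (for example in ';;1111776698') are skipped and the occurrence numbers no longer line up with the field positions.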

How to search for records with special characters which are not present in either English or Spanish alphabet?

I have a table which has records with special characters in a production environment for data correction. Now the DB can have data which contain either English or Spanish characters. So I need to find only those special characters which do not belong to either of these alphabets. For example, I can have data like the following:
Here the character Ñ is correct because it is a Spanish character, but the second one is not. The query I have written is the following, but it fetches all of the above and not only the second one.
select customerid,customername
from prodschema.prodtable
where not regexp_like(customername, '.*[^a-zA-Z0-9 .{}\[\]].*') and
customername like 'YOLANDA RIOS CAS%';
So what should be the correct query for this?
with t as
(
select 'YOLANDA RIOS CASTANO' str from dual
union all select 'YOLANDA RIOS CASTAÑO' str from dual
union all select 'YOLANDA RIOS CASTA°O' str from dual
)
select str,
length(regexp_replace(str, '[a-z[=n=] ]', null, 1, 0, 'i'))
as cnt_not_recognized_chars
from t;
STR CNT_NOT_RECOGNIZED_CHARS
-------------------- ------------------------
YOLANDA RIOS CASTANO
YOLANDA RIOS CASTAÑO
YOLANDA RIOS CASTA°O 1
3 rows selected.
Find additional details here http://docs.oracle.com/cd/E18283_01/server.112/e17118/ap_posix001.htm
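To return only the offending rows rather than a count, the same character list can be negated and used as a filter. This is a sketch under the assumption that the [=n=] equivalence class behaves in your session as it does in the output above; how equivalence classes and ranges match depends on the NLS settings, so test it against your own data.

```sql
-- Sketch: keep only rows containing a character outside the accepted list.
with t as
(
select 'YOLANDA RIOS CASTANO' str from dual
union all select 'YOLANDA RIOS CASTAÑO' str from dual
union all select 'YOLANDA RIOS CASTA°O' str from dual
)
select str
from t
-- [^...] negates the list; 'i' makes the match case-insensitive
where regexp_like(str, '[^a-z[=n=] ]', 'i');
```

Digits, punctuation and the accented vowels used in Spanish are not in this list; add them to the bracket expression if they are valid in your data.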
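Return the ASCII value for your column customername and filter on that. Note that ASCII() returns the code of the first character only, so this checks just the first character of each name. For the future, you could set the column to use an accent-sensitive collation.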
SELECT ASCII(CustomerName)
FROM prodschema.prodtable
WHERE ASCII(CustomerName) != Value

SQL How to extract numbers from a string?

I am working on a query in SQL that should be able to extract numbers of different/random length from the beginning of a text string.
Text string: 666 devils number is not 8888.
Text string: 12345 devils number is my PIN, that is 6666.
I want to get in a column
666
12345
Use a combination of SUBSTR and INSTR:
SELECT Substr (textstring, 1,instr(textstring,' ') - 1) AS Output
FROM yourtable
Result:
OUTPUT
666
12345
Use this if you have text at the beginning, e.g. aa12345 devils number is my PIN, that is 6666. It uses the REGEXP_REPLACE function to strip the letters:
SELECT REGEXP_REPLACE(Substr (textstring, 1,instr(textstring,' ') - 1), '[[:alpha:]]','') AS Output
FROM yourtable
SQL Fiddle: http://sqlfiddle.com/#!4/8edc9/1/0
This version utilizes a regular expression which gives you the first number whether or not it's preceded by text and does not use the ghastly nested instr/substr calls:
SQL> with tbl(data) as (
select '666 devils number is not 8888' from dual
union
select '12345 devils number is my PIN, that is 6666' from dual
union
select 'aa12345 devils number is my PIN, that is 6666' from dual
)
select regexp_substr(data, '^\D*(\d+) ', 1, 1, null, 1) first_nbr
from tbl;
FIRST_NBR
---------------------------------------------
12345
666
12345
SQL>
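The pattern above expects a space right after the number, so a string that consists of the number alone would return NULL. If that matters, a simpler variant is sketched below; it returns the first run of digits wherever it occurs. The third sample row is made up for illustration.

```sql
-- Sketch: first run of digits, with no requirement on what follows it.
with tbl (data) as (
  select '666 devils number is not 8888' from dual union all
  select 'aa12345 devils number is my PIN, that is 6666' from dual union all
  select '777' from dual
)
select data, regexp_substr(data, '\d+') as first_nbr
from tbl;
```

Unlike the anchored pattern, this also picks up a number that appears later in the text when the string does not start with one, so choose whichever matches your definition of "first number".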