SQL How to extract numbers from a string? - sql

I am working on a query in SQL that should be able to extract numbers on different/random lenght from the beginning of the text string.
Text string: 666 devils number is not 8888.
Text string: 12345 devils number is my PIN, that is 6666.
I want to get in a column
666
12345

Use a combination of Substr & instr
SELECT Substr (textstring, 1,instr(textstring,' ') - 1) AS Output
FROM yourtable
Result:
OUTPUT
666
12345
Use this if you have text at the beginning e.g. aa12345 devils number is my PIN, that is 6666. as it utilises the REGEXP_REPLACE function.
SELECT REGEXP_REPLACE(Substr (textstring, 1,instr(textstring,' ') - 1), '[[:alpha:]]','') AS Output
FROM yourtable
SQL Fiddle: http://sqlfiddle.com/#!4/8edc9/1/0

This version utilizes a regular expression which gives you the first number whether or not it's preceded by text and does not use the ghastly nested instr/substr calls:
SQL> with tbl(data) as (
select '666 devils number is not 8888' from dual
union
select '12345 devils number is my PIN, that is 6666' from dual
union
select 'aa12345 devils number is my PIN, that is 6666' from dual
)
select regexp_substr(data, '^\D*(\d+) ', 1, 1, null, 1) first_nbr
from tbl;
FIRST_NBR
---------------------------------------------
12345
666
12345
SQL>

Related

USING SQL . extract numbers comma separated from string 'HEADER|N1000|E1001|N1002|E1003|N1004|N1005'

'HEADER|N1000|E1001|N1002|E1003|N1004|N1005'
'HEADER|N156|E1|N7|E122|N4|E5'
'HEADER|E0|E1|E2|E3|E4|E5'
'HEADER|N0|N1|N2|N3|N4|N5'
'HEADER|N125'
How to extract the numbers in comma-separated format from this stringS?
Expected result:
1000,1001,1002,1003,1004,1005
How to extract the numbers with N or E as suffix/prefix ie.
N1000
Expected result:
1000,1002,1004,1005
below regex does not return the result needed. But I want some thing like this
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?(\d+)', '\1,'), ',?\.*$', '') from dual
the problem here is
when i want numbers with E OR N
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?N(\d+)', '\1,'), ',?\.*$', '') from dual
select REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005', '.*?E(\d+)', '\1,'), ',?\.*$', '') from dual
they give good results for this scenerio
but when i input 'HEADER|N1000|E1001' it gives wrong answer plzzz verify and correct it
Update
Based on the changes to the question, the original answer is not valid. Instead, the solution is considerably more complex, using a hierarchical query to extract all the numbers from the string and then LISTAGG to put back together a list of numbers extracted from each string. To extract all numbers we use this query:
WITH cte AS (
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, '[NE]\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, '[NE]\d+', 1, level) IS NOT NULL
)
SELECT data, LISTAGG(SUBSTR(num, 2), ',') WITHIN GROUP (ORDER BY l) AS "All numbers"
FROM cte
GROUP BY data
Output (for the new sample data):
DATA All numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5 0,1,2,3,4,5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1000,1001,1002,1003,1004,1005
HEADER|N125 125
HEADER|N156|E1|N7|E122|N4|E5 156,1,7,122,4,5
To select only numbers beginning with E, we modify the query to replace the [EN] in the REGEXP_SUBSTR expressions with just E i.e.
SELECT DISTINCT data, level AS l, REGEXP_SUBSTR(data, 'E\d+', 1, level) AS num FROM test
CONNECT BY REGEXP_SUBSTR(data, 'E\d+', 1, level) IS NOT NULL
Output:
DATA E-numbers
HEADER|E0|E1|E2|E3|E4|E5 0,1,2,3,4,5
HEADER|N0|N1|N2|N3|N4|N5
HEADER|N1000|E1001|N1002|E1003|N1004|N1005 1001,1003
HEADER|N125
HEADER|N156|E1|N7|E122|N4|E5 1,122,5
A similar change can be made to extract numbers commencing with N.
Demo on dbfiddle
Original Answer
One way to achieve your desired result is to replace a string of characters leading up to a number with that number and a comma, and then replace any characters from the last ,| to the end of string from the result:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1001,1002,1003,1004,1005
To only output the numbers beginning with N, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?N(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1000,1002,1004,1005
To only output the numbers beginning with E, we add that to the prefix string before the capture group:
SELECT REGEXP_REPLACE(REGEXP_REPLACE('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '.*?E(\d+)', '\1,'), ',?\|.*$', '') FROM dual
Output:
1001,1003
Demo on dbfiddle
I don't know what DBMS you are using, but here's one way to do it in Postgres:
WITH cte AS (
SELECT CAST('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|' AS VARCHAR(1000)) AS myValue
)
SELECT SUBSTRING(MyVal FROM 2)
FROM (
SELECT REGEXP_SPLIT_TO_TABLE(myValue,'\|') MyVal
FROM cte
) src
WHERE SUBSTRING(MyVal FROM 1 FOR 1) = 'N'
;
SQL Fiddle
As Far as I have understood the question , you want to extract substrings starting with N from the string, You can try following (And then you can merge the output seperated by commas if needed)
select REPLACE(value, 'N', '')
from STRING_SPLIT('HEADER|N1000|E1001|N1002|E1003|N1004|N1005|', '|')
where value like 'N%'
OutPut :
1000
1002
1004
1005

How to remove only letters from a string in BigQuery?

So I'm working with BigQuery SQL right now trying to figure out how to remove letters but keep numeric numbers. For example:
XXXX123456
AAAA123456789
XYZR12345678
ABCD1234567
1111
2222
All have the same amount of letters in front of the numbers along with regular numbers no letters. I want the end result to look like:
123456
123456789
12345678
1234567
1111
2222
I tried using PATINDEX but BigQuery doesn't support the function. I've also tried using LEFT but that function will get rid of any value and I don't want to get rid of any numeric value only letter values. Any help would be much appreciated!
-Maykid
You can use regexp_replace():
select regexp_replace(str, '[^0-9]', '')
Below example is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'XXXX123456' str UNION ALL
SELECT 'AAAA123456789' UNION ALL
SELECT 'XYZR12345678' UNION ALL
SELECT 'ABCD1234567' UNION ALL
SELECT '1111' UNION ALL
SELECT '2222'
)
SELECT str, REGEXP_REPLACE(str, r'[a-zA-Z]', '') str_adjusted
FROM `project.dataset.table`

Oracle SQL query to convert a string into a comma separated string with comma after every n characters

How can we convert a string of any length into a comma separated string with comma after every n characters. I am using Oracle 10g and above. I tried with REGEXP_SUBSTR but couldn't get desired result.
e.g.: for below string comma after every 5 characters.
input:
aaaaabbbbbcccccdddddeeeeefffff
output:
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
or
aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff
Thanks in advance.
This can be done with regexp_replace, like so:
WITH sample_data AS (SELECT 'aaaaabbbbbcccccdddddeeeeefffff' str FROM dual UNION ALL
SELECT 'aaaa' str FROM dual UNION ALL
SELECT 'aaaaabb' str FROM dual)
SELECT str,
regexp_replace(str, '(.{5})', '\1,')
FROM sample_data;
STR REGEXP_REPLACE(STR,'(.{5})','\
------------------------------ --------------------------------------------------------------------------------
aaaaabbbbbcccccdddddeeeeefffff aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,
aaaa aaaa
aaaaabb aaaaa,bb
The regexp_replace simply looks for any 5 characters (.{5}), and then replaces them with the same 5 characters plus a comma. The brackets around the .{5} turn it into a labelled subexpression - \1, since it's the first set of brackets - which we can then use to represent our 5 characters in the replacement section.
You would then need to trim the extra comma off the resultant string, if necessary.
SELECT RTRIM ( REGEXP_REPLACE('aaaaabbbbbcccccdddddeeeeefffff', '(.{5})' ,'\1,') ,',') replaced
FROM DUAL;
This worked for me:
WITH strlen AS
(
SELECT 'aaaaabbbbbcccccdddddeeeeefffffggggg' AS input,
LENGTH('aaaaabbbbbcccccdddddeeeeefffffggggg') AS LEN,
5 AS part
FROM dual
)
,
pattern AS
(
SELECT regexp_substr(strlen.input, '[[:alnum:]]{5}', 1, LEVEL)
||',' AS line
FROM strlen,
dual
CONNECT BY LEVEL <= strlen.len / strlen.part
)
SELECT rtrim(listagg(line, '') WITHIN GROUP (
ORDER BY 1), ',') AS big_bang$
FROM pattern ;

How to extract e-mail from string

I'd like to extract e-mail from string.
I have the string abc defg email#email.com and I would like to get the string email#email.com.
How could I do it in PL / SQL?
Something like this will work for many situations, but is far from perfect. I added one string that demonstrates two different ways in which this may fail, you will notice them. It will not be easy to write a query that catches ALL possible situations; how far you take the further refinement of the "match pattern" depends on how out-of-the-ordinary the emails in your input data may be.
In the regular expression, note that the dot (.) must be escaped with a backslash, and within matching lists (lists of characters in square brackets) the hyphen - must be either the first or the last characters in the list, anywhere else it is a metacharacter.
In the output, notice the last row; the input string is empty, so the output is null as well.
with
input_strings ( str ) as (
select 'sdss abc#gmail.com sdsda sdsds ' from dual union all
select 'pele#1-futbol.br may not work' from dual union all
select 'sql#oracle.com, sam#att.net,solo#violin.com' from dual union all
select '' from dual union all
select 'this string contains no email addresses' from dual union all
select '-this:email#address.illegal_domain' from dual union all
select 'alpha#123.34.23.1 talk#radio#mike.com' from dual
)
select str as original_string,
level as idx,
regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
as email_address
from input_strings
connect by regexp_substr(str, '[[:alnum:]_-]+#[[:alnum:]_-]+\.[[:alnum:]_-]+', 1, level)
is not null
and prior str = str
and prior sys_guid() is not null
;
ORIGINAL_STRING IDX EMAIL_ADDRESS
------------------------------------------- ---------- --------------------------------
-this:email#address.illegal_domain 1 email#address.illegal_domain
alpha#123.34.23.1 talk#radio#mike.com 1 alpha#123.34
alpha#123.34.23.1 talk#radio#mike.com 2 radio#mike.com
pele#1-futbol.br may not work 1 pele#1-futbol.br
sdss abc#gmail.com sdsda sdsds 1 abc#gmail.com
sql#oracle.com, sam#att.net,solo#violin.com 1 sql#oracle.com
sql#oracle.com, sam#att.net,solo#violin.com 2 sam#att.net
sql#oracle.com, sam#att.net,solo#violin.com 3 solo#violin.com
this string contains no email addresses 1
1
10 rows selected.
Try this (regular expression) :
select regexp_substr ('sdss abc#gmail.com sdsda sdsds ','[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') email from dual

PL SQL - Trimming a string proceedurally

I need a way to trim a string in PL/SQL based on the location of the last commas in the string. However, there is no uniform format for the incoming strings, and I can't find a way to trim the string effectively.
HU-15-02 | HU, NYI, HAA East (should be trimmed to just HAA East)
MX-01-05 | MX, 01-05, OFFICES (OFFICES)
DK-94-02 | DK, ViewCom (VIEWCOM)
the format is country code, followed by a building ID (if applicable), followed by the name of the building (which is what I want)
Get the location of last comma by counting back from the end of the string and then trim from the comma plus space forward
select substr(your_text,INSTR(your_text,',',-1) +2)
from your_table;
REGEXP_SUBSTR() to the rescue:
SQL> with tbl(str) as (
select 'HU, NYI, HAA East' from dual
union
select 'MX-01-05 | MX, 01-05, OFFICES' from dual
union
select 'DK-94-02 | DK, ViewCom' from dual
)
select regexp_substr(str, '^.*, (.*)$', 1, 1, null, 1) bldg_name
from tbl;
BLDG_NAME
-----------------------------
ViewCom
HAA East
OFFICES
SQL>