SQL SELECT with bad data

If a column has bad data such as:
45612345698
(456)123-7452
125-145-9856
Without fixing the data, is it possible to write a SQL query for 1251459856 that returns the 3rd item in the column?

Hmmm . . . you could use replace():
where replace(replace(replace(col, '-', ''), '(', ''), ')', '') = '1251459856'
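The nested replace() filter above can be tried outside Oracle. This is a minimal sketch using Python's built-in sqlite3 module, whose replace() function behaves the same way for this case; the table name `phones` and the helper `find_by_digits` are made up for illustration.

```python
import sqlite3

SAMPLE = ["45612345698", "(456)123-7452", "125-145-9856"]

def find_by_digits(rows, wanted):
    """Return the stored values whose digits match `wanted`."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the real one.
    conn.execute("CREATE TABLE phones (col TEXT)")
    conn.executemany("INSERT INTO phones VALUES (?)", [(r,) for r in rows])
    sql = """
        SELECT col FROM phones
        WHERE replace(replace(replace(col, '-', ''), '(', ''), ')', '') = ?
    """
    found = [row[0] for row in conn.execute(sql, (wanted,))]
    conn.close()
    return found

print(find_by_digits(SAMPLE, "1251459856"))  # ['125-145-9856']
```

The stored value is returned untouched; only the comparison sees the cleaned string.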

If your data is worse than just "-", ")" and "(", you could go for a more generic solution and strip every non-numeric character with the following:
WITH sample_data_tab (str) AS
(
SELECT '45612345698' FROM DUAL UNION
SELECT '(456)123-7452' FROM DUAL UNION
SELECT '125-145-9856' FROM DUAL UNION
SELECT '989 145 9856' FROM DUAL
)
SELECT regexp_replace(str, '[^0-9]', '') FROM sample_data_tab
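To check the '[^0-9]' pattern without a database, here is a rough Python equivalent using the standard `re` module. The helper names `digits_only` and `rows_matching` are illustrative, not part of the answer above.

```python
import re

SAMPLE = ["45612345698", "(456)123-7452", "125-145-9856", "989 145 9856"]

def digits_only(value):
    """Drop every character that is not 0-9, like regexp_replace(str, '[^0-9]', '')."""
    return re.sub(r"[^0-9]", "", value)

def rows_matching(rows, wanted):
    """Return the original values whose digits equal `wanted`."""
    return [r for r in rows if digits_only(r) == wanted]

print([digits_only(s) for s in SAMPLE])
print(rows_matching(SAMPLE, "9891459856"))  # ['989 145 9856']
```

In SQL the same comparison would sit in the WHERE clause, with the regexp_replace() expression on the left-hand side.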

Related

How can I do a replace with values from a table

This is my current code. Rather than hard-coding these replacements, I want to put the values in a table and use them to do the replace, without a WHILE loop or a cursor. Keep in mind that multiple replacements may apply to the same field: for instance, 'Mr. Guy' needs the "." removed and then the "Mr " removed as well.
SELECT
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(di.FirstName), '.', ''), ',', ''), 'Mr ', ''), 'Dr ', ''), 'Mrs ', ''), 'Ms', '')
FROM core..asdf di
If your DBMS supports both GROUP_CONCAT() (or an equivalent, like LISTAGG() in Vertica) and REGEXP_REPLACE(), you can:
create an in-line table with the titles you want to remove;
group-concatenate that in-line table into a single bar-separated string;
surround that bar-separated list with round parentheses, then append '\b' for a word boundary, '\.?' for zero or one literal dot (not "any character"), and '\s*' for zero or more whitespace characters;
and finally use the regular expression you just built in a REGEXP_REPLACE() call.
WITH
indata(fname) AS (
SELECT 'Mr Arthur'
UNION ALL SELECT 'Mrs Tricia'
UNION ALL SELECT 'Ms Eccentrica'
UNION ALL SELECT 'Dr Gag'
UNION ALL SELECT 'Mr. Arthur'
UNION ALL SELECT 'Mrs. Tricia'
UNION ALL SELECT 'Ms. Eccentrica'
UNION ALL SELECT 'Dr. Gag'
)
,
titles(title) AS (
SELECT 'Mr'
UNION ALL SELECT 'Mrs'
UNION ALL SELECT 'Ms'
UNION ALL SELECT 'Dr'
)
,
regx(regx) AS (
SELECT
'('||LISTAGG(title USING PARAMETERS separator='|')||')\b\.?\s*'
-- or GROUP_CONCAT() with a '|' separator in other DBMSs ...
FROM titles
)
-- control query ...
-- SELECT * FROM regx;
-- out regx
-- out ----------------------
-- out (Mr|Mrs|Ms|Dr)\b\.?\s*
SELECT
REGEXP_REPLACE(fname,regx) AS fname
FROM indata CROSS JOIN regx;
-- out fname
-- out ------------
-- out Arthur
-- out Tricia
-- out Eccentrica
-- out Gag
-- out Arthur
-- out Tricia
-- out Eccentrica
-- out Gag
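The pattern-building step can be sketched in Python to see why the '\b' matters: without it, 'Mr' would match the start of 'Mrs' and leave a stray 's'. The list `TITLES` stands in for the titles table, and the function names are illustrative only.

```python
import re

# Stand-in for the rows of the titles table.
TITLES = ["Mr", "Mrs", "Ms", "Dr"]

def build_title_pattern(titles):
    """Join the titles with '|' and add the word boundary, optional dot and spaces."""
    alternation = "|".join(re.escape(t) for t in titles)
    return re.compile("(" + alternation + r")\b\.?\s*")

def strip_titles(name, pattern):
    """Remove every match of the title pattern from the name."""
    return pattern.sub("", name)

pattern = build_title_pattern(TITLES)
print(pattern.pattern)                       # (Mr|Mrs|Ms|Dr)\b\.?\s*
print(strip_titles("Mrs. Tricia", pattern))  # Tricia
```

Adding a title then only means adding a row to the table; the regular expression is rebuilt on each run.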

Substring from underscore and onwards in Oracle

I have a string containing an underscore and some characters. I need to apply a substring to get the part to the left of the underscore, excluding the underscore itself. The formula below works correctly for strings that contain an underscore (_), but for strings without one it returns NULL. Any suggestions on how this can be handled within the substring itself?
Ex: ABC_BAS ---> works correctly; ABC ---> gives NULL
My formula is below:
select SUBSTR('ABC_BAS',1,INSTR('ABC_BAS','_')-1) from dual;
ABC
select SUBSTR('ABC',1,INSTR('ABC','_')-1) from dual;
(NULL)
You could use a CASE expression to first check for an underscore:
WITH yourTable AS (
SELECT 'ABC_BAS' AS col FROM dual UNION ALL
SELECT 'ABC' FROM dual
)
SELECT
CASE WHEN col LIKE '%\_%' ESCAPE '\'
THEN SUBSTR(col, 1, INSTR(col, '_') - 1)
ELSE col END AS col_out
FROM yourTable;
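Use regular expression matching with a lazy quantifier, so that the first capture group stops at the first underscore or at the end of the string:
SELECT REGEXP_SUBSTR('ABC_BAS', '^(.*?)(_|$)', 1, 1, NULL, 1) FROM DUAL;
returns 'ABC', and
SELECT REGEXP_SUBSTR('ABC', '^(.*?)(_|$)', 1, 1, NULL, 1) FROM DUAL;
also returns 'ABC'.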
EDIT
The above gives correct results, but I missed the easiest possible regular expression to do the job:
SELECT REGEXP_SUBSTR('ABC_BAS', '[^_]*') FROM DUAL;
returns 'ABC', as does
SELECT REGEXP_SUBSTR('ABC', '[^_]*') FROM DUAL;
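The '[^_]*' idea is easy to verify in Python, whose `re` module treats this pattern the same way. A small sketch, with `before_underscore` as a made-up helper name:

```python
import re

def before_underscore(value):
    """Return the leading run of characters that are not underscores."""
    # '[^_]*' always matches at position 0, possibly with an empty result.
    return re.match(r"[^_]*", value).group(0)

print(before_underscore("ABC_BAS"))  # ABC
print(before_underscore("ABC"))      # ABC
```

One difference to keep in mind: for a string that starts with an underscore this returns an empty string, which Oracle would report as NULL.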
Yet another approach is to use the DECODE in the length parameter of the substr as follows:
substr(str,
1,
decode(instr(str,'_'), 0, length(str), instr(str,'_') - 1)
)
You seem to want everything up to the first '_'. If so, one method uses regexp_replace():
select regexp_replace(str, '(^[^_]+)_.*$', '\1')
from (select 'ABC' as str from dual union all
select 'ABC_BAS' from dual
) s
A simpler method is:
select regexp_substr(str, '^[^_]+')
from (select 'ABC' as str from dual union all
select 'ABC_BAS' from dual
) s
I'd use
regexp_replace(text,'_.*')
or if performance was a concern,
substr(text, 1, instr(text||'_', '_') -1)
For example,
with demo(text) as
( select column_value
from table(sys.dbms_debug_vc2coll('ABC', 'ABC_DEF', 'ABC_DEF_GHI')) )
select text
, regexp_replace(text,'_.*')
, substr(text, 1, instr(text||'_', '_') -1)
from demo;
TEXT         REGEXP_REPLACE(TEXT,'_.*')  SUBSTR(TEXT,1,INSTR(TEXT||'_','_')-1)
------------ --------------------------- -------------------------------------
ABC          ABC                         ABC
ABC_DEF      ABC                         ABC
ABC_DEF_GHI  ABC                         ABC
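The substr/instr variant works because appending '_' guarantees that instr() always finds a delimiter, so the length argument never drops to -1. That expression also runs unchanged in SQLite, so it can be checked with Python's sqlite3 module. A minimal sketch; the table `demo` mirrors the CTE name above and the function name is illustrative.

```python
import sqlite3

def prefixes_via_sql(values):
    """Run the sentinel-delimiter expression over each value, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (text TEXT)")
    conn.executemany("INSERT INTO demo VALUES (?)", [(v,) for v in values])
    sql = """
        SELECT substr(text, 1, instr(text || '_', '_') - 1)
        FROM demo
        ORDER BY rowid
    """
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result

print(prefixes_via_sql(["ABC", "ABC_DEF", "ABC_DEF_GHI"]))  # ['ABC', 'ABC', 'ABC']
```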
OK, I think I got it. Wrap the substring in NVL and supply the original string as the fallback, as below:
select nvl(substr('ABC',1,instr('ABC','_')-1),'ABC') from dual;

how to fetch specific number of characters before the pattern

I have values like the below in my table:
SER : 3-576509910214, 4182 5979WM
I need to remove the whitespace first, then fetch the 8 digits before or after the occurrence of the matching string 'WM'. For the value above, the output should be '41825979'. I need those 8 digits for each occurrence of 'WM'.
WM can occur anywhere in the string.
How can I do that with an Oracle SQL query?
This will return a string of digits up to 8 long from such a string:
select replace(regexp_substr(replace(str, ' ', ''), '[0-9]{1,8}WM'), 'WM', '')
If you want before and after, just modify the pattern:
select replace(regexp_substr(replace(str, ' ', ''), '[0-9]{1,8}WM|WM[0-9]{1,8}'), 'WM', '')
To take exactly 8 digits before or after WM (after spaces are removed), use the following:
WITH Demo(t) AS
(
SELECT 'SER : 3-576509910214, 4182 5979WM' FROM dual
UNION ALL
SELECT 'SER : 3-576509910214, 4182 5 979 WM' FROM dual
UNION ALL
SELECT 'SER : 3-576509910214,WM 4182 5979' FROM dual
)
SELECT
REPLACE(COALESCE(
REGEXP_SUBSTR(REPLACE(t, ' ', ''), '[0-9]{8}WM'),
REGEXP_SUBSTR(REPLACE(t, ' ', ''), 'WM[0-9]{8}')
), 'WM', '')
FROM Demo
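The COALESCE logic above (try "8 digits then WM" first, fall back to "WM then 8 digits") can be traced in Python. This is only a sketch of the same steps with the standard `re` module; `digits_near_wm` is a hypothetical helper name.

```python
import re

def digits_near_wm(text):
    """Return the 8 digits directly before WM, else the 8 directly after, else None."""
    compact = text.replace(" ", "")  # same as REPLACE(t, ' ', '')
    match = re.search(r"[0-9]{8}WM", compact) or re.search(r"WM[0-9]{8}", compact)
    if match is None:
        return None
    return match.group(0).replace("WM", "")

for sample in (
    "SER : 3-576509910214, 4182 5979WM",
    "SER : 3-576509910214, 4182 5 979 WM",
    "SER : 3-576509910214,WM 4182 5979",
):
    print(digits_near_wm(sample))  # 41825979 each time
```

Like the SQL, this returns only the first match; handling several WM markers in one value would need a loop over all matches.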

sql separators (DB2 but ORACLE can be too)

I need help with separators in SQL. I'm working on DB2, but an Oracle solution is also fine.
I need to build a query where the data is in the format: aaa.bbb.ccc.ddd#domain.com
where 'aaa', 'bbb', 'ccc' and 'ddd' do not have a constant length. The query should return bbb and ddd. In DB2 I can cut off '#domain.com', but it takes a really long line. I have no idea how to do the rest. I tried SUBSTR, but nothing worked as it should and my queries are very long.
I need a query, not a block.
EXAMPLE:
data in column:
John.W.Smith.JWS1#domain.com
Alexia.Nova.Alnov#domain.com
Martha.Heart.Martha2#domain.com
etc.
In general, I need the value between the 1st and 2nd '.' separators and the value directly in front of the '#'.
I'm sure someone will have some clever REGEX way of doing it, but this is one way to do it.
with test as
( select 'John.W.Smith.JWS1#domain.com' col1 from dual union all
select 'Alexia.Nova.Alnov#domain.com' from dual union all
select 'Martha.Heart.Martha2#domain.com' from dual
)
select col1
, substr( col1, 1, dot_one-1 ) f1
, substr( col1, dot_one+1, dot_two - dot_one -1 ) f2
, no_domain
, substr( no_domain, dot_before_at+1 ) f3
from
(
select col1
,instr( col1, '#', -1 ) at_pos
,instr( col1, '.',1,1) dot_one
,instr( col1, '.',1,2) dot_two
,substr( col1, 1, instr(col1, '#', -1 )-1) no_domain
,instr( substr( col1, 1, instr( col1, '#', -1 ) -1 ) , '.', -1 ) dot_before_at
from test
)
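For comparison, the same positions can be picked out by splitting instead of counting offsets. This is a Python sketch of the logic only, not a DB2 or Oracle solution; `pick_fields` is an illustrative name and the code assumes every value has at least two dots before the last '#'.

```python
def pick_fields(value):
    """Return (field between 1st and 2nd dot, field just before the last '#')."""
    # Cut the domain at the last '#', like instr(col1, '#', -1) in the query.
    no_domain = value.rsplit("#", 1)[0]
    parts = no_domain.split(".")
    return parts[1], parts[-1]

for row in (
    "John.W.Smith.JWS1#domain.com",
    "Alexia.Nova.Alnov#domain.com",
    "Martha.Heart.Martha2#domain.com",
):
    print(pick_fields(row))
# ('W', 'JWS1'), ('Nova', 'Alnov'), ('Heart', 'Martha2')
```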

Oracle split by regex and aggregate again

I have a table from which I need to extract only certain parts of a record, separated by commas.
For example, I have
ABCD [1000-1987] BCD[101928-876] adgs[10987-786]
and I want to get the record like this:
1000-1987,101928-876,10987-786
Can you please help me get the record in that form?
If you don't use 11g and do not want to use wm_concat:
WITH
my_data AS (
SELECT 'ABCD [1000-1987] BCD[101928-876] adgs[10987-786]' AS val FROM dual
)
SELECT
ltrim(
MAX(
sys_connect_by_path(
rtrim(ltrim(regexp_substr(val, '\[[0-9-]*\]', 1, level, NULL), '['), ']'),
',')
),
',') AS val_part
FROM my_data
CONNECT BY regexp_substr(val, '\[[0-9-]*\]', 1, level, NULL) IS NOT NULL
;
If using wm_concat is ok for you:
WITH
my_data AS (
SELECT 'ABCD [1000-1987] BCD[101928-876] adgs[10987-786]' AS val FROM dual
)
SELECT
wm_concat(rtrim(ltrim(regexp_substr(val, '\[[0-9-]*\]', 1, level, NULL), '['), ']')) AS val_part
FROM my_data
CONNECT BY regexp_substr(val, '\[[0-9-]*\]', 1, level, NULL) IS NOT NULL
;
If you use 11g:
WITH
my_data AS (
SELECT 'ABCD [1000-1987] BCD[101928-876] adgs[10987-786]' AS val FROM dual
)
SELECT
listagg(regexp_substr(val, '[a-z ]*\[([0-9-]*)\] ?', 1, level, 'i', 1), ',') WITHIN GROUP (ORDER BY 1) AS val_part
FROM my_data
CONNECT BY regexp_substr(val, '[a-z ]*\[([0-9-]*)\] ?', 1, level, 'i', 1) IS NOT NULL
;
Read more about string aggregation techniques: Tim Hall about aggregation techniques
Read more about regexp_substr: regexp_substr - Oracle Documentation - 10g
Read more about regexp_substr: regexp_substr - Oracle Documentation - 11g
You don't have to split and then aggregate it. You can use regexp_replace to keep only those characters within square brackets, then replace the square brackets by comma.
WITH my_data
AS (SELECT 'ABCD [1000-1987] BCD[101928-876] adgs[10987-786]' AS val
FROM DUAL)
SELECT RTRIM (
REPLACE (
REGEXP_REPLACE (val, '(\[)(.*?\])|(.)', '\2'),
']', ','),
',')
FROM my_data;
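Both approaches come down to "collect what is inside each pair of square brackets, then join with commas". A short Python sketch of that idea, useful for checking the bracket pattern; `bracket_contents` is a made-up name.

```python
import re

def bracket_contents(value):
    """Join the digit-and-hyphen runs found inside [...] with commas."""
    # The capture group keeps only the text between the brackets.
    return ",".join(re.findall(r"\[([0-9-]*)\]", value))

print(bracket_contents("ABCD [1000-1987] BCD[101928-876] adgs[10987-786]"))
# 1000-1987,101928-876,10987-786
```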