Oracle SQL query to find special characters in phone numbers

I am trying to write a query to find phone numbers that contain special characters.
Expected phone number: 2047653894
Actual: 204765389(4, 204-7653894, -2047653894, (204)7653894, 20476+53894, ....
NOTE: I only want to find the phone numbers with special characters. I don't want to replace special characters.

Another option is to remove all non-digits (here, where I live, phone numbers are digits only; I'm not talking about various formats phone numbers might have):
SQL> with test (col) as
  (select '204765389(4' from dual union all
   select '204-7653894' from dual union all
   select '-2047653894' from dual union all
   select '(204)7653894' from dual union all
   select '20476+53894' from dual
  )
select
  col,
  regexp_replace(col, '\D') result
from test;
COL RESULT
------------ ------------------------------------------------
204765389(4 2047653894
204-7653894 2047653894
-2047653894 2047653894
(204)7653894 2047653894
20476+53894 2047653894
SQL>
[EDIT]
If you just want to find phone numbers that contain anything but digits, use regexp_like:
SQL> with test (col) as
  (select '204765389(4' from dual union all
   select '204-7653894' from dual union all
   select '-2047653894' from dual union all
   select '(204)7653894' from dual union all
   select '20476+53894' from dual union all
   select '2047653897' from dual
  )
select col
from test
where regexp_like(col, '\D');
COL
------------
204765389(4
204-7653894
-2047653894
(204)7653894
20476+53894
SQL>

You can use the POSIX character class [[:punct:]] with REGEXP_REPLACE(), for example:
SELECT REGEXP_REPLACE(col,'[[:punct:]]') AS col
FROM t
assuming each comma-separated value represents a column value within a table
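Since the question asks only to find the affected rows, not to change them, the same POSIX class can also go into REGEXP_LIKE. A minimal sketch, reusing sample values from the question; the CTE name test and the column name col are only illustrative:

```sql
-- List only the values that contain at least one punctuation character.
-- The last sample value is digits only and is therefore filtered out.
with test (col) as
  (select '204765389(4'  from dual union all
   select '204-7653894'  from dual union all
   select '(204)7653894' from dual union all
   select '2047653894'   from dual
  )
select col
from test
where regexp_like(col, '[[:punct:]]');
```

Note that [[:punct:]] matches punctuation only, so letters or spaces inside a phone number would go unnoticed; '\D' (any non-digit) is the broader check.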

While you can use regular expressions, they are slow and it may be faster to use simple string functions and use TRANSLATE to find all the non-numeric characters and then replace them:
SELECT TRANSLATE(
phone_number,
'0' || TRANSLATE(phone_number, 'x0123456789', 'x'),
'0'
) AS simplified_phone_number
FROM table_name;
Which, for your sample data:
CREATE TABLE table_name (phone_number) AS
SELECT '204765389(4' FROM DUAL UNION ALL
SELECT '204-7653894' FROM DUAL UNION ALL
SELECT '-2047653894' FROM DUAL UNION ALL
SELECT '(204)7653894' FROM DUAL UNION ALL
SELECT '20476+53894' FROM DUAL;
Outputs:
SIMPLIFIED_PHONE_NUMBER
2047653894
2047653894
2047653894
2047653894
2047653894
Update
If you want to list phone numbers with non-digit characters then you can also use TRANSLATE to remove the digits and check if there are any other characters:
SELECT *
FROM table_name
WHERE TRANSLATE(phone_number, 'x0123456789', 'x') IS NOT NULL
You could also use REGEXP_LIKE to check that the string is not entirely digits:
SELECT *
FROM table_name
WHERE NOT REGEXP_LIKE(phone_number, '^\d+$')
or that there are non-digits:
SELECT *
FROM table_name
WHERE REGEXP_LIKE(phone_number, '\D')
However, regular expressions are probably going to be slower than simple string functions like TRANSLATE.
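To see why the TRANSLATE check works, it can help to display what is left once the digits are removed. The sketch below is only an illustration: it uses an inline CTE instead of a real table, and the column alias special_chars is made up.

```sql
-- TRANSLATE maps 'x' to itself and drops every digit, because the digits
-- have no counterpart in the third argument. What remains is the set of
-- non-digit characters; the result is NULL when the value is digits only.
-- (The dummy 'x' is needed because an empty third argument would make
-- TRANSLATE return NULL for every row.)
with table_name (phone_number) as
  (select '204765389(4'  from dual union all
   select '(204)7653894' from dual union all
   select '2047653894'   from dual
  )
select phone_number,
       translate(phone_number, 'x0123456789', 'x') as special_chars
from table_name
where translate(phone_number, 'x0123456789', 'x') is not null;
```

For these three values the query should return the first two rows, with ( and () as the leftover characters.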

Related

Workaround for REGEXP_REPLACE in Oracle SQL | Regular Expression too long

I am using REGEXP_REPLACE to search for multiple source strings (>1000) in column1 of table1 and replace them with the pattern 'xyz' in a select statement. But I am getting the error below, as the REGEXP_REPLACE pattern is limited to 512 bytes.
ORA-12733: regular expression too long
I was wondering if there is any work around for it.
Below is my initial query.
select REGEXP_REPLACE(table1.Column1,'SearchString1|SearchString2|SearchString1|.....SearchString1000','xyz')
from table1
My query would be very long if I used the solution linked below.
Can it be done in a loop using a shell script?
https://stackoverflow.com/questions/21921658/oracle-regular-expression-regexp-like-too-long-error-ora-12733
I don't know whether you can do it in a loop using a shell script, but why would you? Regular expressions still work if you adjust the approach a little.
I'd suggest storing the search strings in a separate table (or using a CTE, as in the following example). Then outer join it to the source table (test in my example) and see the result.
Sample data:
SQL> with
test (col) as
  (select 'Littlefoot' from dual union all
   select 'Bigfoot' from dual union all
   select 'Footloose' from dual union all
   select 'New York' from dual union all
   select 'Yorkshire' from dual union all
   select 'None' from dual
  ),
search_strings (sstring) as
  (select 'foot' from dual union all
   select 'york' from dual
  )
Query:
select t.col,
       regexp_replace(t.col, s.sstring, 'xyz', 1, 1, 'i') result
from test t left join search_strings s on regexp_instr(t.col, s.sstring, 1, 1, 0, 'i') > 0;
COL RESULT
---------- --------------------
Littlefoot Littlexyz
Bigfoot Bigxyz
Footloose xyzloose
New York New xyz
Yorkshire xyzshire
None None
6 rows selected.
SQL>
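If a value can contain more than one of the search strings, the join returns one row per matching search string, each with only that string replaced. Another way around ORA-12733, sketched here under the assumption that the search strings can be split into groups that each stay below the 512-byte limit, is to nest several REGEXP_REPLACE calls. The two small patterns stand in for those groups and the sample values are made up:

```sql
-- Each nested call handles one chunk of the original alternation.
-- Occurrence 0 replaces every match; 'i' makes matching case-insensitive.
with test (col) as
  (select 'Littlefoot in New York' from dual union all
   select 'None'                   from dual
  )
select col,
       regexp_replace(
         regexp_replace(col, 'foot|loose', 'xyz', 1, 0, 'i'),
         'york|shire', 'xyz', 1, 0, 'i') as result
from test;
```

One caveat: a later pass also sees the text produced by an earlier pass, so the replacement text should not itself match any of the later patterns.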

how to get the number after '-' in Oracle

I have some strings in my table. They are like 1101-1, 1101-2, 1101-10, 1101-11, pulse, shock, abc, 1104-2, 1104-11, 2201-1, 2202-4. I am trying to sort them as below:
1101-1
1101-2
1101-10
1101-11
1104-2
1104-11
2201-1
2202-4
abc
pulse
shock
But I can't get the sort right. Below is my code:
select column from table
order by regexp_substr(column, '^\D*') nulls first,
to_number(substr(regexp_substr(column, '\d+'),1,4)) asc
Sort numbers as numbers:
first the ones in front of the hyphen (line #16)
then the ones after it (line #17),
then the rest (line #18)
Mind the to_number function! Without it, you'll be sorting strings and get the wrong result.
SQL> with test (col) as
2 ( select '1101-1' from dual union all
3 select '1101-2' from dual union all
4 select '1101-10' from dual union all
5 select '1101-11' from dual union all
6 select 'pulse' from dual union all
7 select 'shock' from dual union all
8 select 'abc' from dual union all
9 select '1104-2' from dual union all
10 select '1104-11' from dual union all
11 select '2201-1' from dual union all
12 select '2202-4' from dual
13 )
14 select col
15 from test
16 order by to_number(regexp_substr(col, '^\d+')),
17 to_number(regexp_substr(col, '\d+$')),
18 col;
COL
-------
1101-1
1101-2
1101-10
1101-11
1104-2
1104-11
2201-1
2202-4
abc
pulse
shock
11 rows selected.
SQL>
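The difference between comparing strings and comparing numbers can be checked in isolation. This is only a small illustration; the column aliases are made up:

```sql
-- As strings, '10' sorts before '2' because '1' < '2' is decided on the
-- first character. As numbers, 10 comes after 2.
select case when '10' < '2' then 'yes' else 'no' end as string_10_before_2,
       case when to_number('10') < to_number('2') then 'yes' else 'no' end as number_10_before_2
from dual;
```

The first column should come back as yes and the second as no, which is why 1101-10 ends up before 1101-2 when the part after the hyphen is sorted as text.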
For your examples, this should do:
order by regexp_substr(column, '^[^-]+'), -- everything before the hyphen
length(column),
column
To get the number after '-' specifically:
with ttt (col) as (
select cast(column_value as varchar2(10)) as col
from table(sys.dbms_debug_vc2coll
( '1101-1'
, '1101-2'
, '1101-10'
, '1101-11'
, '1104-2'
, '1104-11'
, '2201-1'
, '2202-4'
, 'abc'
, 'pulse'
, 'shock'
))
)
select col
, regexp_substr(col, '(^\d+-)(\d+)', 1, 1, '', 2) as second_str
from ttt
order by second_str;
COL SECOND_STR
---------- ----------
1101-1 1
2201-1 1
1101-10 10
1101-11 11
1104-11 11
1101-2 2
1104-2 2
2202-4 4
abc
pulse
shock
11 rows selected
This treats the text string as two values, (^\d+-) followed by (\d+), and takes the second substring (the final '2' parameter). As only positional parameters are allowed for built-in SQL functions, you also have to specify occurrence (1) and match param (null, as we don't care about case etc).
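The extracted value is still a string, so it has to go through to_number before it can drive a numeric sort. A minimal sketch, assuming a release that supports the sub-expression argument of REGEXP_SUBSTR (Oracle 11g or later); the alias second_num and the shortened sample data are only illustrative:

```sql
-- Sort by the number before the hyphen, then by the number after it.
-- Rows without a hyphenated number yield NULL in both keys and sort last.
with ttt (col) as
  (select '1101-10' from dual union all
   select '1101-2'  from dual union all
   select '1104-11' from dual union all
   select 'abc'     from dual
  )
select col,
       to_number(regexp_substr(col, '^\d+-(\d+)', 1, 1, null, 1)) as second_num
from ttt
order by to_number(regexp_substr(col, '^\d+')),
         second_num,
         col;
```

Here only the digits after the hyphen are captured, so the sub-expression index is 1 instead of 2.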

Find strings starting with alphanumeric in Oracle

I want to search for all records that start with letters or digits only.
I know REGEXP_LIKE can check whether col contains alphanumeric characters, but I couldn't apply it to a "starts with" search.
SELECT * FROM mytable WHERE col1 like 'ABC:XYZ%'
I have data in the following format:
ABC:XYZ
ABC:XYZ (ERW)
ABC:XYZ TMN
ABC:XYZ123
ABC:XYZRTY:YER
I want to get only the following output:
ABC:XYZ
ABC:XYZ123
ABC:XYZRTY:YER
Something like this? Sample data up to line #7; query you might be interested in begins at line #8.
SQL> with mytable (col1) as
2 (select 'ABC:XYZ' from dual union all
3 select 'ABC:XYZ (ERW)' from dual union all
4 select 'ABC:XYZ TMN' from dual union all
5 select 'ABC:XYZ123' from dual union all
6 select 'ABC:XZYRTY:YER' from dual
7 )
8 select col1
9 from mytable
10 where not regexp_like(col1, '[^[:alnum:]:]');
COL1
--------------
ABC:XYZ
ABC:XYZ123
ABC:XZYRTY:YER
SQL>
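The same filter can be written without the negation by anchoring the pattern at both ends, which some readers find easier to follow. A sketch on a shortened version of the sample data:

```sql
-- Keep rows that consist of alphanumeric characters and colons only,
-- from the first character (^) to the last ($).
with mytable (col1) as
  (select 'ABC:XYZ'       from dual union all
   select 'ABC:XYZ (ERW)' from dual union all
   select 'ABC:XYZ TMN'   from dual union all
   select 'ABC:XYZ123'    from dual
  )
select col1
from mytable
where regexp_like(col1, '^[[:alnum:]:]+$');
```

For non-empty values both forms select the same rows; this one should return ABC:XYZ and ABC:XYZ123.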

How can I get a natural numeric sort order in Oracle?

I have a column with a letter followed by either numbers or letters:
ID_Col
------
S001
S1001
S090
SV911
SV800
Sfoofo
Szap
Sbart
How can I order it naturally with the numbers first (ASC) then the letters alphabetically? If it starts with S and the remaining characters are numbers, sort by the numbers. Else, sort by the letter. So SV911 should be sorted at the end with the letters since it also contains a V. E.g.
ID_Col
------
S001
S090
S1001
Sbart
Sfoofo
SV800
SV911
Szap
I see this solution uses regex combined with the TO_NUMBER function, but since I also have entries with no numbers this doesn't seem to work for me. I tried the expression:
ORDER BY
TO_NUMBER(REGEXP_SUBSTR(ID_Col, '^S\d+$')),
ID_Col
/* gives ORA-01722: invalid number */
Would this help?
SQL> with test (col) as
  (select 'S001' from dual union all
   select 'S1001' from dual union all
   select 'S090' from dual union all
   select 'SV911' from dual union all
   select 'SV800' from dual union all
   select 'Sfoofo' from dual union all
   select 'Szap' from dual union all
   select 'Sbart' from dual
  )
select col
from test
order by substr(col, 1, 1),
         case when regexp_like(col, '^[[:alpha:]]\d') then to_number(regexp_substr(col, '\d+$')) end,
         substr(col, 2);
COL
------
S001
S090
S1001
Sbart
Sfoofo
SV800
SV911
Szap
8 rows selected.
SQL>
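The ORA-01722 in the question comes from REGEXP_SUBSTR(ID_Col, '^S\d+$') returning the whole match, including the leading S, which TO_NUMBER cannot convert. One possible variation, assuming the sub-expression argument of REGEXP_SUBSTR is available (Oracle 11g or later), is to capture only the digits. The use of upper() as a case-insensitive tiebreaker is my own assumption about the desired letter order:

```sql
-- Rows of the form S<digits> get a numeric sort key; all other rows get
-- NULL, which sorts last in ascending order, and are then ordered by the
-- upper-cased value.
with test (id_col) as
  (select 'S001'   from dual union all
   select 'S1001'  from dual union all
   select 'S090'   from dual union all
   select 'SV911'  from dual union all
   select 'SV800'  from dual union all
   select 'Sfoofo' from dual union all
   select 'Szap'   from dual union all
   select 'Sbart'  from dual
  )
select id_col
from test
order by to_number(regexp_substr(id_col, '^S(\d+)$', 1, 1, null, 1)),
         upper(id_col);
```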

Filter the rows with number only data in a column SQL

I am trying to SELECT rows from a table, filtering on a column so that only values that are numbers are returned. It is a report-only query, so performance is not a concern. As we don't have the privilege to compile PL/SQL, I am unable to write a function that checks the value with TO_NUMBER() and returns whether it is numeric or not.
I have to achieve it in plain SQL. The column also has values like these, which have to be treated as numbers:
-1.0
-0.1
-.1
+1,2034.89
+00000
1023
After a lot of research (it was a hard time), I wrote this:
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.00' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT '1+1' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT dummy_data.*,
REGEXP_COUNT(txt,'.')
FROM dummy_data
WHERE REGEXP_LIKE (TRANSLATE(TRIM(txt),'+,-.','0000'),'^[-+]*[[:digit:]]');
I got this.
TXT REGEXP_COUNT(TXT,'.')
------------- ---------------------
-1.0 4
+0.1 4
-.1 3
+1,2034.89.00 13 /* Should not be returned */
+1,2034.89 10
1+1 3 /* Should not be returned */
1023 4
7 rows selected.
Now I am terribly confused, with two questions.
1) I get +1,2034.89.00 in the result too, and I need to eliminate it because it has two decimal points. This applies not just to the decimal point: values that repeat any of the other special characters (-+,) should be eliminated as well.
2) To make it uglier, I planned to add REGEXP_COUNT('.') <= 1. But it does not return what I expect; when I select it, I see strange values.
Can someone help me frame the REGEXP to avoid double occurrences of ('.','+','-')?
The following expression works for everything, except the commas:
'^[-+]*[0-9,]*[.]*[0-9]+$'
You can check for bad comma placement with additional checks like:
not regexp_like(txt, '[-+]*,$') and not regexp_like(txt, ',,')
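On the second question, the strange values appear because an unescaped dot in a regular expression matches any character, so REGEXP_COUNT(txt,'.') simply returns the length of the string. Escaping the dot counts literal decimal points instead. A small comparison on two of the sample values; the column aliases are illustrative:

```sql
-- '.' matches every character; '\.' matches only a literal dot.
select txt,
       regexp_count(txt, '.')  as any_char_count,
       regexp_count(txt, '\.') as dot_count
from (select '+1,2034.89.00' as txt from dual union all
      select '-.1'                  from dual);
```

With the escaped form, a condition such as REGEXP_COUNT(txt, '\.') <= 1 rejects +1,2034.89.00, which contains two dots.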
First you remove plus and minus with translate and then you wonder why their position is not considered? :-)
This should work:
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+0.1' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+12034.89.00' FROM dual -- invalid: duplicate decimal separator
UNION ALL
SELECT '+1,2034.89' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT 'Deva +21' FROM dual -- invalid: letters
UNION ALL
SELECT '1+1' FROM dual -- invalid: plus sign placement
UNION ALL
SELECT '1023' FROM dual
UNION ALL
SELECT '1.023,88' FROM dual -- invalid: decimal/thousand separators mixed up
UNION ALL
SELECT '1,234' FROM dual
UNION ALL
SELECT '+1,234.56' FROM dual
UNION ALL
SELECT '-123' FROM dual
UNION ALL
SELECT '+123,0000' FROM dual -- invalid: thousand separator placement
UNION ALL
SELECT '+234.' FROM dual -- invalid: decimal separator not followed by digits
UNION ALL
SELECT '12345,678' FROM dual -- invalid: missing thousand separator
UNION ALL
SELECT '+' FROM dual -- invalid: digits missing
UNION ALL
SELECT '.' FROM dual -- invalid: digits missing
)
select * from dummy_data
where regexp_like(txt, '[[:digit:]]') and
(
regexp_like(txt, '^[-+]{0,1}([[:digit:]]){0,3}(\,([[:digit:]]){3})*(\.[[:digit:]]+){0,1}$')
or
regexp_like(txt, '^[-+]{0,1}[[:digit:]]*(\.[[:digit:]]+){0,1}$')
);
You see, you need three regular expressions; one to guarantee that there is at least one digit in the string, one for numbers with thousand separators, and one for numbers without.
With thousand separators: txt may start with one plus or minus sign, then there may be up to three digits. These may be followed by a thousand separator plus three digits several times. Then there may be a decimal separator with at least one following number.
Without thousand separators: txt may start with one plus or minus sign, then there may be digits. Then there may be a decimal separator with at least one following number.
I hope I haven't overlooked anything.
I just tried to correct your mistakes and keep the SQL as simple as possible. It is not neat, though!
WITH dummy_data AS
( SELECT '-1.0' AS txt FROM dual
UNION ALL
SELECT '+.0' FROM dual
UNION ALL
SELECT '-.1' FROM dual
UNION ALL
SELECT '+1,2034.89.0' FROM dual
UNION ALL
SELECT '+1,2034.89' FROM dual
UNION ALL
SELECT 'Deva +21' FROM dual
UNION ALL
SELECT 'DeVA 234 Deva' FROM dual
UNION ALL
SELECT '1023' FROM dual
)
SELECT to_number(REPLACE(txt,',')),
REGEXP_COUNT(txt,'\.')
FROM dummy_data
WHERE REGEXP_LIKE (txt,'^[-+]*')
AND NOT REGEXP_LIKE (TRANSLATE(txt,'+,-.','0000'),'[^[:digit:]]')
AND REGEXP_COUNT(txt,',') <= 1
AND REGEXP_COUNT(txt,'\+') <= 1
AND REGEXP_COUNT(txt,'\-') <= 1
AND REGEXP_COUNT(txt,'\.') <= 1;
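On newer releases the conversion itself can serve as the test without any PL/SQL. The sketch below assumes Oracle 12c Release 2 or later, where TO_NUMBER accepts a DEFAULT ... ON CONVERSION ERROR clause, and a session whose decimal separator is the period. Stripping the commas first is a simplification that ignores whether they were placed correctly.

```sql
-- Values that cannot be converted yield NULL instead of raising ORA-01722,
-- so the NULL check keeps only the rows that are valid numbers.
with dummy_data (txt) as
  (select '-1.0'         from dual union all
   select '+1,2034.89.0' from dual union all
   select 'Deva +21'     from dual union all
   select '1023'         from dual
  )
select txt,
       to_number(replace(txt, ',') default null on conversion error) as num
from dummy_data
where to_number(replace(txt, ',') default null on conversion error) is not null;
```

If comma placement matters, the regular expression checks above are still needed in addition to this conversion test.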