Find phone numbers with unexpected characters using SQL - sql

I posted the same question below for SQL in Oracle here and was provided the SQL info within that works.
However, I now need to perform the same in a DB2 database and if I attempt to run the same SQL it errors out.
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
This SQL works in Oracle:
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
When using the same SQL above in DB2, the statement errors out.

I found it easiest to answer your question by phrasing a regex which matches the positive cases. Then, we can just use NOT to find the negative cases. DB2 supports a REGEXP_LIKE function:
SELECT *
FROM yourTable
WHERE
NOT REGEXP_LIKE(phone_num, '^[0-9]+(-?[0-9]+)*$') AND
COALESCE(phone_num, '') <> '';
Here is a demo of the regex:
Demo

For newer version of db2, regexp is the way to go. If you are on an older version (perhaps why you get an error), you can replace all accepted chars with '' and check if the result is an empty string. Can't check right now, but from memory, it would be
WHERE TRANSLATE(phone_num, '', '0123456789-')<>''
EDIT:
For what it's worth your regexp works for V11 so you probably have an older version of Db2. Example of translate and regexp side by side:
]$ db2 "with t(s) as ( values '123456-7890', '12345*-7890' )
select s, 'regexp' as method from t
where regexp_like(s, '[^ 0123456789-]|^-|-$')
union all
select s, 'translate' as method
from t where TRANSLATE(s, '', '0123456789-')<>''"
S METHOD
----------- ---------
12345*-7890 translate
12345*-7890 regexp
2 record(s) selected.

Related

SQL Server and Oracle query ignoring accents

I have a table called A and a column called Keywords Varchar(255). The Keywords column can contain strings like "TEST, CÃO, ódio" and so on... with or without accents:
ID Keywords
1 TEST, CÃO, ódio, oracle, SQL, açaí
2 Valor, Deputado Rafael, Costelão, estilo
3 São Sebastião, cao, projeto de lei
I'm trying to create a SQL query that compare strings ignoring brazilian accents (áéíóúç and so on...). So if the user searches for "cao", it should return the rows 1 and 3 in the example.
I tried something like:
SELECT keywords
FROM A WHERE UPPER(TRANSLATE(keywords,
'ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü','ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu'))
LIKE UPPER((TRANSLATE('%cao%',
'ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü', 'ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu')));
But it doesn't work.
I also tried using NLS_SORT, but it is only for Oracle, and I need a query that works both on SQL Server and Oracle (it's a client requirement). How can I do that?
One issue is that Microsoft SQL Server did not have a translate function until 2017. It does now, but since it doesn't work for you, you are probably not on this version yet.
You can do a nested replace instead. This is not difficult but is tedious to write. Once it is written and tested, it will be fine.
The Microsoft SQL Server documentation explains this: https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
You also should be aware of the character encoding that is being used in Oracle and SQL Server. With the translate and replace functions you should be OK, but if you ever transfer data via files it will be important. I have described more of this at: http://www.thedatastudio.net/dodgy_characters.htm
Here's an example for the first few characters you want to translate:
select
replace
(
replace
(
replace
(
replace
(
'ABÇDÉFGHÍJÁBÇDÉFGHÍJ', 'Á', 'A'
), 'Ç', 'C'
), 'É', 'E'
), 'Í', 'I'
) as clean_keyword;
Just substitute your keyword for 'ABÇDÉFGHÍJÁBÇDÉFGHÍJ'.
The result is:
ABCDEFGHIJABCDEFGHIJ
There is an example on https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql too.

SQL : REGEX MATCH - Character followed by numbers inside quotes

I have a column in sql which holds value inside double quotes like "P1234567" , "P1234" etc..
I need to identify only columns which start with letter P and is followed by seven digits (numbers) only. I tried where column like'"P[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"' but it doesn't seem to work.
Can someone please correct me or point me to a thread which can help me out?
Thanks
Standard SQL has no regex support, but most SQL engines have regex extensions added to them on top of the standard SQL. So, for example, if you're using MySQL then you'd do this:
... WHERE column REGEXP '^"P[0-9]{7}"'
And if you're using Postgres then that would be:
... WHERE column ~ '^"P[0-9]{7}"'
(updated to match the double-quote part of the question, I'd misunderstood that to begin with)
How about using length and isnumeric:
Select
*
from
mytable
where
mycolumn like '"P%'
and len(mycolumn) = 10 --2 chars for quotes + 1 for 'P' + 7 for the digits
and isnumeric(substring(mycolumn, 3, 7))=1
This answer is for SQL Server, other DBMS's may have a different syntax for length

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each;
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word, if there isn't, NULL will be returned for both words.
An SQLfiddle to test with.
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting pieces set. This is just meant as hint for another approach, if your requirements alter (e.g. more than two string pieces).
You could split finally by the regex /[ ]+/, and so getting the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1 2 3 4 5
Hello Monkey Monkey this this is_me
Learn more about REGEXP_SUBSTR reference to Using Regular Expressions With Oracle Database
Test use SqlFiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want to get the first and the second word, use REGEXP_INSTR to get second word start position :
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;

Escaping a single quote in Oracle regex query

This is really starting to hurt!
I'm attempting to write a query in Oracle developer using a regex condition
My objective is to find all last names that contain charachters not commonly contained in names (non-alpha, spaces, hyphens and single quotes)
i.e.
I need to find
J00ls
McDonald "Macca"
Smithy (Smith)
and NOT find
Smith
Mckenzie-Smith
El Hassan
O'Dowd
My present query is
select * from dm_name
WHERE regexp_like(last_name, '([^A-Za-z -])')
and batch_id = 'ATEST';
which excludes everything expected except the single quote. When it comes to putting the single quote character, the Oracvel SQL Develoepr parser takes it as a literal.
I've tried:
\' -- but got a "missing right parenthesis" error
||chr(39)|| -- but the search returned nothing
'' -- negated the previous character in the matching group e.g. '([^A-Za-z -''])' made names with '-' return.
I'd appreciate anything you could offer.
Just double the single quote to escape your quote.
So
select *
from dm_name
where regexp_like(last_name, '[^A-Za-z ''-]')
and batch_id = 'ATEST'
See also this sqlfiddle. Note, I tried a similar query in SQL developer and that worked as well as the fiddle.
Note also, for this to work the - character has to be the last character in the group as otherwise it tries to find the group SPACE to ' rather than the character -.
The following works:
select *
from dm_name
WHERE regexp_like(last_name, '([^A-Za-z ''-])');
See this SQLFiddle.
Whether SQL Developer will like it or not is something I cannot attest to as I don't have that product installed.
Share and enjoy.

Select query that displays Joined words separately, not using a function

I require a select query that adds a space to the data based on the placement of the capital letters i.e. 'HelpMe' using this query would be displayed as 'Help Me' . Note i cannot use a stored function to do this the it must be done in the query itself. The Data is of variable length and query must be in SQL. Any Help will be appreciated.
Thanks
You need to use user defined function for this until MS give us support for regular expressions. Solution would be something like:
SELECT col1, dbo.RegExReplace(col1, '([A-Z])',' \1') FROM Table
Aldo this would produce leading space that you can remove with TRIM.
Replace regular expresion function:
http://connect.microsoft.com/SQLServer/feedback/details/378520
About dbo.RegexReplace you can read at:
TSQL Replace all non a-z/A-Z characters with an empty string
Assume if you are using Oracle RDBMS, you use the following,
REGEX_REPLACE
SELECT REGEXP_REPLACE('ILikeToWatchCSIMiami',
'([A-Z.])', ' \1')
AS RX_REPLACE
FROM dual
;
Managed to get this output: * SQLFIDDLE
But as you see it doesn't treat well on words such as CSI though.