Search for multiple phone number formats in database - sql

I have a database with a user table. This table contains a column phonenumber. The problem is that its values use several number formats.
The current patterns I found:
06403/975-0
+496403975 0
06403 975-0
06403 975 0
+49 6403 975 0
When searching for a user in the database, is there a way to search for all number patterns?
SELECT id FROM user WHERE phone = '0123456789'
I use Oracle and MS SQL

Assuming your question means this:
"Is it possible to remove all the non-digit characters from the stored phone number, before making the comparison in the WHERE clause?"
a possible solution looks like this:
...
where translate(phone, '0123456789' || phone, '0123456789') = <input value here>
TRANSLATE will translate every digit to itself, and all other characters in phone to nothing (they will simply be deleted from the string). This is exactly what you want.
If you find that the query is slow, you may want to create a (function-based) index on translate(phone, '0123456789' || phone, '0123456789').
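A minimal sketch of such a function-based index, assuming Oracle and a hypothetical table name users (USER itself is a reserved word in Oracle) and a hypothetical index name:

```sql
-- Oracle; the table name "users" and the index name are hypothetical.
CREATE INDEX users_phone_digits_ix
  ON users (translate(phone, '0123456789' || phone, '0123456789'));

-- The WHERE clause has to repeat the indexed expression exactly
-- for the optimizer to consider the index.
SELECT id
FROM   users
WHERE  translate(phone, '0123456789' || phone, '0123456789') = '064039750';
```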
EDIT: I missed the part where you said you are using both Oracle and SQL Server. I did a quick search and found that SQL Server does not have a function similar to Oracle's TRANSLATE. I will leave it to SQL Server experts to help you with that part; I don't know SQL Server.
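For what it's worth, one way this could look on the SQL Server side is a chain of nested REPLACE calls. This is only a sketch: it assumes the separators are limited to the four seen in the question (space, '-', '/', '+'), and the table name users is hypothetical. SQL Server 2017 did add a TRANSLATE function, but it requires both character lists to have the same length, so it cannot delete characters on its own.

```sql
-- SQL Server; the table name "users" is hypothetical.
-- Only the four separators seen in the question are removed.
SELECT id
FROM   users
WHERE  REPLACE(REPLACE(REPLACE(REPLACE(phone, ' ', ''), '-', ''), '/', ''), '+', '')
       = '064039750';
```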

In Oracle you could do it like this. Strip out the non-numeric characters with translate() to get the phone number. You need to handle the leading zero or international dialling code:
select username from your_table
where translate(phone, '1234567890+/ -', '1234567890') in ('064039750', '4964039750')
You may need to tweak this if you don't know what the international dialling code is.
Obviously the actual problem is one of data quality: the application should enforce a strict format on phone numbers. One bout of data cleansing on write saves a whole bunch of grief on read.
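As a rough sketch of what enforcing a format could look like in Oracle, a check constraint can reject anything that is not a '+' followed by digits. The constraint name and the chosen format are assumptions, not something from the question:

```sql
-- Oracle; the constraint name and the required format
-- ('+' followed by digits only) are assumptions.
ALTER TABLE your_table
  ADD CONSTRAINT your_table_phone_format_ck
  CHECK (regexp_like(phone, '^\+[0-9]+$'))
  ENABLE NOVALIDATE;  -- applies to new and changed rows; existing rows are not checked
```

ENABLE NOVALIDATE lets the constraint go in before the existing rows have been cleansed.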

You have a database containing phone numbers. These are sometimes in international format, but often in some national format, probably German, where two leading zeros introduce a country code and a single leading zero introduces an area code (assuming Germany is the home country). Moreover, a phone number can contain symbols for readability, namely '-', '/', and ' '.
So
+49 12/3456-7 means +491234567 of course
00441234567 means +441234567
04412345 means +494412345
I suggest you convert all numbers into international format in these steps:
1. Replace a leading + with a leading 00, so that only digits carry meaning.
2. Remove every character that is not a digit.
3. Replace a leading 00 with a leading +.
4. Replace a leading 0 with a leading +49.
Use Oracle's REGEXP_REPLACE for this:
select
  regexp_replace(
    regexp_replace(
      regexp_replace(
        regexp_replace(trim(phone),
          '^\+', '00'),           -- leading '+' -> leading '00'
        '[^[:digit:]]', ''),      -- remove all non-digits
      '^00', '+'),                -- leading '00' -> leading '+'
    '^0', '+49')                  -- leading '0' -> leading '+49'
  as international_phone
from mytable;
You can do this in the WHERE clause of course:
SELECT id FROM user WHERE regexp_replace(...) = '+49123456789'
or even
SELECT id FROM user WHERE regexp_replace(...phone...) = regexp_replace(...'0123456789'...)
And you may write a PL/SQL function for this for convenience and use it so:
SELECT id FROM user WHERE international_phone(phone) = international_phone('0123456789')
This is for Oracle. There may be something similar for SQL Server.
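The convenience function mentioned above could be sketched roughly like this in PL/SQL. The parameter name p_phone is hypothetical, and the hard-coded +49 assumes that national numbers are German:

```sql
-- Oracle PL/SQL; wraps the four nested regexp_replace steps.
-- Assumes German national numbers (a leading 0 becomes +49).
CREATE OR REPLACE FUNCTION international_phone (p_phone IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC
IS
BEGIN
  RETURN regexp_replace(
           regexp_replace(
             regexp_replace(
               regexp_replace(trim(p_phone),
                 '^\+', '00'),        -- leading '+' -> leading '00'
               '[^[:digit:]]', ''),   -- remove all non-digits
             '^00', '+'),             -- leading '00' -> leading '+'
           '^0', '+49');              -- leading '0' -> leading '+49'
END international_phone;
/

-- Both calls yield '+4964039750':
SELECT international_phone('06403/975-0')    AS a,
       international_phone('+49 6403 975 0') AS b
FROM   dual;
```

Declaring the function DETERMINISTIC also allows it to be used in a function-based index.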

Related

How to search by SQL while doing "a cut of trailing zeros" on a number field?

I have a db table in oracle where I have a column defined as a number.
The columns contains numbers like:
MyColumn
12540000000
78590000000
I want to find the records by searching MyColumn=12540000000 as well as MyColumn=1254 (without trailing zeros).
What could I try? TO_CHAR and a cutting logic or is there something more simple?
rtrim(MyColumn, '0') = '1254'
Note that on the right I enclosed the string within quotes (so it is really seen as a string, not a number). Apparently you are treating these as strings, right? Even if MyColumn is a number, it will be implicitly converted to a string before applying rtrim.
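As a sketch of a complete query (the table name my_table is hypothetical), the search value can be passed through the same rtrim, so that both 12540000000 and 1254 find the row:

```sql
-- Oracle; the table name my_table is hypothetical.
-- The search value is given as a string and trimmed the same way as the column.
SELECT *
FROM   my_table
WHERE  rtrim(to_char(MyColumn), '0') = rtrim('12540000000', '0');
```

Note that this also matches any other value that differs only in its trailing zeros, such as 125400.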

Get only Number

How to ignore special characters and get only number with the below input as string.
Input: '33-01-616-000'
Output should be 3301616000
Use the REPLACE() function to remove the - characters.
REPLACE(columnname, '-', '')
Or if there can be other non-numeric characters, you can use REGEXP_REPLACE() to remove anything that isn't a number.
REGEXP_REPLACE(columnname, '\D', '')
Standard string functions (like REPLACE, TRANSLATE etc.) are often much faster (one order of magnitude faster) than their regular expression counterparts. Of course, this is only important if you have a lot of data to process, and/or if you don't have that much data but you must process it very frequently.
Here is one way to use TRANSLATE for this problem even if you don't know ahead of time what other characters there may be in the string - besides digits:
TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
This will map 0 to 0, 1 to 1, etc. - and all other characters in the input string columnname to nothing (so they will be simply removed). Note that in the TRANSLATE mapping, only the first occurrence of a character in the second argument matters - any additional mapping (due to the appearance of the same character in the second argument more than once) is ignored.
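Applied to the sample input from the question, a quick check against DUAL might look like this (the literal simply stands in for columnname):

```sql
-- Oracle; the literal stands in for the column.
SELECT TRANSLATE('33-01-616-000',
                 '0123456789' || '33-01-616-000',
                 '0123456789') AS digits_only
FROM   dual;
-- digits_only: 3301616000
```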
You can also use the REGEXP_REPLACE function. Try the code below:
SELECT REGEXP_REPLACE('33-01-61ASDF6-0**(98)00[],./123', '([^[:digit:]])', NULL)
FROM DUAL;
SELECT regexp_replace('33-01-616-000','[^0-9]') digits_only FROM dual;

Sybase SQLAnywhere matching phone numbers formatted different ways

I have a table with a phone number varchar field. This field has phone numbers that are formatted many different ways. 999-999-9999 or (999) 999-9999 and so on. I have a phone number that I am trying to find which is formatted like this: "9999999999". I would like to do something like this:
SELECT …
WHERE replace(PHO_PhoneNumber, "[^\\d]", "") = "9999999999"
Basically remove all non digits from the field and then compare.
Is there such a function "replace" that uses regex, or is there a better way of trying to find this number when the phone number field can have many different formatting characters in it? I have no control over how phone numbers get entered into this table.
Thanks,
Warren
You don't say what version of SQL Anywhere you're using, but as of version 11.0, SQL Anywhere supports the REGEXP operator in the where clause, so you could do something like:
select ...
where PHO_PhoneNumber regexp '\(?\d{3}\)?-?\d{3}-?\d{4}'
Disclaimer: I work for SAP in SQL Anywhere engineering.
I don't think Sybase has such a function. You could write one. However, the "special" characters in phone numbers are typically: "()+- ". You can use multiple replaces for these:
WHERE replace(replace(replace(replace(replace(PHO_PhoneNumber, ' ', ''), ')', ''), '(', ''), '+', ''), '-', '') = '9999999999'

How to find multiple spaces inside text for a varchar2 field in Oracle 11g?

I don't recall ever seeing a field like this before, but it combines the city, state, and zipcode into a single string of varchar2. Fortunately, I believe most of the fields are in the same city space state, space zipcode format, but I started finding a few that deviated from that norm.
Right now I'm trying to identify all these distinct conditions in the database with over 5 million rows, and my queries aren't working for what I want.
I started with:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '%' || CHR(32) || '%';
Then tried:
SELECT PROJECT_CTY_ST_ZIP FROM PAYMENT WHERE PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %' AND PROJECT_CTY_ST_ZIP LIKE '% %';
but they both match on leading and trailing spaces, and I really wanted to find spaces inside the text. I don't want to remove them, just identify them with a query so I can parse them properly in my Java code and then do an insert later to put them into city, state, and zipcode fields in another table.
While it doesn't show it here, I found this field in IA with no leading spaces, then one leading space and then two leading spaces. I fixed the leading spaces with trim.
WEST LIBERTY, IA 52776
This last one I wasn't expecting and I wanted to see if there are other conditions that might be unusual, but my query doesn't find them as the spaces are in the middle of the text:
TRUTH OR CONSEQUENCE, NM 87901
How would I go about a query to find these kinds of distinct records?
This query replaces each whitespace character with a dot (.) so you can see them:
SELECT
REGEXP_REPLACE(PROJECT_CTY_ST_ZIP,
'([[:space:]])',
'.') spaces_or_now_dots
FROM PAYMENT
This query finds the ones that have one or more spaces.
SELECT PROJECT_CTY_ST_ZIP
FROM PAYMENT
where REGEXP_LIKE(PROJECT_CTY_ST_ZIP,
'[[:space:]]'
)
I have not considered the cases of spaces in the beginning and end, because you have already taken care of them.
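If the goal is to find only rows with a run of two or more whitespace characters inside the text, a variation along these lines might work. It is a sketch that trims leading and trailing spaces first and then looks for at least two consecutive whitespace characters:

```sql
-- Oracle; matches rows that contain two or more consecutive whitespace
-- characters after leading and trailing spaces have been trimmed.
SELECT PROJECT_CTY_ST_ZIP
FROM   PAYMENT
WHERE  REGEXP_LIKE(TRIM(PROJECT_CTY_ST_ZIP), '[[:space:]]{2,}');
```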

Count with multiple where conditions in MS Access

I have the query below:
Select count(*) as poor
from records where deviceId='00019' and type='Poor' and timestamp between #14-Sep-2012 01:01:01# and #24-Sep-2012 01:01:01#
The table looks like this:
id, deviceId, type, timestamp
The data looks like this:
1, '00019', 'Poor', '19-Sep-2012 01:01:01'
2, '00019', 'Poor', '19-Sep-2012 01:01:01'
3, '00019', 'Poor', '19-Sep-2012 01:01:01'
4, '00019', 'Poor', '19-Sep-2012 01:01:01'
I am trying to count the devices with a specific type.
Please help. Access always returns the wrong result: it returns 1, while 00019 has 4 entries for Poor.
Type and timestamp are both reserved words, so enclose them in square brackets in your query like this: [type] and [timestamp]. I doubt those reserved words are the cause of your problem, but it's hard to predict exactly when reserved words will cause query problems, so just rule out this possibility by using the square brackets.
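With the brackets applied, the query from the question would look like this (same table, columns, and date literals as in the question):

```sql
-- MS Access; [type] and [timestamp] are bracketed because they are reserved words.
SELECT Count(*) AS poor
FROM records
WHERE deviceId = '00019'
  AND [type] = 'Poor'
  AND [timestamp] BETWEEN #14-Sep-2012 01:01:01# AND #24-Sep-2012 01:01:01#;
```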
Beyond that, stored text values sometimes contain extra non-visible characters. Check the lengths of the stored text values to see whether any are longer than expected.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len([type]) AS LenOfType,
Len([timestamp]) AS LenOfTimestamp
FROM records;
In comments you mentioned spaces (ASCII value 32) in your stored values. I had been thinking we were dealing with other non-printable/invisible characters. If you have one or more actual space characters at the beginning and/or end of a stored deviceId value, the Trim() function will discard them. So this query will give you different length numbers in the two columns:
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Trim(deviceId)) AS LenOfDeviceId_NoSpaces
FROM records;
If the stored values can also include spaces within the string (not just at the beginning and/or end), Trim() will not remove those. In that case, you could use the Replace() function to discard all the spaces. Note, however, that a query which uses Replace() must be run from inside an Access application session; you can't use it from Java code.
SELECT
Len(deviceId) AS LenOfDeviceId,
Len(Replace(deviceId, ' ', '')) AS LenOfDeviceId_NoSpaces
FROM records;
If that query returns the same length numbers in both columns, then we are not dealing with actual space characters (ASCII value 32) ... but some other type of character(s) which look "space-like".
If you want to count devices with a specific type irrespective of device IDs, use this:
Select count(*) as poor
from records where type='Poor'
If you want to count devices with a specific deviceId irrespective of types, use this:
Select count(*) as device_count
from records where deviceId='00019'