I have a requirement to format phone numbers in the following way:
No spaces
No special characters
Remove preceding zero - if area code exists
Remove country code if present e.g. +44
For instance this: (03069) 990927 would become: 3069990927.
So far I have come up this this:
replace(replace(replace(replace(replace(replace(substr(replace(ltrim([VALUE],0), ' ', ''),nvl(length(substr(replace(ltrim([VALUE],0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '')
Is there a shorter version of this, maybe using a regular expression?
The final version of this snippet will become a column in a view that will return the following columns:
Customer Number
Customer Name
Country
Formatted Phone Number
The formatted phone number will be concatenated with the international dial code (e.g. +44) that are saved in the database in a table - DIALCODE_TAB(COUNTRY_CODE, CODE). Below is an example using the replace syntax above:
CREATE OR REPLACE FORCE VIEW "CUST_PHONE" ("CUSTOMER_ID", "NAME", "COUNTRY", "PHONE_NUMBER") AS
select
cicm.customer_id,
cicm.name,
dct.country,
dct.code || replace(replace(replace(replace(replace(replace(substr(replace(ltrim(cicm.value,0), ' ', ''),nvl(length(substr(replace(ltrim(cicm.value,0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '') phone_number
from customer_info_comm_method cicm
join dialcode_tab dct
on dct.country_code = customer_info_api.get_country_code(cicm.customer_id)
where cicm.method_id_db = 'PHONE'
--and dct.code || replace(replace(replace(replace(replace(replace(substr(replace(ltrim(cicm.value,0), ' ', ''),nvl(length(substr(replace(ltrim(cicm.value,0), ' ', ''),11)),0)+1), '-', ''), '(', ''), ')', ''),'/', ''), '.', ''), '+', '') = [phone_number]
--in terms of performance this SQL has to be written so that it returns all the records or a specific record when searching for the phone number - very quickly (<10s).
WITH read only;
N.B. A customer record can have more than 1 phone number and the same phone number can exist on more than 1 customer record.
To begin with a remark: This only works if the country is stored elsewhere for the record and there are no telephone numbers without an area code. Otherwise one would not be able to reconstruct the complete phone number again.
Then: How are country codes represented in your data? Is it always +44 or can it be 0044? Be careful here. Especially don't remove a single zero (assuming it's an area code), when it's actually the first of two zeros representing the country code :-)
Then: You need a list of all country codes. Let's take for example +1441441441. Where does the country code end? (Solution: +1441 is Bermudas.)
As to "no spaces" and "no special characters" you can solve this best with regexp_replace.
So all in all not so simple a task as you obviously expected it to be. (But not too hard to do either.)
I would use PL/SQL for this.
Hope my hints help you. Good luck.
EDIT: Here is what is needed. I still think a PL/SQL function will be best here.
Make sure your DIALCODE_TAB contains all country codes necessary.
1. Trim the phone number.
2. Then check if its starts with a country identifyer (+, 00).
2.1. If so: remove that. Remove all non-digits. Look up the country code in your table and remove it.
2.2. If not so: check if it starts with an area identifyer (0).
2.2.1. If so: remove it.
2.2.2. In any case: remove all non-digits.
That should do it, provided the numbers are valid. In Germany sometimes people write +49(0)40-123456, which is not valid, because one either uses a country code or an area code, not both in the same number. The (0) would have to be removed to make the number valid.
SELECT LTRIM(REGEXP_REPLACE(
REGEXP_REPLACE('+44(03069) 990927',
'(\+).([[:digit:]])+'), -- to strip off country code
'[^[:alnum:]]'),-- Strip off non-aplanumeric [:digit] if only digit
'0') -- Remove preceding Zero
FROM DUAL;
Wont work for +44990927 (If country code ends without any space or something or country didnt start with +)
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE phone_numbers ( phone_number ) AS
SELECT '(03069) 990927' FROM DUAL
UNION ALL SELECT '+44 1234 567890' FROM DUAL
UNION ALL SELECT '+44(0)1234 567890' FROM DUAL
UNION ALL SELECT '+44(012) 34-567-890' FROM DUAL
UNION ALL SELECT '+44-1234-567-890' FROM DUAL
UNION ALL SELECT '+358-1234567890' FROM DUAL;
Query 1:
If you are just dealing with +44 international dialling codes then you could:
use ^\+44|\D to strip the +44 international code and all non-digit characters; then
use ^0 to strip a leading zero if its present.
Like this:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE(
phone_number,
'^\+44|\D',
''
),
'^0', '' ) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|---------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 3581234567890 |
(You can see it doesn't work for the final number with a +358 international code.)
Query 2:
This can be simplified into a single regular expression (that's slightly less readable):
SELECT REGEXP_REPLACE(
phone_number,
'^(\+44)?\D*0?|\D',
''
) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|---------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 3581234567890 |
Query 3:
If you want to deal with multiple international dialling codes then you will need to know which ones are valid (see http://en.wikipedia.org/wiki/List_of_country_calling_codes for a list).
This is an example of a regular expression which will strip out valid international dialling codes beginning with +3, +4 or +5 (I'll leave all the other dialling codes for you to code up):
SELECT REGEXP_REPLACE(
phone_number,
'^(\+(3[0123469]|3[57]\d|38[01256789]|4[013456789]|42[013]|5[09]\d|5[12345678]))?\D*0?|\D',
''
) AS phone_number
FROM phone_numbers
Results:
| PHONE_NUMBER |
|--------------|
| 3069990927 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
| 1234567890 |
If the + at the start of the international dialling code is optional then just replace \+ (near the start of the regular expression) with \+?.
Related
I have a column of values where some values contain brackets with text which I would like to remove. This is an example of what I have and what I want:
CREATE TABLE test
(column_i_have varchar(50),
column_i_want varchar(50))
INSERT INTO test (column_i_have, column_i_want)
VALUES ('hospital (PWD)', 'hopistal'),
('nursing (LLC)','nursing'),
('longterm (AT)', 'longterm'),
('inpatient', 'inpatient')
I have only come across approaches that use the number of characters or the position to trim the string, but these values have varying lengths. One way I was thinking was something like:
TRIM('(*',col1)
Doesn't work. Is there a way to do this in postgres SQL without using the position? THANK YOU!
If all the values contain "valid" brackets, then you may use split_part function without any regular expressions:
select
test.*,
trim(split_part(column_i_have, '(', 1)) as res
from test
column_i_have | column_i_want | res
:------------- | :------------ | :--------
hospital (PWD) | hopistal | hospital
nursing (LLC) | nursing | nursing
longterm (AT) | longterm | longterm
inpatient | inpatient | inpatient
db<>fiddle here
You can replace partial patterns using regular expressions. For example:
select *, regexp_replace(v, '\([^\)]*\)', '', 'g') as r
from (
select '''hospital (PWD)'', ''nursing (LLC)'', ''longterm (AT)'', ''inpatient''' as v
) x
Result:
r
-------------------------------------------------
'hospital ', 'nursing ', 'longterm ', 'inpatient'
See example at db<>fiddle.
Could it be as easy as:
SELECT SUBSTRING(column_i_have, '\w+') AS column_i_want FROM test
See demo
If not, and you still want to use SUBSTRING() to get upto but exclude paranthesis, then maybe:
SELECT SUBSTRING(column_i_have, '^(.+?)(?:\s*\(.*)?$') AS column_i_want FROM test
See demo
But if you really are looking upto the opening paranthesis, then maybe just use SPLIT_PART():
SELECT SPLIT_PART(column_i_have, ' (', 1) AS column_i_want FROM test
See demo
I have a string that has this format "number - name" I'm using REGEXP_SUBSTR to split it in two separate columns one for name and one for number.
SELECT
REGEXP_SUBSTR('123 - ABC','[^-]+',1,1) AS NUM,
REGEXP_SUBSTR('123 - ABC','[^-]+',1,2) AS NAME
from dual;
But it doesn't work if the name includes a hyphen for example: ABC-Corp then the name is shown only like 'ABC' instead of 'ABC-Corp'. How can I get a regex exp to ignore everything before the first hypen and include everything after it?
You want to split the string on the first occurence of ' - '. It is a simple enough task to be efficiently performed by string functions rather than regexes:
select
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
Demo on DB Fiddlde:
with mytable as (
select '123 - ABC' mycol from dual
union all select '123 - ABC - Corp' from dual
)
select
mycol,
substr(mycol, 1, instr(mycol, ' - ') - 1) num,
substr(mycol, instr(mycol, ' - ') + 3) name
from mytable
MYCOL | NUM | NAME
:--------------- | :-- | :---------
123 - ABC | 123 | ABC
123 - ABC - Corp | 123 | ABC - Corp
NB: #GMB solution is much better in your simple case. It's an overkill to use regular expressions for that.
tldr;
Usually it's easierr and more readable to use subexpr parameter instead of occurrence in case of such fixed masks. So you can specify full mask: \d+\s*-\s*\S+
ie numbers, then 0 or more whitespace chars, then -, again 0 or more whitespace chars and 1+ non-whitespace characters.
Then we adding () to specify subexpressions: since we need only numbers and trailing non-whitespace characters we puts them into ():
'(\d+)\s*-\s*(\S+)'
Then we just specify which subexpression we need, 1 or 2:
SELECT
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,1) AS NUM,
REGEXP_SUBSTR(column_value,'(\d+)\s*-\s*(\S+)',1,1,null,2) AS NAME
from table(sys.odcivarchar2list('123 - ABC', '123 - ABC-Corp'));
Result:
NUM NAME
---------- ----------
123 ABC
123 ABC-Corp
https://docs.oracle.com/database/121/SQLRF/functions164.htm#SQLRF06303
https://docs.oracle.com/database/121/SQLRF/ap_posix003.htm#SQLRF55544
I would like to change multiple substrings at once for example
'0100001000100'
The desired output.
' | ABC | | | | DFG | | | HIG | |'
REPLACE(SUBSTRING([column],1,1),' 1 ' ,' XYZ '),SUBSTRING([column],2,1),' 1 ' ,' zzz ')....
but does not work.
It's extremely unclear how you mean your input to turn to your desired output. However, hopefully this will clear up your confusion about the REPLACE function.
REPLACE(targetstring, str1, str2) will find each occurrence of str1 in targetstring and replace it with str2. So REPLACE('I know John and John knows me', 'John', 'Fred') would result in the string 'I know Fred and Fred knows me'.
To chain REPLACEs together, so that the output from the first REPLACE is used as the target for the next replace, you need a structure like this:
REPLACE(
REPLACE(
REPLACE(
'I know Fred and Fred knows me', 'Fred', 'Bill'
), 'know', 'like'
), 'and', 'but'
)
Next point: if you want to string together your replacement results on each character, you need to not chain the REPLACEs but instead join them with +:
REPLACE(SUBSTRING([column],1,1),'1','XYZ')
+
REPLACE(SUBSTRING([column],2,1),'1','zzz')
+ ...
I'm new with SQL. I'm trying to get data from database (Postgres) and replace them on the fly if those are not valid. Is this possible to do this using pure SQL? For example in my DB users I've got the field phone with following data:
|phone |
------------------
|+79844533324 |
|893233314215 |
|dfgdhf |
|45 |
|+ |
| |
|(345)53326784457|
|8(123)346-34-45 |
etc..
I'd like to get only last ten digits if:
the phone number starts from 8 or 7 or +7 or +8 .
the number contains ten digits after 8 or 7 or +7 or +8 not more and not less
like this:
|phone |
------------------
|9844533324 |
|1233463445 |
I guess it might be to complicated for SQL. I've researched ton of manuals but most of them cover just SELECT with regex condition.
I think this might work:
select
right (regexp_replace (phone, '\D', '', 'g'), 10)
from foo
where
phone ~ '^\+?[78]' and
regexp_replace (phone, '\D', '', 'g') ~ '^[78]\d{10}$'
In the way of explanation:
right (regexp_replace (phone, '\D', '', 'g'), 10)
Removes all non-digits from the field and then takes the right ten characters -- it will only do this if it meets the following conditions:
phone ~ '^\+?[78]' and
The phone begins with an optional '+' and then a 7 or an 8
regexp_replace (phone, '\D', '', 'g') ~ '^[78]\d{10}$'
And the field, stripped of all non-digits starts with a 7 or 8 followed by exactly ten other digits.
Results:
9844533324
1233463445
SELECT name FROM clients;
Table clients
id | name |
1 John
2 John Bravo
3 John Alves
4 Jo
In postgres, how can I select only names with more than one word? For example. This should be the output:
John Bravo
John ALves
I think just test if the name contains a space: where name like '% %'
But this will give you some problem if you name can contain space char befor or after the name, like: ' JOHN' or 'Luck '
If word means to you a space delimited token, then the following would do the trick:
SELECT name FROM clients WHERE name LIKE '% %';
But this will also give you those that have empty names made out of spaces only. Also performance-wise, this will be a costly query.
First you may have to find the number of words in the column, then only select where the number of words is greater than 1.
For example:
SELECT LENGTH(name) - LENGTH(REPLACE(name , ' ', ''))+1 FROM clients;
This will return the number of words in each row, which we have in the field name.
Then you can proceed putting this in a nested select SQL statement something like :
SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', ''))+1 as mycheck, name FROM clients where (SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', ''))+1)>1
This will return only where words are greater than 1 (>1). Else you can user >2 to return columns with more than 2 words.
LIKE is an option, or you can use split_part:
SELECT name FROM clients WHERE split_part(trim(name), ' ', 2) <> ''