how to replace all numbers from a string using sql - sql

i'm trying to return a column with every instance of a number replaced by '!'.
the following code only replaces the first instance from each row:
select project, commits, contributors, regexp_replace(address, '[0-9]', '!') as address
from repositories
1BcJBCAYqW9 should return as so !BcJBCAYqW!
but the output I get is !BcJBCAYqW9,
where the second digit does not change.

The code works as expected in MySQL and Oracle. So, I am going to guess you are using Postgres or a Postgres-derived database.
Postgres requires an additional argument to specify that all occurrences should be replaced:
select regexp_replace(address, '[0-9]', '!', 'g') as address
from (select '1BcJBCAYqW9' as address ) x

Related

Oracle SQL replace

Unfortunately I don't have the possibility to change field type.
I would like to REPLACE a , to . in a Typ=1 type of field (e.g.: 4,37 so in the end it should be 4.37), and I've tried CAST() and TO_NUMBER and TO_CHAR and I don't even know what else also, but I keep getting the ORA-01722 and it drives me crazy already. Why does it have to be a number for replacing ???
SELECT REPLACE(fmm, ',', '.') fmm FROM ...
Or do you have a better idea how can I do it without REPLACE maybe ?
UPDATE: it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
So it seems he is taking the replaced version, so with . of fmm, but why ????
Try to remove the commas by replace(nvl(nr,0),',',''), and then formatting by
with tab as
(
select '1,234,567' as nr
from dual
)
select to_char(
replace(nvl(nr,0),',','')
,'fm999G999G990','NLS_NUMERIC_CHARACTERS = '',.''')
as "Number"
from tab;
Number
----------
1.234.567
Demo
Passing a string (varchar2) value into the replace function cannot throw an ORA-01722.
it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
If that's complaining when fnm is '4,37' then you could add a replace() call inside the to_number(), but it's simpler/clearer to specify the NLS_NUMERIC_CHARACTERS as part of the conversion, so it knows that D is represented by a comma, and doesn't rely on the session settings:
order by to_number(fnm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
If your table has a mix of values with period and comma decimal separators then you need to fix the data - this is the main reason you should not be storing numbers as strings in the first place. If you can't fix the data then you can workaround it with replace(), but it isn't ideal; you can then use a fixed period as the decimal character:
order by to_number(replace(fnm, ',', '.'), '99.99');
or still specify NLS_NUMERIC_CHARACTERS:
order by to_number(replace(fnm, ',', '.'), '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
Either way that is 'normalising' all the string to only have periods, with no commas; and that allows them all to be converted.
db<>fiddle
what I don't understand, if I do some changes in the SELECT to a field, how can it affect the ORDER BY section? fmm should still remain 4,37 and not 4.37 in the ORDER BY section, shouldn't it?
No, because you gave the column expression REPLACE(fmm, ',', '.') the alias fnm, which is the same as the original column name; and the order-by clause is the only place column aliases are allowed, where it masks the original table column. When you do:
ORDER BY TO_NUMBER(fmm, '99D99')
the fnm in that conversion is the value of the column expression aliased as fnm, and not the original table column.
You can still access the table column, but to do so you have to prefix it with table name or alias, as the column from expression from the select list takes precedence (which is implied but not stated clearly in the docs:
expr orders rows based on their value for expr. The expression is based on columns in the select list or columns in the tables, views, or materialized views in the FROM clause.
So you can either explicitly refer to the table column via the table name or, here, an alias:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99')
though you still shouldn't rely on the session NLS settings really, so can/should still specify the NLS option to match the table column format:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
or use the replaced value and specify the NLS option for that (notice the option itself is different):
SELECT REPLACE(fmm, ',', '.') fmm
FROM your_table
ORDER BY TO_NUMBER(fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
db<>fiddle
If your table has a mix of period and comma values then you need to use the column-alias version so it is consistent when it tries to convert. If you you only have commas then you can use either. (But again, you shouldn't be storing numbers as strings in the first place...)

Find phone numbers with unexpected characters using SQL

I posted the same question below for SQL in Oracle here and was provided the SQL info within that works.
However, I now need to perform the same in a DB2 database and if I attempt to run the same SQL it errors out.
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
This SQL works in Oracle:
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
When using the same SQL above in DB2, the statement errors out.
I found it easiest to answer your question by phrasing a regex which matches the positive cases. Then, we can just use NOT to find the negative cases. DB2 supports a REGEXP_LIKE function:
SELECT *
FROM yourTable
WHERE
NOT REGEXP_LIKE(phone_num, '^[0-9]+(-?[0-9]+)*$') AND
COALESCE(phone_num, '') <> '';
Here is a demo of the regex:
Demo
For newer version of db2, regexp is the way to go. If you are on an older version (perhaps why you get an error), you can replace all accepted chars with '' and check if the result is an empty string. Can't check right now, but from memory, it would be
WHERE TRANSLATE(phone_num, '', '0123456789-')<>''
EDIT:
For what it's worth your regexp works for V11 so you probably have an older version of Db2. Example of translate and regexp side by side:
]$ db2 "with t(s) as ( values '123456-7890', '12345*-7890' )
select s, 'regexp' as method from t
where regexp_like(s, '[^ 0123456789-]|^-|-$')
union all
select s, 'translate' as method
from t where TRANSLATE(s, '', '0123456789-')<>''"
S METHOD
----------- ---------
12345*-7890 translate
12345*-7890 regexp
2 record(s) selected.

SQL Substring \g

I would just like to know where do I put the \g in this query?
SELECT project,
SUBSTRING(address FROM 'A-Za-z') AS letters,
SUBSTRING(address FROM '\d') AS numbers
FROM repositories
I tried this but this brings back nothing (it doesn't throw an error though)
SELECT project,
SUBSTRING(CONCAT(address, '#') FROM 'A-Za-z' FOR '#') AS letters,
SUBSTRING(CONCAT(address, '#') FROM '\d' FOR '#') AS numbers
FROM repositories
Here is an example: I would like the string 1DDsg6bXmh3W63FTVN4BLwuQ4HwiUk5hX to return DDsgbXmhWFTVNBLwuQHwiUkhX. So basically return all the letters...and then my second one is to return all the numbers.
The g (“global”) modifier in regular expressions indicates that all matches rather than only the first one should be used.
That doesn't make much sense in the substring function, which returns only a single value, namely the first match. So there is no way to use g with substring.
In those functions where it makes sense in PostgreSQL (regexp_replace and regexp_matches), the g can be specified in the optional last flags parameter.
If you want to find all substrings that match a pattern, use regexp_matches.
For your example, which really has nothing to do with substring at all, I'd use
SELECT translate('1DDsg6bXmh3W63FTVN4BLwuQ4HwiUk5hX', '0123456789', '');
translate
---------------------------
DDsgbXmhWFTVNBLwuQHwiUkhX
(1 row)
So this is not pure SQL but Postgresql, but this also does the job:
SELECT project,
regexp_replace(address, '[^A-Za-z]', '', 'g') AS letters,
regexp_replace(address, '[^0-9]', '', 'g') AS numbers
FROM repositories;

Remove all characters at the beginning of string up to certain character in Oracle

I have an address field in which I only want to extract only the city and the state. The data is stored as such: (1234 Cherry ST_Sometown_ST). I would like to removed everything up to and including the first underscore. Is there an easy way to do this with REGEXP_REPLACE() or another similar function?
The only think I have found so far is the ability to remove an Nth number.
Try this
SQL Fiddle Demo
select substr(address, instr(address, '_') + 1, length(address)) as "CityState"
from address

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
Where the last parameter 1 indicate the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;