Using regexp_replace to remove letters from numbers

I've got a bunch of altitude data. Some values are just numbers; others have "meters" or ' at the end. I also have a few ranges such as 1200-1300 (I guess that second problem would have to be solved a different way). I tried experimenting with regexp_replace, but [^a-z] doesn't seem to be working.
Does anyone have a good idea for getting rid of everything that's not a digit? Also, if you could recommend a good website/book/course on cleaning data, I'd much appreciate it. Thanks!

Let's leave the ranges (like 1200-1300) to the side, since, regardless of any kind of programming, it is not clear what you would want to "extract" from them. You may also have problems with things like '5 ft 10 in' or similar, if they are possible in your data. (And it is not clear what the whole thing means if the altitudes don't all use the same unit of measurement anyway: some are in meters, some in feet, and that information disappears when you keep just the number.)
To remove all the non-digits from a string and keep the digits, you do NOT need regular expressions, which may be much slower (an order of magnitude slower!) than standard string functions.
One way to remove all non-digit characters uses the TRANSLATE function. Like so:
translate(input_string, '0123456789' || input_string, '0123456789')
The function "translates" (replaces) 0 with 0, 1 with 1, etc., and any character in the input string that hasn't already appeared earlier in the second argument (which in this case means "non-digit") to nothing (null, zip, disappears, is removed).
Example (note the use of TO_NUMBER to also convert to actual numbers):
with
  data (input_string) as (
    select '1500'   from dual union all
    select '2100 m' from dual union all
    select '535 ft' from dual
  )
select input_string,
       to_number(translate(input_string, '0123456789' || input_string,
                           '0123456789')) as extracted_number
from   data;

INPUT_STRING EXTRACTED_NUMBER
------------ ----------------
1500                     1500
2100 m                   2100
535 ft                    535
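For completeness, since the question asked about regexp_replace: [^a-z] matches every character that is not a lowercase letter, so replacing it strips the digits and keeps the letters, which is the opposite of what is wanted. A minimal sketch of the regexp route, reusing the same illustrative sample rows (it is likely to be slower than TRANSLATE on large tables):

```sql
-- Sketch only: the sample rows are illustrative, not the asker's real data.
with
  data (input_string) as (
    select '1500'   from dual union all
    select '2100 m' from dual union all
    select '535 ft' from dual
  )
select input_string,
       -- [^0-9] matches any single character that is not a digit
       to_number(regexp_replace(input_string, '[^0-9]', '')) as extracted_number
from   data;
```

With either approach, an input containing no digits at all ends up as NULL, and a range like 1200-1300 collapses to 12001300, so such rows are worth filtering out first.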

Related

How to remove leftmost group of numbers from string in Oracle SQL?

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost group of consecutive digits from the string?
EDIT:
regexp_replace is unavailable, but I have LTRIM, REPLACE, SUBSTR, etc.

Would this fit the bill? I am assuming the string has alphanumeric characters, then an underscore, then the digits you want to remove, followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before the underscore are matched and stored in the first group. Then comes the underscore, followed by zero or more digits (as many as possible are matched), followed by anything else, which is stored in the second group.
If we have a match, the string is replaced by the content of the first group, followed by an underscore and the content of the second group.
If there is no match, the string is not changed.

It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S         RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).

If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_') from dual
Here is a db<>fiddle with your two examples.
Note: Your question says the leftmost grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t

Adding trailing and leading zeroes

How do we convert and add trailing and leading zeros to a number? For example 123.45. I need to make this ten digits long and have padding numbers in front and back. I would like to convert it to 0001234500. Two trailing numbers after the last digit of the decimal. Remove the decimal. Fill in the remaining space with zeroes for the leading end.
I have this so far and it adds trailing zeroes and removes the decimal.
REPLACE(RIGHT('0'+CAST(rtrim(convert(char(10),convert(decimal(10,4),Field))) AS VARCHAR(10)),10),'.','') as New_Field

In MySQL you would have RPAD and LPAD to get stuff like this done, in SQL Server (2012+) you can get something similar by working with FORMAT.
Easiest way is to FORMAT your numbers with a dot so that they take the right place in the format string, then remove that dot. You need to specify a locale, since in different regions you will get a different decimal sign (even if you use . within the format pattern, you would get , in various locales) - using en-US makes sure you get a dot.
REPLACE(FORMAT(somenumber, '000000.0000', 'en-US'), '.', '')
A few examples:
WITH TempTable(somenumber) AS (
SELECT 3
UNION SELECT 3.4
UNION SELECT 3.45
UNION SELECT 23.45
UNION SELECT 123.45
)
SELECT
somenumber,
REPLACE(FORMAT(somenumber, '000000.0000', 'en-US'), '.', '')
FROM
TempTable;
Gives
  3.00  0000030000
  3.40  0000034000
  3.45  0000034500
 23.45  0000234500
123.45  0001234500

You seem to really be overthinking what you need to do here. If we take it in steps, perhaps you'll see that this can be achieved much more easily. This solution runs under the idea that the value 123.45 becomes 0001234500 and 6.5 becomes 0000065000.
Firstly, let's pad out the right-hand side of the number 123.45 so that we have 1234500. That's easy enough: 123.45 * 100 = 12345, so to get 1234500 we simply need to multiply by a couple of extra factors of 10:
SELECT 123.45 * 10000; --1234500.00
Ok, now, let's get rid of those decimal places. Easiest way, convert it to an int:
SELECT CONVERT(int, 123.45 * 10000); --1234500
Nice! Now, the final step: the leading zeros. A numerical value in SQL Server won't display leading zeros; SELECT 01, 001.00; returns 1 and 1.00 respectively. A varchar, however, will (as it's not a number). We can therefore make use of that with a further conversion, and then use RIGHT:
SELECT RIGHT('0000000000' + CONVERT(varchar(10),CONVERT(int,123.45 * 10000)),10);
Now you have the value you want '0001234500'.
If you're only after padding, (so 6.5 becomes 0006500) then you should be able to work out how to achieve this with the help above (hint you don't need RIGHT).
Any questions, please do ask.
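Putting the steps above together for several values at once, here is a minimal SQL Server sketch. The sample values and the names v, original_value and padded are made up for illustration; in the real query the column would be Field.

```sql
-- Sketch only: sample values and the names v, original_value, padded are illustrative.
SELECT v AS original_value,
       -- shift four decimal places, drop the fraction, then left-pad to 10 characters
       RIGHT(REPLICATE('0', 10) + CONVERT(varchar(10), CONVERT(int, v * 10000)), 10) AS padded
FROM (VALUES (CAST(123.45 AS decimal(10, 4))),
             (CAST(6.5    AS decimal(10, 4))),
             (CAST(3      AS decimal(10, 4)))) AS t(v);
```

This should give 0001234500, 0000065000 and 0000030000. Bear in mind that CONVERT(int, ...) raises an overflow error once v * 10000 exceeds 2,147,483,647, and that RIGHT(..., 10) silently cuts off anything longer than ten digits.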

REGEXP to insert special characters, not remove

How would I put double quotes around the two fields that are missing them? Would I be able to use something like INSTR/SUBSTR/REPLACE in one statement to accomplish it?
string := '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Expected string := '"ES26653","ABCBEVERAGES","861526999728","**606.32**","2017-01-26","2017-01-27","","","**77910467**","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"';
Please suggest! Thank you.

This answer does not work in this case, because some fields contain commas. I am leaving it in case it helps anyone else.
One rather brute force method for internal fields is:
replace(replace(string, ',', '","'), '""', '"')
This adds double quotes on either side of a comma and then removes double double quotes. You don't need to worry about "". It becomes """" and then back to "".
This can be adapted for the first and last fields as well, but it complicates the expression.

This offering attempts to address a number of end cases:
Addressing issues with first and last fields. Here only the last field is a special case as we look out for the end-of-string $ rather than a comma.
Empty unquoted fields i.e. leading commas, consecutive commas and trailing commas.
Preserving a pair of double quotes within a field representing a single double quote.
The SQL:
WITH orig(str) AS (
SELECT '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
FROM dual
),
rpl_first(str) AS (
SELECT REGEXP_REPLACE(str, '("(([^"]|"")*)"|([^,]*))(,|$)','"\2\4"\5')
FROM orig
)
SELECT REGEXP_REPLACE(str, '"""$','"') fixed_string
FROM rpl_first;
The technique is to find either a quoted field and remember it or a non-quoted field and remember it, terminated by a comma or end-of-string, and remember that too. The answer is then a " followed by one of the fields, followed by " and then the terminator.
The quoted field is basically "[^"]*" where [^"] is any character that is not a quote and * means repeated zero or more times. This is complicated by the fact that the not-a-quote character could also be a pair of quotes, so we need an OR construct (|), i.e. "([^"]|"")*". However, we must remember just the field inside the quotes, so we add brackets so we can later back-reference just that, i.e. "(([^"]|"")*)".
The unquoted field is simply a non-comma repeated zero or more times where we want to remember it all ([^,]*).
So we want to find either of these, the OR construct again i.e. ("(([^"]|"")*)"|([^,]*)). Followed by the terminator, either a comma or end-of-string, which we want to remember i.e. (,|$).
Now we can replace this with one of the two types of field we found enclosed in quotes followed by the terminator i.e. "\2\4"\5. The number n for the back reference \n is just a matter of counting the open brackets.
The second REGEXP_REPLACE is to work around something I suspect is an Oracle bug. If the last field is quoted, then an extra pair of quotes is added to the end of the string. This suggests that the end-of-string is being processed twice when it is parsed, which would be a bug. However, regexp processing is probably done by a standard library routine, so it may be my interpretation of the regexp rules. Comments are welcome.
Oracle regexp documentation can be found at Using Regular Expressions in Database Applications.
My thanks to @Gary_W for his template. Here I am keeping the two separate regexp blocks to separate the bit I can explain from the bit I can't (the bug?).
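One way to probe the end-of-string behaviour described above is a much smaller pattern of the same shape: a field that may be empty, followed by a comma or end-of-string. The literal 'a,b' and the square brackets in the replacement are only there for illustration; this is a diagnostic sketch, not a fix.

```sql
-- Diagnostic sketch: the test string and the [ ] markers are illustrative.
-- ([^,]*) can match an empty string, so once 'b' has been consumed the engine
-- may find one more zero-length field sitting right at end-of-string.
select regexp_replace('a,b', '([^,]*)(,|$)', '[\1]\2') as probe
from   dual;
```

If the result ends with an extra empty pair of brackets, the trailing quotes in the full query would come from a zero-length match at end-of-string, which many regexp engines allow, rather than from the end being parsed twice.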

This method makes 2 passes on the string. First, look for a grouping of a double-quote followed by a comma, followed by a character that is not a double-quote. Replace it by emitting the first group, '\1', then the missing double-quote, then the second group, '\2'. Then do it again, but the other way around. Sure, you could nest the regexp_replace calls and end up with one big ugly statement, but just make it 2 statements for easier maintenance. The guy working on this after you will thank you, and this is ugly enough as it is.
SQL> with orig(str) as (
     select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
     from dual
     ),
     rpl_first(str) as (
     select regexp_replace(str, '(",)([^"])', '\1"\2')
     from orig
     )
     select regexp_replace(str, '([^"])(,")', '\1"\2') fixed_string
     from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"

SQL>
EDIT: Changed regex's and added a third step to allow for empty, unquoted fields per Unoembre's comment. Good catch! Also added additional test cases. Always expect the unexpected and make sure to add test cases for all data combinations.
SQL> with orig(str) as (
     select '"ES26653","ABCBEVERAGES","861526999728",606.32,"2017-01-26","2017-01-27","","",77910467,"DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"'
     from dual union
     select 'ES26653,"ABCBEVERAGES","861526999728"' from dual union
     select '"ES26653","ABCBEVERAGES",861526999728' from dual union
     select '1S26653,"ABCBEVERAGES",861526999728' from dual union
     select '"ES26653",,861526999728' from dual
     ),
     rpl_empty(str) as (
     select regexp_replace(str, ',,', ',"",')
     from orig
     ),
     rpl_first(str) as (
     select regexp_replace(str, '(",|^)([^"])', '\1"\2')
     from rpl_empty
     )
     select regexp_replace(str, '([^"])(,"|$)', '\1"\2') fixed_string
     from rpl_first;

FIXED_STRING
--------------------------------------------------------------------------------
"ES26653","ABCBEVERAGES","861526999728","606.32","2017-01-26","2017-01-27","","","77910467","DOROTHY","","RAPP","14219 PIERCE STREET, APT1","","OMAHA","NE","68144"
"ES26653","ABCBEVERAGES","861526999728"
"ES26653","","861526999728"
"1S26653","ABCBEVERAGES","861526999728"
"ES26653","ABCBEVERAGES","861526999728"

SQL>

format a lengthy number with commas in Oracle

I have a requirement to format a very long amount with comma separators in Oracle. I searched on Google, but the solutions I found work only for small numbers, not for long ones. Below is what I have, but it is not working properly: I get ############... when I run it.
SELECT TO_CHAR(6965854565787645667634565432234565432345643265432345643242087,
'99G999G999G9999', 'NLS_NUMERIC_CHARACTERS=",."') as test
FROM dual;
Desired output:
6,965,854,565,787,645,667,634,565,432,234,565,432,345,643,265,432,345,643,242,087
Please help me. thanks in advance.

Please check whether the query below helps.
SELECT ltrim(regexp_replace('00'
|| '6965854565787645667634565432234565432345643265432345643242087', '(...)', ',\1' ),',|0') AS t
FROM dual;

Numbers in Oracle can't have more than 38 significant digits. You have many more than that.
If I may, what kind of "amount" is that? My understanding is that Oracle was designed to handle real-life values. What possible meaning is there to the sample number you posted?
Added: Original poster in a comment (below) stated that he is getting the same error with a shorter number, only 34 digits.
Two issues. First, the format model must have at least the needed number of digits (of 9's). to_char(100000, '9G999') will produce the output #### because the format model allows only 4 digits, but the input is 6 digits.
Then, after that is corrected, the output may still look incorrect in the front-end application, like SQL*Plus. In SQL*Plus the default width of a number column is 10 (I believe). That can be changed to 38, for example with the command set numwidth 38. In other front-ends, like Toad and SQL Developer, the default numeric width is a setting that can be changed through the graphical user interface.
More added - actually the result of to_char is a string, and by default strings of any length should be displayed OK in any front-end, so the numeric width is probably irrelevant. (And, in any case, it does not affect the displaying of strings, including the result of to_char().)
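The format-model point above can be checked with a side-by-side sketch: the same value, once with too few 9's and once with enough. The column aliases are made up, and the NLS parameter is written in the documented single-quote form (decimal character first, group separator second).

```sql
-- Sketch only: aliases are illustrative. '.,' sets '.' as the decimal
-- character and ',' as the group separator for these two calls.
select to_char(100000, '9G999',   'NLS_NUMERIC_CHARACTERS=''.,''') as too_narrow,
       to_char(100000, '999G999', 'NLS_NUMERIC_CHARACTERS=''.,''') as wide_enough
from   dual;
```

The first column should come back as hash marks and the second as the number with its group separator. Both results are strings, so the front-end's numeric width setting plays no part.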

SELECT TO_CHAR(
6676345654322345654323456432654323456,
'999G999G999G999G999G999G999G999G999G999G999G999G999',
'NLS_NUMERIC_CHARACTERS=",."') as test FROM dual
TEST
------------------------------------------------------------
6,676,345,654,322,345,654,323,456,432,654,323,456

@AlexPoole pointed out that perhaps your input is a string.
I didn't get that vibe; but if in fact your input IS a string, and if you know the length is no more than 99 digits, you could do something like below. If your strings can be longer than 99, replace 99 below with a sufficiently large multiple of 3. (Or, you can replace it with a calculated value, 3 * ceil(length(str)/3)).
with
inputs ( str ) as (
select '12345678912345' from dual
)
-- WITH clause is only for testing/illustration, not part of the solution
select ltrim(regexp_replace(lpad(str, 99, ','), '(.{3})', ',\1'), ',') as test
from inputs;
TEST
------------------
12,345,678,912,345
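The calculated width mentioned above could be sketched like this, so that no upper bound on the length has to be guessed. It uses the same illustrative input and only replaces the hard-coded 99.

```sql
-- Sketch only: the input row is illustrative. The pad width is the string
-- length rounded up to the next multiple of 3, so every group is complete.
with
  inputs ( str ) as (
    select '12345678912345' from dual
  )
select ltrim(regexp_replace(lpad(str, 3 * ceil(length(str) / 3), ','),
                            '(.{3})', ',\1'), ',') as test
from   inputs;
```

For this input the width works out to 15, a single comma is padded on the left, and the result is again 12,345,678,912,345.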

How can I use RTRIM or REPLACE when I know the length I want to trim but not what it may contain?

I need to RTRIM the last 7 characters from a result set in an Oracle query. These 7 chars can be anything; spaces, alpha numeric etc... and I don't know the exact length of any value.
So for example I'd like to run something like this
SELECT RTRIM (COl_A, (SELECT LENGTH (COL_A)-7) FROM TABLE_ONE;
or a replace equivalent
SELECT REPLACE(COL_A, (SELECT LENGTH (COL_A)-7 FROM TABLE_ONE),'');
Do I need to do something with SUBSTRING maybe?
I know how to remove/replace specific chars but I'm having trouble when dealing with unknown chars. I've seen a few examples of similar problems but they seem unnecessarily complicated... or does this require a more in depth solution than I think it should?
As always thanks in advance for advice or hints.

You are in search of the substr function.
select substr(col_a, 1, length(col_a) - 7) from table_one

Actually, the correct solution is:
select substr(col_a, 1, (case when length(col_a) < 7 then 0 else length(col_a) - 7 end)) from table_one
To be general, you would want to take into account what happens when the length is less than 7.
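The same guard can be written a little more compactly with GREATEST. A minimal sketch, with two made-up rows standing in for table_one:

```sql
-- Sketch only: the two sample rows are illustrative stand-ins for table_one.
with
  table_one ( col_a ) as (
    select 'ABCDEFGHIJKL' from dual union all
    select 'SHORT'        from dual
  )
select col_a,
       -- never ask SUBSTR for a negative length
       substr(col_a, 1, greatest(length(col_a) - 7, 0)) as trimmed
from   table_one;
```

The first row keeps its first five characters (ABCDE). The second row is shorter than 7 characters, so the requested length is 0 and Oracle returns NULL, as it does for any substring length below 1.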
