How to understand to_number formatting in PostgreSQL - sql

I'm not getting behavior that I expect from PostgreSQL's function "to_number" based on my reading of the formatting documentation. So I'm probably reading it wrong. Can someone explain this so that I'll know what to expect in other similar contexts?
-- I find this intuitive:
# select to_number( '12,345.67', '99999.999') ;
to_number
-----------
12345.67
-- I find this surprising:
# select to_number( '12,345.67', '99999.99') ;
to_number
-----------
12345.6
-- EDIT: I found this surprising new variation:
# select to_number( '12,345.67', '999999.99') ;
to_number
-----------
12345.67
Why did my final hundredths digit get dropped in the second case?
EDIT: It seems that the issue is not anything to do with rounding or with how many digits appear to the right of the decimal in my format. Rather, the issue has to do with the total number of characters the format contains and therefore the total number of characters that get parsed. I think the final complete answer will be a slight variation on what mu is too short posted.
In practice I could just always return more digits than I think I need. But that's not very satisfying. It will probably bite me someday. Note: It's not an issue with '9' vs '0' in the format. Those behave identically in to_number, which I find slightly surprising... but clearly documented in the above link.

The problem is that your "number" has a comma as a thousands separator but your pattern does not. Lining them up vertically to make the comparison easier:
12,345.67
99999.99
  ^
We see that at the marked position the pattern is looking for a digit but finds a comma. Your pattern doesn't quite match the string you're using, so you get unexpected results.
If you add the separator to your pattern (see Table 9.26: Template Patterns for Numeric Formatting in the documentation) then you'll get what you're looking for:
=> select to_number('12,345.67', '99,999.99');
to_number
-----------
12345.67
(1 row)

I will start by thanking mu. His answer was clearly helpful. But I'm posting a separate answer because I think his answer as stated misses an important part of the answer.
I have not looked at any PostgreSQL code, so my answer comes purely from observation of its behavior. When I created my first format, I implicitly assumed something like the following:
# My pseudocode for select to_number( '12,345.67', '99999.99') ;
# I guessed PostgreSQL would do this:
1. Parse up to 5 digits
2. [optionally] find a decimal
3. [optionally] if decimal was found, find up to 2 more digits
in this example:
1. Up to five digits: 12345
2. Decimal: yes
3. Two more digits: 67
4. All together: 12345.67
# But in fact what it does is closer to this:
1. Parse up to 8 characters
2. Find the first decimal point in the parsed characters
3. In the set of parsed characters, find up to 5 characters before the decimal
4. In the set of parsed characters, find up to 2 characters after the decimal.
in this example:
1. Up to 8 characters: 12,345.6
2. First decimal: the penultimate character
3. Before decimal: 12345
4. After decimal: 6
5. All together: 12345.6
Therefore, my problem was fundamentally that PostgreSQL was parsing only 8 characters, but I was passing in 9 characters. Thus the solutions are:
# Mu's suggestion: include comma in the format. Now the format is 9 characters.
# This way it parses all 9 characters:
select to_number('12,345.67', '99,999.99');
to_number
-----------
12345.67
# Or include another character before the decimal
# This way it also parses 9 characters before limiting to 5.2:
select to_number( '12,345.67', '999999.99') ;
to_number
-----------
12345.67
# Or include another character after the decimal
# This way it parses 9 characters before limiting to 5.3:
select to_number( '12,345.67', '99999.999') ;
to_number
-----------
12345.67
And once you look at it like that, it becomes clear why otherwise inscrutable degenerate cases like these work as they do:
select to_number('1x2,3yz45.67', '9999999.9999');
to_number
-----------
12345.67
select to_number('12.3.45.67', '9999999.9999');
to_number
-----------
12.3456
I'm not sure that I would have specified the behavior like this. But now it's much clearer what to expect.
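The second pseudocode above can be written as a small Python model. This is only a sketch of the behavior observed in this answer, not PostgreSQL's actual implementation: the name to_number_model is made up for illustration, it returns a string rather than a number, and it only understands formats built from 9, a comma and a single decimal point.

```python
def to_number_model(text: str, fmt: str) -> str:
    """Toy model of the observed to_number behavior (NOT PostgreSQL's real code)."""
    # 1. Only as many input characters as the format contains are looked at.
    window = text[:len(fmt)]
    # 2. Split both the format and the input window at their first decimal point.
    fmt_int, _, fmt_frac = fmt.partition(".")
    txt_int, _, txt_frac = window.partition(".")
    # 3./4. Keep digits only, capped by the width of each half of the format.
    int_digits = "".join(c for c in txt_int if c.isdigit())[:len(fmt_int)]
    frac_digits = "".join(c for c in txt_frac if c.isdigit())[:len(fmt_frac)]
    return int_digits + ("." + frac_digits if frac_digits else "")


print(to_number_model("12,345.67", "99999.99"))       # 12345.6
print(to_number_model("12.3.45.67", "9999999.9999"))  # 12.3456
```

Under this model the 8-character format only ever sees '12,345.6', which is why the final digit disappears, and the two degenerate cases come out the same way as in the query results shown above.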

Related

Extract string between different special symbols

I have the following string in my query
.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt
beginning with a period, from which I need to extract the segment between the final \ and the file extension period, giving the following expected result
ABC__123_123_123_ABC123
I am fairly new to REGEXP and couldn't find an elegant (or workable) solution in the Q&A here or elsewhere. The pattern is the same in quantity and order in all queries, but for my own learning I'd prefer not to just count and cut.
You can use the REGEXP_REPLACE function, such as
REGEXP_REPLACE(col,'(.*\\)(.*)\.(.*)','\2')
in order to extract the piece starting after the last backslash up to the dot. The leading backslashes in \\ and \. are escape characters, so that the regex engine treats the following \ and . as literal characters rather than special ones.
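To try the pattern outside the database, here is a minimal sketch using Python's re module in place of Oracle. Python's regex flavor is not identical to Oracle's, but this particular pattern behaves the same way on the sample string. The function name file_stem is hypothetical.

```python
import re


def file_stem(path: str) -> str:
    # (.*\\)  greedy: everything up to and including the last backslash
    # (.*)\.  group 2: the text between that backslash and the last dot
    # (.*)    whatever follows the dot (the extension)
    return re.sub(r"(.*\\)(.*)\.(.*)", r"\2", path)


print(file_stem(r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"))
# ABC__123_123_123_ABC123
```

Because the pattern matches the whole string, replacing the match with \2 leaves only the captured file name.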
You just need regexp_substr and a simple regexp: ([^\]+)\.[^.]*$
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'([^\]+)\.[^.]*$',
1, -- position
1, -- occurrence
null, -- match_parameter
1 -- subexpr
) substring
from dual;
([^\]+)\.[^.]*$ means:
([^\]+) - one or more (+) characters other than a backslash ([] is a set, ^ negates it), captured as group \1 (subexpression #1)
\. - then a literal dot (. is a special character meaning "any character", so we need to escape it with \, the escape character)
[^.]* - zero or more characters other than .
$ - end of line
So this regexp means: find a substring consisting of one or more characters other than a backslash, followed by a dot, followed by zero or more characters other than a dot, anchored at the end of the string. The subexpr parameter = 1 tells Oracle to return the first subexpression (i.e. the first group matched in (...)).
The other parameters are described in the documentation.
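The capture-group approach can also be tried in Python, as a rough sketch rather than an Oracle equivalent. One difference to be aware of: in Python the backslash inside the character class has to be doubled ([^\\]), while the Oracle pattern above writes it as [^\]. The helper name last_segment is an assumption for illustration.

```python
import re

# Group 1: characters other than a backslash, followed by a dot and an
# extension that contains no further dots, anchored at the end of the string.
LAST_SEGMENT = re.compile(r"([^\\]+)\.[^.]*$")


def last_segment(path: str):
    match = LAST_SEGMENT.search(path)
    # Return only the first capture group, like subexpr = 1 in regexp_substr.
    return match.group(1) if match else None


print(last_segment(r".\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt"))
# ABC__123_123_123_ABC123
```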
Here is my simple, fully compatible example for Oracle 11g R2, PCRE2 and some other languages.
Oracle 11g R2 using the function regexp_substr (Reference documentation)
select
regexp_substr(
'.\ABC\ABC\2021\02\24\ABC__123_123_123_ABC123.txt',
'((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}',
1,
1
) substring
from dual;
Pattern: ((\w)+(_){2}(((\d){3}(_)){3}){1}((\w)+(\d)+){1}){1}
Result: ABC__123_123_123_ABC123
This is about as simple as it gets. The pattern sticks to widely supported regular expression syntax, so it should also be portable, in case someone else is interested in going the simplest way.
Hopefully, this will help you out!

Adding trailing and leading zeroes

How do we convert and add trailing and leading zeros to a number? For example 123.45. I need to make this ten digits long and have padding numbers in front and back. I would like to convert it to 0001234500. Two trailing numbers after the last digit of the decimal. Remove the decimal. Fill in the remaining space with zeroes for the leading end.
I have this so far and it adds trailing zeroes and removes the decimal.
REPLACE(RIGHT('0'+CAST(rtrim(convert(char(10),convert(decimal(10,4),Field))) AS VARCHAR(10)),10),'.','') as New_Field
In MySQL you would have RPAD and LPAD to get stuff like this done, in SQL Server (2012+) you can get something similar by working with FORMAT.
Easiest way is to FORMAT your numbers with a dot so that they take the right place in the format string, then remove that dot. You need to specify a locale, since in different regions you will get a different decimal sign (even if you use . within the format pattern, you would get , in various locales) - using en-US makes sure you get a dot.
REPLACE(FORMAT(somenumber, '000000.0000', 'en-US'), '.', '')
A few examples:
WITH TempTable(somenumber) AS (
SELECT 3
UNION SELECT 3.4
UNION SELECT 3.45
UNION SELECT 23.45
UNION SELECT 123.45
)
SELECT
somenumber,
REPLACE(FORMAT(somenumber, '000000.0000', 'en-US'), '.', '')
FROM
TempTable;
Gives
3.00 0000030000
3.40 0000034000
3.45 0000034500
23.45 0000234500
123.45 0001234500
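The format-then-strip idea is not specific to SQL Server. As a rough illustration, the same result can be produced with Python's format mini-language; pad_fixed is a hypothetical name, and negative numbers are not handled in this sketch.

```python
def pad_fixed(value: float) -> str:
    # '011.4f': four decimals, zero-padded to 11 characters including the dot,
    # i.e. the same shape as the '000000.0000' format string above.
    # The 'f' presentation type always uses '.', so no locale is needed here.
    return f"{value:011.4f}".replace(".", "")


for n in (3, 3.4, 3.45, 23.45, 123.45):
    print(n, pad_fixed(n))
```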
You seem to really be overthinking what you need to do here. If we take it in steps, perhaps you'll see that this can be achieved much more easily. This solution runs under the idea that the value 123.45 becomes 0001234500 and 6.5 becomes 0000065000.
Firstly, let's pad out the right-hand side of the number 123.45 so that we have 1234500. That's easy enough: 123.45 * 100 = 12345, so to get 1234500 we simply need to multiply it by a couple of extra factors of 10:
SELECT 123.45 * 10000; --1234500.00
Ok, now, let's get rid of those decimal places. Easiest way, convert it to an int:
SELECT CONVERT(int, 123.45 * 10000); --1234500
Nice! Now, the final step: the leading 0's. A numerical value in SQL Server won't display leading zeros; SELECT 01, 001.00; will return 1 and 1.00 respectively. A varchar, however, will (as it's not a number). We can therefore make use of that with a further conversion, followed by the use of RIGHT:
SELECT RIGHT('0000000000' + CONVERT(varchar(10),CONVERT(int,123.45 * 10000)),10);
Now you have the value you want '0001234500'.
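The three steps (scale, drop the decimals, left-pad) can be sketched in Python as follows. This only illustrates the arithmetic and is not SQL Server code. The name pad_scaled is made up, the input is taken as a string so that Decimal keeps it exact, and unlike RIGHT(..., 10) the zfill call never cuts off a result longer than the width.

```python
from decimal import Decimal


def pad_scaled(value: str, width: int = 10) -> str:
    # Steps 1 and 2: multiply by 10000 and drop the (now empty) decimal places.
    scaled = int(Decimal(value) * 10000)   # "123.45" -> 1234500
    # Step 3: left-pad with zeros up to the requested width.
    return str(scaled).zfill(width)


print(pad_scaled("123.45"))  # 0001234500
print(pad_scaled("6.5"))     # 0000065000
```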
If you're only after padding (so 6.5 becomes 0006500), then you should be able to work out how to achieve this with the help above (hint: you don't need RIGHT).
Any questions, please do ask.

format a lengthy number with commas in Oracle

I have a requirement to convert a very long amount to a comma-separated value in Oracle. I searched Google, but the solutions I found only work for small numbers, not for long ones. Below is the solution I have, but it is not working properly: I get ############... when I run it.
SELECT TO_CHAR(6965854565787645667634565432234565432345643265432345643242087,
'99G999G999G9999', 'NLS_NUMERIC_CHARACTERS=",."') as test
FROM dual;
Desired output:
6,965,854,565,787,645,667,634,565,432,234,565,432,345,643,265,432,345,643,242,087
Please help me. Thanks in advance.
Please check whether the query below helps.
SELECT ltrim(regexp_replace('00'
|| '6965854565787645667634565432234565432345643265432345643242087', '(...)', ',\1' ),',|0') AS t
FROM dual;
Numbers in Oracle can't have more than 38 significant digits. You have many more than that.
If I may, what kind of "amount" is that? My understanding is that Oracle was designed to handle real-life values. What possible meaning is there to the sample number you posted?
Added: Original poster in a comment (below) stated that he is getting the same error with a shorter number, only 34 digits.
Two issues. First, the format model must have at least the needed number of digits (of 9's). to_char(100000, '9G999') will produce the output #### because the format model allows only 4 digits, but the input is 6 digits.
Then, after that is corrected, the output may still look incorrect in the front-end application, like SQL*Plus. In SQL*Plus the default width of a number column is 10 (I believe). That can be changed to 38, for example with the command set numwidth 38. In other front-ends, like Toad and SQL Developer, the default numeric width is a setting that can be changed through the graphical user interface.
More added - actually the result of to_char is a string, and by default strings of any length should be displayed OK in any front-end, so the numeric width is probably irrelevant. (And, in any case, it does not affect the displaying of strings, including the result of to_char().)
SELECT TO_CHAR(
6676345654322345654323456432654323456,
'999G999G999G999G999G999G999G999G999G999G999G999G999',
'NLS_NUMERIC_CHARACTERS=",."') as test FROM dual
TEST
------------------------------------------------------------
6,676,345,654,322,345,654,323,456,432,654,323,456
@AlexPoole pointed out that perhaps your input is a string.
I didn't get that vibe; but if in fact your input IS a string, and if you know the length is no more than 99 digits, you could do something like below. If your strings can be longer than 99, replace 99 below with a sufficiently large multiple of 3. (Or, you can replace it with a calculated value, 3 * ceil(length(str)/3)).
with
inputs ( str ) as (
select '12345678912345' from dual
)
-- WITH clause is only for testing/illustration, not part of the solution
select ltrim(regexp_replace(lpad(str, 99, ','), '(.{3})', ',\1'), ',') as test
from inputs;
TEST
------------------
12,345,678,912,345
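The pad-then-group trick used for string input can be mirrored in Python to see why it works. This is a sketch under the assumption that the input contains digits only. The name group_thousands is hypothetical, and the padding width is the calculated multiple of 3 mentioned above rather than a fixed 99.

```python
import re


def group_thousands(digits: str) -> str:
    # Left-pad with commas to the next multiple of 3, so that the
    # three-character groups line up with the right-hand end of the string.
    width = 3 * -(-len(digits) // 3)        # same as 3 * ceil(len / 3)
    padded = digits.rjust(width, ",")
    # Put a comma in front of every group of three, then strip the extra
    # commas that accumulated on the left.
    return re.sub(r"(.{3})", r",\1", padded).lstrip(",")


print(group_thousands("12345678912345"))  # 12,345,678,912,345
```

Since everything stays a string, the length of the input is not limited by the precision of a numeric type.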

SQL - need help in parsing text of a field

I have a select query that fetches a field with complex data. I need to parse that data into a specified format. Please help with your expertise:
selected string = complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I
expected output - PB|I
Please help me write an SQL regular expression to produce this output.
The first step in figuring out the regular expression is to be able to describe it in plain language. Based on what we know from your post (and as others have said, more info is really needed), some assumptions have to be made.
I'd take a stab at it by describing it like this, which is based on the sample data you provided: I want the sets of one or more characters that follow the equal signs but not including the following space or end of the line. The output should be these sets of characters, separated by a pipe, in the order they are encountered in the string when reading from left to right. My assumptions are based on your test data: only 2 equal signs exist in the string and the last data element is not followed by a space but by the end of the line. A regular expression can be built using that info, but you also need to consider other facts which would change the regex.
Could there be more than 2 equal signs?
Could there be an empty data element after the equal sign?
Could the data set after the equal sign contain one or more spaces?
All these affect how the regex needs to be designed. All that said, and based on the data provided and the assumptions as stated, next I would build a regex that describes the string (really translating from the plain language to the regex language), grouping around the data sets we want to preserve, then replace the string with those data sets separated by a pipe.
SQL> with tbl(str) as (
2 select 'complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I' from dual
3 )
4 select regexp_replace(str, '^.*=([^ ]+).*=([^ ]+)$', '\1|\2') result from tbl;
RESU
----
PB|I
The match regex explained:
^        Match the beginning of the line
.        followed by any character
*        followed by 0 or more 'any characters' (refers to the previous character class)
=        followed by an equal sign
(        start remembered group 1
[^ ]+    which is a set of one or more characters that are not a space
)        end remembered group 1
.*=      followed by any number of any characters but ending in an equal sign
([^ ]+)  followed by the second remembered group of non-space characters
$        followed by the end of the line
The replace string explained:
\1 The first remembered group
| a pipe character
\2 The second remembered group
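The same match-and-replace can be tried in Python to see how the two groups are picked up. This is a sketch using Python's re module, not Oracle's regexp_replace, and the function name is an assumption. It relies on the same assumptions as the answer: exactly the two values of interest follow equal signs, and the last value runs to the end of the string.

```python
import re


def channel_and_indicator(s: str) -> str:
    # The leading .* is greedy, but it has to back off to the first '=' so
    # that a second '=' is still left for the '.*=' part of the pattern.
    return re.sub(r"^.*=([^ ]+).*=([^ ]+)$", r"\1|\2", s)


sample = "complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I"
print(channel_and_indicator(sample))  # PB|I
```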
Keep in mind this answer is for your exact sample data as shown, and may not work in all cases. You need to analyse the data you will be working with. At any rate, these steps should get you started on breaking down the problem when faced with a challenging regex. The important thing is to consider all types of data and patterns (or NULLs) that could be present and allow for all cases in the regex so you return accurate data.
Edit: Check this out, it parses all the values right after the equal signs and allows for nulls:
SQL> with tbl(str) as (
2 select 'a=zz|complexType|ChannelCode=PB - Phone In A Box|IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing' from dual
3 )
4 select regexp_substr(str, '=([^ |]*)( |||$)', 1, level, null, 1) output, level
5 from tbl
6 connect by level <= regexp_count(str, '=')
7 ORDER BY level;
OUTPUT                    LEVEL
-------------------- ----------
zz                            1
PB                            2
I                             3
                              4
test2                         5
SQL>
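For comparison, a rough Python counterpart of this "one row per equal sign" query is a single findall call. It is a sketch only: the pattern drops the trailing delimiter group used in the Oracle version, an empty string stands in for the NULL row, and values_after_equals is a made-up name.

```python
import re


def values_after_equals(s: str) -> list:
    # After each '=', capture the run of characters that are neither a space
    # nor a pipe. The run may be empty, which corresponds to the NULL row.
    return re.findall(r"=([^ |]*)", s)


sample = ("a=zz|complexType|ChannelCode=PB - Phone In A Box|"
          "IncludeExcludeIndicator=I - testing|test1=|test2=test2 - testing")
print(values_after_equals(sample))  # ['zz', 'PB', 'I', '', 'test2']
```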

using oracle sql substr to get last digits

I have the result of a query and am supposed to get the final digits of one column, say 'term'.
The value of column term can be like:
'term' 'number' (output)
---------------------------
xyz012 12
xyz112 112
xyz1 1
xyz02 2
xyz002 2
xyz88 88
Note: It is not limited to the scenarios above; the requirement is that the last 3 or fewer characters can be digits.
Function I used: to_number(substr(term.name,-3))
(Initially I assumed that the last 3 characters are always digits, but I was wrong.)
I am using to_number because if the last 3 digits are '012' then the number should be '12'.
But as one can see, some specific cases (like 'xyz88' and 'xyz1') give:
ORA-01722: invalid number
How can I achieve this using substr or regexp_substr ?
I have not explored regexp_substr much.
Using REGEXP_SUBSTR,
select column_name, to_number(regexp_substr(column_name,'\d+$'))
from table_name;
\d matches digits. Along with +, it becomes a group with one or more digits.
$ matches end of line.
Putting it together, this regex extracts a group of digits at the end of a string.
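The pattern is easy to check against the sample values from the question. Below is a small Python sketch (Python's re module rather than Oracle, with the hypothetical name trailing_number) that extracts the trailing digits and converts them, which also drops the leading zeros.

```python
import re


def trailing_number(term: str):
    # \d+$ : one or more digits anchored at the end of the string.
    match = re.search(r"\d+$", term)
    # int() discards leading zeros, e.g. "012" -> 12.
    return int(match.group()) if match else None


for term in ("xyz012", "xyz112", "xyz1", "xyz02", "xyz002", "xyz88"):
    print(term, trailing_number(term))
```

In this sketch a value with no trailing digits returns None instead of raising an error.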
Oracle has the function regexp_instr() which does what you want:
select term, cast(substr(term, 1-regexp_instr(reverse(term),'[^0-9]')) as int) as number
select SUBSTRING(acc_no,len(acc_no)-1,len(acc_no)) from table_name;