How to handle precision problems of floating point numbers?

I am using Firebird 3.0.4 (both in Windows and Linux) and I have the following procedure that clearly demonstrates my problem with floating point numbers, and that also demonstrates a possible workaround:
create or alter procedure test_float returns (
  res double precision,
  res1 double precision,
  res2 double precision)
as
  declare variable z1 double precision;
  declare variable z2 double precision;
  declare variable z3 double precision;
begin
  z1 = 15;
  z2 = 1.1;
  z3 = 0.49;
  /* One expects res to be 8.085, but inside the procedure it is
     represented as 8.084999999999. The representation is repaired when
     res is sent to the output of the procedure, but the internal
     representation (which is wrong) affects further calculations. */
  res = z1 * z2 * z3;
  res1 = round(res, 2);
  res2 = round(round(res, 8), 2);
  suspend;
end
One can see the result of the procedure with:
select proc.res, proc.res1, proc.res2
from test_float proc
The result is
RES RES1 RES2
8,085 8,08 8,09
But one would expect RES1 to be 8.09 as well.
The internal representation of res clearly contains 8.0849999 (one can check this by assigning res to an exception message and raising the exception). It is repaired during output, but it breaks later calculations that use the variable.
RES2 demonstrates the repair: I can always apply ROUND(..., 8) to fix the internal representation. I am ready to go with this solution, but my question is: is this an acceptable workaround (when the outer ROUND uses strictly fewer than 5 decimal places), or is there a better one?
All my tests pass with this workaround, but it feels wrong.
Of course, I know the minimum that every programmer should know about floats (there is an article about that), and I know that one should not use double for business calculations.

This is an inherent problem of calculating with floating point numbers, and it is not specific to Firebird. The problem is that the result of 15 * 1.1 * 0.49 in double precision is not exactly 8.085. In fact, if you computed 8.085 - RES, you'd get a value of approximately 1.776356839400251e-015 (although your client will likely just present it as 0.00000000).
You would get similar results in different languages. For example, in Java
DecimalFormat df = new DecimalFormat("#.00");
df.format(15 * 1.1 * 0.49);
will also produce 8.08 for exactly the same reason.
Also, if you changed the order of operations, you would get a different result. For example, 15 * 0.49 * 1.1 produces 8.085 and rounds to 8.09, so the actual result would match your expectations.
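The order dependence is easy to reproduce outside Firebird. A minimal sketch in Python, whose float is an IEEE 754 double (the assumption being that Firebird's double precision behaves the same way for these three operands; the variable names are illustrative only):

```python
# Python floats are IEEE 754 doubles; the names below are illustrative only.
left_to_right = 15 * 1.1 * 0.49   # the order used in the procedure
reordered = 15 * 0.49 * 1.1       # same factors, different order

# repr() shows enough digits to tell the two doubles apart.
print(repr(left_to_right), round(left_to_right, 2))
print(repr(reordered), round(reordered, 2))

# The two products differ by one unit in the last place.
print(reordered - left_to_right)
```

The two products are neighbouring doubles, and only one of them lies at or above the 8.085 midpoint, which is why the rounded results differ.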
Because round itself also returns a double precision value, double rounding isn't really a good way to handle this in your SQL code. The value rounded to the higher number of decimals might still be slightly less than you'd expect, because of how floating point numbers work, so the double round may still fail for some numbers even if the presentation in your client 'looks' correct.
If you purely want this for presentation purposes, it might be better to do this in your frontend, but alternatively you could try tricks like adding a small value and casting to decimal, for example something like:
cast(RES + 1e-10 as decimal(18,2))
However, this still has rounding issues, because it is impossible to distinguish between values that genuinely are 8.08499999999 (and should be rounded down to 8.08) and values where the result of the calculation just happens to be 8.08499999999 in floating point, while it would be 8.085 in exact numerics (and therefore needs to be rounded up to 8.09).
In a similar vein, you could try casting to decimal twice (e.g. cast(cast(res as decimal(18,3)) as decimal(18,2))), or casting to decimal and then rounding (e.g. round(cast(res as decimal(18,3)), 2)). This would be a bit more consistent than double rounding, because the first cast converts to exact numerics, but again it has the same downside as mentioned above.
Although you don't want to hear this answer, if you want exact numeric semantics, you shouldn't be using floating point types.
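To illustrate what exact numeric semantics buy you, here is a small sketch using Python's decimal module as a stand-in for an exact numeric type. It is not Firebird code; the variable names mirror the procedure, and the choice of ROUND_HALF_UP is an assumption about the rounding the asker wants.

```python
from decimal import Decimal, ROUND_HALF_UP

# Build the operands from strings so no binary rounding happens on the way in.
z1, z2, z3 = Decimal("15"), Decimal("1.1"), Decimal("0.49")

# Decimal multiplication of these operands is exact: no digits are lost.
res = z1 * z2 * z3

# Round to two decimal places, sending exact ties upward.
res_2dp = res.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(res, res_2dp)
```

With exact operands the product is a genuine tie at the third decimal, so the rounding rule decides the outcome and no repair step is needed.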

Related

Why BigFloat.to_s is not precise enough?

I am not sure if this is a bug. But I've been playing with big and I can't understand why this code works this way:
https://carc.in/#/r/2w96
Code
require "big"
x = BigInt.new(1<<30) * (1<<30) * (1<<30)
puts "BigInt: #{x}"
x = BigFloat.new(1<<30) * (1<<30) * (1<<30)
puts "BigFloat: #{x}"
puts "BigInt from BigFloat: #{x.to_big_i}"
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274900000000
BigInt from BigFloat: 1237940039285380274899124224
At first I thought that BigFloat.default_precision has to be changed for BigFloat to work with bigger numbers. But from this code it looks like it only matters when outputting the #to_s value.
Same with precision of BigFloat set to 1024 (https://carc.in/#/r/2w98):
Output
BigInt: 1237940039285380274899124224
BigFloat: 1237940039285380274899124224
BigInt from BigFloat: 1237940039285380274899124224
BigFloat.to_s uses LibGMP.mpf_get_str(nil, out expptr, 10, 0, self). Where GMP is saying:
mpf_get_str (char *str, mp_exp_t *expptr, int base, size_t n_digits, const mpf_t op)
Convert op to a string of digits in base base. The base argument may vary from 2 to 62 or from -2 to -36. Up to n_digits digits will be generated. Trailing zeros are not returned. No more digits than can be accurately represented by op are ever generated. If n_digits is 0 then that accurate maximum number of digits are generated.
Thanks.
In GMP (it applies to all languages not just Crystal), integers (C mpz_t, Crystal BigInt) and floats (C mpf_t, Crystal BigFloat) have separate default precision.
Also, note that using an explicit precision is better than setting a default one, because the default precision might not be reentrant (it depends on a configure-time switch). Also, if someone reads only a part of your code, they may skip the part with setting the default precision and assume a wrong one. Although I do not know the Crystal binding well, I assume that such functionality is exposed somewhere.
The zero parameter passed to mpf_get_str means to guess the value from the precision. I know the number of significant digits is proportional and close to precision / log2(10). Floating point numbers have finite precision. In that case, it was not the mpf_get_str call which made the last digits zero - it was the internal representation that did not keep such data. It looks like your (default) precision is too small to store all the necessary digits.
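The precision / log2(10) estimate can be checked with a few lines of Python. This is only a sketch: the helper name is made up, and 64 bits is an assumed default precision used for illustration, while 1024 is the value from the question.

```python
import math

def decimal_digits(precision_bits: int) -> int:
    """Rough count of decimal digits that precision_bits of mantissa can carry."""
    return math.floor(precision_bits / math.log2(10))

value = (1 << 30) ** 3            # the number from the question, 2**90
digits_needed = len(str(value))   # decimal digits in the exact value

# 64 is an assumed default precision; 1024 is the setting tried in the question.
for bits in (64, 1024):
    enough = decimal_digits(bits) >= digits_needed
    print(bits, decimal_digits(bits), enough)
```

Under that assumption, the smaller precision yields fewer digits than the exact value has, which is consistent with the zeros at the end of the first to_s output.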
To summarize, there are two solutions:
Set a global default precision. This approach works, but it requires either changing the default precision frequently or using a single precision for the whole program. Either way, relying on the default precision is a form of procrastination that is going to have its vengeance later.
Set the precision on a per-variable basis. This is a better solution than the former. Although it requires more code (1-2 more lines per variable initialization), it is going to pay off later. For example, in a space object tracking system, the physics calculations have to be super-precise, but other systems could use lower-precision numbers to save time and memory.
I am still unsure what made the conversion BigFloat --> BigInt yield the missing digits.

Converting int to double screws up the decimal point

In the debug window, when I input this command:
po 1912/10.0
The output is 191.19999999999999.
What I really want to get back is 191.2.
Why is this happening, and how can I convert an int into a double with precision?
From What Every Programmer Should Know About Floating-Point Arithmetic:
Why don’t my numbers, like 0.1 + 0.2 add up to a nice round 0.3, and instead I get a weird result like 0.30000000000000004?
Because internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.
When the code is compiled or interpreted, your “0.1” is already rounded to the nearest number in that format, which results in a small rounding error even before the calculation happens.
This is why programmers say you should only ever store money as an integer. For example int cents = 1995; rather than float dollars = 19.95.
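Both points from the quoted passage can be seen in a short Python sketch. Python floats are binary doubles; the price and the quantity of 3 are made-up example values.

```python
# Binary doubles cannot hold 0.1, 0.2 or 0.3 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
print(float_sum, float_sum == 0.3)

# Integer cents are exact; the price and quantity are made-up example values.
price_cents = 1995
total_cents = price_cents * 3

# Place the decimal point only when formatting for display.
formatted = f"{total_cents // 100}.{total_cents % 100:02d}"
print(formatted)
```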
If your app doesn't need to be 100% precise (for example, if you're calculating screen coordinates or translucency or a color) just format your float rounded to 1 or 2 decimal places:
double someValue = 1912/10.0;
NSLog(@"2 decimals: %.2f", someValue);
NSLog(@"0 decimals: %.0f", someValue);
This code will output:
2 decimals: 191.20
0 decimals: 191
That's normal for a floating point number. Double is obviously just an extended precision floating point number. If you want to keep the pristine decimal digits, then don't allow any float/double conversion. Instead store the result as a scaled integer (in your case 1912) and place the decimal manually.
Let me try to explain this another way. When you express a number with a fractional part as a float or double, precision is most often lost. There's no way around that. If you store 1912 as a float and 10 as a float and then divide the first stored value by the second, the value will NEVER be exactly 191.2. That's just the way floating point numbers work. If you look at the number in a debugger you'll see something like 191.19999999999999, as you describe. This is itself an approximation: the stored binary value has an exact decimal expansion, but it runs to dozens of digits, so the debugger prints only the first 17 or so.
If you're going to use floating point, that's what you'll get. No way around it.
If you really want to get 191.2, then you can't use floating point, at least without doing rounding. Instead, you need to normalize the numbers by just storing the value as 1912 and printing the value with a decimal point to the left of the 2.
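A rough Python sketch of both halves of this advice: first the exact value that the double actually stores, then the scaled-integer alternative. Python is used here only for illustration (the question is about the Xcode debugger), and the variable names are hypothetical.

```python
from decimal import Decimal

quotient = 1912 / 10.0

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so this prints the full decimal expansion of the double.
exact = Decimal(quotient)
print(exact)

# Scaled-integer alternative: keep the value as a count of tenths
# and insert the decimal point only when producing text.
tenths = 1912
as_text = f"{tenths // 10}.{tenths % 10}"
print(as_text)
```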
There's another brief online description at http://floating-point-gui.de/basic/

How does VB.NET 2008 round off integer numbers? [duplicate]

According to the documentation, the decimal.Round method uses a round-to-even algorithm which is not common for most applications. So I always end up writing a custom function to do the more natural round-half-up algorithm:
public static decimal RoundHalfUp(this decimal d, int decimals)
{
    if (decimals < 0)
    {
        throw new ArgumentException("The decimals must be non-negative",
            "decimals");
    }
    decimal multiplier = (decimal)Math.Pow(10, decimals);
    decimal number = d * multiplier;
    // Shift by one half and take the floor, so exact ties go up
    // and everything below the midpoint goes down.
    return decimal.Floor(number + 0.5m) / multiplier;
}
Does anybody know the reason behind this framework design decision?
Is there any built-in implementation of the round-half-up algorithm in the framework? Or maybe some unmanaged Windows API?
It could be misleading for beginners that simply write decimal.Round(2.5m, 0) expecting 3 as a result but getting 2 instead.
The other answers with reasons why the Banker's algorithm (aka round half to even) is a good choice are quite correct. It does not suffer from negative or positive bias as much as the round half away from zero method over most reasonable distributions.
But the question was why .NET uses banker's rounding as the default, and the answer is that Microsoft has followed the IEEE 754 standard. This is also mentioned in MSDN for Math.Round under Remarks.
Also note that .NET supports the alternative method specified by IEEE by providing the MidpointRounding enumeration. They could of course have provided more alternatives for resolving ties, but they chose to just fulfill the IEEE standard.
Probably because it's a better algorithm. Over the course of many roundings performed, you will average out that all .5's end up rounding equally up and down. This gives better estimations of actual results if you are for instance, adding a bunch of rounded numbers. I would say that even though it isn't what some may expect, it's probably the more correct thing to do.
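The averaging claim can be illustrated with a small sketch. It uses Python's decimal module rather than .NET, with ROUND_HALF_EVEN standing in for banker's rounding and ROUND_HALF_UP for the "natural" rule; the ten input values and the helper name are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Ten made-up values that all sit exactly on a tie: 0.5, 1.5, ..., 9.5.
ties = [Decimal(n) + Decimal("0.5") for n in range(10)]

def rounded_total(values, mode):
    """Round every value to a whole number with the given mode, then add up."""
    return sum(v.quantize(Decimal("1"), rounding=mode) for v in values)

exact_total = sum(ties)
half_even_total = rounded_total(ties, ROUND_HALF_EVEN)
half_up_total = rounded_total(ties, ROUND_HALF_UP)

print(exact_total, half_even_total, half_up_total)
```

On this deliberately tie-heavy input, rounding half to even reproduces the exact total, while rounding half up overshoots by half a unit per value.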
While I cannot answer the question of "Why did Microsoft's designers choose this as the default?", I just want to point out that an extra function is unnecessary.
Math.Round allows you to specify a MidpointRounding:
ToEven - When a number is halfway between two others, it is rounded toward the nearest even number.
AwayFromZero - When a number is halfway between two others, it is rounded toward the nearest number that is away from zero.
Decimals are mostly used for money, and banker’s rounding is common when working with money. Or you could say: it is mostly bankers that need the decimal type; therefore it does “banker’s rounding”.
Banker’s rounding has the advantage that, on average, you will get the same result whether you:
round a set of “invoice lines” before adding them up,
or add them up and then round the total.
Rounding before adding up saved a lot of work in the days before computers.
(In the UK, when we went decimal, banks would not deal with half pence, but for many years there was still a half pence coin and shops often had prices ending in half pence, so there was lots of rounding.)
Use another overload of Round function like this:
decimal.Round(2.5m, 0,MidpointRounding.AwayFromZero)
It will output 3. And if you use
decimal.Round(2.5m, 0,MidpointRounding.ToEven)
you will get banker's rounding.

Objective C, division between floats not giving an exact answer

Right now I have a line of code like this:
float x = (([self.machine micSensitivity] - 0.0075f) / 0.00025f);
Where [self.machine micSensitivity] is a float containing the value 0.010000
So,
0.01 - 0.0075 = 0.0025
0.0025 / 0.00025 = 10.0
But in this case, it keeps returning 9.999999
I'm assuming there's some kind of rounding error but I can't seem to find a clean way of fixing it. micSensitivity is incremented/decremented by 0.00025 and that formula is meant to return a clean integer value for the user to reference so I'd rather get the programming right than just adding 0.000000000001.
Thanks.
that formula is meant to return a clean integer value for the user to reference
If that is really important to you, then why do you not multiply all the numbers in this story by 10000, coerce to int, and do integer arithmetic?
Or, if you know that the answer is arbitrarily close to an integer, round to that integer and present it.
Floating-point arithmetic is binary, not decimal. It will almost always give rounding errors, and you need to take that into account. "float" has about six digits of precision; "double" has about 15 digits. You are throwing away nine digits of precision for no reason.
Now think: What do you want to display? What do you want to display if the result of your calculation is 9.999999999? What would you want to display if the result is 9.538105712?
None of the numbers in your question, except 10.0, can be exactly represented in a float or a double on iOS. If you want to do float math with those numbers, you will have rounding errors.
You can round your result to the nearest integer easily enough:
float x = rintf((self.machine.micSensitivity - 0.0075f) / 0.00025f);
Or you can just multiply all your numbers, including the allowed values of micSensitivity, by 4000 (which is 1/0.00025), and thus work entirely with integers.
Or you can change the allowed values of micSensitivity so that its increment is a fraction whose denominator is a power of 2. For example, if you use an increment of 0.000244140625 (which is 2^-12), and change 0.0075 to 0.00732421875 (which is 30 * 2^-12), you should get exact results, as long as your micSensitivity is within the range ±4096 (since 4096 is 2^12 and a float has 24 bits of significand).
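The last two options can be sketched as follows. This is Python, so the arithmetic runs in 64-bit doubles rather than the 32-bit floats of the question, and the stored step counts are hypothetical; it only illustrates why whole steps and power-of-two increments avoid the rounding error.

```python
# Option 1: count in whole steps of 0.00025, so 0.0100 is 40 steps
# and 0.0075 is 30 steps. All arithmetic is then on integers.
STEPS_PER_UNIT = 4000              # 1 / 0.00025
sensitivity_steps = 40             # hypothetical stored value for 0.0100
offset_steps = 30                  # 0.0075 expressed in steps
x_from_steps = sensitivity_steps - offset_steps

# Option 2: a power-of-two step is exactly representable in binary,
# so every multiple of it (within range) is exact as well.
step = 2.0 ** -12                  # 0.000244140625
offset = 30 * step                 # 0.00732421875
x_from_pow2 = [((30 + k) * step - offset) / step for k in range(100)]

print(x_from_steps, x_from_pow2[:5])
```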
The code you have posted is correct and functioning properly. This is a known side effect of using floating point arithmetic. See the wiki on floating point accuracy problems for a full explanation as to why.
There are several ways to work around the problem depending on what you need to use the number for.
If you need to compare two floats, then most everything works OK: less than and greater than do what you would expect. The only trouble is testing if two floats are equal.
// If x and y are within a very small number from each other then they are equal.
if (fabs(x - y) < verySmallNumber) { // verySmallNumber is usually called epsilon.
// x and y are equal (or at least close enough)
}
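The same tolerance test, sketched in Python for comparison. The helper name and the default epsilon of 1e-6 are arbitrary choices for the example; a suitable tolerance depends on the magnitude of the values being compared. math.isclose is the standard-library variant that uses a relative tolerance.

```python
import math

def nearly_equal(x: float, y: float, epsilon: float = 1e-6) -> bool:
    """Treat x and y as equal when they differ by less than epsilon."""
    return abs(x - y) < epsilon

a = 0.1 + 0.2
b = 0.3

# Exact equality fails, while both tolerance-based checks succeed.
print(a == b, nearly_equal(a, b), math.isclose(a, b, rel_tol=1e-9))
```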
If you want to print a float, then you can specify a precision to round to.
// Get a string of the x rounded to five digits of precision.
NSString *xAsAString = [NSString stringWithFormat:@"%.5f", x];
9.999999... (with the 9s repeating forever) is equal to 10. Here is the proof:
Let x = 9.999999...; then 10x = 99.999999..., so 10x - x = 9x = 90, and therefore x = 10.

Convert.ToSingle() from double in vb.net returns wrong value

Here is my question :
If we have the following value
0.59144706948010461
and we try to convert it to Single we receive the next value:
0.591447055
As you can see, this is not what we should receive. Could you please explain how this value is created and how I can avoid this situation?
Thank you!
As you can see, this is not what we should receive.
Why not? I strongly suspect that's the closest Single value to the Double you've given.
From the documentation for Single, having fixed the typo:
All floating-point numbers have a limited number of significant digits, which also determines how accurately a floating-point value approximates a real number. A Single value has up to 7 decimal digits of precision, although a maximum of 9 digits is maintained internally.
Your Double value is 0.5914471 when limited to 7 significant digits - and so is the Single value you're getting. Your original Double value isn't exactly 0.59144706948010461 either... the exact values of the Double and Single values are:
Double: 0.5914470694801046146693579430575482547283172607421875
Single: 0.591447055339813232421875
It's important that you understand a bit about how binary floating point works - see my articles on binary floating point and decimal floating point for more background.
When converting from double to float you're also rounding. The result should be the single-precision number that is closest to the number you are rounding.
That is exactly what you're getting here.
Single-precision floating-point numbers between 0.5 and 1 are of the form n / 2^24, where n is an integer between 2^23 and 2^24.
0.59144706948010461... = 9922835.23723472274456576... / 2^24
so the closest single-precision floating-point number is
9922835 / 2^24 = 0.5914470553...
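The calculation above can be reproduced with a short Python sketch. Python has no native single-precision type, so the conversion is simulated by packing the value into a 4-byte float with the struct module and reading it back; the variable names are illustrative.

```python
import struct
from decimal import Decimal

value = 0.59144706948010461

# Round-trip through a 4-byte float to get the nearest single-precision value.
as_single = struct.unpack("f", struct.pack("f", value))[0]

# The same result by hand: scale by 2^24 and round to the nearest integer n.
n = round(value * 2**24)

# Decimal(float) prints the exact value that the single actually holds.
print(n, Decimal(as_single))
```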