Kotlin - Converting Float to Double while maintaining precision - kotlin

In Kotlin 123.456 is a valid Double value, however, 123.456F.toDouble() results in 123.45600128173828 - presumably just the way precision is handled between the two.
I want to be able to convert freely between the two, specifically for cases like this:
123.456F -> 123.456 // Float to Double
123.456 -> 123.456F // Double to Float
How can I convert a float to a double in cases like this, and maintain precision?

It's a big ugly, but you could convert your Float to a String and back out to a Double:
val myDouble: Double = 123.456f.toString().toDouble()
// 123.456d
You could always encapsulate this in an extension function:
fun Float.toExactDouble(): Double =
this.toString().toDouble()
val myDouble = 123.456f.toExactDouble()

In Kotlin 123.456 is a valid Double value
Actually, that's not quite true.  There's a Double value very close to 123.456, but it's not exactly 123.456.  What you're seeing is the consequences of that.
So you can't maintain precision, because you don't have that precision to start with!
Short answer:
If you need exact values, don't use floating-point!
(In particular: Never store money values in floating-point! See for example this question.)
The best alternative is usually BigDecimal which can store and calculate decimal fractions to an arbitrary precision. They're less efficient, but Kotlin's operator overloading makes them painless to use (unlike Java!).
Or if you're not going to be doing any calculations, you could store them as Strings.
Or if you'll only need a certain number of decimal places, you could scale them all up to Ints (or Longs).
Technical explanation:
Floats and Doubles use binary floating-point; they store an integer, and an integer power of 2 to multiple or divide it by.  (For example, 3/4 would be stored as 3*2⁻².)  This means they can store a wide range of binary fractions exactly.
However, just as you can't store 1/3 as a decimal fraction (it's 0.3333333333…, but any finite number of digits will only be an approximation), so you can't store 1/10 as a binary fraction (it's 0.000110011001100…).  This means that a binary floating-point number can't store most decimal numbers exactly.
Instead, they store the nearest possible value to the number you want.  And the routines which convert them to a String will try to undo that difference, by rounding appropriately.  But that doesn't always give the result you expect.
Floating-point numbers are great when you need a huge range of values (e.g. in scientific and technical use), but don't care about storing them exactly.

Related

How to handle precision problems of floating point numbers?

I am using Firebird 3.0.4 (both in Windows and Linux) and I have the following procedure that clearly demonstrates my problem with floating point numbers, and that also demonstrates a possible workaround:
create or alter procedure test_float returns (res double precision,
res1 double precision,
res2 double precision)
as
declare variable z1 double precision;
declare variable z2 double precision;
declare variable z3 double precision;
begin
z1=15;
z2=1.1;
z3=0.49;
res=z1*z2*z3; /* one expects res to be 8.085, but internally, inside the procedure
it is represented as 8.084999999999.
The procedure-internal representation is repaired when then
res is sent to the output of the procedure, but the procedure-internal
representation (which is worng) impacts the further calculations */
res1=round(res, 2);
res2=round(round(res, 8), 2);
suspend;
end
On can see the result of the procedure with:
select proc.res, proc.res1, proc.res2
from test_float proc
The result is
RES RES1 RES2
8,085 8,08 8,09
But one can expect that RES2 should be 8.09.
One can clearly see that the internal representation of the res contains 8.0849999 (e.g. one can assign res to the exception message and then raise this exception), it is repaired during output but it leads to the failed calculations when such variable is used in the further calculations.
RES2 demonstrates the repair: I can always apply ROUND(..., 8) to repair the internal representation. I am ready to go with this solution, but my question is - is it acceptable workaround (when the outer ROUND is with strictly less than 5 decimal places) or is there better workaround.
All my tests pass with this workaround, but the feeling is bad.
Of course, I know the minimum that every programmer should know about floats (there is article about that) and I know that one should not use double for business calculations.
This is an inherent problem with calculating with floating point numbers, and is not specific to Firebird. The problem is that the calculation of 15 * 1.1 * 0.49 using double precision numbers is not exactly 8.085. In fact, if you would do 8.085 - RES, you'd get a value that is (approximately) 1.776356839400251e-015 (although likely your client will just present it as 0.00000000).
You would get similar results in different languages. For example, in Java
DecimalFormat df = new DecimalFormat("#.00");
df.format(15 * 1.1 * 0.49);
will also produce 8.08 for exactly the same reason.
Also, if you would change the order of operations, you would get a different result. For example using 15 * 0.49 * 1.1 would produce 8.085 and round to 8.09, so the actual results would match your expectations.
Given round itself also returns a double precision, this isn't really a good way to handle this in your SQL code, because the rounded value with a higher number of decimals might still yield a value slightly less than what you'd expect because of how floating point numbers work, so the double round may still fail for some numbers even if the presentation in your client 'looks' correct.
If you purely want this for presentation purposes, it might be better to do this in your frontend, but alternatively you could try tricks like adding a small value and casting to decimal, for example something like:
cast(RES + 1e-10 as decimal(18,2))
However this still has rounding issues, because it is impossible to distinguish between values that genuinely are 8.08499999999 (and should be rounded down to 8.08), and values where the result of calculation just happens to be 8.08499999999 in floating point, while it would be 8.085 in exact numerics (and therefor need to be rounded up to 8.09).
In a similar vein, you could try to use double casting to decimal (eg cast(cast(res as decimal(18,3)) as decimal(18,2))), or casting the decimal and then rounding (eg round(cast(res as decimal(18,3)), 2). This would be a bit more consistent than double rounding because the first cast will convert to exact numerics, but again this has similar downside as mentioned above.
Although you don't want to hear this answer, if you want exact numeric semantics, you shouldn't be using floating point types.

Saving float and double precision in C++ during cin/cout and scanf/printf

I want to read floats and doubles from standart input and save its precision (exact the same digits after floating point) and be able to output (cout/printf) as it is. What the most convinient (and simplest way) to do this?
Thanks!
float f;
cin >> f;
cout << f;
Use setprecision.
Here is the solution
cout<<setprecision(the precision you want to set here)<<variablename;
eg. If you want to set precision of the output to 5 for variable var use it like this:
cout<<setprecision(5)<<var;
setprecision is a manipulator. Learn more about manipulators here.
It sets the decimal precision to be used to
format floating-point values on output operations.
Behaves as if member precision were called with n as argument on the
stream on which it is inserted/extracted as a manipulator (it can be
inserted/extracted on input streams or output streams).
This is a manipulator and is declared in header <iomanip>
Since the input has an unknown precision, the simplest method is to read them as strings, not doubles/floats.
If you need the float value, a simple string to double conversion is needed.
Any other method will probably fail since you rely on imperfect conversion from string to float done by the standard library.
The latter can't distinguish between 0.4 and 0.40.

Converting int to double screws up the decimal point

In the debug window, when I input this command:
po 1912/10.0
The output is 191.19999999999999.
What I really want to get back is 191.2.
Why is this happening, and how can I convert an int into a double with precision?
From What Every Programmer Should Know About Floating-Point Arithmetic:
Why don’t my numbers, like 0.1 + 0.2 add up to a nice round 0.3, and instead I get a weird result like 0.30000000000000004?
Because internally, computers use a format (binary floating-point) that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.
When the code is compiled or interpreted, your “0.1” is already rounded to the nearest number in that format, which results in a small rounding error even before the calculation happens.
This is why programmers say you should only ever store money as an integer. For example int cents = 1995; rather than float dollars = 19.95.
If your app doesn't need to be 100% precise (for example, if you're calculating screen coordinates or translucency or a color) just format your float rounded to 1 or 2 decimal places:
double someValue = 1912/10.0;
NSLog(#"2 decimals: %.2f", someValue);
NSLog(#"0 decimals: %.0f", someValue);
This code will output:
2 decimals: 191.20
0 decimals: 191
That's normal for a floating point number. Double is obviously just an extended precision floating point number. If you want to keep the pristine decimal digits, then don't allow any float/double conversion. Instead store the result as a scaled integer (in your case 1912) and place the decimal manually.
Let me try to explain this another way. When you express a number with a fractional part with a float or double, precision is most often lost. There's no way around that. If you store 1912 as a float and store 10 as a float then divide the first stored value by the second, the value will NEVER be 191.2. That's just the way floating point numbers work. If you look at the number in a debugger you'll see something like 191.19999999999999 as you describe. This, in itself, is an approximation as the value should be 191.19999999999999... but of course you can't even type all the digits in the decimal value of that stored result as the number of digits approaches infinity.
If you're going to use floating point, that's what you'll get. No way around it.
If you really want to get 191.2, then you can't use floating point, at least without doing rounding. Instead, you need to normalize the numbers by just storing the value as 1912 and printing the value with a decimal point to the left of the 2.
There's another brief online description at http://floating-point-gui.de/basic/

Convert.ToSingle() from double in vb.net returns wrong value

Here is my question :
If we have the following value
0.59144706948010461
and we try to convert it to Single we receive the next value:
0.591447055
As you can see this is not that we should receive. Could you please explain how does this value get created and how can I avoid this situation?
Thank you!
As you can see this is not that we should receive.
Why not? I strongly suspect that's the closest Single value to the Double you've given.
From the documentation for Single, having fixed the typo:
All floating-point numbers have a limited number of significant digits, which also determines how accurately a floating-point value approximates a real number. A Single value has up to 7 decimal digits of precision, although a maximum of 9 digits is maintained internally.
Your Double value is 0.5914471 when limited to 7 significant digits - and so is the Single value you're getting. Your original Double value isn't exactly 0.59144706948010461 either... the exact values of the Double and Single values are:
Double: 0.5914470694801046146693579430575482547283172607421875
Single: 0.591447055339813232421875
It's important that you understand a bit about how binary floating point works - see my articles on binary floating point and decimal floating point for more background.
When converting from double to float you're also rounding. The result should be the single-precision number that is closest to the number you are rounding.
That is exactly what you're getting here.
Floating-point numbers between 0.5 and 1 are of the form n / 2^24 where n is between 2^23 and 2^24.
0.59144706948010461... = 9922835.23723472274456576... / 2^24
so the closest single-precision floating-point number is
9922835 / 2^24 = 0.5914470553...

Difference between Objective-C primitive numbers

What is the difference between objective-c C primitive numbers? I know what they are and how to use them (somewhat), but I'm not sure what the capabilities and uses of each one is. Could anyone clear up which ones are best for some scenarios and not others?
int
float
double
long
short
What can I store with each one? I know that some can store more precise numbers and some can only store whole numbers. Say for example I wanted to store a latitude (possibly retrieved from a CLLocation object), which one should I use to avoid loosing any data?
I also noticed that there are unsigned variants of each one. What does that mean and how is it different from a primitive number that is not unsigned?
Apple has some interesting documentation on this, however it doesn't fully satisfy my question.
Well, first off types like int, float, double, long, and short are C primitives, not Objective-C. As you may be aware, Objective-C is sort of a superset of C. The Objective-C NSNumber is a wrapper class for all of these types.
So I'll answer your question with respect to these C primitives, and how Objective-C interprets them. Basically, each numeric type can be placed in one of two categories: Integer Types and Floating-Point Types.
Integer Types
short
int
long
long long
These can only store, well, integers (whole numbers), and are characterized by two traits: size and signedness.
Size means how much physical memory in the computer a type requires for storage, that is, how many bytes. Technically, the exact memory allocated for each type is implementation-dependendant, but there are a few guarantees: (1) char will always be 1 byte (2) sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long).
Signedness means, simply whether or not the type can represent negative values. So a signed integer, or int, can represent a certain range of negative or positive numbers (traditionally –2,147,483,648 to 2,147,483,647), and an unsigned integer, or unsigned int can represent the same range of numbers, but all positive (0 to 4,294,967,295).
Floating-Point Types
float
double
long double
These are used to store decimal values (aka fractions) and are also categorized by size. Again the only real guarantee you have is that sizeof(float) <= sizeof(double) <= sizeof (long double). Floating-point types are stored using a rather peculiar memory model that can be difficult to understand, and that I won't go into, but there is an excellent guide here.
There's a fantastic blog post about C primitives in an Objective-C context over at RyPress. Lots of intro CPS textbooks also have good resources.
Firstly I would like to specify the difference between au unsigned int and an int. Say that you have a very high number, and that you write a loop iterating with an unsigned int:
for(unsigned int i=0; i< N; i++)
{ ... }
If N is a number defined with #define, it may be higher that the maximum value storable with an int instead of an unsigned int. If you overflow i will start again from zero and you'll go in an infinite loop, that's why I prefer to use an int for loops.
The same happens if for mistake you iterate with an int, comparing it to a long. If N is a long you should iterate with a long, but if N is an int you can still safely iterate with a long.
Another pitfail that may occur is when using the shift operator with an integer constant, then assigning it to an int or long. Maybe you also log sizeof(long) and you notice that it returns 8 and you don't care about portability, so you think that you wouldn't lose precision here:
long i= 1 << 34;
Bit instead 1 isn't a long, so it will overflow and when you cast it to a long you have already lost precision. Instead you should type:
long i= 1l << 34;
Newer compilers will warn you about this.
Taken from this question: Converting Long 64-bit Decimal to Binary.
About float and double there is a thing to considerate: they use a mantissa and an exponent to represent the number. It's something like:
value= 2^exponent * mantissa
So the more the exponent is high, the more the floating point number doesn't have an exact representation. It may also happen that a number is too high, so that it will have a so inaccurate representation, that surprisingly if you print it you get a different number:
float f= 9876543219124567;
NSLog("%.0f",f); // On my machine it prints 9876543585124352
If I use a double it prints 9876543219124568, and if I use a long double with the .0Lf format it prints the correct value. Always be careful when using floating points numbers, unexpected things may happen.
For example it may also happen that two floating point numbers have almost the same value, that you expect they have the same value but there is a subtle difference, so that the equality comparison fails. But this has been treated hundreds of times on Stack Overflow, so I will just post this link: What is the most effective way for float and double comparison?.