MPFR - Loss of precision after addition - gmp

First, sorry if this question looks "silly", because I'm new to MPFR, LOL.
I have two mpfr_t variables with a precision of 1024 bits, holding the values 0.2 and 0.06.
But when I add these variables, things go wrong and the result (also an mpfr_t variable) comes out as 0.2599999...
This is strange, because the MPFR library should maintain the precision (shouldn't it?).
Could you please help me with this? Thanks so much in advance.

MPFR numbers are represented in binary (base 2). In this system, the only numbers that can be represented exactly are those of the form N·2^k, where N and k are integers. Neither 0.2 = 1/5 nor 0.06 = 3/50 has this form, so they are approximated with some small error. When you add these variables, you are seeing a consequence of this error (there may also be an additional rounding error in the addition itself, since in binary these numbers have many nonzero digits, unlike in decimal).
This is the same issue as the one described in: Is floating point math broken?
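You can see the stored values rather than the rounded decimal output by printing them exactly. Here is a minimal sketch in Python using ordinary double precision instead of MPFR (an assumption made purely for illustration; at 1024 bits the effect is the same, only the error is far smaller):

import decimal

# decimal.Decimal(x) shows the exact value that a binary float actually stores.
print(decimal.Decimal(0.2))         # slightly above 0.2
print(decimal.Decimal(0.06))        # slightly below 0.06
print(decimal.Decimal(0.2 + 0.06))  # the sum is not exactly 0.26 either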
EDIT:
To answer the question in the comment, "Is there a way to avoid this situation?": no, there is no way to avoid this situation in practice, except in very specific cases. For instance, if all your numbers (the inputs and the results of every intermediate operation) are decimal numbers representable with a small enough number of digits, you can use decimal arithmetic (but MPFR can't do that). Computer algebra systems may help in some cases. There's also iRRAM... I'll come back to it later.
However, there are ways to attempt to hide issues with numerical errors. You need to estimate the maximum possible error on a computed value. With an error analysis, you can obtain rigorous bounds, but this may be difficult or time-consuming. Note that rigorous bounds are pessimistic in general, but if you use arbitrary precision (e.g. with MPFR), this is less of an issue. The analysis can also be done dynamically with interval arithmetic (still pessimistic, often even more so). But perhaps a simple estimate is sufficient for you. Once you have an estimate of the maximum error:
For the output, choose the number of displayed digits so that the error is less than the weight of the last displayed digit.
For discontinuous functions (e.g. equality tests, floor, ceil): if the distance between the computed value and a discontinuity point is less than the maximum error, assume that the actual value is equal to the discontinuity point. Note that this is just a heuristic; if it fails (which may remain unnoticed and will probably invalidate your estimate), this means that you have not done your computations with enough precision.
Note: MPFR won't do that for you. But you can write code to take these rules into account.
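For example, here is a minimal Python sketch of how these two rules could be coded, assuming you already have an estimate err of the maximum absolute error on a computed value (the function names and the simple digit formula are illustrative, not part of MPFR):

import math

def format_with_error(x, err):
    # Rule 1: display only digits whose weight is larger than the error bound.
    # err is assumed to be a positive absolute error estimate.
    decimals = max(0, -math.floor(math.log10(err)) - 1)
    return f"{x:.{decimals}f}"

def probably_equal(x, y, err_x, err_y):
    # Rule 2 (heuristic): if two computed values are closer than their combined
    # error bounds, assume the true values are equal.
    return abs(x - y) <= err_x + err_y

print(format_with_error(0.2599999999, 1e-6))            # 0.26000
print(probably_equal(0.26, 0.2599999999, 1e-6, 1e-6))   # True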
The iRRAM package, which is based on MPFR, can track the error in a rigorous way (like with interval arithmetic) and automatically redo all the computations in a higher precision if it notices that the accuracy is too low. However, if some mathematical result is a discontinuity point, iRRAM won't help. In particular, it cannot provide a rigorous equality test.
Finally, I suggest that you have a look at Goldberg's paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, in particular the notion of cancellation.

Related

Why is the closed form for the Fibonacci sequence not used in practice?

There is a closed form for the Fibonacci sequence that can be obtained via generating functions. It is:
f_n = (phi^n - psi^n) / sqrt(5)
For what the terms mean, see the link above or here.
However, it is discussed here that this closed form isn't really used in practice because it starts producing wrong answers once n gets to around a hundred or larger.
But in the answer here, it seems one of the methods employed is fast matrix exponentiation which can be used to get the nth Fibonacci number very efficiently in O(log(n)) time.
But then, the closed form expression involves a bunch of terms that are raised to the nth power. So, you could calculate all those terms with fast exponentiation and get the result efficiently that way. Why would fast exponentiation on a matrix be better than doing it on scalars that show up in the closed-form expression? And besides, looking for how to do fast exponentiation of a matrix efficiently, the accepted answer here suggests we convert to the diagonal form and do it on scalars anyway.
The question then is - if fast exponentiation of a matrix is good for calculating the nth Fibonacci number in O(log(n)) time, why isn't the closed form a good way to do it when it involves fast exponentiation on scalars?
The "closed form" formula for computing Fibonacci numbers, you need to raise irrational numbers to the power n, which means you have to accept using only approximations (typically, double-precision floating-point arithmetic) and therefore inaccurate results for large numbers.
On the contrary, in the "matrix exponentiation" formula for computing Fibonacci numbers, the matrix you are raising to the power n is an integer matrix, so you can do integer calculations with no loss of precision using a "big int" library to do arithmetic with arbitrarily large integers (or if you use a language like Python, "big ints" are the default).
So the difference is that you can't do exact arithmetic with irrational numbers but you can with integers.
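For concreteness, here is a short Python sketch of the integer matrix-exponentiation approach described above, relying on Python's built-in big ints (the function names are just illustrative):

def mat_mul(A, B):
    # 2x2 integer matrix product; Python ints never lose precision.
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def fib(n):
    # Exponentiation by squaring: O(log n) matrix multiplications.
    result = [[1, 0], [0, 1]]   # identity matrix
    base = [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]         # F(n)

print(fib(10))    # 55
print(fib(1000))  # exact, no rounding error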
Note that "In practice" here is referring to competitive programming (in reality, you basically never want to compute massive fibonacci numbers). So, the first reason is that the normal way of calculating fibonacci numbers is way faster to type without making any errors, and less code. Plus, it will be faster than the fancy method for small numbers.
When it comes to big numbers, Fast Matrix multiplication is O(log(n)) if you don't care about precision. However, in competitive programming we almost always care about precision and want the correct answer. To do that, we would need to increase the precision of our numbers. And the bigger n gets, the more precision is required. I don't know the exact formula, but I would imagine that because of the increased precision required, the matrix multiplication which requires only O(log n) multiplications will require something like O(log n) bits of precision so the time complexity will actually end up being somewhat bad (O(log^3 n) maybe?). Not to mention, even harder to code and very slow because you are multiplying arbitrary-precision numbers.
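If you want to see where the plain double-precision closed form actually breaks down, a quick sketch like the following (in Python, whose floats are IEEE doubles) finds the first disagreement at runtime rather than assuming a particular n:

import math

phi = (1 + math.sqrt(5)) / 2
psi = (1 - math.sqrt(5)) / 2

a, b = 0, 1   # exact F(n), F(n+1) via integer addition
n = 0
while True:
    closed = round((phi**n - psi**n) / math.sqrt(5))
    if closed != a:
        print("closed form first disagrees at n =", n)
        break
    a, b = b, a + b
    n += 1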

Pseudo-inverse via singular value decomposition in numpy.linalg.lstsq

at the risk of using the wrong SE...
My question concerns the rcond parameter in numpy.linalg.lstsq(a, b, rcond). I know it's used to define the cutoff for small singular values in the singular value decomposition when numpy computes the pseudo-inverse of a.
But why is it advantageous to set "small" singular values (values below the cutoff) to zero rather than just keeping them as small numbers?
PS: Admittedly, I don't know exactly how the pseudo-inverse is computed and exactly what role SVD plays in that.
Thanks!
To determine the rank of a system you need to compare singular values against zero, but as always with floating-point arithmetic you should compare against some small finite threshold instead: hitting exactly 0 essentially never happens when adding and subtracting messy numbers.
The (reciprocal) condition number allows you to specify such a limit in a way that is independent of units and scale in your matrix.
If you know for sure your problem can't run the risk of being under-determined, you can keep the old cutoff behavior in lstsq by passing rcond=-1; passing rcond=None instead silences the FutureWarning and opts in to the future default.
The future default sounds like a wise choice from numpy:
FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions
If you hit this limit, you are just fitting against the rounding errors of floating-point arithmetic instead of the "true" mathematical system that you hope to model.
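To make the effect of rcond concrete, here is a small numpy sketch (the matrix and the 1e-12 perturbation are made up for illustration): with the tiny singular value kept, the least-squares solution is dominated by that numerical noise; with it cut off, you get the modest minimum-norm solution instead.

import numpy as np

# The second column is almost an exact multiple of the first,
# so one singular value is tiny.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0 + 1e-12]])
b = np.array([1.0, 2.0, 3.1])

for rcond in (None, 1e-10):
    x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=rcond)
    print(f"rcond={rcond}: rank={rank}, singular values={sv}, x={x}")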

Numerical Accuracy: to scale or not?

I am working on a n-body gravitational simulator that takes input and produces output in metric MKS units. This involves dealing with some very large numbers (like solar masses expressed in kilograms, semimajor axes of planetary orbits expressed in meters, and timescales of years expressed in seconds), which get multiplied by some very small numbers (notably, the gravitational constant, which is 6.67384e-11 in MKS units), and also the occasional very small number getting added to or subtracted from a very large number (mainly when summing up pairwise accelerations), which gets me concerned about the effects of rounding errors.
I've already taken the step of replacing all masses m by Gm (premultiplying by the gravitational constant), which significantly reduces the total number of multiplies, and makes the mass numbers much smaller, and that seems to have had a positive effect on both efficiency and accuracy, as judged by how well the simulator conserves energy.
I am wondering, however: is it potentially worth trying to do some internal re-scaling into different units to further minimize floating-point errors? And if so, what kind of range (for double-precision floats) should I be trying to get my numbers centered on for maximum accuracy?
In general, if you want precise results in physically based simulation, you don't want to use floats or doubles, since their rounding problems introduce errors into your simulation.
If you need or want to stick with floats/doubles, you should probably rescale around zero. The reason is that floating-point representations have a higher "density" of values around zero and tend to have fewer towards the min/max ends of the range.
I would suggest that you change all values to an integer-based representation. This eliminates rounding errors (overflow/underflow can still happen!) and can speed up the calculation, because normal CPUs work faster with integer operations. On a GPU it's basically the same, but that's another story all of its own...
But before you go to such an effort to further improve your accuracy, I would strongly advise trying an arbitrary-precision number library. This may come with a performance loss, but it should be far easier and yield better results than rescaling your values.
Most numerical mathematicians come across this problem.
First, let me remind you that in each calculation you cannot usefully work with numbers (or physical values) smaller than the floating-point spacing around the values involved. Unfortunately, that spacing (the "epsilon") depends on the magnitude of the number you are working with. You can try eps(a) for any value of a in MATLAB; eps(1.0) is about 2.2e-16, while eps(0) is about 4.9e-324.
That's why in numerical methods you avoid calculations that mix very differently scaled numbers: the smaller one is simply ignored (it falls below the spacing of the larger value) and rounding errors are inevitable.
But what else do people do? When they encounter such physical problems, mathematicians analyse the problem theoretically before coding and make simplifications so that the calculation works with similarly scaled numbers.
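A tiny Python illustration of that absorption effect (math.ulp requires Python 3.9+; the value 1e20 is just an arbitrary "big" number for the sketch):

import math

# The spacing between adjacent doubles grows with the magnitude of the value.
print(math.ulp(1.0))       # about 2.2e-16
print(math.ulp(1e20))      # about 1.6e4
print(1e20 + 1.0 == 1e20)  # True: the 1.0 is absorbed entirely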

Are there compilers that optimise floating point operations for accuracy (as opposed to speed)?

We know that compilers are getting better and better at optimising our code to make it run faster, but my question is: are there compilers that can optimise floating-point operations to ensure greater accuracy?
For example, a basic rule of thumb is to perform multiplications before additions. This is because multiplication and division of floating-point numbers do not introduce inaccuracies as great as those of addition and subtraction, but they can increase the magnitude of inaccuracies already introduced by addition and subtraction, so in many cases they should be done first.
So a floating point operation like
y = x*(a + b); // faster but less accurate
Should be changed to
y = x*a + x*b; // slower but more accurate
Are there any compilers that will optimise for improved floating-point accuracy at the expense of speed, like I showed above? Or is the main concern of compilers speed, without looking at the accuracy of floating-point operations?
Thanks
Update: The selected answer showed a very good example where this type of optimisation would not work, so it wouldn't be possible for the compiler to know beforehand which is the more accurate way to evaluate y. Thanks for the counter-example.
Your premise is faulty: x*(a + b) is (in general) no less accurate than x*a + x*b. In fact, it will often be more accurate, because it performs only two floating-point operations (and therefore incurs only two rounding errors), whereas the latter performs three operations.
If you know something about the expected distribution of values for x, a, and b a priori, then you could make an informed decision, but compilers almost never have access to that type of information.
That aside, what if the person writing the program actually meant x*(a+b) and specifically wanted the exact roundings that are caused by that particular sequence of operations? This sort of thing is actually pretty common in high-quality numerical algorithms.
Better to do what the programmer wrote, not what you think he might have intended.
Edit -- An example to illustrate a case where the transformation you suggested results in a catastrophic loss of accuracy: suppose
x = 3.1415926535897931
a = 1.0e15
b = -(1.0e15 - 1.0)
Then, evaluating in double we get:
x*(a + b) = 3.1415926535897931
but
x*a + x*b = 3.0
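Since Python floats are IEEE 754 doubles, the example is easy to reproduce; this small sketch just re-evaluates the two expressions above:

x = 3.1415926535897931
a = 1.0e15
b = -(1.0e15 - 1.0)

print(x * (a + b))    # 3.141592653589793, because a + b is exactly 1.0
print(x * a + x * b)  # 3.0, the rounded products cancel catastrophically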
Compilers typically "optimize" for accuracy over speed, accuracy being defined as exact implementation of the IEEE 754 standard. Whereas integer operations can be reordered in any way that doesn't cause overflow, FP operations need to be performed exactly as the programmer specifies. This may sacrifice numerical accuracy (ordinary C compilers are not equipped to optimize for that), but it faithfully implements what the programmer asked for.
A programmer who is sure he hasn't manually optimized for accuracy may enable compiler features like GCC's -funsafe-math-optimizations and -ffinite-math-only to possibly extract extra speed. But usually there isn't much gain.
No, there isn't. Stephen Canon gives some good reasons why this would be a stupid idea, and he's correct; so you won't find a compiler that does this.
If you as the programmer have some knowledge about the ranges of numbers you're manipulating, you can use parentheses, temporary variables and similar constructs to strongly hint the compiler about how you want things done.

Need help generating discrete random numbers from distribution

I searched the site but did not find exactly what I was looking for... I want to generate a discrete random number from a normal distribution.
For example, say I have a range with a minimum of 4, a maximum of 10, and an average of 7. What code or function call (Objective-C preferred) would I need to return a number in that range? Naturally, due to the normal distribution, more of the returned numbers would center around the average of 7.
As a second example, can the bell curve/distribution be skewed toward one end or the other? Let's say I need to generate a random number with a minimum of 4 and a maximum of 10, and I want the majority of the returned numbers to center around 8, with a natural fall-off based on a skewed bell curve.
Any help is greatly appreciated....
Anthony
What do you need this for? Can you do it the craps player's way?
Generate two random integers in the range of 2 to 5 (inclusive, of course) and add them together. Or flip a coin (0,1) six times and add 4 to the result.
Summing multiple dice produces an approximately normal distribution (a "bell curve"), while eliminating high or low throws can be used to skew the distribution in various ways.
The key is that you are going for discrete numbers (and I hope you mean integers by that). Multiple dice throws famously approximate a normal distribution. In fact, I think that's how we were first introduced to the Gaussian curve in school.
Of course the more throws, the more closely you approximate the bell curve. Rolling a single die gives a flat line. Rolling two dice just creates a ramp up and down that isn't terribly close to a bell. Six coin flips gets you closer.
So consider this...
If I understand your question correctly, you only have seven possible outcomes--the integers (4,5,6,7,8,9,10). You can set up an array of seven probabilities to approximate any distribution you like.
Many frameworks and libraries have this built-in.
Also, as TokenMacGuy said, a normal distribution isn't characterized by the interval it's defined on, but rather by two parameters: the mean μ and the standard deviation σ. With these two parameters you can confine a certain quantile of the distribution to a certain interval, so that, say, 95% of all points fall in that interval. But restricting it completely to any interval other than (−∞, ∞) is impossible.
There are several methods to generate normally distributed values from uniform random values (which is what most random or pseudorandom number generators produce):
The Box-Muller transform is probably the easiest, although not exactly fast to compute. Depending on how many numbers you need it should be sufficient, though, and it is definitely very easy to write.
Another option is Marsaglia's Polar method which is usually faster1.
A third method is the Ziggurat algorithm which is considerably faster to compute but much more complex to program. In applications that really use a lot of random numbers it may be the best choice, though.
As general advice, though: don't write it yourself if you have access to a library that already generates normally distributed random numbers for you.
For skewing your distribution I'd just use a regular normal distribution, choosing μ and σ appropriately for one side of your curve, and then determine on which side of your desired mean a point fell, stretching it appropriately to fit your target distribution.
For generating only integers I'd suggest rounding to the nearest integer when the random number falls within your desired interval and rejecting it if it doesn't (drawing a new random number in that case). This way you won't artificially skew the distribution (as you would if you clamped the values at 4 or 10, respectively).
1 In testing with deliberately bad random number generators (yes, worse than RANDU) I've noticed that the polar method can end up in an endless loop, rejecting every sample. This won't happen with random numbers that fulfill the usual statistical expectations, though.
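Here is a minimal Python sketch of the round-and-reject approach described above; random.gauss is the standard-library normal generator, and sigma = 1.5 is an arbitrary assumed spread, not something taken from the question:

import random

def discrete_normal(mu=7.0, sigma=1.5, lo=4, hi=10):
    # Draw from a normal distribution, round to the nearest integer,
    # and reject anything outside [lo, hi].
    while True:
        value = int(round(random.gauss(mu, sigma)))
        if lo <= value <= hi:
            return value

# Rough histogram of 10000 draws, centered on 7.
counts = {k: 0 for k in range(4, 11)}
for _ in range(10000):
    counts[discrete_normal()] += 1
print(counts)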
Yes, there are sophisticated mathematical solutions, but for "simple but practical" I'd go with Nosredna's comment. For a simple Java solution:
Random random = new Random();

// Sum of six coin flips (0 or 1) plus 4: returns an integer in [4, 10], centered on 7.
public int bell7()
{
    int n = 4;
    for (int x = 0; x < 6; ++x)
        n += random.nextInt(2);
    return n;
}
If you're not a Java person, Random.nextInt(n) returns a random integer between 0 and n-1. I think the rest should be similar to what you'd see in any programming language.
If the range were large, then instead of the nextInt(2) calls I'd use a bigger number there, so there would be fewer iterations through the loop, depending on frequency of calls and performance requirements.
Dan Dyer and Jay are exactly right. What you really want is a binomial distribution, not a normal distribution. The shape of a binomial distribution looks a lot like a normal distribution, but it is discrete and bounded whereas a normal distribution is continuous and unbounded.
Jay's code generates a binomial distribution with 6 trials and a 50% probability of success on each trial. If you want to "skew" your distribution, simply change the line that decides whether to add 1 to n so that the probability is something other than 50%.
The normal distribution is not described by its endpoints. Normally it's described by its mean (which you have given to be 7) and its standard deviation. An important feature of this is that it is possible to get a value far outside the expected range from this distribution, although that will be vanishingly rare, the further you get from the mean.
The usual means of getting a value from a distribution is to generate a random value from a uniform distribution, which is quite easily done with, for example, rand(), and then feed it to the inverse of the cumulative distribution function (the quantile function), which maps probabilities to upper bounds. For the normal distribution, the CDF that has to be inverted is
F(x) = 0.5*(1 + erf((x - μ)/(σ*sqrt(2.0))))
where erf() is the error function, which may be computed from its Taylor series:
erf(z) = 2/sqrt(π) * Σn=0..∞ (−1)^n z^(2n+1) / (n! (2n+1))
I'll leave it as an exercise to translate this into C.
If you prefer not to engage in the exercise, you might consider using the GNU Scientific Library, which, among many other features, has facilities to generate random numbers from many common distributions, of which the Gaussian distribution (hint) is one.
Obviously, all of these functions return floating-point values. You will have to use some rounding strategy to convert to a discrete value. A useful (but naive) approach is to simply cast down to an integer.