Numpy: Overcoming machine imprecision by relative rounding

Goal
I want to apply "relative" rounding to the elements of a numpy array. Relative rounding here means that I round to a given number of significant figures, where I do not care whether these are decimal or binary figures.
Suppose we are given two arrays a and b so that some elements are close to each other. That is,
np.isclose(a, b, tolerance)
has some True entries for a given relative tolerance. Suppose that we know that all entries that are not equal within the tolerance differ by a relative difference of at least 100*tolerance. I want to obtain some arrays a2 and b2 so that
np.all(np.isclose(a, b, tolerance) == (a2 == b2))
My idea is to round the arrays to an appropriate significant digit:
a2 = relative_rounding(a, precision)
b2 = relative_rounding(b, precision)
However, it does not matter whether the numbers are rounded or floored, as long as the goal is achieved.
An example:
a = np.array([1.234567891234, 2234.56789123, 32.3456789123])
b = np.array([1.234567895678, 2234.56789456, 42.3456789456])
# desired output
a2 = np.array([1.2345679, 2234.5679, 32.345679])
b2 = np.array([1.2345679, 2234.5679, 42.345679])
Motivation
The purpose of this exercise is to allow me to work with clearly defined results of binary operations so that small errors do not matter. For example, I want the result of np.unique not to be affected by the imprecision of floating point operations.
You may suppose that the error introduced by the floating point operations is known/can be bounded.
Question
I am aware of similar questions concerning rounding to a given number of significant figures with numpy and of the respective solutions. Though those answers may be sufficient for my purposes, I think there should be a simpler and more efficient solution to this problem: since floating point numbers have "relative precision" built in, it should be possible to just set the n least significant bits of the mantissa to 0. This should be even more efficient than the usual rounding procedure. However, I do not know how to implement that with numpy. It is essential that the solution is vectorized and more efficient than the naive way. Is there a direct way of manipulating the binary representation of an array in numpy?
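A minimal sketch of the kind of mantissa manipulation I have in mind, assuming float64 inputs (the function name and the bit count are illustrative only, and whether such truncation can actually achieve the property above is a separate question):
import numpy as np

def truncate_mantissa(arr, n_bits):
    # Zero the n_bits least significant bits of the 52-bit float64 mantissa
    # by reinterpreting the same memory as 64-bit integers and masking.
    out = np.asarray(arr, dtype=np.float64).copy()
    bits = out.view(np.int64)
    bits &= ~np.int64((1 << n_bits) - 1)
    return out

a = np.array([1.234567891234, 2234.56789123, 32.3456789123])
print(truncate_mantissa(a, 30))  # a with the lowest 30 mantissa bits cleared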

This is impossible, except for special cases such as a precision of zero (isclose becomes equivalent to ==) or infinity (all numbers are close to each other).
numpy.isclose is not transitive. We may have np.isclose(x, y, precision) and np.isclose(y, z, precision) but not np.isclose(x, z, precision). (For example, 10 and 11 are within 10% of each other, and 11 and 12 are within 10% of each other, but 10 and 12 are not within 10% of each other.)
Given the above isclose relations for x, y, and z, the requested property would require that x2 == y2 and y2 == z2 be true but that x2 == z2 be false. However, == is transitive, so x2 == y2 and y2 == z2 implies x2 == z2. Thus, the requested function requires that x2 == z2 be both true and false, and hence it is impossible.
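A quick numpy check of the 10/11/12 example, assuming the default atol and a 10% relative tolerance:
import numpy as np

x, y, z = 10.0, 11.0, 12.0
print(np.isclose(x, y, rtol=0.1))  # True
print(np.isclose(y, z, rtol=0.1))  # True
print(np.isclose(x, z, rtol=0.1))  # False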

Related

How do I test binary floating point math functions

I implemented the powf(float x, float y) math function. This function is a binary floating point operation. I need to test it for correctness, but the test cannot iterate over all floating point inputs. What should I do?
Consider 2 questions:
How do I test binary floating point math functions?
Break FP values into groups:
Largest set: Normal values including +/- values near 1.0 and near the extremes as well as randomly selected ones.
Subnormals
Zeros: +0.0, -0.0
NANs
Use at least combinations of 100,000s of sample values from the first set (including +/-min, +/-1.0, +/-max), 1,000s from the second set (including +/-min, +/-max), and -0.0, +0.0, -NAN, +NAN.
Additional tests for the function's edge cases.
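A rough numpy sketch of assembling such sample groups for a float32 function under test (only an analogue of a C test harness; the counts and variable names are illustrative):
import numpy as np

rng = np.random.default_rng(0)
f32 = np.finfo(np.float32)

# Normal values: +/-1.0, the extremes, plus a large random sample.
normals = np.concatenate([
    np.float32([1.0, -1.0, f32.tiny, -f32.tiny, f32.max, -f32.max]),
    rng.uniform(-1e3, 1e3, 100_000).astype(np.float32),
])

# Subnormals: scale the smallest positive normal down below f32.tiny.
subnormals = (np.float32(f32.tiny) * rng.uniform(1e-6, 0.99, 1_000)).astype(np.float32)

# Zeros and NaNs.
specials = np.float32([0.0, -0.0, np.nan, -np.nan])

samples = np.concatenate([normals, subnormals, specials])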
How do I test powf()?
How: Test powf() against pow() for result correctness.
Values to test against: powf() has many concerns.
pow(x, y) functions are notoriously difficult to code well. Errors in the little sub-calculations propagate into large final errors.
pow() includes expected integral results for integral-valued arguments. E.g. pow(2, y) is expected to be exact for all in-range results. pow(10, y) is expected to be within 0.5 units in the last place for all y in range.
pow() includes expected integer results with negative x.
There is little need to test every x, y combination. Consider that every x < 0 with a non-whole-number y leads to a NaN.
z = powf(x,y) readily underflows to 0.0. Testing of x, y values near a result of z == 0 needs some attention.
z = powf(x,y) readily overflows to ∞. Testing of x, y values near a result of z == FLT_MAX needs more attention, as a slight error results in FLT_MAX vs. INF. Since overflow is so rampant with powf(x,y), this reduces the number of combinations needed: it is the edge that is important, and larger values need only light testing.
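One way to phrase the "compare against a reference" idea in numpy terms (again an analogue, not a harness for an actual C powf): compute the float32 result, compare with a float64 reference, and express the difference in float32 ULPs via np.spacing. Ranges and sample sizes below are illustrative.
import numpy as np

def ulp_error(x32, y32):
    got = np.power(x32, y32)                                  # float32 computation
    ref = np.power(x32.astype(np.float64), y32.astype(np.float64))
    # np.spacing gives the size of one ULP at the computed float32 result.
    return np.abs(got.astype(np.float64) - ref) / np.spacing(np.abs(got))

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 10.0, 10_000).astype(np.float32)
y = rng.uniform(-5.0, 5.0, 10_000).astype(np.float32)
print(ulp_error(x, y).max())   # worst-case error in ULPs over this sample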

Numerically stable maximum step in fixed direction satisfying inequality

This is a question about floating point analysis and numerical stability. Say I have two [d x 1] vectors a and x and a scalar b such that a.T @ x < b (where @ denotes a dot product).
I additionally have a unit [d x 1] vector d. I want to derive the maximum scalar s so that a.T @ (x + s * d) < b. Without floating point errors this is trivial:
s = (b - a.T @ x) / (a.T @ d).
With floating point errors, though, this s is not guaranteed to satisfy a.T @ (x + s * d) < b.
Currently my solution is to use a stabilized division, which helps:
s = sign(a.T @ x) * sign(a.T @ d) * exp(log(abs(a.T @ x) + eps) - log(abs(a.T @ d) + eps)).
But this s still does not always satisfy the inequality. I can check how much it fails by:
diff = a.T @ (x + s * d) - b
And then "push" that diff back through: (x + s * d - a.T @ (diff + eps2)). Even with both the stable division and pushing the diff back, the solution sometimes fails to satisfy the inequality. So these attempts at a solution are hacky and they do not actually work. I think there is probably some way to do this that would work and be guaranteed to minimally satisfy the inequality under floating point imprecision, but I'm not sure what it is. The solution needs to be very efficient because this operation will be run trillions of times.
Edit: Here is an example in numpy of this issue coming into play, because a commenter had some trouble replicating this problem.
np.random.seed(1)
p, n = 10, 1
k = 3
x = np.random.normal(size=(p, n))
d = np.random.normal(size=(p, n))
d /= np.sum(d, axis=0)
a, b = np.hstack([np.zeros(p - k), np.ones(k)]), 1
s = (b - a.T @ x) / (a.T @ d)
Running this code gives a case where a.T @ (s * d + x) > b, failing to satisfy the constraint. Instead we have:
>>> diff = a.T @ (x + s * d) - b
>>> diff
array([8.8817842e-16])
The question is how to avoid this violation of the constraint.
The problem you are dealing with appears to be mainly a rounding issue and not really a numerical stability issue. Indeed, when a floating-point operation is performed, the result has to be rounded so as to fit in the standard floating point representation. The IEEE-754 standard specifies multiple rounding modes. The default one is typically round-to-nearest.
This means (b - a.T @ x) / (a.T @ d) and a.T @ (x + s * d) can each be rounded to the previous or next floating-point value. As a result, a slight imprecision is introduced in the computation. This imprecision is typically 1 unit of least precision (ULP). 1 ULP basically means a relative error of about 1.1e-16 for double-precision numbers.
In practice, every operation is rounded, not the whole expression, so the error is typically a few ULP. For operations like addition, the rounding tends to mitigate the error, while for others like subtraction, the error can increase dramatically. In your case, the error seems to be due only to the accumulation of small errors in each operation.
The floating point units of processors can be controlled in low-level languages. Numpy also provides a way to find the next/previous floating point value. Based on this, you can round the value up or down for some parts of the expression so that s is smaller than the target theoretical value. That being said, this is not so easy, since some of the computed values can certainly be negative, which flips the direction of the required rounding. One can round positive and negative values differently, but the resulting code will certainly not be efficient in the end.
An alternative solution is to compute a theoretical error bound and subtract it from s. That being said, this error bound depends on the computed values and on the actual algorithm used for the summation (e.g. naive sum, pair-wise, Kahan, etc.). For example, the naive and pair-wise algorithms (the latter used by Numpy) are sensitive to the standard deviation of the input values: the higher the std-dev, the bigger the resulting error. This solution only works if you exactly know the distribution of the input values and/or their bounds. Another issue is that it tends to over-estimate the error bound and gives just an estimate of the average error.
Another alternative method is to rewrite the expression, replacing s by s+h or s*h, and to find the value of h based on the already computed s and the other parameters. This method is a bit like a predictor-corrector. Note that h may also not be precise due to floating point errors.
With the absolute correction method we get:
h_abs = (b - a @ (x + s * d)) / (a @ d)
s += h_abs
With the relative correction method we get:
h_rel = (b - a @ x) / (a @ (s * d))
s *= h_rel
Here are the absolute differences with the two methods:
Initial method: 8.8817842e-16 (8 ULP)
Absolute method: -8.8817842e-16 (8 ULP)
Relative method: -8.8817842e-16 (8 ULP)
I am not sure either of the two methods is guaranteed to fulfil the requirements, but a robust approach could be to select the smaller of the two s values. At least, the results are quite encouraging, since the requirements are fulfilled by both methods for the provided inputs.
A good way to generate more precise results is to use the Decimal package, which provides arbitrary precision at the expense of a much slower execution. This is particularly useful for comparing practical results with more precise ones.
Finally, a last solution is to increase/decrease s one ULP at a time so as to find the best result. Depending on the actual algorithm used for the summation and on the inputs, results can change. The exact expression used to compute the difference also matters. Moreover, the result is certainly not monotonic because of the way floating-point arithmetic behaves. This means one may need to increase/decrease s by many ULPs to be able to perform the optimization. This solution is not very efficient (at least, unless big steps are used).
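A minimal sketch of that last idea using np.nextafter, assuming the setup of the question (a, x, d, b as above) and that a.T @ d > 0 near the solution, so that decreasing s moves toward feasibility; the iteration cap is arbitrary:
import numpy as np

def step_down_until_feasible(s, a, x, d, b, max_iter=16):
    s = np.asarray(s, dtype=np.float64)
    for _ in range(max_iter):
        if np.all(a.T @ (x + s * d) < b):   # constraint satisfied: stop
            return s
        s = np.nextafter(s, -np.inf)        # move s one ULP toward -inf
    return s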

Relaxation of linear constraints?

When we need to optimize a function on the positive real half-line and we only have unconstrained optimization routines, we use y = exp(x) or y = x^2 to map to the real line and optimize over the log or the (signed) square root of the variable.
Can we do something similar for linear constraints of the form Ax = b, where, for x a d-dimensional vector, A is an (N, d)-shaped matrix and b is a vector of length N defining the constraints?
While, as Ervin Kalvelaglan says, this is not always a good idea, here is one way to do it.
Suppose we take the SVD of A, getting
A = U*S*V'
where, if A is n x m,
U is n x n orthogonal,
S is n x m, zero off the main diagonal,
V is m x m orthogonal.
Computing the SVD is not a trivial computation.
We first zero out the elements of S which we think are non-zero just due to noise -- which can be a slightly delicate thing to do.
Then we can find one solution x~ to
A*x = b
as
x~ = V*pinv(S)*U'*b
(where pinv(S) is the pseudo-inverse of S, i.e. replace the non-zero elements of the diagonal by their multiplicative inverses)
Note that x~ is a least squares solution to the constraints, so we need to check that it is close enough to being a real solution, i.e. that A*x~ is close enough to b -- another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough you should give up: if the constraints have no solution, neither does the optimisation.
Any other solution to the constraints can be written
x = x~ + sum c[i]*V[i]
where the V[i] are the columns of V corresponding to entries of S that are (now) zero. Here the c[i] are arbitrary constants. So we can change variables to using the c[] in the optimisation, and the constraints will be automatically satisfied. However this change of variables could be somewhat irksome!
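A small numpy sketch of this reparameterization, assuming A has shape (N, m) and b shape (N,); the function name and the rank tolerance are illustrative:
import numpy as np

def constraint_parametrization(A, b, tol=1e-10):
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s[0]))           # treat tiny singular values as zero
    s_inv = np.zeros_like(s)
    s_inv[:rank] = 1.0 / s[:rank]
    x_part = Vt.T[:, :rank] @ (s_inv[:rank] * (U.T[:rank] @ b))   # least squares solution
    if not np.allclose(A @ x_part, b):
        raise ValueError("constraints appear to have no solution")
    V_null = Vt.T[:, rank:]                      # columns of V spanning the null space of A
    return x_part, V_null

# Any feasible x can be written x = x_part + V_null @ c, so an unconstrained
# optimizer can work directly on the coefficient vector c.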

Find global maximum in the least number of computations

Let's say I have a function f defined on the interval [0,1], which is smooth and increases up to some point a, after which it starts decreasing. I have a grid x[i] on this interval, e.g. with a constant step size of dx = 0.01, and I would like to find which of those points has the highest value, using the smallest number of evaluations of f in the worst-case scenario. I think I can do much better than exhaustive search by applying something inspired by gradient-like methods. Any ideas? I was thinking of something like a binary search perhaps, or parabolic methods.
This is a bisection-like method I coded:
from math import floor

def optimize(f, a, b, fa, fb, dx):
    if b - a <= dx:
        return a if fa > fb else b
    else:
        m1 = 0.5*(a + b)
        m1 = _round(m1, a, dx)
        fm1 = fa if m1 == a else f(m1)
        m2 = m1 + dx
        fm2 = fb if m2 == b else f(m2)
        if fm2 >= fm1:
            return optimize(f, m2, b, fm2, fb, dx)
        else:
            return optimize(f, a, m1, fa, fm1, dx)

def _round(x, a, dx, right=False):
    return a + dx*(floor((x - a)/dx) + right)
The idea is: find the middle of the interval and compute m1 and m2, the points just to the left and right of it. If the function is increasing there, recurse on the right interval and do the same, otherwise recurse on the left. Whenever the interval is too small, just compare the values at its ends. However, this algorithm still does not use the strength of the derivatives at the points I have computed.
Such a function is called unimodal.
Without computing the derivatives, you can work by
finding where the deltas f(x[i+1]) - f(x[i]) change sign, by dichotomy (the deltas are positive before the maximum and negative after it); this takes Log2(n) comparisons; this approach is very close to what you describe;
adapting the Golden section method to the discrete case; it takes Logφ(n) comparisons (φ~1.618).
Apparently, the Golden section is more costly, as φ < 2, but actually the dichotomic search takes two function evaluations at a time, hence 2 Log2(n) = Log√2(n).
One can show that this is optimal, i.e. you can't go faster than O(Log(n)) for an arbitrary unimodal function.
If your function is very regular, the deltas will vary smoothly. You can think of interpolation search, which tries to better predict the searched position by a linear interpolation rather than simple halving. In favorable conditions, it can reach O(Log(Log(n))) performance. I don't know of an adaptation of this principle to the Golden search.
Actually, linear interpolation on the deltas is very close to parabolic interpolation on the function values. The latter approach might be the best for you, but you need to be careful about the corner cases.
If derivatives are allowed, you can use any root solving method on the first derivative, knowing that there is an isolated zero in the given interval.
If only the first derivative is available, use regula falsi. If the second derivative is available as well, you may consider Newton, but prefer a safe bracketing method.
I guess that the benefits of these approaches (superlinear and quadratic convergence) are made a little useless by the fact that you are working on a grid.
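For concreteness, a minimal sketch of a bracketing search on the grid indices, assuming f is strictly unimodal over the grid; this is the plain trisection variant rather than the Golden section one (so it spends two evaluations per step) and it caches evaluations by index:
from functools import lru_cache

def grid_argmax(f, x):
    fx = lru_cache(maxsize=None)(lambda i: f(x[i]))   # cache evaluations by grid index
    lo, hi = 0, len(x) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if fx(m1) < fx(m2):
            lo = m1 + 1        # the maximum cannot lie at or left of m1
        else:
            hi = m2 - 1        # the maximum cannot lie at or right of m2
    return max(range(lo, hi + 1), key=fx)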
DISCLAIMER: I haven't tested the code. Take this as an "inspiration".
Let's say you have the following 11 points
x, f(x) = (0,3), (1,7), (2,9), (3,11), (4,13), (5,14), (6,16), (7,5), (8,3), (9,1), (10,-1)
You can do something inspired by the bisection method:
a = 0, f(a) = 3 | b = 10, f(b) = -1 | c = (0+10)/2 = 5, f(5) = 14
From here you can see that the function is increasing on [a,c[, so there is no need to search there for the maximum. The maximum has to be in the interval [c,b]. So at the next iteration you change the value of a so that a = c:
a = 5, f(a) = 14 | b = 10, f(b) = -1 | c = (5+10)/2, f(6) = 16
Again the function is increasing on [a,c], so a is moved to the right.
You can iterate the process until a = b = c.
Here is the code that implements this idea:
#include <stdio.h>

#define STEP (0.01)
#define SIZE (1/STEP)

int main() {
    double vals[(int)SIZE];
    for (int i = 0; i < SIZE; ++i) {
        double x = i*STEP;
        vals[i] = -(x*x*x*x - (0.6)*(x*x));
    }
    for (int i = 0; i < SIZE; ++i) {
        printf("%f ", vals[i]);
    }
    printf("\n");

    int a = 0, b = SIZE - 1, c;
    double fa = vals[a], fb = vals[b], fc;
    c = (a + b)/2;
    fc = vals[c];
    while (a != b && b != c && a != c) {
        printf("%i %i %i - %f %f %f\n", a, c, b, vals[a], vals[c], vals[b]);
        if (fc - vals[c-1] > 0) {  /* is the function increasing in [a,c]? */
            a = c;
        } else {
            b = c;
        }
        c = (a + b)/2;
        fa = vals[a];
        fb = vals[b];
        fc = vals[c];
    }
    printf("The maximum is %i=%f with %f\n", c, c*STEP, vals[c]);
    return 0;
}
Find the points where the derivative of f(x), df/dx, is 0.
For the derivative you could use a five-point stencil or a similar algorithm.
This should be O(n).
Then fit those points (where the derivative is 0) with a polynomial regression / least squares regression.
This should also be O(N), assuming all the points are neighbours.
Then find the top of that curve.
This shouldn't be more than O(M), where M is the resolution of trials for the fit function.
While taking the derivative, you could leap by k-length steps until the derivative changes sign.
When the derivative changes sign, take the square root of k and continue in the reverse direction.
When the derivative changes sign again, take the square root of the new k and change direction again.
Example: leap by 100 elements, find the sign change, set leap = 10 and reverse direction, next change ==> leap = 3 ... then it can be fixed to 1 element per step to find the exact location.
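A rough sketch of that leap-then-shrink idea on an array vals of sampled function values, assuming the samples are unimodal; the starting leap and the final unit-step clean-up are illustrative choices:
import math

def leap_search(vals, leap=100):
    # Leap through the samples; when the value drops over a leap, reverse
    # direction and shrink the leap to its square root. Finish with unit
    # steps around the landing point.
    i, direction = 0, 1
    while leap > 1:
        j = min(max(i + direction * leap, 0), len(vals) - 1)
        if j == i or vals[j] < vals[i]:     # overshot the peak (or hit a boundary)
            direction = -direction
            leap = int(math.sqrt(leap))
        else:                               # still climbing: accept the leap
            i = j
    while i + 1 < len(vals) and vals[i + 1] > vals[i]:
        i += 1
    while i - 1 >= 0 and vals[i - 1] > vals[i]:
        i -= 1
    return i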
I am assuming that the function evaluation is very costly.
In the special case that your function can be approximately fitted with a polynomial, you can easily calculate the extremum in the least number of function evaluations. And since you know that there is only one maximum, a polynomial of degree 2 (quadratic) might be ideal.
For example: if f(x) can be represented by a polynomial of some known degree, say 2, then you can evaluate your function at any 3 points and calculate the polynomial coefficients using Newton's divided differences or Lagrange interpolation.
Then it is simple to solve for the maximum of this polynomial. For degree 2 you can easily get a closed-form expression for the maximum.
To get the final answer you can then search in the vicinity of the solution.
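A small sketch of the degree-2 case: the closed-form vertex of the parabola through three samples (this is the standard successive-parabolic-interpolation formula; the function name and the sample points are illustrative):
def parabola_vertex(x0, x1, x2, f0, f1, f2):
    # Vertex of the parabola through (x0, f0), (x1, f1), (x2, f2);
    # assumes the three x values are distinct and the points are not collinear.
    num = (x1 - x0) ** 2 * (f1 - f2) - (x1 - x2) ** 2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    return x1 - 0.5 * num / den

# Example: samples of f(x) = -(x - 0.3)**2 + 5 recover the maximum at x = 0.3.
print(parabola_vertex(0.0, 0.2, 0.6, *[-(x - 0.3) ** 2 + 5 for x in (0.0, 0.2, 0.6)]))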

Normal Distribution function

edit
So based on the answers so far (thanks for taking your time) I'm getting the sense that I'm probably NOT looking for a Normal Distribution function. Perhaps I'll try to re-describe what I'm looking to do.
Let's say I have an object that returns a number from 0 to 10. And that number controls "speed". However, instead of 10 being the top speed, I need 5 to be the top speed, and anything lower or higher would slow down accordingly (with easing, thus the bell curve).
I hope that's clearer ;/
-original question
These are the times I wish I remembered something from math class.
I'm trying to figure out how to write a function in Obj-C where I define the boundaries, e.g. (0 - 10), and then if x = foo, y = ? ... where x runs something like 0,1,2,3,4,5,6,7,8,9,10 and y runs 0,1,2,3,4,5,4,3,2,1,0 but on a curve
Something like the attached image.
I tried googling for Normal Distribution but it's way over my head. I was hoping to find some site that lists some useful algorithms like these, but I wasn't very successful.
So can anyone help me out here? And if there are some good sites which show useful mathematical functions, I'd love to check them out.
TIA!!!
-added
I'm not looking for a random number, I'm looking for, e.g.: if x=0, y should be 0; if x=5, y should be 5; if x=10, y should be 0... and all those other not-so-obvious in-between numbers
[image: http://dizy.cc/slider.gif]
Okay, your edit really clarifies things. You're not looking for anything to do with the normal distribution, just a nice smooth little ramp function. The one Paul provides will do nicely, but is tricky to modify for other values. It can be made a little more flexible (my code examples are in Python, which should be very easy to translate to any other language):
def quarticRamp(x, b=10, peak=5):
    if not 0 <= x <= b:
        raise ValueError  # or return 0
    return peak*x*x*(x-b)*(x-b)*16/(b*b*b*b)
Parameter b is the upper bound for the region you want to have a slope on (10, in your example), and peak is how high you want it to go (5, in the example).
Personally I like a quadratic spline approach, which is marginally cheaper computationally and has a different curve to it (this curve is really nice to use in a couple of special applications that don't happen to matter at all for you):
def quadraticSplineRamp(x, a=0, b=10, peak=5):
    if not a <= x <= b:
        raise ValueError  # or return 0
    if x > (b+a)/2:
        x = a + b - x          # fold the right half onto the left half
    z = 2*(x-a)/(b-a)          # normalized position in [0, 1]
    if z > 0.5:
        return peak * (1 - 2*(z-1)*(z-1))
    else:
        return peak * (2*z*z)
This is similar to the other function, but takes a lower bound a (0 in your example). The logic is a little more complex because it's a somewhat-optimized implementation of a piecewise function.
The two curves have slightly different shapes; you probably don't care what the exact shape is, and so could pick either. There are an infinite number of ramp functions meeting your criteria; these are two simple ones, but they can get as baroque as you want.
The thing you want to plot is the probability density function (pdf) of the normal distribution. You can find it on the mighty Wikipedia.
Luckily, the pdf for a normal distribution is not difficult to implement - some of the other related functions are considerably worse because they require the error function.
To get a plot like you showed, you want a mean of 5 and a standard deviation of about 1.5. The median is obviously the centre, and figuring out an appropriate standard deviation given the left & right boundaries isn't particularly difficult.
A function to calculate the y value of the pdf given the x coordinate, standard deviation and mean might look something like:
#include <math.h>

double normal_pdf(double x, double mean, double std_dev) {
    return 1.0/(sqrt(2*M_PI)*std_dev) *
           exp(-(x-mean)*(x-mean)/(2*std_dev*std_dev));
}
A normal distribution is never equal to 0. Please make sure that what you want to plot is indeed a normal distribution.
If you're only looking for this bell shape (with the tangent and everything) you can use the following formula:
x^2*(x-10)^2 for x between 0 and 10
0 elsewhere
(Divide by 125 if you need to have your peak at 5.)
double bell(double x) {
    if ((x < 10) && (x > 0))
        return x*x*(x-10.)*(x-10.)/125.;
    else
        return 0.;
}
Well, there's good old Wikipedia, of course. And Mathworld.
What you want is a random number generator for "generating normally distributed random deviates". Since Objective C can call regular C libraries, you either need a C-callable library like the GNU Scientific Library, or for this, you can write it yourself following the description here.
Try simulating rolls of dice by generating random numbers between 1 and 6. If you add up the rolls from 5 independent dice rolls, you'll get a surprisingly good approximation to the normal distribution. You can roll more dice if you'd like and you'll get a better approximation.
Here's an article that explains why this works. It's probably more mathematical detail than you want, but you could show it to someone to justify your approach.
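A tiny numpy sketch of the dice-summing idea (just an illustration of the shape, not Objective-C code): summing a few uniform dice per sample already gives a visibly bell-shaped histogram.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, 5)).sum(axis=1)    # 5 dice per sample
values, counts = np.unique(rolls, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v:2d} {'#' * (c // 500)}")                      # crude text histogram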
If what you want is the value of the probability density function, p(x), of a normal (Gaussian) distribution of mean mu and standard deviation sigma at x, the formula is
p(x) = exp( -((x-mu)^2)/(2*sigma^2) ) / (sigma * sqrt(2*pi))
where pi is the area of a circle divided by the square of its radius (approximately 3.14159...). Using the C standard library math.h, this is:
#include <math.h>

double normal_pdf(double x, double mu, double sigma) {
    double n = sigma * sqrt(2 * M_PI);                        // normalization factor
    double p = exp( -pow(x - mu, 2) / (2 * pow(sigma, 2)) );  // unnormalized pdf
    return p / n;
}
Of course, you can do the same in Objective-C.
For reference, see the Wikipedia or MathWorld articles.
It sounds like you want to write a function that yields a curve of a specific shape. Something like y = f(x), for x in [0:10]. You have a constraint on the max value of y, and a general idea of what you want the curve to look like (somewhat bell-shaped, y=0 at the edges of the x range, y=5 when x=5). So roughly, you would call your function iteratively with the x range, with a step that gives you enough points to make your curve look nice.
So you really don't need random numbers, and this has nothing to do with probability unless you want it to (as in, you want your curve to look like the outline of a normal distribution or something along those lines).
If you have a clear idea of what function will yield your desired curve, the code is trivial - a function to compute f(x) and a for loop to call it the desired number of times for the desired values of x. Plot the x,y pairs and you're done. So that's your algorithm - call a function in a for loop.
The contents of the routine implementing the function will depend on the specifics of what you want the curve to look like. If you need help on functions that might return a curve resembling your sample, I would direct you to the reading material in the other answers. :) However, I suspect that this is actually an assignment of some sort, and that you have been given a function already. If you are actually doing this on your own to learn, then I again echo the other reading suggestions.
y=-1*abs(x-5)+5