This is a question about floating point analysis and numerical stability. Say I have two [d x 1] vectors a and x and a scalar b such that a.T # x < b (where # denotes a dot product).
I additionally have a unit [d x 1] vector d. I want to derive the maximum scalar s so that a.T # (x + s * d) < b. Without floating point errors this is trivial:
s = (b - a.T # x) / (a.T # d).
With floating point errors, though, this s is not guaranteed to satisfy a.T # (x + s * d) < b.
Currently my solution is to use a stabilized division, which helps:
s = sign(a.T # x) * sign(a.T # d) * exp(log(abs(a.T # x) + eps) - log(abs(a.T # d) + eps)).
But this s still does not always satisfy the inequality. I can check how much this fails by:
diff = a.T # (x + s * d) - b
And then "push" that diff back through: (x + s * d - a.T # (diff + eps2)). Even with both the stable division and pushing the diff back sometimes the solution fails to satisfy the inequality. So these attempts at a solution are both hacky and they do not actually work. I think there is probably some way to do this that would work and be guaranteed to minimally satisfy the inequality under floating point imprecision, but I'm not sure what it is. The solution needs to be very efficient because this operation will be run trillions of times.
Edit: Here is an example in numpy of this issue coming into play, because a commenter had some trouble replicating this problem.
import numpy as np

np.random.seed(1)
p, n = 10, 1
k = 3
x = np.random.normal(size=(p, n))
d = np.random.normal(size=(p, n))
d /= np.sum(d, axis=0)
a, b = np.hstack([np.zeros(p - k), np.ones(k)]), 1
# @ is NumPy's matrix product, i.e. the dot product written as # in the text
s = (b - a.T @ x) / (a.T @ d)
Running this code gives a case where a.T # (s * d + x) > b, so the constraint is violated:
>>> diff = a.T @ (x + s * d) - b
>>> diff
array([8.8817842e-16])
The question is how to avoid overshooting the bound like this.
The problems you are dealing with appear to be mainly rounding issues rather than numerical stability issues. When a floating-point operation is performed, the result has to be rounded to fit in the standard floating-point representation. The IEEE-754 standard specifies multiple rounding modes; the default one is typically round-to-nearest.
This means (b - a.T # x) / (a.T # d) and a.T # (x + s * d) can each be rounded to the previous or next floating-point value, which introduces a slight imprecision in the computation. This imprecision is typically about 1 unit in the last place (ULP). For double-precision numbers, 1 ULP corresponds to a relative error of roughly 1.1e-16 to 2.2e-16, depending on where the value sits between two powers of two.
In practice, rounding happens at every operation rather than once for the whole expression, so the error is typically a few ULP. For operations like additions, the rounding tends to mitigate the error, while for others, like the subtraction of nearly equal values, the relative error can increase dramatically. In your case, the error seems to be due only to the accumulation of small errors in each operation.
The floating-point units of processors can be controlled in low-level languages (for example, to change the rounding mode). NumPy also provides a way to find the next/previous floating-point value. Based on this, you can round values up or down in some parts of the expression so that s ends up smaller than the theoretical target value. That being said, this is not so easy, since some of the computed values can certainly be negative, which reverses the effect of the rounding. One can round positive and negative values differently, but the resulting code will certainly not be efficient in the end.
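As a small illustration of the next/previous value idea, here is a sketch that assumes the NumPy function meant above is np.nextafter (the name is my assumption, and step_toward_zero is a made-up helper). Stepping toward zero shrinks the magnitude of s whatever its sign, which sidesteps part of the sign problem:

```python
import numpy as np

def step_toward_zero(s):
    """Return the representable double adjacent to s, in the direction of 0."""
    return np.nextafter(s, 0.0)

s = 1.0
s_down = step_toward_zero(s)
# The gap between the two neighbours is one ULP at that magnitude
print(s_down < s, s - s_down)
```

This only moves s by a single representable value; whether one step is enough depends on the rest of the expression.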
An alternative solution is to compute the theoretical error bound and subtract it from s. That being said, this error depends on the computed values and on the actual algorithm used for the summation (e.g. naive sum, pairwise, Kahan, etc.). For example, the naive algorithm and the pairwise one (used by NumPy) are sensitive to the standard deviation of the input values: the higher the standard deviation, the bigger the resulting error. This solution only works if you know exactly the distribution of the input values and/or their bounds. Another issue is that such a bound tends to over-estimate the error, or else gives just an estimate of the average error.
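One classical bound of this kind, given as a sketch and not necessarily the bound intended above, is the worst-case dot-product bound |fl(a.x) - a.x| <= gamma_n * (|a| . |x|), with gamma_n = n*u / (1 - n*u) and u = 2^-53 for doubles. It holds for any summation order, ignores underflow, and is usually far larger than the actual error. The name dot_error_bound is hypothetical:

```python
import numpy as np
from fractions import Fraction

U = 2.0 ** -53  # unit roundoff for IEEE-754 doubles with round-to-nearest

def dot_error_bound(a, x):
    """Worst-case bound on |computed dot - exact dot| (no underflow assumed)."""
    n = len(a)
    gamma_n = n * U / (1.0 - n * U)
    return gamma_n * float(np.abs(a) @ np.abs(x))

a = np.array([0.1, 0.2, 0.3])
x = np.array([0.3, 0.2, 0.1])
computed = float(a @ x)
# Exact dot product of the stored doubles, using rational arithmetic
exact = sum(Fraction(float(ai)) * Fraction(float(xi)) for ai, xi in zip(a, x))
actual_error = abs(Fraction(computed) - exact)
print(float(actual_error), dot_error_bound(a, x))
```

Shrinking s by a bound like this (propagated through the division) is safe but pessimistic, which matches the over-estimation caveat above.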
Another alternative is to rewrite the expression by replacing s with s+h or s*h and to find the value of h from the already computed s and the other parameters. This method is a bit like a predictor-corrector scheme. Note that h may itself be imprecise due to floating-point errors.
With the absolute correction method we get:
h_abs = (b - a # (x + s * d)) / (a # d)
s += h_abs
With the relative correction method we get:
h_rel = (b - a # x) / (a # (s * d))
s *= h_rel
Here are the absolute differences with the two methods:
Initial method: 8.8817842e-16 (8 ULP)
Absolute method: -8.8817842e-16 (8 ULP)
Relative method: -8.8817842e-16 (8 ULP)
I am not sure either of the two methods is guaranteed to fulfil the requirements, but a robust approach could be to select the smaller of the two s values. At least the results are quite encouraging, since the requirement is fulfilled by both methods with the provided inputs.
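The two corrections and the "take the smaller s" idea can be put together roughly as follows. This is only a sketch: corrected_steps is a made-up name, a, x and d are assumed to be 1-D arrays so that @ returns a scalar, s is assumed to be non-zero (the relative correction divides by a @ (s * d)), and s is assumed positive so that the smaller value is the more conservative one.

```python
import numpy as np

def corrected_steps(a, x, d, b):
    """Return the initial step and its absolute/relative corrections."""
    s = (b - a @ x) / (a @ d)                  # initial estimate
    h_abs = (b - a @ (x + s * d)) / (a @ d)    # absolute correction
    s_abs = s + h_abs
    h_rel = (b - a @ x) / (a @ (s * d))        # relative correction
    s_rel = s * h_rel
    return s, s_abs, s_rel

a = np.array([1.0])
x = np.array([0.5])
d = np.array([1.0])
b = 1.0
s, s_abs, s_rel = corrected_steps(a, x, d, b)
s_safe = min(s_abs, s_rel)  # more conservative of the two, assuming s > 0
print(s, s_abs, s_rel, s_safe)
```

As noted above, neither correction is guaranteed to satisfy the inequality, so the result still needs to be checked.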
A good way to generate more precise results is to use the decimal package, which provides arbitrary precision at the expense of much slower execution. This is particularly useful for comparing practical results with more precise ones.
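A minimal sketch of such a comparison with Python's decimal module, assuming 50 significant digits are enough for the inputs at hand (dot_decimal is a made-up helper). Decimal(float) converts the stored double exactly, so the reference measures only the rounding done by the double-precision arithmetic:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits, versus about 16 for a double

def dot_decimal(a, x):
    """Dot product evaluated in 50-digit decimal arithmetic."""
    products = (Decimal(float(ai)) * Decimal(float(xi)) for ai, xi in zip(a, x))
    return sum(products, Decimal(0))

a = [0.1, 0.2, 0.3]
x = [0.3, 0.2, 0.1]
reference = dot_decimal(a, x)
computed = sum(ai * xi for ai, xi in zip(a, x))  # plain double arithmetic
error = Decimal(computed) - reference
print(reference, computed, error)
```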
Finally, a last solution is to increase/decrease s one ULP at a time to find the best result. Depending on the actual summation algorithm and the inputs, the results can change. The exact expression used to compute the difference also matters. Moreover, the result is certainly not monotonic because of the way floating-point arithmetic behaves. This means one may need to increase/decrease s by many ULP to be able to perform the optimization. This solution is not very efficient (at least, unless big steps are used).
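A sketch of the one-ULP-at-a-time search, assuming that s = 0 is feasible (a.T # x < b) so that moving s toward zero moves toward the feasible side. shrink_until_feasible and max_steps are names I made up, and the loop only guarantees that the returned s passes the check with this exact expression:

```python
import numpy as np

def shrink_until_feasible(a, x, d, b, s, max_steps=1000):
    """Step s toward zero one double at a time until a @ (x + s*d) < b holds."""
    for _ in range(max_steps):
        if a @ (x + s * d) < b:
            return s
        s = np.nextafter(s, 0.0)
    raise RuntimeError("no feasible step found within max_steps")

a = np.array([1.0])
x = np.array([0.0])
d = np.array([1.0])
b = 1.0
s0 = (b - a @ x) / (a @ d)   # lands exactly on the bound, so it fails the strict test
s_ok = shrink_until_feasible(a, x, d, b, s0)
print(s0, s_ok, a @ (x + s_ok * d) < b)
```

Because of the non-monotonic behaviour mentioned above, the first feasible s found this way is not necessarily the largest one.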
Consider the simple expression:
c = ((a * b) / (x * y)) * sqrt(a * b + 1 + x * y)
The computer only needs to calculate a * b and x * y once, but evaluating the expression as given would make the computer calculate them twice. It would have been better to write:
d = a * b
e = x * y
f = d / e
g = d + 1 + e
c = f * sqrt(g)
Is there a function (ideally in Python, but any language should suffice) to convert the first code block into the second? The problem is that I have a very long expression (pages of output from a Mathematica Solve) with many repeated subexpressions, far more than the two (a * b and x * y) that each appear more than once in the expression above, so it's not reasonable for me to do the reduction by hand. In addition, a * b could appear later when calculating another quantity. FYI, I need speed in my application, so these sorts of optimizations are useful.
I tried googling for this functionality but perhaps I don't know the right search terms.
Thank you, Jerome. (I tried upvoting but don't yet have enough "reputation"). Your response led me to the answer: Use sympy.cse and then copy paste its output and then use a compiler module like Numba as you suggested. One note is sympy appears to only reduce to a certain extent and then decides it's simple enough, e.g.
import sympy as sp
a, b, x, y = sp.symbols('a b x y')
input_expr = ((a * b) / (x * y)) * sp.sqrt(a * b + 1 + x * y)
repl, redu = sp.cse(input_expr)
print(repl, redu)
for variable, expr in repl:
    print(f'{variable} = {expr}')
>>> [(x0, a*b)] [x0*sqrt(x*y + x0 + 1)/(x*y)]
>>> x0 = a*b
So, it does not assign x1 to x*y, but if instead the expression is slightly more complicated, then it does.
input_expr = ((a * b) / (x * y + 2)) * sp.sqrt(a * b + 1 + x * y)
>>> [(x0, x*y), (x1, a*b)] [x1*sqrt(x0 + x1 + 1)/(x0 + 2)]
>>> x0 = x*y
>>> x1 = a*b
But, even for my very large expression, sympy reduced it to a point that I can finish up by hand without difficulty, so I consider my problem solved.
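For anyone who would rather skip the copy-paste step: as far as I know, newer SymPy versions (1.9 and later, which is an assumption about your install) accept a cse=True flag in lambdify, which performs the elimination while generating the function. A minimal sketch, reusing the second expression above:

```python
import math
import sympy as sp

a, b, x, y = sp.symbols('a b x y')
expr = ((a * b) / (x * y + 2)) * sp.sqrt(a * b + 1 + x * y)

# cse=True asks lambdify to precompute common subexpressions
# (assumption: SymPy 1.9 or newer, where this keyword is available)
f = sp.lambdify((a, b, x, y), expr, modules="math", cse=True)

value = f(2.0, 3.0, 1.0, 2.0)
direct = (2.0 * 3.0) / (1.0 * 2.0 + 2) * math.sqrt(2.0 * 3.0 + 1 + 1.0 * 2.0)
print(value, direct)
```

The generated function is still plain Python, so the speed remarks about compiling with something like Numba still apply.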
Best practice for using common subexpression elimination with lambdify in SymPy
Yes, this optimization is called Common Sub-expression Elimination, and it has been done in all mainstream optimizing compilers for decades (I advise you to read the associated books and research papers about it).
However, it is certainly not a good idea to apply this kind of optimization by hand in Python. Python is not designed with speed in mind, and the default interpreter does not perform such optimizations (in fact, it performs almost no optimization). Python code is expected to be readable, not generated. Generated Python code is likely slower than native code while having nearly no benefit over generated native code. There are just-in-time (JIT) compilers for Python, like PyPy, that perform such optimizations, so it is certainly wise to try them. Alternatively, Cython (not to be confused with CPython) and Numba can strongly help to remove the overheads of Python thanks to the (JIT/AOT) compilation of Python code to native code. Indeed, recomputing the same expression multiple times is far from being the only issue with the CPython interpreter: basic floating-point/arithmetic operations tend to be at least one order of magnitude slower than in native code (mainly due to the dynamic management of objects, including their allocation, not to mention the overhead of the interpreter loop). Thus, please use a compiler that does such optimizations very well instead of reinventing the wheel.
I implemented the powf(float x, float y) math function. This function is a binary floating-point operation. I need to test it for correctness, but the test can't iterate over all floating-point values. What should I do?
Consider 2 questions:
How do I test binary floating point math functions?
Break FP values into groups:
Largest set: Normal values including +/- values near 1.0 and near the extremes as well as randomly selected ones.
Subnormals
Zeros: +0.0, -0.0
NANs
Use combinations of at least 100,000s of sample values from the first set (including +/-min, +/-1.0, +/-max), 1,000s from the second set (including +/-min, +/-max), and -0.0, +0.0, -NAN, +NAN.
Additional tests for the function's edge cases.
How do I test powf()?
How: Test powf() against pow() for result correctness.
Values to test against: powf() has many concerns.
* pow(x,y) functions are notoriously difficult to code well. Errors in small sub-calculations propagate to large final errors.
* pow() includes expected integral results with integral value arguments. E.g. pow(2, y) is expected to be exact for all in-range results. pow(10, y) is expected to be within 0.5 unit in the last place for all y in range.
* pow() includes expected integer results with negative x.
There is little need to test every x, y combination. Consider that every x < 0 with a non-whole-number y leads to a NAN.
z = powf(x,y) readily underflows to 0.0. Testing of x, y values near a result of z == 0 needs some attention.
z = powf(x,y) readily overflows to ∞. Testing of x, y values near a result of z == FLT_MAX needs more attention, as a slight error results in FLT_MAX vs. INF. Since overflow is so rampant with powf(x,y), this reduces the number of combinations needed, as it is the edge that is important and larger values need only light testing.
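To make the sampling and comparison concrete, here is a rough harness sketched in Python rather than C. Everything in it is an assumption for illustration: powf_under_test is a placeholder (NumPy's float32 power stands in for the real implementation), the double-precision np.power plays the role of pow(), and the sample counts are tiny compared with the numbers suggested above.

```python
import numpy as np

def float32_samples(n_random=1000, seed=0):
    """Edge cases plus random bit patterns (these can hit normals, subnormals, NaNs, infinities)."""
    info = np.finfo(np.float32)
    tiny_sub = np.nextafter(np.float32(0), np.float32(1))  # smallest positive subnormal
    edge = np.array([0.0, -0.0, 1.0, -1.0, info.tiny, -info.tiny, tiny_sub, -tiny_sub,
                     info.max, -info.max, np.inf, -np.inf, np.nan], dtype=np.float32)
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2**32, size=n_random, dtype=np.uint64).astype(np.uint32)
    return np.concatenate([edge, bits.view(np.float32)])

def ulp_distance(p, q):
    """Number of float32 steps between two non-NaN values (+0.0 and -0.0 count as equal)."""
    def ordered(v):
        i = int(np.array(v, dtype=np.float32).view(np.int32))
        return i if i >= 0 else -(i & 0x7FFFFFFF)
    return abs(ordered(p) - ordered(q))

def powf_under_test(x, y):
    # Placeholder for the implementation being tested
    return np.power(np.float32(x), np.float32(y))

xs = float32_samples(n_random=50)
worst, nan_mismatches = 0, 0
with np.errstate(all="ignore"):
    for x in xs:
        for y in xs:
            # Reference: double-precision result rounded to float32
            ref32 = np.float32(np.power(np.float64(x), np.float64(y)))
            got = powf_under_test(x, y)
            if np.isnan(ref32) or np.isnan(got):
                nan_mismatches += int(np.isnan(ref32) != np.isnan(got))
            else:
                worst = max(worst, ulp_distance(got, ref32))
print("worst ULP distance:", worst, "NaN mismatches:", nan_mismatches)
```

Counting steps on the bit patterns means that FLT_MAX versus INF shows up as a distance of 1, which is the overflow edge singled out above.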
In going through the exercises of SICP, it defines a fixed point of a function F as a value x that satisfies the equation F(x) = x, and finds it by iterating until the value stops changing, for example F(F(F(x))).
The thing I don't understand is how a square root of, say, 9 has anything to do with that.
For example, if I have F(x) = sqrt(9), obviously x=3. Yet, how does that relate to doing:
F(F(F(x))) --> sqrt(sqrt(sqrt(9)))
Which I believe just converges to one:
>>> math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(9))))))
1.0349277670798647
Since F(x) = sqrt(x) when x=1. In other words, how does finding the square root of a constant have anything to do with finding fixed points of functions?
When calculating the square root of a number, say a, you essentially have an equation of the form x^2 - a = 0. That is, to find the square root of a, you have to find an x such that x^2 = a or x^2 - a = 0 -- call the latter equation (1). The form given in (1) is an equation of the form g(x) = 0, where g(x) := x^2 - a.
To use the fixed-point method for calculating the roots of this equation, you have to make some subtle modifications to the existing equation and bring it to the form f(x) = x. One way to do this is to rewrite (1) as x = a/x -- call it (2). Now in (2), you have obtained the form required for solving an equation by the fixed-point method: f(x) is a/x.
Observe that this method requires both sides of the equation to have an 'x' term; an equation of the form sqrt(a) = x doesn't meet the specification and hence can't be solved (iteratively) using the fixed-point method.
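One caveat that is easy to check: iterating f(x) = a/x directly does not converge unless you already start at the root, because f(f(x)) = x for every x. SICP handles this by averaging successive guesses (average damping). A small sketch, where iterate is a helper name I made up:

```python
def iterate(f, x0, n):
    """Return [x0, f(x0), f(f(x0)), ...] after n applications of f."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs

a = 9.0
f = lambda x: a / x

at_root = iterate(f, 3.0, 3)    # 3.0 is a fixed point of f, so nothing changes
elsewhere = iterate(f, 2.0, 4)  # any other start bounces between two values
print(at_root)
print(elsewhere)
```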
The thing I don't understand is how a square root of, say, 9 has anything to do with that.
For example, if I have F(x) = sqrt(9), obviously x=3. Yet, how does that relate to doing: F(F(F(x))) --> sqrt(sqrt(sqrt(9)))
These are standard methods for numerical calculation of roots of non-linear equations, quite a complex topic on its own and one which is usually covered in Engineering courses. So don't worry if you don't get the "hang of it", the authors probably felt it was a good example of iterative problem solving.
You need to convert the problem f(x) = 0 to a fixed point problem g(x) = x that is likely to converge to the root of f(x). In general, the choice of g(x) is tricky.
If f(x) = x² - a = 0, then you should choose g(x) as follows:
g(x) = 1/2*(x + a/x)
(This choice is based on Newton's method, which is a special case of fixed-point iterations).
To find the square root, sqrt(a):
Guess an initial value x[0].
Given a tolerance ε, compute x[n+1] = 1/2*(x[n] + a/x[n]) for n = 0, 1, ... until convergence, i.e. until |x[n+1] - x[n]| < ε.
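The two steps above translate almost directly into code. This is a minimal sketch: the function name, the default starting guess of 1.0, the default tolerance and the iteration cap are all my own choices, and a is assumed to be positive.

```python
def sqrt_fixed_point(a, x0=1.0, tol=1e-12, max_iter=100):
    """Approximate sqrt(a) by iterating g(x) = (x + a/x) / 2."""
    x = x0
    for _ in range(max_iter):
        x_next = 0.5 * (x + a / x)
        if abs(x_next - x) < tol:   # successive iterates agree to within tol
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iter iterations")

root9 = sqrt_fixed_point(9.0)
root2 = sqrt_fixed_point(2.0)
print(root9, root2)
```

Starting from 1.0 with a = 9, the iterates go 5.0, 3.4, 3.0235..., and settle on 3 after a handful of steps.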
When we need to optimize a function on the positive real half-line and only have unconstrained optimization routines, we substitute y = exp(x) or y = x^2 to map from the real line, and optimize over x, i.e. over the log or the (signed) square root of the variable.
Can we do something similar for linear constraints of the form Ax = b, where x is an n-dimensional vector, A is an (N,n)-shaped matrix and b is a vector of length N defining the constraints?
While, as Ervin Kalvelaglan says, this is not always a good idea, here is one way to do it.
Suppose we take the SVD of A, getting
A = U*S*V'
where if A is n x m
U is nxn orthogonal,
S is nxm, zero off the main diagonal,
V is mxm orthogonal
Computing the SVD is not a trivial computation.
We first zero out the elements of S which we think are non-zero just due to noise -- which can be a slightly delicate thing to do.
Then we can find one solution x~ to
A*x = b
as
x~ = V*pinv(S)*U'*b
(where pinv(S) is the pseudo inverse of S, ie replace the non zero elements of the diagonal by their multiplicative inverses)
Note that x~ is a least squares solution to the constraints, so we need to check that it is close enough to being a real solution, ie that Ax~ is close enough to b -- another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough you should give up: if the constraints have no solution neither does the optimisation.
Any other solution to the constraints can be written
x = x~ + sum c[i]*V[i]
where the V[i] are the columns of V corresponding to entries of S that are (now) zero. Here the c[i] are arbitrary constants. So we can change variables to using the c[] in the optimisation, and the constraints will be automatically satisfied. However this change of variables could be somewhat irksome!
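A sketch of this change of variables in NumPy. The function name, the relative threshold rtol used to decide which singular values count as zero, and the use of np.allclose for the consistency check are assumptions; both tolerances would need tuning for a real problem.

```python
import numpy as np

def constraint_parametrization(A, b, rtol=1e-10):
    """Return (x0, Z) such that x0 + Z @ c satisfies A @ x = b for any c."""
    U, sv, Vt = np.linalg.svd(A)             # A = U @ S @ Vt, singular values descending
    rank = int(np.sum(sv > rtol * sv[0]))    # treat tiny singular values as zero
    # Particular solution x~ = V * pinv(S) * U' * b, restricted to the kept part
    x0 = Vt[:rank].T @ ((U[:, :rank].T @ b) / sv[:rank])
    if not np.allclose(A @ x0, b):
        raise ValueError("constraints A x = b appear to be inconsistent")
    Z = Vt[rank:].T                          # columns of V for the zero singular values
    return x0, Z

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])
x0, Z = constraint_parametrization(A, b)
c = np.array([0.7, -1.3])                    # free variables for the optimiser
x = x0 + Z @ c
print(x0, A @ x)
```

An unconstrained optimiser can then work on c, mapping each c back to x with x0 + Z @ c before evaluating the objective.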
Goal
I want to apply "relative" rounding to the elements of a numpy array. Relative rounding means here that I round to a given number of significant figures, whereby I do not care whether these are decimal or binary figures.
Suppose we are given two arrays a and b so that some elements are close to each other. That is,
np.isclose(a, b, tolerance)
has some True entries for a given relative tolerance. Suppose that we know that all entries that are not equal within the tolerance differ by a relative difference of at least 100*tolerance. I want to obtain some arrays a2 and b2 so that
np.all(np.isclose(a, b, tolerance) == (a2 == b2))
My idea is to round the arrays to an appropriate significant digit:
a2 = relative_rounding(a, precision)
b2 = relative_rounding(b, precision)
However, whether the numbers are rounded or floor is applied does not matter as long as the goal is achieved.
An example:
a = np.array([1.234567891234, 2234.56789123, 32.3456789123])
b = np.array([1.234567895678, 2234.56789456, 42.3456789456])
# desired output
a2 = np.array([1.2345679, 2234.5679, 32.345679])
b2 = np.array([1.2345679, 2234.5679, 42.345679])
Motivation
The purpose of this exercise is to allow me to work with clearly defined results of binary operations so that little errors do not matter. For example, I want the result of np.unique to be unaffected by imprecisions of floating point operations.
You may suppose that the error introduced by the floating point operations is known/can be bounded.
Question
I am aware of similar questions concerning rounding up to given significant figures with numpy and respective solutions. Though the respective answers may be sufficient for my purposes, I think there should be a simpler and more efficient solution to this problem: since floating point numbers have the "relative precision" built in, it should be possible to just set the n least significant binary digits in the mantissa to 0. This should be even more efficient than the usual rounding procedure. However, I do not know how to implement that with numpy. It is essential that the solution is vectorized and more efficient than the naive way. Is there a direct way of manipulating the binary representation of an array in numpy?
This is impossible, except for special cases such as a precision of zero (isclose becomes equivalent to ==) or infinity (all numbers are close to each other).
numpy.isclose is not transitive. We may have np.isclose(x, y, precision) and np.isclose(y, z, precision) but not np.isclose(x, z, precision). (For example, 10 and 11 are within 10% of each other, and 11 and 12 are within 10% of each other, but 10 and 12 are not within 10% of each other.)
Given the above isclose relations for x, y, and z, the requested property would require that x2 == y2 and y2 == z2 be true but that x2 == z2 be false. However, == is transitive, so x2 == y2 and y2 == z2 implies x2 == z2. Thus, the requested function requires that x2 == z2 be both true and false, and hence it is impossible.
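Both points can be checked quickly. The first part reproduces the 10/11/12 example with rtol=0.1 and atol=0. The second part is a sketch of the bit masking asked about in the question (truncate_mantissa is a made-up helper, float64 only), with a pair of values that are close but sit on either side of a truncation boundary:

```python
import numpy as np

# isclose is not transitive
close_xy = bool(np.isclose(10.0, 11.0, rtol=0.1, atol=0.0))
close_yz = bool(np.isclose(11.0, 12.0, rtol=0.1, atol=0.0))
close_xz = bool(np.isclose(10.0, 12.0, rtol=0.1, atol=0.0))
print(close_xy, close_yz, close_xz)

def truncate_mantissa(values, n_bits):
    """Zero the n_bits least significant mantissa bits of float64 values."""
    bits = np.asarray(values, dtype=np.float64).view(np.uint64)
    mask = np.uint64(~((1 << n_bits) - 1) & 0xFFFFFFFFFFFFFFFF)
    return (bits & mask).view(np.float64)

# 1.0 and its predecessor are as close as two doubles can be,
# yet they fall into different truncation buckets
pair = np.array([1.0, np.nextafter(1.0, 0.0)])
t = truncate_mantissa(pair, 20)
print(np.isclose(pair[0], pair[1]), t[0] == t[1])
```

So the masking itself is easy and vectorized, but it cannot make == agree with isclose for every input, in line with the argument above.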