While I am trying to solve this problem in a context where numpy is used heavily (and therefore an elegant numpy-based solution would be particularly welcome) the fundamental problem has nothing to do with numpy (or even Python) as such.
The task is to create an automated test for an algorithm which is supposed to produce points distributed on a grid whose pitch is specified as an input to the algorithm. The absolute positions of the points do not matter, but their relative positions do. For example, following
collection_of_points = algorithm(data, pitch=[1.3, 1.5, 2])
collection_of_points should contain only points whose x-coordinates differ by multiples of 1.3, whose y-coordinates differ by multiples of 1.5 and whose z-coordinates differ by multiples of 2.
The test should verify that this condition is satisfied.
One thing that I have tried, which doesn't seem too ugly, but doesn't work is
points = algo(data, pitch=requested_pitch)
for p1, p2 in itertools.combinations(points, 2):
    distance_between_points = np.array(p2) - np.array(p1)
    assert np.allclose(distance_between_points % requested_pitch, 0)
[ Aside for those unfamiliar with python or numpy:
itertools.combinations(points, 2) is a simple way of iterating through all pairs of points
Arithmetic operations on np.arrays are performed elementwise, so np.array([5,6,7]) % np.array([2,3,4]) evaluates to np.array([1, 0, 3]) via np.array([5%2, 6%3, 7%4])
np.allclose checks whether all corresponding elements in the two input arrays are approximately equal, and numpy automatically treats the 0 which is passed in as the second argument as an all-zero array of the correct size
]
To see why the idea shown above fails, consider a desired pitch of 3 and two points which are separated by 8.9999999 in the relevant dimension. 8.9999999 % 3 is around 2.9999999, which is nowhere near the required 0.
In all of this, I can't help feeling that I'm missing something obvious or that I'm re-inventing some wheel.
Can you suggest an elegant way of writing such a check?
Change your assertion to:
np.all(np.logical_or(np.isclose(x % y, 0), np.isclose((x % y) - y, 0)))
If you want to make it more readable, you should functionalize the statement. Something like:
def is_multiple(x, y, rtol=1e-05, atol=1e-08):
    """
    Test if x is a multiple of y.
    """
    remainder = x % y
    is_zero = np.isclose(remainder, 0., rtol, atol)
    is_y = np.isclose(remainder, y, rtol, atol)
    return np.logical_or(is_zero, is_y)
And then:
assert np.all(is_multiple(distance_between_points, requested_pitch))
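For completeness, here is a minimal self-contained sketch of the whole check. The helper name on_grid, the sample origin and the integer steps are made up for illustration; only is_multiple comes from the answer above. It builds points that lie on a grid with the requested pitch, adds one point that is off the grid, and checks every pair:

```python
import itertools
import numpy as np

def is_multiple(x, y, rtol=1e-05, atol=1e-08):
    """Elementwise test that x is approximately an integer multiple of y."""
    remainder = np.asarray(x) % y
    # The remainder of a near-multiple lands either close to 0 or close to y.
    near_zero = np.isclose(remainder, 0.0, rtol=rtol, atol=atol)
    near_y = np.isclose(remainder, y, rtol=rtol, atol=atol)
    return near_zero | near_y

def on_grid(points, pitch):
    """True if every pair of points differs by multiples of pitch (hypothetical helper)."""
    pitch = np.asarray(pitch, dtype=float)
    return all(
        np.all(is_multiple(np.asarray(p2) - np.asarray(p1), pitch))
        for p1, p2 in itertools.combinations(points, 2)
    )

pitch = [1.3, 1.5, 2.0]
origin = np.array([0.25, -4.0, 7.1])  # arbitrary offset: absolute positions do not matter
steps = np.array([[0, 0, 0], [3, -2, 5], [-7, 4, 1], [10, 10, -10]])
good_points = origin + steps * np.array(pitch)
# One extra point shifted by 0.5 in x, which is not a multiple of 1.3.
bad_points = np.vstack([good_points, origin + [0.5, 0.0, 0.0]])

print(on_grid(good_points, pitch))
print(on_grid(bad_points, pitch))
```

Note that the comparison against 0 only uses atol, while the comparison against y also uses rtol * y, so the two branches are not equally tolerant; pick tolerances that match the accuracy you expect from the algorithm.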
I'm new to automatic differentiation programming, so this may be a naive question. Below is a simplified version of what I'm trying to solve.
I have two input arrays - a vector A of size N and a matrix B of shape (N, M), as well as a parameter vector theta of size M. I define a new array C(theta) = B * theta to get a new vector of size N. I then obtain the indices of elements that fall in the upper and lower quartile of C, and use them to create a new array A_low(theta) = A[lower quartile indices of C] and A_high(theta) = A[upper quartile indices of C]. Clearly these two do depend on theta, but is it possible to differentiate A_low and A_high w.r.t. theta?
My attempts so far seem to suggest no - I have tried using the Python libraries autograd, JAX and tensorflow, but they all return a gradient of zero. (The approaches I have tried so far involve using argsort or extracting the relevant sub-arrays using tf.top_k.)
What I'm seeking help with is either a proof that the derivative is not defined (or cannot be analytically computed) or if it does exist, a suggestion on how to estimate it. My eventual goal is to minimize some function f(A_low, A_high) wrt theta.
This is the JAX computation that I wrote based on your description:
import numpy as np
import jax.numpy as jnp
import jax
N = 10
M = 20
rng = np.random.default_rng(0)
A = jnp.array(rng.random((N,)))
B = jnp.array(rng.random((N, M)))
theta = jnp.array(rng.random(M))
def f(A, B, theta, k=3):
    C = B @ theta
    _, i_upper = jax.lax.top_k(C, k)
    _, i_lower = jax.lax.top_k(-C, k)
    return A[i_lower], A[i_upper]
x, y = f(A, B, theta)
dx_dtheta, dy_dtheta = jax.jacobian(f, argnums=2)(A, B, theta)
The derivatives are all zero, and I believe this is correct, because the change in value of the outputs does not depend on the change in value of theta.
But, you might ask, how can this be? After all, theta enters into the computation, and if you put in a different value for theta, you get different outputs. How could the gradient be zero?
What you must keep in mind, though, is that differentiation doesn't measure whether an input affects an output. It measures the change in output given an infinitesimal change in input.
Let's use a slightly simpler function as an example:
import jax
import jax.numpy as jnp
A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])
def f(A, theta):
    return A[jnp.argmax(theta)]
x = f(A, theta)
dx_dtheta = jax.grad(f, argnums=1)(A, theta)
Here the result of differentiating f with respect to theta is all zero, for the same reasons as above. Why? If you make an infinitesimal change to theta, it will in general not affect the sort order of theta. Thus, the entries you choose from A do not change given an infinitesimal change in theta, and thus the derivative with respect to theta is zero.
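The same point can be checked without any autodiff library. The sketch below uses plain NumPy instead of JAX (a substitution made only to keep the example dependency-light) and a hypothetical helper finite_difference. It estimates the derivative of the argmax example by central differences: a small perturbation of theta never changes which entry of A is selected, so every difference quotient is exactly zero, even though a large change in theta does change the output.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
theta = np.array([5.0, 1.0, 3.0])

def f(A, theta):
    # Select the entry of A at the position of the largest theta.
    return A[np.argmax(theta)]

def finite_difference(theta, eps=1e-6):
    # Central-difference estimate of df/dtheta, one component at a time.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(A, theta + step) - f(A, theta - step)) / (2 * eps)
    return grad

fd = finite_difference(theta)
print(fd)

# A large change in theta does change the selected entry.
print(f(A, theta), f(A, np.array([5.0, 1.0, 6.0])))
```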
Now, you might argue that there are circumstances where this is not the case: for example, if two values in theta are very close together, then certainly perturbing one even infinitesimally could change their respective rank. This is true, but the gradient resulting from this procedure is undefined (the change in output is not smooth with respect to the change in input). The good news is this discontinuity is one-sided: if you perturb in the other direction, there is no change in rank and the gradient is well-defined. In order to avoid undefined gradients, most autodiff systems will implicitly use this safer definition of a derivative for rank-based computations.
The result is that the value of the output does not change when you infinitesimally perturb the input, which is another way of saying the gradient is zero. And this is not a failure of autodiff – it is the correct gradient given the definition of differentiation that autodiff is built on. Moreover, were you to try changing to a different definition of the derivative at these discontinuities, the best you could hope for would be undefined outputs, so the definition that results in zeros is arguably more useful and correct.
I am using Tensorflow to minimize a function. The function takes about 10 parameters. Every single parameter has bounds, i.e. a minimum and a maximum value the parameter is allowed to take. For example, the parameter x1 needs to be between 1 and 10.
I also have a pair of parameters that need to have the following constraint x2 > x3. In other words, x2 must always be bigger than x3. (In addition to this, x2 and x3 also have bounds, similarly to the example of x1 above.)
I know that tf.Variable has a "constraint" argument, however I can't really find any examples or documentation on how to use this to achieve the bounds and constraints as mentioned above.
Thank you!
It seems to me (I may be mistaken) that constrained optimization (you can google for it in tensorflow) is not exactly the case for which tensorflow was designed. You may want to take a look at this repo, which may satisfy your needs, but as far as I understand, it still does not solve arbitrary constrained optimization, just some classification problems with labels and features, compatible with precision/recall scores.
If you want to use constraints on the tensorflow variable (i.e. some function applied after the gradient step, which you can also do manually by taking the variable values, manipulating them, and reassigning them), it means that you will be clipping the variables after each step taken along the gradient in the unconstrained space. It's an open question whether you will reach the right optimization goal this way, or whether your variables will get stuck at the boundaries because the unconstrained gradient points somewhere outside.
My approach 1
If your problem is simple enough, you can try to parametrize your x2 and x3 as x2 = x3 + t, and then try to do the clipping in the graph:
x3 = tf.get_variable('x3',
                     dtype=tf.float32,
                     shape=(1,),
                     initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                     constraint=lambda z: tf.clip_by_value(z, 1, 10))
t = tf.get_variable('t',
                    dtype=tf.float32,
                    shape=(1,),
                    initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                    constraint=lambda z: tf.clip_by_value(z, 1, 10))
x2 = x3 + t
Then, on a separate call additionally do
sess.run(tf.assign(x2, tf.clip_by_value(x2, 1.0, 10.0)))
But my opinion is that it won't work well.
My approach 2
I would also try to invent some loss terms to keep the variables within the constraints, which is more likely to work. For example, the constraint for x2 to be in the interval [1,10] will be:
loss += alpha*tf.abs(tf.math.tan(((x2-5.5)/4.5)*pi/2))
Here the expression under tan is mapped to (-pi/2, pi/2), and then the tan function is used to make the term grow very rapidly as x2 approaches the boundaries. In this case I think you're more likely to find your optimum, but again the loss weight alpha might be too big and training will get stuck somewhere nearby if the required value of x2 lies near the boundary. In this case you can try to use a smaller alpha.
In addition to the answer by Slowpoke, reparameterization is another option. E.g. let's say you have a param p which should be bounded in [lower_bound,upper_bound], you could write:
p_inner = tf.Variable(...) # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound
However, this will change the behavior of gradient descent.
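The same idea can be extended to the ordering constraint x2 > x3. The sketch below is written in plain NumPy so that it runs on its own; the helper names bounded and ordered_pair are made up for illustration, and it assumes that x2 and x3 share the bounds [1, 10]. x3 is mapped into (lower, upper), and x2 is then mapped into (x3, upper), so the ordering holds by construction. The same expressions should carry over to TensorFlow by using tf.sigmoid on unbounded variables.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bounded(u, lower, upper):
    # Map an unconstrained value u into the open interval (lower, upper).
    return lower + (upper - lower) * sigmoid(u)

def ordered_pair(u, v, lower=1.0, upper=10.0):
    # x3 lies in (lower, upper); x2 lies in (x3, upper), hence x2 > x3.
    x3 = bounded(u, lower, upper)
    x2 = bounded(v, x3, upper)
    return x2, x3

rng = np.random.default_rng(0)
u = rng.standard_normal(1000)  # stand-ins for the unbounded optimization variables
v = rng.standard_normal(1000)
x2, x3 = ordered_pair(u, v)

print(x3.min(), x3.max())
print(np.all(x2 > x3))
```

As with the single-parameter version, the optimizer then works on u and v rather than on x2 and x3, which changes the behavior of gradient descent, especially near the boundaries where the sigmoid saturates.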
I want to calculate the numerical derivative of two arrays a and b.
If I do
c = diff(a) / diff(b)
I get what I want, but I lose the edge (the last point), so c.shape != a.shape.
If I do
c = gradient(a, b)
then c.shape = a.shape, but I get a completely different result.
I have read how gradient is calculated in numpy and I guess it does a completely different thing, although I don't quite understand the difference yet. But is there a way, or another function, to calculate the derivative that also gives the values at the edges?
And why is the result so different between gradient and diff?
These functions, although related, do different things.
np.diff simply takes the differences between adjacent slices along a given axis; the n-th difference returns an array that is smaller by n along that axis (which is what you observed in the n=1 case). Please see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html
np.gradient estimates the gradient of an array along each of its dimensions while preserving its shape: https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html Please also note that np.gradient differentiates a single input array; your second argument b is taken as the first non-keyword argument from *varargs, which is meant to describe the spacing between the values of the first argument. NumPy versions before 1.13 only supported scalar spacings there, so passing the array b did not compute da/db, hence the results that don't match your intuition. From 1.13 on, a coordinate array is accepted, but the result still differs from diff(a) / diff(b) because gradient uses central differences in the interior.
I would simply use c = diff(a) / diff(b) and append values to c if you really need to have c.shape match a.shape. For instance, you might append zeros if you expect the gradient to vanish close to the edges of your window.
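To make the difference concrete, here is a small sketch with made-up sample data. It assumes NumPy 1.13 or later, where np.gradient accepts a coordinate array as its second argument; on older versions the np.gradient call below does not mean the same thing.

```python
import numpy as np

b = np.array([0.0, 1.0, 3.0, 6.0])  # unevenly spaced sample positions
a = b ** 2                           # sampled function, so da/db = 2 * b

# One-sided slopes between neighbouring samples: one element shorter than a.
c_diff = np.diff(a) / np.diff(b)

# Central differences in the interior, one-sided differences at the two ends:
# same length as a.
c_grad = np.gradient(a, b)

print(c_diff.shape, c_grad.shape)
print(c_diff)
print(c_grad)
```

The values from np.diff are best read as slopes at the midpoints between samples, while np.gradient returns estimates at the sample positions themselves, which is why the two results differ even when both are correct.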
I need to do something very similar to what is detailed in this post. But the way the stencils are built is not obvious to me... well, the stencil for _flux is, but the ones for temp_bz & temp_bx are not.
I think the picture would become clearer with variables instead of numbers (something like stencil = np.array([[a, b], [c, d]]) with a=0.5, b=...).
As an example, if the recurrence relation is
flux2[i,j] = a*flux2[i-1,j] + b*bz[i-1,j]*dx + c*flux2[i,j-1] - d*bx[i,j-1]*dz
how would the code change?
Given the variables flux2, bz and bx, and assuming they are numpy arrays (if they are not, they should be), you could write that equation in vectorized form as follows:
flux2[1:,1:] = a * flux2[:-1,1:] + b * bz[:-1,1:] * dx + c * flux2[1:,:-1] - d * bx[1:,:-1] * dz
Note that, since you didn't mention dz, I assumed it is a constant. If it is a matrix of the same shape as flux2, replace it with dz[1:, 1:] (the same applies to dx).
That line above will vectorize the operation to every i,j of the matrix, and thus, remove the for loop, giving a considerable speedup.
You would have to define the boundary conditions for row and column 0, as your equation doesn't define what to do in those special cases.
So, in short, as your stencil only uses one position for each variable and only has 4 interactions, I would say it is much faster to calculate it in its analytic form rather than convolving 3 images with almost all-zero stencils (which would be overkill).
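One caveat is worth checking before relying on the vectorized line. NumPy evaluates the whole right-hand side from the values flux2 held before the assignment, so the result matches a loop only if each update is meant to read the old neighbours. If the relation is a true recurrence, where flux2[i,j] uses already-updated neighbours, the two differ. The sketch below compares both on random data; the function names, array shapes and coefficient values are made up for illustration, and row 0 and column 0 are simply left unchanged as a stand-in boundary condition.

```python
import numpy as np

def sweep_vectorized(flux2, bz, bx, a, b, c, d, dx, dz):
    # The right-hand side reads the values flux2 held before the update.
    out = flux2.copy()
    out[1:, 1:] = (a * flux2[:-1, 1:] + b * bz[:-1, 1:] * dx
                   + c * flux2[1:, :-1] - d * bx[1:, :-1] * dz)
    return out

def sweep_recurrence(flux2, bz, bx, a, b, c, d, dx, dz):
    # Each entry reads neighbours that were already updated earlier in the loop.
    out = flux2.copy()
    rows, cols = out.shape
    for i in range(1, rows):
        for j in range(1, cols):
            out[i, j] = (a * out[i - 1, j] + b * bz[i - 1, j] * dx
                         + c * out[i, j - 1] - d * bx[i, j - 1] * dz)
    return out

rng = np.random.default_rng(0)
flux2, bz, bx = (rng.random((4, 5)) for _ in range(3))
params = dict(a=0.5, b=0.5, c=0.5, d=0.5, dx=0.1, dz=0.2)  # hypothetical values

vec = sweep_vectorized(flux2, bz, bx, **params)
rec = sweep_recurrence(flux2, bz, bx, **params)

# Entry [1, 1] only depends on boundary values, so both versions agree there.
print(np.isclose(vec[1, 1], rec[1, 1]))
# Further in, the recurrence has picked up updated neighbours and the results diverge.
print(np.allclose(vec, rec))
```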
I'm trying to compute the maxima of some function of one variable (something like this:)
(which is calculated from a non-trivial convolution, so, no, I don't have an expression for it)
Using the command:
NMaximize[{f[x], 0 < x < 1}, x, AccuracyGoal -> 4, PrecisionGoal -> 4]
(I'm not that worried about super accuracy, a rough estimate of 10^-4 is already enough)
The result of this is x* = 0.55, which is not what it should be (i.e., it is picking the third peak).
Is there any way of telling Mathematica that the global maximum is the first one when counting from x = 0 (I know this is always true), or of making Mathematica search with a better approach? (Note that I don't want things like a Simulated Annealing approach; each evaluation is very costly!)
Thanks very much!
Try FindMaximum with a starting point of 0 or some similarly small value.