I am looking to minimize a nonlinear function of three arguments (x1, x2 and x3).
My sources of information are:
the explanation of the minimization function:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
And an example they provide:
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
I do not come from a mathematical background, so please forgive me if I use incorrect wording or expressions.
This is my code:
import numpy as np
from scipy.optimize import minimize
def rosen(x1,x2,x3):
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
I think the first step is okay up to here.
Then it is required to state the:
x0 : ndarray
Initial guess. len(x0) is the dimensionality of the minimization problem.
Given that I am stating 3 arguments in the function to minimize, should I state a 3-element array, like this?
x0=np.array([1,1,1])
res = minimize(rosen, x0)
print(res.x)
The undesired output is:
rosen() missing 2 required positional arguments: 'x2' and 'x3'
I do not really understand where I should state the positional arguments.
Apart from that, I would like to set some bounds on the output values for x1, x2, x3.
I tried
res = minimize(rosen, x0, bounds=([0,None]),options={"disp": False})
This outputs:
ValueError: length of x0 != length of bounds
How should I express the bounds inside the call to minimize, then?
The desired output would simply be an array for x1, x2, x3 that minimizes the function, where each value is at least 0 (as stated in the bounds) and the arguments sum up to 1.
Function-definition
Read the docs carefully, e.g. for your function-def:
fun : callable
The objective function to be minimized. Must be in the form f(x, *args). The
optimizing argument, x, is a 1-D array of points, and args is a tuple of any
additional fixed parameters needed to completely specify the function.
Your function should take a single 1d array, while you implemented a multi-argument (one argument per variable) approach!
Changing:
def rosen(x1,x2,x3):
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
def rosen(x):
    x1, x2, x3 = x  # unpack vector for your kind of calculations
    return np.sqrt(((x1**2)*0.002)+((x2**2)*0.0035)+((x3**2)*0.0015)+(2*x1*x2*0.015)+(2*x1*x3*0.01)+(2*x2*x3*0.02))
should work. This is a bit of a repair-something-to-keep-my-other-code approach, but it won't hurt much in this example. Usually you would implement your function definition on the 1d-array-input assumption from the start!
Bounds
Again from the docs:
bounds : sequence, optional
Bounds for variables (only for L-BFGS-B, TNC and SLSQP). (min, max) pairs for each
element in x, defining the bounds on that parameter. Use None for one of min or max
when there is no bound in that direction.
So you need n_vars pairs! This is easily achieved with a list comprehension, deducing the necessary length from x0.
res = minimize(rosen, x0, bounds=[[0,None] for i in range(len(x0))],options={"disp": False})
Make variables sum up to 1 / Constraints
Your comment implies you want the variables to sum up to 1. You would need an equality constraint then (only one solver, SLSQP, supports equality and inequality constraints; one other, COBYLA, supports only inequality constraints; the rest support no constraints; a solver is picked automatically if none is given explicitly).
It looks somewhat like:
cons = ({'type': 'eq', 'fun': lambda x: sum(x) - 1}) # read docs to understand!
# to think about:
# sum vs. np.sum
# (not much diff here)
res = minimize(rosen, x0, bounds=[[0,None] for i in range(len(x0))],options={"disp": False}, constraints=cons)
Together with the nonnegativity bounds on x, this constraint set is usually called the probability simplex.
(untested code; conceptually correct!)
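Putting the pieces together, a minimal end-to-end sketch (also untested here; the explicit method='SLSQP' is an assumption, since the solver would otherwise be picked automatically):
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    x1, x2, x3 = x  # unpack the 1d argument vector
    return np.sqrt((x1**2)*0.002 + (x2**2)*0.0035 + (x3**2)*0.0015
                   + 2*x1*x2*0.015 + 2*x1*x3*0.01 + 2*x2*x3*0.02)

x0 = np.array([1.0, 1.0, 1.0])
bounds = [(0, None) for _ in range(len(x0))]               # each variable >= 0
cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1},)   # variables sum to 1

res = minimize(rosen, x0, method='SLSQP', bounds=bounds, constraints=cons)
print(res.x, res.fun)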
Related
I'm new to automatic differentiation programming, so this may be a naive question. Below is a simplified version of what I'm trying to solve.
I have two input arrays - a vector A of size N and a matrix B of shape (N, M) - as well as a parameter vector theta of size M. I define a new array C(theta) = B * theta to get a new vector of size N. I then obtain the indices of elements that fall in the upper and lower quartile of C, and use them to create new arrays A_low(theta) = A[lower quartile indices of C] and A_high(theta) = A[upper quartile indices of C]. Clearly these two do depend on theta, but is it possible to differentiate A_low and A_high w.r.t. theta?
My attempts so far suggest no - I have tried the Python libraries autograd, JAX and TensorFlow, but they all return a gradient of zero. (The approaches I have tried so far involve using argsort or extracting the relevant sub-arrays using tf.top_k.)
What I'm seeking help with is either a proof that the derivative is not defined (or cannot be analytically computed) or, if it does exist, a suggestion on how to estimate it. My eventual goal is to minimize some function f(A_low, A_high) w.r.t. theta.
This is the JAX computation that I wrote based on your description:
import numpy as np
import jax.numpy as jnp
import jax
from jax import lax
N = 10
M = 20
rng = np.random.default_rng(0)
A = jnp.array(rng.random((N,)))
B = jnp.array(rng.random((N, M)))
theta = jnp.array(rng.random(M))
def f(A, B, theta, k=3):
    C = B @ theta                   # matrix-vector product
    _, i_upper = lax.top_k(C, k)    # indices of the k largest entries of C
    _, i_lower = lax.top_k(-C, k)   # indices of the k smallest entries of C
    return A[i_lower], A[i_upper]
x, y = f(A, B, theta)
dx_dtheta, dy_dtheta = jax.jacobian(f, argnums=2)(A, B, theta)
The derivatives are all zero, and I believe this is correct, because the change in value of the outputs does not depend on the change in value of theta.
But, you might ask, how can this be? After all, theta enters into the computation, and if you put in a different value for theta, you get different outputs. How could the gradient be zero?
What you must keep in mind, though, is that differentiation doesn't measure whether an input affects an output. It measures the change in output given an infinitesimal change in input.
Let's use a slightly simpler function as an example:
import jax
import jax.numpy as jnp
A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])
def f(A, theta):
    return A[jnp.argmax(theta)]
x = f(A, theta)
dx_dtheta = jax.grad(f, argnums=1)(A, theta)
Here the result of differentiating f with respect to theta is all zero, for the same reasons as above. Why? If you make an infinitesimal change to theta, it will in general not affect the sort order of theta. Thus, the entries you choose from A do not change given an infinitesimal change in theta, and thus the derivative with respect to theta is zero.
Now, you might argue that there are circumstances where this is not the case: for example, if two values in theta are very close together, then certainly perturbing one even infinitesimally could change their respective rank. This is true, but the gradient resulting from this procedure is undefined (the change in output is not smooth with respect to the change in input). The good news is this discontinuity is one-sided: if you perturb in the other direction, there is no change in rank and the gradient is well-defined. In order to avoid undefined gradients, most autodiff systems will implicitly use this safer definition of a derivative for rank-based computations.
The result is that the value of the output does not change when you infinitesimally perturb the input, which is another way of saying the gradient is zero. And this is not a failure of autodiff – it is the correct gradient given the definition of differentiation that autodiff is built on. Moreover, were you to try changing to a different definition of the derivative at these discontinuities, the best you could hope for would be undefined outputs, so the definition that results in zeros is arguably more useful and correct.
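As a quick sanity check of that argument (a small sketch, not part of the original answer), a one-sided finite difference on the argmax example gives the same result, because a small perturbation of theta does not change which index is selected:
import jax
import jax.numpy as jnp

A = jnp.array([1.0, 2.0, 3.0])
theta = jnp.array([5.0, 1.0, 3.0])

def f(A, theta):
    return A[jnp.argmax(theta)]

eps = 1e-4
e0 = jnp.zeros_like(theta).at[0].set(1.0)    # perturb only theta[0]

fd = (f(A, theta + eps * e0) - f(A, theta)) / eps
print(fd)                                    # 0.0 -- the argmax index is unchanged
print(jax.grad(f, argnums=1)(A, theta))      # [0. 0. 0.]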
I am using Tensorflow to minimize a function. The function takes about 10 parameters. Every single parameter has bounds, i.e. a minimum and a maximum value the parameter is allowed to take. For example, the parameter x1 needs to be between 1 and 10.
I also have a pair of parameters that need to have the following constraint x2 > x3. In other words, x2 must always be bigger than x3. (In addition to this, x2 and x3 also have bounds, similarly to the example of x1 above.)
I know that tf.Variable has a "constraint" argument, however I can't really find any examples or documentation on how to use this to achieve the bounds and constraints as mentioned above.
Thank you!
It seems to me (I may be mistaken) that constrained optimization (you can google it for TensorFlow) is not exactly what TensorFlow was designed for. You may want to take a look at this repo; it may satisfy your needs, but as far as I understand it still doesn't solve arbitrary constrained optimization problems, just some classification problems with labels and features, compatible with precision/recall scores.
If you want to use the constraint argument of a TensorFlow variable (i.e. some function applied after the gradient step - which you can also do manually by taking the variable values, manipulating them, and reassigning them), it means that you will be clipping the variables after each step taken along the gradient in the unconstrained space. It is questionable whether you will successfully reach the right optimization goal this way, or whether your variables will get stuck at the boundaries, because the unconstrained gradient may point somewhere outside.
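To make that concrete, here is a tiny sketch of the clip-after-each-step idea (TF 2.x eager execution is assumed; the bounds and the toy objective are made up for illustration):
import tensorflow as tf

x1 = tf.Variable(5.0)
opt = tf.keras.optimizers.SGD(0.1)

for _ in range(50):
    with tf.GradientTape() as tape:
        loss = (x1 - 20.0)**2                    # unconstrained optimum lies outside the bounds
    opt.apply_gradients([(tape.gradient(loss, x1), x1)])
    x1.assign(tf.clip_by_value(x1, 1.0, 10.0))   # project back into [1, 10] after each step

print(x1.numpy())                                # ends up pinned at the boundary value 10.0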
My approach 1
If your problem is simple enough, you can try to parametrize your x2 and x3 as x2 = x3 + t, and then do the clipping in the graph:
x3 = tf.get_variable('x3',
dtype=tf.float32,
shape=(1,),
initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
constraint=lambda z: tf.clip_by_value(z, 1, 10))
t = tf.get_variable('t',
dtype=tf.float32,
shape=(1,),
initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
constraint=lambda z: tf.clip_by_value(z, 1, 10))
x2 = x3 + t
Then, in a separate call, additionally do
sess.run(tf.assign(x2, tf.clip_by_value(x2, 1.0, 10.0)))
But my opinion is that it won't work well.
My approach 2
I would also try to invent some loss terms that keep the variables within their constraints, which is more likely to work. For example, a penalty that keeps x2 in the interval [1, 10] would be:
loss += alpha * tf.abs(tf.math.tan(((x2 - 5.5) / 4.5) * np.pi / 2))
Here the expression inside tan is mapped to (-pi/2, pi/2), and the tan function then makes the penalty grow very rapidly as x2 approaches the boundaries. In this case I think you're more likely to find your optimum, but the loss weight alpha might be too big and training may get stuck somewhere nearby if the required value of x2 lies close to a boundary; in that case you can try a smaller alpha.
In addition to the answer by Slowpoke, reparameterization is another option. E.g., let's say you have a parameter p which should be bounded in [lower_bound, upper_bound]; you could write:
p_inner = tf.Variable(...) # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound
However, this will change the behavior of gradient descent.
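A minimal sketch of how the two ideas could be combined for the bounds plus the x2 > x3 ordering (TF 2.x eager execution is assumed; the variable names, bounds and toy objective are illustrative, not from the original question):
import tensorflow as tf

lower, upper = 1.0, 10.0

# Unbounded underlying variables; the bounded parameters are derived from them.
x1_inner = tf.Variable(0.0)
x3_inner = tf.Variable(0.0)
t_inner = tf.Variable(0.0)     # controls the positive gap x2 - x3

def params():
    x1 = tf.sigmoid(x1_inner) * (upper - lower) + lower   # x1 in [1, 10]
    x3 = tf.sigmoid(x3_inner) * (upper - lower) + lower   # x3 in [1, 10]
    x2 = x3 + tf.nn.softplus(t_inner)                     # x2 > x3 by construction
    return x1, x2, x3                                     # (upper bound on x2 not enforced here)

opt = tf.keras.optimizers.Adam(0.1)
for _ in range(100):
    with tf.GradientTape() as tape:
        x1, x2, x3 = params()
        loss = (x1 - 2.0)**2 + (x2 - 7.0)**2 + (x3 - 3.0)**2   # toy objective
    grads = tape.gradient(loss, [x1_inner, x3_inner, t_inner])
    opt.apply_gradients(zip(grads, [x1_inner, x3_inner, t_inner]))

print([float(v) for v in params()])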
While I am trying to solve this problem in a context where numpy is used heavily (and therefore an elegant numpy-based solution would be particularly welcome) the fundamental problem has nothing to do with numpy (or even Python) as such.
The task is to create an automated test for an algorithm which is supposed to produce points distributed on a grid whose pitch is specified as an input to the algorithm. The absolute positions of the points do not matter, but their relative positions do. For example, following
collection_of_points = algorithm(data, pitch=[1.3, 1.5, 2])
collection_of_points should contain only points whose x-coordinates differ by multiples of 1.3, whose y-coordinates differ by multiples of 1.5 and whose z-coordinates differ by multiples of 2.
The test should verify that this condition is satisfied.
One thing that I have tried, which doesn't seem too ugly, but doesn't work is
points = algo(data, pitch=requested_pitch)
for p1, p2 in itertools.combinations(points, 2):
    distance_between_points = np.array(p2) - np.array(p1)
    assert np.allclose(distance_between_points % requested_pitch, 0)
[ Aside for those unfamiliar with python or numpy:
itertools.combinations(points, 2) is a simple way of iterating through all pairs of points
Arithmetic operations on np.arrays are performed elementwise, so np.array([5,6,7]) % np.array([2,3,4]) evaluates to np.array([1, 0, 3]) via np.array([5%2, 6%3, 7%4])
np.allclose checks whether all corresponding elements in the two input arrays are approximately equal, and numpy automatically pretends that the 0 passed in as the second argument is really an all-zero array of the correct size
]
To see why the idea shown above fails, consider a desired pitch of 3 and two points which are separated by 8.999999 in the relevant dimension. 8.999999 % 3 is around 2.999999, which is nowhere near the required 0.
In all of this, I can't help feeling that I'm missing something obvious or that I'm re-inventing some wheel.
Can you suggest an elegant way of writing such a check?
Change your assertion to:
np.all(np.logical_or(np.isclose(x % y, 0), np.isclose((x % y) - y, 0)))
If you want to make it more readable, you should functionalize the statement. Something like:
def is_multiple(x, y, rtol=1e-05, atol=1e-08):
    """
    Test if x is a multiple of y.
    """
    remainder = x % y
    is_zero = np.isclose(remainder, 0., rtol, atol)
    is_y = np.isclose(remainder, y, rtol, atol)
    return np.logical_or(is_zero, is_y)
And then:
assert np.all(is_multiple(distance_between_points, requested_pitch))
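Putting it together with the pairwise loop from the question, a small self-contained sketch (the sample points and pitch below are made up for illustration):
import itertools
import numpy as np

def is_multiple(x, y, rtol=1e-05, atol=1e-08):
    """Test elementwise whether x is (approximately) a multiple of y."""
    remainder = x % y
    return np.logical_or(np.isclose(remainder, 0., rtol, atol),
                         np.isclose(remainder, y, rtol, atol))

requested_pitch = np.array([1.3, 1.5, 2.0])
# Points on the grid, one with a tiny numerical error in the first coordinate:
points = [np.array([0.0, 0.0, 0.0]),
          np.array([1.3, 3.0, 4.0]),
          np.array([2.6 - 1e-9, 1.5, 6.0])]

for p1, p2 in itertools.combinations(points, 2):
    distance_between_points = p2 - p1
    assert np.all(is_multiple(distance_between_points, requested_pitch))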
def get_z(epsilon):
    return tf.cond(flag, lambda: mean + sigma*epsilon, lambda: epsilon)
In this, when I call the function with flag = True, I have my mean and sigma tensors defined and epsilon is the placeholder I give, and it works well.
If I call it with flag = False, it should simply return epsilon, the placeholder I give. At this stage, mean and sigma are not defined, as I am not providing the data to compute them. However, that shouldn't matter because mean and sigma are not needed. But running this throws an error telling me to define mean and sigma. Is there a workaround for this?
Thank you.
mean and sigma have to be defined, because the lambda functions depend on those two values; you can make them function inputs instead. Two placeholders for mean and sigma can be used. (tf.cond builds both branches into the graph, so every tensor referenced by either branch has to exist, even if only one branch is executed.)
def get_z(epsilon, mean, sigma):
    return tf.cond(flag, lambda: mean + sigma*epsilon, lambda: epsilon)
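A minimal TF 1.x graph-mode sketch of that suggestion (the shapes, dummy values and placeholder names are illustrative assumptions, not from the original post): since both branches exist in the graph, mean and sigma must be fed even when flag is False, and feeding dummy values is one simple workaround.
import numpy as np
import tensorflow as tf

flag = tf.placeholder(tf.bool, shape=())
epsilon = tf.placeholder(tf.float32, shape=(3,))
mean = tf.placeholder(tf.float32, shape=(3,))
sigma = tf.placeholder(tf.float32, shape=(3,))

def get_z(epsilon, mean, sigma):
    return tf.cond(flag, lambda: mean + sigma*epsilon, lambda: epsilon)

z = get_z(epsilon, mean, sigma)

with tf.Session() as sess:
    eps = np.ones(3, np.float32)
    dummy = np.zeros(3, np.float32)   # values for mean/sigma when flag is False
    print(sess.run(z, feed_dict={flag: False, epsilon: eps, mean: dummy, sigma: dummy}))
    print(sess.run(z, feed_dict={flag: True, epsilon: eps, mean: dummy, sigma: eps}))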
I have a 10000x784 matrix of data (10000 examples and 784 features) called X_valid and I'd like to apply the following function to each row in this matrix and get the numerical result:
def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)
(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:
v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])
(covariance[num], means[num], and priors[num] are just constants.)
However, I get the following error when running this:
File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'
That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).
How can I alter this to get the desired behavior?
vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.
I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
You describe x_valid as a 2d array that you want to evaluate row by row, and the other terms as 'constants' which you select with [num]. What shape are those constants?
Your function treats a lot of these terms as 2d arrays:
x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) +
mean.T.dot(np.linalg.inv(cov)).dot(mean) +
np.linalg.slogdet(cov)[1]) + np.log(prior)
x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does nothing.
np.linalg.inv(cov) only makes sense if cov is 2d.
mean.T.dot... assumes mean is 2d.
np.linalg.slogdet(cov)[1] assumes np.linalg.slogdet(cov) has 2 or more elements (or rows).
You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.
I suggest just using a for loop:
def v_predict_prob(X_valid, c, m, p):
    out = []
    for row in X_valid:
        out.append(predict_prob(row, c, m, p))
    return np.array(out)
Under the hood np.vectorize is doing the same thing: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html
I know this question is a bit outdated, but I thought I would provide an answer for 2020.
Since the release of numpy 1.12, there is a new optional argument, "signature", which should allow 2D array functionality in most cases. Additionally, you will want to "exclude" the constants since they will not be vectorized.
All you would need to change is:
v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'], signature='(n)->()')
This signifies that the function expects a 1-d array of length n and outputs a scalar, and that cov, mean, and prior will not be vectorized. (Note the argument is spelled excluded, and string entries only take effect for arguments passed by keyword; to exclude positional arguments, use their integer positions instead.)
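For illustration, a small self-contained sketch of this approach (shapes and data are made up; the constants are excluded by their positions here so they can stay positional in the call):
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

rng = np.random.default_rng(0)
X_valid = rng.random((100, 784))   # 100 rows instead of 10000, for speed
cov = np.eye(784)
mean = rng.random(784)
prior = 0.5

# '(n)->()' : each call receives one 1-d row of length n and returns a scalar;
# positions 1, 2, 3 (cov, mean, prior) are passed through unvectorized.
v_predict_prob = np.vectorize(predict_prob, excluded={1, 2, 3}, signature='(n)->()')
scores = v_predict_prob(X_valid, cov, mean, prior)
print(scores.shape)                # (100,)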