Z3 maximization API: possible bug? - optimization

I'm currently playing with the maximization API for Z3 (opt branch), and I've stumbled upon a following bug:
Whenever I give it any unbounded problem, it simply returns me OPT and gives zero in the resulting model (e.g. maximize Real('x') with no constraints on the model).
Python example:
from z3 import *
context = main_ctx()
x = Real('x')
optimize_context = Z3_mk_optimize(context.ctx)
Z3_optimize_assert(context.ctx, optimize_context, (x >= 0).ast)
Z3_optimize_maximize(context.ctx, optimize_context, x.ast)
out = Z3_optimize_check(context.ctx, optimize_context)
print out
And I get the value of out to be 1 (OPT), while it seems like it should be -1.

Thanks for trying out this experimental branch.
Development is still churning quite a bit these days, but most of the features are reasonably stable and you are invited to try them out.
To answer your question. There is a native way to use the optimization features from Z3.
To paraphrase your example, here is what is relevant:
from z3 import *
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
When running it, you will see the following output:
sat
oo
[x = 0]
The first line says that the assertions are satisfiable.
The second line prints the value of the handle "h" under the satisfiabilty call.
The value of the handle holds an expression that meets the maximization/minimization criteria declared by the call to opt.maximize/opt.minimize.
In this case the expression is "oo". It is somewhat of a "hack" because it is going to be up to you to guess that "oo" means infinity. If you interpret this value back to Z3, you will not get infinity.
(I am here restricting the use of Z3 where we don't expose non-standard numbers, there is another part of Z3 that includes non-standard numbers, but that is another story).
Note that the opt.maximize call returns the handle "h",
which is later used to query what was the optimal value.
The last line is some model satisfying the constraints.
When the objective is bounded, the model will be what
you expect, but in this case the objective is unbounded.
There is no finite best value.
Try for example instead:
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
opt.add(x <= 10)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
This time you get a model that sets x = 10, and this is also the maximal value.
You could also try:
x = Real('x')
opt = Optimize()
opt.add(x >= 0)
opt.add(x < 10)
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
The output is now:
sat
10 + -1*epsilon
[x = 9]
epsilon refers to a non-standard number (infinitesimal). You can set it arbitrarily small.
Again the model uses only standard numbers, so it picks some number, in this case 9.

Related

Vectorize a function with a condition

I would like to vectorize a function with a condition, meaning to calculate its values with array arithmetic. np.vectorize handles vectorization, but it does not work with array arithmetic, so it is not a complete solution
An answer was given as the solution in the question "How to vectorize a function which contains an if statement?" but did not prevent errors here; see the MWE below.
import numpy as np
def myfx(x):
return np.where(x < 1.1, 1, np.arcsin(1 / x))
y = myfx(x)
This runs but raises the following warnings:
<stdin>:2: RuntimeWarning: divide by zero encountered in true_divide
<stdin>:2: RuntimeWarning: invalid value encountered in arcsin
What is the problem, or is there a better way to do this?
I think this could be done by
Getting the indices ks of x for which x[k] > 1.1 for each k in ks.
Applying np.arcsin(1 / x[ks]) to the slice x[ks], and using 1 for the rest of the elements.
Recombining the arrays.
I am not sure about the efficiency, though.
The statement np.where(x < 1.1, 1, np.arcsin(1 / x)) is equivalent to
mask = x < 1.1
a = 1
b = np.arcsin(1 / x)
np.where(mask, a, b)
Notice that you're calling np.arcsin on all the elements of x, regardless of whether 1 / x <= 1 or not. Your basic plan is correct. You can do the operations in-place on an output array using the where keyword of np.arcsin and np.reciprocal, without having to recombine anything:
def myfx(x):
mask = (x >= 1.1)
out = np.ones(x.shape)
np.reciprocal(x, where=mask, out=out) # >= 1.1 implies != 0
return np.arcsin(out, where=mask, out=out)
Using np.ones ensures that the unmasked elements of out are initialized correctly. An equivalent method would be
out = np.empty(x.shape)
out[~mask] = 1
You can always find an arithmetic expression that prevents the "divide by zero".
Example:
def myfx(x):
return np.where( x < 1.1, 1, np.arcsin(1/np.maximum(x, 1.1)) )
The values where x<1.1 in the right wing are not used, so it's not an issue computing np.arcsin(1/1.1) where x < 1.1.

Finding n-tuple that minimizes expensive cost function

Suppose there are three variables that take on discrete integer values, say w1 = {1,2,3,4,5,6,7,8,9,10,11,12}, w2 = {1,2,3,4,5,6,7,8,9,10,11,12}, and w3 = {1,2,3,4,5,6,7,8,9,10,11,12}. The task is to pick one value from each set such that the resulting triplet minimizes some (black box, computationally expensive) cost function.
I've tried the surrogate optimization in Matlab but I'm not sure it is appropriate. I've also heard about simulated annealing but found no implementation applied to this instance.
Which algorithm, apart from exhaustive search, can solve this combinatorial optimization problem?
Any help would be much appreciated.
The requirement/benefit of Simulated Annealing (SA), is that the objective surface is somewhat smooth, that is, we can be close to a solution.
For a completely random spiky surface- you might as well do a random search
If it is anything smooth, or even sometimes, it makes sense to try SA.
The idea is that (sometimes) changing only 1 of the 3 values, we have little effect on out blackbox function.
Here is a basic example to do this with Simulated Annealing, using frigidum in Python
import numpy as np
w1 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w2 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
w3 = np.array( [1,2,3,4,5,6,7,8,9,10,11,12] )
W = np.array([w1,w2,w3])
LENGTH = 12
I define a black-box using the Rastrigin function.
def rastrigin_function_n( x ):
"""
N-dimensional Rastrigin
https://en.wikipedia.org/wiki/Rastrigin_function
x_i is in [-5.12, 5.12]
"""
A = 10
n = x.shape[0]
return A*n + np.sum( x**2- A*np.cos(2*np.pi * x) )
def black_box( x ):
"""
Transform from domain [1,12] to [-5,5]
to be able to push to rastrigin
"""
x = (x - 6.5) * (5/5.5)
return rastrigin_function_n(x)
Simulated Annealing needs to modify state X. Instead of taking/modifying values directly, we keep track of indices. This simplifies creating new proposals as an index is always an integer we can simply add/subtract 1 modulo LENGTH.
def random_start():
"""
returns 3 random indices
"""
return np.random.randint(0, LENGTH, size=3)
def random_small_step(x):
"""
change only 1 index
"""
d = np.array( [1,0,0] )
if np.random.random() < .5:
d = np.array( [-1,0,0] )
np.random.shuffle(d)
return (x+d) % LENGTH
def random_big_step(x):
"""
change 2 indici
"""
d = np.array( [1,-1,0] )
np.random.shuffle(d)
return (x+d) % LENGTH
def obj(x):
"""
We have a triplet of indici,
1. Calculate corresponding values in W = [w1,w2,w3]
2. Push the values in out black-box function
"""
indices = x
values = W[np.array([0,1,2]), indices]
return black_box(values)
And throw a SA Scheme at it
import frigidum
local_opt = frigidum.sa(random_start=random_start,
neighbours=[random_small_step, random_big_step],
objective_function=obj,
T_start=10**4,
T_stop=0.000001,
repeats=10**3,
copy_state=frigidum.annealing.naked)
I am not sure what the minimum for this function should be, but it found a objective with 47.9095 with indicis np.array([9, 2, 2])
Edit:
For frigidum to change the cooling schedule, use alpha=.9. My experience is that all the work of experiment which cooling scheme works best doesn't out-weight simply let it run a little longer. The multiplication you proposed, (sometimes called geometric) is the standard one, also implemented in frigidum. So to implement Tn+1 = 0.9*Tn you need a alpha=.9. Be aware this cooling step is done after N repeats, so if repeats=100, it will first do 100 proposals before lowering the temperature with factor alpha
Simple variations on current state often works best. Since its best practice to set the initial temperature high enough to make most proposals (>90%) accepted, it doesn't matter the steps are small. But if you fear its soo small, try 2 or 3 variations. Frigidum accepts a list of proposal functions, and combinations can enforce each other.
I have no experience with MINLP. But even if, so many times experiments can surprise us. So if time/cost is small to bring another competitor to the table, yes!
Try every possible combination of the three values and see which has the lowest cost.

Coordinate Descent Algorithm in Julia for Least Squares not converging

As a warm-up to writing my own elastic net solver, I'm trying to get a fast enough version of ordinary least squares implemented using coordinate descent.
I believe I've implemented the coordinate descent algorithm correctly, but when I use the "fast" version (see below), the algorithm is insanely unstable, outputting regression coefficients that routinely overflow a 64-bit float when the number of features is of moderate size compared to the number of samples.
Linear Regression and OLS
If b = A*x, where A is a matrix, x a vector of the unknown regression coefficients, and y is the output, I want to find x that minimizes
||b - Ax||^2
If A[j] is the jth column of A and A[-j] is A without column j, and the columns of A are normalized so that ||A[j]||^2 = 1 for all j, the coordinate-wise update is then
Coordinate Descent:
x[j] <-- A[j]^T * (b - A[-j] * x[-j])
I'm following along with these notes (page 9-10) but the derivation is simple calculus.
It's pointed out that instead of recomputing A[j]^T(b - A[-j] * x[-j]) all the time, a faster way to do it is with
Fast Coordinate Descent:
x[j] <-- A[j]^T*r + x[j]
where the total residual r = b - Ax is computed outside the loop over coordinates. The equivalence of these update rules follows from noting that Ax = A[j]*x[j] + A[-j]*x[-j] and rearranging terms.
My problem is that while the second method is indeed faster, it's wildly numerically unstable for me whenever the number of features isn't small compared to the number of samples. I was wondering if anyone might have some insight as to why that's the case. I should note that the first method, which is more stable, still starts disagreeing with more standard methods as the number of features approaches the number of samples.
Julia code
Below is some Julia code for the two update rules:
function OLS_builtin(A,b)
x = A\b
return(x)
end
function OLS_coord_descent(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
for j = 1:P
x[j] = dot(A[:,j], b - A[:,1:P .!= j]*x[1:P .!= j])
end
end
return(x)
end
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
end
end
return(x)
end
Example of the problem
I generate data with the following:
n = 100
p = 50
σ = 0.1
β_nz = float([i*(-1)^i for i in 1:10])
β = append!(β_nz,zeros(Float64,p-length(β_nz)))
X = randn(n,p); X .-= mean(X,1); X ./= sqrt(sum(abs2(X),1))
y = X*β + σ*randn(n); y .-= mean(y);
Here I use p=50, and I get good agreement between OLS_coord_descent(X,y) and OLS_builtin(X,y), whereas OLS_coord_descent_fast(X,y)returns exponentially large values for the regression coefficients.
When p is less than about 20, OLS_coord_descent_fast(X,y) agrees with the other two.
Conjecture
Since things agrees for the regime of p << n, I think the algorithm is formally correct, but numerically unstable. Does anyone have any thoughts on whether this guess is correct, and if so how to correct for the instability while retaining (most) of the performance gains of the fast version of the algorithm?
The quick answer: You forgot to update r after each x[j] update. Following is the fixed function which behaves like OLS_coord_descent:
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
r -= A[:,j]*dot(A[:,j],r) # Add this line
end
end
return(x)
end

Min dominating set software

Cross posting this from CS Theory since it is more of a software question.
I need a code for calculating exact MIN-DOM-SET. Currently the best option suggested has been to formulate it as an SMT problem and throw it at an SMT solver.
Curious if there were any good MIN-DOM-SET specific codes out there or a good SMT-LIB formulation.
I coded one up in Z3's Python bindings using the new Optimize functionality.
def min_dom_set(graph):
"""Try to dominate the graph with the least number of verticies possible"""
s = Optimize()
nodes_colors = dict((node_name, Int('k%r' % node_name)) for node_name in graph.nodes())
for node in graph.nodes():
s.add(And(nodes_colors[node] >= 0, nodes_colors[node] <= 1)) # dominator or not
dom_neighbor = Sum ([ (nodes_colors[j]) for j in graph.neighbors(node) ])
s.add(Sum(nodes_colors[node], dom_neighbor ) >= 1 )
s.minimize( Sum([ nodes_colors[y] for y in graph.nodes() ]) )
if s.check() == sat:
m = s.model()
return dict((name, m[color].as_long()) for name, color in nodes_colors.iteritems())
raise Exception('Could not find a solution.')

Z3 maximization API: switching objectives

Another question on the maximization API in Z3.
I get wrong answers if I switch maximization objectives midway through:
from z3 import Real, Optimize
x = Real('x')
y = Real('y')
opt = Optimize()
opt.add(x >= 0)
opt.add(y >= 0)
opt.add(x + y <= 15)
print "Optimizing", x
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
print "Optimizing", y
h = opt.maximize(y)
print opt.check()
print opt.upper(h)
print opt.model()
The latter call to opt.model() returns y = 0, whereas clearly the answer should be 15.
Is it a bug or simply unsupported feature? (and should I manually re-add the constraints each time I want to switch the objective?)
Moreover, there is a separate bug which comes out when I remove the non-negativity constraint, but that's a separate issue (bad handling for unbounded objectives, I presume?)
from z3 import Real, Optimize
x = Real('x')
y = Real('y')
opt = Optimize()
opt.add(x + y <= 15)
print "Optimizing", x
h = opt.maximize(x)
print opt.check()
print opt.upper(h)
print opt.model()
Dies with
Optimizing x
terminate called after throwing an instance of 'std::bad_typeid'
what(): std::bad_typeid
fish: Job 1, 'python opt.py' terminated by signal SIGABRT (Abort)
Thanks for the bug report on the crash. That should not happen.
For the first problem you report. the semantics of adding objectives is additive. That is, you instruct the engine to optimize both x and also y (in the second call). It chooses by default the lexicographic combination of x, y. In the lexicographic ordering the value (15,0) dominates other values.