Cannot solve assignment problem using ILOG CPLEX Optimizer (docplex)

The assignment problem is taken from the Google OR-Tools examples.
Other frameworks can solve this problem, even Excel Solver, but ILOG CPLEX cannot.
Here is my code in a Jupyter notebook:
import cplex
import docplex.mp
from docplex.mp.model import Model
import numpy as np
assignment_model = Model(name='Assignemnt_Problem', log_output=True)
costs = np.array([[90,80,75,70],
[35,85,55,65],
[125,95,90,95],
[45,110,95,115],
[50,100,90,100]])
x = assignment_model.binary_var_matrix(costs.shape[0], costs.shape[1], name="a")
assignment_model.add_constraints((sum(x[i,j] for i in range (costs.shape[0])) <=1
for j in range (costs.shape[1])), names ="workers")
assignment_model.add_constraints((sum(x[i,j] for j in range (costs.shape[1])) ==1
for i in range (costs.shape[0])), names ="tasks")
obj_fn = sum(x[i,j]*costs[i,j] for i in range (costs.shape[0]) for j in range(costs.shape[1]))
assignment_model.set_objective('min', obj_fn)
assignment_model.print_information()
assignment_model.solve()
print('Optimization is done. Objective Function Value: %.2f' % assignment_model.objective_value)
The error is "DOcplexException: Model<Assignemnt_Problem> did not solve successfully".
Thanks

Indeed.
If you add the line
print("status = ",assignment_model.solve_details.status)
after the solve you will see
status = integer infeasible
The model is infeasible, so there is no solution to print, and that is why the final print statement fails.
Now if you change
costs = np.array([[90,80,75,70],
[35,85,55,65],
[125,95,90,95],
[45,110,95,115],
[50,100,90,100]])
into
costs = np.array([[90,80,75,70],
[35,85,55,65],
[125,95,90,95],
[45,110,95,115],
[50,100,90,100]])
costs=costs.transpose()
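then you will get a solution.
The reason is the pigeonhole principle (Dirichlet's drawer principle): the original 5x4 matrix asks for each of the 5 rows to be assigned exactly one column while each of the 4 columns is used at most once, which is impossible. After transposing there are 4 rows and 5 columns, so the constraints can be satisfied.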
import cplex
import docplex.mp
from docplex.mp.model import Model
import numpy as np
assignment_model = Model(name='Assignemnt_Problem', log_output=True)
costs = np.array([[90,80,75,70],
[35,85,55,65],
[125,95,90,95],
[45,110,95,115],
[50,100,90,100]])
costs=costs.transpose()
x = assignment_model.binary_var_matrix(costs.shape[0], costs.shape[1], name="a")
assignment_model.add_constraints((sum(x[i,j] for i in range (costs.shape[0])) <=1
for j in range (costs.shape[1])), names ="workers")
assignment_model.add_constraints((sum(x[i,j] for j in range (costs.shape[1])) ==1
for i in range (costs.shape[0])), names ="tasks")
obj_fn = sum(x[i,j]*costs[i,j] for i in range (costs.shape[0]) for j in range(costs.shape[1]))
assignment_model.set_objective('min', obj_fn)
assignment_model.print_information()
assignment_model.solve()
print('Optimization is done. Objective Function Value: %.2f' % assignment_model.objective_value)
for i in range(costs.shape[0]):
    for j in range(costs.shape[1]):
        if x[i,j].solution_value >= 0.9:
            print('Worker ',i,' assigned to task ',j)
gives
Optimization is done. Objective Function Value: 265.00
Worker 0 assigned to task 3
Worker 1 assigned to task 2
Worker 2 assigned to task 1
Worker 3 assigned to task 0
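The pigeonhole argument can be checked without a solver. The sketch below is not part of the original answer: `is_feasible_shape` and `brute_force_min_cost` are hypothetical helpers, and the brute-force search is only practical for tiny matrices like this one. It assumes the same constraint layout as the model above, where every row gets exactly one column and every column is used at most once.

```python
from itertools import permutations

import numpy as np

costs = np.array([[90, 80, 75, 70],
                  [35, 85, 55, 65],
                  [125, 95, 90, 95],
                  [45, 110, 95, 115],
                  [50, 100, 90, 100]])


def is_feasible_shape(c):
    """Each row needs its own column, so rows must not outnumber columns."""
    n_rows, n_cols = c.shape
    return n_rows <= n_cols


def brute_force_min_cost(c):
    """Try every way of giving each row a distinct column (tiny inputs only)."""
    n_rows, n_cols = c.shape
    return min(
        sum(int(c[i, j]) for i, j in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )


print(is_feasible_shape(costs))       # False: 5 rows, 4 columns
print(is_feasible_shape(costs.T))     # True: 4 rows, 5 columns
print(brute_force_min_cost(costs.T))  # 265, the objective value reported above
```

An alternative to transposing would be to keep the 5x4 matrix and swap the two constraint families, so that each of the 4 columns is assigned exactly once and each of the 5 rows at most once.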

Related

Sliding window method over a large range using numpy vectorization

I'm trying to implement a sliding window method for a genomics dataset that I have, over a fairly long range (upwards of 50k nucleotides). My approach so far works fine, but it is fairly slow (taking several seconds per range, and several minutes per range at intervals >150k bp). Here is my code so far:
import numpy as np
import pandas as pd
VectorizedRange = np.arange(Start, End)#Start, End genomic flags on the reference genome
SlidingWindow = np.lib.stride_tricks.sliding_window_view(VectorizedRange, 100)#100 = the window size
GroupedDictFrame = pd.DataFrame({"Bins":GenomeRange})
GroupedDictFrame["ReadCov"] = 0
GroupedDictFrame["ReadSeq"] = [list() for _ in range(len(GroupedDictFrame.index.values))]
GroupedDictFrame.set_index(keys=["Bins"], inplace=True, drop=True)
def Appender(Start, End, Width, Seq):
    AvgCov = 0
    SeqList = []
    if End <= Window[-1]:
        AvgCov += 1
        SeqList.append(Seq)
    elif End > Window[-1]:
        AvgCov += (Window[-1] - Start)/Width
        SeqList.append(Seq[0:(Window[-1] - Start)])
    GroupedDictFrame.loc[Window[0], "ReadCov"] += AvgCov
    GroupedDictFrame.loc[Window[0], "ReadSeq"] = SeqList
for Window in SlidingWindow:
    SubsetBAM = BAMFrame[(
        (BAMFrame["start_coord"]>=Window[0])&
        (BAMFrame["start_coord"]<=Window[-1])
    )].reset_index(drop=True)
    SubsetBAM.apply(
        lambda x: Appender(x.start_coord,
                           x.end_coord,
                           x.width_lis,
                           x.seq_lis), axis=1
    )
I think my vectorization isn't the best. Any suggestions for speeding this up?
I think I figured it out on my own, so I'll add my solution in case anyone else faces a similar problem.
Essentially, I stopped subsetting the dataframe containing the small DNA read fragments inside the for loop; instead I did one subset before the loop and converted it to a numpy array.
I also removed my function and used numpy.where to do all the logic.
import numpy as np
import pandas as pd
VectorizedRange = np.arange(Start, End)
SlidingWindow = np.lib.stride_tricks.sliding_window_view(VectorizedRange, 100)
GroupedDictFrame = pd.DataFrame({"Bins":GenomeRange})
GroupedDictFrame["ReadCov"] = 0
GroupedDictFrame["ReadSeq"] = [list() for _ in range(len(GroupedDictFrame.index.values))]
GroupedDictFrame.set_index(keys=["Bins"], inplace=True, drop=True)
CoordArray = BAMFrame.loc[:, "start_coord":"end_coord"].to_numpy()
for Window in SlidingWindow:
    ReadCovIn = np.where(((CoordArray[:,1] <= Window[-1]) & (CoordArray[:,0] >= Window[0])), 1, 0)
    ReadCovOut = np.where(((CoordArray[:,1] > Window[-1]) & ((CoordArray[:,0] >= Window[0]) & (CoordArray[:,0] < Window[-1]))),
                          (Window[-1] - CoordArray[:,0])/(CoordArray[:,1] - CoordArray[:,0]), 0)
    GroupedDictFrame.loc[Window[0], "ReadCov"] += np.sum((np.sum(ReadCovIn), np.sum(ReadCovOut)))
I've gotten it down to ~1 second per gene region which is typically about 50kb (so that would mean the SlidingWindow has a shape of (49900,100)), which is pretty good I think!
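If the remaining Python loop over windows is still a bottleneck, the same two np.where cases can be evaluated for all windows at once with broadcasting. This is only a sketch under assumptions: `window_coverage` and the sample arrays are hypothetical, only the coverage count is computed (not the sequences), and the intermediate arrays have shape (n_windows, n_reads), so large inputs would need to be processed in chunks of windows.

```python
import numpy as np


def window_coverage(read_starts, read_ends, win_starts, win_size):
    """Per-window coverage using the same two cases as the np.where loop."""
    win_last = win_starts + win_size - 1      # Window[-1] for each window
    s = read_starts[None, :]                  # shape (1, n_reads)
    e = read_ends[None, :]
    w0 = win_starts[:, None]                  # shape (n_windows, 1)
    w1 = win_last[:, None]

    # Reads lying fully inside the window count as 1.
    inside = (e <= w1) & (s >= w0)
    # Reads starting inside but ending past the window count fractionally.
    partial = (e > w1) & (s >= w0) & (s < w1)
    frac = np.divide(w1 - s, e - s,
                     out=np.zeros(partial.shape, dtype=float),
                     where=partial)
    return inside.sum(axis=1) + frac.sum(axis=1)


# Made-up reads and two windows of width 100, for illustration only.
read_starts = np.array([0, 5, 95])
read_ends = np.array([10, 120, 99])
win_starts = np.array([0, 1])
coverage = window_coverage(read_starts, read_ends, win_starts, 100)
print(coverage)
```

Passing `where=partial` to `np.divide` avoids dividing for reads that are not in the partial case, including zero-length reads.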

How to re-evaluate Gekko objective while minimizing objective's parameters

Apologies in advance, I just started to learn Gekko to see if I can use it for a project. I'm trying to optimize the win rate while playing a game with very finite game-states (50 ^ 2) and options per turn (0-10 inclusive).
From what I understand, I can use the m.solve() Gekko function to minimize the win rate of the opponent which I've set up here:
PLAYER_MAX_SCORE = 50 #Score player needs to win
OPPONENT_MAX_SCORE = 50 #Score opponent needs to win
#The opponent's current strategy: always roll 4 dice per turn
OPPONENT_MOVE = 4
m = GEKKO()
m.options.SOLVER = 1
"""
player_moves is a 2-d array where:
- the row represents player's current score
- the column represents opponent's current score
- the element represents the optimal move for the above game state
Thus the player's move for a game is player_moves[pScore, oScore].value.value
"""
player_moves = m.Array(m.Var, (PLAYER_MAX_SCORE, OPPONENT_MAX_SCORE), value=3, lb=0, ub=10, integer=True)
m.Obj(objective(player_moves, OPPONENT_MOVE, PLAYER_MAX_SCORE, OPPONENT_MAX_SCORE, 100))
m.solve(disp=False)
For reference, objective is a function that returns the win rate of the opponent based on how the current player acts (represented in player_moves).
The only issue is that m.solve() only calls the objective function once and then immediately returns the "solved" values in the player_moves array (which turn out to just be the initial values when player_moves was defined). I want m.solve() to call the objective function multiple times to determine if the new opponent's win rate is decreasing or increasing.
Is this possible with Gekko? Or is there a different library I should use for this type of problem?
Gekko creates a symbolic representation of the optimization problem that is compiled into byte-code. For this reason, the objective function must be expressed with Gekko variables and equations. For black-box models that do not use Gekko variables, an alternative is to use scipy.optimize.minimize(). There is a comparison of Gekko and Scipy.
Scipy
import numpy as np
from scipy.optimize import minimize
def objective(x):
    return x[0]*x[3]*(x[0]+x[1]+x[2])+x[2]
def constraint1(x):
    return x[0]*x[1]*x[2]*x[3]-25.0
def constraint2(x):
    sum_eq = 40.0
    for i in range(4):
        sum_eq = sum_eq - x[i]**2
    return sum_eq
# initial guesses
n = 4
x0 = np.zeros(n)
x0[0] = 1.0
x0[1] = 5.0
x0[2] = 5.0
x0[3] = 1.0
# show initial objective
print('Initial Objective: ' + str(objective(x0)))
# optimize
b = (1.0,5.0)
bnds = (b, b, b, b)
con1 = {'type': 'ineq', 'fun': constraint1}
con2 = {'type': 'eq', 'fun': constraint2}
cons = ([con1,con2])
solution = minimize(objective,x0,method='SLSQP',\
bounds=bnds,constraints=cons)
x = solution.x
# show final objective
print('Final Objective: ' + str(objective(x)))
# print solution
print('Solution')
print('x1 = ' + str(x[0]))
print('x2 = ' + str(x[1]))
print('x3 = ' + str(x[2]))
print('x4 = ' + str(x[3]))
Gekko
from gekko import GEKKO
import numpy as np
#Initialize Model
m = GEKKO()
#initialize variables
x1,x2,x3,x4 = [m.Var(lb=1,ub=5) for i in range(4)]
#initial values
x1.value = 1
x2.value = 5
x3.value = 5
x4.value = 1
#Equations
m.Equation(x1*x2*x3*x4>=25)
m.Equation(x1**2+x2**2+x3**2+x4**2==40)
#Objective
m.Minimize(x1*x4*(x1+x2+x3)+x3)
#Solve simulation
m.solve()
#Results
print('')
print('Results')
print('x1: ' + str(x1.value))
print('x2: ' + str(x2.value))
print('x3: ' + str(x3.value))
print('x4: ' + str(x4.value))
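To make the black-box route concrete, here is a minimal sketch showing that scipy.optimize.minimize calls a plain Python objective repeatedly, which is the behaviour the question was looking for. The objective `black_box` is a made-up stand-in, not the game simulation from the question, and the derivative-free Nelder-Mead method is just one possible choice. Note that this treats the variables as continuous; it does not handle the integer moves from the question.

```python
import numpy as np
from scipy.optimize import minimize

counter = {"calls": 0}


def black_box(x):
    """Stand-in for an opaque objective such as a simulated win rate."""
    counter["calls"] += 1
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


result = minimize(black_box, x0=np.zeros(2), method="Nelder-Mead")
print("objective evaluations:", counter["calls"])
print("minimizer:", result.x)
```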

How to optimize the linear coefficients for numpy arrays in a maximization function?

I have to optimize the coefficients for three numpy arrays so that they maximize my evaluation function.
I have a target array called train['target'] and three prediction arrays named array1, array2 and array3.
I want to find the best linear coefficients x, y, z for these three arrays, i.e. the ones that maximize the function
roc_auc_score(train['target'], x*array1 + y*array2 + z*array3)
The function is at its maximum when the prediction is closest to the target,
i.e. x*array1 + y*array2 + z*array3 should be as close as possible to train['target'].
The coefficients are bounded: 0 <= x, y, z <= 1.
Basically, I am trying to find weights x, y, z for the three arrays that make
x*array1 + y*array2 + z*array3 closer to train['target'].
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for plain numbers, integers, etc., but fails when I try it with arrays.
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
prob += score
coef = x+y+z
prob += (coef==1)
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Getting error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
Can't progress beyond this line when using arrays. Not sure if my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
[+] You didn't provide the definition of roc_auc_score so I just pretended that it equals the sum of the element-wise difference between the target array and the weighted sum of the other 3 arrays.
[+] I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
[+] arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
[+] You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
    print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Process finished with exit code 0
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).
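As an illustration of the "additional variables" idea, here is a hedged sketch that minimizes the sum of absolute errors between the target and the weighted blend. It does not maximize roc_auc_score; it swaps in a linear stand-in for "closeness". The data is made up, and the names `err` and `abs_error_blend` are hypothetical. Each |residual_i| is replaced by an auxiliary variable err_i constrained to be at least the residual and at least its negative.

```python
import numpy as np
import pulp

# Made-up data: the target is an exact blend of the three arrays.
rng = np.random.default_rng(0)
array1, array2, array3 = rng.random(10), rng.random(10), rng.random(10)
target = 0.2 * array1 + 0.3 * array2 + 0.5 * array3

prob = pulp.LpProblem("abs_error_blend", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0, upBound=1)
y = pulp.LpVariable("y", lowBound=0, upBound=1)
z = pulp.LpVariable("z", lowBound=0, upBound=1)
# One auxiliary variable per element, standing in for |residual_i|.
err = [pulp.LpVariable(f"err_{i}", lowBound=0) for i in range(len(target))]

prob += pulp.lpSum(err)          # objective: total absolute error
prob += x + y + z == 1           # weights sum to one, as in the question
for i in range(len(target)):
    # Cast numpy scalars to float so PuLP builds plain affine expressions.
    residual = float(target[i]) - (x * float(array1[i])
                                   + y * float(array2[i])
                                   + z * float(array3[i]))
    prob += err[i] >= residual
    prob += err[i] >= -residual

status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
weights = [v.value() for v in (x, y, z)]
print(pulp.LpStatus[status], weights)
```

Because the objective pushes each err_i down, at the optimum it equals the absolute residual, which keeps the whole model linear.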

cardinality constraint in portfolio optimisation

I am using cvxpy to work on a simple portfolio optimisation problem. The only constraint I can't get my head around is the cardinality constraint on the number of non-zero portfolio holdings. I tried two approaches, a MIP approach and a traditional convex one.
Here is some dummy code for a working traditional example.
import numpy as np
import cvxpy as cvx
np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)
ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)
constraints = [cvx.sum_entries(w) == 1, w>= 0, cvx.sum_smallest(w, n-k) >= 0, cvx.sum_largest(w, k) <=1 ]
prob = cvx.Problem(objective, constraints)
prob.solve()
print prob.status
output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))
print 'Number of non-zero elements : ',sum(1 for i in output if i > 0)
I had the idea to use sum_smallest and sum_largest (cvxpy manual). My thought was to constrain the smallest n-k entries to 0 and let my target range k sum up to one. I know I can't change the direction of the inequality if I want to stay convex, but maybe someone knows a clever way of constraining the problem while still keeping it simple.
My second idea was to make this a mixed-integer problem, something along the lines of
import numpy as np
import cvxpy as cvx
np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)
binary = cvx.Bool(n)
integer = cvx.Int(n)
ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)
constraints = [cvx.sum_entries(w) == 1, w>= 0, cvx.sum_entries(binary) == k ]
prob = cvx.Problem(objective, constraints)
prob.solve()
print prob.status
output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))
print sum(1 for i in output if i > 0)
for i in range(len(w.value)):
    print round(binary[i].value,2)
print output
Looking at my binary vector, it seems to be doing the right thing, but the sum_entries constraint doesn't work. Looking into the binary vector values, I noticed that 0 isn't exactly 0 but a very small number, e.g. xxe^-20, and I assume this will mess things up. Can anyone give me guidance on whether this is the right way to go? I can use the standard solvers, as well as Mosek if that helps. I would prefer a non-MIP implementation, as I understand this is a combinatorial problem and will get very slow for larger problems. Ultimately I would like to constrain either on an exact number of target holdings or on a range, e.g. 20-30.
Also, the cvxpy documentation on MIP is very short. Thanks.
A bit chaotic, this question.
So first: this kind of cardinality constraint is NP-hard. This means you can't express it in cvxpy without using integer programming (or else it would imply P=NP)!
That being said, it would have been nicer to have a plain version of the code without the attempt to formulate this constraint. I just assume it's the first code without the sum_smallest and sum_largest constraints.
So let's tackle the MIP approach:
Your code trying to do this makes no sense at all:
You introduce some binary variables, but they have no connection to any other variable (so a constraint on their sum is useless)!
You introduce some integer variables, but they aren't used anywhere!
So here is a MIP-approach:
import numpy as np
import cvxpy as cvx
np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)
ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)
binary = cvx.Bool(n) # !!!
constraints = [cvx.sum_entries(w) == 1, w>= 0, w - binary <= 0., cvx.sum_entries(binary) == k] # !!!
prob = cvx.Problem(objective, constraints)
prob.solve(verbose=True)
print(prob.status)
output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))
print('Number of non-zero elements : ',sum(1 for i in output if i > 0))
So we just added some binary variables and connected them to w to indicate whether w is nonzero or not.
If w is nonzero:
w will be > 0 because of the constraint w >= 0
binary needs to be 1, or else the constraint w - binary <= 0. is not fulfilled
So it's just introducing these binaries and this one indicator constraint.
Now cvx.sum_entries(binary) == k does what it should do.
Be careful with the implication direction we used here: a nonzero w forces binary to 1, but binary = 1 does not force w to be nonzero. This might be relevant when changing the constraint on k (e.g. to <=).
Keep in mind that the default MIP solver is awful. I also fear that Mosek's interface (sub-optimal within cvxpy) won't solve this, but I might be wrong.
Edit: Your in-range requirement can easily be formulated using two more indicators for:
(k >= a) <= ind_0
(k <= b) <= ind_1
and adding a constraint which equals a logical_and:
ind_0 + ind_1 >= 2
I've had a similar problem where my weights could be negative and did not need to sum to 1 (but still need to be bounded), so I've modified sascha's example to accommodate relaxing these constraints using the CVXpy absolute value function. This should allow for a more general approach to tackling cardinality constraints with MIP
import numpy as np
import cvxpy as cvx
np.random.seed(12345)
n = 10
k = 6
mu = np.abs(np.random.randn(n, 1))
Sigma = np.random.randn(n, n)
Sigma = Sigma.T.dot(Sigma)
w = cvx.Variable(n)
ret = mu.T*w
risk = cvx.quad_form(w, Sigma)
objective = cvx.Maximize(ret - risk)
binary = cvx.Variable(n,boolean=True) # !!!
maxabsw=2
constraints = [ w>= -maxabsw,w<=maxabsw, cvx.abs(w)/maxabsw - binary <= 0., cvx.sum(binary) == k] # !!!
prob = cvx.Problem(objective, constraints)
prob.solve(verbose=True)
print(prob.status)
output = []
for i in range(len(w.value)):
    output.append(round(w[i].value,2))
# weights may be negative here, so count every entry that is not zero after rounding
print('Number of non-zero elements : ',sum(1 for i in output if i != 0))
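The snippets above count holdings after rounding to two decimals, which is how they cope with solver output such as the xxe^-20 values mentioned in the question. A small sketch of the same idea with an explicit tolerance follows; `count_holdings` is a hypothetical helper, the threshold of 1e-6 is an assumption you would tune to your solver's accuracy, and the weight vector is made up.

```python
import numpy as np


def count_holdings(weights, tol=1e-6):
    """Count entries whose magnitude exceeds tol (tol is an assumed threshold)."""
    weights = np.asarray(weights, dtype=float).ravel()
    return int(np.count_nonzero(np.abs(weights) > tol))


# Made-up solver output: two entries are numerical noise rather than holdings.
w_value = np.array([0.4, -0.35, 3e-20, 0.25, -1e-12, 0.0])
print(count_holdings(w_value))  # 3
```

Using the absolute value means short (negative) positions are counted as holdings too.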

Numpy slogdet computation error

There appears to be a major difference between numpy's slogdet and the exact result when computing the log determinant of a Vandermonde matrix.
I compare against the exact log determinant; see e.g. here for a proof.
The minimal code to see this is:
A = np.power.outer(np.linspace(0,1,50),range(50))
print np.linalg.slogdet(A)[1]
s = 0
for v1 in np.linspace(0,1,50):
    for v2 in np.linspace(0,1,50):
        if v1>v2:
            s+= np.log(v1-v2)
print s
Which yields:
-1191.88408998
-1706.99560647
I was wondering if there is a more accurate log determinant implementation which I could use in this situation but also for non-Vandermonde matrices.
You can use mpmath like this (older SymPy releases bundled it as sympy.mpmath, which has since been removed in favour of the standalone package):
import numpy as np
import mpmath as mp
mp.mp.dps = 50
linspace1 = list(map(mp.mpf,np.linspace(0,1,50)))
A = np.power.outer(list(map(float,linspace1)),range(50))
# numpy result in double precision, kept for comparison
first_print = mp.mpf(np.linalg.slogdet(A)[1])
print(first_print)
s = 0
linspace2 = list(map(mp.mpf,np.linspace(0,1,50)))
# closed-form Vandermonde log-determinant, accumulated with 50 digits
for v1 in linspace1:
    for v2 in linspace2:
        if v1>v2:
            s+= mp.log(v1-v2)
print(s)
RESULTS
first_print = -1178.272517342130186079884879291057586669921875
s = -1706.9956064674289001970168329846189154212781094939
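For matrices that are not Vandermonde, one option is to compute the determinant in exact rational arithmetic. The sketch below is a swapped-in technique, not what the answer above does: it uses plain Gaussian elimination over Python's `fractions.Fraction` instead of mpmath, and `exact_logabsdet` is a hypothetical helper. It is exact for the numbers it is given (floats are converted to the exact rationals they represent) but it is slow, so it is shown here on a small 8x8 Vandermonde matrix and checked against the closed form.

```python
import math
from fractions import Fraction


def exact_logabsdet(rows):
    """log|det| via Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        # Find a row with a nonzero entry in column c to use as the pivot.
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return float("-inf")          # singular matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det                    # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    det = abs(det)
    # Log numerator and denominator separately so tiny determinants do not underflow.
    return math.log(det.numerator) - math.log(det.denominator)


n = 8
nodes = [Fraction(i, n - 1) for i in range(n)]
vandermonde = [[x ** j for j in range(n)] for x in nodes]
closed_form = sum(math.log(nodes[i] - nodes[j])
                  for i in range(n) for j in range(i))
print(exact_logabsdet(vandermonde), closed_form)
```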