CVXPY returns Infeasible/Inaccurate on Quadratic Programming Optimization Problem - least-squares

I am trying to use CVXPY to solve a nonnegative least squares problem (with the additional constraint that the sum of entries in the solution vector must equal 1). However, when I run CVXPY on this simple quadratic program using the SCS solver, I let the solver run for a maximum of 100000 iterations, and I run into an error in the end saying that the quadratic program is infeasible/inaccurate. However, I am quite confident that the mathematical setup of the quadratic program is correct (such that the constraints of the problem should not be infeasible). Moreover, when I run this program on a smaller A matrix and smaller b vector (having only the first several rows), the solver runs fine. Here is a snapshot of my code and current error output:
x = cp.Variable(61)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A # x - b)), [x >= 0, cp.sum(x) == 1])
print('The problem is QP: ', prob.is_qp())
result = prob.solve(solver = cp.SCS, verbose = True, max_iters = 100000)
print("status: ", prob.status)
print("optimal value: ", prob.value)
print("cvxpy solution:")
print(x.value, np.sum(np.array(x)))
And below is the output:
The problem is QP: True
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2446306
eps = 1.00e-04, alpha = 1.50, max_iters = 100000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 62, constraints m = 278545
Cones: primal zero / dual free vars: 1
linear vars: 61
soc vars: 278483, soc blks: 1
Setup time: 1.96e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 5.73e+18 1.10e+20 1.00e+00 -4.09e+22 1.55e+26 1.54e+26 1.47e-01
100| 8.00e+16 9.05e+18 1.79e-01 1.60e+25 2.29e+25 7.01e+24 8.82e+00
200| 6.83e+16 6.25e+17 8.10e-02 1.66e+25 1.95e+25 2.93e+24 1.81e+01
300| 6.62e+16 1.42e+18 1.00e-01 1.60e+25 1.96e+25 3.56e+24 2.61e+01
400| 9.49e+16 3.76e+18 1.98e-01 1.64e+25 2.45e+25 8.07e+24 3.39e+01
500| 6.24e+16 1.69e+19 3.12e-03 1.74e+25 1.75e+25 9.06e+22 4.20e+01
600| 8.47e+16 5.06e+18 1.42e-01 1.65e+25 2.20e+25 5.47e+24 5.04e+01
700| 8.57e+16 4.41e+18 1.55e-01 1.64e+25 2.25e+25 6.04e+24 5.79e+01
800| 6.03e+16 1.60e+18 1.30e-02 1.72e+25 1.77e+25 4.50e+23 6.51e+01
900| 7.35e+17 2.84e+20 3.21e-01 1.56e+25 3.04e+25 1.46e+25 7.21e+01
1000| 9.11e+16 1.38e+19 1.40e-01 1.68e+25 2.23e+25 5.46e+24 7.92e+01
.........many more iterations in between omitted........
99500| 2.34e+15 1.56e+17 1.61e-01 3.30e+24 4.57e+24 1.27e+24 5.32e+03
99600| 1.46e+18 1.10e+21 9.12e-01 2.56e+24 5.55e+25 5.26e+25 5.33e+03
99700| 1.94e+15 1.41e+16 8.97e-02 3.40e+24 4.07e+24 6.71e+23 5.34e+03
99800| 3.31e+17 6.68e+20 5.62e-01 2.71e+24 9.67e+24 6.91e+24 5.35e+03
99900| 2.51e+15 9.96e+17 1.03e-01 3.41e+24 4.19e+24 7.81e+23 5.35e+03
100000| 2.21e+15 3.79e+17 1.40e-01 3.29e+24 4.37e+24 1.07e+24 5.36e+03
----------------------------------------------------------------------------
Status: Infeasible/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 5.36e+03s
Lin-sys: nnz in L factor: 2726743, avg solve time: 1.60e-02s
Cones: avg projection time: 7.99e-04s
Acceleration: avg step time: 2.72e-02s
----------------------------------------------------------------------------
Certificate of primal infeasibility:
dist(y, K*) = 0.0000e+00
|A'y|_2 * |b|_2 = 2.5778e-01
b'y = -1.0000
============================================================================
status: infeasible_inaccurate
optimal value: None
cvxpy solution:
None var59256
Does anyone know why: (1) CVXPY could be failing to solve my problem? Is it just a matter of more solver iterations being needed? And if so, how many? and (2) What do the column names "pri res", "dua res", "rel gap", etc. mean? I am trying to follow the values in these columns to understand what is happening during the solve process.
Also, for those who would like to follow along with the dataset that I'm working with, here is a CSV file holding my A matrix (dimensions 278481x61) and here is a CSV file holding my b vector (dimensions 278481x1). Thanks in advance!

Some remarks:
Intro
Question: Reproducible code
Question-setup is lazy: you could have added imports and csv-reading instead of making us replicate it...
Easy to do experiments on different data now due to different parsing...
Task + Approach
This looks like a combination of:
general-purpose math-opt solver
machine-learning scale data
= often hits limits!
Solvers: Expectations
SCS is first-order based and expected to be less robust compared to ECOS or other high-order methods
Your Model
Observations
Solver: SCS (default opts)
Fails / Runs havoc according to the progress of residuals (probably due no numerical-issues)
Looks more crazy on my side
Solver: ECOS (default opts)
Fails (primal infeasible due to numerical-issues)
Analysis
You won't be able to solve these numerical-issues by increasing iteration-limits only
More complex tuning of feasibility-tolerances and co. would be needed!
Transformation / Fixing
Can we make those solvers work? I think so.
Instead of minimizing sum of squares, let's minimize the l2-norm instead. This is equivalent in regards to our solution and we might square the objective-value if we are interested in this value anyway!
This is motivated by this:
One particular reformulation that we strongly encourage is to eliminate quadratic forms—that is, functions like sum_square, sum(square(.)) or quad_form—whenever it is possible to construct equivalent models using norm instead. Our experience tells us that quadratic forms often pose a numerical challenge for the underlying solvers that CVX uses.
We acknowledge that this advice goes against conventional wisdom: quadratic forms are the prototypical smooth convex function, while norms are nonsmooth and therefore unwieldy. But with the conic solvers that CVX uses, this wisdom is exactly backwards. It is the norm that is best suited for conic formulation and solution. Quadratic forms are handled by converting them to a conic form—using norms, in fact! This conversion process poses some interesting scaling challenges. It is better if the modeler can eliminate the need to perform this conversion.
Code
import pandas as pd
import cvxpy as cp
import numpy as np
A = pd.read_csv('A_matrix.csv').to_numpy()
b = pd.read_csv('b_vector.csv').to_numpy().ravel()
x = cp.Variable(61)
prob = cp.Problem(cp.Minimize(cp.norm(A # x - b)), [x >= 0, cp.sum(x) == 1])
result = prob.solve(solver = cp.SCS, verbose = True)
print("optimal value: ", prob.value)
print("cvxpy solution:")
print(x.value, np.sum(x.value))
Output solver=cp.SCS (slow CPU)
Valid solver-state + slow + solution looks not robust enough -> fluctuating around 0 symmetrically -> large primal-feas error in regards to x=>0
Could probably be improved by tunings, but it's probably better to use a different solver here! Not much analysis done here.
----------------------------------------------------------------------------
SCS v2.1.2 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2446295
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 62, constraints m = 278543
Cones: primal zero / dual free vars: 1
linear vars: 61
soc vars: 278481, soc blks: 1
Setup time: 1.63e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 9.14e+18 1.39e+20 1.00e+00 -5.16e+20 6.20e+23 6.04e+23 1.28e-01
100| 8.63e-01 1.90e+02 7.79e-01 5.96e+02 4.81e+03 1.17e-14 1.04e+01
200| 5.09e-02 3.50e+02 1.00e+00 6.20e+03 -1.16e+02 5.88e-15 2.08e+01
300| 3.00e-01 3.71e+03 7.64e-01 9.62e+03 7.19e+04 4.05e-15 3.17e+01
400| 5.19e-02 1.76e+02 1.91e-01 4.71e+03 6.94e+03 3.87e-15 4.25e+01
500| 4.60e-02 2.66e+02 2.83e-01 5.70e+03 1.02e+04 6.48e-15 5.25e+01
600| 5.13e-03 1.08e+02 1.24e-01 5.80e+03 7.44e+03 1.72e-14 6.23e+01
700| 3.35e-03 6.81e+01 9.64e-02 5.39e+03 4.44e+03 5.94e-15 7.15e+01
800| 1.62e-02 8.52e+01 1.17e-01 5.51e+03 6.97e+03 3.96e-15 8.07e+01
900| 1.93e-02 1.57e+01 1.89e-02 5.58e+03 5.38e+03 5.04e-15 8.98e+01
1000| 6.94e-03 6.85e+00 7.97e-03 5.57e+03 5.48e+03 1.75e-15 9.91e+01
1100| 4.64e-03 7.64e+00 1.42e-02 5.66e+03 5.50e+03 1.91e-15 1.09e+02
1200| 2.25e-04 3.25e-01 4.00e-04 5.61e+03 5.60e+03 5.33e-15 1.18e+02
1300| 4.73e-05 9.05e-02 5.78e-05 5.60e+03 5.60e+03 6.16e-15 1.28e+02
1400| 6.27e-07 4.58e-03 3.22e-06 5.60e+03 5.60e+03 7.17e-15 1.36e+02
1500| 2.02e-07 5.27e-05 4.58e-08 5.60e+03 5.60e+03 5.61e-15 1.46e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.46e+02s
Lin-sys: nnz in L factor: 2726730, avg solve time: 2.54e-02s
Cones: avg projection time: 1.16e-03s
Acceleration: avg step time: 5.61e-02s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 7.7307e-12, dist(y, K*) = 0.0000e+00, s'y/|s||y| = 3.0820e-18
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.0159e-07
dual res: |A'y + c|_2 / (1 + |c|_2) = 5.2702e-05
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5764e-08
----------------------------------------------------------------------------
c'x = 5602.9367, -b'y = 5602.9362
============================================================================
optimal value: 5602.936687635506
cvxpy solution:
[ 1.33143619e-06 -5.20173272e-07 -5.63980428e-08 -9.44340768e-08
6.07765135e-07 7.55998810e-07 8.45038786e-07 2.65626921e-06
-1.35669263e-07 -4.88286704e-07 -1.09285233e-06 8.63799377e-07
2.85145903e-07 -1.22240651e-06 2.14526505e-07 -2.40179173e-06
-1.75042884e-07 -1.27680170e-06 -1.40486649e-06 -1.12113037e-06
-2.26601198e-07 1.39878723e-07 -3.19396803e-06 -6.36480154e-07
2.16005860e-05 1.18205616e-06 2.15620316e-06 -1.94093348e-07
-1.88356275e-06 -7.07687270e-06 -1.99902966e-06 -2.28894738e-06
1.00000188e+00 -9.95601469e-07 -1.26333877e-06 1.26336565e-06
-5.31474195e-08 -9.81111443e-07 2.22755569e-07 -7.49418940e-07
-4.77882668e-07 6.89785690e-07 -2.46822613e-06 -5.73596077e-08
5.99307819e-07 -2.57301316e-07 -7.59150986e-07 -1.23753681e-08
-1.39938273e-06 1.48526305e-06 -2.41075790e-06 -3.50224485e-07
1.79214177e-08 6.71852182e-07 -5.10880844e-06 2.44821668e-07
-2.88655782e-06 -2.45457029e-07 -4.97712502e-07 -1.44497848e-06
-2.22294748e-07] 0.9999895863519757
Output solver=cp.ECOS (slow CPU)
Valid solver-state + much faster + solution looks ok
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +7e+04 9e-01 1e-04 1e+00 1e+03 --- --- 1 2 - | - -
1 -5.108e+01 -4.292e+01 +7e+04 6e-01 1e-04 9e+00 1e+03 0.0218 3e-01 4 5 4 | 0 0
2 +2.187e+02 +2.387e+02 +5e+04 6e-01 8e-05 2e+01 8e+02 0.3427 4e-02 4 5 5 | 0 0
3 +1.109e+03 +1.118e+03 +1e+04 3e-01 2e-05 9e+00 2e+02 0.7403 6e-03 4 5 5 | 0 0
4 +1.873e+03 +1.888e+03 +1e+04 2e-01 2e-05 1e+01 2e+02 0.2332 1e-01 5 6 6 | 0 0
5 +3.534e+03 +3.565e+03 +4e+03 8e-02 8e-06 3e+01 7e+01 0.7060 2e-01 5 6 6 | 0 0
6 +5.452e+03 +5.453e+03 +2e+02 2e-03 2e-07 1e+00 3e+00 0.9752 2e-03 6 8 8 | 0 0
7 +5.584e+03 +5.585e+03 +4e+01 4e-04 4e-08 4e-01 7e-01 0.8069 6e-02 2 2 2 | 0 0
8 +5.602e+03 +5.602e+03 +5e+00 5e-05 6e-09 8e-02 9e-02 0.9250 5e-02 2 2 2 | 0 0
9 +5.603e+03 +5.603e+03 +1e-01 1e-06 1e-10 2e-03 2e-03 0.9798 2e-03 5 5 5 | 0 0
10 +5.603e+03 +5.603e+03 +6e-03 4e-07 6e-12 9e-05 1e-04 0.9498 3e-04 5 5 5 | 0 0
11 +5.603e+03 +5.603e+03 +4e-04 4e-07 3e-13 7e-06 6e-06 0.9890 4e-02 1 2 2 | 0 0
12 +5.603e+03 +5.603e+03 +1e-05 4e-08 8e-15 2e-07 2e-07 0.9816 8e-03 1 2 2 | 0 0
13 +5.603e+03 +5.603e+03 +2e-07 7e-10 1e-16 2e-09 2e-09 0.9890 1e-04 5 3 4 | 0 0
OPTIMAL (within feastol=7.0e-10, reltol=2.7e-11, abstol=1.5e-07).
Runtime: 18.727676 seconds.
optimal value: 5602.936884707248
cvxpy solution:
[7.47985848e-11 3.58238148e-11 4.53994101e-11 3.73056632e-11
3.47224797e-11 3.62895261e-11 3.59367993e-11 4.03642466e-11
3.58643375e-11 3.24886989e-11 3.25080912e-11 3.34866983e-11
3.66296670e-11 3.89612422e-11 3.54489152e-11 7.07301971e-11
3.95949165e-11 3.68235605e-11 3.05918372e-11 3.43890675e-11
3.71817538e-11 3.62561876e-11 3.55281653e-11 3.55800928e-11
4.10876077e-11 4.12877804e-11 4.11174782e-11 3.35519296e-11
3.43716575e-11 3.56588133e-11 3.66118962e-11 3.68789703e-11
9.99999998e-01 3.34857869e-11 3.21984616e-11 5.82577263e-11
2.85751155e-11 3.64710243e-11 3.59930485e-11 5.04742702e-11
3.07026084e-11 3.36507487e-11 4.19786324e-11 8.35032700e-11
3.33575857e-11 3.42732986e-11 3.70599423e-11 4.73856413e-11
3.39708564e-11 3.64354428e-11 2.95022064e-11 3.46315519e-11
3.04124702e-11 4.07870093e-11 3.57782184e-11 3.71824186e-11
3.72394185e-11 4.48194963e-11 4.09635820e-11 6.45638394e-11
4.00297122e-11] 0.9999999999673748
Outro
Final remarks
Above re-formulation might be enough to help you solve your problem
ECOS beating SCS on this large-scale problem is unintuitive, but can alway happen and i won't analyze it (SCS is still a great solver, but ECOS is too despite both being very different approaches! Open-source community should be happy to have those!)
I don't think i would go for these solvers with ML-scale data if i got time to implement something more customized
The easiest approach which comes to mind for large-scale solving here:
Projected (Accelerated) Gradient (which would be a first-order method being very robust in regards to your constraints here!)
"projection onto the probability simplex" (which you need) is a common (well-researched) thing
Going by the final weights/coefficients: data looks strange!
There seems to be a very dominating column (information-leakage); i don't know
Rounding, both solvers will output the solution vector: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Related

Is there a way to implement equations as Dymos path constraints?

For example, if I have a function h_max(mach) and I want the altitude to always respect this predefined altitude-mach relationship throughout the flight enveloppe, how could I impliment this?
I have tried calculating the limit quantity (in this case, h_max) as its own state and then calculating another state as h_max-h and then constraining that through a path constraint to being greater than 0. This type of approach has worked, but involved two explicit components, a group and alot of extra coding just to get a constraint working. I was wondering if there was a better way?
Thanks so much in advance.
The next version of Dymos, 1.7.0 will be released soon and will support this.
In the mean time, you can install the latest developmental version of Dymos directly from github to have access to this capability:
python -m pip install git+https://github.com/OpenMDAO/dymos.git
Then, you can define boundary and path constraints with an equation. Note the equation must have an equals sign in it, and then lower, upper, or equals will apply to the result of the equation.
In reality, dymos is just inserting an OpenMDAO ExecComp for you under the hood, so the one caveat to this is that your expression must be compatible with complex-step differentiation.
Here's an example of the brachistochrone that uses constraint expressions to set the final y value to a specific value while satisfying a path constraint defined with a second equation.
import openmdao.api as om
import dymos as dm
from dymos.examples.plotting import plot_results
from dymos.examples.brachistochrone import BrachistochroneODE
import matplotlib.pyplot as plt
#
# Initialize the Problem and the optimization driver
#
p = om.Problem(model=om.Group())
p.driver = om.ScipyOptimizeDriver()
p.driver.declare_coloring()
#
# Create a trajectory and add a phase to it
#
traj = p.model.add_subsystem('traj', dm.Trajectory())
phase = traj.add_phase('phase0',
dm.Phase(ode_class=BrachistochroneODE,
transcription=dm.GaussLobatto(num_segments=10)))
#
# Set the variables
#
phase.set_time_options(fix_initial=True, duration_bounds=(.5, 10))
phase.add_state('x', fix_initial=True, fix_final=True)
phase.add_state('y', fix_initial=True, fix_final=False)
phase.add_state('v', fix_initial=True, fix_final=False)
phase.add_control('theta', continuity=True, rate_continuity=True,
units='deg', lower=0.01, upper=179.9)
phase.add_parameter('g', units='m/s**2', val=9.80665)
Y_FINAL = 5.0
Y_MIN = 5.0
phase.add_boundary_constraint(f'bcf_y = y - {Y_FINAL}', loc='final', equals=0.0)
phase.add_path_constraint(f'path_y = y - {Y_MIN}', lower=0.0)
#
# Minimize time at the end of the phase
#
phase.add_objective('time', loc='final', scaler=10)
p.model.linear_solver = om.DirectSolver()
#
# Setup the Problem
#
p.setup()
#
# Set the initial values
#
p['traj.phase0.t_initial'] = 0.0
p['traj.phase0.t_duration'] = 2.0
p.set_val('traj.phase0.states:x', phase.interp('x', ys=[0, 10]))
p.set_val('traj.phase0.states:y', phase.interp('y', ys=[10, 5]))
p.set_val('traj.phase0.states:v', phase.interp('v', ys=[0, 9.9]))
p.set_val('traj.phase0.controls:theta', phase.interp('theta', ys=[5, 100.5]))
#
# Solve for the optimal trajectory
#
dm.run_problem(p)
# Check the results
print('final time')
print(p.get_val('traj.phase0.timeseries.time')[-1])
p.list_problem_vars()
Note the constraints from the list_problem_vars() call that come from timeseries_exec_comp - this is the OpenMDAO ExecComp that Dymos automatically inserts for you.
--- Constraint Report [traj] ---
--- phase0 ---
[final] 0.0000e+00 == bcf_y [None]
[path] 0.0000e+00 <= path_y [None]
/usr/local/lib/python3.8/dist-packages/openmdao/recorders/sqlite_recorder.py:227: UserWarning:The existing case recorder file, dymos_solution.db, is being overwritten.
Model viewer data has already been recorded for Driver.
Full total jacobian was computed 3 times, taking 0.057485 seconds.
Total jacobian shape: (71, 51)
Jacobian shape: (71, 51) (12.51% nonzero)
FWD solves: 12 REV solves: 0
Total colors vs. total size: 12 vs 51 (76.5% improvement)
Sparsity computed using tolerance: 1e-25
Time to compute sparsity: 0.057485 sec.
Time to compute coloring: 0.054118 sec.
Memory to compute coloring: 0.000000 MB.
/usr/local/lib/python3.8/dist-packages/openmdao/core/total_jac.py:1585: DerivativesWarning:Constraints or objectives [('traj.phases.phase0.timeseries.timeseries_exec_comp.path_y', inds=[(0, 0)])] cannot be impacted by the design variables of the problem.
Optimization terminated successfully (Exit mode 0)
Current function value: [18.02999766]
Iterations: 14
Function evaluations: 14
Gradient evaluations: 14
Optimization Complete
-----------------------------------
final time
[1.80299977]
----------------
Design Variables
----------------
name val size indices
-------------------------- -------------- ---- ---------------------------------------------
traj.phase0.t_duration [1.80299977] 1 None
traj.phase0.states:x |12.14992234| 9 [1 2 3 4 5 6 7 8 9]
traj.phase0.states:y |22.69124774| 10 [ 1 2 3 4 5 6 7 8 9 10]
traj.phase0.states:v |24.46289861| 10 [ 1 2 3 4 5 6 7 8 9 10]
traj.phase0.controls:theta |266.48489386| 21 [ 0 1 2 3 4 5 ... 4 15 16 17 18 19 20]
-----------
Constraints
-----------
name val size indices alias
----------------------------------------------------------- ------------- ---- --------------------------------------------- ----------------------------------------------------
timeseries.timeseries_exec_comp.bcf_y [0.] 1 [29] traj.phases.phase0->final_boundary_constraint->bcf_y
timeseries.timeseries_exec_comp.path_y |15.73297378| 30 [ 0 1 2 3 4 5 ... 3 24 25 26 27 28 29] traj.phases.phase0->path_constraint->path_y
traj.phase0.collocation_constraint.defects:x |6e-08| 10 None None
traj.phase0.collocation_constraint.defects:y |7e-08| 10 None None
traj.phase0.collocation_constraint.defects:v |3e-08| 10 None None
traj.phase0.continuity_comp.defect_control_rates:theta_rate |0.0| 9 None None
----------
Objectives
----------
name val size indices
------------- ------------- ---- -------
traj.phase0.t [18.02999766] 1 -1

How to deal with the error when using Gurobi with cvxpy :Unable to retrieve attribute 'BarIterCount'

How to deal with the error when using Gurobi with cvxpy :AttributeError: Unable to retrieve attribute 'BarIterCount'.
I have an Integer programming problem, using cvxpy and set gurobi as a solver.
When the number of variables is small, the result is ok. After the number of variables reaches a level of like 43*13*6, then the error occurred. I suppose it may be caused by the scale of the problem, in which the gurobi solver can not estimate the BarIterCount, which is the max Iterations needed.
Thus, I wonder, is there any way to manually set the BarItercount attribute of gurobi through the interface of the CVX? Or whether there exists another way to solve this problem?
Thanks for any suggestions you may provide for me.
The trace log is as follows:
If my model is small, like I set a number which indicates the scale of model as 3, then the program is ok. The trace is :
Using license file D:\software\lib\site-packages\gurobipy\gurobi.lic
Restricted license - for non-production use only - expires 2022-01-13
Parameter OutputFlag unchanged
Value: 1 Min: 0 Max: 1 Default: 1
D:\software\lib\site-packages\cvxpy\reductions\solvers\solving_chain.py:326: DeprecationWarning: Deprecated, use Model.addMConstr() instead
solver_opts, problem._solver_cache)
Changed value of parameter QCPDual to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 126 rows, 370 columns and 2689 nonzeros
Model fingerprint: 0x70d49530
Variable types: 0 continuous, 370 integer (369 binary)
Coefficient statistics:
Matrix range [1e+00, 7e+00]
Objective range [1e+00, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 6e+00]
Found heuristic solution: objective 7.0000000
Presolve removed 4 rows and 90 columns
Presolve time: 0.01s
Presolved: 122 rows, 280 columns, 1882 nonzeros
Variable types: 0 continuous, 280 integer (279 binary)
Root relaxation: objective 4.307692e+00, 216 iterations, 0.00 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 4.30769 0 49 7.00000 4.30769 38.5% - 0s
H 0 0 6.0000000 4.30769 28.2% - 0s
0 0 5.00000 0 35 6.00000 5.00000 16.7% - 0s
0 0 5.00000 0 37 6.00000 5.00000 16.7% - 0s
0 0 5.00000 0 7 6.00000 5.00000 16.7% - 0s
Cutting planes:
Gomory: 4
Cover: 9
MIR: 4
StrongCG: 1
GUB cover: 9
Zero half: 1
RLT: 1
Explored 1 nodes (849 simplex iterations) in 0.12 seconds
Thread count was 32 (of 32 available processors)
Solution count 2: 6 7
Optimal solution found (tolerance 1.00e-04)
Best objective 6.000000000000e+00, best bound 6.000000000000e+00, gap 0.0000%
If the number is 6, then error occurs:
-------------------------------------------------------
Using license file D:\software\lib\site-packages\gurobipy\gurobi.lic
Restricted license - for non-production use only - expires 2022-01-13
Parameter OutputFlag unchanged
Value: 1 Min: 0 Max: 1 Default: 1
D:\software\lib\site-packages\cvxpy\reductions\solvers\solving_chain.py:326: DeprecationWarning: Deprecated, use Model.addMConstr() instead
solver_opts, problem._solver_cache)
Changed value of parameter QCPDual to 1
Prev: 0 Min: 0 Max: 1 Default: 0
Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Traceback (most recent call last):
File "model.py", line 274, in <module>
problem.solve(solver=cp.GUROBI,verbose=True)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 396, in solve
return solve_func(self, *args, **kwargs)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 754, in _solve
self.unpack_results(solution, solving_chain, inverse_data)
File "D:\software\lib\site-packages\cvxpy\problems\problem.py", line 1058, in unpack_results
solution = chain.invert(solution, inverse_data)
File "D:\software\lib\site-packages\cvxpy\reductions\chain.py", line 79, in invert
solution = r.invert(solution, inv)
File "D:\software\lib\site-packages\cvxpy\reductions\solvers\qp_solvers\gurobi_qpif.py", line 59, in invert
s.NUM_ITERS: model.BarIterCount,
File "src\gurobipy\model.pxi", line 343, in gurobipy.gurobipy.Model.__getattr__
File "src\gurobipy\model.pxi", line 1842, in gurobipy.gurobipy.Model.getAttr
File "src\gurobipy\attrutil.pxi", line 100, in gurobipy.gurobipy.__getattr
AttributeError: Unable to retrieve attribute 'BarIterCount'
Hopefully this can provide more hint for solution.
BarIterCount is the number of barrier iterations performed to solve an LP. This is not a limit on the number of iterations and it should only be queried when the current optimization process has been finished. You cannot set this attribute either, of course.
To actually limit the number of iterations the barrier algorithm is allowed to take, you can use the parameter BarIterLimit.
Please inspect your log file for further information about the solver's behavior.

How to use bob.measure.load.split()

I'm a student studying with a focus on machine learning, and I'm interested in authentication.
I am interested in your library because I want to calculate the EER.
Sorry for the basic question, but please tell me about bob.measure.load.split().
Is the file format required by this correct in the perception that the first column is the correct label and the second column is the predicted score of the model?
like
# file.txt
|label|prob |
| -1 | 0.3 |
| 1 | 0.5 |
| -1 | 0.8 |
...
In addition, to actually calculate the EER, should I follow the following procedure?
neg, pos = bob.measure.load.split('file.txt')
eer = bob.measure.eer(neg, pos)
Sincerely.
You have two options of calculating EER with bob.measure:
Use the Python API to calculate EER using numpy arrays.
Use the command line application to generate error rates (including EER) and plots
Using Python API
First, you need to load the scores into memory and split them into positive and negative scores.
For examples:
import numpy as np
import bob.measure
positives = np.array([0.5, 0.5, 0.6, 0.7, 0.2])
negatives = np.array([0.0, 0.0, 0.6, 0.2, 0.2])
eer = bob.measure.eer(negatives, positives)
print(eer)
This will print 0.2. All you need to take care is that your positive comparison scores are higher than negative comparisons. That is your model should score higher for positive samples.
Using command line
bob.measure also comes with a suite of command line commands that can help you get the error rates. To use the command line, you need to save the scores in a text file. This file is made of two columns where columns are separated by space. For example the score file for the same example would be:
$ cat scores.txt
1 0.5
1 0.5
1 0.6
1 0.7
1 0.2
-1 0.0
-1 0.0
-1 0.6
-1 0.2
-1 0.2
and then you would call
$ bob measure metrics scores.txt
[Min. criterion: EER ] Threshold on Development set `scores.txt`: 3.500000e-01
================================ =============
.. Development
================================ =============
False Positive Rate 20.0% (1/5)
False Negative Rate 20.0% (1/5)
Precision 0.8
Recall 0.8
F1-score 0.8
Area Under ROC Curve 0.8
Area Under ROC Curve (log scale) 0.7
================================ =============
Ok it didn't print EER exactly but EER = (FPR+FNR)/2.
Using bob.bio.base command line
If your scores are the results of a biometrics experiment,
then you want to save your scores in the 4 or 5 column formats of bob.bio.base.
See an example in https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/3efccd3b637ee73ec68ed0ac5fde2667a943bd6e/bob/bio/base/test/data/dev-4col.txt and documentation in https://www.idiap.ch/software/bob/docs/bob/bob.bio.base/stable/experiments.html#evaluating-experiments
Then, you would call bob bio metrics scores-4-col.txt to get biometrics related metrics.

How to access the BIC value in Stata after being calculated by estat ic?

This document explains that the values of AIC and BIC are stored in r(S), but when I try display r(S), it returns "type mismatch" and when I try sum r(S), it returns "r ambiguous abbreviation".
Sorry for my misunderstanding this r(S), but I'll appreciate it if you let me know how I can access the calculated BIC value.
The document you refer to mentions that r(S) is a matrix. The display command does not work with matrices. Try matrix list. Also see help matrix.
For example:
clear
sysuse auto
regress mpg weight foreign
estat ic
matrix list r(S)
matrix S=r(S)
scalar aic=S[1,5]
di aic
The same document that you cited explains that r(S) is a matrix. That explains the failure of your commands, as summarize is for summarizing variables and display is for displaying strings and scalar expressions, as their help explains. Matrices are neither.
Note that the document you cited
http://www.stata.com/manuals13/restatic.pdf
is at the time of writing not the most recent version
http://www.stata.com/manuals14/restatic.pdf
although the advice is the same either way.
Copy r(S) to a matrix that will not disappear when you run the next r-class command, and then list it directly. For basic help on matrices, start with
help matrix
Here is a reproducible example. I use the Stata 13 version of the dataset because your question hints that you may be using that version:
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -545.60089
Iteration 2: log likelihood = -545.58328
Iteration 3: log likelihood = -545.58328
Multinomial logistic regression Number of obs = 615
LR chi2(6) = 20.54
Prob > chi2 = 0.0022
Log likelihood = -545.58328 Pseudo R2 = 0.0185
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.0111915 .0060915 -1.84 0.066 -.0231305 .0007475
male | .5739825 .2005221 2.86 0.004 .1809665 .9669985
nonwhite | .7312659 .218978 3.34 0.001 .302077 1.160455
_cons | .1567003 .2828509 0.55 0.580 -.3976773 .7110778
-------------+----------------------------------------------------------------
Uninsure |
age | -.0058414 .0114114 -0.51 0.609 -.0282073 .0165245
male | .5102237 .3639793 1.40 0.161 -.2031626 1.22361
nonwhite | .4333141 .4106255 1.06 0.291 -.371497 1.238125
_cons | -1.811165 .5348606 -3.39 0.001 -2.859473 -.7628578
------------------------------------------------------------------------------
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 615 -555.8545 -545.5833 8 1107.167 1142.54
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note.
. ret li
matrices:
r(S) : 1 x 6
. mat S = r(S)
. mat li S
S[1,6]
N ll0 ll df AIC BIC
. 615 -555.85446 -545.58328 8 1107.1666 1142.5395
The BIC value is now in S[1,6].

Explain the Differential Evolution method

Can someone please explain the Differential Evolution method? The Wikipedia definition is extremely technical.
A dumbed-down explanation followed by a simple example would be appreciated :)
Here's a simplified description. DE is an optimisation technique which iteratively modifies a population of candidate solutions to make it converge to an optimum of your function.
You first initialise your candidate solutions randomly. Then at each iteration and for each candidate solution x you do the following:
you produce a trial vector: v = a + ( b - c ) / 2, where a, b, c are three distinct candidate solutions picked randomly among your population.
you randomly swap vector components between x and v to produce v'. At least one component from v must be swapped.
you replace x in your population with v' only if it is a better candidate (i.e. it better optimise your function).
(Note that the above algorithm is very simplified; don't code from it, find proper spec. elsewhere instead)
Unfortunately the Wikipedia article lacks illustrations. It is easier to understand with a graphical representation, you'll find some in these slides: http://www-personal.une.edu.au/~jvanderw/DE_1.pdf .
It is similar to genetic algorithm (GA) except that the candidate solutions are not considered as binary strings (chromosome) but (usually) as real vectors. One key aspect of DE is that the mutation step size (see step 1 for the mutation) is dynamic, that is, it adapts to the configuration of your population and will tend to zero when it converges. This makes DE less vulnerable to genetic drift than GA.
Answering my own question...
Overview
The principal difference between Genetic Algorithms and Differential Evolution (DE) is that Genetic Algorithms rely on crossover while evolutionary strategies use mutation as the primary search mechanism.
DE generates new candidates by adding a weighted difference between two population members to a third member (more on this below).
If the resulting candidate is superior to the candidate with which it was compared, it replaces it; otherwise, the original candidate remains unchanged.
Definitions
The population is made up of NP candidates.
Xi = A parent candidate at index i (indexes range from 0 to NP-1) from the current generation. Also known as the target vector.
Each candidate contains D parameters.
Xi(j) = The jth parameter in candidate Xi.
Xa, Xb, Xc = three random parent candidates.
Difference vector = (Xb - Xa)
F = A weight that determines the rate of the population's evolution.
Ideal values: [0.5, 1.0]
CR = The probability of crossover taking place.
Range: [0, 1]
Xc` = A mutant vector obtained through the differential mutation operation. Also known as the donor vector.
Xt = The child of Xi and Xc`. Also known as the trial vector.
Algorithm
For each candidate in the population
for (int i = 0; i<NP; ++i)
Choose three distinct parents at random (they must differ from each other and i)
do
{
a = random.nextInt(NP);
} while (a == i)
do
{
b = random.nextInt(NP);
} while (b == i || b == a);
do
{
c = random.nextInt(NP);
} while (c == i || c == b || c == a);
(Mutation step) Add a weighted difference vector between two population members to a third member
Xc` = Xc + F * (Xb - Xa)
(Crossover step) For every variable in Xi, apply uniform crossover with probability CR to inherit from Xc`; otherwise, inherit from Xi. At least one variable must be inherited from Xc`
int R = random.nextInt(D);
for (int j=0; j < D; ++j)
{
double probability = random.nextDouble();
if (probability < CR || j == R)
Xt[j] = Xc`[j]
else
Xt[j] = Xi[j]
}
(Selection step) If Xt is superior to Xi then Xt replaces Xi in the next generation. Otherwise, Xi is kept unmodified.
Resources
See this for an overview of the terminology
See Optimization Using Differential Evolution by Vasan Arunachalam for an explanation of the Differential Evolution algorithm
See Evolution: A Survey of the State-of-the-Art by Swagatam Das and Ponnuthurai Nagaratnam Suganthan for different variants of the Differential Evolution algorithm
See Differential Evolution Optimization from Scratch with Python for a detailed description of an implementation of a DE algorithm in python.
The working of DE algorithm is very simple.
Consider you need to optimize(minimize,for eg) ∑Xi^2 (sphere model) within a given range, say [-100,100]. We know that the minimum value is 0. Let's see how DE works.
DE is a population-based algorithm. And for each individual in the population, a fixed number of chromosomes will be there (imagine it as a set of human beings and chromosomes or genes in each of them).
Let me explain DE w.r.t above function
We need to fix the population size and the number of chromosomes or genes(named as parameters). For instance, let's consider a population of size 4 and each of the individual has 3 chromosomes(or genes or parameters). Let's call the individuals R1,R2,R3,R4.
Step 1 : Initialize the population
We need to randomly initialise the population within the range [-100,100]
G1 G2 G3 objective fn value
R1 -> |-90 | 2 | 1 | =>8105
R2 -> | 7 | 9 | -50 | =>2630
R3 -> | 4 | 2 | -9.2| =>104.64
R4 -> | 8.5 | 7 | 9 | =>202.25
objective function value is calculated using the given objective function.In this case, it's ∑Xi^2. So for R1, obj fn value will be -90^2+2^2+2^2 = 8105. Similarly it is found for all.
Step 2 : Mutation
Fix a target vector,say for eg R1 and then randomly select three other vectors(individuals)say for eg.R2,R3,R4 and performs mutation. Mutation is done as follows,
MutantVector = R2 + F(R3-R4)
(vectors can be chosen randomly, need not be in any order).F (scaling factor/mutation constant) within range [0,1] is one among the few control parameters DE is having.In simple words , it describes how different the mutated vector becomes. Let's keep F =0.5.
| 7 | 9 | -50 |
+
0.5 *
| 4 | 2 | -9.2|
+
| 8.5 | 7 | 9 |
Now performing Mutation will give the following Mutant Vector
MV = | 13.25 | 13.5 | -50.1 | =>2867.82
Step 3 : Crossover
Now that we have a target vector(R1) and a mutant vector MV formed from R2,R3 & R4 ,we need to do a crossover. Consider R1 and MV as two parents and we need a child from these two parents. Crossover is done to determine how much information is to be taken from both the parents. It is controlled by Crossover rate(CR). Every gene/chromosome of the child is determined as follows,
a random number between 0 & 1 is generated, if it is greater than CR , then inherit a gene from target(R1) else from mutant(MV).
Let's set CR = 0.9. Since we have 3 chromosomes for individuals, we need to generate 3 random numbers between 0 and 1. Say for eg, those numbers are 0.21,0.97,0.8 respectively. First and last are lesser than CR value, so those positions in the child's vector will be filled by values from MV and second position will be filled by gene taken from target(R1).
Target-> |-90 | 2 | 1 | Mutant-> | 13.25 | 13.5 | -50.1 |
random num - 0.21, => `Child -> |13.25| -- | -- |`
random num - 0.97, => `Child -> |13.25| 2 | -- |`
random num - 0.80, => `Child -> |13.25| 2 | -50.1 |`
Trial vector/child vector -> | 13.25 | 2 | -50.1 | =>2689.57
Step 4 : Selection
Now we have child and target. Compare the obj fn of both, see which is smaller(minimization problem). Select that individual out of the two for next generation
R1 -> |-90 | 2 | 1 | =>8105
Trial vector/child vector -> | 13.25 | 2 | -50.1 | =>2689.57
Clearly, the child is better so replace target(R1) with the child. So the new population will become
G1 G2 G3 objective fn value
R1 -> | 13.25 | 2 | -50.1 | =>2689.57
R2 -> | 7 | 9 | -50 | =>2500
R3 -> | 4 | 2 | -9.2 | =>104.64
R4 -> | -8.5 | 7 | 9 | =>202.25
This procedure will be continued either till the number of generations desired has reached or till we get our desired value. Hope this will give you some help.