svmtrain - unable to solve the optimization problem

I am using svmtrain to discriminate between several pairs of data. Although svmtrain works as desired in one case (outputting a classifier object with ~70 % accuracy as verified by svmclassify), all other cases seem to fail. My feature vectors have 134 dimensions and I am using between 300 and 800 data points for each class. (The classes do not necessarily have the same number of data points.) I have tried the default kernel for svmtrain using the call:
SVM = svmtrain(double(train{k}), group_train{k},'showplot',true);
In this case I get the error:
Unable to solve the optimization problem: Maximum number of iterations exceeded; increase options.MaxIter. To continue solving the problem with the current solution as the starting point, set x0 = x before calling quadprog.
I have also tried extending the number of iterations and specifying a kernel using the call:
options = optimset('maxiter',1000,'largescale','on');
SVM = svmtrain(double(train{k}),group_train{k},'Kernel_Function','mlp','Method','QP',...
'quadprog_opts',options);
In this case, I get the error:
Unable to solve the optimization problem: Exiting: the solution is unbounded and at infinity; the constraints are not restrictive enough.
In the case that did work, I have 338 data points from the first class and 476 data points from the second class. As examples, in three of the cases that don't work, I have 828, 573, and 333 data points in the second class, while the first class remains the same and has 338 data points. Neither method call seems to work.
Could you please help me? I have been trying to solve this problem for a week and have had no luck. I am using MATLAB 7.9.0 R2009B on a virtual machine Windows XP with a 1 GHz processor and 2 GB RAM.
Thank you so much!
-Vivek

Drop the 'largescale' option and pass only the iteration limit, like this:
options = optimset('maxiter',1000);
svmtrain(TotalResult,YResultsTotal,'Kernel_Function','mlp','Method','QP',...
'quadprog_opts',options);

Getting "DUAL_INFEASIBLE" when solving a very simple linear programming problem

I am solving a simple LP problem using Gurobi with dual simplex and presolve. Gurobi reports that the model is unbounded, but I cannot see why. Can anyone tell me what is going wrong?
I attached the log and also the content in the .mps file.
Thanks very much in advance.
Kind regards,
Hongyu.
The output log and .mps file:
Link to the .mps file: https://studntnu-my.sharepoint.com/:u:/g/personal/hongyuzh_ntnu_no/EV5CBhH2VshForCL-EtPvBUBiFT8uZZkv-DrPtjSFi8PGA?e=VHktwf
Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (mac64[arm])
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1 rows, 579 columns and 575 nonzeros
Coefficient statistics:
Matrix range [3e-02, 5e+01]
Objective range [7e-01, 5e+01]
Bounds range [0e+00, 0e+00]
RHS range [7e+03, 7e+03]
Iteration Objective Primal Inf. Dual Inf. Time
0 handle free variables 0s
Solved in 0 iterations and 0.00 seconds (0.00 work units)
Unbounded model
The easiest way to debug this is to put a bound on the objective, so the model is no longer unbounded. Then inspect the solution. This is a super easy trick that somehow few people know about.
When we do this with a bound of 100000, we see:
phi = 100000.0000
gamma[11] = -1887.4290
(the rest zero). Indeed we can make gamma[11] as negative as we want to obey R0. Note that gamma[11] is not in the objective.
More advice: It is also useful to write out the LP file of the model and study that carefully. You probably would have caught the error and that would have prevented this post.
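The bounding trick can be reproduced on a toy model. The sketch below is not the poster's model: it is a made-up two-variable LP solved with scipy.optimize.linprog instead of Gurobi. The names phi and gamma are borrowed from the output above, the coefficients are invented, and it assumes a SciPy version that ships the HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog

# Toy model (coefficients invented for illustration):
#   maximize phi  subject to  phi + 50*gamma <= 7000,  phi >= 0,  gamma free.
# linprog minimizes, so the objective vector encodes -phi.
c = np.array([-1.0, 0.0])
A_ub = np.array([[1.0, 50.0]])
b_ub = np.array([7000.0])

# Step 1: solve as-is. gamma can run to -infinity, so phi is unbounded.
unbounded = linprog(c, A_ub=A_ub, b_ub=b_ub,
                    bounds=[(0, None), (None, None)], method="highs")
print("without bound:", unbounded.status, unbounded.message)

# Step 2: cap the objective variable at a large value and solve again.
PHI_CAP = 100000.0
bounded = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, PHI_CAP), (None, None)], method="highs")

# Step 3: look for variables that moved far from zero; they drive the
# unbounded direction and point at the missing bound or constraint.
phi, gamma = bounded.x
print(f"phi = {phi:.4f}, gamma = {gamma:.4f}")
```

In this toy case gamma ends up strongly negative, which mirrors what the answer found for gamma[11]: a free variable that does not appear in the objective but relaxes the only constraint.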

For a buckling analysis, must all forces be multiplied by the resulting eigenvalue, or only the compressive load?

I am trying to do a linear buckling analysis (sol 105) with Nastran on a cylindrical shell structure. My understanding is that the compressive load that I apply to the structure must be multiplied by the resulting eigenvalue to get the buckling load. This gives me results that I expect.
However, now I apply a single perturbation load (SPL), a small transverse force acting midway along the cylinder on a single grid point. My understanding is that the magnitude of the SPL stays the way it is (unlike the compressive load, which I multiply by the eigenvalue to obtain the buckling load). The results I obtain are not what I expect: according to the theory on this topic, the buckling load should not reduce so much as the SPL increases.
I am wondering if anyone knows what I am doing wrong. I feel like my mistake is probably very easy, but I haven't been able to solve it yet. Here is some more information on my implementation:
Axial compressive force spread over top grid points of cylinder.
Both SPL (the transverse point load) and axial loads are added to the static analysis subcase. Then the buckling subcase uses the static subcase for its analysis. This is how I understand it should be done.
boundary conditions:
SPC1 restraining 123 (xyz) directions at bottom grid points.
SPC1 restraining 12 (xy) directions at top grid points.
I'm not a Nastran user but I've done a lot of buckling analysis with Cast3M software.
The linear buckling analysis does not need perturbation loading, but only your main axial loading (F^0).
To recap,
Solve the linear problem for axial loading :
solve for u^0 : [K] * u^0 = F^0
get the linear stresses from the Hooke law : \sigma^0 = D * B * u^0
Solve the eigenvalue buckling problem :
[ K + \lambda Kgeo(\sigma^0)] * X = 0
Then, if you want to perform a non-linear (large displacement) post-buckling analysis, it is recommended to introduce a small perturbation which "excites" the buckling mode.
If you introduce the perturbation loading before the linear buckling analysis, maybe Nastran is adding it to F^0 and it is then logical that the result of buckling changes.
Hope this can help you.
There is a way to scale some loads and hold others constant. Create 2 Static Subcases with 2 (different) sets of loads:
Constant loads (that are not scaled, like a preload or internal pressure)
Those that will be scaled by the eigenvalue
Order does not matter
Use the Nastran STATSUB entry to define. It looks like this:
SUBCASE 100
LOAD = 1 $ Static pre-load
SUBCASE 200
LOAD = 2 $ Varying buckling load
$ -------------
SUBCASE 1000
STATSUB(PRELOAD) = 100
STATSUB(BUCKLING) = 200
METHOD = 10
The eigensolution is modified to include influence of static and varying loads.
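To see why it matters whether a load sits in the scaled set or the constant set, here is a small numerical sketch. It uses invented 2-DOF matrices, not output from Nastran, and assumes the usual linear buckling form (K + Kg_const + lambda * Kg_scaled) x = 0; the function name critical_factor and all matrix values are hypothetical.

```python
import numpy as np
from scipy.linalg import eig

def critical_factor(K, Kg):
    """Smallest positive lambda solving (K + lambda * Kg) x = 0."""
    # Rewritten as the generalized eigenproblem K x = lambda * (-Kg) x.
    vals = eig(K, -Kg, right=False)
    vals = vals[np.isfinite(vals)]
    real = vals.real[np.abs(vals.imag) < 1e-9]
    return real[real > 0].min()

# Toy matrices, invented for illustration only.
K = np.array([[2.0, -1.0], [-1.0, 2.0]])   # elastic stiffness
Kg_axial = -np.eye(2)                       # geometric stiffness of the axial load
Kg_pre = -np.diag([0.5, 0.0])               # geometric stiffness of the constant load

# Case A: both loads in one static subcase, so both are scaled by lambda.
lam_all_scaled = critical_factor(K, Kg_axial + Kg_pre)

# Case B: the constant load is folded into the stiffness, only the axial
# load is scaled (the role STATSUB(PRELOAD) plays in the deck above).
lam_preload = critical_factor(K + Kg_pre, Kg_axial)

print(f"all loads scaled:   lambda = {lam_all_scaled:.4f}")
print(f"preload held fixed: lambda = {lam_preload:.4f}")
```

The two factors differ, so the eigenvalue can only be interpreted once it is clear which loads it multiplies.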

Balance Dataset for Tensorflow Object Detection

I currently want to use TensorFlow's Object Detection API for my custom problem.
I already created the dataset, but it's pretty unbalanced.
The dataset has 3 classes, and my main problem is that one class has about 16k samples and another class has only about 2.5k samples.
So I think I have to balance the dataset. Someone told me that there is something called sample/class weights (not sure if this is 100% correct), which balances the samples for training so that the biggest class has a smaller impact on training than the smallest class.
I'm not able to find this method for balancing. Can someone please give me a hint where to start?
You can compute the normal cross entropy, giving you a ? x 1 tensor X of losses.
If you want class number N to count T times more, you can do:
X = X * tf.reduce_sum(tf.multiply(one_hot_label, class_weight), axis = 1)
tf.multiply scales the one-hot label by whatever weight you want, and tf.reduce_sum converts the label vector to a scalar, so you end up with a ? x 1 tensor filled with the class weightings. Then you simply multiply the tensor of losses by the tensor of weightings to achieve the desired result.
Since one class is 6.4 times more common than the other, I would apply the weightings 1 and 6.4 to the more common and less common class respectively. This means that every time the less common class occurs, it has 6.4 times the effect of the more common class, so it's as if the model saw the same number of samples from each.
You might want to modify it so that the weightings add up to the number of classes. This matches the default case in which all of the weightings are 1. For two classes, that gives 2/7.4 and 12.8/7.4.
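The weighting step can be checked by hand with a small NumPy stand-in for the TensorFlow expression. This is only a sketch: np.sum and elementwise multiplication play the roles of tf.reduce_sum and tf.multiply, the function names are hypothetical, and the probabilities are made-up numbers for two classes.

```python
import numpy as np

def per_sample_weights(one_hot_labels, class_weight):
    # Multiplying a one-hot row by the weight vector leaves only the weight
    # of the true class; summing over the class axis turns it into a scalar.
    return np.sum(one_hot_labels * class_weight, axis=1)

def weighted_cross_entropy(probs, one_hot_labels, class_weight):
    # Plain per-sample cross entropy, then scaled by each sample's class weight.
    losses = -np.sum(one_hot_labels * np.log(probs), axis=1)
    return losses * per_sample_weights(one_hot_labels, class_weight)

# Made-up batch: first sample from the common class, second from the rare one.
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
class_weight = np.array([1.0, 6.4])

weights = per_sample_weights(labels, class_weight)
weighted = weighted_cross_entropy(probs, labels, class_weight)
print("weights:", weights)
print("weighted losses:", weighted)
```

The second sample's loss is multiplied by 6.4 while the first is left unchanged, which is the effect described above.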

How to determine which constraints or variable bounds are rendering a GAMS model infeasible?

The solve summary in my GAMS model (NLP) is returning the following:
**** SOLVER STATUS 1 Normal Completion
**** MODEL STATUS 19 Infeasible - No Solution
**** OBJECTIVE VALUE NA
The bounds on one of my variables are:
y.lo = 0, y.up = 0.15
If I change the bounds to:
y.lo = 0, y.up = 0.12
the model then converges and gives the following:
**** SOLVER STATUS 1 Normal Completion
**** MODEL STATUS 2 Locally Optimal
**** OBJECTIVE VALUE 66013164.0000
It turns out that the final variable level is
y.l = 0.12
How can it be that GAMS determined the model to be infeasible in the first case (upper bound = 0.15) even though the solution (0.12) was within the search space? (By the way, I am using the ANTIGONE solver.)
Additionally, are there any methodical ways to identify which constraints/variable bounds are causing the model to be infeasible?
In order to find this (seemingly illogical) error, I had to spend hours guessing and checking arbitrary details within the model with no rhyme or reason. There has to be a better way, right?
That issue is not the fault of GAMS but of the solver you are using. Have you tried CONOPT?
You can see the infeasible constraints in the lst file. Some equations should carry the (***INFES) mark.
Also, to solve your problem, I would try to provide the NLP solver an initial solution that is somehow close enough to the optimal one, or at least feasible.
I would also try to check the options of the solvers you are using to start the solution procedure with a feasible starting point.
Non-convex optimization is not easy.
I hope this helps.
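The dependence on the starting point is easy to demonstrate outside GAMS. The sketch below swaps in SciPy's L-BFGS-B as a generic local solver and a made-up one-dimensional non-convex function; it illustrates a simple multi-start loop, which is one assumed way of hunting for a good initial point and not something ANTIGONE or CONOPT does for you.

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Toy non-convex function with two local minima (invented for illustration).
    return x[0] ** 4 - 3.0 * x[0] ** 2 + x[0]

# Run the same local solver from several starting points.
starts = [-2.0, -1.0, 0.5, 2.0]
results = [minimize(objective, x0=[s], method="L-BFGS-B", bounds=[(-3.0, 3.0)])
           for s in starts]

for s, r in zip(starts, results):
    print(f"start {s:+.1f} -> x = {r.x[0]:+.4f}, f = {r.fun:.4f}")

# Keep the best local solution found across all starts.
best = min(results, key=lambda r: r.fun)
print(f"best: x = {best.x[0]:+.4f}, f = {best.fun:.4f}")
```

Different starts can end in different local minima, so the reported status of a local NLP solve says as much about the starting point as about the model.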

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm.
sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(func_to_minimize, self.sigma_vector[si][pj],
args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
approx_grad=True, bounds=[(1e-8, 0.5)], factr=1e02, pgtol=1e-05, epsilon=1e-08)
But sometimes I get the warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in the information dictionary:
func_to_minimize value = 1.14462324063e-07
information dictionary: {'task': b'ABNORMAL_TERMINATION_IN_LNSRCH', 'funcalls': 147, 'grad': array([ 1.77635684e-05, 2.87769808e-05, 3.51718654e-05,
6.75015599e-06, -4.97379915e-06, -1.06581410e-06]), 'nit': 0, 'warnflag': 2}
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 6 M = 10
This problem is unconstrained.
At X0 0 variables are exactly at the bounds
At iterate 0 f= 1.14462D-07 |proj g|= 3.51719D-05
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
6 1 21 1 0 0 3.517D-05 1.145D-07
F = 1.144619474757747E-007
ABNORMAL_TERMINATION_IN_LNSRCH
Line search cannot locate an adequate point after 20 function
and gradient evaluations. Previous x, f and g restored.
Possible causes: 1 error in function or gradient evaluation;
2 rounding error dominate computation.
Cauchy time 0.000E+00 seconds.
Subspace minimization time 0.000E+00 seconds.
Line search time 0.000E+00 seconds.
Total User time 0.000E+00 seconds.
I do not get this warning every time, only sometimes. (Most runs end with 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' or 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'.)
I know that it means the minimum cannot be reached in this iteration. I googled this problem. Someone said it often occurs because the objective and gradient functions do not match. But here I do not provide a gradient function, because I am using 'approx_grad'.
What are the possible reasons that I should investigate? What does "rounding error dominate computation" mean?
======
I also find that the log-likelihood does not monotonically increase:
########## Convergence !!! ##########
log_likelihood_history: [-28659.725891322563, 220.49993177669558, 291.3513633060345, 267.47745327823907, 265.31567762171181, 265.07311121000367, 265.04217683341682]
It usually starts to decrease at the second or third iteration, even when 'ABNORMAL_TERMINATION_IN_LNSRCH' does not occur. I do not know whether this problem is related to the previous one.
Scipy calls the original L-BFGS-B implementation, which is Fortran 77 (old but beautiful and superfast code). The problem here is that the descent direction is actually going up. The relevant check starts on line 2533 (link to the code at the bottom):
      gd = ddot(n,g,1,d,1)
      if (ifun .eq. 0) then
         gdold=gd
         if (gd .ge. zero) then
c                               the directional derivative >=0.
c                               Line search is impossible.
            if (iprint .ge. 0) then
                write(0,*)' ascent direction in projection gd = ', gd
            endif
            info = -4
            return
         endif
      endif
In other words, you are telling it to go down the hill by going up the hill. The code tries something called line search a total of 20 times in the descent direction that you provide and realizes that you are NOT telling it to go downhill, but uphill. All 20 times.
The guy who wrote it (Jorge Nocedal, who by the way is a very smart guy) put 20 because pretty much that's enough. Machine epsilon is about 2.2E-16, so I think 20 is actually a little too much. So, my money for most people having this problem is that your gradient does not match your function.
Now, it could also be that "2. rounding errors dominate computation". By this, he means that your function is a very flat surface on which increases are of the order of machine epsilon (in which case you could perhaps rescale the function).
Now, I was thinking that maybe there should be a third option, when your function is too weird. Oscillations? I could see something like $\sin({\frac{1}{x}})$ causing this kind of problem. But I'm not a smart guy, so don't assume that there's a third case.
So I think the answer for the OP is that the function is too flat. Otherwise, look at the Fortran code.
https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f
Here's line search for those who want to see it. https://en.wikipedia.org/wiki/Line_search
Note. This is 7 months too late. I put it here for future's sake.
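For readers who do supply their own gradient, the mismatch suspected above can be tested before calling the optimizer. A minimal sketch with scipy.optimize.check_grad follows; the function f and both gradients are toy examples invented for this illustration, one of them wrong on purpose.

```python
import numpy as np
from scipy.optimize import check_grad

def f(x):
    # Toy objective, invented for illustration.
    return np.sum(x ** 2) + np.sin(x[0])

def grad_ok(x):
    # Analytic gradient that matches f.
    g = 2.0 * x
    g[0] += np.cos(x[0])
    return g

def grad_wrong(x):
    # Deliberate sign error: this "gradient" points uphill.
    return -grad_ok(x)

x0 = np.array([0.3, -1.2, 2.0])

# check_grad returns the 2-norm of the difference between the supplied
# gradient and a finite-difference estimate at x0.
err_ok = check_grad(f, grad_ok, x0)
err_wrong = check_grad(f, grad_wrong, x0)
print(f"matching gradient:    {err_ok:.2e}")
print(f"mismatched gradient:  {err_wrong:.2e}")
```

A small value suggests the two agree at that point; a large one means the line search is being handed a direction that does not descend.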
As pointed out in the answer by Wilmer E. Henao, the problem is probably in the gradient. Since you are using approx_grad=True, the gradient is calculated numerically by finite differences. In this case, changing the value of epsilon, the step size used for the numerical gradient, can help; reducing it sometimes works, although a step that is too small amplifies rounding error.
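How the finite-difference step affects the gradient estimate can be seen on a function with a known derivative. The sketch below uses a plain forward difference on exp(x) at x = 1; it is an illustration of the general trade-off, not the exact scheme inside fmin_l_bfgs_b, and forward_diff is a hypothetical helper.

```python
import numpy as np

def forward_diff(f, x, eps):
    # One-sided finite-difference estimate of f'(x) with step eps.
    return (f(x + eps) - f(x)) / eps

x = 1.0
exact = np.exp(x)  # the derivative of exp is exp itself

# Error of the estimate for a range of step sizes.
errors = {eps: abs(forward_diff(np.exp, x, eps) - exact)
          for eps in (1e-2, 1e-5, 1e-8, 1e-11, 1e-14)}

for eps, err in errors.items():
    print(f"eps = {eps:.0e}  error = {err:.2e}")
```

Large steps suffer from truncation error and very small steps from rounding error, so it is worth trying epsilon values in both directions when the line search fails.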
I also got the error "ABNORMAL_TERMINATION_IN_LNSRCH" using the L-BFGS-B optimizer.
While my gradient function pointed in the right direction, I rescaled the actual gradient of the function by its L2-norm. Removing that or adding another appropriate type of rescaling worked. Before, I guess that the gradient was so large that it went out of bounds immediately.
The problem from OP was unbounded if I read correctly, so this will certainly not help in this problem setting. However, googling the error "ABNORMAL_TERMINATION_IN_LNSRCH" yields this page as one of the first results, so it might help others...
I had a similar problem recently. I sometimes encounter the ABNORMAL_TERMINATION_IN_LNSRCH message after using fmin_l_bfgs_b function of scipy. I try to give additional explanations of the reason why I get this. I am looking for complementary details or corrections if I am wrong.
In my case, I provide the gradient function, so approx_grad=False. My cost function and the gradient are consistent; I double-checked this, and the optimization works most of the time. When I get ABNORMAL_TERMINATION_IN_LNSRCH, the solution is not optimal, not even close (even though this is a subjective point of view). I can overcome this issue by modifying the maxls argument: increasing maxls eventually yields the optimal solution. However, I noticed that sometimes a smaller maxls than the one that produces ABNORMAL_TERMINATION_IN_LNSRCH results in a converging solution. A dataframe summarizes the results. I was surprised to observe this, since I expected that reducing maxls would not improve the result. For this reason, I tried to read the paper describing the line search algorithm, but I had trouble understanding it.
The line "search algorithm generates a sequence of nested intervals {Ik} and a sequence of iterates αk ∈ Ik ∩ [αmin ; αmax] according to the [...] procedure". If I understand correctly, the maxls argument specifies the length of this sequence. At the end of the maxls iterations (or fewer if the algorithm terminates earlier), the line search stops. A final trial point is generated within the final interval Imaxls. I would say that the formula does not guarantee an αmaxls that satisfies the two update conditions, the minimum decrease and the curvature, especially when the interval is still wide. My guess is that in my case, after 11 iterations the generated interval I11 is such that the trial point α11 satisfies both conditions. But even though I12 is smaller and still contains acceptable points, α12 does not. Finally, after 24 iterations, the interval is very small and the generated αk satisfies the update conditions.
Is my understanding / explanation accurate?
If so, I am surprised: when maxls=12, the generated α11 is acceptable but α12 is not, so why is α11 not chosen instead of α12?
Pragmatically, I would recommend trying a few higher values of maxls when getting ABNORMAL_TERMINATION_IN_LNSRCH.
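Trying several maxls values is a short loop. The sketch below uses SciPy's built-in Rosenbrock function and its gradient as a stand-in objective (an assumption; the poster's cost function is not shown) and records the termination status for each setting.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b, rosen, rosen_der

x0 = np.array([-1.2, 1.0])
outcomes = {}

for maxls in (20, 40, 80):
    # maxls caps the number of line search steps per iteration (default 20).
    x, fval, info = fmin_l_bfgs_b(rosen, x0, fprime=rosen_der, maxls=maxls)
    outcomes[maxls] = (x, fval, info["warnflag"])
    # warnflag 0 means convergence; 2 means the run stopped for the reason
    # given in info["task"], such as ABNORMAL_TERMINATION_IN_LNSRCH.
    print(maxls, fval, info["warnflag"], info["task"])
```

On this well-behaved test function every setting should converge; on a harder objective, comparing warnflag and the final function value across the settings shows whether maxls is what limits the run.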