GAMS takes a long time to return infeasibility for a certain parameter - optimization

Currently, I am running a MILP in GAMS with different input parameters, just to check for which values the model is feasible. Most of the time, infeasibility is detected very fast, within a few seconds, but for one particular value it takes much longer, more than 240 seconds.
I am confused by this, since it only takes that long for that one value, while the duration is low for all other values, both higher and lower ones. In general, I have always experienced that GAMS is very fast at detecting that a model is infeasible (since it doesn't have to solve the model to completion). What could cause GAMS to take that long, even before the actual optimisation starts?
Thank you in advance,
Michael
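For scans like this, one pragmatic safeguard is to cap each run's solve time and log the durations. Below is a minimal sketch of driving such a scan from Python; model.gms and the double-dash parameter --demand are hypothetical placeholders for your own model, while reslim is the standard GAMS resource limit in seconds:

import subprocess, time

for value in [10, 20, 30, 40, 50]:
    start = time.time()
    # each GAMS run is capped at 60 seconds of solver time
    subprocess.run(["gams", "model.gms", f"--demand={value}", "reslim=60"],
                   check=True)
    print(f"value={value}: {time.time() - start:.1f}s")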

Related

Is there a way to use fewer decimals in the xgb.cv loss calculation to allow 'early_stopping_rounds' to trigger sooner?

I am using xgb.cv to determine the right number of estimators for my problem, with the 'multi:softprob' objective and the 'mlogloss' metric. Originally in my code I set:
num_boost_round = 999
early_stopping_rounds = 10
The problem is that the loss is returned with many decimals, and even though the last decimals change, this has no practical effect on model quality for me. This is an example of the losses from around boost round 170 of my run:
0.012855
0.012855
0.012855
0.012854666666666667
0.012854666666666667
0.012853999999999999
0.012853999999999999
0.012853666666666666
0.012853666666666666
0.012853666666666666
0.012852999999999998
You can see that there is little or no point in continuing anymore. My cv already got down to these figures after 15-20 boosting rounds.
Is there a way to use fewer decimals for the loss comparisons (or the reporting) and in that way make 'early_stopping_rounds' trigger sooner and stop the cv?
Any ideas would be appreciated.
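One option (a sketch, not tested against any particular xgboost version) is to compute the metric yourself and round it, so that changes below the rounding precision no longer reset the early-stopping counter. This assumes the older-style feval argument of xgb.cv; recent xgboost releases also expose a min_delta argument on the xgboost.callback.EarlyStopping callback, which addresses this more directly if your version has it:

import numpy as np
import xgboost as xgb

def rounded_mlogloss(preds, dtrain):
    labels = dtrain.get_label().astype(int)
    probs = preds.reshape(len(labels), -1)   # handles flat or 2-D predictions
    p = np.clip(probs[np.arange(len(labels)), labels], 1e-15, 1.0)
    # rounding means sub-0.0001 improvements no longer count as "better"
    return 'rounded-mlogloss', round(float(-np.mean(np.log(p))), 4)

# params and dtrain stand for your existing parameter dict and DMatrix
cv_results = xgb.cv(params, dtrain, num_boost_round=999, nfold=5,
                    feval=rounded_mlogloss, early_stopping_rounds=10)

Early stopping monitors the last evaluation metric, which is the custom one here, so the cv should stop once the rounded loss stalls for 10 rounds.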

Does increasing the number of iterations affect log-lik, AIC etc.?

Whenever I try to solve a convergence issue in one of my glmer models with the help of a different optimizer, I repeat the entire model optimization procedure with the new optimizer. That is, I re-run all the models I have computed so far with the new optimizer and again conduct comparisons with anova(). I do this because, as far as I know, different optimizers may lead to differences in AICs and log-likelihood ratios for one and the same model, making comparisons between two models that use different optimizers problematic.
In my most recent analysis, I've increased the number of iterations with optCtrl=list(maxfun=100000) to avoid convergence errors. I'm now wondering whether this can also lead to differences in AIC/log-lik etc. for one and the same model? Is it equally problematic to compare two models that differ with regard to the inclusion of the optCtrl=list(maxfun=100000) argument?
I actually thought that increasing the number of iterations would simply lead to longer computation times (rather than different results), but I was unable to verify this online. Any hint/explanation is appreciated.
As far as I know, you should be fine. As long as the models were fit to the same number of observations, you should be able to compare them using the AIC. Hopefully someone else can comment on the nuances of how the AIC itself is computed, but I just fit a bunch of models with the same formula and dataset but different maximum numbers of iterations, getting the AIC each time, and it did not change as a function of the iterations. The iteration limit only bounds how long the fitting process may take to maximize the likelihood, which can be tricky for complex models. Once a model has been fit and has converged on an answer, the number of iterations should not change anything about the model itself.
If you look at this question, the top answer explains the AIC quite well: https://stats.stackexchange.com/questions/232465/how-to-compare-models-on-the-basis-of-aic
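glmer lives in R, but the general point can be illustrated with a quick stand-in in Python using statsmodels (not the asker's mixed model, just a plain logistic regression): once the optimizer has converged, raising the iteration cap leaves the log-likelihood and AIC unchanged:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 3)))
y = (X @ np.array([0.5, 1.0, -1.0, 0.3]) + rng.logistic(size=500) > 0).astype(int)

for maxiter in (50, 1000, 100000):
    res = sm.Logit(y, X).fit(maxiter=maxiter, disp=0)
    print(maxiter, res.llf, res.aic)   # identical once converged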

Solving an optimization problem bounded by conditional constraints

Basically, I have a dataset that contains 'weights' for some (207) variables; some are more important than others for determining the (binary) class variable and therefore have larger weights. In the end, all weights are summed across the columns so that a cumulative weight is obtained for each observation.
If this weight is higher than some number, the class variable is 1; otherwise it is 0. I do have true labels for the class variable, so the problem is to minimize false positives.
The thing is, to me it looks like an OR problem, as it is about finding optimal weights. However, I am not sure if there is an OR method for such a problem; at least I have not heard of one. The question is: does anyone recognize this type of problem and can suggest some keywords for me to research?
Another option, of course, is to predict this with machine learning rather than deterministic methods, but I need to do it this way.
Thank you!
Are the variables discrete (integer numbers etc) or continuous (floating point numbers)?
If they are discrete, it sounds like the knapsack problem, which constraint solvers like OptaPlanner (see this training that builds a knapsack solver) excel at.
If they are continuous, look for an LP solver, like CPLEX.
Either way, you'll get much better results than with machine learning approaches, because neural nets et al. are great at pattern-recognition use cases (image/voice recognition, prediction, categorization, ...), but consistently inferior for constraint optimization problems (like this one, I presume).
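For reference, the weight-fitting problem can be written directly as a small MILP (mixed-integer rather than plain LP, because each misclassification needs a binary indicator). Below is a minimal sketch with PuLP, an assumed choice of library; the data, threshold t, and big-M value are toy placeholders:

import numpy as np
import pulp

rng = np.random.default_rng(0)
X = rng.random((60, 5))            # toy stand-in for the 207 weighted columns
y = rng.integers(0, 2, 60)         # true binary labels
t, M, eps = 1.0, 10.0, 1e-4        # threshold, big-M, strictness margin

prob = pulp.LpProblem("min_false_positives", pulp.LpMinimize)
w = [pulp.LpVariable(f"w{j}", lowBound=0, upBound=1) for j in range(X.shape[1])]
fp = {i: pulp.LpVariable(f"fp{i}", cat="Binary") for i in range(len(y)) if y[i] == 0}
fn = {i: pulp.LpVariable(f"fn{i}", cat="Binary") for i in range(len(y)) if y[i] == 1}

for i in range(len(y)):
    score = pulp.lpSum(float(X[i, j]) * w[j] for j in range(X.shape[1]))
    if y[i] == 0:
        prob += score <= t - eps + M * fp[i]   # fp[i] = 1 iff the score crosses t
    else:
        prob += score >= t - M * fn[i]         # fn[i] = 1 allows a missed positive

# minimize false positives, with a small penalty on false negatives
prob += pulp.lpSum(fp.values()) + 0.01 * pulp.lpSum(fn.values())
prob.solve()
print(pulp.LpStatus[prob.status], [pulp.value(v) for v in w])

The big-M constant only has to exceed the largest possible score; the small penalty on missed positives keeps the model feasible even when the classes are not perfectly separable.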

Break if Newton's method is not convergent

I'm trying to implement Newton's method for polynomials to find a zero of a function. But I must handle the case when the function has no root. I'm wondering how I can detect the moment when the method becomes divergent, and then stop the procedure?
Thank you in advance for any help
Generally, if the root is not found after 10 iterations, then the initial point was bad. To be safe, take 15 or 20 iterations. Or check after 5-10 iterations for quadratic convergence, measured by the function value decreasing from iteration to iteration faster than by a factor of 0.25.
In a bad case, restart with a different point.
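As a sketch, those heuristics in Python (the iteration cap and the factor-0.25 test come from the answer above; the names and defaults are mine):

def newton(f, df, x0, tol=1e-12, max_iter=20, check_from=5):
    # Newton iteration with two divergence guards: a hard iteration cap and,
    # after a few warm-up steps, a check that |f(x)| keeps shrinking by at
    # least a factor of 0.25 per step (a proxy for quadratic convergence).
    x, fx = x0, f(x0)
    for k in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            return None                    # flat spot: restart elsewhere
        x, fx_prev = x - fx / dfx, fx
        fx = f(x)
        if abs(fx) < tol:
            return x                       # converged
        if k >= check_from and abs(fx) > 0.25 * abs(fx_prev):
            return None                    # residual shrinking too slowly
    return None                            # no root found: bad initial point

For example, newton(lambda x: x**2 + 1, lambda x: 2*x, 1.0) returns None, since x**2 + 1 has no real root.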

Finding Optimal Parameters In A "Black Box" System

I'm developing machine learning algorithms which classify images based on training data.
During the image preprocessing stages, there are several parameters which I can modify that affect the data I feed my algorithms (for example, I can change the Hessian Threshold when extracting SURF features). So the flow thus far looks like:
[param1, param2, param3...] => [black box] => accuracy %
My problem is: with so many parameters at my disposal, how can I systematically pick values which give me optimized results/accuracy? A naive approach is to run i nested for-loops (assuming i parameters) and just iterate through all parameter combinations, but if it takes 5 minutes to calculate an accuracy from my "black box" system, this would take a long, long time.
This being said, are there any algorithms or techniques which can search for optimal parameters in a black box system? I was thinking of taking a course in Discrete Optimization but I'm not sure if that would be the best use of my time.
Thank you for your time and help!
Edit (to answer comments):
I have 5-8 parameters. Each parameter has its own range. One parameter can be 0-1000 (integer), while another can be 0 to 1 (real number). Nothing is stopping me from multithreading the black box evaluation.
Also, there are some parts of the black box that have some randomness to them. For example, one stage uses k-means clustering, so on each black-box evaluation the cluster centers may change. I run k-means several times to (hopefully) avoid local optima. In addition, I evaluate the black box multiple times and take the median accuracy in order to further mitigate randomness and outliers.
As a partial solution, a grid search of moderate resolution and range can be recursively repeated in the areas where the n parameters give the best values.
The n-dimensional result from each step is then used as the starting point for the next iteration.
The key is that for each iteration the number of grid points is kept constant (i.e. each iteration takes the same time), but the range is decreased so as to reduce the pitch/granular step size.
I'd call it a 'contracting mesh' :)
Keep in mind that while it avoids full brute-force complexity, it only reaches exhaustive resolution in the final iteration (this is what defines the final iteration).
Also, the outlined process is only exhaustive on a subset of the points that may or may not include the global minimum; i.e. it could result in a local minimum.
(You can always chase your tail, though, by offsetting the initial grid by some sub-initial-resolution amount and comparing results...)
Have fun!
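A minimal sketch of that idea in Python (the function and parameter names are mine; evaluate stands for the 5-minute black box, and higher return values are assumed to be better):

import itertools
import numpy as np

def contracting_mesh(evaluate, bounds, points_per_dim=5, iterations=4, shrink=0.5):
    # Constant work per iteration (points_per_dim ** n_params evaluations),
    # with the search range contracting around the best point found so far.
    best_x, best_val = None, float('-inf')
    for _ in range(iterations):
        axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
        for x in itertools.product(*axes):
            val = evaluate(list(x))
            if val > best_val:
                best_x, best_val = list(x), val
        # shrink each range around the best coordinate, clipped to the old range
        bounds = [(max(lo, cx - shrink * (hi - lo) / 2),
                   min(hi, cx + shrink * (hi - lo) / 2))
                  for (lo, hi), cx in zip(bounds, best_x)]
    return best_x, best_val

# toy demo with a known optimum at (300, 0.7)
demo = lambda p: -((p[0] - 300) ** 2 + 1e6 * (p[1] - 0.7) ** 2)
print(contracting_mesh(demo, bounds=[(0, 1000), (0.0, 1.0)]))

Since the black box is noisy, evaluate could wrap the median of several runs, as described in the edit above.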
Here is the solution to your problem.
A method behind it is described in this paper.