Why could a SCIPcopy model be infeasible when the original model is feasible?

I'm new to SCIP, so I'm not sure if this is a bug or if I'm just doing something wrong.
I have a MIP instance that solves perfectly using SCIP; however, when I try to solve a copy of the model, SCIP says that it is infeasible. The problem seems to be more noticeable when presolving is turned off.
I'm using Windows with the pre-built SCIP v3.2.0. The model only has binary and integer variables.
The following code outlines my attempt:
SCIP* _scip;
SCIP* subscip;
SCIPcreate(&_scip);
SCIPincludeDefaultPlugins(_scip);
SCIPcreateProbBasic(_scip, "interval_solver"); // create an empty problem
SCIPsetPresolving(_scip, SCIP_PARAMSETTING_OFF, true); //disable presolving
// build model (snipped)
SCIPsolve(_scip); // succeeds and gives feasible solution
SCIP_Bool valid = FALSE;
SCIPcreate(&subscip);
SCIPcopy(_scip, subscip, NULL, NULL, "1", TRUE, FALSE, TRUE, &valid);
SCIPsolve(subscip); // infeasible
Something that might be related (and seems weird to me) is that after solving the original problem (and getting a feasible solution), checking the solution reports an infeasible result. i.e.
SCIP_SOL* sol = SCIPgetBestSol(_scip);
SCIPcheckSol(_scip, sol, TRUE, TRUE, TRUE, TRUE, &valid);
gives:
solution value 1 violates bounds of <t_x71_(6,1275,6805)_(9,1275,6805)>[-0,0] by 1
Any ideas why this could be happening? Thanks!

Propagation in SCIP may take into account the best solution known so far and do reductions which are only valid for the problem of finding a solution better than this.
For example, if you have a minimization problem with n variables x_1,...,x_n with objective coefficients c_1,...,c_n >= 0 and already found a solution with x_1 = 1, x_2 = ... = x_n = 0, then propagation will globally fix x_1 to 0, because the objective of any solution with x_1 = 1 will be at least as large as the objective of the solution you already found.
This means that the solutions found so far may not be feasible anymore for the remaining problem (which looks for a strictly better solution).
In order to check the solution, you should check it in the original problem space, which you can do with SCIPcheckSolOrig().
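For example, a minimal sketch reusing the variables from the question (the parameter order follows the SCIP 3.x API, so please compare with the SCIPcheckSolOrig() declaration in your version's scip.h):
SCIP_Bool feasible = FALSE;
// check the incumbent against the original (untransformed) problem and print any violations
SCIPcheckSolOrig(_scip, sol, &feasible, TRUE, TRUE);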
Disabling presolving and propagation might help, but it does not guarantee that the globally presolved problem stays unchanged. Presolving in the LP solver should not be the problem, but it might have changed the reported optimal LP solution (if there are multiple optima) and therefore caused a change in the solving process. This might have avoided your issue in this case, but probably by pure luck, and the issue may still appear on other instances. Moreover, the more features you disable, the larger the negative impact on performance.
However, there is an easy solution to your problem: You can copy the original unchanged problem by using SCIPcopyOrig().
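A hedged sketch of that approach, reusing the variables from the code in the question (the exact SCIPcopyOrig() argument list differs between SCIP releases; the 3.x-style call without the global flag is assumed here, and "orig" is just an arbitrary name suffix, so check your scip.h):
valid = FALSE;
SCIPcreate(&subscip);
// copy the original (untransformed) problem instead of the transformed one
SCIPcopyOrig(_scip, subscip, NULL, NULL, "orig", FALSE, TRUE, &valid);
SCIPsolve(subscip); // the copy now has the same feasible set as the original model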

Some of the variable bounds were still being presolved. To fix the issue I needed to add:
SCIPsetBoolParam(_scip, "lp/presolving", FALSE);
This fixed most things, but the following also helped fix some 'check solution' issues:
SCIPsetIntParam(_scip, "propagating/maxrounds", 0);
SCIPsetIntParam(_scip, "propagating/maxroundsroot", 0);

Related

Model is infeasible in Gurobi although it has a feasible solution

I am attempting to solve a non-convex quadratic optimization problem using Gurobi, but I have encountered an issue. I have a particular objective function; however, I am only interested in finding a feasible solution. To do this, I tried two approaches:
1. Set my objective function as the model objective and set the parameter "SolutionLimit" to 1. This works fine, and Gurobi gives me a feasible solution.
2. Give Gurobi no objective function (or set the objective to some constant like 0). In this case, Gurobi returns no feasible solution. The log it prints says:
Optimal solution found (tolerance 1.00e-04)
Warning: max constraint violation (1.5757e+01) exceeds tolerance
(model may be infeasible or unbounded - try turning presolve off)
Best objective -0.000000000000e+00, best bound -0.000000000000e+00, gap 0.0000%
I checked the solution it returned, and it is infeasible. I want the second method to work too. I have attempted to modify solver parameters (such as m.ModelSense = GRB.MAXIMIZE, m.params.MIPFocus = 3, m.params.NoRelHeurTime = 200, m.params.DualReductions = 0, m.params.Presolve = 2, and m.params.Crossover = 0) in an effort to resolve this issue but have been unsuccessful. Are there any other parameters that I can adjust in order to successfully solve this problem?
This model has numerical issues; to understand more, please see Guidelines for Numerical Issues in the Gurobi Reference Manual.

Event Handling for ordinary differential equations for billiards

I am currently interested in billiards, specifically in special billiards with a non-conventional reflection law and specific rules for the trajectories. I therefore need to calculate the trajectories using a differential equation solver. Finding one is not a problem at all; however, I still have trouble finding a suitable solution for the reflection. Previously I was working in Mathematica, whose numerical ODE solver has a WhenEvent option:
Example:
NDSolve[{y''[t] == -9.81, y[0] == 5, y'[0] == 0, WhenEvent[y[t] == 0, y'[t] -> -0.95 y'[t]]}, y, {t, 0, 10}];
The solution that this line of code gives is a bouncing ball.
Basically, after each integration step it checks whether the condition is true and, if so, it performs an action. (I suspect it checks whether y has switched sign. If, for example, one puts in WhenEvent[y[t]^2 == 0], the quantity does not switch sign and this method fails.)
Now I would like to switch from Mathematica to something that is more openly available (C++ or Python based), but I could not find anything that has this or a similar option. Does anyone perhaps have an idea what I could use instead? Basically, I am looking for the option to check for a condition after each integration step and, if the condition is met, perform an action on the solution.
Any help is appreciated.
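To make the requirement concrete, here is a rough, library-agnostic sketch in C of the step-and-check behaviour described above (a hypothetical fixed-step semi-implicit Euler loop for the bouncing ball, not production-quality integration):
#include <stdio.h>

int main(void)
{
    double t = 0.0, dt = 1e-4;
    double y = 5.0, v = 0.0;        /* y(0) = 5, y'(0) = 0 */

    while (t < 10.0)
    {
        v += -9.81 * dt;            /* integrate y'' = -9.81 */
        y += v * dt;
        t += dt;

        if (y <= 0.0 && v < 0.0)    /* event: the ball has reached the floor */
        {
            y = 0.0;                /* action: damped reflection, as in WhenEvent */
            v = -0.95 * v;
        }
    }
    printf("y(10) = %g\n", y);
    return 0;
}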

I got segmentation fault when changing feastol default value

I have a SCIP project to solve a binary problem with a nonlinear objective function. It works well, but for some instances I get a message saying "the best solution is not feasible", and there is some violation in the constraints (the violation is mostly very small).
To solve this issue, I added SCIP_CALL_EXC( SCIPsetRealParam(scip, "numerics/feastol", 1e-5) ) to change the default value of feastol. But I get a segmentation fault!
Following your helpful suggestion, the violation value is now much lower. My objective function is of the form Min: AX + L*Sqrt(BX). In the previous version, I had used an auxiliary variable, say Q, such that Q^2 - L^2(BX) >= 0, and the objective function was expressed as Min: AX + Q. In the new version, I changed the inequality sign into an equality and, in combination with SCIPsetRealParam(scip, "numerics/feastol", 1e-8), the violations in the constraints are much lower. What more can I do to decrease the violation value? Moreover, when I printed the value of feastol, I see numerics/lpfeastol = 1e-8, but lpfeastol is different from feastol! So why is lpfeastol modified instead? I see the modified value of lpfeastol printed on screen several times while SCIP is solving the problem. I appreciate your help in advance.
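For reference, my reading of the reformulation described above (an assumption on my part, written out only to make the follow-up easier to parse):

\min\; AX + L\sqrt{BX} \quad\longrightarrow\quad \min\; AX + Q \;\;\text{s.t.}\;\; Q^2 - L^2(BX) \ge 0,\; Q \ge 0,

so the nonlinearity is moved from the objective into the quadratic constraint on Q.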
Maybe try changing the define of UPGSCALE in src/scip/cons_soc.c to some larger value.
The output for lpfeastol is unfortunate, but normal. Reducing feastol automatically leads to adjusting lpfeastol, too. I cannot reproduce "when I printed the value of feastol, I see numerics/lpfeastol = 1e-8, but lpfeastol is different from feastol".

Root finding with a kinked function using NLsolve in Julia

I am currently trying to solve a complementarity problem with a function that features a downward discontinuity, using the mcpsolve() function of the NLsolve package in Julia. The function is reproduced here for specific parameters, and the numbers below refer to the three panels of the figure.
Unfortunately, the algorithm does not always return the interior solution, even though it exists:
In (1), when starting at 0, the algorithm stays at 0, thinking that the boundary constraint binds,
In (2), when starting at 0, the algorithm stops right before the downward jump, even though the solution lies to the right of this point.
These problems occur regardless of the method used - trust region or Newton's method. Ideally, the algorithm would look for potential solutions in the entire set before stopping.
I was wondering if some of you had worked with similar functions, and had found a clever solution to bypass these issues. Note that
Starting to the right of the solution would not solve these problems, as they would also occur for other parametrizations - see (3) this time,
It is not known a priori where in the parameter space the particular cases occur.
As an illustrative example, consider the following piece of code. Note that the function is smoother, and yet here as well the algorithm cannot find the solution.
function f!(x, fvec)
    if x[1] <= 1.8
        fvec[1] = 0.1 * (sin(3*x[1]) - 3)
    else
        fvec[1] = 0.1 * (x[1]^2 - 7)
    end
end
NLsolve.mcpsolve(f!,[0.], [Inf], [0.], reformulation = :smooth, autodiff = true)
Once more, setting the initial value to something other than 0 would only postpone the problem. Also, as pointed out by Halirutan, fzero from the Roots package would probably work, but I'd rather use mcpsolve(), as the problem is initially a complementarity problem.
Thank you very much in advance for your help.

Circumventing R's `Error in if (nbins > .Machine$integer.max)`

This is a saga which began with the problem of how to do survey weighting. Now that I appear to be doing that correctly, I have hit a bit of a wall (see previous post for details on the import process and where the strata variable came from):
> require(foreign)
> ipums <- read.dta('/path/to/data.dta')
> require(survey)
> ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt)
Error in if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
> traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) > 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata,
data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
This error seems to come from the tabulate function, which I hoped would be straightforward enough to circumvent, first by changing .Machine$integer.max
> .Machine$integer.max <- 2^40
and, when that didn't work, by replacing the whole source code of tabulate:
> tabulate <- function(bin, nbins = max(1L, bin, na.rm = TRUE))
{
    if (!is.numeric(bin) && !is.factor(bin))
        stop("'bin' must be numeric or a factor")
    #if (nbins > .Machine$integer.max)
    if (nbins > 2^40) # replacement line
        stop("attempt to make a table with >= 2^31 elements")
    .C("R_tabulate",
       as.integer(bin),
       as.integer(length(bin)),
       as.integer(nbins),
       ans = integer(nbins),
       NAOK = TRUE,
       PACKAGE = "base")$ans
}
Neither circumvented the problem. Apparently this is one reason why the ff package was created, but what worries me is the extent to which this is a problem I cannot avoid in R. This post seems to indicate that even if I were to use a package that would avoid this problem, I would only be able to access 2^31 elements at a time. My hope was to use sql (either sqlite or postgresql) to get around the memory problems, but I'm afraid I'll spend a while getting that to work, only to run into the same fundamental limit.
Attempting to switch back to Stata doesn't solve the problem either. Again see the previous post for how I use svyset, but the calculation I would like to run causes Stata to hang:
svy: mean age, over(strata)
Whether throwing more memory at it will solve the problem I don't know. I run R on my desktop which has 16 gigs, and I use Stata through a Windows server, currently setting memory allocation to 2000MB, but I could theoretically experiment with increasing that.
So in sum:
Is this a hard limit in R?
Would sql solve my R problems?
If I split it up into many separate files would that fix it (a lot of work...)?
Would throwing a lot of memory at Stata do it?
Am I seriously barking up the wrong tree somehow?
Yes, R uses 32-bit indexes for vectors, so they can contain no more than 2^31 - 1 entries, and you are trying to create something with 2^40. There is talk of introducing 64-bit indexes, but that is some way off from appearing in R. Vectors have the stated hard limit, and that is it as far as base R is concerned.
I am too unfamiliar with the details of what you are doing to offer any further advice on the other parts of your question.
Why do you want to work with the full data set? Wouldn't a smaller sample that can fit in to the restrictions R places on you be just as useful? You could use SQL to store all the data and query it from R to return a random subset of more appropriate size.
Since this question was asked some time ago, I'd like to point out that my answer here uses version 3.3 of the survey package.
If you check the code of svydesign, you can see that the function that causes all the problems is within a check step that determines whether you should set the nest parameter to TRUE or not. This step can be disabled by setting the option check.strata=FALSE.
Of course, you shouldn't disable a check step unless you know what you are doing. In this case, you should be able to decide yourself whether you need to set the nest option to TRUE or FALSE. nest should be set to TRUE when the same PSU (cluster) id is recycled in different strata.
Concretely for the IPUMS dataset, since you are using the serial variable for cluster identification and serial is unique for each household in a given sample, you may want to set nest to FALSE.
So, your survey design line would be:
ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt, check.strata=FALSE, nest=FALSE)
Extra advice: even after circumventing this problem you will find that the code is pretty slow unless you remap strata to a range from 1 to length(unique(ipums$strata)):
ipums$strata <- match(ipums$strata,unique(ipums$strata))
Both @Gavin and @Martin deserve credit for this answer, or at least for leading me in the right direction. I'm mostly answering it separately to make it easier to read.
In the order I asked:
Yes, 2^31 is a hard limit in R, though it seems to matter what type it is (which is a bit strange, given that the stated problem is the length of the vector rather than the amount of memory, which I have plenty of). Do not convert strata or id variables to factors; that will just fix their length and nullify the effects of subsetting (which is the way to get around this problem).
sql could probably help, provided I learn how to use it correctly. I did the following test:
library(multicore) # make svy fast!
ri.ny <- subset(ipums, statefips_num %in% c(36, 44))
ri.ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri.ny)
svyby(~incwage, ~strata, ri.ny.design, svymean, data=ri.ny, na.rm=TRUE, multicore=TRUE)
ri <- subset(ri.ny, statefips_num==44)
ri.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri)
ri.mean <- svymean(~incwage, ri.design, data=ri, na.rm=TRUE)
ny <- subset(ri.ny, statefips_num==36)
ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ny)
ny.mean <- svymean(~incwage, ny.design, data=ny, na.rm=TRUE, multicore=TRUE)
And found the means to be the same, which seems like a reasonable test.
So: in theory, provided I can split up the calculation by either using plyr or sql, the results should still be fine.
See 2.
Throwing a lot of memory at Stata definitely helps, but now I'm running into annoying formatting issues. I seem to be able to perform most of the calculation I want (much quicker and with more stability as well) but I can't figure out how to get it into the form I want. Will probably ask a separate question on this. I think the short version here is that for big survey data, Stata is much better out of the box.
In many ways, yes. Trying to do analysis with data this big is not something I should have taken on lightly, and I'm far from figuring it out even now. I was using the svydesign function correctly, but I didn't really know what was going on. I have a (very slightly) better grasp now, and it's heartening to know I was generally correct about how to solve the problem. @Gavin's general suggestion of trying out small data with external results to compare to is invaluable, something I should have started ages ago. Many thanks to both @Gavin and @Martin.