Why am I getting the same answers in Gurobi when finding multiple solutions?

I am using Gurobi to solve an optimization problem. In my problem, the goal is to analyze as many solutions as possible. For this purpose, I am using the parameter:
PoolSearchMode=2
in Gurobi to find multiple solutions. But when I retrieve the solutions, some of them are identical. For example, if it returns 100 solutions, half of them are duplicates and I actually have only 50 distinct solutions.
In more detail, I am trying to find sets of nodes in a graph that have a special feature. So I have set the parameter "PoolSearchMode" to 2, which causes the MIP solver to do a systematic search for the n best solutions. I have defined a counter "best" to count the solutions whose "PoolObjVal" equals the best "objVal". Below is part of my code:
m.Params.PoolSearchMode = 2
m.Params.PoolSolutions = 100
b = m.addVars(Edges, vtype=GRB.BINARY, name="b")
.
.
.
if m.status == GRB.Status.OPTIMAL:
    best = 0
    for key in range(m.SolCount):
        m.setParam(GRB.Param.SolutionNumber, key)
        if m.objVal == m.PoolObjVal:
            best += 1
    optimal_sets = [[] for i in range(best)]
    for key in range(best):
        m.setParam(GRB.Param.SolutionNumber, key)
        for e in Edges:
            if b[e].Xn > 0 and b[e].varname[2:] == "{}".format(External_node):
                optimal_sets[key].append(int(b[e].varname[0:2]))
    return optimal_sets
I have checked, and if a graph does not have 100 solutions it returns fewer. But even in these smaller sets there are duplicates, like:
[1,2,3],
[1,2,3],
[1,3,5]
How can I fix this issue to get different solutions?

It seems that Gurobi returns multiple solutions to the MIP that map to the same solution of your underlying problem. I'm sure that if you checked the entire solution vector, all of these solutions would indeed be different; you are only looking at the subset of the variables that you project into optimal_sets.
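As a quick way to check this, here is a minimal sketch (using the m, b, Edges, and External_node from your code, and assuming all decision variables are binary or integer) that compares the full solution vectors and deduplicates the projected node sets:

from gurobipy import GRB

def distinct_optimal_sets(m, b, Edges, External_node, tol=0.5):
    # Collect pool solutions, keeping only those that match the best objective.
    full_vectors = set()
    projected_sets = set()
    for k in range(m.SolCount):
        m.setParam(GRB.Param.SolutionNumber, k)
        if abs(m.PoolObjVal - m.objVal) > 1e-6:
            continue
        # Full solution vector, rounded so tiny numerical noise does not matter
        # (assumes binary/integer variables, as in the question).
        full_vectors.add(tuple(round(v.Xn) for v in m.getVars()))
        # Projection onto the node sets you actually care about.
        nodes = tuple(sorted(int(b[e].varname[0:2]) for e in Edges
                             if b[e].Xn > tol
                             and b[e].varname[2:] == "{}".format(External_node)))
        projected_sets.add(nodes)
    print(len(full_vectors), "distinct full solutions,",
          len(projected_sets), "distinct projected sets")
    return [list(s) for s in projected_sets]

If the two counts differ, the "duplicates" really are different assignments of the remaining variables, and either deduplicating the projection (as above) or adding constraints that force the projected variables to differ is the way to get distinct sets.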

Related

Model is infeasible in Gurobi although it has a feasible solution

I am attempting to solve a non-convex quadratic optimization problem using Gurobi, but I have encountered an issue. I have a specific objective function, but I am only interested in finding a feasible solution. To do this, I tried two approaches:
1- set my specific objective function as the model objective and set the parameter "SolutionLimit" to 1. This works fine, and Gurobi gives me a feasible solution.
2- give Gurobi no objective function (or set the objective to some arbitrary number like 0). In this case, Gurobi returns no feasible solution. The log it prints says:
Optimal solution found (tolerance 1.00e-04)
Warning: max constraint violation (1.5757e+01) exceeds tolerance
(model may be infeasible or unbounded - try turning presolve off)
Best objective -0.000000000000e+00, best bound -0.000000000000e+00, gap 0.0000%
I checked the solution it returned, and it is infeasible. I want the second method to work as well. I have tried modifying solver parameters (such as m.ModelSense = GRB.MAXIMIZE, m.params.MIPFocus = 3, m.params.NoRelHeurTime = 200, m.params.DualReductions = 0, m.params.Presolve = 2, and m.params.Crossover = 0) in an effort to resolve this issue, but without success. Are there any other parameters I can adjust in order to solve this problem?
This model has numerical issues; note the max constraint violation of 1.5757e+01 reported in the log. To understand more, please see the Guidelines for Numerical Issues section in the Gurobi Reference Manual.
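Beyond that, here is a hedged sketch of the kind of settings those guidelines suggest trying, assuming a gurobipy model m already built as in the question; whether they help depends on the model, and rescaling the data itself (coefficients and bounds) usually matters more:

from gurobipy import GRB

# Standard Gurobi parameters; m is the model from the question.
m.Params.NonConvex = 2          # allow non-convex quadratic terms
m.Params.NumericFocus = 3       # spend more effort on numerical accuracy
m.Params.FeasibilityTol = 1e-9  # tighten the feasibility tolerance
m.Params.ScaleFlag = 2          # more aggressive scaling
m.Params.Presolve = 0           # as the log suggests, try turning presolve off

# For "find any feasible point", a zero objective plus SolutionLimit = 1
# stops the search as soon as one feasible solution is found.
m.setObjective(0, GRB.MINIMIZE)
m.Params.SolutionLimit = 1
m.optimize()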

How to get list of basic variables from JuMP/Gurobi?

I'm solving an LP in Julia/JuMP using Gurobi as my solver. I want to know which variables in my solution are basic. How do I obtain this information?
Just checking which variables are non-zero is not sufficient, since we might be dealing with a degenerate solution (i.e., basic variables equal to zero).
I found two similar questions online, but the solutions suggested do not seem to apply to the current version of JuMP anymore (or am I missing something?):
https://discourse.julialang.org/t/how-to-obtain-a-basic-solution/1784
https://groups.google.com/g/julia-opt/c/vJ6QuFBfPbw?pli=1
There's a Gurobi attribute called VBasis (https://www.gurobi.com/documentation/9.0/refman/vbasis.html) that seems to be what I'm looking for, but I don't know how to access it.
This is poorly documented, but you can see which constraints are basic:
model = Model(Gurobi.Optimizer)
@variable(model, x >= 0)
@constraint(model, c, 2x >= 1)
@objective(model, Min, x)
optimize!(model)
julia> MOI.get(model, MOI.ConstraintBasisStatus(), c)
NONBASIC::BasisStatusCode = 1
julia> MOI.get(model, MOI.ConstraintBasisStatus(), LowerBoundRef(x))
BASIC::BasisStatusCode = 0
Note that since variables can have lower and upper bounds, we report which constraints are basic, rather than which variables are.
Documentation:
https://jump.dev/MathOptInterface.jl/stable/apireference/#MathOptInterface.ConstraintBasisStatus
https://jump.dev/MathOptInterface.jl/stable/apireference/#MathOptInterface.BasisStatusCode

Pyomo: Unbounded objective function though bounded

I am currently implementing an optimization problem with Pyomo, and for some hours now I have been getting the message that my problem is unbounded. After searching for the cause, I came across one term which seems to be unbounded. I excluded this term from the objective function, and it then takes a very large negative value, which supports the assumption that it is unbounded towards -Inf.
But I have checked the problem further, and it should be impossible for the term to be unbounded, as the following code and results show:
model.nominal_cap_storage = Var(model.STORAGE, bounds=(0,None)) #lower bound is 0
#I assumed very high CAPEX for each storage (see print)
dict_capex_storage = {'battery': capex_battery_storage,
                      'co2': capex_co2_storage,
                      'hydrogen': capex_hydrogen_storage,
                      'heat': capex_heat_storage,
                      'syncrude': capex_syncrude_storage}
print(dict_capex_storage)
>>> {'battery': 100000000000000000, 'co2': 100000000000000000,
'hydrogen': 1000000000000000000, 'heat': 1000000000000000, 'syncrude': 10000000000000000000}
From these assumptions I conclude that it should be impossible for this term to be unbounded towards -Inf, as the capacity has a lower bound of 0 and the CAPEX is a positive fixed value. But now it gets strange. The following term has the issue of being unbounded:
model.total_investment_storage = Var()
def total_investment_storage_rule(model):
    return model.total_investment_storage == sum(model.nominal_cap_storage[storage] * dict_capex_storage[storage] \
                                                 for storage in model.STORAGE)
model.total_investment_storage_con = Constraint(rule=total_investment_storage_rule)
If I exclude the term from the objective function, I get the following value after the optimization. It seems that it can take large negative values.
>>>>
Variable total_investment_storage
-1004724108.3426505
So I checked the term regarding the component model.nominal_cap_storage to see the value of the capacity:
model.total_cap_storage = Var()
def total_cap_storage_rule(model):
    return model.total_cap_storage == sum(model.nominal_cap_storage[storage] for storage in model.STORAGE)
model.total_cap_storage_con = Constraint(rule=total_cap_storage_rule)
>>>>
Variable total_cap_storage
0.0
I did the same for the dictionary, but made a mistake: I forgot to remove model.nominal_cap_storage from the sum. Yet the result is confusing:
model.total_capex_storage = Var()
def total_capex_storage_rule(model):
    return model.total_capex_storage == sum(model.nominal_cap_storage[storage] * dict_capex_storage[storage] \
                                            for storage in model.STORAGE)
model.total_capex_storage_con = Constraint(rule=total_capex_storage_rule)
>>>>
Variable total_capex_storage
0.0
So my question is: why is the term unbounded, and how is it possible that model.total_investment_storage and model.total_capex_storage have different solutions even though both are calculated the same way? Any help is highly appreciated.
I think you are misinterpreting "unbounded." When the solver says the problem is unbounded, it means the objective function value is unbounded given the variables and constraints in the problem. It has nothing to do with bounds on individual variables, unless one of those bounds is what prevents the objective from being unbounded.
If you want help with the above problem, you need to edit your question and post the full problem, with the objective function and (if possible) the error. What you have now is a collection of snippets from different variations of a problem, which isn't really informative about the overall issue.
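To illustrate the distinction, here is a minimal sketch (not your model; all names are made up) of an unbounded Pyomo problem. The "unbounded" report is about the objective, and when a problem is unbounded the variable values reported back are not meaningful, which is why a variable defined by an equality like total_investment_storage_rule can appear to take impossible negative values.

from pyomo.environ import ConcreteModel, Var, Objective, Constraint, NonNegativeReals, minimize, SolverFactory

m = ConcreteModel()
m.cap = Var(within=NonNegativeReals)   # like nominal_cap_storage: bounded below by 0
m.cost = Var()                         # like total_investment_storage: a free auxiliary Var
m.free = Var()                         # some other unbounded variable in the objective

m.link = Constraint(expr=m.cost == 10.0 * m.cap)          # cost itself cannot be negative here...
m.obj = Objective(expr=m.cost + m.free, sense=minimize)   # ...but m.free drives the objective to -Inf

# Any LP solver will do; glpk is just an example.
results = SolverFactory("glpk").solve(m, load_solutions=False)
print(results.solver.termination_condition)   # expected: unbounded (or infeasibleOrUnbounded)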
I solved the problem by setting a lower bound on the term that was taking a negative value:
model.total_investment_storage = Var(bounds=(0, None))
I am still not sure why this term can take negative values, but this at least solved my problem.

Setting initial values for non-linear parameters via tabuSearch

I'm trying to fit the LPPL model to the KLSE index to predict the most probable crash time. Many papers suggest tabu search to identify initial values for the non-linear parameters, but none of them publish their code. I have tried to fit the index using NLS and the Log-Periodic Power Law (LPPL) in R, but the resulting errors and p-values are not significant. I believe the initial values are not accurate. Can anyone help me find proper initial values?
library(tseries)
library(zoo)
ts<-get.hist.quote(instrument="^KLSE",start="2003-04-18",end="2008-01-30",quote="Close",provider="yahoo",origin="1970-01-01",compression="d",retclass="zoo")
df<-data.frame(ts)
df<-data.frame(Date=as.Date(rownames(df)),Y=df$Close)
df<-df[!is.na(df$Y),]
library(minpack.lm)
library(ggplot2)
df$days<-as.numeric(df$Date-df[1,]$Date)
f<-function(pars,xx){pars$a + (pars$tc - xx)^pars$m *(pars$b+ pars$c * cos(pars$omega*log(pars$tc - xx) + pars$phi))}
resids<-function(p,observed,xx){df$Y-f(p,xx)}
nls.out <- nls.lm(par=list(a=600,b=-266,tc=3000, m=.5,omega=7.8,phi=-4,c=-14),fn = resids, observed = df$Y, xx = df$days, control= nls.lm.control (maxiter =1024, ftol=1e-6, maxfev=1e6))
par<-nls.out$par
nls.final<-nls(Y~(a+(tc-days)^m*(b+c*cos(omega*log(tc-days)+phi))),data=df,start=par,algorithm="plinear",control=nls.control(maxiter=10024,minFactor=1e-8))
summary(nls.final)
I would look at some of the newer research on this topic; there is a good trigonometric modification that will practically guarantee a single optimum. Additionally, you can use R's built-in linear equation solver to find the linearizable parameters, so you only need to optimize over 3 dimensions. The link below should get you started. Based on recent literature and personal experience, I would strongly advise against a tabu search.
https://www.ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/MAS_final_Tuncay.pdf
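For reference, here is a minimal sketch of the "solve for the linearizable parameters" step. For fixed non-linear parameters (tc, m, omega, phi), the LPPL model y = a + b*(tc - t)^m + c*(tc - t)^m * cos(omega*log(tc - t) + phi) is linear in (a, b, c), so those can be recovered by ordinary least squares and the outer search only has to cover the non-linear parameters. The question is in R, but the algebra is the same in any language; this illustration uses Python/NumPy and made-up synthetic data.

import numpy as np

def lppl_linear_fit(t, y, tc, m, omega, phi):
    # Solve for (a, b, c) by least squares, given the non-linear parameters.
    dt = tc - t                      # requires tc > max(t)
    g = dt ** m
    h = g * np.cos(omega * np.log(dt) + phi)
    X = np.column_stack([np.ones_like(t), g, h])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef                      # (a, b, c)

# Synthetic example: recover roughly a=600, b=-2, c=0.5 with tc=1200, m=0.5, omega=7.8, phi=-4
t = np.arange(0.0, 1000.0)
rng = np.random.default_rng(0)
y = 600 + (1200 - t) ** 0.5 * (-2 + 0.5 * np.cos(7.8 * np.log(1200 - t) - 4)) + rng.normal(0, 1, t.size)
print(lppl_linear_fit(t, y, tc=1200.0, m=0.5, omega=7.8, phi=-4.0))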

Circumventing R's `Error in if (nbins > .Machine$integer.max)`

This is a saga which began with the problem of how to do survey weighting. Now that I appear to be doing that correctly, I have hit a bit of a wall (see previous post for details on the import process and where the strata variable came from):
> require(foreign)
> ipums <- read.dta('/path/to/data.dta')
> require(survey)
> ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt)
Error in if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
> traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) > 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata,
data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
This error seems to come from the tabulate function, which I hoped would be straightforward enough to circumvent, first by changing .Machine$integer.max
> .Machine$integer.max <- 2^40
and, when that didn't work, by replacing the whole source code of tabulate:
> tabulate <- function(bin, nbins = max(1L, bin, na.rm=TRUE))
{
    if(!is.numeric(bin) && !is.factor(bin))
        stop("'bin' must be numeric or a factor")
    #if (nbins > .Machine$integer.max)
    if (nbins > 2^40) #replacement line
        stop("attempt to make a table with >= 2^31 elements")
    .C("R_tabulate",
       as.integer(bin),
       as.integer(length(bin)),
       as.integer(nbins),
       ans = integer(nbins),
       NAOK = TRUE,
       PACKAGE="base")$ans
}
Neither circumvented the problem. Apparently this is one reason why the ff package was created, but what worries me is the extent to which this is a problem I cannot avoid in R. This post seems to indicate that even if I were to use a package that avoids this problem, I would only be able to access 2^31 elements at a time. My hope was to use SQL (either SQLite or PostgreSQL) to get around the memory problems, but I'm afraid I'll spend a while getting that to work only to run into the same fundamental limit.
Attempting to switch back to Stata doesn't solve the problem either. Again see the previous post for how I use svyset, but the calculation I would like to run causes Stata to hang:
svy: mean age, over(strata)
Whether throwing more memory at it will solve the problem I don't know. I run R on my desktop which has 16 gigs, and I use Stata through a Windows server, currently setting memory allocation to 2000MB, but I could theoretically experiment with increasing that.
So in sum:
1. Is this a hard limit in R?
2. Would SQL solve my R problems?
3. If I split it up into many separate files, would that fix it (a lot of work...)?
4. Would throwing a lot of memory at Stata do it?
5. Am I seriously barking up the wrong tree somehow?
Yes, R uses 32-bit indexes for vectors, so they can contain no more than 2^31 - 1 entries, and you are trying to create something with 2^40. There is talk of introducing 64-bit indexes, but that is some way off from appearing in R. Vectors have the stated hard limit, and that is it as far as base R is concerned.
I am unfamiliar with the details of what you are doing to offer any further advice on the other parts of your Q.
Why do you want to work with the full data set? Wouldn't a smaller sample that fits into the restrictions R places on you be just as useful? You could use SQL to store all the data and query it from R to return a random subset of a more appropriate size.
Since this question was asked some time ago, I'd like to point out that my answer here uses version 3.3 of the survey package.
If you check the code of svydesign, you can see that the function causing all the trouble is part of a check step that decides whether you should set the nest parameter to TRUE or not. This step can be disabled by setting the option check.strata=FALSE.
Of course, you shouldn't disable a check step unless you know what you are doing. In this case, you should be able to decide yourself whether you need to set the nest option to TRUE or FALSE. nest should be set to TRUE when the same PSU (cluster) id is recycled in different strata.
Concretely for the IPUMS dataset, since you are using the serial variable for cluster identification and serial is unique for each household in a given sample, you may want to set nest to FALSE.
So, your survey design line would be:
ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt, check.strata=FALSE, nest=FALSE)
Extra advice: even after circumventing this problem you will find that the code is pretty slow unless you remap strata to a range from 1 to length(unique(ipums$strata)):
ipums$strata <- match(ipums$strata,unique(ipums$strata))
Both @Gavin and @Martin deserve credit for this answer, or at least for leading me in the right direction. I'm mostly answering it separately to make it easier to read.
In the order I asked:
1. Yes, 2^31 is a hard limit in R, though it seems to matter what type it is (which is a bit strange, given that it is the length of the vector, rather than the amount of memory (which I have plenty of), that is the stated problem). Do not convert strata or id variables to factors; that will just fix their length and nullify the effects of subsetting (which is the way to get around this problem).
2. SQL could probably help, provided I learn how to use it correctly. I did the following test:
library(multicore) # make svy fast!
ri.ny <- subset(ipums, statefips_num %in% c(36, 44))
ri.ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri.ny)
svyby(~incwage, ~strata, ri.ny.design, svymean, data=ri.ny, na.rm=TRUE, multicore=TRUE)
ri <- subset(ri.ny, statefips_num==44)
ri.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri)
ri.mean <- svymean(~incwage, ri.design, data=ri, na.rm=TRUE)
ny <- subset(ri.ny, statefips_num==36)
ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ny)
ny.mean <- svymean(~incwage, ny.design, data=ny, na.rm=TRUE, multicore=TRUE)
And found the means to be the same, which seems like a reasonable test.
So, in theory, provided I can split up the calculation using either plyr or SQL, the results should still be fine.
3. See 2.
4. Throwing a lot of memory at Stata definitely helps, but now I'm running into annoying formatting issues. I seem to be able to perform most of the calculations I want (much more quickly and with more stability as well), but I can't figure out how to get the output into the form I want. I will probably ask a separate question on this. I think the short version here is that, for big survey data, Stata is much better out of the box.
5. In many ways, yes. Trying to do analysis with data this big is not something I should have taken on lightly, and I'm far from figuring it out even now. I was using the svydesign function correctly, but I didn't really know what was going on. I have a (very slightly) better grasp now, and it's heartening to know I was generally correct about how to solve the problem. @Gavin's general suggestion of trying out small data sets with external results to compare against is invaluable, something I should have started ages ago. Many thanks to both @Gavin and @Martin.