I wrote an optimization model in Gurobi but, because of some issues, I need to convert it to SciPy, and I am having difficulty doing so. Here is the part of the code that uses Gurobi:
m = Model()
#x is charging, discharging variable
x = m.addVars(n,lb=-1.5,ub=1.5,vtype=GRB.INTEGER, name="x")
#Y is SOC variable
Y = m.addVars(n+1,lb=0,ub=100,vtype=GRB.CONTINUOUS, name="Y")
# Add constraint: SOC[start]=50, initial SOC
m.addConstr(Y[0]==initialsoc,name='c1')
#Final targeted SOC
m.addConstr(Y[n]>=65,name='c2')
m.addConstrs((Y[i+1]-Y[i] == 3.75*x[i] for i in range(n)), name='c0')
#Objective function. 6 comes from capacity of inverter.
obj1=sum(((load[i+1]-(6*x[i]))*(load[i+1]-(6*x[i])) for i in range (n)))
m.setObjective(obj1,GRB.MINIMIZE)
m.optimize()
The variable x can only take the values -1, 0, or 1. The other constraint is on Y: at each step, Y[i+1] - Y[i] equals 3.75*x[i].
Is it possible to convert this code to SciPy? Or would you recommend another library?
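One possible starting point is the continuous relaxation of the model, sketched below under several assumptions. `scipy.optimize.minimize` only handles continuous variables, so the integrality of x is dropped here (x ranges over [-1, 1]). Rounding the result may violate the SOC constraints, so this is not equivalent to the Gurobi model. The function name `solve_relaxed`, the default values, and the elimination of Y via a cumulative sum (Y[k] = initial_soc + 3.75 * sum of x up to k) are choices made for this illustration, not part of the original code.

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize


def solve_relaxed(load, initial_soc=50.0, target_soc=65.0):
    """Continuous relaxation: x in [-1, 1] instead of x in {-1, 0, 1}."""
    load = np.asarray(load, dtype=float)
    n = len(load) - 1  # the objective uses load[1], ..., load[n]

    # Row k of L gives the SOC change after step k: 3.75 * (x[0] + ... + x[k]).
    L = 3.75 * np.tril(np.ones((n, n)))

    # 0 <= Y[k] <= 100 for every step, written in terms of x only.
    soc_limits = LinearConstraint(L, 0.0 - initial_soc, 100.0 - initial_soc)
    # Y[n] >= target_soc
    final_soc = LinearConstraint(3.75 * np.ones((1, n)),
                                 target_soc - initial_soc, np.inf)

    def objective(x):
        return np.sum((load[1:] - 6.0 * x) ** 2)

    def gradient(x):
        return -12.0 * (load[1:] - 6.0 * x)

    res = minimize(objective, np.zeros(n), jac=gradient, method="SLSQP",
                   bounds=[(-1.0, 1.0)] * n,
                   constraints=[soc_limits, final_soc])
    soc = initial_soc + np.concatenate(([0.0], L @ res.x))
    return res.x, soc


# Hypothetical load profile, for illustration only.
x_relaxed, soc = solve_relaxed([0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0])
print(x_relaxed)
print(soc)
```

If the integer restriction is essential, a mixed-integer solver that supports quadratic objectives is still needed; the relaxation only gives a lower bound on the objective and a continuous schedule.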
I want to use Gurobi to solve for a very simple LP:
minimize z
s.t. x + y <= z
where x, y, z are decision variables created with gp.Model().addVar() using its default settings. The objective of the model is set with m.setObjective(1.0*z, GRB.MINIMIZE).
I then solved the model, and the program reports that the optimal value of z is 0.000. I don't understand why this is the optimal value. Is there an implicit constraint on Gurobi's default decision variables, for example that they are non-negative? Otherwise, why is 0.0 the optimal value for this LP when x, y, and z are unbounded?
The convention in Gurobi and other LP/MIP solvers is that decision variables have a lower bound of zero. If you want a different lower bound, either set the LB attribute or pass it when you call Model.addVar(), e.g.:
m = Model()
x = m.addVar(lb=-20, name='x')
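The effect of the default lower bound can be reproduced without Gurobi. The sketch below uses `scipy.optimize.linprog`, whose `bounds` argument also defaults to (0, None) for every variable, on the same LP (minimize z subject to x + y <= z). The variable ordering [x, y, z] is an assumption made for this illustration.

```python
from scipy.optimize import linprog

# Variables are ordered [x, y, z]; minimize z.
c = [0.0, 0.0, 1.0]
# x + y <= z  is rewritten as  x + y - z <= 0
A_ub = [[1.0, 1.0, -1.0]]
b_ub = [0.0]

# Default bounds: every variable has lower bound 0, so z = 0 is optimal.
res_default = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res_default.fun)

# Free variables: z can decrease without limit, so no optimum exists.
res_free = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
print(res_free.status, res_free.message)
```

With free variables the solver should report that the problem has no finite optimum rather than returning 0.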
I have a curve in which Y initially increases linearly with X and then reaches a plateau at point C.
In other words, the curve can be defined as:
if X < C:
    Y = k * X + b
else:
    Y = k * C + b
The training data is a list of (X, Y) pairs. I need to determine k, b and C through a machine learning approach (or similar), since the data is noisy and the inflection point C changes over time. I want something more robust than reading C off the current sample data by eye.
How can I do it using sklearn or maybe scipy?
Without loss of generality, the second branch is just a constant (call it Y0 = k * C + b):
Y = Y0
So it looks like you need a linear regression to fit the line, plus a detection step to find where the plateau starts.
You know that for high values of X (X > C) you are already on the plateau, so check how far back down the X values you keep getting the same constant.
Then run a linear regression on the points with X <= C to find the line.
Your model is nonlinear.
I think the smartest way to solve this is with these steps:
Find the maximum value of Y, which is equal to k*C+b:
M=max(Y)
Drop this maximum value from your dataset:
df1 = df[df.Y != M]
Then you have a simple dataset for fitting Y against X, and you can use sklearn for that.
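As an alternative to the two-step approaches above, all three parameters can be fitted at once, because the curve can be written as Y = k * min(X, C) + b. The following is a minimal sketch using `scipy.optimize.curve_fit`. The function name `plateau`, the synthetic data, and the initial guess `p0` are assumptions made for illustration; on real data a reasonable initial guess for C matters, since the model is not smooth in C.

```python
import numpy as np
from scipy.optimize import curve_fit


def plateau(x, k, b, c):
    # Linear up to x = c, constant (k * c + b) afterwards.
    return k * np.minimum(x, c) + b


# Synthetic noisy data with known parameters k=2, b=1, C=6 (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = plateau(x, 2.0, 1.0, 6.0) + rng.normal(0.0, 0.05, x.size)

# Rough initial guess: unit slope, zero intercept, breakpoint at the median X.
p0 = [1.0, 0.0, np.median(x)]
(k_hat, b_hat, c_hat), _ = curve_fit(plateau, x, y, p0=p0)
print(k_hat, b_hat, c_hat)
```

Because every sample contributes to the fit, the estimate of C does not depend on a single maximum value of Y, which helps when the data is noisy.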
I am trying to use PyTorch for non-convex optimisation, maximising my objective (so minimising its negative with SGD). I would like to bound my variable x > 0, and also keep the sum of my x values below 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but I am struggling with the bounds on x. In PyTorch you can set bounds using clamp, but it doesn't seem appropriate in this case. I think this is because optim needs the gradients free under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np

def objective(x, a, b, c):   # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * (x)))
    # Checking constraint
    exceeded_limit = constraint(x).item()
    #print(exceeded_limit)
    obj = torch.sum(d * (b * c - x))
    # If overlimit add ramp penalty
    if exceeded_limit < 0:
        obj = obj - (exceeded_limit * 10)
        print("Exceeded limit")
    return - obj

def constraint(x, limit = 1000): # Must be > 0
    return limit - x.sum()

N = 1000
# x is variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)

# Would like to include the clamp
# x = torch.clamp(x, min=0)

# Non-convex method
opt = torch.optim.SGD([x], lr=.01)
for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()
    # Evaluating the objective
    obj = objective(x, a, b, c)
    # Calculate gradients
    obj.backward()
    opt.step()
    if i%1000==0: print("Objective: %.1f" % -obj.item())

print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x<0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions as to how to impose the bounds would be appreciated, either using clamp or otherwise, as would general advice on non-convex optimisation using PyTorch. This is a much simpler, scaled-down version of the problem I'm working on, so I am trying to find a lightweight solution if possible. I am considering a workaround such as transforming the x variable with an exponential function, but then I would have to scale the function to stop the positive values becoming infinite, and I want some flexibility in setting the constraint.
I ran into the same problem.
I also wanted to apply bounds to a variable in PyTorch.
I solved it with Way3 below.
Your example is a little complex and I am still learning English,
so I give a simpler example below.
For example, suppose there is a trainable variable v whose bounds are (-1, 1):
v = torch.tensor((0.5, ), requires_grad=True)
v_loss = xxxx
optimizer.zero_grad()
v_loss.backward()
optimizer.step()
Way1. Fails with RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
v.clamp_(-1, 1)
Way2. Fails with RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.
v = torch.clamp(v, -1, +1) # equivalent to v = v.clamp(-1, +1)
Way3. No error. This is how I solved the problem.
with torch.no_grad():
    v[:] = v.clamp(-1, +1) # You must use v[:] = xxx instead of v = xxx
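Put together, Way3 amounts to a projection step after every optimizer update. Here is a minimal self-contained sketch, assuming a toy quadratic loss whose unconstrained minimum (`target = 3.0`, a hypothetical value) lies outside the bounds, so the clamp is what keeps v inside (-1, 1). It uses the in-place `clamp_` inside `torch.no_grad()`, which avoids the Way1 error because autograd is not recording at that point.

```python
import torch

v = torch.tensor([0.5], requires_grad=True)
optimizer = torch.optim.SGD([v], lr=0.1)
target = 3.0  # hypothetical unconstrained optimum, outside the bounds

for _ in range(100):
    optimizer.zero_grad()
    loss = ((v - target) ** 2).sum()
    loss.backward()
    optimizer.step()
    # Project v back into [-1, 1] without recording the operation in autograd.
    with torch.no_grad():
        v.clamp_(-1.0, 1.0)

print(v)
```

The variable stays a leaf tensor with `requires_grad=True`, so the optimizer keeps updating the same object on the next iteration.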
I'm trying to optimize a boolean configuration x such that the total price of the configuration (base_price + discount) is minimized, but the problem formulation raises a matmul error because x is a cvxpy Variable and so doesn't conform to the NumPy shape, even though it was defined with the correct length.
n = len(Configuration)
x = cp.Variable(n, boolean=True)
problem = cp.Problem(cp.Minimize(base_price + price @ (price_rules_A @ x <= price_rules_B)), [
    config_rules_A @ x <= config_rules_B,
    config_rules_2A @ x == config_rules_2B
])
# where price @ (price_rules_A @ x <= price_rules_B) is the total discount
# and price, price_rules_A and price_rules_B are numpy arrays
The error I get is
ValueError: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)
I expect it to find an optimal config for x (0010110...) such that the discount is minimized, but it doesn't. Any idea what might be causing this?
Assuming the inequality in the objective function is supposed to act as an index into price, you can rewrite the objective as
cp.Minimize(base_price + price @ (1 - (price_rules_B - price_rules_A @ x)))
Then the elements of price where the inequality is true will be summed.
As a warm-up to writing my own elastic net solver, I'm trying to get a fast enough version of ordinary least squares implemented using coordinate descent.
I believe I've implemented the coordinate descent algorithm correctly, but when I use the "fast" version (see below), the algorithm is insanely unstable, outputting regression coefficients that routinely overflow a 64-bit float when the number of features is of moderate size compared to the number of samples.
Linear Regression and OLS
If b = A*x, where A is a matrix, x is a vector of the unknown regression coefficients, and b is the output, I want to find the x that minimizes
||b - Ax||^2
If A[j] is the jth column of A and A[-j] is A without column j, and the columns of A are normalized so that ||A[j]||^2 = 1 for all j, the coordinate-wise update is then
Coordinate Descent:
x[j] <-- A[j]^T * (b - A[-j] * x[-j])
I'm following along with these notes (page 9-10) but the derivation is simple calculus.
It's pointed out that instead of recomputing A[j]^T(b - A[-j] * x[-j]) all the time, a faster way to do it is with
Fast Coordinate Descent:
x[j] <-- A[j]^T*r + x[j]
where the total residual r = b - Ax is computed outside the loop over coordinates. The equivalence of these update rules follows from noting that Ax = A[j]*x[j] + A[-j]*x[-j] and rearranging terms.
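Written out, the rearrangement looks like this (a short derivation using only the definitions above, with $A_j$ for A[j], $A_{-j}$ for A[-j], and the normalization $A_j^T A_j = 1$):

```latex
\begin{align*}
r &= b - Ax = b - A_{-j}x_{-j} - A_j x_j \\
A_j^T r + x_j
  &= A_j^T\left(b - A_{-j}x_{-j}\right) - \left(A_j^T A_j\right)x_j + x_j \\
  &= A_j^T\left(b - A_{-j}x_{-j}\right)
\end{align*}
```

Note that the identity relies on r being the residual for the current value of x, including any coordinates already updated in the same sweep.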
My problem is that while the second method is indeed faster, it's wildly numerically unstable for me whenever the number of features isn't small compared to the number of samples. I was wondering if anyone might have some insight as to why that's the case. I should note that the first method, which is more stable, still starts disagreeing with more standard methods as the number of features approaches the number of samples.
Julia code
Below is some Julia code for the two update rules:
function OLS_builtin(A,b)
    x = A\b
    return(x)
end

function OLS_coord_descent(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        for j = 1:P
            x[j] = dot(A[:,j], b - A[:,1:P .!= j]*x[1:P .!= j])
        end
    end
    return(x)
end

function OLS_coord_descent_fast(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        r = b - A*x
        for j = 1:P
            x[j] += dot(A[:,j],r)
        end
    end
    return(x)
end
Example of the problem
I generate data with the following:
n = 100
p = 50
σ = 0.1
β_nz = float([i*(-1)^i for i in 1:10])
β = append!(β_nz,zeros(Float64,p-length(β_nz)))
X = randn(n,p); X .-= mean(X,1); X ./= sqrt(sum(abs2(X),1))
y = X*β + σ*randn(n); y .-= mean(y);
Here I use p=50, and I get good agreement between OLS_coord_descent(X,y) and OLS_builtin(X,y), whereas OLS_coord_descent_fast(X,y) returns exponentially large values for the regression coefficients.
When p is less than about 20, OLS_coord_descent_fast(X,y) agrees with the other two.
Conjecture
Since things agree in the regime p << n, I think the algorithm is formally correct but numerically unstable. Does anyone have any thoughts on whether this guess is correct and, if so, how to correct the instability while retaining most of the performance gains of the fast version of the algorithm?
The quick answer: you forgot to update r after each x[j] update. Here is the fixed function, which behaves like OLS_coord_descent:
function OLS_coord_descent_fast(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        r = b - A*x
        for j = 1:P
            x[j] += dot(A[:,j],r)
            r -= A[:,j]*dot(A[:,j],r)   # Add this line
        end
    end
    return(x)
end
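For readers who want to check the fix outside Julia, here is a NumPy sketch of the same corrected update, compared against `numpy.linalg.lstsq` on data generated like the question's example. The function name `ols_coord_descent_fast`, the seed, and the data are illustrative assumptions. The inner-product step is stored in `delta` so that the same value updates both x[j] and r. Without the residual update, each sweep reduces to x <- x + A^T(b - Ax), a plain gradient step with step size 1, which can diverge when the largest eigenvalue of A^T A exceeds 2; that is consistent with the blow-up appearing only once p is not small relative to n.

```python
import numpy as np


def ols_coord_descent_fast(A, b, cycles=1000):
    """Coordinate descent for least squares; columns of A must have unit norm."""
    n, p = A.shape
    x = np.zeros(p)
    r = b - A @ x  # residual for the current x
    for _ in range(cycles):
        for j in range(p):
            delta = A[:, j] @ r    # change in x[j]
            x[j] += delta
            r -= A[:, j] * delta   # keep r consistent with the updated x
    return x


# Synthetic data in the style of the question (n=100, p=50, sigma=0.1).
rng = np.random.default_rng(0)
n, p = 100, 50
A = rng.standard_normal((n, p))
A -= A.mean(axis=0)
A /= np.linalg.norm(A, axis=0)  # normalize columns to unit norm
beta = np.concatenate(([i * (-1) ** i for i in range(1, 11)], np.zeros(p - 10)))
b = A @ beta + 0.1 * rng.standard_normal(n)
b -= b.mean()

x_cd = ols_coord_descent_fast(A, b)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_cd - x_ls)))
```

Updating r in place costs one extra vector operation per coordinate, so the per-sweep cost remains far below recomputing the partial residual for every j.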