Using the piecewise function of the IBM CPLEX python API, but the problem cannot be solved - optimization

I try to use MILP (Mixed Integer Linear Programming) to calculate the unit commitment problem. (unit commitment: An optimization problem trying to find the best scheduling of generator)
Because the relationship between generator power and cost is a quadratic function, so I use piecewise function to convert power to cost.
I modify the answer on this page:
unit commitment problem using piecewise-linear approximation become MIQP
The simple program structure is like this:
from docplex.mp.model import Model
mdl = Model(name='buses')
nbbus40 = mdl.integer_var(name='nbBus40')
nbbus30 = mdl.integer_var(name='nbBus30')
mdl.add_constraint(nbbus40*40 + nbbus30*30 >= 300, 'kids')
#after 4 buses, additional buses of a given size are cheaper
f1=mdl.piecewise(0, [(0,0),(4,2000),(10,4400)], 0.8)
f2=mdl.piecewise(0, [(0,0),(4,1600),(10,3520)], 0.8)
cost1= f1(nbbus40)
cost2 = f2(nbbus30)
mdl.minimize(cost1+ cost1)
mdl.solve()
mdl.report()
for v in mdl.iter_integer_vars():
print(v," = ",v.solution_value)
which gives
* model buses solved with
objective = 3520.000
nbBus40 = 0
nbBus30 = 10.0
The answer is perfect but there is no way to apply my example.
I used a piecewise function to formulate a piecewise linear relationship between power and cost, and got a new object (cost1), and then calculated the minimum value of this object.
The following is my actual code(simply):
(min1,miny1), (pw1_1,pw1_1y),(pw1_2,pw1_2y), (max1,maxy1) are the breakpoints on the power-cost curve.
pwl_func_1phase = ucpm.piecewise(
0,
[(0,0),(min1,miny1),
(pw1_1,pw1_1y),
(pw1_2,pw1_2y),
(max1,maxy1)
],
0
)
#df_decision_vars_spinning is a dataframe store Optimization variables
df_decision_vars_spinning.at[
(units,period),
'variable_cost'
] = pwl_func_1phase(
df_decision_vars_spinning.at[
(units,period),
'production'
]
)
total_variable_cost = ucpm.sum(
(df_decision_vars_spinning.variable_cost))
ucpm.minimize(total_variable_cost )
I don’t know what causes this optimization problem can't be solve. Here is my complete code :
https://colab.research.google.com/drive/1JSKfOf0Vzo3E3FywsxcDdOz4sAwCgOHd?usp=sharing

With an unlimited edition of CPLEX, your model solves (though very slowly). Here are two ideas to better control what happens in solve()
use solve(log_output=True) to print the log: you'll see the gap going down
set a mip gap: setting mip gap to 5% stops the solve at 36s
ucpm.parameters.mip.tolerances.mipgap = 0.05
ucpm.solve(log_output=True)

Not an answer, but to illustrate my comment.
Let's say we have as the cost curve
cost = α + β⋅power^2
Furthermore, we are minimizing cost.
We can approximate using a few linear curves. Here I have drawn a few:
Let's say each linear curve has the form
cost = a(i) + b(i)⋅power
for i=1,...,n (n=number of linear curves).
It is easy to see that is we write:
min cost
cost ≥ a(i) + b(i)⋅power ∀i
we have a good approximation for the quadratic cost curve. This is exactly as I said in the comment.
No binary variables were used here.

Related

How to perform dynamic optimization for a nonlinear discrete optimization problem with nonlinear constraints, using non-linear solvers like SNOPT?

I am new to the field of optimization and I need help in the following optimization problem. I have tried to solve it using normal coding to make sure that I got he correct results. However, the results I got are different and I am not sure my way of analysis is correct or not. This is a short description of the problem:
The objective function shown in the picture is used to find the optimal temperature of the insulating system that minimizes the total cost over a given horizon.
[This image provides the mathematical description of the objective function and the constraints] (https://i.stack.imgur.com/yidrO.png)
The data of the problems are as follow:
1-
Problem data:
A=1.07×10^8
h=1
T_ref=87.5
N=20
p1=0.001;
p2=0.0037;
This is the curve I want to obtain
2- Optimization variable:
u_t
3- Model type:
The model is a nonlinear cost function with non-linear constraints and it is solved using non-linear solver SNOPT.
4-The meaning of the symbols in the objective and constrained functions
The optimization is performed over a prediction horizon of N years.
T_ref is The reference temperature.
Represent the degree of polymerization in the kth year.
X_DP Represents the temperature of the insulating system in the kth year.
h is the time step (1 year) of the discrete-time model.
R is the ratio of the load loss at the rated load to the no-load loss.
E is the activation energy.
A is the pre-exponential constant.
beta is a linear coefficient representing the cost due to the decrement of the temperature.
I have developed the source code in MATLAB, this code is used to check if my analysis is correct or not.
I have tried to initialize the Ut value in its increasing or decreasing states so that I can have the curves similar to the original one. [This is the curve I obtained] (https://i.stack.imgur.com/KVv2q.png)
I have tried to simulate the problem using conventional coding without optimization and I got the figure shown above.
close all; clear all;
h=1;
N=20;
a=250;
R=8.314;
A=1.07*10^8;
E=111000;
Tref=87.5;
p1=0.0019;
p2=0.0037;
p3=0.0037;
Utt=[80,80.7894736842105,81.5789473684211,82.3684210526316,83.1578947368421,... % The value of Utt given here represent the temperature increament over a predictive horizon.
83.9473684210526,84.7368421052632,85.5263157894737,86.3157894736842,...
87.1052631578947,87.8947368421053,88.6842105263158,89.4736842105263,...
90.2631578947369,91.0526315789474,91.8421052631579,92.6315789473684,...
93.4210526315790,94.2105263157895,95];
Utt1 = [95,94.2105263157895,93.4210526315790,92.6315789473684,91.8421052631579,... % The value of Utt1 given here represent the temperature decreament over a predictive horizon.
91.0526315789474,90.2631578947369,89.4736842105263,88.6842105263158,...
87.8947368421053,87.1052631578947,86.3157894736842,85.5263157894737,...
84.7368421052632,83.9473684210526,83.1578947368421,82.3684210526316,...
81.5789473684211,80.7894736842105,80];
Ut1=zeros(1,N);
Ut2=zeros(1,N);
Xdp =zeros(N,N);
Xdp(1,1)=1000;
Xdp1 =zeros(N,N);
Xdp1(1,1)=1000;
for L=1:N-1
for k=1:N-1
%vt(k+L)=Ut(k-L+1);
Xdq(k+1,L) =(1/Xdp(k,L))+A*exp((-1*E)/(R*(Utt(k)+273)))*24*365*h;
Xdp(k+1,L)=1/(Xdq(k+1,L));
Xdp(k,L+1)=1/(Xdq(k+1,L));
Xdq1(k+1,L) =(1/Xdp1(k,L))+A*exp((-1*E)/(R*(Utt1(k)+273)))*24*365*h;
Xdp1(k+1,L)=1/(Xdq1(k+1,L));
Xdp1(k,L+1)=1/(Xdq1(k+1,L));
end
end
% MATLAB code
for j =1:N-1
Ut1(j)= -p1*(Utt(j)-Tref);
Ut2(j)= -p2*(Utt1(j)-Tref);
end
sum00=sum(Ut1);
sum01=sum(Ut2);
X1=1./Xdp(:,1);
Xf=1./Xdp(:,20);
Total= table(X1,Xf);
Tdiff =a*(Total.Xf-Total.X1);
X22=1./Xdp1(:,1);
X2f=1./Xdp1(:,20);
Total22= table(X22,X2f);
Tdiff22 =a*(Total22.X2f-Total22.X22);
obj=(sum00+(Tdiff));
ob1 = min(obj);
obj2=sum01+Tdiff22;
ob2 = min(obj2);
plot(Utt,obj,'-o');
hold on
plot(Utt1,obj)

Confused by random.randn()

I am a bit confused by the numpy function random.randn() which returns random values from the standard normal distribution in an array in the size of your choosing.
My question is that I have no idea when this would ever be useful in applied practices.
For reference about me I am a complete programming noob but studied math (mostly stats related courses) as an undergraduate.
The Python function randn is incredibly useful for adding in a random noise element into a dataset that you create for initial testing of a machine learning model. Say for example that you want to create a million point dataset that is roughly linear for testing a regression algorithm. You create a million data points using
x_data = np.linspace(0.0,10.0,1000000)
You generate a million random noise values using randn
noise = np.random.randn(len(x_data))
To create your linear data set you follow the formula
y = mx + b + noise_levels with the following code (setting b = 5, m = 0.5 in this example)
y_data = (0.5 * x_data ) + 5 + noise
Finally the dataset is created with
my_data = pd.concat([pd.DataFrame(data=x_data,columns=['X Data']),pd.DataFrame(data=y_data,columns=['Y'])],axis=1)
This could be used in 3D programming to generate non-overlapping random values. This would be useful for optimization of graphical effects.
Another possible use for statistical applications would be applying a formula in order to test against spacial factors affecting a given constant. Such as if you were measuring a span of time with some formula doing something but then needing to know what the effectiveness would be given various spans of time. This would return a statistic measuring for example that your formula is more effective in the shorter intervals or longer intervals, etc.
np.random.randn(d0, d1, ..., dn) Return a sample (or samples) from the “standard normal” distribution(mu=0, stdev=1).
For random samples from , use:
sigma * np.random.randn(...) + mu
This is because if Z is a standard normal deviate, then will have a normal distribution with expected value and standard deviation .
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.randn.html
https://en.wikipedia.org/wiki/Normal_distribution

weighted regression in SQL

I'm new to SQL, so waiting for someone to shed me some lights hopefully. We got a stored procedure in place using the simple linear regression. Now I want to apply some weighting using a discount factor of lamda, i.e. 1, lamda, lamda^2, ..., lamda^n, while n is the length of the original series.
How should I generate the discounted weight series and apply to the current code structure below?
...
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur))/SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Beta, /* Beta = Sxy/Sxx */
SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Sxx,
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur)) as Sxy
...
e.g.
If I set discount factor (lamda) = 0.99, my weighting array should be formed generated automatically using the length of 10 from my series:
OASSpline = [1.11,1.45,1.79, 2.14, 2.48, 2.81,3.13,3.42,3.70,5.49]
AdjOASDolDur = [0.75,1.06,1.39, 1.73, 2.10, 2.48,2.85,3.20,3.52,3.61]
OASPriorSpline = 5.49
AdjOASPriorDolDur = 5.61
Weight = [1,0.99,0.9801,0.970299,0.96059601,0.9509900, 0.941480149,0.932065348,0.922744694,0.913517247]
The weighted linear regression should return a beta of 0.81243398, while the current simple linear regression should return a beta of 0.81164174.
Thanks much in advance!
I'll take a stab.
You could look at this article dealing generating sequence numbers and then use the current row number generated as an exponent. Does that work? I think a fair few are bamboozled by the request.

How to estimate confidence of nonlinear regression?

I use Levenberg -- Marquardt algorithm to fit my nonlinear function f(x,b) (x:Nx1, b:Mx1) to data X:NxK.
Now I want to estimate goodness (confidence) of solution b.
This post says that I should not try to find R-squared in nonlinear case. What should I do then? Are there any reliable universal metrics at all? I could not google any answer for this.
Standard errors are usually calculated as:
s.e. = sigma^2 inv(J'J)
or as
s.e. = sigma^2 inv(H)
where
J : Jacobian matrix
H : Hessian matrix
sigma^2 = SSE/df = sum of squared errors / (n-p)
A confidence interval is then
b +- s.e. * t(n-p,alpha/2)
where t is the critical value for the Student’s t distribution

backtracking line search parameter

I am reading/practicing a bit with optimization using Nocedal&Wright, when I got the the simple backtracking algorithm, where if d is my line direction and a is the step size the algorithm looks for a such that
for some 0 < c < 1. They advised to use a very small c, order of 10^-4.
That seemed very odd to me, as a very loss demand.
I did some experimenting with c = 0.3 and it seemed to work much better then the sugested 10^-4 ( for a simple quadratic problem and steepest descent).
Any intuition as to why such a low value should work and why didn't it do well for me?
Thanks.
∇ f() may have completely different scales for different problems;
one stepsize cannot fit all.
Consider f(x) = sin( ω . x ): the right c will depend on ω,
which may be on the order of 1, or 1e-6, or ...
Thus it's a good idea to scale ∇ f() to about norm 1, then play with c.
(People who recommend "c = ...", please describe your problem size and scales.)
Add some noise to your quadratic, see what happens as you increase the noise.
Try quadratic + noise in 2d, 10d.
In machine learning, there seems to be quite a lot of folklore on c a.k.a. learning rate;
google
learning-rate on stackexchange.com ,
also gradient-descent step-size
and adagrad adaptive gradient.