How to add sequential (time series) constraint to optimization problem using python PuLP? - optimization

A simple optimization problem: find the optimal control sequence for a refrigerator based on the cost of energy. The only constraint is to stay below a temperature threshold, and the objective function minimizes the cost of energy used. The problem is simplified so that the control is a binary array, e.g. [0, 1, 0, 1, 0], where 1 means using electricity to cool the fridge and 0 means turning off the cooling mechanism (which means there is no cost for this period, but the temperature will increase). We can assume each period is a fixed length of time and has a constant temperature change based on its on/off status.
Here are the example values:
Cost of energy (for our example 5 periods): [466, 426, 423, 442, 494]
Minimum cooling periods (just as a test): 3
Starting temperature: 0
Temperature threshold (must be less than or equal): 1
Temperature change per period of cooling: -1
Temperature change per period of warming (when control input is 0): 2
And here is the code in PuLP
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus, value
from itertools import accumulate
l = list(range(5))
costy = [466, 426, 423, 442, 494]
cost = dict(zip(l, costy))
min_cooling_periods = 3
prob = LpProblem("Fridge", LpMinimize)
si = LpVariable.dicts("time_step", l, lowBound=0, upBound=1, cat='Integer')
prob += lpSum([cost[i]*si[i] for i in l]) # cost function to minimize
prob += lpSum([si[i] for i in l]) >= min_cooling_periods # how many values must be positive
prob.solve()
The optimization seems to work before I try to account for the temperature threshold. With just the cost function, it returns an array of 0s, which does indeed minimize the cost (duh). With the first constraint (how many values must be positive) it picks the cheapest 3 cooling periods, and calculates the total cost correctly.
obj = value(prob.objective)
print(f'Solution is {LpStatus[prob.status]}\nThe total cost of this regime is: {obj}\n')
for v in prob.variables():
    print(f'{v.name} = {v.varValue}')
output:
Solution is Optimal
The total cost of this regime is: 1291.0
time_step_0 = 0.0
time_step_1 = 1.0
time_step_2 = 1.0
time_step_3 = 1.0
time_step_4 = 0.0
So, if our control sequence is [0, 1, 1, 1, 0], the temperature will look like this at the end of each cooling/warming period: [2, 1, 0, -1, 1]. The temperature goes up 2 whenever the control input is 0, and down 1 whenever the control input is 1. This example sequence is a valid answer so far, but it will have to change once we add a maximum temperature threshold of 1, which means the first value must be a 1, or else the fridge will warm to a temperature of 2.
However I get incorrect results when trying to specify the sequential constraint of staying within the temperature thresholds with the condition:
up_temp_thresh = 1
down = -1
up = 2
# here is where I try to ensure that the control sequence would never cause the temperature to
# surpass the threshold. In practice I would like a lower and upper threshold but for now
# let us focus only on the upper threshold.
prob += lpSum([e <= up_temp_thresh for e in accumulate([down if si[i] == 1. else up for i in l])]) >= len(l)
In this case the answer comes out the same as before. I am clearly not formulating the constraint correctly, as the sequence [0, 1, 1, 1, 0] would surpass the threshold.
I am trying to encode "the temperature at the end of each control sequence must be less than the threshold". I do this by turning the control sequence into an array of the temperature changes, so control sequence [0, 1, 1, 1, 0] gives us temperature changes [2, -1, -1, -1, 2]. Then using the accumulate function, it computes a cumulative sum, equal to the fridge temp after each step, which is [2, 1, 0, -1, 1]. I would like to just check if the max of this array is less than the threshold, but using lpSum I check that the sum of values in the array less than the threshold is equal to the length of the array, which should be the same thing.
However I'm clearly formulating this step incorrectly. As written this last constraint has no effect on the output, and small changes give other wrong answers. It seems the answer should be [1, 1, 1, 0, 0], which gives an acceptable temperature series of [-1, -2, -3, -1, 1]. How can I specify the sequential nature of the control input using PuLP, or another free python optimization library?

The easiest and least error-prone approach would be to create a new set of auxiliary variables in your problem which track the temperature of the fridge in each interval. These are not 'primary decision variables' because you cannot choose them directly; rather, their values are constrained by the on/off decision variables for the fridge.
You would then add constraints on these temperature state variables to represent the dynamics. So in untested code:
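init_temp = 0  # starting temperature from the question
l_plus_1 = list(range(6))
fridge_temp = LpVariable.dicts("fridge_temp", l_plus_1, cat='Continuous')
fridge_temp[0] = init_temp # initial temperature of fridge - a known value
for i in l:
    # warms by 2 when off (si[i] = 0), cools by 1 when on (si[i] = 1)
    prob += fridge_temp[i+1] == fridge_temp[i] + 2 - 3*si[i]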
You can then set the min/max temperature constraints on these new fridge_temp variables.
Note that in the above I've assumed that the fridge temperature variables are defined at one more interval than the on/off decisions for the fridge. The fridge temperature variables represent the temperature at the start of an interval, and having one extra means we can ensure the final temperature of the fridge is acceptable.
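As a sanity check on the dynamics described above, the example is small enough to enumerate without a solver. The following is only a sketch using the numbers from the question; the helper names (temperatures, is_feasible, total_cost) are made up for illustration and are not part of PuLP.

```python
from itertools import accumulate, product

costs = [466, 426, 423, 442, 494]   # energy cost per period (from the question)
min_cooling_periods = 3
init_temp = 0
max_temp = 1                        # upper temperature threshold
up, down = 2, -1                    # temperature change when off / when on


def temperatures(controls):
    """Temperature at the end of each period for a 0/1 control sequence."""
    # Same recurrence as the constraint above: temp[i+1] = temp[i] + 2 - 3*s[i]
    changes = [up + (down - up) * s for s in controls]
    return list(accumulate(changes, initial=init_temp))[1:]


def is_feasible(controls):
    """Enough cooling periods and never above the threshold."""
    return (sum(controls) >= min_cooling_periods
            and all(t <= max_temp for t in temperatures(controls)))


def total_cost(controls):
    return sum(c * s for c, s in zip(costs, controls))


# Enumerate all 2**5 control sequences and keep the cheapest feasible one.
candidates = product([0, 1], repeat=len(costs))
best = min(filter(is_feasible, candidates), key=total_cost)
print(best, temperatures(best), total_cost(best))
```

For these inputs the enumeration selects (1, 1, 1, 0, 0), the sequence the question expected, so a PuLP model with the temperature variables should agree with it. Enumeration only scales to a handful of periods; it is meant as a check, not a replacement for the model.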

Related

How to handle (discrete) time-series of boundary condition in Bayesian estimation of ODE?

I want to estimate the parameter in an ordinary differential equation (ODE). However, I don’t know how to input the time series of the boundary condition. It is not a “function”, but a time series (i.e., discrete data points). For example, daily inflow of water when modelling water volume of a lake.
I checked the manual of WinBUGS Differential interface, and it seems that their "Worked Example 2: Population PK Model" offers a solution, which is by ode.block() and piecewise() function:
R31[i] <- piecewise(vec.R31[i, 1:n.block])
vec.R31[i, 1] <- 0
vec.R31[i, 2] <- 0
vec.R31[i, 3] <- dose[i] / TI[i]
vec.R31[i, 4] <- 0
...
list(
... n.block = 4, ...)
where R31[i] can be seen as a time-varying boundary condition, and n.block means that there are four sub-periods for this boundary condition.
However, this solution cannot be applied to my model, since my boundary condition cannot be divided into only four (or a few) periods. The boundary condition is a daily-scale time series. Thus, if the simulation covers 10 years, I have 3650 sub-periods.
Is there a way to handle the numeric (i.e., discrete) boundary condition with many data points?

How should I handle music key (scale) as a feature in the knn algorithm

I'm doing a data science project, and I was wondering how to handle a music key (scale) as a feature in the KNN algorithm.
I know KNN is based on distances, so giving each key a number from 1 to 24 doesn't make much sense (because key number 24 is as close to 1 as 7 is to 8).
I have thought about making a column for "Major/Minor" and another for the note itself,
but I'm still facing the same problem, I need to specify the note with a number, but because notes are cyclic I cannot number them linearly 1-12.
For the people that have no idea how music keys work my question is equivalent to handling states in KNN, you can't just number them linearly 1-50.
One way you could think about the distance between scales is to think of each scale as a 12-element binary vector where there's a 1 wherever a note is in the scale and a zero otherwise.
Then you can compute the Hamming distance between scales. The Hamming distance, for example, between a major scale and its relative minor scale should be zero because they both contain the same notes.
Here's a way you could set this up in Python
from enum import IntEnum
import numpy as np
from scipy.spatial.distance import hamming
class Note(IntEnum):
    C = 0
    Db = 1
    D = 2
    Eb = 3
    E = 4
    F = 5
    Gb = 6
    G = 7
    Ab = 8
    A = 9
    Bb = 10
    B = 11
major = np.array((1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1))
minor = np.array((1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0)) #WHWWHWW Natural Minor
# Transpose the basic scale form to a key using Numpy's `roll` function
cMaj = np.roll(major, Note.C) # Rolling by zero changes nothing
aMin = np.roll(minor, Note.A)
gMaj = np.roll(major, Note.G)
fMaj = np.roll(major, Note.F)
print('Distance from cMaj to aMin', hamming(cMaj, aMin))
print('Distance from cMaj to gMaj', hamming(cMaj, gMaj)) # One step clockwise on circle of fifths
print('Distance from cMaj to fMaj', hamming(cMaj, fMaj)) # One step counter-clockwise on circle of fifths
IIUC, you can convert your feature to something like a sine, as follows. Here I have the values 0-10 and I am transforming them to keep their circular relation.
a = np.around(np.sin([np.deg2rad(x*18) for x in np.array(list(range(11)))]), 3)
import matplotlib.pyplot as plt
plt.plot(a)
Output:
Through this feature engineering you can see that the circularity of your feature is encoded: the encoded value of 0 is equal to that of 10.
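One caveat with a single sine over half a turn is that distinct values can collide (in the snippet above, 1 and 9 both map to about 0.309). A common variant of the same idea encodes each cyclic value as a (sin, cos) pair over a full turn, so every value gets a distinct point on the unit circle. The sketch below assumes the 12 notes are numbered 0-11 as in the Note enum earlier; encode_cyclic is a made-up helper name.

```python
import math


def encode_cyclic(value, period):
    """Map a cyclic value onto the unit circle as a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))


# One point per note number 0..11 (assumed numbering, as in the Note enum).
notes = [encode_cyclic(n, 12) for n in range(12)]

# Euclidean distances between encoded notes respect the wrap-around:
wrap_step = math.dist(notes[11], notes[0])   # B -> C, across the wrap
plain_step = math.dist(notes[0], notes[1])   # C -> Db
opposite = math.dist(notes[0], notes[6])     # C -> Gb, half a turn apart
print(wrap_step, plain_step, opposite)
```

The two resulting columns can be fed to KNN directly, since the Euclidean distance between the pairs grows with the angular separation and treats 11 and 0 as neighbours.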

Calculate and return the average of positive, negative, and neutral

I have the following dataframe:
[Image of the dataframe not included]
I am trying to add three columns that hold the number of instances of -1, 0, and 1 (negative, neutral, and positive, so to speak). After that, I want to calculate the average sentiment of each user's posts. Any help with appending these averages would be great.
So far I tried the solution below:
def mean_positive(L):
    # Get all positive numbers into another list
    pos_only = [x for x in L if x > 0]
    if pos_only:
        return sum(pos_only) / len(pos_only)
    raise ValueError('No positive numbers in input')
Thank you.
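Since the dataframe itself is not visible here, the following is only a sketch under an assumed layout: each user maps to a list of sentiment labels in {-1, 0, 1}. The names posts and summarize and the sample values are hypothetical. It counts each label and averages over all of a user's labels, rather than over the positive ones only as mean_positive does.

```python
from collections import Counter

# Hypothetical data: user -> sentiment label of each post (-1, 0 or 1).
posts = {
    "user_a": [1, 0, -1, 1],
    "user_b": [0, 0, -1],
}


def summarize(labels):
    """Count negative/neutral/positive labels and average all of them."""
    counts = Counter(labels)
    return {
        "negative": counts[-1],
        "neutral": counts[0],
        "positive": counts[1],
        "average": sum(labels) / len(labels),
    }


summary = {user: summarize(labels) for user, labels in posts.items()}
for user, row in summary.items():
    print(user, row)
```

With a real dataframe the same counts would normally come from a group-by on the user column, but that depends on column names that are not shown in the question.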

Plotting an exponential function given one parameter

I'm fairly new to Python, so bear with me. I have generated some data with a large number of points, stored in the variable vals. I have plotted a histogram of these values, limited so that only values between 104 and 155 are taken into account. This was done as follows:
bin_heights, bin_edges = np.histogram(vals, range=[104, 155], bins=30)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(bin_centres, bin_heights, np.sqrt(bin_heights), fmt=',', capsize=2)
plt.xlabel(r"$m_{\gamma\gamma} (GeV)$")
plt.ylabel("Number of entries")
plt.show()
Giving the above plot:
My next step is to take into account values from vals which are less than 120. I have done this as follows:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 GeV set
I need to plot a curve on the same plot as the histogram, which follows the form B(x) = Ae^(-x/λ)
I then estimated a value of λ using the maximum likelihood estimator formula:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 GeV set
#print(background_data)
N_background=len(background_data)
print(N_background)
sigma_background_data=sum(background_data)
print(sigma_background_data)
lamb = (sigma_background_data)/(N_background) #maximum likelihood estimator for lambda
print('lambda estimate is', lamb)
where lamb = λ. I got a value of roughly lamb = 27.75, which I know is correct. I now need to get an estimate for A.
I have been advised to do this as follows:
Given a value of λ, find A by scaling the PDF to the data such that the area beneath
the scaled PDF has equal area to the data
I'm not quite sure what this means, or how I'd go about trying to do this. PDF means probability density function. I assume an integration will have to take place, so to get the area under the data (vals), I have done this:
data_area= integrate.cumtrapz(background_data, x=None, dx=1.0)
print(data_area)
plt.plot(background_data, data_area)
However, this gives me an error
ValueError: x and y must have same first dimension, but have shapes (981555,) and (981554,)
I'm not sure how to fix it. The end result should be something like:
See the cumtrapz docs:
Returns: ... If initial is None, the shape is such that the axis of integration has one less value than y. If initial is given, the shape is equal to that of y.
So you can either pass an initial value, like
data_area = integrate.cumtrapz(background_data, x=None, dx=1.0, initial=0.0)
or discard the first value of the background_data:
plt.plot(background_data[1:], data_area)
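On the separate question of estimating A, one possible reading of the advice is to match areas over a chosen interval: the histogram area is the sum of the bin heights times the bin width, and the area under A·exp(-x/λ) between lo and hi is A·λ·(exp(-lo/λ) - exp(-hi/λ)). Setting the two equal gives A. The sketch below assumes that reading; scale_factor is a made-up helper, the bin heights are placeholder numbers, and the interval [104, 120] is an assumption about which range should be matched.

```python
import math


def scale_factor(hist_area, lamb, lo, hi):
    """A such that the integral of A*exp(-x/lamb) over [lo, hi] equals hist_area."""
    area_per_unit_a = lamb * (math.exp(-lo / lamb) - math.exp(-hi / lamb))
    return hist_area / area_per_unit_a


# Placeholder bin heights (hypothetical), covering [lo, hi] with equal-width bins.
bin_heights = [120, 100, 85, 70]
lo, hi = 104.0, 120.0
bin_width = (hi - lo) / len(bin_heights)
hist_area = sum(bin_heights) * bin_width   # area of the histogram bars

lamb = 27.75                               # estimate quoted in the question
A = scale_factor(hist_area, lamb, lo, hi)

# Area under the scaled curve, for comparison with hist_area.
curve_area = A * lamb * (math.exp(-lo / lamb) - math.exp(-hi / lamb))
print(A, curve_area, hist_area)
```

With real data, bin_heights and bin_width would come from np.histogram restricted to the background region, and the curve A*exp(-x/lamb) could then be drawn over bin_centres on the same axes.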

cvxpy integer variable - exclude certain integer values from the solution

I have the following problem and I can't figure out if cvxpy can do what I need.
Context: I optimize portfolios. When buying bonds and optimizing the quantity of each bond to buy, it's only possible to buy each bond in multiples of 1,000 units.
However, the minimum piece required to be bought is most of the time 10,000.
This means we either don't buy a bond at all or if we buy it, the quantity bought has to be either 10,000, 11,000, 12,000 and so on.
Is there a way (it seems there isn't) to exclude certain values from the possible solutions an integer variable can take?
So let's assume we have an integer variable x that is non-negative.
We basically want to buy 1000x, but we know that x must be in {0, 10, 11, 12, ...}.
Is it possible to skip the values 1..9 without adding other variables?
For example:
import numpy as np
import pandas as pd
import cvxpy as cvx
np.random.seed(1)
# np.random.rand(3)
p = pd.DataFrame({'bond_id': ['s1','s2', 's3', 's4', 's5', 's6', 's7','s8','s9', 's10'],
'er': np.random.rand(10),
'units': [10000,2000,3000,4000,27000,4000,0,0,0,0] })
final_units = cvx.Variable( 10, integer=True)
constraints = list()
constraints.append( final_units >= 0)
constraints.append(sum(final_units*1000) <= 50000)
constraints.append(sum(final_units*1000) >= 50000)
constraints.append(final_units <= 15)
obj = cvx.Maximize(final_units @ np.array(list(p['er'])))
prob = cvx.Problem(obj, constraints)
solve_val = prob.solve()
print("\n* solve_val = {}".format(solve_val))
solution_value = prob.value
solution = str(prob.status).lower()
print("\n** SOLUTION 3: {} Value: {} ".format(solution, solution_value))
print("\n* final_units -> \n{}\n".format(final_units.value))
p['FINAL_SOL'] = final_units.value * 1000
print("\n* Final Portfolio: \n{}\n".format(p))
This solution is a very simplified version of the problem I face. The final vector final_units can suggest values like in this example where we have to buy 5,000 units of bond s9, however I can't since the min I can buy is 10,000.
I know I could add an additional integer vector to express an OR condition, but my real problem is much bigger than this, and I already have thousands of integer variables. Hence, I wonder if there's a way to exclude values from 1 to 9 without adding additional variables to the problem.
Thank you
No, not with CVXPY. You can model it with an integer variable x[i] plus a binary variable y[i], and using the constraints (in math notation):
y[i] * 10 <= x[i] <= y[i] * 15
This results in x[i] ∈ {0, 10..15}.
Some solvers have a variable type for this: semi-integer variables. Using this you don't need to have an extra binary variable and these 2 constraints. CVXPY does not support this variable type AFAIK.
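To see why the two constraints above produce that set, the small case can simply be enumerated in plain Python. This is only an illustration of the constraint pair, not CVXPY code; LOT_MIN and LOT_MAX are names introduced here for the bounds 10 and 15.

```python
from itertools import product

LOT_MIN, LOT_MAX = 10, 15   # bounds on x when the bond is bought (y = 1)

# Keep every integer x in 0..15 for which some binary y satisfies
# LOT_MIN * y <= x <= LOT_MAX * y.
feasible = sorted({
    x
    for x, y in product(range(LOT_MAX + 1), (0, 1))
    if LOT_MIN * y <= x <= LOT_MAX * y
})
print(feasible)
```

With y = 0 both bounds collapse to 0, so x must be 0; with y = 1 the bounds become 10 <= x <= 15. In a CVXPY model the same pair would be written as two vector constraints linking the integer variable to an additional boolean variable.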