Market Neutral Optimisation using CVXPY - optimization

I have a model that generates alphas and sigma for a set of stocks. I have coded a long-only optimisation using CVXOPT by calling sol = solvers.qp(Q, p, G, h, A, b).
Now I would like to add two further optimisation problems to the script I already have, so that I also get results for a Long Short and a Market Neutral (sum of weights = 0) portfolio. To do that I would like to import CVXPY without adding too many lines of code, given that I have already loaded the alphas, sigma and weight bounds.
Below is the data I currently load to optimise a long-only portfolio using CVXOPT. I would appreciate any help on how to set up CVXPY to return an optimal Long Short and Market Neutral portfolio using this data. I could also share the whole code.
### Parameters setup
Alpha = np.array(-np.transpose(opt.matrix(np.loadtxt(r'C:\Alpha.txt'))))  # raw strings keep the backslashes literal
Var_Cov = np.loadtxt(r'C:\VAR_COV.txt')
n = len(Var_Cov)
r_min = 0.03
maxW = np.loadtxt(r'C:\maxW.txt')
minW = np.loadtxt(r'C:\minW.txt')
### Solve
solution = optimize_portfolio(n, Alpha, Var_Cov, r_min)

Related

pyomo matrix product

I would like to use pyomo to solve a multiple linear regression under constraints.
To do so I have 3 matrices:
X (named tour1 in the following code): my inputs (600x13) (bureaux*t1 in pyomo)
Y (named tour2 in the following code): the matrix I want to predict (600x3) (bureaux*t2 in pyomo)
T (named transfer in the code) (13x3) (t1*t2 in pyomo)
I would like to do the following:
ypred = X*T
minimize (ypred-y)**2
subject to
0<T<1
and Sum_i(Tij)=1
To that effect, I started the following code:
import numpy as np
import pandas as pd
from pyomo.environ import *
tour1=pd.DataFrame(np.random.random(size=(60,13)),columns=["X"+str(i) for i in range(13)],index=["B"+str(i) for i in range(60)])
tour2=pd.DataFrame(np.random.random(size=(60,3)),columns=["Y"+str(i) for i in range(3)],index=["B"+str(i) for i in range(60)])
def gettour1(model,i,j):
    return tour1.loc[i,j]
def gettour2(model,i,j):
    return tour2.loc[i,j]
def cost(model):
    return sum((sum(model.tour1[i,k] * model.transfer[k,j] for k in model.t1) - model.tour2[i,j] )**2 for i in model.bureaux for j in model.tour2)
model = ConcreteModel()
model.bureaux = Set(initialize=tour1.index.tolist())
model.t1 = Set(initialize=tour1.columns)
model.t2 = Set(initialize=tour2.columns)
model.tour1 = Param(model.bureaux, model.t1,initialize=gettour1)
model.tour2 = Param(model.bureaux, model.t2,initialize=gettour2)
model.transfer = Var(model.t1,model.t2,bounds=[0,1])
model.obj=Objective(rule=cost, sense=minimize)
I unfortunately get an error at this stage :
KeyError: "Index '('X0', 'B0', 'Y0')' is not valid for indexed component 'transfer'"
Does anyone know how I can calculate the objective?
Furthermore, any help with the constraints would be appreciated :-)
A couple things...
First, the error you are getting. There is information in that error statement that should help identify the problem. The construction appears to be trying to index transfer with a 3-part index (x, b, y). That clearly is outside of t1 x t2. If you look at the sum equation you have, you are mistakenly using model.tour2 instead of model.t2.
Also, your bounds parameter needs to be a tuple.
While building the model, you should be pprint()-ing the model very frequently to look for these types of issues. That only works well if you have "small" data. 60 x 13 may be the normal problem size, but it is a total pain to troubleshoot. So, start with something tiny, maybe 3 x 4. Make a Set, pprint(). Make a Constraint, pprint()... Once the model computes/solves with "tiny" data, just pop in the real stuff.
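Once the inner index is fixed as described above (looping over the t2 index set rather than over the tour2 data), the objective is the sum over bureaux i and targets j of (sum_k X[i,k]*T[k,j] - Y[i,j])**2, which is the squared Frobenius norm of X*T - Y. A minimal NumPy sketch on "tiny" data, in the spirit of the advice above, checks the nested-sum form against the matrix form. The array names, sizes and random values are illustrative assumptions, not part of the original model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((3, 4))      # tiny stand-in for tour1 (bureaux x t1)
Y = rng.random((3, 2))      # tiny stand-in for tour2 (bureaux x t2)
T = rng.random((4, 2))      # candidate transfer matrix (t1 x t2)

def cost_nested(X, T, Y):
    """Objective written as explicit sums, mirroring the shape of the Pyomo rule."""
    n_b, n_t1 = X.shape
    n_t2 = Y.shape[1]
    return sum(
        (sum(X[i, k] * T[k, j] for k in range(n_t1)) - Y[i, j]) ** 2
        for i in range(n_b)
        for j in range(n_t2)   # loop over the t2 index set, not over the data
    )

def cost_matrix(X, T, Y):
    """Same objective in matrix form: squared Frobenius norm of X @ T - Y."""
    return float(((X @ T - Y) ** 2).sum())

print(cost_nested(X, T, Y), cost_matrix(X, T, Y))
```

If the two numbers agree on small data, the index sets in the sum are probably wired the way the regression intends.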

Set the objective of an optimizer as the standard deviation of the input (Nonlinear optimization using pyomo)

I am trying to use pyomo for a single objective nonlinear optimization problem.
The objective function is to minimize the variance (or standard deviation) of the input variables following certain constraints (which I was able to do in Excel).
Following is a code example of what I am trying to do
model = pyo.ConcreteModel()
# declare decision variables
model.x1 = pyo.Var(domain=pyo.NonNegativeReals)
model.x2 = pyo.Var(domain=pyo.NonNegativeReals)
model.x3 = pyo.Var(domain=pyo.NonNegativeReals)
model.x4 = pyo.Var(domain=pyo.NonNegativeReals)
# declare objective
from statistics import stdev
model.variance = pyo.Objective(
    expr = stdev([model.x1, model.x2, model.x3, model.x4]),
    sense = pyo.minimize)
# declare constraints
model.max_charging = pyo.Constraint(expr = model.x1 + model.x2 + model.x3 + model.x4 >= 500)
model.max_x1 = pyo.Constraint(expr = model.x1 <= 300)
model.max_x2 = pyo.Constraint(expr = model.x2 <= 200)
model.max_x3 = pyo.Constraint(expr = model.x3 <= 100)
model.max_x4 = pyo.Constraint(expr = model.x4 <= 200)
# solve
pyo.SolverFactory('glpk').solve(model).write()
#print
print("energy_price = ", model.variance())
print(f'Variables = [{model.x1()},{model.x2()},{model.x3()},{model.x4()}]')
The error I get is TypeError: can't convert type 'ScalarVar' to numerator/denominator
The problem seems to be caused by using the stdev function from statistics.
My assumption is that the model's variables x1-x4 have not yet been assigned a value, and that this is the main issue. However, I am not sure how to approach this.
First: stdev is nonlinear. So why even try to solve this with a linear solver?
Pyomo does not know about the statistics package. You'll have to program the standard deviation using elementary operations, use an external function approach, or use an approximation (like minimizing the range).
So I managed to solve this issue and I am including the solution below. But first, there are a couple of points I would like to highlight
As @Erwin Kalvelagen mentioned, 'stdev' is nonlinear so a linear solver like 'glpk' would always result in an error. For my problem, 'ipopt' worked fine but be careful as it can perform poorly in some cases.
Also, as @Erwin Kalvelagen mentioned, Pyomo does not know about the statistics package. So when you try to use a function from that package (e.g., 'stdev', 'variance', etc.), it will try to evaluate the model variables before the solver assigns them any value and that will cause an error.
A pyomo objective function needs an expression. The code sample below shows how to dynamically generate an expression for the variance without using an external package. The code is also agnostic to the number of model variables you have.
Using either the variance or the standard deviation will serve the same purpose for my project. I opted for using the variance to avoid calculating its square root (as the standard deviation is the square root of the variance).
Variability Objective Function
import pyomo.environ as pyo
def variabilityRule(model):
    #Calculate mean of all model variables
    mean = 0
    for index in model.x:
        mean += model.x[index]
    mean = mean/len(model.x)
    #Sum the squared differences between each model variable and the mean
    obj_exp = ((model.x[1])-mean) * ((model.x[1])-mean)
    for i in range(2, len(model.x)+1): #note that pyomo VarList indices start from 1 not zero
        obj_exp += ((model.x[i])-mean) * ((model.x[i])-mean)
    #Divide by number of items
    obj_exp = (obj_exp/(len(model.x)))
    return obj_exp
model = pyo.ConcreteModel()
#model.x must be declared before this point (see the P.S. below)
model.objective = pyo.Objective(
    rule = variabilityRule,
    sense = pyo.minimize)
EDIT: Standard Deviation Calculation
You can use the same approach to calculate the standard deviation, as I found out. Just raise the final expression (`obj_exp`) to the power 0.5
obj_exp = obj_exp ** 0.5
...
P.S. If you are interested, this is how I dynamically generated my model variables based on an input array
model.x = pyo.VarList(domain=pyo.NonNegativeReals)
for i in range(0, len(input_array)):
    model.x.add()
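The variance expression in this answer uses only additions, multiplications and a division, which is why Pyomo can carry it symbolically. As a sanity check that does not need Pyomo, the same arithmetic can be applied to plain numbers and compared with the statistics package. Because the expression divides by n, it matches the population variance (statistics.pvariance), not the sample versions variance/stdev, which divide by n-1. The helper name variance_expr and the sample values are illustrative assumptions:

```python
from statistics import pvariance, pstdev

def variance_expr(values):
    """Population variance built from elementary operations only."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) * (v - mean) for v in values) / n

sample = [300.0, 200.0, 100.0, 200.0]   # hypothetical values for x1..x4
var = variance_expr(sample)             # 5000.0 for this sample
std = var ** 0.5                        # standard deviation via power 0.5
print(var, std)
```

Minimizing either quantity leads to the same optimal variables, since the square root is monotonic on non-negative values.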

Projection of fisheye camera model by Scaramuzza

I am trying to understand the fisheye model by Scaramuzza, which is implemented in Matlab, see https://de.mathworks.com/help/vision/ug/fisheye-calibration-basics.html#mw_8aca38cc-44de-4a26-a5bc-10fb312ae3c5
The backprojection (uv to xyz) seems fairly straightforward according to the following equation:
[x, y, z]^T = lambda * [u, v, a0 + a2*rho^2 + a3*rho^3 + a4*rho^4]^T, where rho = sqrt(u^2 + v^2)
However, how does the projection (from xyz to uv) work?! In my understanding we get a rather complex set of equations. Unfortunately, I don't find any details on that....
Okay, I believe I understand it now fully after analyzing the functions of the (windows) calibration toolbox by Scaramuzza, see https://sites.google.com/site/scarabotix/ocamcalib-toolbox/ocamcalib-toolbox-download-page
Method 1 found in file "world2cam.m"
For the projection, use the same equation above. In the projection case, the equation has three known (x,y,z) and three unknown variables (u,v and lambda). We first substitute lambda with rho by realizing that
u = x/lambda
v = y/lambda
rho=sqrt(u^2+v^2) = 1/lambda * sqrt(x^2+y^2) --> lambda = sqrt(x^2+y^2) / rho
After that, we have the unknown variables (u,v and rho)
u = x/lambda = x / sqrt(x^2+y^2) * rho
v = y/lambda = y / sqrt(x^2+y^2) * rho
z / lambda = z /sqrt(x^2+y^2) * rho = a0 + a2*rho^2 + a3*rho^3 + a4*rho^4
As you can see, the last equation now has only one unknown, namely rho. Thus, we can solve it easily using e.g. the roots function in matlab. However, the result does not always exist nor is it necessarily unique. After solving the unknown variable rho, calculating uv is very simple using the equation above.
This procedure needs to be performed for each point (x,y,z) separately and is thus rather computationally expensive for an image.
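Method 1 can be sketched in NumPy as follows. This is not a port of "world2cam.m": it only follows the equations above, rearranged to a4*rho^4 + a3*rho^3 + a2*rho^2 - m*rho + a0 = 0 with m = z / sqrt(x^2+y^2). The function name, the coefficient values and the rule of taking the smallest positive real root are assumptions made for illustration:

```python
import numpy as np

def world2cam_sketch(point, coeffs):
    """Project a 3D point (x, y, z) to (u, v) by solving the polynomial for rho.

    coeffs = (a0, a2, a3, a4). Returns None when no positive real root exists.
    """
    x, y, z = point
    a0, a2, a3, a4 = coeffs
    norm_xy = np.hypot(x, y)
    if norm_xy == 0.0:
        return 0.0, 0.0                      # point on the optical axis
    m = z / norm_xy
    # Coefficients ordered from the highest power of rho to the constant term
    roots = np.roots([a4, a3, a2, -m, a0])
    real = roots[np.abs(roots.imag) < 1e-9].real
    positive = real[real > 0]
    if positive.size == 0:
        return None                          # the solution does not always exist
    rho = positive.min()                     # assumption: smallest positive root
    return x / norm_xy * rho, y / norm_xy * rho

# Round trip with made-up coefficients: back-project a pixel, then project it again
coeffs = (300.0, -1e-3, 1e-7, -1e-10)
u0, v0 = 60.0, 80.0
rho0 = np.hypot(u0, v0)
a0, a2, a3, a4 = coeffs
lam = 2.0                                    # arbitrary scale along the ray
point = (lam * u0, lam * v0, lam * (a0 + a2 * rho0**2 + a3 * rho0**3 + a4 * rho0**4))
u1, v1 = world2cam_sketch(point, coeffs)
print(u1, v1)
```

The round trip should return approximately the starting pixel, whatever value is chosen for lam, because lambda cancels out in m.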
Method 2 found in file "world2cam_fast.m"
The last equation has the form rho(x,y,z). However, if we define m = z / sqrt(x^2+y^2) = tan(90°-theta), it only depends on one variable, namely rho(m).
Instead of solving this equation rho(m) for every new m, the authors "plot" the function for several values of m and fit an 8th order polynomial to these points. Using this polynomial they can calculate an approximate value for rho(m) much quicker in the following.
This becomes clear, because "world2cam_fast.m" makes use of ocam_model.pol, which is calculated in "undistort.m". "undistort.m" in turn makes use of "findinvpoly.m".

Confused by random.randn()

I am a bit confused by the numpy function random.randn() which returns random values from the standard normal distribution in an array in the size of your choosing.
My question is that I have no idea when this would ever be useful in applied practices.
For reference about me I am a complete programming noob but studied math (mostly stats related courses) as an undergraduate.
The Python function randn is incredibly useful for adding in a random noise element into a dataset that you create for initial testing of a machine learning model. Say for example that you want to create a million point dataset that is roughly linear for testing a regression algorithm. You create a million data points using
x_data = np.linspace(0.0,10.0,1000000)
You generate a million random noise values using randn
noise = np.random.randn(len(x_data))
To create your linear data set you follow the formula
y = mx + b + noise_levels with the following code (setting b = 5, m = 0.5 in this example)
y_data = (0.5 * x_data ) + 5 + noise
Finally the dataset is created with
my_data = pd.concat([pd.DataFrame(data=x_data,columns=['X Data']),pd.DataFrame(data=y_data,columns=['Y'])],axis=1)
This could be used in 3D programming to generate non-overlapping random values. This would be useful for optimization of graphical effects.
Another possible use for statistical applications would be applying a formula in order to test against spatial factors affecting a given constant. For example, you might measure a span of time with some formula and then need to know how effective it is over various spans of time. This would return a statistic showing, for example, that your formula is more effective over shorter intervals or over longer intervals.
np.random.randn(d0, d1, ..., dn) Return a sample (or samples) from the "standard normal" distribution (mu=0, stdev=1).
For random samples from N(mu, sigma^2), use:
sigma * np.random.randn(...) + mu
This is because if Z is a standard normal deviate, then sigma*Z + mu will have a normal distribution with expected value mu and standard deviation sigma.
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.randn.html
https://en.wikipedia.org/wiki/Normal_distribution
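The scaling rule quoted above can be checked empirically. The short sketch below draws standard normal values, shifts and scales them, and prints the sample mean and standard deviation, which should land close to the chosen mu and sigma. The values of mu, sigma, the seed and the sample size are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)                 # fixed seed so the run is repeatable
mu, sigma = 5.0, 2.0              # hypothetical target mean and standard deviation
z = np.random.randn(100_000)      # standard normal draws: mean 0, stdev 1
samples = sigma * z + mu          # shifted and scaled to N(mu, sigma^2)
print(samples.mean(), samples.std())
```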

weighted regression in SQL

I'm new to SQL, so I am hoping someone can shed some light on this. We have a stored procedure in place that uses simple linear regression. Now I want to apply some weighting using a discount factor lamda, i.e. 1, lamda, lamda^2, ..., lamda^(n-1), where n is the length of the original series.
How should I generate the discounted weight series and apply to the current code structure below?
...
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur))/SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Beta, /* Beta = Sxy/Sxx */
SUM(SQUARE((AdjOASDolDur-AdjOASPriorDolDur))) as Sxx,
SUM((OASSpline-OASPriorSpline) * (AdjOASDolDur-AdjOASPriorDolDur)) as Sxy
...
e.g.
If I set the discount factor (lamda) = 0.99, my weighting array should be generated automatically from the length of my series (10 here):
OASSpline = [1.11,1.45,1.79, 2.14, 2.48, 2.81,3.13,3.42,3.70,5.49]
AdjOASDolDur = [0.75,1.06,1.39, 1.73, 2.10, 2.48,2.85,3.20,3.52,3.61]
OASPriorSpline = 5.49
AdjOASPriorDolDur = 5.61
Weight = [1,0.99,0.9801,0.970299,0.96059601,0.9509900, 0.941480149,0.932065348,0.922744694,0.913517247]
The weighted linear regression should return a beta of 0.81243398, while the current simple linear regression should return a beta of 0.81164174.
Thanks much in advance!
I'll take a stab.
You could look at this article dealing with generating sequence numbers, and then use the generated row number as the exponent of the discount factor. Does that work? I think a fair few are bamboozled by the request.
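To make the request concrete outside SQL, here is a small Python sketch of the weighting. It assumes the weight multiplies each term of both Sxy and Sxx, so that Beta = SUM(w * dy * dx) / SUM(w * dx * dx), and that the first row of the series gets weight 1, as in the Weight array in the question. The function names are hypothetical, and the sketch makes no claim to reproduce the beta values quoted in the question:

```python
def discount_weights(n, lam):
    """Weights 1, lam, lam^2, ..., lam^(n-1) for a series of length n."""
    return [lam ** k for k in range(n)]

def weighted_beta(y, x, y_prior, x_prior, lam):
    """Weighted Beta = Sxy / Sxx, with each term scaled by its weight."""
    w = discount_weights(len(y), lam)
    dy = [yi - y_prior for yi in y]     # e.g. OASSpline - OASPriorSpline
    dx = [xi - x_prior for xi in x]     # e.g. AdjOASDolDur - AdjOASPriorDolDur
    sxy = sum(wi * a * b for wi, a, b in zip(w, dy, dx))
    sxx = sum(wi * b * b for wi, b in zip(w, dx))
    return sxy / sxx

weights = discount_weights(10, 0.99)    # ten weights for a series of length 10
print(weights)
```

With lam = 1 every weight is 1 and the function falls back to the simple Sxy/Sxx beta, which gives a quick consistency check. In the stored procedure the same idea would mean multiplying the expression inside each SUM by the row's weight, with the exponent taken from a row number minus 1.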