Constrained quadratic minimization in Mathematica with matrices

I'm attempting to solve the familiar mean-variance optimization problem using matrices in Mathematica, with a few added constraints on the solution vector "w". (The mean-variance optimization problem is basically choosing how to allocate a given budget over a number of assets according to their means and covariances in order to minimize the risk of the portfolio for a chosen level of mean return.)
My question: I'm not sure which function to use to perform the minimization of the objective function, which is quadratic:
obj = 0.5*w'* Sig * w
where w is the Nx1 vector of weights for each of the N assets and Sig is the NxN covariance matrix.
From what I've been able to find (I'm fairly new to Mathematica), it seems that FindMinimum, NMinimize, etc. are meant to deal only with scalar inputs, while LinearProgramming is meant for an objective function that's linear (not quadratic) in the weight vector w. I could very well be wrong here--any help in steering me toward the correct function would be much appreciated!
If it helps, I've attached my sample code--I'm not sure how, but if there's a place to upload the sample .csv data, I can do that as well if someone could point me to it.
Thank you very much for any help you can provide.
-Dan
CODE
(* Goal: find an Nx1 vector of weights w that minimizes total portfolio risk
   w'*Sig*w (where Sig is the covariance matrix) subject to:
   - The portfolio expected return equals the desired level d: w'*M = d,
     where M is the Nx1 vector of means.
   - Exactly two assets are chosen from each of the refining, construction,
     hitech, and utility sectors, and exactly one asset from the "other" sector.
     The above two constraints are represented together as w'*R = k,
     where R is the matrix [M SEC] and k is the vector [d 2 2 2 2 1].
   - Each weight in w takes an integer value of either 0 or 1, representing
     buying or not buying that physical asset (e.g. a plant); this is achieved
     as a combination of an integer constraint and a bound constraint.
   Note that for the T=41 days of observations in the data, not every asset
   generates a value for every day; by leaving the days when an asset is "off"
   blank, this shouldn't affect the mean or covariance matrices. *)
Clear["Global`*"]
(* (1) Import the data for today *)
X = Import["X:\\testassets.csv", "Data"];
Dimensions[X];
(* (2) Create required vectors and matrices *)
P = Take[X, {2, 42}, {4}];
Dimensions[P]; (* Should be N x 1 *)
r = Take[X, {2, 42}, {10, 50}];
Dimensions[r]; (* Should be N x T *)
Sig = Covariance[r\[Transpose]]; (* days as observation rows, so Sig is N x N; when there's more time, add block-diagonal restriction here *)
Dimensions[Sig]; (* Should be N x N *)
M = Mean[r\[Transpose]];
Dimensions[M]; (* Should be N x 1 *)
SEC = Take[X, {2, 42}, {5, 9}];
Dimensions[SEC]; (* Should be N x 5 *)
(* (3) Set up constrained optimization *)
d = 200; (* desired level of return *)
b = 60000; (* budget constraint *)
R = Join[Transpose[{M}], SEC, 2]; (* column-join so that R = [M SEC] *)
Dimensions[R]; (* Should be N x 6 *)
k = {d, 2, 2, 2, 2, 1};
w = Array[wv, 41]; (* vector of symbolic weight variables *)
obj = 0.5 w.Sig.w; (* use Dot (.), not *, for matrix/vector products *)
constr = w.R;
budgetcap = w.Flatten[P]; (* P is N x 1, so flatten it to a vector *)
lb = ConstantArray[0, 41];
ub = ConstantArray[1, 41];
(* NMinimize, unlike FindMinimum, accepts the integer constraint *)
NMinimize[{obj, And @@ Thread[constr == k] && budgetcap <= b &&
   (And @@ Thread[lb <= w <= ub]) && Element[w, Integers]}, w]
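For what it's worth, the structure of the problem can also be sanity-checked outside Mathematica by brute force when N is tiny. Below is a minimal Python sketch with entirely made-up data; the sector constraints are omitted and the return target is relaxed to an inequality (an exact equality is brittle with 0/1 weights), so it only illustrates the shape of the problem, not this exact model:

import itertools
import numpy as np

rng = np.random.default_rng(0)
N = 10                                    # small enough to enumerate all 2^N portfolios
M = rng.uniform(5, 15, N)                 # made-up mean returns
A = rng.standard_normal((N, N))
Sig = A @ A.T                             # made-up positive-definite covariance
P = rng.uniform(50, 150, N)               # made-up prices
d, b = 40.0, 500.0                        # made-up return target and budget

best_w, best_risk = None, np.inf
for bits in itertools.product([0, 1], repeat=N):
    w = np.array(bits, dtype=float)
    if w @ M >= d and w @ P <= b:         # return and budget constraints
        risk = 0.5 * w @ Sig @ w          # the quadratic objective
        if risk < best_risk:
            best_w, best_risk = w, risk

print(best_w, best_risk)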

Related

minimum-difference constrained sparse least squares problem (called from python)

I'm struggling a bit to find a fast algorithm that's suitable.
I just want to minimize:
norm2(x-s)
st
G.x <= h
x >= 0
sum(x) = R
G is sparse and contains only 1s (and zeros obviously).
In the case of iterative algorithms, it would be nice to get the interim solutions to show to the user.
The context is that s is a vector of current results, and the user is saying "well, the sum of these few entries (the entries indicated by a few 1.0s in a row of G) should be less than this value (the corresponding row of h)". So we have to remove quantities from the entries the user specified (indicated by the 1.0 entries in G) in a least-squares-optimal way, but since we have a global constraint on the total (R), the values removed need to be reallocated among the other entries, also in a least-squares-optimal way. The entries can't go negative.
All the algorithms I'm looking at are much more general, and as a result are much more complex. Also, they seem quite slow. I don't see this as a complex problem, although mixes of equality and inequality constraints always seem to make things more complex.
This has to be called from Python, so I'm looking at Python libraries like qpsolvers and scipy.optimize. But I suppose Java or C++ libraries could be used and called from Python, which might be good since multithreading is better in Java and C++.
Any thoughts on what library/package/approach to use to best solve this problem?
The size of the problem is about 150,000 rows in s, and a few dozen rows in G.
Thanks!
Your problem is a linearly constrained least-squares problem:
minimize_x norm2(x-s)
such that G x <= h
x >= 0
1^T x = R
Thus it fits the bill of the solve_ls function in qpsolvers.
Here is an example of how I imagine your problem matrices might look, given what you specified. Since the problem is sparse, we should use SciPy sparse matrices for the matrices and regular NumPy arrays for the vectors:
import numpy as np
import scipy.sparse as spa

n = 150_000

# minimize 1/2 || x - s ||^2
# (this R is solve_ls's least-squares matrix, here the identity --
# not the scalar R from the question's sum constraint)
R = spa.eye(n, format="csc")
s = np.array(range(n), dtype=float)

# such that G * x <= h
G = spa.diags(
    diagonals=[
        [1.0 if i % 2 == 0 else 0.0 for i in range(n)],
        [1.0 if i % 3 == 0 else 0.0 for i in range(n - 1)],
        [1.0 if i % 5 == 0 else 0.0 for i in range(n - 1)],
    ],
    offsets=[0, 1, -1],
    format="csr",  # the row selection below needs CSR, not the default DIA format
)
a_dozen_rows = np.linspace(0, n - 1, 12, dtype=int)
G = G[a_dozen_rows]
h = np.ones(12)

# such that sum(x) == 42
A = spa.csc_matrix(np.ones((1, n)))
b = np.array([42.0]).reshape((1,))

# such that x >= 0
lb = np.zeros(n)
Next, we can solve this problem with:
from qpsolvers import solve_ls
x = solve_ls(R, s, G, h, A, b, lb, solver="osqp", verbose=True)
Here I picked OSQP, but there are other open-source solvers you can install, such as ProxQP, CVXOPT or SCS. You can install a set of open-source solvers with: pip install qpsolvers[open_source_solvers]. Once some solvers are installed, you can list the ones that support sparse matrices with:
import qpsolvers
print(qpsolvers.sparse_solvers)
Finally, here is some code to check that the solution returned by the solver satisfies our constraints:
tol = 1e-6 # tolerance for checks
print(f"- Objective: {0.5 * (x - s).dot(x - s):.1f}")
print(f"- G * x <= h: {(G.dot(x) <= h + tol).all()}")
print(f"- x >= 0: {(x + tol >= 0.0).all()}")
print(f"- sum(x) = {x.sum():.1f}")
I just tried it with OSQP (adding the eps_rel=1e-5 keyword argument when calling solve_ls; otherwise the returned solution would be less accurate than the tol = 1e-6 tolerance used here) and it found a solution in 737 milliseconds on my (rather old) CPU, with:
- Objective: 562494373088866.8
- G * x <= h: True
- x >= 0: True
- sum(x) = 42.0
Hoping this helps. Happy solving!

Define the function for a distance matrix in AMPL. Keep getting "i is not defined"

I'm trying to set up an AMPL model which clusters given points in a 2-dimensional space according to the model of Saglam et al. (2005). For testing purposes I want to generate some random data points and then calculate the Euclidean distance matrix for them (since I need that matrix). I'm aware that I could just supply the distance matrix without the data points, but in a later step the data points will be given, and then I will need to calculate the distances between each pair of points.
Below you'll find the code I've written so far. While loading the model I keep getting the error message "i is not defined". Since i is a subscript that should run over x1, and x1 is a parameter which is defined over the set D and has one subscript, I cannot figure out why this code should be invalid. As far as I understand, I don't have to define variables if I use them only as subscripts?
reset;
# parameters to define clustered
param m; # numbers of data points
param n; # numbers of clusters
# sets
set D := 1..m; #points to be clustered
set L := 1..n; #clusters
# randomly generate datapoints
param x1 {D} = Uniform(1,m);
param x2 {D} = Uniform(1,m);
param d {D,D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
# variables
var x {D, L} binary;
var D_l {L} >=0;
var D_max >= 0;
#minimization funcion
minimize max_clus_dis: D_max;
# constraints
subject to C1 {i in D, j in D, l in L}: D_l[l] >= d[i,j] * (x[i,l] + x[j,l] - 1);
subject to C2 {i in D}: sum{l in L} x[i,l] = 1;
subject to C3 {l in L}: D_max >= D_l[l];
So far I tried to change the line from param x1 to
param x1 {i in D, j in D} = ...
as well as
param d {x1, x2} = ...
Alas, none of this helped. So any help someone can offer is deeply appreciated. I searched the web but found nothing useful for my task.
I eventually found what was missing. The line in which I calculate the parameter d should be
param d {i in D, j in D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
In retrospect it's clear that the subscripts i and j should have been declared on that line; I don't know how I missed that.
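As an aside, the same indexed definition is easy to cross-check numerically. Here is a small NumPy sketch (my own illustration, not AMPL) of the pairwise Euclidean distance matrix d[i,j] = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2):

import numpy as np

m = 10
rng = np.random.default_rng(1)
x1 = rng.uniform(1, m, size=m)
x2 = rng.uniform(1, m, size=m)

# broadcasting builds all pairwise differences at once
d = np.sqrt((x1[:, None] - x1[None, :])**2 + (x2[:, None] - x2[None, :])**2)
print(d.shape)  # (m, m): symmetric, with a zero diagonal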

Beginner Finite Element Code does not solve equation properly

I am trying to write the code for solving the extremely difficult differential equation:
x' = 1
with the finite element method.
As far as I understood, I can obtain the solution u as
u(x) = sum_i u_i * phi_i(x)
with the basis functions phi_i(x), while I can obtain the u_i as the solution of the system of linear equations
sum_i u_i * integral(phi_j(x) * (D phi_i)(x) dx) = integral(phi_j(x) * f(x) dx)
with the differential operator D (here only the first derivative). As a basis I am using the tent function:
import numpy as np

def tent(l, r, x):
    m = (l + r) / 2
    if x >= l and x <= m:
        return (x - l) / (m - l)
    elif x < r and x > m:
        return (r - x) / (r - m)
    else:
        return 0

def tent_half_down(l, r, x):
    if x >= l and x <= r:
        return (r - x) / (r - l)
    else:
        return 0

def tent_half_up(l, r, x):
    if x >= l and x <= r:
        return (x - l) / (r - l)
    else:
        return 0

def tent_prime(l, r, x):
    m = (l + r) / 2
    if x >= l and x <= m:
        return 1 / (m - l)
    elif x < r and x > m:
        return 1 / (m - r)
    else:
        return 0

def tent_half_prime_down(l, r, x):
    if x >= l and x <= r:
        return - 1 / (r - l)
    else:
        return 0

def tent_half_prime_up(l, r, x):
    if x >= l and x <= r:
        return 1 / (r - l)
    else:
        return 0

def sources(x):
    return 1
Discretizing my space:
n_vertex = 30
n_points = (n_vertex-1) * 40
space = (0,5)
x_space = np.linspace(space[0],space[1],n_points)
vertx_list = np.linspace(space[0],space[1], n_vertex)
tent_list = np.zeros((n_vertex, n_points))
tent_prime_list = np.zeros((n_vertex, n_points))
tent_list[0,:] = [tent_half_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_list[-1,:] = [tent_half_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
tent_prime_list[0,:] = [tent_half_prime_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_prime_list[-1,:] = [tent_half_prime_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
for i in range(1, n_vertex-1):
    tent_list[i, :] = [tent(vertx_list[i-1], vertx_list[i+1], x) for x in x_space]
    tent_prime_list[i, :] = [tent_prime(vertx_list[i-1], vertx_list[i+1], x) for x in x_space]
Calculating the system of linear equations:
b = np.zeros((n_vertex))
A = np.zeros((n_vertex, n_vertex))
for i in range(n_vertex):
    b[i] = np.trapz(tent_list[i,:] * sources(x_space))
    for j in range(n_vertex):
        A[j, i] = np.trapz(tent_prime_list[j] * tent_list[i])
And then solving and reconstructing it
u = np.linalg.solve(A,b)
sol = tent_list.T.dot(u)
But it does not work; I am only getting some up-and-down pattern. What am I doing wrong?
First, a couple of comments on terminology and notation:
1) You are using the weak formulation, though you've done this implicitly. A formulation being "weak" has nothing to do with the order of derivatives involved. It is weak because you are not satisfying the differential equation exactly at every location. FE minimizes the weighted residual of the solution, integrated over the domain. The functions phi_j actually discretize the weighting function. The difference when you only have first-order derivatives is that you don't have to apply the Gauss divergence theorem (which simplifies to integration by parts for one dimension) to eliminate second-order derivatives. You can tell this wasn't done because phi_j is not differentiated in the LHS.
2) I would suggest not using "A" as the differential operator. You also use this symbol for the global system matrix, so your notation is inconsistent. People often use "D", since this fits better to the idea that it is used for differentiation.
Secondly, about your implementation:
3) You are using way more integration points than necessary. Your elements use linear interpolation functions, which means you only need one integration point located at the center of the element to evaluate the integral exactly. Look into the details of Gauss quadrature to see why. Also, you've specified the number of integration points as a multiple of the number of nodes. This should be done as a multiple of the number of elements instead (in your case, n_vertex-1), because the elements are the domains on which you're integrating.
4) You have built your system by simply removing the two end nodes from the formulation. This isn't the correct way to specify boundary conditions. I would suggest building the full system first and using one of the typical methods for applying Dirichlet boundary conditions. Also, think about what constraining two nodes would imply for the differential equation you're trying to solve. What function exists that satisfies x' = 1, x(0) = 0, x(5) = 0? You have overconstrained the system by trying to apply 2 boundary conditions to a first-order differential equation.
Unfortunately, there isn't a small tweak that can be made to get the code to work, but I hope the comments above help you rethink your approach.
EDIT to address your changes:
1) Assuming the matrix A is addressed with A[row,col], then your indices are backwards. You should be integrating with A[i,j] = ...
2) A simple way to apply a constraint is to replace one row with the constraint desired. If you want x(0) = 0, for example, set A[0,j] = 0 for all j, then set A[0,0] = 1 and set b[0] = 0. This substitutes one of the equations with u_0 = 0. Do this after integrating.
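To make point 2) concrete, here is a minimal sketch (my own rewrite under the assumptions above, not the original poster's code) that assembles A[i, j] = integral(phi_i * phi_j' dx) exactly for linear tents on a uniform mesh, applies u(0) = 0 by row replacement, and recovers u(x) = x for u' = 1:

import numpy as np

n_vertex = 30
nodes = np.linspace(0.0, 5.0, n_vertex)
h = nodes[1] - nodes[0]  # uniform element length

A = np.zeros((n_vertex, n_vertex))
b = np.zeros(n_vertex)
for e in range(n_vertex - 1):      # loop over elements, not integration points
    i, j = e, e + 1                # left and right node of this element
    # exact element integrals of phi_a * phi_b' for linear tents (independent of h)
    A[i, i] += -0.5; A[i, j] += 0.5
    A[j, i] += -0.5; A[j, j] += 0.5
    # element load vector for f = 1: the integral of each tent over the element is h/2
    b[i] += 0.5 * h
    b[j] += 0.5 * h

# Dirichlet condition u(0) = 0 by row replacement: substitute the first
# equation with u_0 = 0 (only one condition for a first-order equation).
A[0, :] = 0.0
A[0, 0] = 1.0
b[0] = 0.0

u = np.linalg.solve(A, b)
print(np.allclose(u, nodes))  # True: the FE solution reproduces u(x) = x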

Coordinate Descent Algorithm in Julia for Least Squares not converging

As a warm-up to writing my own elastic net solver, I'm trying to get a fast enough version of ordinary least squares implemented using coordinate descent.
I believe I've implemented the coordinate descent algorithm correctly, but when I use the "fast" version (see below), the algorithm is insanely unstable, outputting regression coefficients that routinely overflow a 64-bit float when the number of features is of moderate size compared to the number of samples.
Linear Regression and OLS
If b = A*x, where A is a matrix, x is the vector of unknown regression coefficients, and b is the output, I want to find the x that minimizes
||b - Ax||^2
If A[j] is the jth column of A and A[-j] is A without column j, and the columns of A are normalized so that ||A[j]||^2 = 1 for all j, the coordinate-wise update is then
Coordinate Descent:
x[j] <-- A[j]^T * (b - A[-j] * x[-j])
I'm following along with these notes (pages 9-10), but the derivation is simple calculus.
It's pointed out that instead of recomputing A[j]^T(b - A[-j] * x[-j]) all the time, a faster way to do it is with
Fast Coordinate Descent:
x[j] <-- A[j]^T*r + x[j]
where the total residual r = b - Ax is computed outside the loop over coordinates. The equivalence of these update rules follows from noting that Ax = A[j]*x[j] + A[-j]*x[-j] and rearranging terms.
My problem is that while the second method is indeed faster, it's wildly numerically unstable for me whenever the number of features isn't small compared to the number of samples. I was wondering if anyone might have some insight as to why that's the case. I should note that the first method, which is more stable, still starts disagreeing with more standard methods as the number of features approaches the number of samples.
Julia code
Below is some Julia code for the two update rules:
using LinearAlgebra  # for dot

function OLS_builtin(A,b)
    x = A\b
    return(x)
end

function OLS_coord_descent(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        for j = 1:P
            x[j] = dot(A[:,j], b - A[:,1:P .!= j]*x[1:P .!= j])
        end
    end
    return(x)
end

function OLS_coord_descent_fast(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        r = b - A*x
        for j = 1:P
            x[j] += dot(A[:,j],r)
        end
    end
    return(x)
end
Example of the problem
I generate data with the following:
using Statistics  # for mean

n = 100
p = 50
σ = 0.1
β_nz = float([i*(-1)^i for i in 1:10])
β = append!(β_nz, zeros(Float64, p - length(β_nz)))
X = randn(n, p); X .-= mean(X, dims=1); X ./= sqrt.(sum(abs2.(X), dims=1))
y = X*β + σ*randn(n); y .-= mean(y);
Here I use p=50, and I get good agreement between OLS_coord_descent(X,y) and OLS_builtin(X,y), whereas OLS_coord_descent_fast(X,y) returns exponentially large values for the regression coefficients.
When p is less than about 20, OLS_coord_descent_fast(X,y) agrees with the other two.
Conjecture
Since things agree in the regime of p << n, I think the algorithm is formally correct, but numerically unstable. Does anyone have any thoughts on whether this guess is correct, and if so, how to correct for the instability while retaining (most of) the performance gains of the fast version of the algorithm?
The quick answer: you forgot to update r after each x[j] update. Here is the fixed function, which behaves like OLS_coord_descent:
function OLS_coord_descent_fast(A,b)
    N,P = size(A)
    x = zeros(P)
    for cycle in 1:1000
        r = b - A*x
        for j = 1:P
            delta = dot(A[:,j], r)  # coordinate step
            x[j] += delta
            r -= A[:,j]*delta       # the missing line: keep r in sync with x
        end
    end
    return(x)
end

Finding out Force from Torque and Distance

I have a solid object that is spinning with a torque W, and I want to calculate the force F applied at a certain point whose offset from the center of the object is the vector D. All these values are represented as Vector3s (x, y, z).
I know that W = D x F, where x is the cross product, so by expanding this I get:
Wx = Dy*Fz - Dz*Fy
Wy = Dz*Fx - Dx*Fz
Wz = Dx*Fy - Dy*Fx
So I have this system of equations, I need to find (Fx, Fy, Fz), and I'm thinking of using the simplex method to solve it.
Since the F vector can also have negative values, I split each F variable into two (F = G - H), so the new equations look like this:
Wx = Dy*Gz - Dy*Hz - Dz*Gy + Dz*Hy
Wy = Dz*Gx - Dz*Hx - Dx*Gz + Dx*Hz
Wz = Dx*Gy - Dx*Hy - Dy*Gx + Dy*Hx
Next, I define the simplex table (we need <= inequalities, so I duplicate each equation and multiply it by -1).
Also, I define the objective function as: minimize (Gx - Hx + Gy - Hy + Gz - Hz).
The table looks like this:
Gx Hx Gy Hy Gz Hz <= RHS
============================================================
0 0 -Dz Dz Dy -Dy <= Wx = Gx
0 0 Dz -Dz -Dy Dy <= -Wx = Hx
Dz -Dz 0 0 Dx -Dx <= Wy = Gy
-Dz Dz 0 0 -Dx Dx <= -Wy = Hy
-Dy Dy Dx -Dx 0 0 <= Wz = Gz
Dy -Dy -Dx Dx 0 0 <= -Wz = Hz
============================================================
1 -1 1 -1 1 -1 0 = Z
The problem is that when I run it through an online solver I get Unbounded solution.
Can anyone please point me to what I'm doing wrong?
Thanks in advance.
edit: I'm sure I messed up some signs somewhere (for example, Z should be defined as a max), but I suspect the real mistake is in how I defined something more important.
There exists no unique solution to the problem as posed. You can only solve for the tangential projection of the force. This comes from the properties of the vector (cross) product - it is zero for collinear vectors and in particular for the vector product of a vector by itself. Therefore, if F is a solution of W = r x F, then F' = F + kr is also a solution for any k:
r x F' = r x (F + kr) = r x F + k (r x r) = r x F
since the r x r term is zero by the definition of vector product. Therefore, there is not a single solution but rather a whole linear space of vectors that are solutions.
If you restrict the solution to forces that have zero projection in the direction of r, then you could simply take the vector product of W and r:
W x r = (r x F) x r = -[r x (r x F)] = -[(r . F)r - (r . r)F] = |r|^2 F
with the first term of the expansion being zero because the projection of F onto r is zero (the dot denotes the scalar (inner) product). Therefore:
F = (W x r) / |r|^2
If you are also given the magnitude of F, i.e. |F|, then you can compute the radial component (if any) but there are still two possible solutions with radial components in opposing directions.
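Here is a quick numerical check of that formula (my own sketch, not part of the answer): it recovers exactly the component of F perpendicular to r, and the recovered force reproduces the original torque:

import numpy as np

rng = np.random.default_rng(0)
r = rng.standard_normal(3)         # lever arm
F_true = rng.standard_normal(3)    # some applied force
W = np.cross(r, F_true)            # torque W = r x F

F_rec = np.cross(W, r) / r.dot(r)  # the formula above

F_tang = F_true - r.dot(F_true) / r.dot(r) * r  # tangential part of F_true
print(np.allclose(F_rec, F_tang))               # True: only the tangential part is recovered
print(np.allclose(np.cross(r, F_rec), W))       # True: it produces the same torque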
Quick dirty derivation...
Given D and F, you get W perpendicular to them. That's what a cross product does.
But you have W and D and need to find F. It isn't generally true, but let's assume for now that F is perpendicular to D. Call it Fp, since it's not necessarily the same as F. Ignoring magnitudes, WxD should give you the direction of Fp.
That ignores magnitudes, so fix it up with a little arithmetic. Starting with W=DxF applied to Fp:
mag(W) = mag(D)*mag(Fp) (ignoring geometry; using Fp perp to D)
mag(Fp) = mag(W)/mag(D)
Combining the cross product bit for direction with this stuff for magnitude,
Fp = WxD / mag(WxD) * mag(Fp)
   = WxD / (mag(W)*mag(D)) * (mag(W)/mag(D))
   = WxD / mag(D)^2
Note that given any solution Fp to W=DxF, you can add any vector proportional to D to Fp to obtain another solution F. That is a totally free parameter to choose as you like.
Note also that if the torque applies to some sort of axle or object constrained to rotate about some axis, and F is applied to some oddball lever sticking out at a funny angle, then vector D points in some funny direction. You want to replace D with just the part perpendicular to the axle/axis, otherwise the "/mag(D)" part will be wrong.
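For that caveat, a one-line projection does the job. A small sketch (my own illustration; it assumes axis points along the axle):

import numpy as np

def perp_to_axis(D, axis):
    axis = axis / np.linalg.norm(axis)  # normalize in case it isn't unit length
    return D - axis * axis.dot(D)       # drop the component of D along the axis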
From your comment it is clear that all rotation is spinning around the center of gravity. In that case:
F = M/r
F ... force [N]
M ... torque [N.m]
r ... scalar distance from the center of rotation [m]
This way you know the scalar size of your force. Now you need the direction: it is perpendicular to the rotation axis, and it is tangent to the rotation at that point:
dir = r x axis
F = F * dir / |dir|
Bold names are vectors, the rest are scalars; x is the cross product, dir is the force direction, and axis is the direction of the rotation axis. Now just adjust the direction according to the sense of rotation (the sign of the current omega) and your coordinate system setup, i.e. either negate F or not.
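A compact sketch of this recipe (my own illustration; M, r, and axis as defined above, with the sign convention left to you, as noted):

import numpy as np

def tangential_force(M, r, axis):
    # magnitude F = M / |r|, direction = r x axis, normalized
    d = np.cross(r, axis)
    return (M / np.linalg.norm(r)) * d / np.linalg.norm(d)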
But note that free 3D rotation about the center of gravity is a very improbable scenario: the object has to be symmetrical from a mass point of view, or the initial driving forces have to have been applied in a manner that achieves this. Also beware that after the first hit from any interaction force, this will no longer be true!
So if you just want to compute the force generated at a certain point when a collision occurs, this is fine, but immediately after that your spin will change, and for non-symmetric objects the spin will most likely be off the center of gravity. If your object is disintegrated at that point, then you do not need to worry; if not, then you have to apply rotational and translational dynamics.
Rotation Dynamics
M = alpha * I
M ... torque [N.m]
alpha ... angular acceleration [rad/s^2]
I ... moment of inertia about the actual rotation axis [kg.m^2]
epsilon'' = omega' = alpha
' means the derivative with respect to time
omega ... angular speed [rad/s]
epsilon ... angle [rad]