I have a portfolio optimization problem where my objective function is the mean divided by the standard deviation.
The variance is the difference of two random variables so is computed as Var(X) + Var(Y) - 2 * Cov(X, Y). The variance term is specified as above, where w represents the portfolio selection, capital sigma is a covariance matrix, and sigma sub delta g is a vector of covariances related to the second random variable. The problem is that CVXPY doesn't consider the last term there to be nonnegative because some of the covariance terms are negative. Obviously, I know that the variance will always be nonnegative, so I believe that this should work as a quasiconvex problem. Is there any way to tell CVXPY that this variance term will always be positive?
Related
I know that the Hessian matrix is a kind of second derivative test of functions involving more than one independent variable. How does one find the maximum or minimum of a function involving more than one variable? Is it found using the eigenvalues of the Hessian matrix or its principal minors?
You should have a look here:
https://en.wikipedia.org/wiki/Second_partial_derivative_test
For an n-dimensional function f, find an x where the gradient grad f = 0. This is a critical point.
Then, the 2nd derivatives tell, whether x marks a local minimum, a maximum or a saddle point.
The Hessian H is the matrix of all combinations of 2nd derivatives of f.
For the 2D-case the determinant and the minors of the Hessian are relevant.
For the nD-case it might involve a computation of eigen values of the Hessian H (if H is invertible) as part of checking H for being positive (or negative) definite.
In fact, the shortcut in 1) is generalized by 2)
For numeric calculations, some kind of optimization strategy can be used for finding x where grad f = 0.
I've come across a from scratch implementation for gaussian processes:
http://krasserm.github.io/2018/03/19/gaussian-processes/
There, the isotropic squared exponential kernel is implemented in numpy. It looks like:
The implementation is:
def kernel(X1, X2, l=1.0, sigma_f=1.0):
sqdist = np.sum(X1**2, 1).reshape(-1, 1) + np.sum(X2**2, 1) - 2 * np.dot(X1, X2.T)
return sigma_f**2 * np.exp(-0.5 / l**2 * sqdist)
consistent with the implementation of Nando de Freitas: https://www.cs.ubc.ca/~nando/540-2013/lectures/gp.py
However, I am not quite sure how this implementation matches the provided formula, especially in the sqdist part. In my opinion, it is wrong but it works (and delivers the same results as scipy's cdist with squared euclidean distance). Why do I think it is wrong? If you multiply out the multiplication of the two matrices, you get
which equals to either a scalar or a nxn matrix for a vector x_i, depending on whether you define x_i to be a column vector or not. The implementation however gives back a nx1 vector with the squared values.
I hope that anyone can shed light on this.
I found out: The implementation is correct. I just was not aware of the fuzzy notation (in my opinion) which is sometimes used in ML contexts. What is to be achieved is a distance matrix and each row vectors of matrix A are to be compared with the row vectors of matrix B to infer the covariance matrix, not (as I somehow guessed) the direct distance between two matrices/vectors.
In Gelman book, the effective number is defined in terms of the following;
R hat
between- within MCMC sequence of variance, B and W
the number of MCMC samples, denoted by n
the number of chains, denoted by m
I do not know how the samplig() calculate the between MCMC sequence of variance for the case chains=1. So, I cannot calculate these terms ( B,W,m). I want to implement some algorithm according to the paper:https://arxiv.org/abs/1804.06788.
Roughly speaking, this paper construct some test statistics which is uniformly distributed under the null hypothesis that the MCMC sampling is correct. And if MCMC sampling is not correct, then the histogram of the test statistics become skew shape and this deviation from uniformity tells us the MCMC contains bias. I want to implement but it needs to calculate the above quantities.
In rstan, is there such function to extract the above quantities ? I think the process of calculation of R hat statistics, the above quantities B,W, m are retained in some place in the stanfit S4 object.
I am sorry, I found n_eff, but I do not know the choice of m of the case chains =1.
In the case that only one chain is estimated (which should not be happening anyway), then m = 2 because the post-warmup draws from the single chain are split into the first half and the second half. This splitting method is discussed in the documentation.
I got a basic idea of Big-O notation from Big-O notation's definition.
In my problem, a 2-D surface is divided into uniform M grids. Each grid (m) is assigned with a posterior probability based on A features.
The posterior probability of m grid is calculated as follows:
and the marginal likelihood is given as:
Here, A features are independent of each other and sigma and mean symbol represent the standard deviation and mean value of each a feature at each grid. I need to calculate the Posterior probability of all M grids.
What will be the time complexity of the above operation in terms of Big-O notation?
My guess is O(M) or O(M+A). Am I correct? I'm expecting an authenticate answer to present at the formal forum.
Also, what will be the time complexity if M grids are divided into T clusters where every cluster has Q grids (Q << M) (calculating Posterior Probability only on Q grids out of M grids) ?
Thank you very much.
Discrete sum and product
can be understood as loops. If you are happy with floating point approximation most other operators are typically O(1), conditional probability looks like a function call. Just inject constants and variables in your equation and you'll get the expected Big-O, the details of formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.
If the result is not obvious, please convert your above mathematical formula in actual programming code in a programming language. Computer Science Big-O is never about a formula but about an actual translation of it in programming steps, depending on the implementation the same formula can lead to very different execution complexities. As different as adding integers by actually performing sum O(n) or applying Gauss formula O(1) for instance.
By the way why are you doing a discrete sum on a discrete domaine N ? Shouldn't it be M ?
Is it possible to model a non-linear piece-wise cost function in Cplex?
For example something like the figures I put here:
non linear piece wise Cost function (black line)
I know one way is to linearising the quadratic part to linear one, but, I want to use the quadratic part as it is.
You can see that the condition is on the decision variable itself, the cost function can be formulated as follows:
if x ≲ x0 Then cost is quadratic part;
else cost is linear part.
Thanks in advance :)
One way is to pick the cheapest curve at x:
min cost
cost ≥ f(x) − Mδ
cost ≥ g(x) − M(1−δ)
δ ϵ {0,1}
M is a constant: the largest difference between the two curves (i.e. M=|f(xmax)−g(xmax)|). δ is a binary variable. I assumed we are minimizing cost and that the quadratic function is convex.
This construct implements
min cost
cost ≥ f(x) or cost ≥ g(x)
The solver will always drop the most expensive function, and keep the cheapest. In your picture this is exactly what we want: on the left of x0 the quadratic function is the cheapest, and on the right of x0, the linear function is cheaper. This formulation will automatically pick the cheaper option.