How to optimize lower Cholesky Parameter in Pytorch?

Is there any way to create a parameter which is lower triangular with positive diagonal and enforce this constraint during optimization in Pytorch?

Check this one: torch.potrf (note that it has since been deprecated in favour of torch.cholesky / torch.linalg.cholesky).
A simple example:
import torch

a = torch.randn(3, 3, requires_grad=True)
opt = torch.optim.SGD([a], lr=0.1)

m = torch.mm(a, a.t())            # make symmetric positive definite
l = torch.potrf(m, upper=False)   # lower-triangular Cholesky factor
tri_loss = l.sum()
opt.zero_grad()
tri_loss.backward()
opt.step()
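Note that this snippet only differentiates through a factorization of an existing matrix. If you want a learnable lower-triangular parameter with a positive diagonal, a common trick is to keep an unconstrained parameter and rebuild the constrained factor from it at every step, e.g. with torch.tril plus a softplus on the diagonal. A minimal sketch of that idea (the names raw and loss are made up for illustration, and the loss is a placeholder):
import torch
import torch.nn.functional as F

n = 3
raw = torch.nn.Parameter(torch.randn(n, n))   # unconstrained parameter
opt = torch.optim.SGD([raw], lr=0.1)

for step in range(100):
    L = torch.tril(raw, diagonal=-1)                      # strictly lower part
    L = L + torch.diag(F.softplus(torch.diagonal(raw)))   # positive diagonal
    loss = (L @ L.t()).trace()                            # placeholder loss built from L
    opt.zero_grad()
    loss.backward()
    opt.step()
Recent PyTorch versions also provide torch.nn.utils.parametrize.register_parametrization, which packages the same re-mapping idea behind a module parameter.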

Related

Projected gradient descent on probability simplex in pytorch

I have a matrix A of dimension 1000x70000.
My loss function includes A and I want to find the optimal value of A using gradient descent, where the constraint is that the rows of A remain in the probability simplex (i.e. every row sums to 1). I have initialised A as given below:
A = np.random.dirichlet(np.ones(70000), 1000)
A = torch.tensor(A, requires_grad=True)
and my training loop looks like this:
for epoch in range(500):
    y_pred = forward(X)
    y = model(torch.mm(A.float(), X))
    l = loss(y, y_pred)
    l.backward()
    A.grad.data = -A.grad.data
    optimizer.step()
    optimizer.zero_grad()
    if epoch % 2 == 0:
        print("Loss", l, "\n")
An easy way to accomplish that is not to use A directly for computation but to use a row-normalized version of A.
# you can keep 'A' unconstrained
A = torch.rand(1000, 70000, requires_grad=True)
then divide each row by its sum (keeping each row sum at 1):
for epoch in range(500):
    y_pred = forward(X)
    B = A / A.sum(-1, keepdim=True)  # normalize rows manually
    y = model(torch.mm(B, X))
    l = loss(y, y_pred)
    ...
So now, at each step, B is the constrained matrix, i.e. the quantity of interest. However, the optimization is still performed on the (unconstrained) A.
Edit: @Umang Gupta reminded me in the comment section that the OP wanted a "probability simplex", which implies another constraint, i.e. A >= 0.
To accomplish that, you may simply apply an appropriate activation function (e.g. torch.exp, torch.sigmoid) to A in each iteration:
A_ = torch.exp(A)
B = A_ / A_.sum(-1, keepdim=True) # normalize rows
The exact choice of function depends on the training dynamics, which you will need to experiment with.
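Note that exponentiating and then row-normalizing is exactly a row-wise softmax, so the two lines above can be collapsed into a single call. This is just an equivalent restatement of the snippet above, using the same A, model and X:
B = torch.softmax(A, dim=-1)   # rows are non-negative and sum to 1
y = model(torch.mm(B, X))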

Fitting a multivariable polynomial with inequality constraints on the polynomial

I have scattered experimental data (in green, in the picture) in a 2D domain (x, y) that I want to fit with a two-dimensional polynomial, such as:
f(x,y) = c0 + c1*x + c2*y + c3*x*y + c4 * x ** 2 * y ** 2
where c0, c1,... are the coefficients of the polynomial. On top of this, I have equality and inequality constraints:
f(x=0,y) = 0
f(x,y) > 0, for 0 < x < 90
How can I do this? Can I express my inequality in f(x,y), by inequalities in the c0, c1, c2,... coefficients?
I used scipy.optimize.minimize to minimize the least squares of ||Ax-B||, where Ax is the polynomial expression evaluated at the experimental points, x is the vector of the coefficients c0, c1, c2,... to be optimized, and B is my experimental data. I really need some guidance on how to apply the inequality constraint.
What I tried so far:
I was able to implement the equality constraint, manually simplifying f(x,y) and f(x=0,y)=0, by substitution, and reformulating ||Ax-B||, but I cannot do that for the inequality constraint. See the picture,
where f(x=0,y) = 0 is satisfied, but not f(x,y) > 0.
I tried using the constraints parameter, but I could only apply inequality constraints on the c0,c1,c2,... coefficients, instead of applying the constraint on the desired f(x,y).
I have read on Lagrange multipliers and non-linear programming but I'm still lost.
Two solutions:
With scipy.optimize.minimize, the function to be minimized is some kind of chi^2; additionally, if your constraints are not met, the function returns np.inf, which provides a hard boundary.
Use a Markov chain Monte Carlo (MCMC) method. There are many implementations in Python.
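A rough sketch of the first suggestion is below. It assumes the measurements live in arrays x, y, z (placeholder data here, not from the question), enforces the equality constraint f(x=0, y) = 0 by construction (c0 = c2 = 0, so only c1, c3, c4 are optimized), and checks the inequality on a grid over 0 < x < 90, returning np.inf on any violation:
import numpy as np
from scipy.optimize import minimize

def f(c, x, y):
    # polynomial with the equality constraint f(0, y) = 0 built in (c0 = c2 = 0)
    c1, c3, c4 = c
    return c1*x + c3*x*y + c4*x**2*y**2

def objective(c, x, y, z, xg, yg):
    # hard penalty: f must stay positive on a grid covering 0 < x < 90
    if np.any(f(c, xg, yg) <= 0):
        return np.inf
    return np.sum((f(c, x, y) - z) ** 2)   # least-squares misfit (chi^2-like)

# placeholder "experimental" data; replace with the real scattered points
rng = np.random.default_rng(0)
x = rng.uniform(1, 89, 200)
y = rng.uniform(0, 10, 200)
z = 0.05*x + 0.02*x*y + rng.normal(0, 0.1, x.size)

xg, yg = np.meshgrid(np.linspace(0.5, 89.5, 60), np.linspace(y.min(), y.max(), 60))

res = minimize(objective, x0=[0.1, 0.1, 0.001], args=(x, y, z, xg, yg),
               method="Nelder-Mead")
print(res.x)
Because np.inf makes the objective discontinuous, a gradient-free method such as Nelder-Mead is the safer choice; gradient-based methods would need a smooth penalty proportional to the amount of violation instead.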

Fast way to set diagonals of an (M x N x N) matrix? Einsum / n-dimensional fill_diagonal?

I'm trying to write fast, optimized code based on matrices, and have recently discovered einsum as a tool for achieving significant speed-up.
Is it possible to use this to set the diagonals of a multidimensional array efficiently, or can it only return data?
In my problem, I'm trying to set the diagonals for an array of square matrices (shape: M x N x N) by summing the columns in each square (N x N) matrix.
My current (slow, loop-based) solution is:
# Build dummy array
dimx = 2 # Dimension x (likely to be < 100)
dimy = 3 # Dimension y (likely to be between 2 and 10)
M = np.random.randint(low=1, high=9, size=[dimx, dimy, dimy])
# Blank the diagonals so we can see the intended effect
np.fill_diagonal(M[0], 0)
np.fill_diagonal(M[1], 0)
# Compute diagonals based on summing columns
diags = np.einsum('ijk->ik', M)
# Set the diagonal for each matrix
# THIS IS SLOW. CAN IT BE IMPROVED?
for i in range(len(M)):
    np.fill_diagonal(M[i], diags[i])
# Print result
M
Can this be improved at all please? It seems np.fill_diagonal doesn't accept stacks of square matrices (hence forcing my loop-based solution). Perhaps einsum can help here too?
One approach would be to reshape to 2D and assign the diagonal values to the elements at steps of ncols+1 along each row. Reshaping creates a view and as such allows us to directly access those diagonal positions. Thus, the implementation would be -
s0,s1,s2 = M.shape
M.reshape(s0,-1)[:,::s2+1] = diags
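As a quick sanity check on the dummy M from the question (this verification snippet is mine, not part of the original answer), the strided assignment matches the loop-based version:
import numpy as np

M = np.random.randint(low=1, high=9, size=(2, 3, 3))
expected = M.copy()
diags = np.einsum('ijk->ik', M)            # column sums of each 3 x 3 block

for i in range(len(expected)):             # loop-based reference
    np.fill_diagonal(expected[i], diags[i])

s0, s1, s2 = M.shape
M.reshape(s0, -1)[:, ::s2 + 1] = diags     # vectorized write through a 2D view

assert np.array_equal(M, expected)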
If you do np.source(np.fill_diagonal) you'll see that in the 2d case it uses a 'strided' approach
if a.ndim == 2:
    step = a.shape[1] + 1
    end = a.shape[1] * a.shape[1]
    a.flat[:end:step] = val
@Divakar's solution applies this to your 3D case by 'flattening' the last 2 dimensions.
You could sum the columns with M.sum(axis=1), though I vaguely recall some timings that found einsum was actually a bit faster; sum is a little more conventional.
Someone has asked for the ability to expand dimensions in einsum, but I don't think that will happen.

Numpy eig/eigh gives negative values for a symmetric positive-semi definite matrix

The following code, for the CIFAR dataset after GCN (global contrast normalization):
xtx = np.dot(dataset.train_data[i].transpose(), dataset.train_data[i])
e, q = np.linalg.eigh(xtx)
print(np.max(e), np.min(e))
Produces the following output:
2.65138e+07 -0.00247511
This is inconsistent, given that xtx is symmetric positive semi-definite. My guess is that this might be due to applying GCN earlier, but even so, shouldn't the minimum eigenvalue at least be close to 0?
Update: The condition number of my matrix is 8.89952e+09. I had actually forgotten to take out the mean; after doing so the maximum eigenvalue is ~573, while the minimum is -7.14630133e-08. My question is that I'm trying to do ZCA: how should I proceed in this case? Add a diagonal perturbation to xtx, or to the eigenvalues?
If your eigenvalues span several orders of magnitude, you can use the SVD instead, so that X.T @ X = (U @ S @ V.T).T @ (U @ S @ V.T) = V @ S @ S @ V.T.
_, s, v = np.linalg.svd( dataset.train_data[i] )
e = s[:len(v)]**2
Now e is guaranteed to be non-negative. Moreover, suppose you calculated something around 5e3 for the largest singular value and rounding errors limit the accuracy of the smaller singular values to about 1e-7; then when you calculate e[1] = s[1]**2 you actually get something on the order of 1e-14, rather than a small negative value.
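For the ZCA part of the update, one common recipe builds the whitening matrix directly from the SVD of the centred data and adds a small epsilon before the inverse square root, which also sidesteps the tiny negative eigenvalues. A sketch under my own naming (X is the centred n_samples x n_features data, eps is a tuning constant; neither comes from the original answer):
import numpy as np

def zca_whiten(X, eps=1e-5):
    X = X - X.mean(axis=0)                            # centre the data
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    d = s**2 / X.shape[0]                             # eigenvalues of the covariance
    W = vt.T @ np.diag(1.0 / np.sqrt(d + eps)) @ vt   # ZCA whitening matrix
    return X @ W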

How to understand bias parameter in LIBLINEAR?

I don't understand the meaning of the bias parameter in the LIBLINEAR API. Why is it specified by the user during training? Shouldn't it just be the distance from the separating hyperplane to the origin, i.e. a parameter of the learned model?
This is from the README:
struct problem
{
    int l, n;
    int *y;
    struct feature_node **x;
    double bias;
};
If bias >= 0, we assume that one additional feature is added to the end of each data instance.
What is this additional feature?
Let's look at the equation for the separating hyperplane:
w_1 * x_1 + w_2 * x_2 + w_3 * x_3 + ... + w_bias * x_bias = 0
Where x are the feature values and w are the trained "weights". The additional feature x_bias is a constant whose value is equal to the bias. If bias = 0, you get a separating hyperplane going through the origin (0,0,0,...). You can imagine many cases where such a hyperplane is not the optimal separator.
The value of the bias affects the margin through the scaling of w_bias. The bias is therefore a tuning parameter, usually determined through cross-validation like the other hyperparameters.
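To make the "additional feature" concrete, here is a small NumPy illustration of what LIBLINEAR effectively does when bias >= 0 (an illustrative sketch, not LIBLINEAR code; the weights are hypothetical): a constant column equal to bias is appended to every instance, and the weight learned for that column acts as the intercept.
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                  # two instances, two features
bias = 1.0

# append the constant "bias feature" to the end of each data instance
X_aug = np.hstack([X, np.full((X.shape[0], 1), bias)])

w = np.array([0.5, -0.25, 0.1])             # hypothetical trained weights; last entry is w_bias
decision = X_aug @ w                        # w_1*x_1 + w_2*x_2 + w_bias*bias per instance
print(decision)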