Linear least squares of factorizable 2D polynomial

I would like to fit 2D data where my function is a product of two 1D polynomials of 4th order.
f(x,y) = (a0 + a1*x + ... + a4*x^4)*(b0 + b1*y + ... + b4*y^4)
I know that I can solve this using any optimization routine that uses non-linear least squares.
Is there a way to use a linear least square method to fit such a factorizable polynomial?
I could expand the polynomial and formulate the problem as a linear system with one row per data point,
(1, x_i, y_i, x_i*y_i, x_i^2, y_i^2, x_i^2*y_i, ...) * X = f(x_i, y_i),  i = 1, ..., N,
and solve for X = (a0*b0, a1*b0, a0*b1, ...)^T.
Then I have the problem that the system of equations defining a0, a1, ... and b0, b1, ... is overdetermined (X has length 5*5 = 25, but there are only 5+5 = 10 independent parameters).
Constrained least squares (using e.g. L*X = d) also does not seem to work, since the constraints are non-linear.
Is there any (tricky?) way to use linear least squares for this type of problem?
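For reference, the unconstrained expanded formulation above can be set up directly with numpy. This sketch fits the full 5x5 coefficient matrix C (whose entries play the role of the products a_i*b_j) without enforcing the rank-1 factorization, so it is the over-parameterized problem described above rather than the constrained one being asked about; the function name is an assumption:

import numpy as np

def fit_expanded(x, y, f, deg=4):
    # Design matrix: one column per monomial x**i * y**j
    A = np.column_stack([x**i * y**j
                         for i in range(deg + 1)
                         for j in range(deg + 1)])
    coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
    return coeffs.reshape(deg + 1, deg + 1)  # C[i, j] corresponds to a_i*b_j

# If f really factorizes, C should be numerically rank one, and a rank-1 SVD
# truncation of C recovers the a_i and b_j up to an arbitrary scale factor.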

Related

Fitting a multivariable polynomial with inequality constraints on the polynomial

I have experimental scattered data (in green, in the picture) in a 2D domain (x,y), that I want to fit with a two-dimensional polynomial, such as:
f(x,y) = c0 + c1*x + c2*y + c3*x*y + c4*x^2*y^2
where c0, c1,... are the coefficients of the polynomial. On top of this, I have equality and inequality constraints:
f(x=0,y) = 0
f(x,y) > 0, for 0 < x < 90
How can I do this? Can I express my inequality on f(x,y) as inequalities on the c0, c1, c2, ... coefficients?
I used scipy.optimize.minimize to minimize the least-squares objective ||Ax-B||, where Ax is the polynomial expression evaluated at the experimental points, x is the vector of coefficients c0, c1, c2, ... to be optimized, and B is my experimental data. I really need some guidance on how to apply the inequality constraint.
What I tried so far:
I was able to implement the equality constraint by substituting f(x=0,y)=0 into f(x,y) by hand and reformulating ||Ax-B||, but I cannot do the same for the inequality constraint. See the picture,
where f(x=0,y) = 0 is satisfied, but f(x,y) > 0 is not.
I tried using the constraints parameter, but I could only apply inequality constraints on the c0,c1,c2,... coefficients, instead of applying the constraint on the desired f(x,y).
I have read on Lagrange multipliers and non-linear programming but I'm still lost.
Two solutions:
With scipy.optimize.minimize, the function to be minimized is some kind of chi^2; additionally, if your constraints are not met, it returns np.inf, which provides a hard boundary.
Use a Markov chain Monte Carlo (MCMC) method. There are many implementations in Python.
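A minimal sketch of the first suggestion, using the model from the question: the data arrays, the grid on which the inequality is checked, and the starting point are placeholders, and a gradient-free method (Nelder-Mead) is used because the objective jumps to np.inf at the constraint boundary:

import numpy as np
from scipy.optimize import minimize

# Placeholder experimental data: 1-D arrays of equal length
xdata, ydata, B = np.random.rand(3, 50)

def model(c, x, y):
    return c[0] + c[1]*x + c[2]*y + c[3]*x*y + c[4]*x**2*y**2

# Grid on which f(x, y) > 0 is enforced for 0 < x < 90
xg, yg = np.meshgrid(np.linspace(1e-3, 90, 50),
                     np.linspace(ydata.min(), ydata.max(), 20))

def chi2(c):
    if np.any(model(c, xg, yg) <= 0):
        return np.inf                                   # hard boundary
    return np.sum((model(c, xdata, ydata) - B) ** 2)    # least squares otherwise

res = minimize(chi2, x0=np.ones(5), method="Nelder-Mead")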

How to find time-varying coefficients for a VAR model by using the Kalman Filter

I'm trying to write some code in R to reproduce the model I found in this article.
The idea is to model the signal as a VAR model, but fit the coefficients by a Kalman-filter model. This would essentially enable me to create a robust time-varying VAR(p) model and analyze non-stationary data to a degree.
The model to track the coefficients is:
X(t) = F(t) X(t-1) + W(t)
Y(t) = H(t) X(t) + E(t),
where H(t) is the Kronecker product of the lagged measurements in my time series Y with an identity matrix, and X(t) fills the role of the regression coefficients. F(t) is taken to be an identity matrix, which means we assume the coefficients evolve as a random walk.
In the article, the state noise covariance matrix Q(t) of W(t) is initially chosen as 10^-3 and then fitted by some iteration scheme. The observation noise covariance matrix R(t) of E(t) is substituted by the covariance of the noise term left unexplained by the model, Y(t) - H(t)Xhat(t).
I have the a priori covariance matrix of estimation error (denoted Σ in the article) written as P (based on other sources) and the a posteriori as Pmin, since it will be used in the next recursion as a priori, if that makes sense.
So far I've written the following, based on the article's Appendix A 1.2:
Y <- *my time series, for test purposes two channels of 3000 points*
F <- diag(8)                        # F is (m^2*p by m^2*p), where m = 2 dimensions and p = 2 lags
H <- diag(2) %x% t(vec(Y[, 1:2]))   # the Kronecker product with the vectorized lags Y(t-1) and Y(t-2)
Xhatminus <- matrix(1, 8, 1)        # an arbitrary *a priori* coefficient matrix
Q <- diag(8) %x% (10^-7)            # a small diagonal matrix; found this value used in some examples
R <- 1                              # didn't know what else to put here just yet
Pmin <- diag(8)                     # *a priori* error covariance estimate, just ones on the diagonal
Now the recursion should start. To test, I just took the first 3000 points of one trial of my data.
Xhatstorage <- matrix(0, 8, 3000)
for (j in 3:3000) {
  H <- diag(2) %x% t(vec(Y[, (j-2):(j-1)]))               # observation matrix from the lagged measurements
  K <- (Pmin %*% t(H)) %*% solve(H %*% Pmin %*% t(H) + R) # Kalman gain; solve() gives the inverse ()^-1
  P <- Pmin - K %*% H %*% Pmin                            # a posteriori error covariance
  Xhatplus <- F %*% (Xhatminus + K %*% (Y[, j] - H %*% Xhatminus))
  Pplus <- (F %*% P %*% F) + Q                            # propagated covariance (F is the identity here)
  Xhatminus <- Xhatplus
  Xhatstorage[, j] <- Xhatplus
  Pmin <- Pplus
}
I extracted the Xhatplus values into a storage matrix and used them to write this primitive VAR model:
Yhat <- array(0, 3000)
for (t in 3:3000) {
  Yhat[t] <- t(vec(Y[, (t-2)])) %*% Xhatstorage[c(1, 3), t] + t(vec(Y[, (t-1)])) %*% Xhatstorage[c(2, 4), t]
}
The result looks like this: the blue line is the VAR with the Kalman-filter-estimated coefficients, and black is the original data.
I'm having trouble understanding how I can better estimate my coefficients. Why is it so far off?
How should I better choose the first a priori and a posteriori estimates to start the recursion? I'm sure that adding more lags to the VAR is not the issue; it's that I don't know how to choose the initial values for Pmin and Xhatminus. Most places I pieced this together from start from arbitrary zero assumptions in toy models, but in this case, setting any of those matrices to zero just collapses the entire algorithm.
Lastly, is this recursion even a correct implementation of what Oya et al. describe in the article? I know I'm still missing the R evaluation based on previously unexplained errors (V(t) in Appendix A 1.2), but in general?
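For comparison with the Appendix A equations, the generic Kalman recursion for a random-walk coefficient model (F = I) can be sketched in numpy as below. The function name, the constant Q and R values, and the predict-then-update ordering are assumptions, not the article's exact scheme:

import numpy as np

def kalman_tvp_var(Y, p=2, q=1e-3, r=1.0):
    # Y has shape (m, T): m channels, T time points
    m, T = Y.shape
    n = m * m * p                       # length of the stacked coefficient vector
    x = np.zeros(n)                     # state estimate (VAR coefficients)
    P = np.eye(n)                       # state error covariance
    Q = q * np.eye(n)                   # random-walk (state) noise covariance
    R = r * np.eye(m)                   # observation noise covariance
    X = np.zeros((n, T))
    for t in range(p, T):
        # Observation matrix: Kronecker product of the identity with the stacked lags
        h = Y[:, t - p:t].ravel(order="F")   # vec of the last p observations
        H = np.kron(np.eye(m), h)            # shape (m, m*m*p)
        P = P + Q                            # predict (F = I leaves x unchanged)
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ (Y[:, t] - H @ x)        # update with the innovation
        P = P - K @ H @ P
        X[:, t] = x
    return X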

Fast way to set diagonals of an (M x N x N) matrix? Einsum / n-dimensional fill_diagonal?

I'm trying to write fast, optimized code based on matrices, and have recently discovered einsum as a tool for achieving significant speed-up.
Is it possible to use this to set the diagonals of a multidimensional array efficiently, or can it only return data?
In my problem, I'm trying to set the diagonals for an array of square matrices (shape: M x N x N) by summing the columns in each square (N x N) matrix.
My current (slow, loop-based) solution is:
import numpy as np

# Build dummy array
dimx = 2  # Dimension x (likely to be < 100)
dimy = 3  # Dimension y (likely to be between 2 and 10)
M = np.random.randint(low=1, high=9, size=[dimx, dimy, dimy])
# Blank the diagonals so we can see the intended effect
np.fill_diagonal(M[0], 0)
np.fill_diagonal(M[1], 0)
# Compute diagonals based on summing columns
diags = np.einsum('ijk->ik', M)
# Set the diagonal for each matrix
# THIS IS SLOW. CAN IT BE IMPROVED?
for i in range(len(M)):
    np.fill_diagonal(M[i], diags[i])
# Print result
M
Can this be improved at all, please? It seems np.fill_diagonal doesn't accept arrays whose dimensions aren't all equal, such as this (M x N x N) one (hence forcing my loop-based solution). Perhaps einsum can help here too?
One approach would be to reshape to 2D, set the columns at steps of ncols+1 with the diagonal values. Reshaping creates a view and as such allows us to directly access those diagonal positions. Thus, the implementation would be -
s0,s1,s2 = M.shape
M.reshape(s0,-1)[:,::s2+1] = diags
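Putting the question's column sums together with this reshape trick (a quick self-contained check; the final einsum line, which reads the stacked diagonals back out, is an addition and not part of the original answer):

import numpy as np

M = np.random.randint(low=1, high=9, size=[2, 3, 3])
diags = np.einsum('ijk->ik', M)          # column sums, shape (2, 3)

s0, s1, s2 = M.shape
M.reshape(s0, -1)[:, ::s2 + 1] = diags   # the reshape is a view, so this writes into M

# einsum can also *read* the stacked diagonals, which makes a handy check
assert np.array_equal(np.einsum('ijj->ij', M), diags)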
If you do np.source(np.fill_diagonal) you'll see that in the 2d case it uses a 'strided' approach
if a.ndim == 2:
    step = a.shape[1] + 1
    end = a.shape[1] * a.shape[1]
    a.flat[:end:step] = val
@Divakar's solution applies this to your 3D case by 'flattening' two dimensions.
You could sum the columns with M.sum(axis=1), though I vaguely recall some timings that found einsum was actually a bit faster; sum is a little more conventional.
Someone has asked for the ability to expand dimensions in einsum, but I don't think that will happen.

orientation of normal surface/vertex vectors

Given a convex 3D polygon (convex hull), how can I determine the correct direction for the surface/vertex normal vectors? As the polygon is convex, by correct I mean outward-facing (away from the centroid).
def surface_normal(centroid, p1, p2, p3):
    a = p2 - p1
    b = p3 - p1
    n = np.cross(a, b)
    if **test including centroid?**:
        return n
    else:
        return -n  # change direction
I actually need the vertex normal vectors as I am exporting to a .obj file, but I am assuming that I would need to calculate the surface normals beforehand and combine them.
This solution should work under the assumption of a convex hull in 3d. You calculate the normal as shown in the question. You can normalize the normal vector with
n /= np.linalg.norm(n) # which should be sqrt(n[0]**2 + n[1]**2 + n[2]**2)
You can then calculate the center point of your input triangle:
pmid = (p1 + p2 + p3) / 3
After that you calculate the distance of the triangle-center to your surface centroid. This is
dist_centroid = np.linalg.norm(pmid - centroid)
Then you can calculate the distance from the centroid to the triangle center shifted by the normal scaled to dist_centroid:
dist_with_normal = np.linalg.norm(pmid + n * dist_centroid - centroid)
If this distance is larger than dist_centroid, then your normal is facing outwards; if it is smaller, it is pointing inwards. If you have a perfect sphere and the normal points towards the centroid, the distance should be almost zero. This may not be the case for your general surface, but the convexity of the surface makes sure that this is enough to check the direction.
if dist_with_normal < dist_centroid:
    n *= -1
Another, nicer option is to use a scalar product.
pmid = (p1 + p2 + p3) / 3
if np.dot(pmid - centroid, n) < 0:
    n *= -1
This checks whether your normal and the vector from the centroid to the midpoint of your triangle point in the same direction; if not, the direction is flipped.
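A complete version of the question's surface_normal helper using this dot-product test might look as follows (a minimal sketch; the normalization step is optional, and the example cube face is made up for illustration):

import numpy as np

def surface_normal(centroid, p1, p2, p3):
    # Outward-facing unit normal of triangle (p1, p2, p3) on a convex hull
    n = np.cross(p2 - p1, p3 - p1)
    n /= np.linalg.norm(n)              # normalize to unit length
    pmid = (p1 + p2 + p3) / 3           # triangle center
    if np.dot(pmid - centroid, n) < 0:  # pointing towards the centroid?
        n *= -1                         # flip so it points away from the centroid
    return n

# Example: one face of a unit cube centred at the origin
centroid = np.zeros(3)
p1 = np.array([0.5, -0.5, -0.5])
p2 = np.array([0.5, 0.5, -0.5])
p3 = np.array([0.5, 0.5, 0.5])
print(surface_normal(centroid, p1, p2, p3))   # -> [1. 0. 0.]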

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
    % distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
    % note: we leave off the sqrt as an optimization
    distances = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);
    [value, label] = min(distances);
    labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance in Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone knows of any other way to speed this up, I would greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
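For what it's worth, the same binomial-expansion idea written out as a complete labeling step in numpy looks roughly like this (a sketch mirroring the Matlab signature above, with argmin added to pick the nearest centroid):

import numpy as np

def assign_labels(centroids, X):
    # Squared distances via x^2 + y^2 - 2*x*y, then index of the nearest centroid
    XX = np.sum(X * X, axis=1)[:, np.newaxis]            # shape (N, 1)
    YY = np.sum(centroids ** 2, axis=1)[np.newaxis, :]   # shape (1, K)
    distances = XX + YY - 2.0 * X @ centroids.T          # shape (N, K)
    np.maximum(distances, 0, out=distances)              # guard against tiny negatives
    return np.argmin(distances, axis=1)

# e.g. labels = assign_labels(np.random.rand(4000, 300), np.random.rand(1000, 300))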
Use the following function to calculate your distances; you should see an order-of-magnitude speed-up.
The two matrices A and B have the columns as the dimensions and one row per point.
A is your matrix of centroids. B is your matrix of data points.
function D = getSim(A, B)
Qa = repmat(dot(A, A, 2), 1, size(B, 1));
Qb = repmat(dot(B, B, 2), 1, size(A, 1));
D = Qa + Qb' - 2*A*B';
You can vectorize it by converting to cells and using cellfun:
[nRows, nCols] = size(X);
XCell = num2cell(X, 2);
dist = reshape(cell2mat(cellfun(@(x)(sum(bsxfun(@minus, centroids, x).^2, 2)), XCell, 'UniformOutput', false)), nRows, nRows);
[~, labels] = min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece @(x)(sum(bsxfun(@minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the row indices of the minimum along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest-neighbor search than brute force.
The most popular approach is the k-d tree: O(log(n)) average query time instead of the O(n) brute-force complexity.
Regarding a Matlab implementation of k-d trees, you can have a look here.
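As an aside, if Python is an option (the question's update already borrows from scikits.learn), scipy ships a k-d tree that does this lookup directly; a minimal sketch with placeholder data:

import numpy as np
from scipy.spatial import cKDTree

centroids = np.random.rand(4000, 300)   # codebook from k-means (placeholder data)
X = np.random.rand(1000, 300)           # input vectors to label

tree = cKDTree(centroids)
_, labels = tree.query(X)               # index of the nearest centroid for each row of X

In 300 dimensions a k-d tree may not actually beat a good brute-force implementation, so it is worth timing both.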