How to find time-varying coefficients for a VAR model by using the Kalman Filter - dynamic

I'm trying to write some code in R to reproduce the model i found in this article.
The idea is to model the signal as a VAR model, but fit the coefficients by a Kalman-filter model. This would essentially enable me to create a robust time-varying VAR(p) model and analyze non-stationary data to a degree.
The model to track the coefficients is:
X(t) = F(t) X(t− 1) +W(t)
Y(t) = H(t) X(t) + E(t),
where H(t) is the Kronecker product between lagged measurements in my time-series Y and a unit vector, and X(t) fills the role of regression-coefficients. F(t) is taken to be an identity matrix, as that should mean we assume coefficients to evolve as a random walk.
In the article, from W(T), the state noise covariance matrix Q(t) is chosen at 10^-3 at first and then fitted based on some iteration scheme. From E(t) the state noise covariance matrix is R(t) substituted by the covariance of the noise term unexplained by the model: Y(t) - H(t)Xhat(t)
I have the a priori covariance matrix of estimation error (denoted Σ in the article) written as P (based on other sources) and the a posteriori as Pmin, since it will be used in the next recursion as a priori, if that makes sense.
So far i've written the following, based on the articles Appendix A 1.2
Y <- *my timeseries, for test purposes two channels of 3000 points*
F <- diag(8) # F is (m^2*p by m^2 *p) where m=2 dimensions and p =2 lags
H <- diag(2) %x% t(vec(Y[,1:2])) #the Kronecker product of vectorized lags Y-1 and Y-2
Xhatminus <- matrix(1,8,1) # an arbitrary *a priori* coefficient matrix
Q <- diag(8)%x%(10**-7) #just a number a really low number diagonal matrix, found it used in some examples
R<- 1 #Didnt know what else to put here just yet
Pmin = diag(8) #*a priori* error estimate, just some 1-s...
Now should start the reccursion. To test i just took the first 3000 points of one trial of my data.
Xhatstorage <- matrix(0,8,3000)
for(j in 3:3000){
H <- diag(2) %x% t(vec(Y[,(j-2):(j-1)]))
K <- (Pmin %*% t(H)) %*% solve((H%*%Pmin%*%t(H) + R)) ##Solve gives inverse matrix ()^-1
P <- Pmin - K%*% H %*% Pmin
Xhatplus <- F%*%( Xhatminus + K%*%(Y[,j]-H%*%Xhatminus) )
Pplus <- (F%*% P %*% F) + Q
Xhatminus <- Xhatplus
Xhatstorage[,j] <- Xhatplus
Pmin <- Pplus
}
I extracted Xhatplus values into a storage matrix and used them to write this primitive VAR model with them:
Yhat<-array(0,3000)
for(t in 3:3000){
Yhat[t]<- t(vec(Y[,(t-2)])) %*% Xhatstorage[c(1,3),t] + t(vec(Y[,(t-1)])) %*% Xhatstorage[c(2,4),t]
}
The looks like this .
The blue line is VAR with Kalman filter found coefficients, Black is original data..
I'm having issue understanding how i can better evaluate my coefficients? Why is it so off?
How should i better choose the first a priori and a posteriori estimates to start the recursion? Currently, adding more lags to the VAR is not the issue i'm sure, it's that i don't know how to choose the initial values for Pmin and Xhatmin. Most places i pieced this together from start from arbitrary 0 assumptions in toy models, but in this case, choosing any of the said matrixes as 0 will just collapse the entire algorithm.
Lastly, is this recursion even a correct implementation of Oya et al describe in the article? I know im still missing the R evaluation based on previously unexplained errors (V(t) in Appendix A 1.2), but in general?

Related

Computation complexity of the following matrix multiplications

I would like to know the computational complexities of the following operations including matrices:
a) $H = \beta I + P^{T}P$ where, $\beta$ is a known constant, $I$ is a $N \times N$ identity matrix, $P$ is a $M \times N$ sparse matrix sampled from a $N \times N$ identity matrix. Thus, P contains $M$ ones and rest all elements are zeros.
b) $a = F^{-1}Z + P^{T}B$, where $Z$ and $B$ are $N \times N$ and $M \times N$ respectively and $F{-1}$ is the inverse Fast Fourier transform (FFT).
Summary - I am not able to see how the special structure of $P$ will reduce the time complexity.
My attempt so far:
a) Since $\beta$ is a constant and we know all the places of ones in $I$, $\beta I$ will have $O(1)$ complexity.
I know multiplying two matrices of size $N \times M$ and $M \times N$ has a complexity of $O(N^{2}M) but I am not able to leverage the sparsity and special structure of P to get the actual complexity of this matrix multiplication.
b) For this one, I think that the first one will have $O(N\log N)$ complexity and the second one should have $O(N^{2}M)$ complexity, so the overall complexity will be dominated by $O(N^{2}M)$.
def get_I_matrix(N):
return np.diag(np.ones(N))
def get_P_matrix(picks, N):
return get_I_matrix(N)[np.ix_(picks),:]
def get_H_hat_diag(N, beta, picks):
H_hat_diag = beta*np.ones(N*N)
P_diag = np.zeros(N)
P_diag[picks] += 1
H_hat_diag = H_hat_diag + np.kron(P_diag, P_diag)
return H_hat_diag
def get_a_hat_vec(a_hat, z, beta):
return (a_hat + ifftw2d(beta*z)).flatten('F') # column first
a_hat = np.zeros([N, N])
a_hat = a_hat.astype(complex)
a_hat[np.ix_(picks, picks)] = B
H_hat_diag = get_H_hat_diag(N, beta, picks)
a_hat_vec = get_a_hat_vec(y_hat, z, beta)
Since, latex isn't supported, I didn't want to write $\hat$ everywhere, hence $H$ and $a$ in the problem.

systemfit 3SLS Testing for Overidentification Restrictions

currently I'm struggling to find a good way to perform the Hansen/Sargan tests of Overidentification restrictions within a Three-Stage Least Squares model (3SLS) in panel data using R. I was digging the whole day in different networks and couldn't find a way of depicting the tests in R using the well-known systemfit package.
Currently, my code is simple.
violence_c_3sls <- Crime ~ ln_GDP +I(ln_GDP^2) + ln_Gini
income_c_3sls <-ln_GDP ~ Crime + ln_Gini
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime
inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2)+ Health_Exp + Pov_Head_Count_1.9
system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)
fitsur_c_3sls <-systemfit(system_c_3sls, "3SLS",inst=inst, data=df_new, methodResidCov = "noDfCor" )
summary(fitsur_c_3sls)
However, adding more instruments to create an over-identified system do not yield in an output of the Hansen/Sargan test, thus I assume the test should be executed aside from the output and probably associated to systemfit class object.
Thanks in advance.
With g equations, l exogenous variables, and k regressors, the Sargan test for 3SLS is
where u is the stacked residuals, \Sigma is the estimated residual covariance, and P_W is the projection matrix on the exogenous variables. See Ch 12.4 from Davidson & MacKinnon ETM.
Calculating the Sargan test from systemfit should look something like this:
sargan.systemfit=function(results3sls){
result <- list()
u=as.matrix(resid(results3sls)) #model residuals, n x n_eq
n_eq=length(results3sls$eq) # number of equations
n=nrow(u) #number of observations
n_reg=length(coef(results3sls)) # total number of regressors
w=model.matrix(results3sls,which='z') #Matrix of instruments, in block diagonal form with one block per equation
#Need to aggregate into a single block (in case different instruments used per equation)
w_list=lapply(X = 1:n_eq,FUN = function(eq_i){
this_eq_label=results3sls$eq[[eq_i]]$eqnLabel
this_w=w[str_detect(rownames(w),this_eq_label),str_detect(colnames(w),this_eq_label)]
colnames(this_w)=str_remove(colnames(this_w),paste0(this_eq_label,'_'))
return(this_w)
})
w=do.call(cbind,w_list)
w=w[,!duplicated(colnames(w))]
n_inst=ncol(w) #w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables
#estimate residual variance (or use residCov, should be asymptotically equivalent)
var_u=crossprod(u)/n #var_u=results3sls$residCov
P_w=w%*%solve(crossprod(w))%*%t(w) #Projection matrix on instruments w
#as.numeric(u) vectorizes the residuals into a n_eq*n x 1 vector.
result$statistic <- as.numeric(t(as.numeric(u))%*%kronecker(solve(var_u),P_w)%*%as.numeric(u))
result$df <- n_inst*n_eq-n_reg
result$p.value <- 1 - pchisq(result$statistic, result$df)
result$method = paste("Sargan over-identifying restrictions test")
return(result)
}

DLT vs Homography Estimation

I'm a little confused about the difference between the DLT algorithm described here and the homography estimation described here. In both of these techniques, we are trying to solve for the entries of a 3x3 matrix by using at least 4 point correspondences. In both methods, we set up a system where we have a "measurement" matrix and we use SVD to solve for the vector of elements that make up H. I was wondering why there are two techniques that seem to do the same thing, and why one might be used over the other.
You have left-right image correspondences {p_i} <-> {p'_i}, where p_i = (x_i, y_i), etc.
Normalizing them to the unit square means computing two shifts m=(mx, my), m'=(mx', my'), and two scales s=(sx,sy), s'=(sx',sy') such that q_i = (p_i - m) / s and q_i' = (p_i' - m') / s', and both the {q_i} and the {q'_i} transformed image points are centered at (0,0) and approximately contained within a square of unit side length. A little math shows that a good choice for the m terms are the averages of the x,y coordinates in each set of image points, and for the s terms you use the standard deviations (or twice the standard deviation) times 1/sqrt(2).
You can express this normalizing transformation in matrix form: q = T p,
where T = [[1/sx, 0, -mx/sx], [0, 1/sy, -my/sy], [0, 0, 1]], and likewise q' = T' p'.
You then compute the homography K between the {q_i} and {q'_i} points: q_i' = K q_i.
Finally, you denormalize K into the origina (un-normalized) coordinates thus: H = inv(T') K T, and H is the desired homography that maps {p} into {p'}.

Fast way to set diagonals of an (M x N x N) matrix? Einsum / n-dimensional fill_diagonal?

I'm trying to write fast, optimized code based on matrices, and have recently discovered einsum as a tool for achieving significant speed-up.
Is it possible to use this to set the diagonals of a multidimensional array efficiently, or can it only return data?
In my problem, I'm trying to set the diagonals for an array of square matrices (shape: M x N x N) by summing the columns in each square (N x N) matrix.
My current (slow, loop-based) solution is:
# Build dummy array
dimx = 2 # Dimension x (likely to be < 100)
dimy = 3 # Dimension y (likely to be between 2 and 10)
M = np.random.randint(low=1, high=9, size=[dimx, dimy, dimy])
# Blank the diagonals so we can see the intended effect
np.fill_diagonal(M[0], 0)
np.fill_diagonal(M[1], 0)
# Compute diagonals based on summing columns
diags = np.einsum('ijk->ik', M)
# Set the diagonal for each matrix
# THIS IS LOW. CAN IT BE IMPROVED?
for i in range(len(M)):
np.fill_diagonal(M[i], diags[i])
# Print result
M
Can this be improved at all please? It seems np.fill_diagonal doesn't accepted non-square matrices (hence forcing my loop based solution). Perhaps einsum can help here too?
One approach would be to reshape to 2D, set the columns at steps of ncols+1 with the diagonal values. Reshaping creates a view and as such allows us to directly access those diagonal positions. Thus, the implementation would be -
s0,s1,s2 = M.shape
M.reshape(s0,-1)[:,::s2+1] = diags
If you do np.source(np.fill_diagonal) you'll see that in the 2d case it uses a 'strided' approach
if a.ndim == 2:
step = a.shape[1] + 1
end = a.shape[1] * a.shape[1]
a.flat[:end:step] = val
#Divakar's solution applies this to your 3d case by 'flattening' on 2 dimensions.
You could sum the columns with M.sum(axis=1). Though I vaguely recall some timings that found that einsum was actually a bit faster. sum is a little more conventional.
Someone has has asked for an ability to expand dimensions in einsum, but I don't think that will happen.

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
% distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
% note: we leave off the sqrt as an optimization
distances = sum(bsxfun(#minus, centroids, X(i, :)) .^ 2, 2);
[value, label] = min(distances);
labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance on Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone know of any other way to speed this up, I would be greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
Use the following function to calculate your distances. You should see an order of magnitude speed up
The two matrices A and B have the columns as the dimenions and the rows as each point.
A is your matrix of centroids. B is your matrix of datapoints.
function D=getSim(A,B)
Qa=repmat(dot(A,A,2),1,size(B,1));
Qb=repmat(dot(B,B,2),1,size(A,1));
D=Qa+Qb'-2*A*B';
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(#(x)(sum(bsxfun(#minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),nRows,nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece #(x)(sum(bsxfun(#minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line, and using cell2mat, we apply it to each row of X.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest neighbor search than brute force.
The most popular approach are Kd-Tree. O(log(n)) average query time instead of the O(n) brute force complexity.
Regarding a Maltab implementation of Kd-Trees, you can have a look here