Algorithm to find the largest eigenvalue after Lanczos iterations

I am interested in approximating the largest eigenvalue of a symmetric matrix. Most of the literature points me to the Lanczos algorithm. However, from my understanding, the Lanczos algorithm only produces the best approximating subspace in tridiagonal form, i.e., after running the Lanczos iteration on my matrix for some K steps, I end up with a K-by-K tridiagonal matrix.
The next step should be to find the largest eigenvalue of this tridiagonal matrix; however, I could not find how to do this. Are there any algorithms that find the largest eigenvalue of a tridiagonal matrix? Power iteration produces the eigenvalue with the maximum magnitude, which is not what I need.
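For what it's worth, standard numerical libraries already cover this last step. Below is a minimal Python sketch (the alpha/beta values are made up for illustration) using SciPy's eigh_tridiagonal, which works directly on the diagonal/off-diagonal representation that Lanczos produces; note that scipy.sparse.linalg.eigsh(A, k=1, which='LA') would perform both the Lanczos run and the tridiagonal solve for you on the original matrix.

import numpy as np
from scipy.linalg import eigh_tridiagonal

# Hypothetical K-by-K tridiagonal matrix produced by a Lanczos run:
# alpha = diagonal entries (length K), beta = off-diagonal entries (length K-1).
alpha = np.array([4.0, 3.0, 2.5, 1.0])
beta = np.array([0.7, 0.3, 0.1])
K = len(alpha)

# select='i' picks eigenvalues by index in ascending order, so the index
# range (K-1, K-1) returns only the largest (most positive) eigenvalue.
largest = eigh_tridiagonal(alpha, beta, eigvals_only=True,
                           select='i', select_range=(K - 1, K - 1))[0]
print(largest)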

What is a Hessian matrix?

I know that the Hessian matrix is used in a kind of second-derivative test for functions involving more than one independent variable. How does one find the maximum or minimum of such a function? Is it found using the eigenvalues of the Hessian matrix or its principal minors?
You should have a look here:
https://en.wikipedia.org/wiki/Second_partial_derivative_test
For an n-dimensional function f, find a point x where the gradient grad f = 0. This is a critical point.
Then the second derivatives tell you whether x marks a local minimum, a local maximum, or a saddle point.
The Hessian H is the matrix of all second partial derivatives of f.
1) For the 2D case, the determinant and the principal minors of the Hessian are what you check.
2) For the nD case, you check whether H is positive (or negative) definite, which in general involves computing the eigenvalues of the Hessian H (assuming H is nonsingular).
In fact, the 2D shortcut in 1) is generalized by 2).
For numeric calculations, some kind of optimization strategy can be used to find an x where grad f = 0.
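To make this concrete, here is a small numerical sketch in Python (the toy function f and the finite-difference Hessian are my own choices, not something from the question): build the Hessian at a critical point and classify it by the signs of its eigenvalues.

import numpy as np

# Toy function: f(x, y) = x^2 + 3*y^2, which has a critical point at (0, 0).
def f(v):
    x, y = v
    return x**2 + 3*y**2

def hessian(func, v, h=1e-5):
    """Finite-difference approximation of the Hessian of func at point v."""
    n = len(v)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            vpp = v.copy(); vpp[i] += h; vpp[j] += h
            vpm = v.copy(); vpm[i] += h; vpm[j] -= h
            vmp = v.copy(); vmp[i] -= h; vmp[j] += h
            vmm = v.copy(); vmm[i] -= h; vmm[j] -= h
            H[i, j] = (func(vpp) - func(vpm) - func(vmp) + func(vmm)) / (4 * h**2)
    return H

x0 = np.array([0.0, 0.0])          # critical point: grad f = 0 here
H = hessian(f, x0)
eigs = np.linalg.eigvalsh(H)        # the Hessian is symmetric, so eigvalsh applies

if np.all(eigs > 0):
    print("positive definite -> local minimum")
elif np.all(eigs < 0):
    print("negative definite -> local maximum")
elif np.any(eigs > 0) and np.any(eigs < 0):
    print("indefinite -> saddle point")
else:
    print("some eigenvalues are zero -> test is inconclusive")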

Tensorflow: efficient way to subtract a vector from the matrix

I have an MxN matrix and a length-N vector. What I want to do is subtract this vector from each of the M rows of the matrix. The obvious way to do this is to tf.tile the vector, but this seems highly inefficient because of all the new memory allocation: tiling takes up to 4x as much time as the actual subtraction.
Is there any way to do it more efficiently, or will I have to write my own operation in C++?
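For what it's worth, TensorFlow follows NumPy-style broadcasting rules, so a plain subtraction already applies the length-N vector to every row without materializing a tiled copy. A minimal sketch (assuming TensorFlow 2.x in eager mode; the shapes are made up):

import tensorflow as tf

M, N = 4, 3
matrix = tf.random.uniform((M, N))   # shape (M, N)
vector = tf.random.uniform((N,))     # shape (N,)

# Broadcasting: the (N,) vector is treated as shape (1, N) and subtracted
# from every row; no explicit tf.tile and no tiled copy of the vector.
result = matrix - vector             # same as tf.subtract(matrix, vector)

print(result.shape)                  # (4, 3)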

Proof that k-means always converges?

I understand the steps of the k-means algorithm.
However, I'm not sure whether the algorithm always converges, or whether the observations can keep switching from one centroid to another indefinitely.
The algorithm always converges (in a finite number of steps), but not necessarily to the global optimum: each step never increases the objective, and there are only finitely many possible assignments of points to centroids.
The algorithm may keep switching points between centroids; this is sometimes referred to as "cycling": after a while the algorithm cycles through (nearly) the same centroid configurations. How this is handled is controlled by parameters of the algorithm, and there are two standard remedies, which can both be used at the same time: a precision parameter and a maximum number of iterations.
Precision parameter: if the centroids' amount of change is less than a threshold delta, stop the algorithm.
Max number of iterations: if the algorithm reaches that number of iterations, stop the algorithm.
Note that the above schemes do not spoil the convergence characteristics of the algorithm. It will still converge, but not necessarily to the global optimum (this is independent of the scheme used, as with many optimization algorithms).
You may be interested in the related question on stats.SE Cycling in k-means algorithm and a referenced proof of convergence
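For illustration, here is a bare-bones Lloyd-style k-means sketch in Python (my own code, not from any particular paper) that implements both stopping rules from above: a precision threshold on centroid movement and a cap on the number of iterations.

import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Plain Lloyd's k-means with the two stopping rules discussed above."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):                       # rule 2: iteration cap
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centroids (keep the old one if a cluster becomes empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                             # rule 1: precision threshold
            break
    return labels, centroids

# Tiny usage example with two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
labels, centroids = kmeans(X, k=2)
print(centroids)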

Which objective is optimized Intra-Cluster sum of distances or MSE?

In cluster analysis papers using meta-heuristic algorithms, many have optimized the Mean Squared Quantization Error (MSE). For example, in
[1] and [2].
I am confused by the results. They state that they used the MSE as the objective function, but they report the result values as the intra-cluster sum of Euclidean distances.
K-means minimizes the Within-Cluster Sum of Squares (WCSS) (per the Wikipedia article) [3]. I could not figure out the difference between WCSS and MSE when Euclidean distance is used as the distance metric inside the MSE.
K-means minimizes the WCSS, and if the meta-heuristic algorithms use the same MSE objective they will minimize it too. In that case, how can the reported sums of Euclidean distances differ between K-means and the others?
I can reproduce the results shown in the papers if I optimize the intra-cluster sum of Euclidean distances.
I think I am doing something wrong here. Can anyone help me with this?
Main question: Which objectives did the referenced papers [1] and [2] optimize, and which function's values are shown in the tables?
K-means optimizes the (sum over clusters of the) within-cluster sum of squares, a.k.a. variance, a.k.a. the sum of squared Euclidean distances to the cluster means.
This is easy to see if you study the convergence proof.
I can't study the two papers you referenced. They're with crappy Elsevier and paywalled, and I'm not going to pay $36+$32 to answer your question.
Update: I managed to get a free copy of one of them. They call it "MSE, mean-square quantization error", but their equation is the usual within-cluster sum of squares, with no mean involved; there is a shady self-citation attached to this statement, and half of the references are self-citations... it seems it's more that this author likes to call it something different from what everybody else calls it. Looks a bit like "reinventing the wheel with a different name" to me. I'd carefully double-check their results. I'm not saying they are false, I haven't checked in more detail. But this "mean-square error" doesn't involve a mean, for sure; it's the sum of squared errors.
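If the papers' "MSE" is simply the WCSS divided by the number of points (one common reading of mean squared quantization error), then the two objectives differ only by a constant factor and have exactly the same minimizer. A small numerical sketch (using scikit-learn's KMeans purely for illustration, not the papers' meta-heuristics):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((200, 3))
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# WCSS: sum of squared Euclidean distances of each point to its assigned centroid.
diffs = X - km.cluster_centers_[km.labels_]
wcss = np.sum(diffs ** 2)

# One reading of "mean squared quantization error": WCSS / number of points.
mse = wcss / len(X)

print(wcss, km.inertia_)   # inertia_ is scikit-learn's name for the WCSS
print(mse)                 # same objective up to the constant factor 1/len(X)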
Update: if "intra-cluster sum" means the sum of pairwise (squared) distances between any two objects, consider the following:
Without loss of generality, move the data such that the mean is 0. (Translation doesn't change Euclidean or squared Euclidean distances).
sum_x sum_y sum_i (x_i - y_i)^2
= sum_x sum_y [ sum_i (x_i)^2 + sum_i (y_i)^2 - 2 sum_i (x_i * y_i) ]
= n * sum_x sum_i (x_i)^2 + n * sum_y sum_i (y_i)^2 - 2 * sum_i [ (sum_x x_i) * (sum_y y_i) ]
Since mu_i = 0, we have sum_x x_i = sum_y y_i = 0, so the third term disappears.
The first two summands are the same, and (because the mean is 0) each equals n times the WCSS, so we are left with 2n times the WCSS.
If I didn't screw up this computation, the sum of (asymmetric) pairwise squared Euclidean distances within a cluster is just 2n times the WCSS, so minimizing one is equivalent to minimizing the other.
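A quick numerical check of this identity (random data, my own sketch): the sum of squared Euclidean distances over all ordered pairs within a cluster equals 2n times that cluster's WCSS.

import numpy as np

rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 4))   # one cluster with n = 50 points in 4D
n = len(cluster)

# WCSS of this cluster: squared distances to the cluster mean.
mu = cluster.mean(axis=0)
wcss = np.sum((cluster - mu) ** 2)

# Sum of squared Euclidean distances over all ordered (asymmetric) pairs.
diffs = cluster[:, None, :] - cluster[None, :, :]
pairwise_sum = np.sum(diffs ** 2)

print(pairwise_sum, 2 * n * wcss)    # these agree up to floating-point error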

Time complexity for finding largest eigenvalue

I'm trying to figure out the time complexity of calculating the eigenvector belonging to the largest eigenvalue in a whole bunch of small matrices.
Each matrix is the adjacency matrix of the 1-step neighborhood of a node in a weighted, undirected graph. So all values are non-negative and the matrix is symmetric.
E.g.
0 2 1 1
2 0 1 0
1 1 0 0
1 0 0 0
I've found the power iteration method, which is supposed to cost O(n^2) per iteration.
So does that mean the complexity of finding the largest eigenvector for the 1-step neighborhood of every node in a graph is O(n * p^2), where n is the number of nodes and p is the average degree of the graph (i.e., number of edges / number of nodes)?
Off the top of my head I would say your best bet is an iterative randomised algorithm known as power iteration. The algorithm finds the true maximum eigenvalue with geometric convergence, with ratio equal to the magnitude of the second-largest eigenvalue divided by that of the largest. So if your two largest eigenvalues are equal in magnitude, do not use this method; otherwise it works quite nicely. You actually get both the largest eigenvalue and the respective eigenvector.
However, if your matrices are very small, you might be just as well off performing a full eigendecomposition (as in PCA), because it will not be that expensive. I am not aware of any hard threshold for when you should switch between the two. Also, it depends on whether you are willing to accept small inaccuracies or need the exact value.
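A minimal power iteration sketch in Python, using the example matrix from the question. One caveat worth noting: power iteration targets the eigenvalue of largest magnitude, but because these adjacency matrices are symmetric with non-negative entries, the largest-magnitude eigenvalue coincides with the largest eigenvalue (Perron-Frobenius), provided it is strictly dominant.

import numpy as np

def power_iteration(A, num_iter=1000, tol=1e-10):
    """Estimate the dominant eigenvalue and eigenvector of A.
    Converges geometrically with ratio |lambda_2| / |lambda_1|, so the
    dominant eigenvalue must be strictly larger in magnitude than the rest."""
    rng = np.random.default_rng(0)
    v = rng.random(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iter):
        w = A @ v                       # O(p^2) work per iteration
        v_new = w / np.linalg.norm(w)
        lam_new = v_new @ A @ v_new     # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            break
        v, lam = v_new, lam_new
    return lam_new, v_new

# Adjacency matrix of the 1-step neighborhood from the question.
A = np.array([[0, 2, 1, 1],
              [2, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

lam, vec = power_iteration(A)
print(lam)                      # dominant eigenvalue estimate
print(np.linalg.eigvalsh(A))    # cross-check against a direct dense solve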