The covariance matrix returned by the function numpy.polyfit

When I set 'cov=True', the function 'numpy.polyfit' also returns a covariance matrix. I would like to know what it means and how it is calculated. Thanks a lot!
The author of the 'ZSSR' super-resolution algorithm adopted a learning-rate adjustment strategy: first take a linear fit of the reconstruction error, then compare the standard deviation and the slope of that fit to guide the adjustment of the learning rate. I can't see the relationship between the learning rate and the standard deviation (in fact, I don't know what the covariance matrix returned by the function means), so I would like to understand the details of this covariance matrix.
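As a rough illustration (a minimal sketch with made-up data, not the ZSSR code itself): the diagonal of the covariance matrix returned by numpy.polyfit holds the variances of the fitted coefficients, so its square roots are the 1-sigma uncertainties, e.g. the standard deviation of the fitted slope, which is the kind of quantity the ZSSR strategy compares against the slope itself.

import numpy as np

# Fit a line to noisy data and inspect the returned covariance matrix.
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 0.3 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

coeffs, cov = np.polyfit(x, y, deg=1, cov=True)   # coeffs = [slope, intercept]

# The diagonal of `cov` holds the variances of the fitted coefficients,
# so the square roots are their standard errors (1-sigma uncertainties).
slope, intercept = coeffs
slope_std, intercept_std = np.sqrt(np.diag(cov))
print(f"slope     = {slope:.3f} +/- {slope_std:.3f}")
print(f"intercept = {intercept:.3f} +/- {intercept_std:.3f}")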

Related

bridgesampling with data from R2OpenBUGS

I am working on Bayesian model averaging and need to calculate posterior model probabilities.
I want to use the 'bridge sampling' method recommended by Meng and Wong (1996) and Lopes and West (2004). I have a posterior sample as an mcmc.list object that I obtained using R2OpenBUGS (90,000 samples, each sample has four values). To approximate the marginal likelihood, an approximation function g() to the joint posterior density is chosen. For simplicity, we took g() to be the tetravariate normal density with mean set to the empirical mean vector and variance set to the empirical variance-covariance matrix of the MCMC sample.
I don't know how to code this function in R, nor how to use the existing bridge_sampler function.
The bridge_sampler function has arguments whose meaning I don't understand. The tutorial gives an example of how to use bridge_sampler with JAGS output, but not with R2OpenBUGS output.
Any help is appreciated!
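The question is about R, but as a language-agnostic illustration of the g() described above (sketched here in Python with hypothetical placeholder data), the proposal is just a multivariate normal built from the empirical moments of the posterior sample:

import numpy as np
from scipy import stats

# Hypothetical stand-in for the (90000, 4) matrix of posterior draws
# obtained by stacking the chains of the mcmc.list.
posterior_samples = np.random.default_rng(1).normal(size=(90000, 4))

# g(): tetravariate normal with the empirical mean vector and the
# empirical variance-covariance matrix of the posterior sample.
mu_hat = posterior_samples.mean(axis=0)
sigma_hat = np.cov(posterior_samples, rowvar=False)
g = stats.multivariate_normal(mean=mu_hat, cov=sigma_hat)

# Log-density of g at each draw; one ingredient of the bridge sampling
# estimator (the other being the unnormalised posterior density).
log_g = g.logpdf(posterior_samples)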

Explained variance calculation

My questions are specific to https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.
I don't understand why the eigenvalues are squared here: https://github.com/scikit-learn/scikit-learn/blob/55bf5d9/sklearn/decomposition/pca.py#L444
Also, explained_variance is not computed for new transformed data, only for the original data used to compute the eigenvectors. Is that not normally done?
pca = PCA(n_components=2, svd_solver='full')
pca.fit(X)
pca.transform(Y)
In this case, wouldn't you calculate the explained variance separately for the data Y as well? For that purpose, I think we would have to use point 3 instead of the eigenvalues.
Explained variance can also be computed by taking the variance of each axis in the transformed space and dividing by the total variance. Is there any reason that is not done here?
Answers to your questions:
1) The square roots of the eigenvalues of the scatter matrix (e.g. XX.T) are the singular values of X (see here: https://math.stackexchange.com/a/3871/536826), so you square them. Important: the initial matrix X should be centered (the data preprocessed to have zero mean) in order for the above to hold.
2) Yes, this is the way to go. explained_variance is computed based on the singular values. See point 1.
3) It's the same thing, but in the case you describe you HAVE to project the data and then do additional computations. There is no need for that if you just compute it using the eigenvalues / singular values (see point 1 again for the connection between the two).
Finally, keep in mind that not everyone really wants to project the data. You can take just the eigenvalues and immediately estimate the explained variance WITHOUT projecting the data, so that is the standard, most direct way to do it.
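A small sketch (with made-up data) of the connection in point 1: for a centered X, the eigenvalues of the scatter matrix equal the squared singular values, and the explained variance follows directly from them.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                      # center the data first

# Squared singular values of Xc ...
s = np.linalg.svd(Xc, compute_uv=False)

# ... equal the eigenvalues of the scatter matrix Xc.T @ Xc.
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]
print(np.allclose(s ** 2, eigvals))          # True

# Explained variance (and ratio) straight from the singular values.
explained_variance = s ** 2 / (Xc.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()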
EDIT 1:
Answer to the edited point 2
No. PCA is an unsupervised method. It only transforms the X data, not Y (the labels).
Again, the explained variance can be computed quickly and easily, with half a line of code, using the eigenvalues/singular values, OR, as you said, using the projected data, e.g. by estimating the covariance of the projected data; the variances of the PCs will then sit on its diagonal.
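For completeness, a sketch (again with made-up data) of that second route: project the data and read the PC variances off the diagonal of the covariance of the projected data; they should match sklearn's explained_variance_.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pca = PCA(n_components=2, svd_solver='full')
Z = pca.fit_transform(X)                     # projected data (scores)

# Variances of the principal components are on the diagonal of cov(Z).
pc_variances = np.diag(np.cov(Z, rowvar=False))
print(np.allclose(pc_variances, pca.explained_variance_))   # True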

Time complexity (Big-O notation) of Posterior Probability Calculation

I have a basic idea of Big-O notation from its definition.
In my problem, a 2-D surface is divided into M uniform grids. Each grid m is assigned a posterior probability based on A features.
The posterior probability of grid m is calculated as follows:
and the marginal likelihood is given as:
Here, the A features are independent of each other, and the sigma and mean symbols represent the standard deviation and mean of each feature a at each grid. I need to calculate the posterior probability of all M grids.
What will be the time complexity of the above operation in terms of Big-O notation?
My guess is O(M) or O(M+A). Am I correct? I am hoping for an authoritative answer that I can present in a formal forum.
Also, what will be the time complexity if the M grids are divided into T clusters, where every cluster has Q grids (Q << M), and the posterior probability is calculated only on the Q grids of a cluster rather than on all M grids?
Thank you very much.
Discrete sums and products can be understood as loops. If you are happy with floating-point approximation, most other operators are typically O(1), and a conditional probability looks like a function call. Just inject constants and variables into your equation and you will get the expected Big-O; the details of the formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.
If the result is not obvious, please convert your mathematical formula into actual code in a programming language. In computer science, Big-O is never about a formula but about a translation of it into programming steps; depending on the implementation, the same formula can lead to very different execution complexities: as different as summing integers by actually performing the additions, O(n), versus applying Gauss's formula, O(1).
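For instance, a hypothetical straight translation of "a posterior for each of M grids, with a product over A independent features" into code is two nested loops, which makes the counting obvious:

def posterior_for_all_grids(grids, features, prior, likelihood):
    # Hypothetical translation of the formula into loops; prior(m) and
    # likelihood(m, a) are assumed to be O(1) evaluations.
    unnormalised = []
    for m in grids:                      # outer loop: M iterations
        p = prior(m)
        for a in features:               # inner loop: A iterations
            p *= likelihood(m, a)        # independent features -> product
        unnormalised.append(p)
    evidence = sum(unnormalised)         # marginal likelihood: one extra O(M) pass
    return [p / evidence for p in unnormalised]

# Overall: O(M * A) for the nested loops plus O(M) for the normalisation,
# i.e. O(M * A); restricting the outer loop to the Q grids of one cluster gives O(Q * A).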
By the way, why are you doing a discrete sum over a discrete domain N? Shouldn't it be M?

Get covariance of best-fit parameters obtained by lmfit using non-"leastsq" methods

I have some observational data and I want to fit some model parameters by using lmfit.Minimizer() to minimize an error function which, for reasons I won't get into here, must return a float instead of an array of residuals. This means that I cannot use the leastsq method to minimize the function. In practice, the nelder, BFGS and powell methods converge fine, but they do not provide the covariance of the best-fit parameters (MinimizerResult.covar).
I would like to know if there is a simple way to compute this covariance when using any of the non-leastsq methods.
It is true that the leastsq method is the only method that can calculate error bars and that this requires a residual array (with more elements than variables!).
It turns out that some work has been done in lmfit toward the goal of being able to compute uncertainties for scalar minimizers, but it is not complete. See https://github.com/lmfit/lmfit-py/issues/169 and https://github.com/lmfit/lmfit-py/pull/481. If you're interested in helping, that would be great!
But, yes, you could compute the covariance by hand. For each variable, you would need to make a small perturbation to its value (ideally around 1-sigma, but since that is what you're trying to calculate, you probably don't know it) and then fix that value and optimize all the other values. In this way you can compute the Jacobian matrix (derivative of the residual array with respect to the variables).
From the Jacobian matrix, the covariance matrix is (assuming there are no singularities):
covar = numpy.linalg.inv(numpy.dot(numpy.transpose(jacobian), jacobian))
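As a rough sketch (plain numpy, not lmfit's API): given a hypothetical residual(params) function returning the residual array, and the best-fit parameter values, the Jacobian can be estimated with finite differences and plugged into the formula above. Note that this simple version holds the other parameters fixed at their best-fit values instead of re-optimizing them as described above, so it is only a first approximation.

import numpy as np

def estimate_covariance(residual, best_params, eps=1e-6):
    # Finite-difference estimate of covar = inv(J^T J) at the best fit.
    # residual: callable returning the residual array for a parameter vector.
    # best_params: 1-D array of best-fit parameter values.
    r0 = np.asarray(residual(best_params))
    jac = np.empty((r0.size, best_params.size))
    for i in range(best_params.size):
        p = np.array(best_params, dtype=float)
        p[i] += eps                              # perturb one variable
        jac[:, i] = (np.asarray(residual(p)) - r0) / eps
    return np.linalg.inv(jac.T @ jac)            # assumes J^T J is non-singular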

Is the Pearson correlation coefficient a suitable objective function for quadratic programming solvers?

Is the Pearson correlation coefficient -- with one vector, x, exogenous and another vector, y, as a choice variable -- a suitable quadratic objective function for quadratic programming solvers like Gurobi?
A quick Google search for "Gurobi objective function" shows that Gurobi has an API for setting an objective function that accepts a linear or quadratic expression. That is to be expected, because quadratic programming is, by definition, the optimization of a quadratic function, and the math behind the methods is designed specifically for this class (e.g. working directly with the Q coefficient matrix and the c vector rather than with the raw function).
I didn't look into the details too much, but I can see that the Pearson product-moment correlation coefficient appears to be not a quadratic but a rational function. So, unless your specific case can be simplified to a quadratic, the answer is no.
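To make the "rational, not quadratic" point concrete: with x and y centered (means subtracted), the coefficient as a function of the choice variable y is

r(y) = (x'y) / (||x|| ||y||)

so y enters linearly in the numerator but also inside a norm (the square root of a quadratic form) in the denominator. A QP objective must instead have the form (1/2) y'Qy + c'y, which r(y) cannot be written as.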
I cannot say anything about other solvers because each one is an independent product and has to be considered separately.
Since your function appears to be piecewise continuous and infinitely differentiable, you're probably interested in general-purpose gradient methods instead.