Check how fast numpy.linalg.lstsq is finding convergence - numpy

I have a question concerning the NumPy function linalg.lstsq(a, b). Is there any way to check how fast this method converges? I mean, is there some characteristic that indicates how quickly the computation is converging?
Thank you in advance for the brainstorming.

The NumPy function linalg.lstsq uses singular value decomposition (SVD) to solve the least-squares problem, so it is a direct solver rather than an iterative one; there is no convergence process to monitor from the outside. If your matrix A is n by n, it requires on the order of n^3 flops.
More precisely, I think the function uses Householder bidiagonalization to compute the SVD, so if your matrix is m by n, the complexity will be O(max(m, n) * min(m, n)^2).
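As a rough empirical check of that scaling claim, you can time np.linalg.lstsq on random square systems of a few sizes (the sizes below are arbitrary): doubling n should increase the runtime by roughly a factor of eight once n is large enough. Note that lstsq also returns the singular values it computed, which is about the only internal information a direct solver like this exposes.

```python
import time
import numpy as np

# Rough empirical check of the ~n^3 scaling claim: doubling n should
# increase the runtime by roughly a factor of 8 once n is large enough.
for n in (500, 1000, 2000):
    A = np.random.rand(n, n)
    b = np.random.rand(n)
    t0 = time.perf_counter()
    x, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)
    print(f"n = {n}: {time.perf_counter() - t0:.3f} s (rank {rank})")
```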

Related

computational complexity of higher order derivatives with AD in jax

Let f: R -> R be an infinitely differentiable function. What is the computational complexity of calculating the first n derivatives of f in Jax? A naive application of the chain rule would suggest that each differentiation gives a factor-of-2 increase, hence the nth derivative would require at least 2^n more operations. I imagine, though, that clever manipulation of formal series would reduce the number of required calculations and eliminate duplication, especially if the derivatives are Jax jitted. Is there a difference between the Jax, Tensorflow and Torch implementations?
https://openreview.net/forum?id=SkxEF3FNPH discusses this topic, but doesn't provide a computational complexity.
What is the computational complexity of calculating the first n derivatives of f in Jax?
There's not much you can say in general about computational complexity of Nth derivatives. For example, with a function like jnp.sin, the Nth derivative is O[1], oscillating between negative and positive sin and cos calls as N grows. For an order-k polynomial, the Nth derivative is O[0] for N > k. Other functions may have complexity that is linear or polynomial or even exponential with N depending on the operations they contain.
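For a concrete version of the jnp.sin example, here is a minimal sketch (the helper name is just for illustration) that builds the Nth derivative by naively nesting jax.grad:

```python
import jax
import jax.numpy as jnp

def nth_derivative(f, n):
    """Naively build the nth derivative by applying jax.grad n times."""
    for _ in range(n):
        f = jax.grad(f)
    return f

# d^4/dx^4 sin(x) = sin(x), so the 4th derivative is still O(1) work.
d4_sin = jax.jit(nth_derivative(jnp.sin, 4))
print(d4_sin(1.0))  # ~0.8415, i.e. sin(1.0)
```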
I imagine though that clever manipulation of formal series would reduce the number of required calculations and eliminate duplication, especially if the derivatives are Jax jitted
You imagine correctly! One implementation of this idea is the jax.experimental.jet module, which is an experimental transform designed for computing higher-order derivatives efficiently and accurately. It doesn't cover all JAX functions, but it may be complete enough to do what you have in mind.
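As a hedged sketch of what using jet looks like (I'm assuming the jet(fun, primals, series) call signature from the experimental module; check its documentation for the exact coefficient convention):

```python
import jax.numpy as jnp
from jax.experimental.jet import jet

# Propagate a truncated power series for x(t) = x0 + t through jnp.sin
# in a single pass, instead of nesting jax.grad calls N times.
x0 = 1.0
primal_out, series_out = jet(jnp.sin, (x0,), ((1.0, 0.0, 0.0),))

print(primal_out)   # sin(x0)
print(series_out)   # higher-order terms of sin(x0 + t) around t = 0
                    # (see the jet docs for how these map to derivatives)
```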

kNN-DTW time complexity

I found from various online sources that the time complexity of DTW is quadratic. On the other hand, I also found that standard kNN has linear time complexity. However, when pairing them together, does kNN-DTW have quadratic or cubic time complexity?
In essence, does the time complexity of kNN depend solely on the metric used? I have not found any clear answer to this.
You need to be careful here. Let's say you have n time series in your 'training' set (let's call it that, even though you are not really training with kNN), each of length l. Computing the DTW between a pair of time series has an asymptotic complexity of O(l * m), where m is your maximum warping window. Since m <= l, O(l^2) also holds (although there are asymptotically more efficient algorithms, I don't think they are actually faster in practice in most cases, see here). Classifying a time series with kNN requires you to compute the distance between that time series and all time series in the training set, which means n comparisons, i.e. linear in n.
So your final complexity is in O(l * m * n), or O(l^2 * n). In words: the complexity is quadratic in the time series length and linear in the number of training examples.
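To make that O(l * m * n) accounting concrete, here is a rough, unoptimized sketch (assuming equal-length univariate series; all names are illustrative):

```python
import numpy as np

def dtw_distance(a, b, window):
    """DTW with a Sakoe-Chiba band of half-width `window`: O(l * m) per pair."""
    l = len(a)
    cost = np.full((l + 1, l + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, l + 1):
        for j in range(max(1, i - window), min(l, i + window) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[l, l])

def knn_dtw_predict(x, train_series, train_labels, window, k=1):
    """Compare x against all n training series: O(n * l * m) in total."""
    dists = [dtw_distance(x, s, window) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```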

Gradient of LSTM one to many model output w.r.t input

I have an LSTM model that takes as input a vector x0 (dimension: n) and returns a sequence of vectors (size: T x n). I need the derivative of each output vector in the sequence w.r.t. x0 (size: n x n), i.e. a Jacobian of size (T x n x n). What is the most efficient way to do this in TensorFlow? I need it for optimization research that requires a function that takes x0 and returns this derivative information. Having searched all the available options, I don't have a good way to approach this. Any help (pseudo code, documentation, posts, etc.) would be very beneficial.
Not entirely sure if this is what you are looking for, but TensorFlow has eager execution (which allows for interactive debugging) and a fairly comprehensive framework for automatic computation of gradients. The combination of the two might assist. This link includes some details and example code that may lend itself to your specific needs: https://www.tensorflow.org/guide/eager, and in particular this section: https://www.tensorflow.org/guide/eager#computing_gradients, along with others in the guide.
I hope this helps.
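In recent TensorFlow versions you can also get the full (T x n x n) Jacobian directly from tf.GradientTape. Below is a hedged sketch in which a made-up RepeatVector + LSTM stack stands in for your one-to-many model:

```python
import tensorflow as tf

n, T = 8, 5

# Stand-in one-to-many model: x0 of shape (batch, n) -> sequence (batch, T, n).
model = tf.keras.Sequential([
    tf.keras.layers.RepeatVector(T, input_shape=(n,)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(n),
])

def sequence_jacobian(x0):
    """Return d y[t, :] / d x0[:] for every step t, as a (T, n, n) tensor."""
    x0 = tf.reshape(x0, (1, n))
    with tf.GradientTape() as tape:
        tape.watch(x0)
        y = model(x0)                    # shape (1, T, n)
    jac = tape.batch_jacobian(y, x0)     # shape (1, T, n, n)
    return tf.squeeze(jac, axis=0)

print(sequence_jacobian(tf.random.normal((n,))).shape)  # (T, n, n)
```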

Time complexity (Big-O notation) of Posterior Probability Calculation

I got a basic idea of Big-O notation from its definition.
In my problem, a 2-D surface is divided into M uniform grids. Each grid m is assigned a posterior probability based on A features.
The posterior probability of grid m is calculated as follows:
and the marginal likelihood is given as:
Here, the A features are independent of each other, and the sigma and mean symbols represent the standard deviation and mean of each feature a at each grid. I need to calculate the posterior probability of all M grids.
What will be the time complexity of the above operation in terms of Big-O notation?
My guess is O(M) or O(M + A). Am I correct? I'm hoping for an authoritative answer that I can present in a formal forum.
Also, what will the time complexity be if the M grids are divided into T clusters, where every cluster has Q grids (Q << M), and the posterior probability is calculated only on the Q grids of a cluster rather than on all M grids?
Thank you very much.
Discrete sums and products can be understood as loops. If you are happy with floating-point approximations, most other operators are typically O(1), and a conditional probability looks like a function call. Just inject constants and variables into your equation and you'll get the expected Big-O; the details of the formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.
If the result is not obvious, please convert your mathematical formula into actual programming code in a programming language of your choice. Computer-science Big-O is never about a formula but about an actual translation of it into programming steps; depending on the implementation, the same formula can lead to very different execution complexities, as different as summing integers by actually performing the additions, O(n), or applying Gauss's formula, O(1).
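Following that advice, a literal translation into loops might look like the sketch below (I'm assuming an independent Gaussian likelihood per feature, which is what the per-grid means and standard deviations suggest; all names are hypothetical):

```python
import numpy as np

def grid_posteriors(z, mu, sigma, prior):
    """Posterior over M grids for one observation z with A independent features.

    z:     (A,)   observed feature values
    mu:    (M, A) per-grid, per-feature means
    sigma: (M, A) per-grid, per-feature standard deviations
    prior: (M,)   prior probability of each grid
    """
    M, A = mu.shape
    likelihood = np.ones(M)
    for m in range(M):            # outer loop over the M grids
        for a in range(A):        # inner loop over the A features -> O(M * A) total
            likelihood[m] *= (
                np.exp(-0.5 * ((z[a] - mu[m, a]) / sigma[m, a]) ** 2)
                / (np.sqrt(2.0 * np.pi) * sigma[m, a])
            )
    evidence = np.sum(likelihood * prior)   # marginal likelihood: one more O(M) pass
    return likelihood * prior / evidence
```

Counting the loops in this sketch gives O(M * A) for all M grids, and restricting the computation to one cluster of Q grids would reduce it to O(Q * A).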
By the way, why are you doing a discrete sum over a domain N? Shouldn't it be M?

NP problems can be solved in deterministically EXPONENTIAL time?

any problem in NP can be solved in deterministic exponential time,
or we can say that
any language in NP can be decided by an algorithm running in time 2^O(n^k)
i.e., NP ⊆ EXP
informally speaking, we just try each one of the possible solutions and then decide it
However, there is a simple example where I cannot figure out what is wrong with my reasoning.
Here it is:
The Traveling Salesman problem: given an undirected graph G = (V, E) with |V| = n.
This is a well-known NP-complete problem, and therefore it certainly belongs to NP.
I tried to analyse the running time like this:
I simply list out all the possible solutions, and there are (n-1)! possible tours in total
Then I check each one of them, which takes O(n) per tour.
The total running time will be O(n!)
This doesn't look like it can be bounded above by 2^O(n^k), i.e., exponential time.
Where is the pitfall in this analysis?
Or, in other words, how can we show that the traveling salesman problem can indeed be decided by an algorithm running in time 2^O(n^k)?
Note that
n! ≤ n^n = (2^(log n))^n = 2^(n log n) ≤ 2^(n^2)
So n! = 2^(O(n^2)), so n! ∈ EXP.
Hope this helps!
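If you want a quick numerical sanity check of that chain of inequalities, the bound holds comfortably for small n (a minimal sketch):

```python
import math

# Check n! <= n^n <= 2^(n^2) for a few small n.
for n in range(1, 13):
    assert math.factorial(n) <= n ** n <= 2 ** (n * n)
print("bound holds for n = 1..12")
```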