Time complexity (Big-O notation) of Posterior Probability Calculation

I have a basic idea of Big-O notation from its definition.
In my problem, a 2-D surface is divided into M uniform grid cells. Each grid cell m is assigned a posterior probability based on A features.
The posterior probability of grid cell m is calculated as follows:
and the marginal likelihood is given as:
Here, the A features are independent of each other, and the sigma and mean symbols represent the standard deviation and mean of each feature a at each grid cell. I need to calculate the posterior probability of all M grid cells.
What will be the time complexity of this operation in terms of Big-O notation?
My guess is O(M) or O(M+A). Am I correct? I'm hoping for an authoritative answer that I can present in a formal forum.
Also, what will be the time complexity if the M grid cells are divided into T clusters, where every cluster has Q grid cells (Q << M), and the posterior probability is calculated only on the Q grid cells of one cluster rather than on all M?
Thank you very much.

Discrete sums and products can be understood as loops. If you are happy with floating-point approximation, most other operators are typically O(1); a conditional probability looks like a function call. Just inject constants and variables into your equation and you'll get the expected Big-O; the details of the formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.
If the result is not obvious, please convert your mathematical formula above into actual code in a programming language. Computer-science Big-O is never about a formula but about an actual translation of it into programming steps: depending on the implementation, the same formula can lead to very different execution complexities, as different as adding the integers 1..n by actually performing the sum, O(n), or by applying Gauss's formula, O(1).
By the way, why are you doing a discrete sum over a discrete domain N? Shouldn't it be M?
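Following that advice, here is a minimal Python sketch of one way the computation could be implemented, assuming (since the exact formulas are not reproduced above) a naive-Bayes-style posterior: each grid cell m has a prior, the likelihood is a product over the A independent Gaussian feature terms with per-cell means and standard deviations, and the marginal likelihood sums over all M cells. Counting the loops under these assumptions gives O(M*A) for the unnormalized posteriors plus O(M) for the normalization, i.e. O(M*A) overall; restricting the calculation to the Q cells of one cluster gives O(Q*A).

import math

def posteriors(features, priors, means, stds):
    # Hypothetical sketch of a naive-Bayes posterior per grid cell.
    # features    : list of A observed feature values
    # priors      : list of M prior probabilities, one per grid cell
    # means, stds : M x A nested lists of per-cell, per-feature Gaussian parameters
    M, A = len(priors), len(features)

    # Unnormalized posteriors: outer loop over M cells, inner loop over A features -> O(M*A)
    unnorm = []
    for m in range(M):
        likelihood = 1.0
        for a in range(A):
            mu, sigma = means[m][a], stds[m][a]
            likelihood *= math.exp(-0.5 * ((features[a] - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        unnorm.append(priors[m] * likelihood)

    # Marginal likelihood (evidence) and normalization -> O(M)
    evidence = sum(unnorm)
    return [u / evidence for u in unnorm]   # total work: O(M*A)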

Related

computational complexity of higher order derivatives with AD in jax

Let f: R -> R be an infinitely differentiable function. What is the computational complexity of calculating the first n derivatives of f in JAX? The naive chain rule would suggest that each multiplication gives a factor-of-2 increase, hence the nth derivative would require at least 2^n more operations. I imagine, though, that clever manipulation of formal series would reduce the number of required calculations and eliminate duplications, especially if the derivatives are JAX-jitted. Is there a difference between the JAX, TensorFlow and Torch implementations?
https://openreview.net/forum?id=SkxEF3FNPH discusses this topic, but doesn't provide a computational complexity.
What is the computational complexity of calculating the first n derivatives of f in Jax?
There's not much you can say in general about computational complexity of Nth derivatives. For example, with a function like jnp.sin, the Nth derivative is O[1], oscillating between negative and positive sin and cos calls as N grows. For an order-k polynomial, the Nth derivative is O[0] for N > k. Other functions may have complexity that is linear or polynomial or even exponential with N depending on the operations they contain.
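For illustration, here is a hedged sketch of the naive approach of nesting jax.grad, which is where the per-order cost growth shows up for general functions:

import jax
import jax.numpy as jnp

def nth_derivative(f, n):
    # Naively nest jax.grad n times; for general functions the size of the
    # traced computation (and hence the cost) can grow quickly with n.
    for _ in range(n):
        f = jax.grad(f)
    return f

d4_sin = jax.jit(nth_derivative(jnp.sin, 4))
print(d4_sin(0.5))   # sin''''(x) = sin(x), so this should equal jnp.sin(0.5)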
I imagine though that clever manipulation of formal series would reduce the number of required calculations and eliminate duplications, especially if the derivatives are JAX-jitted
You imagine correctly! One implementation of this idea is the jax.experimental.jet module, which is an experimental transform designed for computing higher-order derivatives efficiently and accurately. It doesn't cover all JAX functions, but it may be complete enough to do what you have in mind.
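As a rough illustration (not taken from the answer above), a call to jax.experimental.jet looks like the following; the exact normalization of the returned coefficients is worth checking against the jet documentation, but the point is that all orders are propagated in a single Taylor-mode pass rather than by nesting jax.grad n times:

import jax.numpy as jnp
from jax.experimental.jet import jet

f = jnp.sin
x = 0.5
order = 4

# Propagate the input series x + t through f in one Taylor-mode pass.
# series_out holds the higher-order Taylor terms of f(x + t) at t = 0
# (related to the derivatives up to factorial factors).
primal_out, series_out = jet(f, (x,), ([1.0] + [0.0] * (order - 1),))
print(primal_out)   # f(x)
print(series_out)   # one coefficient per order, computed without nested grad calls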

CVXPY: quasiconvex fractional programming problem with variance term

I have a portfolio optimization problem where my objective function is the mean divided by the standard deviation.
The variance is that of the difference of two random variables, so it is computed as Var(X) + Var(Y) - 2 * Cov(X, Y). The variance term is specified as above, where w represents the portfolio selection, capital sigma (Σ) is a covariance matrix, and sigma sub delta-g (σ_δg) is a vector of covariances related to the second random variable. The problem is that CVXPY doesn't consider the last term there to be nonnegative, because some of the covariance terms are negative. Obviously, I know that the variance will always be nonnegative, so I believe that this should work as a quasiconvex problem. Is there any way to tell CVXPY that this variance term will always be positive?
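For reference, a small CVXPY sketch of the variance term as I understand it (the exact expression is an assumption, and all the data below are placeholders) reproduces the issue: the quadratic form is convex, but the expression's sign is unknown to CVXPY because of the negative covariance term.

import cvxpy as cp
import numpy as np

n = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((2 * n, n))
Sigma = A.T @ A / (2 * n)                   # placeholder covariance matrix (PSD)
sigma_dg = rng.standard_normal(n) * 0.01    # placeholder covariance vector for the second variable
var_g = 0.05                                # placeholder variance of the second variable

w = cp.Variable(n, nonneg=True)

# Assumed form of Var(X - Y): w' Sigma w + Var(Y) - 2 * w' sigma_dg
variance = cp.quad_form(w, Sigma) + var_g - 2 * (sigma_dg @ w)
print(variance.curvature)   # convex
print(variance.sign)        # unknown, because the last term can be negative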

How do I determine "Big-oh" time complexity given an Excel sheet showing various input size values vs. their corresponding run time values

I have been given a list of input sizes and their corresponding runtime values for a given algorithm A. How should I go about computing the "Big-oh" time complexity of algorithm A given these values?
Try playing around with the numbers and see if they approximately fit one of the "standard" complexity functions, e.g. n, n^2, n^3, 2^n, log(n).
For example, if the ratio between runtime and input size is nearly constant, it's likely O(n). If that ratio grows linearly (or doubling the input quadruples the runtime, etc.), it is O(n^2). If it grows quadratically, it's O(n^3). If adding a constant to the input results in a multiplicative change in the runtime, it's exponential. And if it's the reverse relationship, it's log(n).
If it's just slightly but consistently growing more quickly than a line, it's probably O(n log(n)).
You can also plot the graph of your values (input numbers vs runtime values) in Excel and overlay it with the graph of the function you guessed may fit, and then try to tweak the parameters (e.g. for O(n^2), plot a graph of a*x^2 + b, and tweak a and b).
To make it more precise (e.g. to calculate the uncertainty), you could apply regression analysis (search for non-linear regression analysis in Excel).
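If you can export the (input size, runtime) pairs from the sheet, the slope of a straight-line fit in log-log space gives a quick estimate of the exponent; here is a minimal Python sketch with made-up numbers:

import numpy as np

# Hypothetical measurements exported from the spreadsheet:
n = np.array([1000, 2000, 4000, 8000, 16000], dtype=float)   # input sizes
t = np.array([0.012, 0.049, 0.21, 0.83, 3.4], dtype=float)   # runtimes in seconds

# If t ~ c * n^k, then log(t) ~ log(c) + k * log(n), so a linear fit in
# log-log space estimates the exponent k.
k, log_c = np.polyfit(np.log(n), np.log(t), 1)
print(f"estimated exponent k ~ {k:.2f}")   # k ~ 1 suggests O(n), k ~ 2 suggests O(n^2), ...

# An n*log(n) algorithm typically shows up as k slightly above 1; comparing
# residuals of fits against n, n*log(n) and n^2 helps disambiguate.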

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of these algorithms, I can calculate the running time of my implementation, which is easy given the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single variable case is, insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot. That is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching this. If the running time is like T(n) = n^{k log n} or something, then you should see a parabola. And if T is exponential in n you should still see exponential growth.
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However, this will only really work if there is a single leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(m) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the 3rd dimension is the measured running time), and you can check whether it looks like a plane (in log-log-log scale) or whether it is curved. Naturally, random variation in the measurements plays a role here as well.
In Matlab I typically calculate a least-squares solution to a two-variable function like this (the bracketed expression just concatenates different powers and combinations of x and y horizontally; .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach works also for higher dimensions but gets tedious to generate, maybe there is a more general way to achieve that and this is just a work-around for my lack of knowledge.
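For anyone working in Python instead, a roughly equivalent least-squares fit with NumPy (assuming parameter_x, parameter_y and time are 1-D arrays of measurements) might look like this:

import numpy as np

def fit_loglog_surface(parameter_x, parameter_y, time):
    # Least-squares fit of log(time) against a quadratic in log(x) and log(y),
    # mirroring the Matlab snippet above.
    x = np.log(parameter_x)
    y = np.log(parameter_y)
    X = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    p, *_ = np.linalg.lstsq(X, np.log(time), rcond=None)
    return p   # coefficients of [x^2, x*y, y^2, x, y, 1]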
I was going to write my own explanation but it wouldn't be any better than this.

How to calculate continuous effect of gravitational pull between simulated planets

So I am making a simple simulation of different planets, each with its own velocity, flying around space and orbiting each other.
I plan to simulate their pull on each other by considering each planet as projecting its own "gravity vector field." Each time step I'm going to add the vectors output by each planet's individual vector field equation (V = -xj + (-yj), or some notation like it), except for the planet being affected in the calculation, and use the affected planet's position as input to the equations.
However, this would be inaccurate and does not treat the gravitational pull as continuous and constant. How do I calculate the movement of my planets if each is continuously affecting the others?
Thanks!
In addition to what Blender writes about using Newton's equations, you need to consider how you will be integrating over your "acceleration field" (as you call it in the comment to his answer).
The easiest way is to use Euler's Method. The problem with that is it rapidly diverges, but it has the advantage of being easy to code and to be reasonably fast.
If you are looking for better accuracy, and are willing to sacrifice some performance, one of the Runge-Kutta methods (probably RK4) would ordinarily be a good choice. I'll caution you that if your "acceleration field" is dynamic (i.e. it changes over time ... perhaps as a result of planets moving in their orbits) RK4 will be a challenge.
Update (Based on Comment / Question Below):
If you want to calculate the force vector Fi(tn) applied to a specific object i at some time step tn, then you need to compute the force contributed by all of the other objects within your simulation using the equation Blender references. That is, for each object i, you figure out how all of the other objects pull on it (apply force), and those vectors, when summed, give the aggregate force vector applied to i. Algorithmically this looks something like:
for each object i
    Fi(tn) = 0
    for each object j ≠ i
        rij = pj(tn) - pi(tn)                                      // vector from object i to object j
        Fi(tn) = Fi(tn) + (G * mi * mj / |rij|^2) * (rij / |rij|)  // magnitude times unit direction
Where pi(tn) and pj(tn) are the positions of objects i and j at time tn respectively, rij is the vector pointing from object i to object j, and | | is the standard Euclidean (l2) norm, so |rij| is the Euclidean distance between the two objects. Also, G is the gravitational constant.
Euler's Method breaks the simulation into discrete time slices. It looks at the current state and in the case of your example, considers all of the forces applied in aggregate to all of the objects within your simulation and then applies those forces as a constant over the period of the time slice. When using
ai(tn) = Fi(tn)/mi
(ai(tn) = acceleration vector at time tn applied to object i, Fi(tn) is the force vector applied to object i at time tn, and mi is the mass of object i), the force vector (and therefore the acceleration vector) is held constant for the duration of the time slice. In your case, if you really have another method of computing the acceleration, you won't need to compute the force, and can instead directly compute the acceleration. In either event, with the acceleration being held as constant, the position at time tn+1, p(tn+1) and velocity at time tn+1, v(tn+1), of the object will be given by:
pi(tn+1) = 0.5 * ai(tn) * (tn+1 - tn)^2 + vi(tn) * (tn+1 - tn) + pi(tn)
vi(tn+1) = ai(tn) * (tn+1 - tn) + vi(tn)
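To make this concrete, here is a minimal Python sketch of one Euler step for the whole system under the assumptions above (positions, velocities and masses are hypothetical NumPy arrays; the acceleration on i is computed directly, so mi cancels out of the force expression):

import numpy as np

G = 6.674e-11   # gravitational constant

def euler_step(pos, vel, mass, dt):
    # One Euler step for an N-body system.
    # pos, vel : (N, 2) arrays of positions and velocities
    # mass     : (N,) array of masses
    n = len(mass)
    acc = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = pos[j] - pos[i]                   # vector from object i to object j
            dist = np.linalg.norm(r)
            acc[i] += G * mass[j] * r / dist**3   # G * mj / |r|^2, directed along r
    # Hold the acceleration constant over the time slice (Euler's method).
    new_pos = pos + vel * dt + 0.5 * acc * dt**2
    new_vel = vel + acc * dt
    return new_pos, new_vel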
The RK4 method samples the driving function of your system at several points within each time slice and combines them, which approximates its behavior much better than a single sample. The details are at the Wikipedia page I referenced above, and there are a number of other resources you should be able to locate on the web. The basic idea is that instead of picking a single force value for a particular time slice, you compute four force vectors at specific times within the slice and combine them in a weighted average. That's fine if your field of force vectors doesn't change between time slices. If you're using gravity to derive the vector field, and the objects which are the gravitational sources move, then you need to compute their positions at each of the four sub-intervals in order to compute the force vectors. It can be done, but your performance is going to be quite a bit poorer than with Euler's method. On the plus side, you get more accurate motion of the objects relative to each other. So it's a challenge in the sense that it's computationally expensive, and it's a bit of a pain to figure out where all the objects are supposed to be for your four samples during the time slice of your iteration.
There is no such thing as "continuous" when dealing with computers, so you'll have to approximate continuity with very small intervals of time.
That being said, why are you using a vector field? What's wrong with Newton?
Newton's law of universal gravitation gives the force between two bodies as F = G * m1 * m2 / r^2, and Newton's second law says F = m * a. The net force on an object is the sum of the gravitational forces from all of the other bodies; equate the two and solve for a.
So you'll just have to loop over all the objects one by one and find the acceleration on each.