On what does the time complexity of graph algorithms depend? - time-complexity

I stumbled over this question in my textbook:
"In general, on what does the time complexity of Prim's, Kruskal's and Dijkstra's algorithms depends on?"
a. The number of vertices in the graph.
b. The number of edges in the graph.
c. Both, on the number of vertices and edges in the graph.
Explain your choice.
According to Wikipedia, the worst-case time complexities of Prim's, Kruskal's and Dijkstra's algorithms are O(E log V), O(E log V) and O(E + V log V) respectively. So I guess the answer is c? But why?

I don't know about Prim's and Kruskal's, and I might be wrong about Dijkstra's, but I think in its case the answer would be b, because:
Dijkstra's visits nodes along the shortest known path until it finds the destination.
This implies that if two edges point to the same node, only one will ever be considered by the algorithm, since one has a higher weight (or they are equal), rendering the other edge moot to follow.
Therefore, the only way to increase the time spent traversing the graph by adding edges is to add nodes (adding edges to an existing node can change the algorithm's traversal time, but not in proportion to the number of edges, only to their weights).
Therefore, my intuition is that only the number of nodes is directly related to the running time. The Dijkstra's algorithm Wikipedia page seems to confirm this:
The simplest implementation of Dijkstra's algorithm stores the vertices of set Q in an ordinary linked list or array, and extracting the minimum from Q is simply a linear search through all vertices in Q. In this case, the running time is O(E + V^2) = O(V^2).
This is only an intuition of course, and cs.stackexchange might be of greater use.
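For concreteness, here is a minimal sketch (mine, not from the quote) of the array-based variant the Wikipedia excerpt describes, assuming the graph is given as adjacency lists; the outer loop runs V times and each extract-minimum is a linear scan over the remaining vertices, which is where the V^2 term comes from:

import math

def dijkstra_simple(adj, source):
    # Array-based Dijkstra; adj[u] is a list of (v, weight) pairs.
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    unvisited = set(adj)
    while unvisited:                                 # runs V times
        # Linear-search extract-minimum: O(V) per iteration, O(V^2) in total.
        u = min(unvisited, key=lambda v: dist[v])
        unvisited.remove(u)
        for v, w in adj[u]:                          # each edge relaxed once overall: O(E)
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist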

The answer is (c), because both V and E contribute to the asymptotic complexity of the respective algorithms. On further analysis one could argue that V matters less for Kruskal's and Prim's (since it appears only inside a log factor), while E carries roughly the same weight in all three cases.
Also, note that |E| <= |V|^2 always holds (for simple graphs).

In the worst case the graph is a complete graph, i.e. it has v(v-1)/2 edges, so e >> v and e ~ v^2.
The time complexities of Prim's and Dijkstra's algorithms are:
1. With an adjacency list and a binary-heap priority queue: O((v + e) log v); in the worst case e >> v, so this is O(e log v).
2. With an adjacency matrix and a priority queue: O(v^2 + e log v); in the worst case e ~ v^2, so O(v^2 + e log v) ~ O(e + e log v) ~ O(e log v).
3. As the graph gets denser (the worst case being a complete graph), we use a Fibonacci heap and an adjacency list: O(e + v log v).
The time complexity of Kruskal's is O(e log e). In the worst case e ~ v^2, so log(v^2) = 2 log v, and we can safely say that O(e log e) is O(2e log v), i.e. O(e log v) in the worst case.
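To see where the O((v + e) log v) bound in case 1 comes from, here is a rough sketch (my own, using a binary heap with lazy deletion) of that implementation; at most O(e) entries ever sit in the heap, and each push or pop costs O(log e) = O(log v):

import heapq, math

def dijkstra_heap(adj, source):
    # Heap-based Dijkstra; adj[u] is a list of (v, weight) pairs.
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    pq = [(0, source)]                       # (tentative distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)             # O(log v) per pop
        if d > dist[u]:
            continue                         # stale entry, skip (lazy deletion)
        for v, w in adj[u]:                  # every edge relaxed at most once
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))   # O(log v) per push
    return dist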

As you said, the time complexities of O(E log V), O(E log V), and O(E + V log V) mean that each one depends on both E and V. This is because each algorithm involves considering the edges and their respective weights in a graph. Since for Prim's and Kruskal's the MST has to be connected and include all vertices, and for Dijkstra's the shortest path has to pass from one vertex to another through other intermediary vertices, the vertices also have to be considered in each algorithm.
For example, with Dijkstra's algorithm, you are essentially looking to add edges that are both low in cost and that connect vertices that will eventually provide a path from the starting vertex to the ending vertex. To find the shortest path, you cannot solely look for a path that connects the start vertex to the end, and you cannot solely look for the smallest weighted edges; you need to consider both. Since you are considering both edges and vertices, the time it takes to make these considerations throughout the algorithm will depend on the number of edges and the number of vertices.
Additionally, different time complexities are possible through different implementations of the three algorithms, and analyzing each algorithm requires a consideration of both E and V.
For example, the simple (adjacency-matrix) implementation of Prim's algorithm is O(V^2), but it can be improved with a min-heap-based priority queue to achieve the complexity you found: O(E log V). O(E log V) may seem like the faster algorithm, but that's not always the case. E can be as large as V^2, so in dense graphs with close to V^2 edges, O(E log V) becomes O(V^2 log V). If V is very small then there is not much difference between O(V^2) and O(E log V). E and V also influence the running time based on the way the graph is stored. For example, an adjacency list becomes inefficient for dense graphs (with E approaching V^2) because checking whether an edge exists in the graph goes from close to O(1) to O(V).
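As a rough sketch of the heap-based variant mentioned above (my own illustration, for an undirected graph given as adjacency lists): up to E entries pass through the heap, each costing O(log V), which is where O(E log V) comes from.

import heapq

def prim_mst_weight(adj, start):
    # adj[u] is a list of (v, weight) pairs for an undirected graph.
    in_tree = set()
    total = 0
    pq = [(0, start)]                        # (weight of the edge into the tree, vertex)
    while pq and len(in_tree) < len(adj):
        w, u = heapq.heappop(pq)             # O(log V) per pop
        if u in in_tree:
            continue                         # stale entry, skip
        in_tree.add(u)
        total += w
        for v, wv in adj[u]:
            if v not in in_tree:
                heapq.heappush(pq, (wv, v))  # O(log V) per push
    return total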

Related

How to optimally use a run length morton-encoded chunk of voxels to generate a mesh?

I am currently toying around with meshing voxel chunks, that is, an N^3 list of voxel values. Since in my use case most of the voxels are going to be of the same type, a lot of the neighbors will share the same value. Thus, using RLE (run length encoding) makes sense, as it drastically cuts down the actual storage requirement at the cost of an O(log n) (n being the number of runs in a chunk) random lookup. At the same time, I encode each voxel position along a z-order curve, specifically using Morton encoding. This all works in my use case and gives the expected space reductions.
The issue is with meshing the individual chunks. Generating meshes takes up to 500ms (for N=48) which is simply too long for fluid gameplay. My current algorithm heavily borrows from the one described here and here.
Pseudocode algorithm:
- For each axis in [X, Y, Z]
- for each k in 0..N in axis
- Create a mask of size [N * N]
- for each u in 0..N of orthogonal axis:
- for each v in 0..N of second orthogonal axis:
- Check if (k, u, v) has a face between itself and (k+1, u, v)
- If yes, set mask[u + v*N] = true
- using the mask, make a greedy mesh and output that
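(For reference, a rough Python rendering of the mask-building step above, assuming a hypothetical voxel_at(x, y, z) accessor that hides the RLE/Morton lookup; the names and the solid-vs-empty face test are placeholders, not the actual implementation:)

import numpy as np

def build_masks(voxel_at, N):
    # voxel_at(x, y, z) is assumed to return 0 for empty voxels (placeholder accessor).
    masks = []
    for axis in range(3):                    # X, Y, Z
        u_axis, v_axis = (axis + 1) % 3, (axis + 2) % 3
        for k in range(N):
            mask = np.zeros((N, N), dtype=bool)
            for u in range(N):
                for v in range(N):
                    pos = [0, 0, 0]
                    pos[axis], pos[u_axis], pos[v_axis] = k, u, v
                    here = voxel_at(*pos)
                    pos[axis] += 1
                    neighbour = voxel_at(*pos) if k + 1 < N else 0
                    # A face exists where a filled voxel borders an empty one.
                    mask[u, v] = (here != 0) != (neighbour != 0)
            masks.append((axis, k, mask))
    return masks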
The mesh generation itself is very fast (<<< 1ms) and wouldn't gain much from being optimized, but building the mask itself is very costly. As one can see in the tracing output below, each mesh_mask_building takes on average ~4ms which happens 3*N times per mesh!
My first thought to optimize this was to use the inherent runs of the chunk and simply traverse those and build a mask from each, but this did not work out, as the Morton encoding winds a lot throughout the volume and as such would not be much better for constructing a mesh. It would also be highly suboptimal when one considers a chunk where only one layer is set to visible voxels. The current method generates a simple cuboid, as it does not care about runs, but with my suggested method each run would be separate and generate more faces.
So my question is, how can I use a morton-run-length encoding to generate a mesh?

Breadth first or Depth first

There is a theory that says six degrees of separation is the highest
degree for people to be connected through a chain of acquaintances.
(You know the Baker - degree of separation 1; the Baker knows someone
you don't know - degree of separation 2.)
We have a list of People P, list A of corresponding acquaintances
among these people, and a person x
We are trying to implement an algorithm to check if person x respects
the six degrees of separations. It returns true if the distance from x
to all other people in P is at most six, false otherwise.
We are trying to accomplish O(|P| + |A|) in the worst case.
To implement this algorithm, I thought about using an adjacency list rather than an adjacency matrix to represent the graph G with vertices P and edges A, because an adjacency matrix would take O(n^2) to traverse.
Now I thought about using either BFS or DFS, but I can't seem to find a reason why one is more suitable than the other for this case.
I want to use BFS or DFS to store the distances from x in an array d, and then loop over the array d to check whether any degree is larger than 6.
DFS and BFS have the same time complexity, but depth-first is better (faster?) in most cases at finding the first degree larger than 6, whereas breadth-first is better at excluding all degrees > 6 simultaneously.
After DFS or BFS I would then loop over the array containing the distances from person x, and return true if there is no entry > 6 and false when one is found.
With BFS, the largest degrees of separation would always be at the end of the array, which would maybe lead to a higher time complexity?
With DFS, the degrees of separation would be scattered throughout the array, but the chance of hitting a degree of separation higher than 6 early in the search is higher.
I don't know if it makes any difference to the time complexity whether I use DFS or BFS here.
Time complexity of BFS and DFS is exactly the same. Both methods visit all connected vertices of the graph, so in both cases you have O(V + E), where V is the number of vertices and E is the number of edges.
That being said, sometimes one algorithm can be preferred over the other precisely because the order of vertex visitation is different. For instance, if you were to evaluate a mathematical expression, DFS would be much more convenient.
In your case, BFS could be used to optimize graph traversal, because you can simply cut off the BFS at the required degree-of-separation level. All the people who have that degree of separation (or a bigger one) would be left unmarked as visited.
The same trick would be much more convoluted to implement with DFS, because, as you've astutely noticed, DFS first gets "to the bottom" of the graph and then goes back up level by level, recursively (or via a stack).
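A minimal sketch of that cut-off idea (my own, assuming adj maps each person to a list of acquaintances), which runs in O(|P| + |A|):

from collections import deque

def within_six_degrees(adj, x):
    # BFS from x; returns True only if every person is within distance 6 of x.
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if dist[u] == 6:
            continue                         # cut-off: do not expand further
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Anyone unreached is either disconnected or more than 6 steps away.
    return len(dist) == len(adj)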
I believe that you can use Dijkstra's algorithm.
It is a BFS-like approach that updates your path if a shorter path is found. Think of every edge as having a cost of 1, and suppose a person N has two friends, A and B.
Those friends have a common friend C, but if your algorithm first reaches C through A at a cost of 4 and marks it as visited, it can no longer consider the path through B that might have a cost of 3. Dijkstra's algorithm handles this check for you.
Dijkstra's solves this in O((|V| + |E|) log |V|).
See more at https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

Graph Between Steiner Tree and Complete Graph

Given a set points P in the plane, and given a threshold t, I'd like to compute a connected graph G to minimize the sum of the lengths of its edges, subject to the following constraints:
The vertices of G contain all the points in P.
For every pair of points u and v in P, their distance in G is no greater than t times their Euclidean distance.
When t=1, this problem is solved by constructing a complete graph on P. When t is infinite (or simply large enough), this problem is the Euclidean Steiner Tree Problem.
If there is already a name for this problem, I'm curious what it is. More than that, does anyone have any suggestions for how to design an algorithm for this? Since it contains the Euclidean Steiner Tree Problem as a special case, it can't be simpler, so I'm not looking for anything particularly time efficient. Thanks!

Time complexity of counting cycles and complete subgraphs of a given size

I am very new to graph theory and I am trying to understand the following. Given an undirected graph G with V vertices and E edges, what is the time complexity of counting all the complete subgraphs of size k, and what is the time complexity of counting all the cycles of length k? Basically, which one is faster? I am only considering cases where k is even. Are there any well-known references for this?
Counting all cycles of length k can, in general, be achieved in O(VE) time (V = number of vertices, E = number of edges), although this number can be greatly improved depending on k (Ref). Counting all complete subgraphs or cliques can be achieved in O(n^k k^2) in the general case, but again, this number can be improved upon depending on the structure of the graph and algorithm used (Ref).
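The O(n^k k^2) figure for the naive approach can be seen from a brute-force counter like the sketch below (my own illustration, assuming the graph is given as adjacency sets): there are C(n, k) = O(n^k) vertex subsets and O(k^2) pairs to check in each.

from itertools import combinations

def count_k_cliques(adj, k):
    # adj[u] is the set of neighbours of u.
    count = 0
    for subset in combinations(adj, k):                               # O(n^k) subsets
        if all(v in adj[u] for u, v in combinations(subset, 2)):      # O(k^2) pair checks
            count += 1
    return count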

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of these algorithms, I can calculate the running time of my implementation, which is easy given the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way to experimentally gauge the running time in the single-variable case is to insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, take data for many different input sizes, and plot it on a log-log plot, that is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching it. If the running time is like T(n) = n^{k log n}, then you should see a parabola. And if T is exponential in n, you should still see exponential growth.
You can only hope to get information about the highest-order term when you do this -- the low-order terms get filtered out, in the sense of having less and less impact as n gets larger.
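As a small sketch of that single-variable fit (illustrative only; the data here is synthetic): if T(n) ~ c * n^k, then log T = k log n + log c, so a straight-line fit to the log-log data recovers k as the slope.

import numpy as np

def estimate_exponent(ns, ts):
    # Fit log T = k*log n + log c; return the estimated exponent k (the slope).
    k, log_c = np.polyfit(np.log(ns), np.log(ts), 1)
    return k

# Synthetic "measurements" from a roughly cubic running time:
ns = np.array([100, 200, 400, 800, 1600])
ts = 2e-9 * ns**3 + 1e-6 * ns
print(estimate_exponent(ns, ts))   # close to 3 for large n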
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(m) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the third dimension is the measured running time), and you can check whether it looks like a plane (in log-log-log scale) or whether it is curved. Naturally, random variation in the measurements plays a role here as well.
In Matlab I typically calculate a least-squares fit to a two-variable function like this (it just concatenates different powers and combinations of x and y horizontally; .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach also works for higher dimensions but gets tedious to write out; maybe there is a more general way to achieve it, and this is just a workaround for my lack of knowledge.
I was going to write my own explanation but it wouldn't be any better than this.