Estimating the Run Time for the "Traveling Salesman Problem" - optimization

The "Traveling Salesman Problem" is a problem where a person has to travel between "n" cities - but choose the itinerary such that:
Each city is visited only once
The total distance traveled is minimized
I have heard that if a modern computer were the solve this problem using "brute force" (i.e. an exact solution) - if there are more than 15 cities, the time taken by the computer will exceed a hundred years!
I am interested in understanding "how do we estimate the amount of time it will take for a computer to solve the Traveling Salesman Problem (using "brute force") as the number of cities increase". For instance, from the following reference (https://www.sciencedirect.com/topics/earth-and-planetary-sciences/traveling-salesman-problem):
My Question: Is there some formula we can use to estimate the number of time it will take a computer to solve Traveling Salesman using "brute force"? For example:
N cities = N! paths
Each of these N! paths will require "N" calculations
Thus, N * N calculations would be required for the computer to check all paths and then be certain that the shortest path has been found : If we know the time each calculation takes, perhaps we could estimate the total run time as "time per calculation * N*N! "
But I am not sure if this factors in the time to "store and compare" calculations.
Can someone please explain this?

I have heard that if a modern computer were the solve this problem using "brute force" (i.e. an exact solution) - if there are more than 15 cities, the time taken by the computer will exceed a hundred years!
This is not completely true. While the naive brute-force algorithm runs with a n! complexity. A much better algorithm using dynamic programming runs in O(n^2 2^n). Just to give you an idea, with n=25, n! ≃ 2.4e18 while n^2 2^n ≃ 1e12. The former is too huge to be practicable while the second could be OK although it should take a pretty long time on a PC (one should keep in mind that both algorithm complexities contain an hidden constant variable playing an important role to compute a realistic execution time). I used an optimized dynamic programming solution based on the Held–Karp algorithm to compute the TSP of 20 cities on my machine with a relatively reasonable time (ie. no more than few minutes of computation).
Note that in practice heuristics are used to speed up the computation drastically often at the expense of a sub-optimal solution. Some algorithm can provide a good result in a very short time compared to the previous exact algorithms (polynomial algorithms with a relatively small exponent) with a fixed bound on the quality of the result (for example the distance found cannot be bigger than 2 times the optimal solution). In the end, heuristics can often found very good results in a reasonable time. One simple heuristic is to avoid crossing segments assuming an Euclidean distance is used (AFAIK a solution with crossing segments is always sub-optimal).
My Question: Is there some formula we can use to estimate the number of time it will take a computer to solve Travelling Salesman using "brute force"?
Since the naive algorithm is compute bound and quite simple, you can do such an approximation based on the running-time complexity. But to get a relatively precise approximation of the execution time, you need a calibration since not all processors nor implementations behave the same way. You can assume that the running time is C n! and find the value of C experimentally by measuring the computation time taken by a practical brute-force implementation. Another approach is to theoretically find the value of C based on low-level architectural properties (eg. frequency, number of core used, etc.) of the target processor. The former is much more precise assuming the benchmark is properly done and the number of points is big enough. Moreover, the second method requires a pretty good understanding of the way modern processors work.
Numerically, assuming a running time t ≃ C n!, we can say that ln t ≃ ln(C n!) ≃ ln(C) + ln(n!). Based on the Stirling's approximation, we can say that ln t ≃ ln C + n ln n + O(ln n), so ln C ≃ ln t - n ln n - O(ln n). Thus, ln C ≃ ln t - n ln n - O(ln n) and finally, C ≃ exp(ln t - n ln n) (with an O(n) approximation). That being said, the Stirling's approximation may not be precise enough. Using a binary search to numerically compute the inverse gamma function (which is a generalization of the factorial) should give a much better approximation for C.
Each of these N! paths will require "N" calculations
Well, a slightly optimized brute-force algorithm do not need perform N calculation as the partial path length can be precomputed. The last loops just need to read the precomputed sums from a small array that should be stored in the L1 cache (so it take only no more than few cycle of latency to read/store).

Related

Time complexity with respect to input

This is a constant doubt I'm having. For example, I have a 2-d array of size n^2 (n being the number of rows and columns). Suppose I want to print all the elements of the 2-d array. When I calculate the time complexity of the algorithm with respect to n it's O(n^2 ). But if I calculated the time with respect to the input size (n^2 ) it's linear. Are both these calculations correct? If so, why do people only use O(n^2 ) everywhere regarding 2-d arrays?
That is not how time complexity works. You cannot do "simple math" like that.
A two-dimensional square array of extent x has n = x*x elements. Printing these n elements takes n operations (or n/m if you print m items at a time), which is O(N). The necessary work increases linearly with the number of elements (which is, incidentially, quadratic in respect of the array extent -- but if you arranged the same number of items in a 4-dimensional array, would it be any different? Obviously, no. That doesn't magically make it O(N^4)).
What you use time complexity for is not stuff like that anyway. What you want time complexity to tell you is an approximate idea of how some particular algorithm may change its behavior if you grow the number of inputs beyond some limit.
So, what you want to know is, if you do XYZ on one million items or on two million items, will it take approximately twice as long, or will it take approximately sixteen times as long, for example.
Time complexity analysis is irrespective of "small details" such as how much time an actual operations takes. Which tends to make the whole thing more and more academic and practically useless in modern architectures because constant factors (such as memory latency or bus latency, cache misses, faults, access times, etc.) play an ever-increasing role as they stay mostly the same over decades while the actual cost-per-step (instruction throughput, ALU power, whatever) goes down steadily with every new computer generation.
In practice, it happens quite often that the dumb, linear, brute force approach is faster than a "better" approach with better time complexity simply because the constant factor dominates everything.

Time complexity of two algorithms running together?

Imagine T1(n) and T2(n) are running times of P1 and P2 programs, and
T1(n) ∈ O(f(n))
T2(n) ∈ O(g(n))
What is the amount of T1(n)+T2(n), when P1 is running along side P2?
The Answer is O(max{f(n), g(n)}) but why?
When we think about Big-O notation, we generally think about what the algorithm does as the size of the input n gets really big. A lot of times, we can fall back on some sort of intuition with math. Consider two functions, one that is O(n^2) and one that is O(n). As n gets really large, both algorithms increases without bound. The difference is, the O(n^2) algorithm grows much, MUCH faster than O(n). So much, in fact, that if you combine the algorithms into something that would be O(n^2+n), the factor of n by itself is so small that it can be ignored, and the algorithm is still in the class O(n^2).
That's why when you add together two algorithms, the combined running time is in O(max{f(n), g(n)}). There's always one that 'dominates' the runtime, making the affect of the other negligible.
The Answer is O(max{f(n), g(n)})
This is only correct if the programms run independently of each other. Anyhow, let's assume, this is the case.
In order to answer the why, we need to take a closer look at what the BIG-O-notation represents. Contrary to the way you stated it, it does not represent time but an upperbound on the complexity.
So while running both programms might take more time, the upperbound on the complexity won't increase.
Lets considder an example: P_1 calculates the the product of all pairs of n numbers in a vector, it is implemented using nested loops, and therefore has a complexity of O(n*n). P_2 just prints the numbers in a single loop and therefore has a complexity of O(n).
Now if we run both programms at the same time, the nested loops of P_1 are the most 'complex' part, leaving the combination with a complexity of O(n*n)

Time Complexity of an algorithm in Travelling Salesman?

I'm currently learning about the TSP and want to combine two simple heuristics in one algorithm. It works by using the nearest neighbour algorithm to create a tour and then improving it by using a 2 opt swap for every combination. I believe the number of steps for the 2 opt technique is n(n-1), so O(n) = n^2. However, I don't know how to calculate the complexity for the nearest neighbour algorithm. I likely the think it will result to O(n^2) but I am not certain about the process to get to this.

Time complexity of a genetic algorithm for bin packing

I am trying to explore genetic algorithms (GA) for the bin packing problem, and compare it to classical Any-Fit algorithms. However the time complexity for GA is never mentioned in any of the scholarly articles. Is this because the time complexity is very high? and that the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?
Assuming that termination condition is number of iterations and it's constant then in general it would look something like that:
O(p * Cp * O(Crossover) * Mp * O(Mutation) * O(Fitness))
p - population size
Cp - crossover probability
Mp - mutation probability
As you can see it not only depends on parameters like eg. population size but also on implementation of crossover, mutation operations and fitness function implementation. In practice there would be more parameters like for example chromosome size etc.
You don't see much about time complexity in publications because researchers most of the time compare GA using convergence time.
Edit Convergence Time
Every GA has some kind of a termination condition and usually it's convergence criteria. Let's assume that we want to find the minimum of a mathematical function so our convergence criteria will be the function's value. In short we reach convergence during optimization when it's no longer worth it to continue optimization because our best individual doesn't get better significantly. Take a look at this chart:
You can see that after around 10000 iterations fitness doesn't improve much and the line is getting flat. Best case scenario reaches convergence at around 9500 iterations, after that point we don't observe any improvement or it's insignificantly small. Assuming that each line shows different GA then Best case has the best convergence time becuase it reaches convergence criteria first.

does every algorithm have Big Omega?

does every algorithm have Big Omega?
Is it possible for algorithms to have both Big O and Big Omega (but not equal to each other- not Big Theta) ?
For instance Quicksort's Big O - O(n log n) But does it have Big Omega? If it does, how do i calculate it?
First, it is of paramount importance that one not confuse the bound with the case. A bound - like Big-Oh, Big-Omega, Big-Theta, etc. - says something about a rate of growth. A case says something about the kinds of input you're currently considering being processed by your algorithm.
Let's consider a very simple example to illustrate the distinction above. Consider the canonical "linear search" algorithm:
LinearSearch(list[1...n], target)
1. for i := 1 to n do
2. if list[i] = target then return i
3. return -1
There are three broad kinds of cases one might consider: best, worst, and average cases for inputs of size n. In the best case, what you're looking for is the first element in the list (really, within any fixed number of the start of the list). In such cases, it will take no more than some constant amount of time to find the element and return from the function. Therefore, the Big-Oh and Big-Omega happen to be the same for the best case: O(1) and Omega(1). When both O and Omega apply, we also say Theta, so this is Theta(1) as well.
In the worst case, the element is not in the list, and the algorithm must go through all n entries. Since f(n) = n happens to be a function that is bound from above and from below by the same class of functions (linear ones), this is Theta(n).
Average case analysis is usually a bit trickier. We need to define a probability space for viable inputs of length n. One might say that all valid inputs (where integers can be represented using 32 bits in unsigned mode, for instance) are equally probable. From that, one could work out the average performance of the algorithm as follows:
Find the probability that target is not represented in the list. Multiply by n.
Given that target is in the list at least once, find the probability that it appears at position k for each 1 <= k <= n. Multiply each P(k) by k.
Add up all of the above to get a function in terms of n.
Notice that in step 1 above, if the probability is non-zero, we will definitely get at least a linear function (exercise: we can never get more than a linear function). However, if the probability in step 1 is indeed zero, then the assignment of probabilities in step 2 makes all the difference in determining the complexity: you can have best-case behavior for some assignments, worst-case for others, and possibly end up with behavior that isn't the same as best (constant) or worst (linear).
Sometimes, we might speak loosely of a "general" or "universal" case, which considers all kinds of input (not just the best or the worst), but that doesn't give any particular weighting to inputs and doesn't take averages. In other words, you consider the performance of the algorithm in terms of an upper-bound on the worst-case, and a lower-bound on the best-case. This seems to be what you're doing.
Phew. Now, back to your question.
Are there functions which have different O and Omega bounds? Definitely. Consider the following function:
f(n) = 1 if n is odd, n if n is even.
The best case is "n is odd", in which case f is Theta(1); the worst case is "n is even", in which case f is Theta(n); and if we assume for the average case that we're talking about 32-bit unsigned integers, then f is Theta(n) in the average case, as well. However, if we talk about the "universal" case, then f is O(n) and Omega(1), and not Theta of anything. An algorithm whose runtime behaves according to f might be the following:
Strange(list[1...n], target)
1. if n is odd then return target
2. else return LinearSearch(list, target)
Now, a more interesting question might be whether there are algorithms for which some case (besides the "universal" case) cannot be assigned some valid Theta bound. This is interesting, but not overly so. The reason is that you, during your analysis, are allowed to choose the cases that constitutes best- and worst-case behavior. If your first choice for the case turns out not to have a Theta bound, you can simply exclude the inputs that are "abnormal" for your purposes. The case and the bound aren't completely independent, in that sense: you can often choose a case such that it has "good" bounds.
But can you always do it?
I don't know, but that's an interesting question.
Does every algorithm have a Big Omega?
Yes. Big Omega is a lower bound. Any algorithm can be said to take at least constant time, so any algorithm is Ω(1).
Does every algorithm have a Big O?
No. Big O is a upper bound. Algorithms that don't (reliably) terminate don't have a Big O.
An algorithm has an upper bound if we can say that, in the absolute worst case, the algorithm will not take longer than this. I'm pretty sure O(∞) is not valid notation.
When will the Big O and Big Omega of an algorithm be equal?
There is actually a special notation for when they can be equal: Big Theta (Θ).
They will be equal if the algorithm scales perfectly with the size of the input (meaning there aren't input sizes where the algorithm is suddenly a lot more efficient).
This is assuming we take Big O to be the smallest possible upper bound and Big Omega to be the largest possible lower bound. This is not actually required from the definition, but they're commonly informally treated as such. If you drop this assumption, you can find a Big O and Big Omega that aren't equal for any algorithm.
Brute force prime number checking (where we just loop through all smaller numbers and try to divide them into the target number) is perhaps a good example of when the smallest upper bound and largest lower bound are not equal.
Assume you have some number n. Let's also for the time being ignore the fact that bigger numbers take longer to divide (a similar argument holds when we take this into account, although the actual complexities would be different). And I'm also calculating the complexity based on the number itself instead of the size of the number (which can be the number of bits, and could change the analysis here quite a bit).
If n is divisible by 2 (or some other small prime), we can very quickly check whether it's prime with 1 division (or a constant number of divisions). So the largest lower bound would be Ω(1).
Now if n is prime, we'll need to try to divide n by each of the numbers up to sqrt(n) (I'll leave the reason we don't need to go higher than this as an exercise). This would take O(sqrt(n)), which would also then be our smallest upper bound.
So the algorithm would be Ω(1) and O(sqrt(n)).
Exact complexity also may be hard to calculate for some particularly complex algorithms. In such cases it may be much easier and acceptable to simply calculate some reasonably close lower and upper bounds and leave it at that. I don't however have an example on hand for this.
How does this relate to best case and worst case?
Do not confuse upper and lower bounds for best and worst case. This is a common mistake, and a bit confusing, but they're not the same. This is a whole other topic, but as a brief explanation:
The best and worst (and average) cases can be calculated for every single input size. The upper and lower bounds can then be used for each of those 3 cases (separately). You can think of each of those cases as a line on a graph with input size on the x-axis and time on the y-axis and then, for each of those lines, the upper and lower bounds are lines which need to be strictly above or below that line as the input size tends to infinity (this isn't 100% accurate, but it's a good basic idea).
Quick-sort has a worst-case of Θ(n2) (when we pick the worst possible pivot at every step) and a best-case of Θ(n log n) (when we pick good pivots). Note the use of Big Theta, meaning each of those are both lower and upper bounds.
Let's compare quick-sort with the above prime checking algorithm:
Say you have a given number n, and n is 53. Since it's prime, it will (always) take around sqrt(53) steps to determine whether it's prime. So the best and worst cases are all the same.
Say you want to sort some array of size n, and n is 53. Now those 53 elements can be arranged such that quick-sort ends up picking really bad pivots and run in around 532 steps (the worst case) or really good pivots and run in around 53 log 53 steps (the best case). So the best and worst cases are different.
Now take n as 54 for each of the above:
For prime checking, it will only take around 1 step to determine that 54 is prime. The best and worst cases are the same again, but they're different from what they were for 53.
For quick-sort, you'll again have a worst case of around 542 steps and a best case of around 54 log 54 steps.
So for quick-sort the worst case always takes around n2 steps and the best case always takes around n log n steps. So the lower and upper (or "tight") bound of the worst case is Θ(n2) and the tight bound of the best case is Θ(n log n).
For our prime checking, sometimes the worst case takes around sqrt(n) steps and sometimes it takes around 1 step. So the lower bound for the worse case would be Ω(1) and upper bound would be O(sqrt(n)). It would be the same for the best case.
Note that above I simply said "the algorithm would be Ω(1) and O(sqrt(n))". This is slightly ambiguous, as it's not clear whether the algorithm always takes the same amount of time for some input size, or the statement is referring to one of the best, average or worst case.
How do I calculate this?
It's hard to give general advice for this since proofs of bounds are greatly dependent on the algorithm. You'd need to analyse the algorithm similar to what I did above to figure out the worst and best cases.
Big O and Big Omega it can be calculated for every algorithm as you can see in Big-oh vs big-theta