Time complexity of a genetic algorithm for bin packing - time-complexity

I am trying to explore genetic algorithms (GA) for the bin packing problem and compare them to classical Any-Fit algorithms. However, the time complexity of a GA is never mentioned in any of the scholarly articles I have found. Is this because the time complexity is very high, and the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?

Assuming that the termination condition is a fixed number of iterations, in general it would look something like this:
O(g * p * (Cp * O(Crossover) + Mp * O(Mutation) + O(Fitness)))
g - number of generations (iterations)
p - population size
Cp - crossover probability
Mp - mutation probability
As you can see, it depends not only on parameters such as the population size, but also on the implementations of the crossover and mutation operations and of the fitness function. In practice there would be more parameters, for example the chromosome size.
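For illustration, here is a minimal GA skeleton (a sketch, not a definitive implementation; crossover, mutate and fitness are hypothetical placeholder functions, and individuals are assumed to be lists) showing where each factor in the bound above comes from:
import random

def run_ga(population, generations, cp, mp, crossover, mutate, fitness):
    for _ in range(generations):                 # g iterations
        # p fitness evaluations per generation
        ranked = sorted(population, key=fitness)
        parents = ranked[:len(population) // 2]
        children = []
        while len(children) < len(population):
            a, b = random.sample(parents, 2)
            # crossover happens with probability Cp (~Cp * p calls)
            child = crossover(a, b) if random.random() < cp else list(a)
            # mutation happens with probability Mp (~Mp * p calls)
            if random.random() < mp:
                child = mutate(child)
            children.append(child)
        population = children
    return min(population, key=fitness)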
You don't see much about time complexity in publications because, most of the time, researchers compare GAs using convergence time.
Edit: Convergence Time
Every GA has some kind of termination condition, and usually it's a convergence criterion. Let's assume we want to find the minimum of a mathematical function, so our convergence criterion will be the function's value. In short, we reach convergence during optimization when it's no longer worth continuing, because our best individual no longer improves significantly. Take a look at this chart:
You can see that after around 10000 iterations the fitness doesn't improve much and the line flattens out. The best case scenario reaches convergence at around 9500 iterations; after that point we don't observe any improvement, or it's insignificantly small. Assuming each line shows a different GA, the Best case has the best convergence time because it reaches the convergence criterion first.
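Such a criterion is straightforward to implement: stop when the best fitness has improved by less than some epsilon over the last few hundred generations. A hypothetical sketch (assuming minimization and a per-generation history of best fitness values):
def has_converged(best_history, window=500, eps=1e-6):
    # best_history holds the best (lowest) fitness seen at each generation
    if len(best_history) < window:
        return False
    return best_history[-window] - best_history[-1] < eps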

Related

Estimating the Run Time for the "Traveling Salesman Problem"

The "Traveling Salesman Problem" is a problem where a person has to travel between "n" cities - but choose the itinerary such that:
Each city is visited only once
The total distance traveled is minimized
I have heard that if a modern computer were to solve this problem using "brute force" (i.e. an exact solution), and there are more than 15 cities, the time taken by the computer will exceed a hundred years!
I am interested in understanding how we estimate the amount of time it will take a computer to solve the Traveling Salesman Problem (using "brute force") as the number of cities increases. For instance, from the following reference (https://www.sciencedirect.com/topics/earth-and-planetary-sciences/traveling-salesman-problem):
My Question: Is there some formula we can use to estimate the amount of time it will take a computer to solve the Traveling Salesman Problem using "brute force"? For example:
N cities = N! paths
Each of these N! paths will require "N" calculations
Thus, N * N! calculations would be required for the computer to check all paths and be certain that the shortest path has been found. If we know the time each calculation takes, perhaps we could estimate the total run time as "time per calculation * N * N!"
But I am not sure if this factors in the time to "store and compare" calculations.
Can someone please explain this?
I have heard that if a modern computer were to solve this problem using "brute force" (i.e. an exact solution), and there are more than 15 cities, the time taken by the computer will exceed a hundred years!
This is not completely true. While the naive brute-force algorithm runs with n! complexity, a much better algorithm using dynamic programming runs in O(n^2 2^n). Just to give you an idea, with n=25, n! ≃ 1.6e25 while n^2 2^n ≃ 2.1e10. The former is far too huge to be practicable, while the second could be OK, although it would take a pretty long time on a PC (keep in mind that both complexities hide a constant factor that plays an important role in computing a realistic execution time). I used an optimized dynamic programming solution based on the Held–Karp algorithm to compute the TSP of 20 cities on my machine in a relatively reasonable time (i.e. no more than a few minutes of computation).
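For reference, here is a minimal sketch of the Held–Karp dynamic program, O(n^2 * 2^n) time and O(n * 2^n) memory (dist is assumed to be an n x n distance matrix):
from itertools import combinations

def held_karp(dist):
    n = len(dist)
    # C[(S, j)]: length of the shortest path from city 0 through all cities
    # in frozenset S, ending at city j (city 0 is never in S)
    C = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j] for k in S - {j})
    full = frozenset(range(1, n))
    # close the tour by returning to city 0
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
For example, held_karp([[0, 2, 9], [2, 0, 6], [9, 6, 0]]) returns 17, the length of the tour 0 → 1 → 2 → 0.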
Note that in practice heuristics are used to speed up the computation drastically, often at the expense of a sub-optimal solution. Some algorithms can provide a good result in a very short time compared to the exact algorithms above (polynomial algorithms with a relatively small exponent), with a fixed bound on the quality of the result (for example, the distance found cannot be more than 2 times the optimal one). In the end, heuristics can often find very good results in a reasonable time. One simple heuristic is to avoid crossing segments, assuming a Euclidean distance is used (AFAIK a solution with crossing segments is always sub-optimal).
My Question: Is there some formula we can use to estimate the amount of time it will take a computer to solve the Travelling Salesman Problem using "brute force"?
Since the naive algorithm is compute-bound and quite simple, you can do such an approximation based on the running-time complexity. But to get a relatively precise approximation of the execution time, you need a calibration, since not all processors or implementations behave the same way. You can assume that the running time is C n! and find the value of C experimentally by measuring the computation time taken by a practical brute-force implementation. Another approach is to derive the value of C theoretically from low-level architectural properties of the target processor (e.g. frequency, number of cores used, etc.). The former is much more precise, assuming the benchmark is properly done and the number of data points is big enough; the latter requires a pretty good understanding of the way modern processors work.
Numerically, assuming a running time t ≃ C n!, we have ln t ≃ ln C + ln(n!). By Stirling's approximation, ln(n!) = n ln n − n + O(ln n), so ln C ≃ ln t − n ln n + n + O(ln n), and finally C ≃ exp(ln t − n ln n + n) (up to the O(ln n) term). That being said, Stirling's approximation may not be precise enough; using a binary search to numerically invert the gamma function (which is a generalization of the factorial) should give a much better approximation for C.
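In fact, for a numerical estimate no approximation is needed at all: n! can be computed exactly, so you can time one brute-force run at a moderate n, solve for C, and extrapolate. A hypothetical sketch (brute_force_tsp is a placeholder for your own implementation):
import math
import time

def estimate_brute_force_time(brute_force_tsp, dist_small, n_target):
    # calibrate: time one brute-force run on a small instance of size n
    n = len(dist_small)
    start = time.perf_counter()
    brute_force_tsp(dist_small)
    elapsed = time.perf_counter() - start
    # t ≈ C * n!  =>  C ≈ t / n!
    C = elapsed / math.factorial(n)
    # extrapolate to the target instance size
    return C * math.factorial(n_target)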
Each of these N! paths will require "N" calculations
Well, a slightly optimized brute-force algorithm does not need to perform N calculations per path, as partial path lengths can be precomputed. The last loops just need to read the precomputed sums from a small array that should fit in the L1 cache (so a read/store takes no more than a few cycles of latency).

Differential evolution algorithm different results for different runs

As the title says, I am using the Differential Evolution algorithm as implemented in the Python mystic package for a global optimisation problem for O(10) parameters, with bounds and constraints.
I am using the simple interface diffev
result = my.diffev(func, x0, npop=10*len(list(bnds)), bounds=bnds,
                   ftol=1e-11, gtol=gtol, maxiter=1024**3, maxfun=1024**3,
                   constraints=constraint_eq, penalty=penalty,
                   full_output=True, itermon=mon, scale=scale)
I was experimenting by running the SAME optimisation several times: given a scaling for the differential evolution algorithm, I run the optimisation problem 10 times.
The result? I get different answers for almost all the runs!
I experimented with scalings of 0.7, 0.75, 0.8, and 0.85 (as suggested on the mystic page), all showing roughly the same bad behaviour.
Here is an example: on the x-axis there are the parameters, on the y-axis their values. The labels represent the iterations. Ideally you want to see only one line.
I run with gtol = 3500, so the runs should be quite long. I am using npop = 10*(number of parameters), ftol = 1e-11, and the other important arguments of the diffev algorithm are the default ones.
Does anyone have some suggestions for tuning differential evolution with mystic? Is there a way to avoid this variance in the results? I know it is a stochastic algorithm, but I did not expect it to give different results when running with a gtol of 3500. My understanding was also that this algorithm does not get stuck in local minima, but I might be wrong.
p.s.
This is not relevant for the question, but just to give some context of why this is important for me.
What I need to do for my work is to minimise a function, under the conditions above, for several input data: I optimize for each data configuration over the O(10) parameters, then the configuration with some parameters that gives the overall minimum is the 'chosen' one.
Now, if the optimiser is not stable, it might give me the wrong data configuration by chance as the optimal one, as I run over hundreds of them.
I'm the mystic author. As you state, differential evolution (DE) is a stochastic algorithm. Essentially, DE uses random mutations on the current solution vector to come up with new candidate solutions. So, you can expect to get different results for different runs in many cases, especially when the function is nonlinear.
Theoretically, if you let it run forever, it will find the global minimum. However, most of us don't want to wait that long. So, there are termination conditions like gtol (change over generations), which sets the cutoff for the number of iterations without improvement. There are also solver parameters that affect how the mutation is generated, like cross, scale, and strategy. Essentially, if you get different results for different runs, all that means is that you haven't tuned the optimizer for the particular cost function yet, and you should play with the settings.
Of importance is the balance between npop and gtol, and that's where I often go first. You want to increase the population of candidates, generally, until it saturates (i.e. doesn't have an effect) or becomes too slow.
If you have other information you can constrain the problem with, that often helps (i.e. use constraints or penalty to restrict your search space).
I also use mystic's visualization tools to try to get an understanding of what the response surface looks like (i.e. visualization and interpolation of log data).
Short answer is, any solver that includes randomness in the algorithm will often need to be tuned before you get consistent results.
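While tuning, a pragmatic workaround (my suggestion, not part of the answer above) is to restart the solver several times and keep the best run. A sketch reusing the exact call from the question (func, x0, bnds, etc. as defined there), and assuming, as with scipy-style solvers, that the second element of the full output is the final cost:
best_result = None
for run in range(10):
    result = my.diffev(func, x0, npop=10*len(list(bnds)), bounds=bnds,
                       ftol=1e-11, gtol=gtol, maxiter=1024**3, maxfun=1024**3,
                       constraints=constraint_eq, penalty=penalty,
                       full_output=True, itermon=mon, scale=scale)
    # keep the run with the lowest final cost
    if best_result is None or result[1] < best_result[1]:
        best_result = result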

Time Complexity of an algorithm in Travelling Salesman?

I'm currently learning about the TSP and want to combine two simple heuristics in one algorithm. It works by using the nearest neighbour algorithm to create a tour and then improving it with a 2-opt swap for every combination. I believe the number of steps for the 2-opt technique is n(n-1), so it is O(n^2). However, I don't know how to calculate the complexity of the nearest neighbour algorithm. I think it will likely also be O(n^2), but I am not certain about the process to get to this.
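For intuition, here is a hypothetical sketch of the combination described above (dist is assumed to be an n x n distance matrix). Nearest neighbour is O(n^2) because each of the n steps scans the remaining unvisited cities; a single full 2-opt pass over all ~n(n-1)/2 edge pairs is also O(n^2) with precomputed distances:
def nearest_neighbour(dist):
    n = len(dist)
    tour, unvisited = [0], set(range(1, n))
    while unvisited:                       # n steps...
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])  # ...each scanning O(n) cities
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt_pass(tour, dist):
    n = len(tour)
    for i in range(n - 1):
        # edges (tour[i], tour[i+1]) and (tour[j], tour[j+1]);
        # skip the pair sharing city tour[0] when i == 0
        last_j = n - 1 if i == 0 else n
        for j in range(i + 2, last_j):
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])  # uncross the edges
    return tour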

Standard Errors for Differential Evolution

Is it possible to calculate standard errors for Differential Evolution?
From the Wikipedia entry:
http://en.wikipedia.org/wiki/Differential_evolution
It's not derivative-based (indeed that is one of its strengths), but how then do you calculate the standard errors?
I would have thought some kind of bootstrapping strategy might be applicable, but I can't seem to find any sources that apply bootstrapping to DE.
Baz
Concerning the standard errors, differential evolution is just like any other evolutionary algorithm.
Using a bootstrapping strategy seems like a good idea: the usual formulas assume a normal (Gaussian) distribution for the underlying data. That's almost never true for evolutionary computation (exponential distributions being far more common, probably followed by bimodal distributions).
The simplest bootstrap method involves taking the original data set of N numbers and sampling from it to form a new sample (a resample) that is also of size N. The resample is taken from the original using sampling with replacement. This process is repeated a large number of times (typically 1000 or 10000 times) and for each of these bootstrap samples we compute its mean / median (each of these are called bootstrap estimates).
The standard deviation (SD) of the means is the bootstrapped standard error (SE) of the mean and the SD of the medians is the bootstrapped SE of the median (the 2.5th and 97.5th centiles of the means are the bootstrapped 95% confidence limits for the mean).
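As a concrete illustration, a minimal bootstrap-SE sketch in Python (data would be, for example, the best-of-run fitnesses from N independent DE runs):
import random
import statistics

def bootstrap_se(data, estimator=statistics.median, n_resamples=10000):
    # resample with replacement, apply the estimator to every resample,
    # then take the SD of the bootstrap estimates
    estimates = [estimator(random.choices(data, k=len(data)))
                 for _ in range(n_resamples)]
    return statistics.stdev(estimates)
Sorting the estimates and taking the 2.5th and 97.5th percentiles likewise gives the bootstrapped 95% confidence limits mentioned above.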
Warnings:
the word population is used with different meanings in different contexts (bootstrapping vs evolutionary algorithm)
in any GA or GP, the average of the population tells you almost nothing of interest. Use the mean/median of the best-of-run
the average of a set that is not normally distributed produces a value that behaves non-intuitively, especially if the probability distribution is skewed: large values in the "tail" can dominate, and the average tends to reflect the typical value of the "worst" data rather than the typical value of the data in general. In this case the median is better
Some interesting links are:
A short guide to using statistics in Evolutionary Computation
An Introduction to Statistics for EC Experimental Analysis

How to design acceptance probability function for simulated annealing with multiple distinct costs?

I am using simulated annealing to solve an NP-complete resource scheduling problem. For each candidate ordering of the tasks I compute several different costs (or energy values). Some examples are (though the specifics are probably irrelevant to the question):
global_finish_time: The total number of days that the schedule spans.
split_cost: The number of days by which each task is delayed due to interruptions by other tasks (this is meant to discourage interruption of a task once it has started).
deadline_cost: The sum of the squared number of days by which each missed deadline is overdue.
The traditional acceptance probability function looks like this (in Python):
import math

def acceptance_probability(old_cost, new_cost, temperature):
    if new_cost < old_cost:
        return 1.0
    else:
        return math.exp((old_cost - new_cost) / temperature)
So far I have combined my first two costs into one by simply adding them, so that I can feed the result into acceptance_probability. But what I would really want is for deadline_cost to always take precedence over global_finish_time, and for global_finish_time to take precedence over split_cost.
So my question to Stack Overflow is: how can I design an acceptance probability function that takes multiple energies into account but always considers the first energy to be more important than the second energy, and so on? In other words, I would like to pass in old_cost and new_cost as tuples of several costs and return a sensible value.
Edit: After a few days of experimenting with the proposed solutions I have concluded that the only way that works well enough for me is Mike Dunlavey's suggestion, even though this creates many other difficulties with cost components that have different units. I am practically forced to compare apples with oranges.
So, I put some effort into "normalizing" the values. First, deadline_cost is a sum of squares, so it grows quadratically while the other components grow linearly. To address this I use the square root to get a similar growth rate. Second, I developed a function that computes a linear combination of the costs, but auto-adjusts the coefficients according to the highest cost components seen so far.
For example, if the tuple of highest costs is (A, B, C) and the input cost vector is (x, y, z), the linear combination is BCx + Cy + z. That way, no matter how high z gets it will never be more important than an x value of 1.
This creates "jaggies" in the cost function as new maximum costs are discovered. For example, if C goes up then BCx and Cy will both be higher for a given (x, y, z) input and so will differences between costs. A higher cost difference means that the acceptance probability will drop, as if the temperature was suddenly lowered an extra step. In practice though this is not a problem because the maximum costs are updated only a few times in the beginning and do not change later. I believe this could even be theoretically proven to converge to a correct result since we know that the cost will converge toward a lower value.
One thing that still has me somewhat confused is what happens when the maximum costs are 1.0 and lower, say 0.5. With a maximum vector of (0.5, 0.5, 0.5) this would give the linear combination 0.5*0.5*x + 0.5*y + z, i.e. the order of precedence is suddenly reversed. I suppose the best way to deal with it is to use the maximum vector to scale all values to given ranges, so that the coefficients can always be the same (say, 100x + 10y + z). But I haven't tried that yet.
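For concreteness, here is a hypothetical sketch of the auto-adjusting combination described above, with the maxima floored at 1.0 (which also avoids the precedence reversal for maxima below 1):
class CostCombiner:
    def __init__(self, n_costs):
        # running maxima (A, B, C, ...), floored at 1.0 so that small
        # maxima cannot reverse the order of precedence
        self.maxima = [1.0] * n_costs

    def combine(self, costs):
        self.maxima = [max(m, c) for m, c in zip(self.maxima, costs)]
        # the coefficient of each cost is the product of the maxima of all
        # lower-priority costs: (x, y, z) -> B*C*x + C*y + z
        total, coeff = 0.0, 1.0
        for cost, m in zip(reversed(costs), reversed(self.maxima)):
            total += coeff * cost
            coeff *= m
        return total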
mbeckish is right.
Could you make a linear combination of the different energies, and adjust the coefficients?
Possibly log-transforming them in and out?
I've done some MCMC using Metropolis-Hastings. In that case I define the (non-normalized) log-likelihood of a particular state (given its priors), and I find that to be a way to clarify my thinking about what I want.
I would take a hint from multi-objective evolutionary algorithm (MOEA) and have it transition if all of the objectives simultaneously pass with the acceptance_probability function you gave. This will have the effect of exploring the Pareto front much like the standard simulated annealing explores plateaus of same-energy solutions.
However, this does give up on the idea of having the first one take priority.
You will probably have to tweak your parameters, such as giving it a higher initial temperature.
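A minimal sketch of that idea (my illustration, reusing the acceptance_probability function from the question on each objective separately):
import random

def accept_all_objectives(old_costs, new_costs, temperature):
    # transition only if every objective passes its own Metropolis test
    return all(random.random() < acceptance_probability(o, n, temperature)
               for o, n in zip(old_costs, new_costs))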
I would consider something along the lines of:
def acceptance_probability(old, new, temperature):
    # old and new are (deadline_cost, global_finish_time, split_cost) tuples
    if new[0] > old[0]:
        return math.exp((old[0] - new[0]) / temperature)
    elif new[1] > old[1]:
        return math.exp((old[1] - new[1]) / temperature)
    elif new[2] > old[2]:
        return math.exp((old[2] - new[2]) / temperature)
    else:
        return 1.0
Of course each of the three places you calculate the probability could use a different function.
It depends on what you mean by "takes precedence".
For example, what if the deadline_cost goes down by 0.001, but the global_finish_time cost goes up by 10000? Do you return 1.0, because the deadline_cost decreased, and that takes precedence over anything else?
This seems like a judgment call that only you can make, unless you can provide enough background information on the project for others to suggest their own informed judgment call.