Optimizing a genetic algorithm? - optimization

I've been playing with parallel processing of genetic algorithms to improve performance, but I was wondering: what are some other commonly used techniques for optimizing a genetic algorithm?

Since the same fitness values are frequently recalculated (population diversity decreases as the algorithm runs, so identical chromosomes keep reappearing), a good strategy to improve the performance of a GA is to reduce the time needed to calculate the fitness.
Details depend on implementation, but previously calculated fitness values can often be efficiently saved with a hash table. This kind of optimization can drop computation time significantly (e.g. "Improving Genetic Algorithms Performance by Hashing Fitness Values" by Richard J. Povinelli and Xin Feng reports that the application of hashing to a GA can improve performance by over 50% for complex real-world problems).
A key point is collision management: you can simply overwrite the existing element of the hash table or adopt some probing scheme (e.g. linear probing).
In the latter case, as collisions mount, the efficiency of the hash table degrades to that of a linear search. When the cumulative number of collisions exceeds the size of the hash table, a rehash should be performed: you have to create a larger hash table and copy the elements from the smaller hash table to the larger one.
The copy step can be omitted: diversity decreases as the GA runs, so many of the discarded entries would never be used again, and the fitness of the most frequently occurring chromosomes will quickly be recalculated (the new hash table fills up again with the most used keys).
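As a minimal sketch of that idea (my own illustration, not from the cited paper), a fitness cache can be a dictionary keyed on the chromosome, with the "drop instead of copy" behaviour described above applied when the table fills up:

```python
# Minimal sketch of fitness memoization for a GA (illustrative, not from the cited paper).
# Chromosomes are assumed to be hashable sequences of genes; fitness_fn is any
# user-supplied evaluation function.

class FitnessCache:
    def __init__(self, fitness_fn, max_size=100_000):
        self.fitness_fn = fitness_fn
        self.max_size = max_size
        self._table = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, chromosome):
        key = tuple(chromosome)          # make the chromosome hashable
        if key in self._table:
            self.hits += 1
            return self._table[key]
        self.misses += 1
        value = self.fitness_fn(chromosome)
        if len(self._table) >= self.max_size:
            # crude "rehash without copy": drop everything and let the most
            # frequently used chromosomes repopulate the table, as described above
            self._table.clear()
        self._table[key] = value
        return value


# Usage: cache = FitnessCache(expensive_fitness); f = cache.evaluate(individual)
```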

One thing I have done is to limit the number of fitness calculations. For example, where the landscape is not noisy, i.e. where recalculating the fitness of the same chromosome would give the same answer every time, don't recalculate; simply cache the answer.
Another approach is to use a memory operator. The operator maintains a 'memory' of solutions and ensures that the best solution in that memory is included in the GA population whenever it is better than the best in the population. The memory is kept up to date with good solutions during the GA run. This approach can reduce the number of fitness calculations required and improve performance (see the sketch after the links below).
I have examples of some of this stuff here:
http://johnnewcombe.net/blog/gaf-part-8/
http://johnnewcombe.net/blog/gaf-part-3/
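Here is a rough sketch of the memory-operator idea described above (my own illustration in Python, not the GAF library's API); it keeps the best solutions seen so far and reinjects the best of them whenever it beats the current population's best:

```python
# Illustrative sketch of a "memory operator" for a GA (not the GAF library's API).
# Individuals are (chromosome, fitness) pairs; higher fitness is assumed to be better.

class MemoryOperator:
    def __init__(self, capacity=10):
        self.capacity = capacity
        self.memory = []                      # list of (fitness, chromosome)

    def remember(self, chromosome, fitness):
        """Keep the best `capacity` solutions seen during the run."""
        self.memory.append((fitness, chromosome))
        self.memory.sort(key=lambda pair: pair[0], reverse=True)
        del self.memory[self.capacity:]

    def invoke(self, population):
        """If the best remembered solution beats the best individual in the
        population, replace the worst individual with it."""
        if not self.memory:
            return population
        best_fit, best_chrom = self.memory[0]
        pop_best = max(population, key=lambda ind: ind[1])
        if best_fit > pop_best[1]:
            worst_index = min(range(len(population)), key=lambda i: population[i][1])
            population[worst_index] = (best_chrom, best_fit)
        return population
```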

This is a very broad question; I suggest using the R galgo package for this purpose.

Related

When calculating the time complexity of an algorithm can we count the addition of two numbers of any size as requiring 1 "unit" of time or O(1) units?

I am working on analysing the time complexity of an algorithm. I am not certain what the correct way of calculating the time complexity of basic operations such as addition and subtraction of two numbers is. I have learnt that the time complexity of adding two n-digit numbers is O(n), because this is how many elementary bit operations you need to perform during the addition. However, I have heard recently that nowadays, in modern processors, the time taken to add two numbers of any size (that is still manageable by a computer) is constant: it does not depend on the size of the two numbers. Hence, in the time complexity analysis of an algorithm, you should treat the addition of two numbers of any size as O(1). Which approach is correct? Or, if both approaches are "correct" in the appropriate context, which one is more acceptable in a research paper? Thank you for any answer in advance.
It depends on the kind of algorithm you are analyzing, but in the general case you just assume that the inputs to the algorithm will fit into the word size of the machine it runs on (be that 32-bit, 64-bit, whatever). Under that assumption, any single arithmetic operation will probably be executed as a single machine instruction and completed in one, or a small constant number of, CPU clock cycles regardless of the underlying complexity of the hardware implementation, so you treat that operation as O(1). That is, you assume O(1) complexity for arithmetic operations unless there is a particular reason to believe they cannot be handled in constant time.
You would only really break the O(1) assumption if you were specifically designing an algorithm for numerical inputs of arbitrary precision, such that you plan on programmatically computing arithmetic operations yourself rather than passing them off completely to hardware (your algorithm expects overflows/underflows and is designed to handle them), or if you were working down at the level of implementing these operations yourself in an ALU or FPU circuit. In those cases, whether multiplication is performed in O(n log n) or O(n log n log log n) time in the number of bits actually becomes relevant to your complexity analysis, because the number of bits involved is no longer bounded by some constant, or because you are specifically analyzing the complexity of an algorithm or piece of hardware that is itself implementing an arithmetic operation.
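To make the distinction concrete (my own sketch, not part of the original answer): schoolbook addition of two n-digit numbers performs O(n) digit operations, which is exactly the cost the word-size assumption treats as a single O(1) machine instruction:

```python
# Schoolbook addition of two base-10 numbers given as digit lists (least significant
# digit first). It performs O(n) digit operations for n-digit inputs -- the cost that
# the "fits in a machine word, therefore O(1)" assumption sweeps under the rug.

def add_digits(a, b):
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return result


# 123 + 989 = 1112, with digits stored least-significant first
print(add_digits([3, 2, 1], [9, 8, 9]))   # [2, 1, 1, 1]
```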

What is the relationship between time complexity and the number of steps in an algorithm?

For large values of n, an algorithm that takes 20000n^2 steps has better time complexity (takes less time) than one that takes 0.001n^5 steps
I believe this statement is true. But, why?
If there are more steps wouldn't that take more time?
Computational complexity is considered in the asymptotic sense because the important question is usually how the running time scales. Even in your clear-cut case, the n^5 algorithm starts to take longer at around 271 items (set 20000n^2 = 0.001n^5, i.e. n^3 = 2×10^7, so n ≈ 271), which isn't very many.
Quoting from Wikipedia:
Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.
All that said, if you have two comparable algorithms, the one with the lower asymptotic complexity carries a significant constant coefficient, and you're only going to process 10 items, then it may well be a good idea to choose the asymptotically worse one. Some common libraries even switch algorithms depending on the size of the data being processed; this is called a hybrid algorithm, and Timsort, the algorithm behind Python's sorted, uses it to switch between insertion sort and merge sort.
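As a toy illustration of the hybrid idea (a sketch only, not how Timsort is actually implemented), here is a merge sort that falls back to insertion sort below an assumed size threshold, where the quadratic algorithm's small constant factor wins:

```python
# Toy hybrid sort (illustrative sketch, not Timsort): merge sort that falls back to
# insertion sort for small slices, where the O(n^2) algorithm's small constant factor
# beats merge sort's overhead.

THRESHOLD = 32   # assumed cutoff; real libraries tune this empirically

def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    if len(a) <= THRESHOLD:
        return insertion_sort(a)
    mid = len(a) // 2
    left, right = hybrid_sort(a[:mid]), hybrid_sort(a[mid:])
    # standard two-way merge
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(hybrid_sort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]
```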

What makes non linear functions computationally expensive in hardware (e.g. FPGA)?

I've read some articles that state non-linear functions (like exponentials) are computationally expensive.
I was wondering what makes them computationally expensive.
When referring to 'computationally expensive' does it mean in terms of time taken or hardware resources used?
I've tried searching on Google, but I couldn't find any simple explanations for this.
Not pretending to offer the definitive answer, but start with what you actually have in an FPGA.
Normally you're limited to adders, multipliers and some memory. What can you do with those?
Linear function - easy, taking just one multiplier and one adder.
Nonlinear functions - what are those? Either polynomials, requiring you to spend a ton of multipliers (the more the higher the polynomial's degree), or even transcendental functions, requiring you to find some satisfactory approximation and evaluate it in many steps.
Even simple integer division can't be done in one clock cycle; simple implementations require as many steps as there are bits in the numbers being divided.
The other possible solution is to use a lookup table, and that's great for a small range of arguments. But if you want the function's values over a wide range of arguments, or with greater precision, you'll end up with a lookup table so large that it can't fit in the device you have to work with.
So those are the main costs: you'll spend lots of dedicated hardware resources (multipliers, memory for lookup tables), or spend lots of time in multi-step approximation algorithms, or in algorithms that refine the result one "digit" per iteration (integer division, CORDIC, etc.).
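As a software model of the lookup-table trade-off (a sketch only, nothing FPGA-specific), here is a small table plus linear interpolation approximating exp on a fixed range; covering a wider range or needing more precision means a bigger table, which is exactly the cost described above:

```python
# Software model of a lookup-table approximation of exp(x) on [0, 1)
# (illustrative only; on an FPGA the table would sit in block RAM and the
# interpolation would cost roughly one multiplier and one adder).

import math

TABLE_SIZE = 64                       # assumed size; more entries = more memory, less error
STEP = 1.0 / TABLE_SIZE
TABLE = [math.exp(i * STEP) for i in range(TABLE_SIZE + 1)]

def exp_lut(x):
    """Approximate exp(x) for 0 <= x < 1 via table lookup + linear interpolation."""
    idx = int(x / STEP)
    frac = (x - idx * STEP) / STEP
    return TABLE[idx] + frac * (TABLE[idx + 1] - TABLE[idx])

worst = max(abs(exp_lut(i / 1000.0) - math.exp(i / 1000.0)) for i in range(1000))
print(f"worst-case error with {TABLE_SIZE} entries: {worst:.2e}")
```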

Except for speed and resource usage, are there any other criteria that two algorithms can compete about?

I intend to race two algorithms and evaluate them. Ignoring developer hindrances such as complexity and deployment difficulties, are there any other criteria which I can test the algorithms against?
By speed I mean the fastest algorithm to return a successful result.
By resources I mean computational power, memory and storage.
Please note that the algorithms in question are in fact genetic algorithms. Precisely, a parallel genetic algorithm over a distributed network versus a local non-distributed genetic algorithm. So results will differ with each run.
Further criteria might be:
- influence of compiler / optimisation flags
- cpu architecture dependence
For speed you should keep in mind that it can vary from run to run; often the first run is the slowest. A measure like the average of the fastest 3 execution times out of 10000 runs might help.
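A small sketch of that measurement strategy in Python (run_algorithm is a hypothetical stand-in for whichever algorithm you're racing): time many independent runs and average only the fastest few, so warm-up outliers are discarded:

```python
# Sketch of the "average of the fastest runs" timing strategy. run_algorithm is a
# hypothetical stand-in for whichever algorithm is being raced.

import timeit

def run_algorithm():
    return sum(i * i for i in range(10_000))   # placeholder workload

# 1000 runs (reduced from 10000 to keep the example quick), each timed
# individually (number=1), so warm-up effects show up as slow outliers
# that the "fastest k" selection discards.
times = timeit.repeat(run_algorithm, repeat=1000, number=1)
fastest = sorted(times)[:3]
print(f"average of the 3 fastest runs: {sum(fastest) / len(fastest):.6f} s")
```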

Is flop per second a measure of the speed of a processor, or a measure of the speed of an algorithm?

1) I can see very clearly that: the number of floating point operations a computer can do in one second is a good way of quantifying its performance. That's correct, right?
2) My teacher keeps asking me to calculate the flop rate for algorithms I program. I do this by calculating how many flops the algorithm does and timing how long it takes to run. In this situation the flop rate always falls way short of the flop rate I expect from the computer I'm using. So for algorithms, is a flop rate more an assessment of how long the 'other stuff' takes (i.e. overheads, stuff that doesn't involve flops)? That is, when the flop count is low, most of the program's time is spent calling functions etc. and not performing floating-point operations, correct?
I know this is a very broad question but I was hoping for some ideas from those in industry or academia about what they intuitively feel the flop rate of an algorithm actually is.
Properly, “flops” is a measure of processor or system performance. Many people misuse it as a measure of implementation or algorithm speed.
Suppose you had a computation to perform that is fixed in the number of operations it takes. For example, you want to multiply a matrix with dimensions a•b with a matrix with dimensions b•c. If you perform this multiplication in the usual way, then, in each combination of one of a rows and one of c columns, you perform b multiplications and b-1 additions. So the entire matrix multiplication takes a•c•(2b-1) floating-point operations. If it finishes in one second, some people say it is providing a•c•(2b-1) flops.
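A hedged sketch of that accounting (my own example, using NumPy for the multiplication): count the nominal a•c•(2b-1) operations and divide by the wall-clock time to get the "flops" figure being described:

```python
# Sketch of the flop-rate accounting described above: count the nominal operations of
# a standard a x b times b x c matrix multiply and divide by wall-clock time.

import time
import numpy as np

a, b, c = 512, 512, 512
A = np.random.rand(a, b)
B = np.random.rand(b, c)

nominal_ops = a * c * (2 * b - 1)     # b multiplications and b-1 additions per output element

start = time.perf_counter()
C = A @ B
elapsed = time.perf_counter() - start

print(f"{nominal_ops} nominal operations in {elapsed:.4f} s "
      f"-> {nominal_ops / elapsed / 1e9:.2f} 'GFLOPS'")
```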
If you have two programs that both do the multiplication the same way, you can compare them using this figure. The one of them that has more “flops” is better. Even though they use the same algorithm, one of them might have a better implementation, perhaps because it organizes the work more efficiently for memory cache.
This breaks when somebody figures out a new algorithm that gets the same job done with fewer operations. Then some people compare programs (or routines) using the nominal number of operations of the original method, even though the program actually performs fewer operations.
To some extent, this makes sense. If you have two programs that do the same job, and one of them has a higher number of “flops” calculated this way, then it is the program that gives you the answer more quickly.
However, it does not make sense to the extent that it introduces inaccuracy. We are often not interested in a single problem size but in various sizes, and the “flops” of a program will not scale linearly with the nominal number of operations once a new algorithm is used.
By analogy, suppose it is 80 kilometers from town A to town B over the mountain road that everybody uses. If it takes your car an hour to make the trip, your car is traveling 80 kilometers an hour. While out exploring one day, you discover a pass through the mountains that reduces the trip to 70 kilometers. Now you can make the trip in 52.5 minutes. The same calculation that some people do with “flops” would say your car is going 91.4 kilometers per hour, since it makes the 80-kilometer trip in 52.5 minutes.
That is obviously wrong. However, it is useful for deciding which route to take.
FLOPS means the number of Floating Point Operations Per Second executed by a processor. That can be a purely theoretical figure derived from some hardware/architecture specification or an empirical result from running some algorithm that is tuned to give high numbers.
The main issue in FLOPS calculation arises on systems with multiple, parallel execution units. AFAIK, only in that context does it start to get really tough to map a practical algorithm (e.g. an FFT, or RGB->YUV conversion) onto the most useful set of instructions, the one that uses all the calculation units in a CPU. (For example, without automatic vectorization an x64 system often performs floating-point operations only in the lowest element of the XMM registers, wasting 50-75% of the full potential.)
This partly answers question 2. Besides the obvious stalls introduced by limited cache/memory-to-register bandwidth, the next crucial obstacle on the way to maximum FLOPS figures is that the data is in the wrong register. That's something often completely ignored in complexity analysis, which, just like FLOPS calculations, only counts basic arithmetic operations. In parallel programming it often happens that there is not just one, but 4, 8 or 16 values sitting in the wrong registers, with no way of easily permuting them all at once. Add to that the overhead of the "warm up" and "cool down" stages of an algorithm trying to keep all the calculating units busy with meaningful data, and there you have the major reasons for getting 100 MFLOPS out of a 1 GFLOPS system.
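As a rough demonstration of how far achieved flop rates can fall below a machine's potential (a sketch; the absolute numbers depend entirely on your hardware, and interpreter overhead dominates the scalar case), compare a scalar Python loop with the same arithmetic done through NumPy's vectorized routines:

```python
# Rough demonstration of the gap between achieved flop rates: the same n multiply-adds
# done in a scalar interpreter loop versus NumPy's vectorized (SIMD-backed) routines.
# Absolute numbers depend entirely on the machine; only the ratio is the point.

import time
import numpy as np

n = 1_000_000
x = np.random.rand(n)
y = np.random.rand(n)

start = time.perf_counter()
acc = 0.0
for i in range(n):                    # scalar loop: one multiply and one add per step
    acc += x[i] * y[i]
scalar_time = time.perf_counter() - start

start = time.perf_counter()
acc_vec = float(np.dot(x, y))         # vectorized multiply-accumulate
vector_time = time.perf_counter() - start

flops = 2 * n                         # n multiplications + n additions
print(f"scalar:     {flops / scalar_time / 1e6:8.1f} MFLOPS")
print(f"vectorized: {flops / vector_time / 1e6:8.1f} MFLOPS")
```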