What is the role of crossover in genetic algorithms? - evolutionary-algorithm

I'm not entirely familiar with all the terminology, so please excuse any errors on that part. I decided to write a program that uses an evolutionary (?) algorithm to go from a random string of characters to a target string.
It works fine, but I noticed something that I don't understand. If I use 'asexual' reproduction, i.e. each string clones itself and then mutates with some probability, the convergence to the solution is much slower, and sometimes doesn't converge at all, but gets trapped, at say an average fitness of ~0.8. This doesn't make sense to me, since the mutation should make it trend towards the optimum, right?
However, if I instead use crossover, e.g. I pick two parents and do a uniform mix of characters and then like normal mutate the child with some probability the convergence is not only practically guaranteed, it is also orders of magnitude faster. I don't think this can explained solely by the fact that the child gets to "mutate more" since both its parents are also mutated in the previous generation.
Could someone please explain the role of crossover, and why it allows for much faster convergence in algorithms such as these?

The role of crossover in genetic algorithms is to spread the beneficial mutations of population members that let them achieve a high fitness value to more members of the population.
The key is that only the fit members of the population crossover with each other and thereby spread beneficial mutations to more members of the population with the intent that the offespring (/new members) might also benefit from the mutation of the parent. Thereby crossover leads to a spreading of mutations that lead to a higher fitness value spread throughout the population - leading to your observed faster convergence.
Take for example a neural network that plays a jump and run video game and is optimized with a genetic algorithm. One member might have received a high fitness score through a mutation that makes him jump over obstacles. Another member might have received a high fitness score through a mutation that makes him collect lifes on the map. Crossing over fit parent members spreads the beneficial mutations faster than if the rest of the population would need to learn these fitness increasing mutations itself by chance.

Related

Genetic Algorithm: Need some clarification on selection and what to do when crossover doesn't happen

I'm writing a genetic algorithm to minimize a function. I have two questions, one in regards to selection and the other with regards to crossover and what to do when it doesn't happen.
Here's an outline of what I'm doing:
while (number of new population < current population)
# Evaluate all fitnesses and give them a rank. Choose individual based on rank (wheel roulette) to get first parent.
# Do it again to get second parent, ensuring parent1 =/= parent2
# Elitism (do only once): choose the fittest individual and immediately copy to new generation
Multi-point crossover: 50% chance
if (crossover happened)
do single point mutation on child (0.75%)
else
pick random individual to be copied into new population.
end
And all of this is under another while loop which tracks fitness progression and number of iterations, which I didn't include. So, my questions:
As you can see, two parents are chosen randomly in each
iteration until the new population is filled up. So, the two same
parents may mate more than once and surely several fit parents will
mate many more times than once. Is this in any way bad?
In the obitko tutorial, it says if crossover doesn't
happen, then child is exact copy of parents. I don't even understand
what that means, so, as you can see, I just picked a random parent
(uniformly; no fitness considered) and copied to new population.
This seems weird to me. Whether I actually do this or not, my results really don't change that much. What's the proper way to handle the case
when crossover doesn't happen?
Some parents having several offspring is common; I'd even say this is the default practice (and consider biological evolution, where precisely this is one of the main ingredients).
"If crossover doesn't happen, then child is exact copy of parents"
That is a bit confusing. Crossover (well explained in your link) means taking some genes from one parent and some from the other. This is called sexual reproduction and requires two (or more?) parents.
But asexual reproduction is also possible. In this case, you simply take one parent and mutate its genome in the new individual. This is almost what you were attempting, but you are missing the important mutation step (note mutations can be very aggressive or very conservative!)
Note that asexual reproduction requires mutation after copying the genome to create diversity, while in sexual reproduction this is an optional step.
It is fine to use either type of reproduction, or a mix of them. By the way: in some problems genes might not always have the same size. Sexual reproduction is problematic in this case. If you are interested in this problem, take a look at the NEAT algorithm, a popular neuroevolution algorithm designed to address this (wiki and paper).
Finally, elitism (copying the best-performing individuals to the next generation) is common, but it may be problematic. Genetic algorithms often stall in sub-optimal solutions (called local maxima, where any changes decrease fitness). Elitism can contribute to this problem. Of course, the opposite problem is too much diversity being similar to random search, so you need to find the right balance.
I don't see anything wrong with the same individual being the parent of more than one child per generation. It can only affect your diversity a little bit. If you don't like this, or find a real lack of diversity at the final generations, you can actually flag the individual so it cannot be parent more than once per generation.
I actually don't fully agree with the tutorial, I think after you have selected the individuals that will become parents (based on their fitness, of course) you should actually perform the crossover. Otherwise you will be cloning a lot of individuals to the next generation.

what is the right way to crossover when using GA to get minimum of one variable function,like sin(x)^2

I am encoding the interval [x:y] to binary codes like 10101111, so for population, it is like [[1,0,1,1],[0,1,0,1]].
I defined the fitness function directly using the value of the function (sin(x)^2).
For selection, i am using tournament selection and for crossover, only simple exchange part of the chromosome like this: 1(10)0 and 0(01)1 -> 1(01)0 and 0(10)1.
For mutation, using Bit inversion.
The algorithm kind of works, it can generate the global minimum sometimes, and sometimes local ones. but I don't see the function of crossover in this problem, because the feature of the 'x' is being broken every time (i think), I don't know why, and if it is even right way to code the crossover or maybe the encoding part.
I'm afraid that there isn't a "right way" to crossover.
There are many crossover operator (e.g. Comparison of a Crossover Operator in Binary-coded Genetic Algorithms - STJEPAN PICEK, MARIN GOLUB) that can be used in binary coded genetic algorithm, but:
depending on the properties of a problem one or another crossover operator will have better result.
every crossover operator has its advantages and downfalls, so choosing one ultimately represents the question of someone's requirements and experiments
undergone.
in many situations uniform and two-point crossover are good choices.
Crossover is the major exploratory mechanism of the genetic algorithm, but the driving force behind GA is the cooperation between selection, crossover and mutation (mutation prevents convergence of the population and introduces variation).
Usually a mutation-only approach doesn't have enough exploration strength to reach to the minimum and the success is largely due to distribution of solutions in the initial population.
For continuous function optimization you should also check differential evolution.

Dynamic number of test cases in genetic programming?

When looking at Genetic programming papers, it seems to me that the number of test cases is always fixed. However, most mutations should (?) at every stage of the execution be very deleterious, i. e. make it obvious after one test case that the mutated program performs much worse than the previous one. What happens if you, at first, only try very few (one?) test case and look whether the mutation makes any sense?
Is it maybe so that different test cases test for different features of the solutions, and one mutation will probably improve only one of those features?
I don't know if I agree with your assumption that most mutations should be very deleterious, but you shouldn't care even if they were. Your goal is not to optimize the individuals, but to optimize the population. So trying to determine if a "mutation makes any sense" is exactly what genetic programming is supposed to do: i.e. eliminate mutations that "don't make sense." Your only "guidance" for the algorithm should come through the fitness function.
I'm also not sure what you mean with "test case", but for me it sounds like you are looking for something related to multi-objective-optimization (MOO). That means you try to optimize a solution regarding different aspects of the problem - therefore you do not need to mutate/evaluate a population for a specific test-case, but to find a multi objective fitness function.
"The main idea in MOO is the notion of Pareto dominance" (http://www.gp-field-guide.org.uk)
I think this is a good idea in theory but tricky to put into practice. I can't remember seeing this approach actually used before but I wouldn't be surprised if it has.
I presume your motivation for doing this is to improve the efficiency of the applying the fitness function - you can stop evaluation early and discard the individual (or set fitness to 0) if the tests look like they're going to be terrible.
One challenge is to decide how many test cases to apply; discarding an individual after one random test case is surely not a good idea as the test case could be a real outlier. Perhaps terminating evaluation after 50% of test cases if the fitness of the individual was <10% of the best would probably not discard any very good individuals; on the other hand it might not be worth it given a lot of individuals will be of middle-of-the road fitness and might well only save a small proportion of the computation. You could adjust the numbers so you save more effort, but the more effort you try to save the more chances you have of genuinely good individuals being discarded by accident.
Factor in the extra time to taken to code this and possible bugs etc. and I shouldn't think the benefit would be worthwhile (unless this is a research project in which case it might be interesting to try it and see).
I think it's a good idea. Fitness evaluation is the most computational intense process in GP, so estimating the fitness value of individuals in order to reduce the computational expense of actually calculating the fitness could be an important optimization.
Your idea is a form of fitness approximation, sometimes it's called lazy evaluation (try searching these words, there are some research papers).
There are also distinct but somewhat overlapping schemes, for instance:
Dynamic Subset Selection (Chris Gathercole, Peter Ross) is a method to select a small subset of the training data set on which to actually carry out the GP algorithm;
Segment-Based Genetic Programming (Nailah Al-Madi, Simone Ludwig) is a technique that reduces the execution time of GP by partitioning the dataset into segments and using the segments in the fitness evaluation process.
PS also in the Brood Recombination Crossover (Tackett) child programs are usually evaluated on a restricted number of test cases to speed up the crossover.

What model best suits optimizing for a real-time strategy game?

An article has been making the rounds lately discussing the use of genetic algorithms to optimize "build orders" in StarCraft II.
http://lbrandy.com/blog/2010/11/using-genetic-algorithms-to-find-starcraft-2-build-orders/
The initial state of a StarCraft match is pre-determined and constant. And like chess, decisions made in this early stage of the match have long-standing consequences to a player's ability to perform in the mid and late game. So the various opening possibilities or "build orders" are under heavy study and scrutiny. Until the circulation of the above article, computer-assisted build order creation probably wasn't as popularity as it has been recently.
My question is... Is a genetic algorithm really the best way to model optimizing build orders?
A build order is a sequence of actions. Some actions have prerequisites like, "You need building B before you can create building C, but you can have building A at any time." So a chromosome may look like AABAC.
I'm wondering if a genetic algorithm really is the best way to tackle this problem. Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
I'm thinking whatever chess AIs use would be more appropriate since the array of choices at any given time could be viewed as tree-like in a way.
Although I'm not too familiar with the field, I'm having a difficult time shoe-horning the concept of genes into a data structure that is a sequence of actions. These aren't independent choices that can be mixed and matched like a head and a foot. So what value is there to things like reproduction and crossing?
Hmm, that's a very good question. Perhaps the first few moves in Starcraft can indeed be performed in pretty much any order, since contact with the enemy is not as immediate as it can be in Chess, and therefore it is not as important to remember the order of the first few moves as it is to know which of the many moves are included in those first few. But the link seems to imply otherwise, which means the 'genes' are indeed not all that amenable to being swapped around, unless there's something cunning in the encoding that I'm missing.
On the whole, and looking at the link you supplied, I'd say that genetic algorithms are a poor choice for this situation, which could be accurately mathematically modelled in some parts and the search tree expanded out in others. They may well be better than an exhaustive search of the possibility space, but may not be - especially given that there are multiple populations and poorer ones are just wasting processing time.
However, what I mean by "a poor choice" is that it is inefficient relative to a more appropriate approach; that's not to say that it couldn't still produce 98% optimal results in under a second or whatever. In situations such as this where the brute force of the computer is useful, it is usually more important that you have modelled the search space correctly than to have used the most effective algorithm.
As TaslemGuy pointed out, Genetic Algorithms aren't guaranteed to be optimal, even though they usually give good results.
To get optimal results you would have to search through every possible combination of actions until you find the optimal path through the tree-like representation. However, doing this for StarCraft is difficult, since there are so many different paths to reach a goal. In chess you move a pawn from e2 to e4 and then the opponent moves. In StarCraft you can move a unit at instant x or x+1 or x+10 or ...
A chess engine can look at many different aspects of the board (e.g. how many pieces does it have and how many does the opponent have), to guide it's search. It can ignore most of the actions available if it knows that they are strictly worse than others.
For a build-order creator only time really matters. Is it better to build another drone to get minerals faster, or is it faster to start that spawning pool right away? Not as straightforward as with chess.
These kinds of decisions happen pretty early on, so you will have to search each alternative to conclusion before you can decide on the better one, which will take a long time.
If I were to write a build-order optimizer myself, I would probably try to formulate a heuristic that estimates how good (close the to the goal state) the current state is, just as chess engines do:
Score = a*(Buildings_and_units_done/Buildings_and_units_required) - b*Time_elapsed - c*Minerals - d*Gas + e*Drone_count - f*Supply_left
This tries to keep the score tied to the completion percentage as well as StarCraft common knowledge (keep your ressources low, build drones, don't build more supply than you need). The variables a to f would need tweaking, of course.
After you've got a heuristic that can somewhat estimate the worth of a situation, I would use Best-first search or maybe IDDFS to search through the tree of possibilities.
Edit:
I recently found a paper that actually describes build order optimization in StarCraft, in real time even. The authors use depth-first search with branch and bound and heuristics that estimate the minimum amount of effort required to reach the goal based on the tech tree (e.g. zerglings need a spawning pool) and the time needed to gather the required minerals.
Genetic Algorithm can be, or can sometimes not be, the optimal or non-optimal solution. Based on the complexity of the Genetic Algorithm, how much mutation there is, the forms of combinations, and how the chromosomes of the genetic algorithm is interpreted.
So, depending on how your AI is implemented, Genetic Algorithms can be the best.
You are looking at a SINGLE way to implement genetic algorithms, while forgetting about genetic programming, the use of math, higher-order functions, etc. Genetic algorithms can be EXTREMELY sophisticated, and by using clever combining systems for crossbreeding, extremely intelligent.
For instance, neural networks are optimized by genetic algorithms quite often.
Look up "Genetic Programming." It's similar, but uses tree-structures instead of lines of characters, which allows for more complex interactions that breed better. For more complex stuff, they typically work out better.
There's been some research done using hierarchical reinforcement learning to build a layered ordering of actions that efficiently maximizes a reward. I haven't found much code implementing the idea, but there are a few papers describing MAXQ-based algorithms that have been used to explicitly tackle real-time strategy game domains, such as this and this.
This Genetic algorithm only optimizes the strategy for one very specific part of the game: The order of the first few build actions of the game. And it has a very specific goal as well: To have as many roaches as quickly as possible.
The only aspects influencing this system seem to be (I'm no starcraft player):
build time of the various units and
buildings
allowed units and buildings given the available units and buildings
Larva regeneration rate.
This is a relatively limited, relatively well defined problem with a large search space. As such it is very well suited for genetic algorithms (and quite a few other optimization algorithm at that). A full gene is a specific set of build orders that ends in the 7th roach. From what I understand you can just "play" this specific gene to see how fast it finishes, so you have a very clear fitness test.
You also have a few nice constraints on the build order, so you can combine different genes slightly smarter than just randomly.
A genetic algorithm used in this way is a very good tool to find a more optimal build order for the first stage of a game of starcraft. Due to its random nature it is also good at finding a surprising strategy, which might have been an additional goal of the author.
To use a genetic algorithm as the algorithm in an RTS game you'd have to find a way to encode reactions to situations rather than just plain old build orders. This also involves correctly identifying situations which can be a difficult task in itself. Then you'd have to let these genes play thousands of games of starcraft, against each other and (possibly) against humans, selecting and combining winners (or longer-lasting losers). This is also a good application of genetic algorithms, but it involves solving quite a few very hard problems before you even get to the genetic algorithm part.

Difference between Gene Expression Programming and Cartesian Genetic Programming

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to pick dramatically different names. My latest confusion because of this is that gene-expression-programming seems very similar to cartesian-genetic-programming.
(how) Are these fundamentally different concepts?
I've read that indirect encoding of GP instructions is an effective technique ( both GEP and CGP do that ). Has there been reached some sort of consensus that indirect encoding has outdated classic tree bases GP?
Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.
In CGP there is no distinction between genotype and a phenotype, in other words- if you're looking at the "genes" of a CGP you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.
In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GP, Cândida Ferreira, has written a really good paper and there are some other resources which try to give a shorter overview of the whole concept.
Ferriera says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP increases the speed at which the fitness is reached (i.e. in fewer generations), but frankly speaking you can see dramatic performance shifts from a CGP just by having a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity into the fitness, etc.
Selection:
random
roulette wheel
top-n
take half
etc.
Tournament Frequency:
once per epoch
once per every data instance
once per generation.
Tournament Structure:
Take 3, kill 1 and replace it with the child of the other two.
Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
Randomly pick individuals from the tournament to mate and kill the excess individuals.
Tribes
A population can be split into tribes that evolve independently of each-other:
Migration- periodically, individual(s) from a tribe would be moved to another tribe
The tribes are logically separated so that they're like their own separate populations running in separate environments.
Diversity Fitness
Incorporate diversity into the fitness, where you count how many individuals have the same fitness value (thus are likely to have the same phenotype) and you penalize their fitness by a proportionate value: the more individuals with the same fitness value, the more penalty for those individuals. This way specimens with unique phenotypes will be encouraged, therefore there will be much less stagnation of the population.
Those are just some of the things that can greatly affect the performance of a CGP, and when I say greatly I mean that it's in the same order or greater than Ferriera's performance. So if Ferriera didn't tinker with those ideas too much, then she could have seen much slower performance of the CGPs... especially if she didn't do anything to combat stagnation. So I would be careful when reading performance statistics on GEP, because sometimes people fail to account for all of the "optimizations" available out there.
There seems to be some confusion in these answers that must be clarified. Cartesian GP is different from classic GP (aka tree-based GP), and GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) varies.
In CGPthe representation (mapping between genotype and phenotype) is indirect, in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be coded in a grid or array of nodes, and the resulting program graph is the expression of active nodes only.
In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is much different from treeGP or CGP, but the genotypes are also expressed into a program tree. In my opinion GEP is a more elegant representation, easier to implement, but also suffers from some defects like: you have to find the appropriate tail and head size which is problem specific, the mnltigenic version is a bit of a forced glue between expression trees, and finally it has too much bloat.
Independently of which representation may be better than the other in some specific problem domain, they are general purpose, can be applied to any domain as long as you can encode it.
In general, GEP is simpler from GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
For each of such nodes with GP you must create the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well
In GEP for each of such nodes only one operation is needed to be implemented: deserialize, which takes array of numbers (like double in C or Java), and returns the node. It resembles object deserialization in languages like Java or Python (the difference is that deserialization in programming languages uses byte arrays, where here we have arrays of numbers). Even this 'deserialize' operation doesn't have to be implemented by the programmer: it can be implemented by a generic algorithm, just like it's done in Java or Python deserialization.
This simplicity from one point of view may make searching of best solution less successful, but from other side: requires less work from programmer and simpler algorithms may execute faster (easier to optimize, more code and data fits in CPU cache, and so on). So I would say that GEP is slightly better, but of course the definite answer depends on problem, and for many problems the opposite may be true.