distribution of population in genetic algorithms - optimization

My question is: are there genetic optimization algorithms where the population stays i.i.d. (independent and identically distributed) across all iterations? The most common ones, like NSGA-II or SPEA2, mix the current population with the previous one, so the mixed population is no longer i.i.d. Are there algorithms where the distribution of the population changes during optimization but still remains i.i.d.?

You can try fitness uniform selection https://arxiv.org/abs/cs/0103015.
But, IMHO the results won't be very good.
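To make the suggestion concrete, here is a minimal sketch of fitness uniform selection (FUSS) as described in the linked paper: instead of selecting proportionally to fitness, a fitness level is drawn uniformly between the current worst and best, and the individual closest to that level is selected. The population, fitness function, and toy usage below are illustrative assumptions, not from the paper.

```python
import random

def fuss_select(population, fitness):
    """Fitness uniform selection (FUSS): draw a fitness level uniformly
    between the current worst and best fitness, then return the individual
    whose fitness is nearest to that level."""
    fits = [fitness(ind) for ind in population]
    lo, hi = min(fits), max(fits)
    target = random.uniform(lo, hi)
    # the individual whose fitness is closest to the sampled level
    return min(zip(population, fits), key=lambda p: abs(p[1] - target))[0]

# toy usage: a small integer population with fitness x^2
pop = list(range(10))
picked = fuss_select(pop, lambda x: x * x)
```

Because the sampled level is uniform over the fitness range, rare fitness values are selected about as often as common ones, which preserves diversity but, as noted above, may hurt raw performance.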

Related

Population size in Fast Messy Genetic Algorithm

I'm trying to implement the fast messy GA using the paper by Goldberg, Deb, Kargupta, and Harik: fmGA - Rapid Accurate Optimization of Difficult Problems using Fast Messy Genetic Algorithms.
I'm stuck on the formula for the initial population size that accounts for the building-block evaluation noise:
The sub-functions here are m=10 order-3 (k=3) deceptive functions:
l=30, l'=27, and B is the signal-to-noise ratio, i.e. the ratio of the fitness deviation to the difference between the best and second-best fitness values (30-28=2). The fitness deviation, according to the table above, is sqrt(155).
However, the paper says that with 10 order-3 sub-functions the equation must give a population size of 3,331, but after substitution I can't reach that value, since I'm not sure what the value of c(alpha) is.
Any help will be appreciated. Thank you
I think I've figured out what exactly c(alpha) is. At least, the graph of it plotted against alpha looks exactly the same as in the paper. It seems that by the square of the ordinate they mean the square of the z-score found by inverting the normal distribution, using alpha as the right-tail area. At first I was misled into thinking that, after finding the z-score, it should be substituted back into the normal density to find the height (ordinate).
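Under that interpretation, c(alpha) can be checked numerically with the standard library's inverse normal CDF. This is a sketch of the interpretation described above, not code from the paper:

```python
from statistics import NormalDist

def c(alpha):
    """c(alpha) interpreted as the square of the z-score whose right-tail
    area under the standard normal distribution equals alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # inverse CDF at the left-tail mass
    return z * z

# e.g. alpha = 0.05 gives z ~ 1.645, so c(alpha) ~ 2.71
```

Plotting this against alpha is a quick way to compare with the figure in the paper.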
There is an implementation in Lua here for interested folks: https://github.com/xenomeno/GA-Messy. However, the fast messy GA has some problems reproducing the figures from Goldberg's original paper, which I'm not sure how to fix, but that is another matter.

Genetic algorithm - find max of minimized subsets

I have a combinatorial optimization problem for which I have a genetic algorithm to approximate the global minima.
Given X elements find: min f(X)
Now I want to expand the search over all possible subsets and to find the one subset where its global minimum is maximal compared to all other subsets.
X* are a subset of X, find: max min f(X*)
The example plot shows all solutions of three subsets (one for each color). The black dot indicates the highest value of all three global minima.
image: solutions over three subsets
The main problem is that evaluating fitness across subsets works against the convergence of the solution within a subset. Furthermore, the solution found is actually a local minimum.
How can this problem be described in general terms? I couldn't find a similar problem in the literature so far. For example, is it solvable with a multi-objective genetic algorithm?
Any hint is much appreciated.
While it may not always provide exactly the highest minimum (or lowest maximum), one way to maintain local optima with genetic algorithms is to implement a niching method. Niching methods are ways to maintain population diversity.
For example, in Niching Methods for Genetic Algorithms by Samir W. Mahfoud (1995), the following sentence can be found:
Using constructed models of fitness sharing, this study derives lower bounds on the population size required to maintain, with probability gamma, a fixed number of desired niches.
If you know the number of niches and you implement the solution mentioned, you could theoretically end up with the local optima you are looking for.
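As a concrete illustration of one such niching method, here is a sketch of classic fitness sharing: each individual's raw fitness is divided by a niche count, so crowded regions are penalized and isolated optima keep selection pressure. The distance function, sigma_share, and the toy population are placeholder assumptions you would adapt to your encoding:

```python
def shared_fitness(population, fitness, distance, sigma_share=1.0, alpha=1.0):
    """Fitness sharing: divide each individual's fitness by its niche count
    (a sum of sharing contributions from neighbors within sigma_share),
    penalizing crowded niches and preserving diversity."""
    shared = []
    for ind in population:
        niche_count = 0.0
        for other in population:
            d = distance(ind, other)
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        shared.append(fitness(ind) / niche_count)  # niche_count >= 1 (self)
    return shared

# toy usage: two crowded individuals near 0.1 and one isolated at 0.9
pop = [0.1, 0.15, 0.9]
vals = shared_fitness(pop, lambda x: 1.0, lambda a, b: abs(a - b),
                      sigma_share=0.2)
```

With equal raw fitness, the isolated individual ends up with a higher shared fitness than the crowded pair, which is exactly the pressure needed to hold several local optima in the population at once.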

Does translating the genes in a chromosome for a genetic algorithm for a combinatorial function increase the diversity of candidates?

I'm new to genetic algorithms and am writing code for the Traveling Salesman Problem. I'm using cycle crossover to generate new offspring, and I've found that this leads to some offspring retaining the exact same phenotype as one parent even when the two parents are different. Would translating the chromosomes avoid this?
By translate I mean a chromosome with phenotype ABCDE shifting over two to DEABC. They would be equivalent answers and have equal fitness, but might make more diverse offspring.
Is this worth it in the long run, or is it just wasting computing time?
Cycle crossover (CX) is based on the assumption that it's important to preserve the absolute position of cities (a city preferably inherits its position from either parent), and the proposed "translation" goes against the spirit of CX.
Anyway, multiple studies (e.g. [1]) have shown that for the TSP the key is to preserve the relative position of cities and the edges.
So it could work, but you have to experiment. Some form of mutation is another possibility.
Probably, if the characteristics of CX aren't satisfactory, a different crossover operator is a better choice: staying with simple operators, one of the most successful is order crossover (e.g. [2]).
[1] L. Darrell Whitley, Timothy Starkweather, D'Ann Fuquay - Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator - 1989.
[2] Pablo Moscato - On Genetic Crossover Operators for Relative Order Preservation.
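For reference, a minimal sketch of order crossover (OX), the relative-order-preserving operator mentioned above: copy a random slice from the first parent, then fill the remaining positions with the missing cities in the order they appear in the second parent. Tours are represented here as plain Python lists of city labels, which is an assumption about the encoding:

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random slice from p1, then fill the
    remaining positions with the cities missing from the child, taken in
    the order they appear in p2. Preserves relative order, not absolute
    position, which suits the TSP better than CX."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]          # inherited slice from p1
    kept = set(child[i:j + 1])
    fill = [c for c in p2 if c not in kept]  # p2's order for the rest
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

# e.g. order_crossover(list("ABCDE"), list("EDCBA")) is always a valid tour
```

Unlike the translation trick, OX produces valid permutations by construction and tends to yield offspring that differ from both parents.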

How to estimate the Scoring Scheme in Pairwise Alignment

I'm not a specialist in bioinformatics. I want to align two nucleotide sequences using a global alignment method. Each sequence is a combination of the letters {A,C,T,G}.
The problem is that I don't know how to choose the best scoring scheme (substitution and gap penalties).
Currently I'm using the values +1, -1, -2 for match, mismatch, and gap penalty. I'm also aware that the number of transitions in human DNA is larger than the number of transversions.
My question is how to estimate the penalties for match, mismatch, and gap based on my dataset. Is there any statistical model that can help?
To answer this question properly we'd need to know your dataset and your exact goal, but generally match/mismatch can be represented as +1/-1; this does not account for transitions and transversions.
For that, I advise you to take a look at this model and at Kimura's.
Finally, for the gap penalty, you may use a low, medium, or high penalty according to how divergent the sequences are. If the organisms are closely related, you may use a low gap penalty, and a high penalty for more divergent organisms; so the gap penalty depends on how divergent the sequences you are aligning are.
As for knowing whether the sequences are divergent or not, as I said, it depends on your data, but you may take a look at these examples of some sequences: link1, link2, link3, link4, and link5.
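To make the transition/transversion distinction concrete, here is a sketch of Needleman-Wunsch global alignment scoring where a transition (purine-purine A/G or pyrimidine-pyrimidine C/T) is penalized less than a transversion. The specific score values are placeholders to tune against your data, not recommended parameters:

```python
def needleman_wunsch_score(a, b, match=1, transition=-1,
                           transversion=-2, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a substitution scheme
    that penalizes transversions more than transitions. Score values here
    are illustrative only and should be fit to the dataset."""
    purines = {"A", "G"}

    def sub(x, y):
        if x == y:
            return match
        same_class = (x in purines) == (y in purines)
        return transition if same_class else transversion

    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap           # align prefix of a against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap           # align prefix of b against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m]

# identical sequences score len(seq) * match
```

With this scheme, aligning A against G (a transition) costs -1, while A against C (a transversion) costs -2, reflecting the observation above that transitions are more frequent in human DNA.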

Standard Errors for Differential Evolution

Is it possible to calculate standard errors for Differential Evolution?
From the Wikipedia entry:
http://en.wikipedia.org/wiki/Differential_evolution
It's not derivative-based (indeed, that is one of its strengths), but how then do you calculate the standard errors?
I would have thought some kind of bootstrapping strategy might be applicable, but I can't seem to find any sources that apply bootstrapping to DE.
Baz
Concerning the standard errors, differential evolution is just like any other evolutionary algorithm.
Using a bootstrapping strategy seems a good idea: the usual formulas assume a normal (Gaussian) distribution for the underlying data. That's almost never true for evolutionary computation (exponential distributions being far more common, probably followed by bimodal distributions).
The simplest bootstrap method involves taking the original data set of N numbers and sampling from it to form a new sample (a resample) that is also of size N. The resample is taken from the original using sampling with replacement. This process is repeated a large number of times (typically 1000 or 10000 times) and for each of these bootstrap samples we compute its mean / median (each of these are called bootstrap estimates).
The standard deviation (SD) of the means is the bootstrapped standard error (SE) of the mean and the SD of the medians is the bootstrapped SE of the median (the 2.5th and 97.5th centiles of the means are the bootstrapped 95% confidence limits for the mean).
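The resampling procedure just described can be sketched in a few lines; the best-of-run values below are made-up numbers standing in for the results of independent DE runs:

```python
import random
import statistics

def bootstrap_se(data, estimator=statistics.mean,
                 n_resamples=1000, rng=None):
    """Bootstrapped standard error: resample the data with replacement,
    apply the estimator (mean, median, ...) to each resample, and return
    the standard deviation of the bootstrap estimates."""
    rng = rng or random.Random()
    n = len(data)
    estimates = [estimator([rng.choice(data) for _ in range(n)])
                 for _ in range(n_resamples)]
    return statistics.stdev(estimates)

# made-up best-of-run fitnesses from 10 independent DE runs
runs = [3.2, 4.1, 2.8, 5.0, 3.9, 4.4, 3.1, 2.9, 4.7, 3.5]
se_mean = bootstrap_se(runs, statistics.mean, rng=random.Random(42))
se_median = bootstrap_se(runs, statistics.median, rng=random.Random(42))
```

Swapping `statistics.median` for `statistics.mean` gives the bootstrapped SE of the median, which, per the warnings below, is often the safer summary for skewed fitness distributions.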
Warnings:
the word "population" is used with different meanings in different contexts (bootstrapping vs evolutionary algorithms)
in any GA or GP, the average of the population tells you almost nothing of interest. Use the mean/median of the best-of-run values
the average of a set that is not normally distributed behaves non-intuitively, especially if the probability distribution is skewed: large values in the tail can dominate, and the average tends to reflect the typical value of the "worst" data rather than of the data in general. In this case the median is better
Some interesting links are:
A short guide to using statistics in Evolutionary Computation
An Introduction to Statistics for EC Experimental Analysis