Population size in Fast Messy Genetic Algorithm - optimization

I'm trying to implement the Fast Messy GA using the paper by Goldberg, Deb, Kargupta, and Harik: fmGA - Rapid Accurate Optimization of Difficult Problems using Fast Messy Genetic Algorithms.
I'm stuck with the formula about the initial population size to account for the Building Block evaluation noise:
The sub-functions here are m=10 order-3(k=3) deceptive functions:
l=30, l'=27 and B is signal-to-noise ratio which is the ratio of the fitness deviation to the difference between the best and second best fitness value(30-28=2). Fitness deviation according to the table above is sqrt(155).
However, the paper says that with 10 order-3 subfunctions the equation should give a population size of 3,331, but after substitution I can't reach that number because I am not sure what the value of c(alpha) is.
Any help will be appreciated. Thank you

I think I've figured out what exactly c(alpha) is. At least the graph of it plotted against alpha looks exactly the same as in the paper. It seems that by "the square of the ordinate" they mean the square of the Z-score obtained from the inverse normal distribution, using alpha as the right-tail area. At first I was misled into thinking that, after finding the Z-score, it should be substituted into the normal density equation to find the height (ordinate).
There is an implementation in Lua at https://github.com/xenomeno/GA-Messy for anyone interested. However, the Fast Messy GA has some problems reproducing the figures from Goldberg's original paper, which I am not sure how to fix, but that is another matter.
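As a rough illustration of the interpretation above (c(alpha) as the squared Z-score whose right-tail area is alpha), here is a minimal Python sketch. The function name c_alpha is my own, and this only covers the c(alpha) term, not the full population-sizing equation from the paper.

```python
from statistics import NormalDist


def c_alpha(alpha):
    """Square of the standard normal Z-score whose right-tail area is alpha.

    Assumes 0 < alpha < 1. This follows the interpretation described in the
    answer above; it is not taken verbatim from the paper.
    """
    # Right-tail area alpha corresponds to cumulative probability 1 - alpha.
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * z


if __name__ == "__main__":
    for a in (0.25, 0.1, 0.05, 0.01):
        print(a, c_alpha(a))
```

Plotting c_alpha against alpha and comparing with the figure in the paper is a quick way to check whether this reading matches.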

Related

Genetic algorithm - find max of minimized subsets

I have a combinatorial optimization problem for which I have a genetic algorithm to approximate the global minima.
Given X elements find: min f(X)
Now I want to expand the search over all possible subsets and to find the one subset where its global minimum is maximal compared to all other subsets.
X* are a subset of X, find: max min f(X*)
The example plot shows all solutions of three subsets (one for each color). The black dot indicates the highest value of all three global minima.
image: solutions over three subsets
The main problem is that evaluating the fitness between subsets runs against the convergence of the solution within a subset. Furthermore, the solution found within a subset may actually be only a local minimum.
How can this problem be described in general? I couldn't find a similar problem in the literature so far. For example, is it solvable with a multi-objective genetic algorithm?
Any hint is much appreciated.
While it may not always provide exactly the highest minima (or lowest maxima), a way to maintain local optima with genetic algorithms consists in implementing a niching method. These are ways to maintain population diversity.
For example, in Niching Methods for Genetic Algorithms by Samir W. Mahfoud 1995, the following sentence can be found:
Using constructed models of fitness sharing, this study derives lower bounds on the population size required to maintain, with probability gamma, a fixed number of desired niches.
If you know the number of niches and you implement the solution mentioned, you could theoretically end up with the local optima you are looking for.
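To make the niching idea more concrete, here is a minimal sketch of classic fitness sharing in Python. It is not taken from Mahfoud's text; the triangular sharing function, the parameter names sigma_share and alpha, and the assumption of a positive fitness that is being maximized are all illustrative choices.

```python
def shared_fitness(population, fitness, distance, sigma_share=1.0, alpha=1.0):
    """Return the shared fitness of every individual.

    Each raw fitness is divided by a niche count, so individuals crowded
    into the same niche are penalized and several niches can coexist.
    Assumes fitness values are positive and are being maximized.
    """
    shared = []
    for a in population:
        niche_count = 0.0
        for b in population:
            d = distance(a, b)
            if d < sigma_share:
                # Sharing function: 1 at distance 0, falling to 0 at sigma_share.
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual shares fully with itself.
        shared.append(fitness(a) / niche_count)
    return shared
```

For a minimization problem such as the one in the question, the fitness would first have to be transformed into a positive quantity to maximize; how best to do that is problem specific.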

distribution of population in genetic algorithms

My question is whether there are genetic optimization algorithms where the population stays i.i.d. (independent and identically distributed) during all iterations. The most common ones, like NSGA2 or SPEA2, mix the current population with the previous one, so the mixed population is no longer i.i.d. But are there algorithms where the distribution of the population changes during optimization but still remains i.i.d.?
You can try fitness uniform selection https://arxiv.org/abs/cs/0103015.
But, IMHO the results won't be very good.

LDPC behaviour as density of parity-check matrix increases

My assignment is to implement a Loopy Belief Propagation algorithm for Low-density Parity-check Code. This code uses a parity-check matrix H which is rather sparse (say 750-by-1000 binary matrix with an average of about 3 "ones" per each column). The code to generate the parity-check matrix is taken from here
Anyway, one of the subtasks is to check the reliability of LDPC code when the density of the matrix H increases. So, I fix the channel at 0.5 capacity, fix my code speed at 0.35 and begin to increase the density of the matrix. As the average number of "ones" in a column goes from 3 to 7 in steps of 1, disaster happens. With 3 or 4 the code copes perfectly well. With higher density it begins to fail: not only does it sometimes fail to converge, it oftentimes converges to the wrong codeword and produces mistakes.
So my question is: what type of behaviour is expected of an LDPC code as its sparse parity-check matrix becomes denser? Bonus question for skilled mind-readers: in my case (as the code performance degrades) is it more likely because the Loopy Belief Propagation algo has no guarantee on convergence or because I made a mistake implementing it?
After talking to my TA and other students I understand the following:
According to Shannon's theorem, the reliability of the code should increase with the density of the parity check matrix. That is simply because more checks are made.
However, since we use Loopy Belief Propagation, it struggles a lot when there are more and more edges in the graph forming more and more loops. Therefore, the actual performance degrades.
Whether or not I made a mistake in my code based solely on this behaviour cannot be established. However, since my code does work for sparse matrices, it is likely that the implementation is fine.
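One way to look at the "more and more loops" effect is to count the shortest possible cycles (length 4) in the Tanner graph of H as the column weight grows. The sketch below is only an illustration: the function name is hypothetical, H is assumed to be a list of rows of 0/1 values, and short cycles are just one commonly cited reason for belief propagation to degrade.

```python
from itertools import combinations


def count_four_cycles(H):
    """Count length-4 cycles in the Tanner graph of a binary matrix H.

    A 4-cycle exists whenever two columns (variable nodes) share two rows
    (check nodes), so each column pair with `overlap` common rows
    contributes overlap * (overlap - 1) / 2 cycles.
    """
    n_cols = len(H[0])
    # For every column, the set of rows in which it has a one.
    cols = [{r for r, row in enumerate(H) if row[c]} for c in range(n_cols)]
    total = 0
    for a, b in combinations(cols, 2):
        overlap = len(a & b)
        total += overlap * (overlap - 1) // 2
    return total
```

Running this on the matrices generated for column weights 3 to 7 would show how quickly short cycles appear, which can be compared against the point where decoding starts to fail.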

Monte Carlo Integration

Does anyone have any ideas how to implement a monte carlo integration simulator in vb.net.
I have looked around the internet with no luck.
Any code, or ideas as to how to start it would be of help.
Well, I guess we are talking about a 2-dimensional problem. I assume you have a polygon whose area you want to calculate.
1) First you need a function to check if a point is inside the polygon.
2) Now you define an area with a known size around the polygon.
3) Now you need random points inside your known area. Some of them will be in your polygon, some will be outside. Count them!
4) Now you have two ratios: first, the ratio of points inside your polygon to all points; second, the ratio of the polygon's area, which you don't know, to the surrounding area, which you do know.
5) The two ratios are the same --> you can calculate the area of your polygon! (Area of polygon should be: points in your polygon / all your points * size of known area)
Example: if the known area is 4 m², 20 points were "shot" and 3 points hit the polygon, the area of the polygon is about 3/20 * 4 m² = 0.6 m².
NOTE: This area is only an approximation! The more points you have, the better the approximation gets.
You can implement a fancy method to display this in your vb program of course. Was this what you needed? Is my assumption about the polygon correct? Do you need help with the "point inside polygon" algorithm?
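The five steps above can be sketched as follows. This is written in Python rather than VB.net purely for brevity; the ray-casting point-in-polygon test and the use of the bounding box as the "known area" are my own choices for the illustration.

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon `poly`,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_area_mc(poly, n_points=10_000, rng=None):
    """Estimate the polygon area by shooting random points at its bounding box."""
    rng = rng or random.Random()
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    known_area = (x_max - x_min) * (y_max - y_min)
    hits = 0
    for _ in range(n_points):
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if point_in_polygon(x, y, poly):
            hits += 1
    # hits / n_points approximates polygon area / known area.
    return hits / n_points * known_area
```

For example, polygon_area_mc([(0, 0), (1, 0), (0, 1)]) should come out close to 0.5, with the result fluctuating from run to run.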
There is nothing specific to VB.net with this problem, except maybe for the choice of a random number generator from the library.
Numerically solving integrals of a function f(x_1,...,x_n) by uniform (grid) sampling can become infeasible (in acceptable time) for high dimensions n, because the number of sample points needed for a given sampling distance grows exponentially with the dimension of the problem. The fundamental idea of Monte Carlo integration is to replace the uniform sampling of the variables x_1,...,x_n with random sampling, taking n random numbers per sample. With these samples, estimate the integral. The more samples, the better the estimate. And the major benefit of MC integration is that you can use standard statistical methods to estimate the error of your result.
So, how to start: Implement integration by uniform sampling of the integration space, then go to random sampling and add error estimation.
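A minimal sketch of the random-sampling step with an error estimate might look like this (in Python for brevity; the function name, the box-shaped integration region and the use of the standard error of the mean as the error estimate are assumptions made for the illustration):

```python
import math
import random


def mc_integrate(f, bounds, n_samples=10_000, rng=None):
    """Estimate the integral of f over a box and its standard error.

    `bounds` is a list of (low, high) pairs, one per variable.
    Returns (estimate, standard_error).
    """
    rng = rng or random.Random()
    volume = 1.0
    for lo, hi in bounds:
        volume *= hi - lo
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        # One random number per dimension for each sample point.
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = f(*point)
        total += value
        total_sq += value * value
    mean = total / n_samples
    # Unbiased sample variance of the function values.
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    variance *= n_samples / (n_samples - 1)
    std_error = volume * math.sqrt(variance / n_samples)
    return volume * mean, std_error
```

The standard error shrinks like 1/sqrt(n_samples) regardless of the number of dimensions, which is the property that makes the method attractive for large n.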

Mathematical analysis of a sound sample (as an array of numbers)

I need to find the frequency of a sample, stored (in vb) as an array of byte. The sample is a sine wave of known frequency (so I can check), but the numbers are a bit odd, and my maths-foo is weak.
Full range of values 0-255. 99% of numbers are in range 235 to 245, but there are some outliers down to 0 and 1, and up to 255 in the remaining 1%.
How do I normalise this to remove outliers, (calculating the 235-245 interval as it may change with different samples), and how do I then calculate zero-crossings to get the frequency?
Apologies if this description is rubbish!
The FFT is probably the best answer, but if you really want to do it by your method, try this:
To normalize, first make a histogram counting how many occurrences there are of each value from 0 to 255. Then throw out X percent of the values from each end with something like:
// count[v] = number of samples with value v, N = total number of samples
for (i = 0, lower = 0; i < N * X / 100; lower++)
    i += count[lower];
// repeat in the other direction for upper
Now normalize with
A[i] = 255*(A[i]-lower)/(upper-lower)-128
Throw away results outside the -128..127 range.
Now you can count zero crossings. To make sure you are not fooled by noise, you might want to keep track of the slope over the last several points, and only count crossings when the average slope is going the right way.
The standard method to attack this problem is to consider one block of data, hopefully sampled at at least twice the actual frequency (taking more data isn't bad, so it's good to overestimate a bit), then take the FFT and guess that the frequency corresponds to the largest number in the resulting FFT spectrum.
By the way, very similar problems have been asked here before - you could search for those answers as well.
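The "largest number in the spectrum" idea can be sketched in a few lines. To stay self-contained this Python example uses a naive O(N^2) discrete Fourier transform instead of a real FFT library, which gives the same magnitudes but is only practical for short blocks; the function name and the sample_rate parameter are assumptions for the illustration.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate=1.0):
    """Return the frequency of the strongest spectral component.

    Uses a naive DFT; a real implementation would call an FFT routine.
    The resolution is limited to sample_rate / len(samples).
    """
    n = len(samples)
    # Remove the constant offset so that bin 0 (DC) does not dominate.
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n
```

If the true frequency falls between two bins, the answer is only accurate to the bin spacing, so a longer block gives a finer estimate.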
Use the Fourier transform, it's much more noise insensitive than counting zero crossings
Edit: @WaveyDavey
I found an F# library to do an FFT: From here
"As it turns out, the best free implementation that I've found for F# users so far is still the fantastic FFTW library. Their site has a precompiled Windows DLL. I've written minimal bindings that allow thread-safe access to FFTW from F#, with both guru and simple interfaces. Performance is excellent, 32-bit Windows XP Pro is only up to 35% slower than 64-bit Linux."
Now I'm sure you can call F# lib from VB.net, C# etc, that should be in their docs
If I understood well from your description, what you have is a signal which is a combination of a sine plus a constant plus some random glitches. Say, like
x[n] = A*sin(f*n + phi) + B + N[n]
where N[n] is the "glitch" noise you want to get rid of.
If the glitches are one sample long, you can remove them using a median filter whose window extends beyond the glitch on both sides. For glitches of length 1, a median over 3 samples is enough.
y[n] = median3(x[n])
The median is computed so: Take the samples of x you want to filter (x[n-1],x[n],x[n+1]), sort them, and your output is the middle one.
Now that the noise is gone, get rid of the constant signal. I understand the buffer has a limited and known length, so you can just compute the mean of the whole buffer and subtract it.
Now you have your single sine signal. You can compute the fundamental frequency by counting zero crossings: count the number of samples above 0 whose preceding sample was below 0. The period is the total number of samples in your buffer divided by this count, and the frequency is the reciprocal (1/x) of the period (in cycles per sample; multiply by the sample rate to get Hz).
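Put together, the three steps (median of 3, subtract the mean, count rising zero crossings) could look roughly like this in Python. The function names and the sample_rate parameter are my own, and the two end samples are simply left unfiltered.

```python
from statistics import median


def median3(x):
    """3-sample median filter; the first and last samples are copied as-is."""
    y = list(x)
    for n in range(1, len(x) - 1):
        y[n] = median((x[n - 1], x[n], x[n + 1]))
    return y


def estimate_frequency(samples, sample_rate=1.0):
    """Estimate the frequency of a noisy, offset sine by zero crossings."""
    # 1) Remove one-sample glitches.
    y = median3(samples)
    # 2) Remove the constant offset.
    mean = sum(y) / len(y)
    centered = [v - mean for v in y]
    # 3) Count rising zero crossings (previous sample <= 0, current > 0).
    crossings = sum(1 for prev, cur in zip(centered, centered[1:])
                    if prev <= 0 < cur)
    if crossings == 0:
        raise ValueError("no zero crossings found")
    period = len(centered) / crossings  # in samples
    return sample_rate / period
```

The estimate is coarse when the buffer holds only a few periods, because the count of crossings is an integer; a longer buffer improves it.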
Although I would go with the majority and say that it seems like what you want is an fft solution (fft algorithm is pretty quick), if fft is not the answer for whatever reason you may want to try fitting a sine curve to the data using a fitting program and reading off the fitted frequency.
Using Fityk, you can load the data and fit to a*sin(b*x-c), where b/(2*pi) will give you the frequency after fitting (2*pi/b is the period).
Fityk can be used from a gui, from a command-line for scripting and has a C++ API so could be included in your programs directly.
I googled for "basic fft": Visual Basic FFT. Your question screams FFT, but be careful: using an FFT without understanding even a little bit about DSP can lead to results that you don't understand or can't trace back to their cause.
Get the Frequency Analyzer at http://www.relisoft.com/Freeware/index.htm, run it, and look at the code.