The acceptance rate jumps drastically between 0 and 1 in a high-dimensional MH algorithm. How can I tune it? - bayesian

One part of my MCMC algorithm uses the MH algorithm to update an $n \times 1$ vector of parameters $\boldsymbol{\delta}$. I think it is less computationally intensive to propose a new sample from an $n \times 1$ multivariate proposal distribution ($n$ is large). However, it seems impossible to tune the acceptance rate towards some ideal interval, such as 0.2 to 0.5.
I have tried random-walk updates based on multivariate normal and multivariate uniform distributions. No matter how I adjust the step size, the acceptance rate follows a pattern similar to the following figure.
Has anyone had this experience? Any suggestions are welcome!
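To make the symptom concrete, here is a minimal sketch (not the original model; it targets a standard normal in $n$ dimensions, and all names are mine) showing why a single global step size collapses the acceptance rate as $n$ grows, while scaling the step like $2.4/\sqrt{n}$ keeps it in a usable range:

```python
import numpy as np

rng = np.random.default_rng(0)

def rw_mh_acceptance(n, step, iters=2000):
    """Random-walk MH on an n-dimensional standard normal target;
    returns the empirical acceptance rate."""
    x = np.zeros(n)
    log_p = lambda v: -0.5 * v @ v          # log density up to a constant
    accepts = 0
    for _ in range(iters):
        prop = x + step * rng.standard_normal(n)
        if np.log(rng.random()) < log_p(prop) - log_p(x):
            x, accepts = prop, accepts + 1
    return accepts / iters

# A step that works at n = 1 gives near-zero acceptance at n = 100,
# while the 2.4 / sqrt(n) scaling stays in a usable range.
for n in (1, 10, 100):
    print(n, rw_mh_acceptance(n, 2.4), rw_mh_acceptance(n, 2.4 / np.sqrt(n)))
```

If acceptance still swings wildly with a tuned overall scale, the proposal covariance may be mismatched to the target's correlation structure rather than its scale.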

Related

Population size in Fast Messy Genetic Algorithm

I'm trying to implement the Fast Messy GA using the paper by Goldberg, Deb, Kargupta, and Harik: fmGA - Rapid Accurate Optimization of Difficult Problems using Fast Messy Genetic Algorithms.
I'm stuck on the formula for the initial population size that accounts for the building-block evaluation noise:
The sub-functions here are m = 10 order-3 (k = 3) deceptive functions:
l = 30, l' = 27, and B is the signal-to-noise ratio, i.e. the ratio of the fitness deviation to the difference between the best and second-best fitness values (30 - 28 = 2). The fitness deviation, according to the table above, is sqrt(155).
However, the paper says that using 10 order-3 subfunctions the equation must give a population size of 3,331, but after substitution I can't reach that value, since I am not sure what the value of c(alpha) is.
Any help will be appreciated. Thank you
I think I've figured out what exactly c(alpha) is. At least a graph of it plotted against alpha looks exactly the same as the one in the paper. It seems that by "the square of the ordinate" they mean the square of the z-score found from the inverse of the normal distribution, using alpha as the right-tail area. At first I was misled into thinking that after finding the z-score it should be substituted into the normal density equation to find the height (ordinate).
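Under that reading, c(alpha) is just the squared inverse-normal quantile; a sketch using Python's standard library (the function name `c` is mine, not from the paper):

```python
from statistics import NormalDist

def c(alpha):
    """c(alpha) as read off the paper's graph: the square of the z-score
    whose right-tail area under the standard normal is alpha."""
    z = NormalDist().inv_cdf(1 - alpha)   # z = Phi^{-1}(1 - alpha)
    return z * z

# e.g. alpha = 0.05 -> z ~ 1.645, c(alpha) ~ 2.706
print(c(0.05))
```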
There is an implementation in Lua here https://github.com/xenomeno/GA-Messy for interested folks. However, the Fast Messy GA has some problems reproducing the figures from Goldberg's original paper, which I am not sure how to fix, but that is another matter.

Time complexity of a genetic algorithm for bin packing

I am trying to explore genetic algorithms (GAs) for the bin packing problem and compare them to classical any-fit algorithms. However, the time complexity of a GA is never mentioned in any of the scholarly articles. Is this because the time complexity is very high, and the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?
Assuming that the termination condition is a fixed number of iterations, in general it would look something like this per iteration:
O(p * (Cp * O(Crossover) + Mp * O(Mutation) + O(Fitness)))
p - population size
Cp - crossover probability
Mp - mutation probability
Note that the operator costs add rather than multiply: each generation evaluates the fitness of p individuals and applies crossover and mutation to expected fractions Cp and Mp of them. As you can see, it depends not only on parameters such as the population size but also on the implementation of the crossover and mutation operations and of the fitness function. In practice there are more parameters, such as the chromosome size.
You don't see much about time complexity in publications because researchers most of the time compare GAs using convergence time instead.
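To make those cost terms concrete, here is a minimal generational GA sketch (hypothetical one-point crossover, bit-flip mutation, and one-max fitness; each generation performs p fitness evaluations plus expected Cp and Mp fractions of crossovers and mutations):

```python
import random

random.seed(0)  # reproducible runs

def fitness(ind):                       # O(Fitness): here O(len(ind))
    return sum(ind)

def crossover(a, b):                    # O(Crossover): one-point, O(len)
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind):                        # O(Mutation): flip one gene
    ind = ind.copy()
    ind[random.randrange(len(ind))] ^= 1
    return ind

def run_ga(pop_size=20, length=16, generations=50, cp=0.9, mp=0.1):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)  # p fitness evaluations
        nxt = [scored[0], scored[1]]                     # elitism
        while len(nxt) < pop_size:
            a, b = random.sample(scored[:pop_size // 2], 2)
            child = crossover(a, b) if random.random() < cp else a.copy()
            if random.random() < mp:                     # expected Mp fraction mutated
                child = mutate(child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = run_ga()
```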
Edit: Convergence Time
Every GA has some kind of termination condition, and usually it is a convergence criterion. Let's assume that we want to find the minimum of a mathematical function, so our convergence criterion will be the function's value. In short, we reach convergence during optimization when it is no longer worth continuing, because our best individual is no longer improving significantly. Take a look at this chart:
You can see that after around 10,000 iterations the fitness doesn't improve much and the line flattens out. The best case reaches convergence at around 9,500 iterations; after that point we don't observe any improvement, or it is insignificantly small. Assuming each line shows a different GA, the best case has the best convergence time because it reaches the convergence criterion first.
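A convergence criterion of this kind can be implemented by checking whether the best objective value improved by more than some tolerance over a recent window of generations (a sketch, assuming minimization as in the example above; names are mine):

```python
def has_converged(best_values, window=100, tol=1e-6):
    """best_values: best (lowest) objective value recorded per generation.
    Returns True once the improvement over the last `window`
    generations falls below `tol`."""
    if len(best_values) < window + 1:
        return False
    improvement = best_values[-window - 1] - best_values[-1]
    return improvement < tol
```

The window and tolerance trade off stopping early against wasting iterations on a flat curve.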

Implementing the Bayes' theorem in a fitness function

In an evolutionary programming project I'm working on, I thought it could be useful to use the formula from Bayes' theorem, although I'm not totally sure what that would look like.
The programs being evolved attempt to predict the future state of a time series from past data. Given price data for the past n days, a program predicts buy if it expects the price to rise, sell if it expects it to fall, and leave if there is too little movement.
From my understanding, after testing the model on historical data and recording correct and incorrect predictions, I work out the probability of the model being accurate with regard to buying using the following algorithm:
prob-b-given-a = correct-buy-predictions / total
prob-a = actual-buy-count / total
prob-b = prediction-buy-count / total
prob-a-given-b = (prob-b-given-a * prob-a) / prob-b
fitness = prob-a-given-b //last step for clarification
Am I interpreting Bayes' theorem correctly, and is this a suitable fitness function?
How would I combine the fitness function for all predictions? (In my example I only show the predictive probability of the buy prediction.)
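Writing the update out with concrete (made-up) counts exposes one subtlety: dividing correct-buy-predictions by total gives the joint probability P(A and B), not the likelihood P(B|A); with the conditional form, the posterior reduces to the model's buy precision. A sketch with hypothetical numbers:

```python
from fractions import Fraction

# Hypothetical counts from a backtest (illustrative numbers only).
total = 1000            # days evaluated
actual_buy = 400        # days the price actually rose (event A)
predicted_buy = 300     # days the model said "buy" (event B)
correct_buy = 240       # days it said "buy" AND the price rose

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_b_given_a = Fraction(correct_buy, actual_buy)   # note: / actual_buy, not / total
p_a = Fraction(actual_buy, total)
p_b = Fraction(predicted_buy, total)
p_a_given_b = p_b_given_a * p_a / p_b

# The posterior collapses to the model's buy precision:
assert p_a_given_b == Fraction(correct_buy, predicted_buy)  # 240/300 = 0.8
```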

LDPC behaviour as density of parity-check matrix increases

My assignment is to implement a loopy belief propagation algorithm for a low-density parity-check code. The code uses a parity-check matrix H which is rather sparse (say a 750-by-1000 binary matrix with an average of about 3 ones per column). The code to generate the parity-check matrix is taken from here.
Anyway, one of the subtasks is to check the reliability of the LDPC code as the density of the matrix H increases. So I fix the channel at 0.5 capacity, fix my code rate at 0.35, and begin to increase the density of the matrix. As the average number of ones per column goes from 3 to 7 in steps of 1, disaster happens. With 3 or 4 ones the code copes perfectly well. With higher density it begins to fail: not only does it sometimes fail to converge, it often converges to the wrong codeword and produces errors.
So my question is: what behaviour is expected of an LDPC code as its sparse parity-check matrix becomes denser? Bonus question for skilled mind-readers: in my case, is the performance degradation more likely because the loopy belief propagation algorithm has no convergence guarantee, or because I made a mistake implementing it?
After talking to my TA and other students, I understand the following:
According to Shannon's theorem, the reliability of the code should increase with the density of the parity-check matrix, simply because more checks are made.
However, since we use loopy belief propagation, the algorithm struggles as more and more edges in the graph form more and more loops. Therefore, the actual performance degrades.
Whether or not I made a mistake in my code cannot be established from this behaviour alone. However, since my code does work for sparse matrices, the implementation is likely fine.
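The loop argument is easy to check numerically: a 4-cycle in the Tanner graph corresponds to two columns of H sharing ones in two rows, and the number of such cycles grows quickly with the column weight. A sketch (assuming NumPy; `random_h` is a naive constructor, not the one from the assignment):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_h(rows, cols, col_weight):
    """Random binary parity-check matrix with a fixed column weight."""
    h = np.zeros((rows, cols), dtype=int)
    for j in range(cols):
        h[rng.choice(rows, size=col_weight, replace=False), j] = 1
    return h

def count_4_cycles(h):
    """Count column pairs overlapping in >= 2 rows,
    i.e. 4-cycles in the Tanner graph."""
    overlap = h.T @ h                  # overlap[i, j] = rows shared by cols i, j
    np.fill_diagonal(overlap, 0)
    return int((overlap >= 2).sum() // 2)

# 4-cycle count rises sharply as the average column weight grows.
for w in (3, 5, 7):
    print(w, count_4_cycles(random_h(750, 1000, w)))
```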

How to decide the step size when using Metropolis–Hastings algorithm

I have a simple question regarding the Metropolis–Hastings algorithm.
Suppose the distribution has only one variable $x$, and the range of $x$ is $s = [-2^{31}, 2^{31}]$.
In the sampling process, I need to propose a new value of $x$ and then decide whether to accept it:
$x_{t+1} = x_t + \epsilon$
If I want to implement it myself, how do I decide the value of $\epsilon$?
A basic solution is to draw a value from Uniform$[-2^{31}, 2^{31}]$ and set it as $\epsilon$. What if the range is unbounded, like $[-\infty, \infty]$?
How do current MCMC libraries (e.g. PyMC) solve that problem?
Suppose you have $d$-dimensional parameters. The optimal scale is approximately $2.4\,d^{-1/2}$ times the scale of the target distribution, which implies optimal acceptance rates of 0.44 for $d = 1$ and 0.23 as $d \to \infty$.
Reference: Todd L. Graves, Automatic Step Size Selection in Random Walk Metropolis Algorithms, 2011.
The best approach is to code a self-tuning algorithm that starts with an arbitrary step-size variance and tunes this variance as the algorithm progresses. You are shooting for an acceptance rate of 25-50% for the Metropolis algorithm.
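A minimal sketch of such self-tuning on a one-dimensional standard normal target (illustrative only; adaptation should stop after burn-in, since a kernel that keeps changing is no longer a fixed Markov chain):

```python
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * x * x     # standard normal, up to a constant

def adaptive_mh(iters=20_000, burn_in=10_000, target_rate=0.35):
    """Random-walk MH whose step size is tuned during burn-in
    toward the given target acceptance rate."""
    x, step = 0.0, 1.0
    accepts, window = 0, 200
    samples = []
    for t in range(1, iters + 1):
        prop = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
            accepts += 1
        # Tune during burn-in only: grow the step if accepting too
        # often, shrink it if accepting too rarely.
        if t <= burn_in and t % window == 0:
            rate = accepts / window
            step *= 1.1 if rate > target_rate else 0.9
            accepts = 0
        if t > burn_in:
            samples.append(x)
    return np.array(samples), step

samples, step = adaptive_mh()
```

Because the proposal is tuned relative to the target's scale, this sidesteps the bounded-range question entirely: the step size ends up on the order of the target's standard deviation, not the width of the support.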