Understanding fitness function - cryptography

I am working on using a genetic algorithm to break a transposition cipher. In this work I have come across a paper named Breaking Transposition Cipher with Genetic Algorithm by R. Toemeh & S. Arumugam.
In this paper they use a fitness function, but I cannot understand it completely. In particular, I cannot understand the role of β and γ in the equation.
Can anyone explain the fitness function? Here is the picture of the fitness function:

The weights β and γ can be varied to allow more or less emphasis on particular statistics (they are determined "experimentally").
Kb(i, j) and Kt(i, j, k) are the known language bigram and trigram statistics, e.g. for the English language the bigram frequencies (further details in The frequency of bigrams in an English corpus).
Db(i, j) and Dt(i, j, k) are the bigram and trigram statistics of the message decrypted with key k.
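A common form of this kind of fitness (which I assume here; the paper's exact equation may differ in details) is the weighted sum of absolute differences between the language statistics and the decryption statistics, negated so that higher is better. A minimal sketch, summing only over the n-grams that actually occur in the decryption:

```python
from collections import Counter

def ngram_freqs(text, n):
    """Relative n-gram frequencies of a text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return {g: c / len(grams) for g, c in Counter(grams).items()}

def fitness(decrypted, known_bigrams, known_trigrams, beta=1.0, gamma=1.0):
    """Compare the candidate decryption's statistics (Db, Dt) with the
    known language statistics (Kb, Kt): smaller distance means a more
    plausible plaintext, so the weighted total error is negated."""
    db, dt = ngram_freqs(decrypted, 2), ngram_freqs(decrypted, 3)
    err_b = sum(abs(known_bigrams.get(g, 0.0) - f) for g, f in db.items())
    err_t = sum(abs(known_trigrams.get(g, 0.0) - f) for g, f in dt.items())
    return -(beta * err_b + gamma * err_t)
```

Raising β emphasises the bigram term, raising γ the trigram term; the right balance has to be found empirically for the target language and cipher.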
In A Generic Genetic Algorithm to Automate an Attack on Classical Ciphers by Anukriti Dureha and Arashdeep Kaur there are some reference values of β and γ (and α, since they use an extended form of the above equation) for three types of ciphers.
Some further details about β and γ: they are weights that remain constant during the evolution. They should be tuned experimentally (the "optimal" values depend on the target language and the cipher algorithm).
Offline parameter tuning is the way to go, i.e.:
simple parameter sweep (try everything)
meta-GA
racing strategy
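The first option is the easiest to implement. A minimal offline sweep, where `run_ga` is a hypothetical stand-in for a full GA run that returns a quality score (e.g. the fraction of test ciphertexts broken):

```python
import itertools

def parameter_sweep(run_ga, betas, gammas):
    """Try every (beta, gamma) pair and keep the best-scoring one."""
    return max(itertools.product(betas, gammas),
               key=lambda bg: run_ga(*bg))

# Toy stand-in scoring function, just for illustration: peaks at (2, 1).
score = lambda b, g: -((b - 2.0) ** 2 + (g - 1.0) ** 2)
```

In practice each `run_ga` call should average several GA runs, since single runs are noisy.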

Related

Pattern-based optimizations on lambda calculus

I am writing an interpreter for the lambda calculus in C#. So far I have gone down the following avenues for interpretation.
Compilation of terms to MSIL, such that lazy evaluation is still preserved.
Evaluation on a tree of terms (term rewriting).
At the moment, the MSIL compilation strategy is well over an order of magnitude faster in almost every case I have been able to test. However, I am looking into optimizing the term rewriter by identifying patterns often used in the construction of LC terms. So far, I have come up with one method in particular which provides a relatively small speedup: identification of exponentiated applications. E.g. f (f (f (f x))) is simplified to f^4 x. Then, a rule for applications of equal applicant exponentials is used, namely f^m (f^n x) = f^(m + n) x. This rule works very well in particular for the exponentiation of Church numerals.
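The two rewrites described above can be sketched on a tiny tuple-based term representation (hypothetical, not the asker's C# one: `('app', fun, arg)` for applications, strings for variables, `('pow', f, n, x)` for the collapsed form):

```python
def to_power(term):
    """Collapse f (f (... (f x))) into ('pow', f, n, x)."""
    n, t = 0, term
    while isinstance(t, tuple) and t[0] == 'app' and t[1] == term[1]:
        n, t = n + 1, t[2]
    return ('pow', term[1], n, t) if n > 1 else term

def merge_powers(term):
    """Apply the rule f^m (f^n x) = f^(m + n) x."""
    if (isinstance(term, tuple) and term[0] == 'pow'
            and isinstance(term[3], tuple) and term[3][0] == 'pow'
            and term[3][1] == term[1]):
        f, m, (_, _, n, x) = term[1], term[2], term[3]
        return ('pow', f, m + n, x)
    return term
```

A real rewriter would apply these rules bottom-up over the whole tree; this only shows the pattern match at one node.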
This optimization has me wondering: Are there other pattern-based approaches to optimization in LC?

Time complexity (Big-O notation) of Posterior Probability Calculation

I got a basic idea of Big-O notation from Big-O notation's definition.
In my problem, a 2-D surface is divided into M uniform grids. Each grid m is assigned a posterior probability based on A features.
The posterior probability of grid m is calculated as follows:
and the marginal likelihood is given as:
Here, the A features are independent of each other, and the sigma and mean symbols represent the standard deviation and mean value of each feature a at each grid. I need to calculate the posterior probability of all M grids.
What will be the time complexity of the above operation in terms of Big-O notation?
My guess is O(M) or O(M+A). Am I correct? I'm expecting an authoritative answer to present at a formal forum.
Also, what will be the time complexity if the M grids are divided into T clusters where every cluster has Q grids (Q << M), calculating the posterior probability only on the Q grids out of M?
Thank you very much.
Discrete sums and products can be understood as loops. If you are happy with floating-point approximation, most other operators are typically O(1); a conditional probability looks like a function call. Just inject constants and variables into your equation and you'll get the expected Big-O; the details of the formula are irrelevant. Also be aware that these "loops" can often be simplified using mathematical properties.
If the result is not obvious, please convert your mathematical formula above into actual code in a programming language. Computer-science Big-O is never about a formula but about an actual translation of it into programming steps; depending on the implementation, the same formula can lead to very different execution complexities. As different as adding the first n integers by actually performing the sum, O(n), or by applying Gauss's formula, O(1), for instance.
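Translated into code (a naive sketch, assuming each grid's likelihood is a product of A independent Gaussian feature likelihoods, which is what the description suggests), the nested loops make the complexity explicit: O(M·A) for the joint terms plus an O(M) normalisation pass, i.e. O(M·A) overall:

```python
import math

def posteriors(grids, priors):
    """Posterior for each of M grids, each with A feature likelihoods.
    grids[m] is a list of A per-feature (mean, sd, observation) tuples."""
    joint = []
    for prior, feats in zip(priors, grids):          # M iterations
        like = 1.0
        for mean, sd, obs in feats:                  # A iterations
            like *= (math.exp(-0.5 * ((obs - mean) / sd) ** 2)
                     / (sd * math.sqrt(2 * math.pi)))
        joint.append(prior * like)
    z = sum(joint)                                   # marginal likelihood, O(M)
    return [j / z for j in joint]
```

Restricting the computation to Q grids per cluster simply replaces M by Q in the loop bound, giving O(Q·A) per cluster.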
By the way, why are you doing a discrete sum over the discrete domain N? Shouldn't it be M?

Can I use this fitness function?

I am working on a project using a genetic algorithm, and I am trying to formulate a fitness function. My questions are:
What is the effect of the choice of fitness formula on a GA?
Is it possible to make the fitness function directly equal to the number of violations (in the case of minimisation)?
What is the effect of the choice of fitness formula on a GA?
The fitness function plays a very important role in guiding a GA.
A good fitness function helps the GA explore the search space effectively and efficiently; a bad one can easily make the GA get trapped in a local optimum and lose its discovery power.
Unfortunately, every problem has its own fitness function.
For classification tasks, error measures (Euclidean, Manhattan...) are widely adopted. You can also use entropy-based approaches.
For optimization problems, you can use a crude model of the function you are investigating.
Extensive literature is available on the characteristics of fitness function (e.g. {2}, {3}, {5}).
From an implementation point of view, some additional mechanisms have to be taken into consideration: linear scaling, sigma truncation, power scaling... (see {1}, {2}).
Also the fitness function can be dynamic: changing during the evolution to help search space exploration.
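As an example of one of the mechanisms mentioned above, here is a sketch of Goldberg-style linear scaling (assumed form f' = a·f + b, with coefficients chosen so the average fitness is preserved and the best individual gets c times the average):

```python
def linear_scaling(fitnesses, c=2.0):
    """Linear fitness scaling: keeps the population average unchanged
    while scaling the best individual to c times the average, which
    keeps selection pressure roughly constant over the run."""
    avg, best = sum(fitnesses) / len(fitnesses), max(fitnesses)
    if best == avg:                      # flat population: nothing to scale
        return list(fitnesses)
    a = (c - 1.0) * avg / (best - avg)
    b = avg * (best - c * avg) / (best - avg)
    return [a * f + b for f in fitnesses]
```

Note that for very poor individuals the scaled value can go negative; real implementations clamp it at zero (sigma truncation is one way to handle this).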
Is it possible to make the fitness function directly equal to the number of violations (in the case of minimisation)?
Yes, it's possible, but you have to consider that it may be too coarse-grained a fitness function.
If the fitness function is too coarse (*), it doesn't have enough expressiveness to guide the search and the genetic algorithm will get stuck in local minima a lot more often and may never converge on a solution.
Ideally a good fitness function should have the capacity to tell you what the best direction to go from a given point is: if the fitness of a point is good, a subset of its neighborhood should be better.
So no large plateau (a broad flat region that doesn't give a search direction and induces a random walk).
(*) On the other hand a perfectly smooth fitness function could be a sign you are using the wrong type of algorithm.
A naive example: you look for parameters a, b, c such that
g(x) = a * x / (b + c * sqrt(x))
is a good approximation of n given data points (x_i, y_i)
You could minimize this fitness function:
E1_i = | 0 if g(x_i) == y_i
       | 1 otherwise

f1(a, b, c) = sum_i E1_i
and it could work, but the search isn't aimed. A better choice is:
E2_i = (y_i - g(x_i))^2

f2(a, b, c) = sum_i E2_i

now you have a "search direction" and a greater probability of success.
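Writing both candidate fitness functions from the example out in Python makes the difference concrete (a sketch of the hypothetical example above, not production code):

```python
import math

def g(a, b, c, x):
    """The model being fitted: g(x) = a*x / (b + c*sqrt(x))."""
    return a * x / (b + c * math.sqrt(x))

def f1(a, b, c, data):
    """Coarse fitness: counts exact misses, so almost every candidate
    scores the same and the landscape is a flat plateau."""
    return sum(0 if g(a, b, c, x) == y else 1 for x, y in data)

def f2(a, b, c, data):
    """Smooth fitness: squared error shrinks as candidates get closer
    to the data, giving the search a direction."""
    return sum((y - g(a, b, c, x)) ** 2 for x, y in data)
```

With data generated by a = 1, b = 1, c = 0, a candidate a = 1.2 scores strictly better under f2 than a = 1.5, while f1 rates both as equally wrong.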
Further details:
{1} Genetic Algorithms: What Fitness Scaling is Optimal? by Vladik Kreinovich, Chris Quintana
{2} Genetic Algorithms in Search, Optimization and Machine Learning by David E. Goldberg (Addison-Wesley, 1989)
{3} The Royal Road for Genetic Algorithms: Fitness Landscapes and GA Performance by Melanie Mitchell, Stephanie Forrest, John H. Holland
{4} Avoiding the Pitfalls of Noisy Fitness Functions with Genetic Algorithms by Fiacc Larkin, Conor Ryan (ISBN: 978-1-60558-325-9)
{5} Essentials of Metaheuristics by Sean Luke

What is the importance of crossover in the Differential Evolution algorithm?

In the Differential Evolution algorithm for optimization problems, there are three evolutionary processes involved: mutation, crossover and selection.
I am just a beginner, but I have tried removing the crossover process and there is no significant difference in the results compared with the original algorithm.
So what is the importance of crossover in the Differential Evolution algorithm?
If you don't use crossover, maybe your algorithm just explores the problem search space and doesn't exploit it. In general, an evolutionary algorithm succeeds if it strikes a good balance between exploration and exploitation.
For example, DE/rand/1/Either-Or is a variant of DE which eliminates the crossover operator but uses an effective mutation operator. According to Differential Evolution: A Survey of the State-of-the-Art, in this algorithm trial vectors that are pure mutants occur with a probability pF and those that are pure recombinants occur with a probability 1 − pF. This variant is shown to yield competitive results against the classical DE variants rand/1/bin and target-to-best/1/bin (Main Reference).
X(i,G) is the i-th target (parent) vector of generation G, U(i,G) is its corresponding trial vector, F is the difference-vector scale factor, and k = 0.5*(F + 1) [in the original paper].
In this scheme crossover isn't used, but the mutation is effective enough to be competitive with the original DE algorithm.
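For reference, here is a sketch of one classical DE/rand/1/bin trial-vector step, showing exactly where crossover enters. Note that with CR close to 1 the trial vector is almost a pure mutant, which may explain why removing crossover changed little in your tests if your CR was already high:

```python
import random

def de_trial(pop, i, f=0.8, cr=0.9, rng=random):
    """One DE/rand/1/bin trial vector for target pop[i].
    Mutation: v = x_r1 + F * (x_r2 - x_r3).
    Binomial crossover then mixes v with the target, coordinate by
    coordinate, taking the mutant gene with probability CR."""
    dim = len(pop[i])
    r1, r2, r3 = rng.sample([j for j in range(len(pop)) if j != i], 3)
    v = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
    j_rand = rng.randrange(dim)          # ensure at least one mutant gene
    return [v[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
            for d in range(dim)]
```

Selection would then keep the trial vector only if it scores at least as well as the target; that step is omitted here.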

If a language (L) is recognized by an n-state NFA, can it also be recognized by a DFA with no more than 2^n states?

I'm thinking so, because 2^n would be the upper bound, and given that these are both finite machines, an equivalent DFA with 2^n or fewer states should recognize the same language.
Am I wrong here?
You're right: 2^n is an upper limit, so the generated DFA can't have more states than that. But that's the worst-case scenario; in most common scenarios the resulting DFA has fewer states, sometimes even fewer than the original NFA.
But as far as I know, no algorithm exists yet to predict how many states the resulting DFA will actually have. So if you find one, please let me know ;)
That is correct. As you probably already know, DFAs and NFAs both accept exactly the regular languages, so they are equal in the languages they can accept. Also, the most primitive way of transforming an NFA into a DFA is subset construction (also called powerset construction), where you simply create a DFA state for every combination of states in the NFA. This is the powerset of the state set, which has at most 2^n elements.
But, as mentioned by SasQ, that is the worst-case scenario. Typically you will not end up with that many states, and the result can be reduced further with a minimisation procedure such as Hopcroft's algorithm or Brzozowski's algorithm.
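The subset construction can be sketched in a few lines (a simplified version: no epsilon transitions, and the NFA is a hypothetical dict mapping (state, symbol) to successor sets). Building only the reachable subsets is exactly why the result is usually far smaller than the 2^n worst case:

```python
from collections import deque

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.
    Only subsets reachable from the start set are ever created."""
    start_set = frozenset([start])
    dfa, todo = {}, deque([start_set])
    while todo:
        s = todo.popleft()
        if s in dfa:
            continue
        dfa[s] = {}
        for a in alphabet:
            # Union of all NFA successors of the states in this subset.
            t = frozenset(q for p in s for q in nfa.get((p, a), ()))
            dfa[s][a] = t
            if t not in dfa:
                todo.append(t)
    return dfa
```

For instance, a 2-state NFA for "strings over {0,1} ending in 1" yields a DFA with only 2 reachable subset-states, far below the 2^2 = 4 bound.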