What is the difference between statistical semantics and distributional semantics?

I am searching for appropriate techniques to carry out some information retrieval experiments. I have a basic idea of statistical semantics: it is mainly considered an internal semantics, is used as a measure of similarity between units of text (terms), and is evaluated through statistical analysis of term occurrence patterns. I need to contrast it with distributional semantics. Which is better to use, and why?


Why does differential evolution work so well?

What is the idea behind the mutation in differential evolution and why should this kind of mutation perform well?
I do not see any good geometric reason behind it.
Could anyone point me to some technical explanation of this?
Like all evolutionary algorithms, DE uses a heuristic, so my explanation is going to be a bit hand-wavy. What DE is trying to do, like all evolutionary algorithms, is a random search that's not too random. DE's mutation operator first computes the vector between two random members of the population, then adds that vector to a third random member of the population. This works well because it uses the current population to figure out how large a step to take, and in what direction. If the population is widely dispersed, it's reasonable to take big steps; if it's tightly concentrated, it's reasonable to take small steps.
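A minimal sketch of the mutation step described above (commonly called DE/rand/1), assuming the population is a list of real-valued vectors stored as Python lists. The scale factor `F` is a standard DE parameter; 0.8 is only an illustrative default, and crossover and selection are left out.

```python
import random

def de_rand_1_mutation(population, target_index, F=0.8, rng=random):
    """Build a DE/rand/1 mutant vector for population[target_index].

    Picks three distinct members r1, r2, r3 (all different from the target),
    computes the difference vector x_r2 - x_r3, scales it by F and adds it
    to the base vector x_r1.
    """
    if len(population) < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    candidates = [i for i in range(len(population)) if i != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)
    base, a, b = population[r1], population[r2], population[r3]
    return [x + F * (y - z) for x, y, z in zip(base, a, b)]
```

Because the step `F * (y - z)` is built from members of the current population, it shrinks automatically as the population converges and grows when the population is spread out, which is the self-scaling behaviour described above.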
There are many reasons DE works better than Goldberg's GA, but, focusing on the variation operators, I'd say the biggest difference is that DE uses real-coded variables while the GA uses binary encoding. When optimizing over a continuous space, binary encoding is not a good choice. This has been known since the early 1990s, and one of the first things to come out of the encounter between the primarily German Evolution Strategy community and the primarily American Genetic Algorithm community was Deb's Simulated Binary Crossover. This operator acts like the GA's crossover operator, but on real-coded variables.
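For reference, a minimal unbounded version of Simulated Binary Crossover (SBX) on real-coded vectors might look like the sketch below. It assumes a distribution index `eta` (larger values keep children closer to the parents; 15 is just an illustrative choice) and no variable bounds; practical implementations also handle bounds and usually apply the operator per variable with some probability.

```python
import random

def sbx_crossover(parent1, parent2, eta=15.0, rng=random):
    """Unbounded SBX: returns two children, variable by variable."""
    child1, child2 = [], []
    for x1, x2 in zip(parent1, parent2):
        u = rng.random()  # uniform in [0, 1)
        # Spread factor beta, drawn so children are usually close to the parents.
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        child1.append(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2))
        child2.append(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2))
    return child1, child2
```

Like single-point crossover on bitstrings, the children are symmetric around the parents: for every variable, the children's mean equals the parents' mean.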

Definitions of Phenotype and Genotype

Can someone help me understand the definitions of phenotype and genotype in relation to evolutionary algorithms?
Am I right in thinking that the genotype is a representation of the solution, and the phenotype is the solution itself?
Thanks
Summary: For simple systems, yes, you are completely right. As you get into more complex systems, things get messier.
That is probably all most people reading this question need to know. However, for those who care, there are some weird subtleties:
People who study evolutionary computation use the words "genotype" and "phenotype" frustratingly inconsistently. The only rule that holds true across all systems is that the genotype is a lower-level (i.e. less abstracted) encoding than the phenotype. A consequence of this rule is that there can generally be multiple genotypes that map to the same phenotype, but not the other way around. In some systems, there are really only the two levels of abstraction that you mention: the representation of a solution and the solution itself. In these cases, you are entirely correct that the former is the genotype and the latter is the phenotype.
This holds true for:
Simple genetic algorithms, where the solution is encoded as a bitstring.
Simple evolution strategies, where a real-valued vector is evolved and the numbers are plugged directly into the function being optimized.
A variety of other systems where there is a direct mapping between solution encodings and solutions.
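For the simple bitstring case, the whole genotype-to-phenotype mapping can be as small as the sketch below. It assumes, purely for illustration, that each individual encodes a single real parameter in a fixed number of bits, scaled linearly into a range `[low, high]`.

```python
def decode_bitstring(bits, low, high):
    """Map a bitstring genotype (list of 0/1) to a real-valued phenotype.

    The bits are read as an unsigned binary integer and scaled linearly so
    that all zeros maps to `low` and all ones maps to `high`.
    """
    if not bits:
        raise ValueError("genotype must contain at least one bit")
    as_int = 0
    for bit in bits:
        as_int = (as_int << 1) | bit
    max_int = (1 << len(bits)) - 1
    return low + (high - low) * as_int / max_int
```

For example, `decode_bitstring([1, 0, 1, 0], 0.0, 15.0)` returns `10.0`; that number is the phenotype that gets plugged into the fitness function.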
But as we get to more complex algorithms, this starts to break down. Consider a simple genetic program, in which we are evolving a mathematical expression tree. The number that the tree evaluates to depends on the input that it receives. So, while the genotype is clear (it's the series of nodes in the tree), the phenotype can only be defined with respect to specific inputs. That isn't really a big problem - we just select a set of inputs and define phenotype based on the set of corresponding outputs. But it gets worse.
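A minimal sketch of "phenotype defined with respect to specific inputs". The genotype is a nested tuple representing the tree (a hypothetical encoding chosen for brevity), and the phenotype is the tuple of outputs on a fixed, arbitrarily chosen input set. Two structurally different trees that compute the same function end up with the same phenotype, which is the many-genotypes-to-one-phenotype situation mentioned earlier.

```python
import operator

# Hypothetical node encoding: ("x",) is the input variable, ("const", c) is a
# constant, and (op_name, left, right) is a binary operator.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, x):
    kind = tree[0]
    if kind == "x":
        return x
    if kind == "const":
        return tree[1]
    return OPS[kind](evaluate(tree[1], x), evaluate(tree[2], x))

def phenotype(tree, inputs):
    """Phenotype = outputs of the tree on a fixed set of inputs."""
    return tuple(evaluate(tree, x) for x in inputs)

INPUTS = (-2, -1, 0, 1, 2)
g1 = ("*", ("x",), ("const", 2))  # x * 2
g2 = ("+", ("x",), ("x",))        # x + x
```

Here `g1` and `g2` are different genotypes, but `phenotype(g1, INPUTS)` and `phenotype(g2, INPUTS)` are both `(-4, -2, 0, 2, 4)`.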
As we continue to look at more complex algorithms, we reach cases where there are no longer just two levels of abstraction. Evolutionary algorithms are often used to evolve simple "brains" for autonomous agents. For instance, say we are evolving a neural network with NEAT. NEAT very clearly defines what the genotype is: a series of rules for constructing the neural network. And this makes sense, since that is the lowest-level encoding of an individual in this system. Stanley, the creator of NEAT, goes on to define the phenotype as the neural network encoded by the genotype. Fair enough: that is indeed a more abstract representation. However, there are others who study evolved brain models who classify the neural network as the genotype and the behavior as the phenotype. That is also completely reasonable; the behavior is perhaps even a better phenotype, because it's the thing selection is actually based on.
Finally, we arrive at the systems with the least definable genotypes and phenotypes: open-ended artificial life systems. The goal of these systems is basically to create a rich world that will foster interesting evolutionary dynamics. Usually the genotype in these systems is fairly easy to define - it's the lowest level at which members of the population are defined. Perhaps it's a ring of assembly code, as in Avida, or a neural network, or some set of rules as in geb. Intuitively, the phenotype should capture something about what a member of the population does over its lifetime. But each member of the population does a lot of different things. So ultimately, in these systems, phenotypes tend to be defined differently based on what is being studied in a given experiment. While this may seem questionable at first, it is essentially how phenotypes are discussed in evolutionary biology as well. At some point, a system is complex enough that you just need to focus on the part you care about.

How to extract semantic relatedness from a text corpus

The goal is to assess semantic relatedness between terms in a large text corpus, e.g. 'police' and 'crime' should have a stronger semantic relatedness than 'police' and 'mountain' as they tend to co-occur in the same context.
The simplest approach I've read about consists of extracting TF-IDF information from the corpus.
A lot of people use Latent Semantic Analysis to find semantic correlations.
I've come across the Lucene search engine: http://lucene.apache.org/
Do you think it is suitable for extracting TF-IDF?
What would you recommend to do what I'm trying to do, both in terms of technique and software tools (with a preference for Java)?
Thanks in advance!
Mulone
Yes, Lucene gets TF-IDF data. The Carrot^2 algorithm is an example of a semantic extraction program built on Lucene. I mention it since, as a first step, they create a correlation matrix. Of course, you probably can build this matrix yourself easily.
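If you want to build such a matrix yourself, here is a small pure-Python sketch that does not use Lucene at all. It assumes the documents are already tokenized, weights each term by TF-IDF (raw count times log(N/df), one of several common variants), and uses cosine similarity between term vectors as the "correlation". It is only an illustration, not how Carrot^2 computes its matrix.

```python
import math
from collections import Counter

def tfidf_term_vectors(docs):
    """Return {term: [tf-idf weight in doc 0, doc 1, ...]} for tokenized docs."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    return {
        term: [c[term] * math.log(n_docs / df[term]) for c in counts]
        for term in df
    }

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_similarity_matrix(docs):
    """All pairwise cosine similarities between term vectors."""
    vectors = tfidf_term_vectors(docs)
    terms = sorted(vectors)
    return {(s, t): cosine(vectors[s], vectors[t]) for s in terms for t in terms}
```

Note that a term occurring in every document gets an IDF of zero and therefore a similarity of 0 to everything; for large corpora a sparse matrix library is the more realistic choice.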
If you deal with a ton of data, you may want to use Mahout for the harder linear algebra parts.
It is very easy if you have a Lucene index. For example, to get a correlation you can use the simple formula count(term1 AND term2) / (count(term1) * count(term2)), where count is the number of hits in your search results. Moreover, you can easily calculate other semantic metrics such as chi^2 and information gain. All you need is to take the formula and express it in terms of counts from queries.
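Once you have the hit counts, the formulas themselves are tiny. In the sketch below the counts are plain integers passed in (no Lucene API is shown); how you obtain them from your index is up to you. The ratio is the formula above with the parentheses made explicit, and the chi-square function uses the standard 2x2 contingency-table formula over N documents.

```python
def cooccurrence_ratio(n_both, n_term1, n_term2):
    """count(t1 AND t2) / (count(t1) * count(t2)), with counts = number of hits."""
    return n_both / (n_term1 * n_term2)

def chi_square(n_both, n_term1, n_term2, n_docs):
    """Chi-square for the 2x2 table of term1/term2 presence across documents."""
    a = n_both                               # both terms
    b = n_term1 - n_both                     # term1 only
    c = n_term2 - n_both                     # term2 only
    d = n_docs - n_term1 - n_term2 + n_both  # neither term
    denom = n_term1 * (n_docs - n_term1) * n_term2 * (n_docs - n_term2)
    return n_docs * (a * d - b * c) ** 2 / denom if denom else 0.0
```

Multiplying the ratio by the total number of documents N gives 1.0 when the two terms occur independently (its logarithm is pointwise mutual information). For example, with N = 100, count(term1) = 50 and count(term2) = 20, a joint count of 10 gives chi^2 = 0, while a joint count of 20 gives chi^2 = 25.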

Is functional programming considered more "mathematical"? If so, why?

Every now and then, I hear someone saying things like "functional programming languages are more mathematical". Is it so? If so, why and how? Is, for instance, Scheme more mathematical than Java or C? Or Haskell?
I cannot define precisely what "mathematical" means, but I believe you get the feeling.
Thanks!
There are two common(*) models of computation: the Lambda Calculus (LC) model and the Turing Machine (TM) model.
Lambda Calculus approaches computation by representing it using a mathematical formalism in which results are produced through the composition of functions over a domain of types. LC is also related to Combinatory Logic, which is considered a more generalized approach to the same topic.
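To make "composition of functions" concrete, here is a small sketch of Church numerals written with Python lambdas: numbers, addition and multiplication are encoded purely as functions, with no stored state. Python only stands in for the lambda calculus here; this is an illustration, not how ML, Scheme, or Haskell actually represent integers.

```python
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Convert a Church numeral back to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = add(one)(two)
```

`to_int(three)` gives `3` and `to_int(mul(two)(three))` gives `6`: arithmetic happens entirely by composing functions.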
The Turing Machine model approaches computation by representing it as the manipulation of symbols stored on idealized storage using a body of basic operations (like addition, mutation, etc).
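For contrast, a tiny Turing machine simulator: here computation is a sequence of state transitions that read and rewrite symbols on a tape. The transition table is a hypothetical example that increments a binary number, with the head starting on the leftmost digit.

```python
def run_turing_machine(transitions, tape, state, halt="halt", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine.

    transitions maps (state, symbol) -> (new_state, symbol_to_write, move),
    where move is -1 (left), 0 (stay) or +1 (right). Returns the final tape.
    """
    cells = dict(enumerate(tape))  # position -> symbol; unwritten cells are blank
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        state, written, move = transitions[(state, symbol)]
        cells[head] = written
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Binary increment: walk right to the end, then propagate the carry leftwards.
INCREMENT = {
    ("right", "0"): ("right", "0", +1),
    ("right", "1"): ("right", "1", +1),
    ("right", "_"): ("carry", "_", -1),
    ("carry", "1"): ("carry", "0", -1),
    ("carry", "0"): ("halt", "1", 0),
    ("carry", "_"): ("halt", "1", 0),
}
```

`run_turing_machine(INCREMENT, "1011", "right")` returns `"1100"`. Reasoning about this program means reasoning about the whole sequence of tape states, which is the contrast drawn below.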
These different models of computation are the basis for different families of programming languages. Lambda Calculus has given rise to languages like ML, Scheme, and Haskell. The Turing Model has given rise to C, C++, Pascal, and others. As a generalization, most functional programming languages have a theoretical basis in lambda calculus.
Due to the nature of Lambda Calculus, certain proofs are possible about the behavior of systems built on its principles. In fact, provability (i.e. correctness) is an important concept in LC and makes possible certain kinds of reasoning and conclusions about LC systems. LC is also related to (and relies on) type theory and category theory.
By contrast, Turing models rely less on type theory and more on structuring computation as a series of state transitions in the underlying model. Turing Machine models of computation are more difficult to make assertions about and do not lend themselves to the same kinds of mathematical proofs and manipulation that LC-based programs do. However, this does not mean that no such analysis is possible: some important aspects of TM models are used when studying virtualization and static analysis of programs.
Because functional programming relies on careful selection of types and transformation between types, FP can be perceived as more "mathematical".
(*) Other models of computation exist as well, but they are less relevant to this discussion.
Pure functional programming languages are examples of a functional calculus and so in theory programs written in a functional language can be reasoned about in a mathematical sense. Ideally you'd like to be able to 'prove' the program is correct.
In practice such reasoning is very hard except in trivial cases, but it's still possible to some degree. You might be able to prove certain properties of the program; for example, you might be able to prove that for all numeric inputs, the output always stays within a certain range.
In non-functional languages with mutable state and side effects attempts to reason about a program and 'prove' correctness are all but impossible, at the moment at least. With non-functional programs you can think through the program and convince yourself parts of it are correct, and you can run unit tests that test certain inputs, but it's usually not possible to construct rigorous mathematical proofs about the behaviour of the program.
I think one major reason is that pure functional languages have no side effects, i.e. no mutable state, they only map input parameters to result values, which is just what a mathematical function does.
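A small illustration of the distinction, using hypothetical functions: the pure version behaves like a mathematical function (same input, same output), while the version with mutable state does not.

```python
def scale_pure(values, factor):
    """Pure: returns a new list and depends only on its arguments."""
    return [v * factor for v in values]

class Scaler:
    """Impure: the result depends on hidden, mutable state."""

    def __init__(self):
        self.calls = 0

    def scale(self, values, factor):
        self.calls += 1
        # The output changes with the number of previous calls.
        return [v * factor * self.calls for v in values]
```

`scale_pure([1, 2], 3)` is always `[3, 6]`, whereas two successive calls to `Scaler().scale([1, 2], 3)` on the same object return `[3, 6]` and then `[6, 12]`, so you cannot reason about one call without knowing the history.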
The logical structures of functional programming are heavily based on lambda calculus. While it may not look mathematical if you think only of algebraic forms of math, it translates very naturally from discrete mathematics.
In comparison to imperative programming, it doesn't prescribe exactly how to do something, but what must be done. This reflects topology.
The mathematical feel of functional programming languages comes from a few different features. The most obvious is the name: "functional", i.e. using functions, which are fundamental to math. The other significant reason is that functional programming involves defining a collection of things that will always be true, which by their interactions achieve the desired computation; this is similar to how mathematical proofs are done.

Difference between Gene Expression Programming and Cartesian Genetic Programming

Something pretty annoying in evolutionary computing is that mildly different and overlapping concepts tend to get dramatically different names. My latest confusion because of this is that gene expression programming (GEP) seems very similar to Cartesian genetic programming (CGP).
(how) Are these fundamentally different concepts?
I've read that indirect encoding of GP instructions is an effective technique (both GEP and CGP do that). Has some sort of consensus been reached that indirect encoding has made classic tree-based GP obsolete?
Well, it seems that there is some difference between gene expression programming (GEP) and cartesian genetic programming (CGP or what I view as classic genetic programming), but the difference might be more hyped up than it really ought to be. Please note that I have never used GEP, so all of my comments are based on my experience with CGP.
In CGP there is no distinction between genotype and phenotype; in other words, if you're looking at the "genes" of a CGP you're also looking at their expression. There is no encoding here, i.e. the expression tree is the gene itself.
In GEP the genotype is expressed into a phenotype, so if you're looking at the genes you will not readily know what the expression is going to look like. The "inventor" of GEP, Cândida Ferreira, has written a really good paper, and there are some other resources that try to give a shorter overview of the whole concept.
Ferreira says that the benefits are "obvious," but I really don't see anything that would necessarily make GEP better than CGP. Apparently GEP is multigenic, which means that multiple genes are involved in the expression of a trait (i.e. an expression tree). In any case, the fitness is calculated on the expressed tree, so it doesn't seem like GEP is doing anything to increase the fitness. What the author claims is that GEP increases the speed at which the fitness is reached (i.e. in fewer generations), but frankly you can see dramatic performance shifts in a CGP just by using a different selection algorithm, a different tournament structure, splitting the population into tribes, migrating specimens between tribes, including diversity in the fitness, etc.
Selection:
- random
- roulette wheel
- top-n
- take half
- etc.
Tournament Frequency:
- once per epoch
- once per data instance
- once per generation
Tournament Structure:
- Take 3, kill 1 and replace it with the child of the other two.
- Sort all individuals in the tournament by fitness, kill the lower half and replace it with the offspring of the upper half (where lower is worse fitness and upper is better fitness).
- Randomly pick individuals from the tournament to mate and kill the excess individuals.
Tribes:
A population can be split into tribes that evolve independently of each other:
- Migration: periodically, one or more individuals from a tribe are moved to another tribe.
- The tribes are logically separated, so they behave like separate populations running in separate environments.
Diversity Fitness:
Incorporate diversity into the fitness: count how many individuals have the same fitness value (and are therefore likely to have the same phenotype) and penalize their fitness proportionally, so the more individuals share a fitness value, the larger the penalty. This encourages specimens with unique phenotypes, so there will be much less stagnation in the population.
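A minimal sketch combining two of the ideas above: the "take 3, kill 1" tournament and a diversity penalty that divides each individual's fitness by the number of population members sharing its exact fitness value. The genome type, the `fitness` and `breed` callables, and the divide-by-count penalty are assumptions for illustration (the answer does not give an exact penalty formula); fitness is assumed non-negative, with higher being better.

```python
import random
from collections import Counter

def diversity_adjusted(raw_fitness):
    """Divide each raw fitness by how many individuals share that exact value.

    Assumes non-negative fitness where higher is better.
    """
    counts = Counter(raw_fitness)
    return [f / counts[f] for f in raw_fitness]

def take3_kill1_step(population, fitness, breed, rng=random):
    """One steady-state step: pick 3, replace the worst with a child of the other two.

    `fitness(individual)` and `breed(parent_a, parent_b, rng)` are hypothetical
    callables supplied by the caller. Returns the index that was replaced.
    """
    adjusted = diversity_adjusted([fitness(ind) for ind in population])
    picks = rng.sample(range(len(population)), 3)
    picks.sort(key=lambda i: adjusted[i])  # worst adjusted fitness first
    loser, parent_a, parent_b = picks
    population[loser] = breed(population[parent_a], population[parent_b], rng)
    return loser
```

With tribes, you would simply run this step separately on each tribe's population and occasionally move an individual between tribes.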
Those are just some of the things that can greatly affect the performance of a CGP, and by "greatly" I mean effects comparable to or larger than the performance gains Ferreira reports for GEP. So if Ferreira didn't tinker with those ideas much, she could have seen much slower performance from the CGPs, especially if she didn't do anything to combat stagnation. I would therefore be careful when reading performance statistics on GEP, because people sometimes fail to account for all of the "optimizations" available out there.
There seems to be some confusion in these answers that needs to be clarified. Cartesian GP is different from both classic GP (aka tree-based GP) and GEP. Even though they share many concepts and take inspiration from the same biological mechanisms, the representation of the individuals (the solutions) differs.
In CGP the representation (mapping between genotype and phenotype) is indirect; in other words, not all of the genes in a CGP genome will be expressed in the phenome (a concept also found in GEP and many others). The genotypes can be coded as a grid or array of nodes, and the resulting program graph is the expression of the active nodes only.
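A minimal sketch of that genotype-to-phenotype step for a single-row CGP with two-input nodes. The genome layout is a simplified assumption: a list of `(function, input_a, input_b)` triples plus one output index, where indices below the number of program inputs refer to inputs and higher indices refer to earlier nodes. Real CGP implementations add levels-back limits, multiple outputs, and variable arities.

```python
import operator

FUNCS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def active_nodes(genome, output, n_inputs):
    """Return the sorted indices of nodes reachable from the output gene."""
    active, stack = set(), [output]
    while stack:
        idx = stack.pop()
        if idx < n_inputs or idx in active:
            continue  # program input, or node already visited
        active.add(idx)
        _, a, b = genome[idx - n_inputs]
        stack.extend([a, b])
    return sorted(active)

def evaluate_cgp(genome, output, inputs):
    """Evaluate only the active nodes; inactive genes are never executed.

    Sorted order is a valid evaluation order because each node only
    refers to lower indices (feed-forward genome).
    """
    n_inputs = len(inputs)
    values = dict(enumerate(inputs))
    for idx in active_nodes(genome, output, n_inputs):
        func, a, b = genome[idx - n_inputs]
        values[idx] = FUNCS[func](values[a], values[b])
    return values[output]

# Example genome with two program inputs (x = index 0, y = index 1).
GENOME = [
    ("add", 0, 1),  # node 2: x + y
    ("mul", 0, 0),  # node 3: x * x (inactive: nothing refers to it)
    ("mul", 2, 2),  # node 4: (x + y) * (x + y)
]
OUTPUT = 4
```

Node 3 is part of the genotype but not of the phenotype: `active_nodes(GENOME, OUTPUT, 2)` is `[2, 4]`, and with x = 1, y = 2 the program evaluates to 9.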
In GEP the representation is also indirect, and similarly not all genes will be expressed in the phenotype. The representation in this case is quite different from tree-based GP or CGP, but the genotypes are also expressed into a program tree. In my opinion GEP is a more elegant representation and easier to implement, but it also suffers from some defects: you have to find the appropriate head and tail sizes, which are problem specific; the multigenic version is a somewhat forced glue between expression trees; and finally it has too much bloat.
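A minimal sketch of how a single GEP gene is expressed, assuming Karva notation (the gene is read breadth-first) and only binary functions. The tail length rule `head * (max_arity - 1) + 1` is what guarantees every gene decodes to a complete tree; symbols left over after the tree is closed form the non-coding region. Multigenic chromosomes and linking functions are left out.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
ARITY = {symbol: 2 for symbol in OPS}  # all functions here are binary

def tail_length(head_length, max_arity):
    """Tail size that guarantees every gene decodes to a complete tree."""
    return head_length * (max_arity - 1) + 1

def random_gene(head_length, functions, terminals, rng=random):
    """Head: functions or terminals; tail: terminals only."""
    max_arity = max(ARITY[f] for f in functions)
    head = [rng.choice(functions + terminals) for _ in range(head_length)]
    tail = [rng.choice(terminals) for _ in range(tail_length(head_length, max_arity))]
    return "".join(head + tail)

def karva_to_tree(gene):
    """Decode a gene read breadth-first into nested lists [symbol, *children]."""
    nodes = [[symbol] for symbol in gene]
    next_free = 1  # index of the next symbol not yet attached to the tree
    i = 0
    while i < next_free:
        arity = ARITY.get(nodes[i][0], 0)
        if next_free + arity > len(nodes):
            raise ValueError("gene too short: tail does not close the tree")
        nodes[i].extend(nodes[next_free:next_free + arity])
        next_free += arity
        i += 1
    return nodes[0]  # symbols at index >= next_free are non-coding

def evaluate(tree, env):
    symbol, *children = tree
    if symbol in OPS:
        return OPS[symbol](*(evaluate(child, env) for child in children))
    return env[symbol]
```

For example, with head length 3 the gene `"+*aabba"` decodes to `(a * b) + a`; its last two symbols are never expressed.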
Regardless of which representation may be better in some specific problem domain, they are all general purpose and can be applied to any domain, as long as you can encode it.
In general, GEP is simpler than GP. Let's say you allow the following nodes in your program: constants, variables, +, -, *, /, if, ...
With GP, for each such node you must implement the following operations:
- randomize
- mutate
- crossover
- and probably other genetic operators as well
In GEP only one operation needs to be implemented for each such node: deserialize, which takes an array of numbers (like double in C or Java) and returns the node. It resembles object deserialization in languages like Java or Python (the difference is that deserialization in those languages uses byte arrays, whereas here we have arrays of numbers). Even this 'deserialize' operation doesn't have to be implemented by the programmer: it can be implemented by a generic algorithm, just as it is in Java or Python deserialization.
This simplicity may, on the one hand, make the search for the best solution less successful, but on the other hand it requires less work from the programmer, and simpler algorithms may execute faster (they are easier to optimize, more code and data fits in the CPU cache, and so on). So I would say that GEP is slightly better, but of course the definitive answer depends on the problem, and for many problems the opposite may be true.
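A minimal sketch of what a generic "deserialize" step of this kind could look like. The details are assumptions for illustration, not taken from the answer: the genome is a flat list of floats in [0, 1), each gene picks a node kind by scaling into a table, a constant consumes one extra number as its value (hypothetical range [0, 10)), the tree is built in prefix order, and a depth limit forces terminals so decoding always terminates. Genetic operators then only ever act on the flat list of numbers.

```python
import operator

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}
KINDS = ["const", "x", "+", "-", "*"]  # first two are terminals

def deserialize(genome, max_depth=4):
    """Build an expression tree from a flat, non-empty list of floats in [0, 1).

    Genes are read left to right, wrapping around if the list runs out.
    """
    if not genome:
        raise ValueError("genome must not be empty")
    pos = 0

    def next_gene():
        nonlocal pos
        value = genome[pos % len(genome)]
        pos += 1
        return value

    def build(depth):
        gene = next_gene()
        # At the depth limit, only terminal kinds are allowed.
        choices = KINDS[:2] if depth >= max_depth else KINDS
        kind = choices[int(gene * len(choices)) % len(choices)]
        if kind == "const":
            return ("const", next_gene() * 10.0)
        if kind == "x":
            return ("x",)
        return (kind, build(depth + 1), build(depth + 1))

    return build(0)

def evaluate(tree, x):
    kind = tree[0]
    if kind == "const":
        return tree[1]
    if kind == "x":
        return x
    return BINARY[kind](evaluate(tree[1], x), evaluate(tree[2], x))
```

For example, `deserialize([0.5, 0.3, 0.0, 0.25])` yields `("+", ("x",), ("const", 2.5))`, i.e. x + 2.5. Mutation and crossover can be ordinary operations on lists of floats, which is the simplicity the answer is pointing at.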