does ANTLR4 build automata at some point? - antlr

As I understand it, the latest ANTLR4 has moved away from building static DFA tables for the lexer and the parser and now builds them at runtime. Is this correct? Could someone please explain in general how ANTLR4 works?

Our recent paper describes the mechanism in excruciating, academic detail. Section 3 provides a high level overview:
Instead of relying on static grammar analysis, an ALL(*) parser adapts to the input sentences presented to it at parse-time. The parser analyzes the current decision point (nonterminal with multiple productions) using a GLR-like mechanism to explore all possible decision paths with respect to the current “call” stack of in-process nonterminals and the remaining input on-demand. The parser incrementally and dynamically builds a lookahead DFA per decision that records a mapping from lookahead sequence to predicted production number. If the DFA constructed to date matches the current lookahead, the parser can skip analysis and immediately expand the predicted alternative.
We use an augmented transition network (ATN) to represent the grammar but build DFA using an algorithm very similar to the subset-construction algorithm of NFA-to-DFA conversion fame.
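For readers who have not seen the subset construction mentioned above, here is a minimal Python sketch of the classic textbook version. It is not ANTLR's code: ANTLR works on an ATN and builds its DFA lazily per decision, whereas this sketch converts a whole toy NFA eagerly. The names `subset_construction`, `dfa_accepts` and the example `NFA` are made up for illustration.

```python
from collections import deque


def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA without epsilon moves into an equivalent DFA.

    nfa maps (state, symbol) -> set of next states.
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    transitions = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of all NFA moves from any state in the current subset.
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            if not target:
                continue  # no move on this symbol: implicit dead state
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return transitions, dfa_start, dfa_accepting


def dfa_accepts(transitions, start, accepting, text):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = start
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Toy NFA for strings over {a, b} that end in "ab" (illustrative only).
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
DFA = subset_construction(NFA, start=0, accepting={2}, alphabet="ab")
```

Calling `dfa_accepts(*DFA, "aab")` walks the generated table; only subsets that are actually reachable are ever created, which is the same idea that makes on-demand construction cheap.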
Hope this helps.


How to determine the number of states of finite automata according to any language?

I have a burning question about finite state machines: how can we know whether a language needs 2 states or 3 states? I mean, is there any formula for that?
I understand that we would always try to minimize the number of states, but how can we determine the number of states required for a given language or string (without actually constructing the DFA)?
You are in effect asking about DFA minimization. It is a well-studied problem for which a number of algorithms have been developed. The Wikipedia article on it is a good starting point.
The theoretical result which governs the number of states is the Myhill-Nerode theorem, but this theorem doesn't give any quick formula. You have to determine the number of equivalence classes in an equivalence relation defined in terms of the language. Hopcroft's algorithm for DFA minimization is essentially an algorithm for determining the equivalence classes in the Myhill-Nerode equivalence relation. I suspect that any attempt to use Myhill-Nerode more directly is going to lead to something similar to Hopcroft's algorithm, though I am not an expert in the field.
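To make the equivalence-class idea concrete, here is a rough Python sketch that counts the states of the minimal DFA by repeatedly splitting blocks of states. Note that this is Moore's simpler partition refinement, not Hopcroft's algorithm, and that it still needs some DFA for the language as input, so it does not avoid constructing one. The function name and the example automaton are assumptions made for illustration.

```python
def minimal_state_count(alphabet, delta, accepting, start):
    """Count the states of the minimal DFA equivalent to a complete DFA.

    delta maps (state, symbol) -> state and must be defined for every pair.
    """
    # Unreachable states never matter, so drop them first.
    reachable = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # Initial partition: accepting versus non-accepting states.
    block = {s: int(s in accepting) for s in reachable}
    count = len(set(block.values()))
    while True:
        ids = {}
        new_block = {}
        for s in reachable:
            # Two states stay together only if they are in the same block
            # and every symbol sends them to the same block.
            sig = (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count  # no block was split: the partition is stable
        block, count = new_block, len(ids)


# Illustrative DFA for "ends in ab" with a redundant state 3 that behaves like 0.
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
    (3, "a"): 1, (3, "b"): 0,
}
N_STATES = minimal_state_count("ab", DELTA, accepting={2}, start=0)
```

The final blocks correspond to the Myhill-Nerode classes of the reachable prefixes, so the returned number is the state count of the minimal complete DFA.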
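The Aho-Corasick multiple-pattern matching algorithm is one case where the count is easy to bound: its automaton is built from a trie of the patterns, so it has at most one state per pattern character plus one for the root.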

How to build short sentences with a small letter set restriction?

I'm looking for a way to write a program that creates short German sentences with a restricted letter set. The sentences can be nonsense but should be grammatically correct. The following examples only contain the letters "aeilmnost":
"Antonia ist mit Tina im Tal."
"Tamina malt mit lila Tinte Enten."
"Tina nimmt alle Tomaten mit."
For this task I need a dictionary like this one (found in the answer to "Where can I find a parsable list of German words?"). The research area for programmatically creating text is NLG - Natural Language Generation. On the NLG-Wiki I found a large table of NLG systems. I picked two from the list which could be appropriate:
SimpleNLG - a Java API, which also has an adaptation for the German language
KOMET - multilingual generation, from the University of Bremen
Have you worked with an NLG library, and do you have some advice on which one to use for building short sentences with a letter set restriction?
Can you recommend a paper on this topic?
Grammatically correct is a pretty fuzzy area, since grammar is not as strictly defined as one might think. What you really want here, though, is a part-of-speech tagger and a Markov chain.
Specifically, a Markov chain says that given a certain state (the first word, for instance) there's a certain chance of moving on to another state (the next word). They are relatively easy to write from scratch, but I've got a gist here in Python that shows how they work if you want an example.
Once you've got that, I would suggest a part-of-speech-based Markov chain, combined with just checking to see if words are constructed from your desired character set. In general the algorithm would go something like this:
1. Pick the first word at random, checking that it is constructed solely from your desired set of characters.
2. Use the Markov chain to predict the next word.
3. Check if that word is an appropriate part of speech, and that it conforms to the desired character set.
4. If not, predict another word until it is the case.
5. If so, then repeat starting at 2 to completion.
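The steps above can be sketched roughly as follows. This is a simplified illustration, not the gist mentioned earlier: the part-of-speech check is left out, the first word is passed in rather than drawn at random, and the tiny corpus (the three example sentences plus one made-up sentence containing a forbidden word) is only there to show the filter working. All function names are hypothetical.

```python
import random
from collections import defaultdict

ALLOWED = set("aeilmnost")


def allowed(word, letters=ALLOWED):
    """True if the word uses only letters from the permitted set."""
    return all(ch in letters for ch in word.lower())


def build_chain(sentences):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, first_word, max_words=6, rng=random):
    """Walk the chain, keeping only successors that pass the letter filter."""
    words = [first_word]
    while len(words) < max_words:
        candidates = [w for w in chain.get(words[-1], []) if allowed(w)]
        if not candidates:
            break  # dead end: no permitted successor was observed
        words.append(rng.choice(candidates))
    return " ".join(words)


CORPUS = [
    "Antonia ist mit Tina im Tal",
    "Tamina malt mit lila Tinte Enten",
    "Tina nimmt alle Tomaten mit",
    "Tina ist im Garten",  # made-up sentence; "Garten" has g and r, so it is filtered
]
chain = build_chain(CORPUS)
sentence = generate(chain, "Tina", rng=random.Random(0))
```

With a corpus this small the output is mostly a reshuffle of the inputs; a real run would need a large German corpus and the part-of-speech check to keep the result grammatical.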
Hope that's what you're looking for. Let me know if you have any more questions.
As Slater Tyranus already said, Markov chains certainly form the basis of this task. I am going to suggest a more heavy-duty approach. It is considerably more work, but is likely to give much better results in terms of grammatical correctness.
Language Model based on PCFG parse trees: A language model works by assigning a probability to a sequence of words. It requires training data, however, in order to be built first. In your case, the training process should disregard words containing letters outside the limited set.
While theoretically a language model based on parse trees is much more likely to serve your purpose, there is one caveat: due to the kind of letter-based restriction you have, data sparsity will certainly raise its ugly head. Backoff techniques (e.g. Katz's backoff model) can help a bit, but it will essentially depend on whether or not you can train on enough data.
As far as readily available parsers are concerned, the Stanford NLP group provides a German parser based on the Negra corpus, as mentioned on their home page.
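As a very rough illustration of the PCFG idea, the Python sketch below samples sentences from a hand-written toy grammar after discarding every production whose words use forbidden letters. In practice the rules and probabilities would be estimated from parsed training data rather than typed in; the grammar, its probabilities and the function names here are all invented for the example, and the sketch assumes every nonterminal keeps at least one production after filtering.

```python
import random

ALLOWED = set("aeilmnost")

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.5, ["Tina"]), (0.3, ["Antonia"]), (0.2, ["Robert"])],
    "VP": [(0.6, ["V", "OBJ"]), (0.4, ["ist", "PP"])],
    "V": [(0.5, ["malt"]), (0.5, ["isst"])],
    "OBJ": [(0.5, ["Enten"]), (0.3, ["Tomaten"]), (0.2, ["Gurken"])],
    "PP": [(0.7, ["im", "Tal"]), (0.3, ["im", "Garten"])],
}


def restrict(grammar, letters=ALLOWED):
    """Drop productions whose terminals use forbidden letters, then renormalise."""
    restricted = {}
    for lhs, productions in grammar.items():
        kept = [
            (p, rhs)
            for p, rhs in productions
            if all(sym in grammar or set(sym.lower()) <= letters for sym in rhs)
        ]
        total = sum(p for p, _ in kept)  # assumed to be non-zero
        restricted[lhs] = [(p / total, rhs) for p, rhs in kept]
    return restricted


def sample(grammar, symbol="S", rng=random):
    """Expand a symbol top-down, choosing productions by their probability."""
    if symbol not in grammar:
        return [symbol]  # terminal word
    weights = [p for p, _ in grammar[symbol]]
    rules = [rhs for _, rhs in grammar[symbol]]
    rhs = rng.choices(rules, weights=weights, k=1)[0]
    return [word for part in rhs for word in sample(grammar, part, rng)]


RESTRICTED = restrict(GRAMMAR)
words = sample(RESTRICTED, rng=random.Random(0))
```

Because the filter removes probability mass, the renormalisation step is where sparsity shows up: the fewer productions survive, the more repetitive the sampled sentences become.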

recursive descent parsing and antlr

I have run across several blogs, such as this one, that use "recursive descent parsing" to mean a handmade parser, as opposed to a parser generator like ANTLR.
To me, "recursive descent parsing" and ANTLR are two different things: one is a general parsing theory while the other is a specific technology. So why does it seem so popular for people to mix them up or compare them?
Recursive descent parsers are a specific subset of top-down parsers (LL). Recursive descent parsers are what programmers typically build by hand because that is the natural expression when building things by hand. Tools can generate all sorts of funny machines. ANTLR's goal for the last 25 years has been to generate what programmers build by hand, which means that it generates recursive descent parsers. Necessarily the generated parsers are more complicated because they are not hand-optimized by a human.
I would assume it's just because handwritten parsers tend to be recursive descent: that form follows closely from the [E]BNF definition and is very easy to verify manually and, if necessary, to debug. Conversely, many parser generators, such as Bison, produce table-driven parsers rather than recursive descent ones (ANTLR is an exception, since its generated parsers are recursive descent).
So you're right that the comparison is, strictly speaking, between an approach to parsing and a tool for parser generation, but somewhere along the way "recursive descent" and "handwritten" have become idiomatic synonyms.
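To show why the form follows so closely from an EBNF definition, here is a small handwritten recursive descent parser in Python for arithmetic expressions. The grammar, the class name `Parser` and the helper `evaluate` are chosen purely for illustration; each grammar rule becomes one method that calls the methods for the rules it references.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):  # expr : term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):  # term : factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):  # factor : NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an expression, rejecting trailing input."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

For example, `evaluate("(1 + 2) * 3")` descends through `expr`, `term` and `factor` exactly as the grammar comments describe, which is what makes this style easy to verify and debug by hand.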

rule based fuzzy control system and function approximation

I am trying to implement a function approximator (aggregation) using a rule-based fuzzy control system. To simplify my implementation (and to understand it better) I am trying to approximate y=x^2 (the simplest non-linear function). As far as I understand, I have to map my input (e.g. uniform samples over [-1,1]) to fuzzy sets (fuzzification) and then use a defuzzification method to obtain crisp values. Is there any simple explanation of this procedure? The fuzzy control system literature is a bit of a mess.
This is sort of a broad question, but I'll give it a go since it has sat unanswered for so long.
First, I believe you need to refine your objective (at least as it is stated here). I would hesitate to use the term "function approximation" in this context. If I follow your question correctly, the objective is to map a non-linear function into another domain via fuzzy methods.
To do so, you first need to define your fuzzy set membership functions. (This link is a good example of the process.) Without additional information, I recommend the triangular function due to its ease of implementation. The number of fuzzy sets, their placement and width (or support), and their degree of overlap are application specific. You've indicated that your input domain is [-1,1], so you might find that three fuzzy sets do the trick, i.e. Negative, Zero, and Positive.
From there, you need to craft a set of rules, i.e. if x is Negative then...
With rules in place, you can then define the defuzzification process. In short, this step weights the activation of each rule according to the needs of the application.
I don't believe I can contribute more fully until the output is better defined. You state "use a defuzzification method to take crisp values." What does this set of crisp values mean? What is its range? Furthermore, you'll get more of a response if you can identify the areas in which you are stuck (i.e. ask more specific questions).
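In the meantime, here is one possible reading of the problem as a minimal Python sketch. It makes several assumptions that the question does not state: triangular membership functions evenly spaced over [-1,1], one rule per fuzzy set with a constant output equal to x^2 at the set's center (a zero-order Sugeno-style rule base), and weighted-average defuzzification. The function names are hypothetical, and inputs are assumed to stay inside [-1,1].

```python
def triangular(x, center, half_width):
    """Triangular membership: 1 at the center, 0 at center +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def make_rules(n_sets, lo=-1.0, hi=1.0, target=lambda x: x * x):
    """One rule per fuzzy set: IF x is near center THEN y = target(center)."""
    step = (hi - lo) / (n_sets - 1)
    centers = [lo + i * step for i in range(n_sets)]
    # Each rule is (center, half_width, constant output).
    return [(c, step, target(c)) for c in centers]


def infer(x, rules):
    """Fuzzify x, fire every rule, and defuzzify by weighted average."""
    activations = [triangular(x, c, w) for c, w, _ in rules]
    total = sum(activations)  # non-zero as long as x lies in [lo, hi]
    return sum(a * y for a, (_, _, y) in zip(activations, rules)) / total


# Three sets (Negative, Zero, Positive) give a coarse fit; more sets refine it.
coarse = make_rules(3)
fine = make_rules(21)
```

With three sets the output is simply |x|, which matches x^2 only at -1, 0 and 1. Increasing the number of sets turns the result into a finer piecewise-linear interpolation of the target.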

graphic imaginary numbers with

Does anyone have experience doing this? When I say imaginary, I mean the square root of negative one. How would I graph this?
Or more specifically,
Complex numbers have many applications. They are useful for storing two properties (the real and imaginary parts) that behave sensibly when you apply standard math operators to them, like multiplication. Many problems become easy to solve by transforming them to the complex number domain, performing an operation that is easy to calculate there, then transforming the result back.
A good example is calculating the behavior of an electronic circuit that has reactive components. The impedance of a coil in the complex domain is jwL, of a capacitor is 1/jwC (w = omega). Driven with a signal in the complex domain, you can easily calculate the response. In this particular case, graphing the response is meaningful by mapping the real part on the X-axis and the imaginary part on the Y-axis. The length of the vector is the amplitude, the angle is the phase.
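As a small illustration of that circuit example, the Python sketch below computes the impedance of a resistor, coil and capacitor in series and then reads the amplitude and phase of the current off the resulting complex number. The series arrangement, the component values and the function names are assumptions made for the example.

```python
import cmath
import math


def series_rlc_impedance(r, l, c, freq_hz):
    """Complex impedance of a resistor, coil and capacitor in series."""
    w = 2 * math.pi * freq_hz  # angular frequency omega
    z_coil = 1j * w * l        # jwL
    z_cap = 1 / (1j * w * c)   # 1/jwC
    return r + z_coil + z_cap


def response(voltage, impedance):
    """Amplitude and phase (degrees) of the current for a drive voltage phasor."""
    current = voltage / impedance
    amplitude, phase = cmath.polar(current)
    return amplitude, math.degrees(phase)


# Illustrative values: 100 ohm, 0.1 H, 1 uF, driven with a 10 V phasor.
R, L, C = 100.0, 0.1, 1e-6
F_RESONANCE = 1 / (2 * math.pi * math.sqrt(L * C))
z = series_rlc_impedance(R, L, C, F_RESONANCE)
amplitude, phase_deg = response(10 + 0j, z)
```

At the resonance frequency the two reactive terms cancel, so the impedance is almost purely real and the phase is close to zero. Plotting `z.real` against `z.imag` while sweeping the frequency gives the kind of graph described above.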
The Laplace transform is another complex domain transformation, based on Euler's identity. It has a very useful graphical representation too: plotting the complex roots (poles) of the characteristic equation allows predicting the stability of a feedback system. A continuous-time system is stable when all poles lie in the left half of the complex plane; the unit circle plays the corresponding role for the discrete-time z-transform.
These kinds of transforms are popular because they simplify the math or because their graphical representations are easy to interpret. Whether yours is equally useful really depends on what the transform does.