Is there a tool for converting an ANTLR grammar to a Bison grammar?
I doubt it. Since ANTLR supports a broader class of grammars than Bison, such a conversion is only even possible for a subset of ANTLR grammars. And from what I've seen, relatively few ANTLR grammars fall into the subset that could be directly converted to Bison.
Related
As the subject asks, does anyone know of an existing TatSu grammar (or at least a PEG-format grammar) for the [g]awk language?
I have already browsed all the existing TatSu examples I could find, and searched extensively around the net for a PEG-format grammar for this language.
Peter
If there's an ANTLR grammar for AWK, you can start with the TatSu g2e converter.
If there is a grammar for AWK in any other grammar language, the shortest route is to write a grammar-to-grammar translator, since grammar languages tend to be small enough to handle with little effort.
Moving a grammar that was originally LR, LL, or LLA to PEG+LEFTREC takes just a little more effort.
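To get a feel for what PEG+LEFTREC means on the TatSu side, here is a minimal sketch of my own (assuming TatSu is installed with pip install tatsu; the toy expression grammar is purely illustrative and has nothing to do with AWK):

    # A directly left-recursive rule in a TatSu (PEG) grammar.
    import tatsu

    GRAMMAR = r'''
        @@grammar :: Expr
        @@left_recursion :: True

        start = expr $ ;

        expr
            = expr '+' term
            | term
            ;

        term = /\d+/ ;
    '''

    model = tatsu.compile(GRAMMAR)
    print(model.parse('1+2+3'))   # a nested AST, roughly ((1 + 2) + 3)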
I want to generate a Swift parser from the grammar which describes FTL syntax.
Is there any tool to make the EBNF -> ANTLR conversion automatically? Or are these two grammar syntaxes at least convertible?
The grammar itself is autogenerated from a set of rules written in JavaScript. The other possible solution would be to update the rules -> EBNF serializer to output ANTLR syntax instead. But I'm a newbie with languages and not sure I could handle that.
Is there any tool to make the EBNF -> ANTLR conversion automatically?
AFAIK, there is no such tool.
Or are these two grammar syntaxes at least convertible?
No. EBNF allows indirect left-recursive rules, which ANTLR does not support (it does support direct left-recursive rules, though).
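To illustrate the distinction with two toy rule sets of my own (written here as plain strings, in ANTLR-style notation):

    # Direct left recursion: the rule refers to itself in the leftmost
    # position. ANTLR 4 rewrites this form automatically.
    DIRECT = '''
    expr : expr '+' term
         | term
         ;
    '''

    # Indirect left recursion: the cycle goes through another rule
    # (a -> b -> a). ANTLR rejects this form.
    INDIRECT = '''
    a : b 'x' ;
    b : a 'y'
      | 'z' ;
    '''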
Some ambiguous grammars can be rewritten as unambiguous ones, for example by using left recursion. Are there grammars that cannot be converted to unambiguous grammars?
There are unambiguous context-free grammars for most practical languages (ignoring context-sensitive features such as variable declarations, whitespace sensitivity, etc.).
But there is no algorithm which can find an unambiguous grammar given an ambiguous grammar. Furthermore, there is not even an algorithm which can tell you for certain whether a given grammar is ambiguous. These are both undecidable problems.
And, to answer your question: yes, there are context-free languages for which there is no unambiguous grammar. Such languages are said to be inherently ambiguous. A classic example is {a^i b^j c^k | i = j or j = k}: any grammar for it must produce two parse trees for some of the strings a^n b^n c^n.
My book gives similar but slightly different explanations of regular grammar and regular language.
I doubt the book is wrong, but is a regular language the same thing as a regular grammar?
The definition in my book is:
A grammar is regular if all the productions are of the form V -> aW or V -> Wa, where V and W are nonterminal symbols and "a" is a terminal symbol. W can also be empty or the same as V.
Regular grammars and regular languages are two different terms:
A language is a (possibly infinite) set of valid sequences of terminal symbols.
A grammar defines which sequences are valid.
The same language can be described by different classes of grammars (regular, context-free, etc.). A language is said to be regular if it can be described by a regular grammar. On the other hand, a regular grammar always defines a regular language. What you have posted is the definition of a regular grammar.
See this Wikipedia article for further information.
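One way to make the correspondence concrete (a small sketch of my own; the grammar and pattern are illustrative):

    # A right-regular grammar and a regular expression describing the same
    # language: binary strings that end in '01'.
    #
    #     S -> 0S | 1S | 0A
    #     A -> 1
    #
    import re

    PATTERN = re.compile(r'^[01]*01$')   # the equivalent regular expression

    for s in ['01', '1101', '10', '011']:
        print(s, bool(PATTERN.match(s)))   # True, True, False, False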
A formal grammar is a set of rules, whereas a formal language is a set of strings.
A regular grammar is a formal grammar that describes a regular language.
According to Wikipedia:
[T]he left regular grammars generate exactly all regular languages. The right regular grammars describe the reverses of all such languages, that is, exactly the regular languages as well.
If mixing of left-regular and right-regular rules is allowed, we still have a linear grammar, but not necessarily a regular one. For example, the grammar with the rules S -> aA, A -> Sb, S -> ε mixes both kinds and generates {a^n b^n | n >= 0}, which is not regular.
In the above, left-regular rules are rules of the form V -> Wa, and right-regular rules are of the form V -> aW.
I think if I explain the difference between a language and a grammar, your questions will resolve themselves.
A language is a set of strings over some alphabet that satisfy certain rules, encoded as a grammar, while
grammars are used to generate languages.
So basically a grammar encodes the syntactic rules for strings, and the set of strings that can be derived from the grammar's start symbol is called the language of the grammar.
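A tiny sketch of my own to make that concrete: the grammar S -> aSb | ε generates the language {a^n b^n | n >= 0}.

    # Enumerate the first few members of the language generated by
    # the grammar S -> 'a' S 'b' | (empty).
    def derive(n):
        """Apply S -> aSb exactly n times, then S -> empty."""
        return 'a' * n + 'b' * n

    print([derive(n) for n in range(4)])   # ['', 'ab', 'aabb', 'aaabbb']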
I've understood how lexical analysis works, but I have no idea how syntactic analysis is done, even though in principle the two should be similar (the only difference lies in the type of their input symbols: characters versus tokens). Yet the generated parser code is greatly different. In particular, there are the yy_action and yy_lookahead tables; there's no such thing in lexical analysis...
The grammars used to generate lexical analyzers are generally regular grammars, while the grammars used to generate syntactic analyzers are generally context-free grammars. Although they might look the same on the surface, they have very different characteristics and capabilities. Regular grammars can be recognized by deterministic finite automata, which are relatively simple to construct and fast to run. Context-free grammars are more challenging to build a recognizer for, and a parser generator tool will usually construct a parser for only a subset of context-free grammars. For example, yacc constructs parsers for context-free grammars that are also LALR(1) grammars, using a push-down automaton.
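To give a feel for what those tables encode, here is a hand-built Python sketch of the table-driven LR parsing loop (the yy_action/yy_lookahead arrays emitted by tools like lemon are a compressed encoding of tables like these; the states below were worked out by hand for a toy grammar, not generated by any tool):

    # Toy grammar:   E -> E '+' 'n'  |  'n'
    # ACTION maps (state, lookahead) to shift/reduce/accept decisions;
    # GOTO maps (state, nonterminal) to the next state after a reduce.
    ACTION = {
        (0, 'n'): ('shift', 2),
        (1, '+'): ('shift', 3),
        (1, '$'): ('accept', None),
        (2, '+'): ('reduce', 0), (2, '$'): ('reduce', 0),  # E -> n
        (3, 'n'): ('shift', 4),
        (4, '+'): ('reduce', 1), (4, '$'): ('reduce', 1),  # E -> E + n
    }
    GOTO = {(0, 'E'): 1}
    RULES = [('E', 1), ('E', 3)]   # (lhs, length of right-hand side)

    def parse(tokens):
        stack = [0]                    # stack of automaton states
        tokens = tokens + ['$']        # end-of-input marker
        pos = 0
        while True:
            op, arg = ACTION.get((stack[-1], tokens[pos]), ('error', None))
            if op == 'shift':          # consume the token, push its state
                stack.append(arg)
                pos += 1
            elif op == 'reduce':       # pop the rule's rhs, push GOTO state
                lhs, length = RULES[arg]
                del stack[-length:]
                stack.append(GOTO[(stack[-1], lhs)])
            elif op == 'accept':
                return True
            else:
                return False

    print(parse(['n', '+', 'n', '+', 'n']))   # True
    print(parse(['n', '+']))                  # False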
For more information on parsing, I would highly recommend Parsing Techniques, which walks through all the nuances of parsing in excruciating (but well described!) detail.