Generate BNF diagrams from an ANTLR grammar?

I may well be asking for something that is not achievable here. Maybe someone can point out either:
(a) what steps or tools would at least partially achieve the creation of BNF diagrams from a (rather complex) ANTLR grammar, or
(b) why (if that is the case) this simply cannot be achieved, e.g. because ANTLR uses extended BNF and its recursive structure differs from what BNF requires, or something along those lines.

ANTLRWorks 1 can generate diagrams, one rule at a time.
For v4, ANTLRWorks 2 also generates them, though I'm not sure it can save them to disk.
Ter

If it is an ANTLR 3 grammar, you could
use http://bottlecaps.de/convert to convert it to W3C notation, then
feed the result to http://bottlecaps.de/rr/ui to generate syntax diagrams.

Grako has an ANTLR3-to-EBNF translator in its examples. You can customize it to the BNF style you require (or to ANTLR4).

Related

Convert W3C's EBNF to ANTLR

I want to generate a Swift parser from the grammar which describes FTL syntax.
Is there any tool to make the EBNF -> ANTLR conversion automatically? Or are these two grammar syntaxes at least convertible?
The grammar itself is autogenerated from a set of rules written in JavaScript. Another possible solution is to update the rules -> EBNF serializer to output ANTLR syntax, but I'm new to language tooling and not sure I could handle it.
Is there any tool to make the EBNF -> ANTLR conversion automatically?
AFAIK, there is no such tool.
Or are these two grammar syntaxes at least convertible?
No. EBNF allows for indirect left recursive rules, which ANTLR does not support (it does support direct left recursive rules though).
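To make the direct/indirect distinction concrete, here is a minimal Python sketch that classifies the nonterminals of a small grammar. The dictionary representation and the function name `left_recursion` are made up for illustration, and the check deliberately ignores nullable prefixes (it only looks at the first symbol of each production), so treat it as a rough aid rather than a complete test.

```python
def left_recursion(grammar):
    """Classify each nonterminal as 'direct', 'indirect', or None.

    grammar maps a nonterminal to a list of productions, each a list of
    symbols. Simplifying assumption: only the first symbol of a
    production is considered (no nullable prefixes).
    """
    # leftmost[A] = nonterminals that can start some production of A
    leftmost = {
        nt: {prod[0] for prod in prods if prod and prod[0] in grammar}
        for nt, prods in grammar.items()
    }
    result = {}
    for nt in grammar:
        if nt in leftmost[nt]:
            result[nt] = "direct"
            continue
        # Depth-first search for a path of leftmost symbols back to nt.
        seen, stack, found = set(), list(leftmost[nt]), False
        while stack:
            cur = stack.pop()
            if cur == nt:
                found = True
                break
            if cur not in seen:
                seen.add(cur)
                stack.extend(leftmost[cur])
        result[nt] = "indirect" if found else None
    return result


grammar = {
    "expr": [["expr", "+", "term"], ["term"]],  # expr starts with expr
    "term": [["NUM"]],
    "a": [["b", "x"], ["y"]],                   # a starts with b ...
    "b": [["a", "z"], ["w"]],                   # ... and b starts with a
}
print(left_recursion(grammar))
```

Here `expr` is the directly left-recursive kind, while `a` and `b` only reach themselves through each other, which is the indirect case.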

Is ANTLR available on QBasic?

I have just started studying compiler design, and I have a small task: to write a grammar for QBasic. But ANTLR has only a few target languages. Is this possible for QBasic? Could anyone please explain?
The biggest repository of ANTLR grammars I know of in one place is at this GitHub page. And it doesn't look like QBasic is among them.
I've written a grammar/interpreter or three with BASIC-like syntax with domain-specific extensions (and no line numbers!) but it doesn't look like anyone has undertaken QBasic in ANTLR4, not publicly at least.

What is a good tool for automatically calculating FIRST and FOLLOW sets?

I'm currently in the middle of playing with a BNF grammar that I hope to be able to wrangle into an LL(1) form. However, I've just finished making changes and calculating the new FIRST and FOLLOW sets for the grammar by hand for the third time today and I'm getting tired of it. There has to be a better way!
Can someone suggest a tool that, given a grammar, will automatically calculate the FIRST and FOLLOW sets for all of the nonterminals?
A year ago, we had a semester project at the university I attend, where our task was to create a programming language. As a group, we decided we wanted to be able to hand-write the parser from scratch, so we had to aim for an LL(1) grammar, since it would have been completely unrealistic to write a parser otherwise.
Of course, our starting point was far from being LL(1), so we too had to wrangle it into place. For that purpose, we used the kfgEdit tool from the AtoCC package. All you do is enter your rules, and then it can check if it's an LL(1) grammar at the click of a button.
A fair word of warning: The tool is a bit finicky about what it accepts. While you'd often use EBNF for the real grammar, so you can write ? and * and + to signal how many times that token must appear there, this is not supported. Grouping is also not supported. You may very well find that it takes a very long time to do this, and you will almost surely want to do some "rearranging" after you've reached LL(1) to make the grammar even close to being readable.
Of course, depending on the type of grammar you're dealing with, this may not be much of a problem for you. We had created a sort of Pascal/C hybrid, with a fairly restricted set of constructs (procedures, functions, only built-in primitive types and arrays of them, ifs, a single loop construct we'd come up with ourselves in place of the standard 3...), and it took me at least a week to wrangle it into an LL(1) grammar - probably 2, actually. Note that this is out of a total of about 4 months, so that was a lot of time spent there.
If you absolutely MUST have an LL(1) grammar, then you obviously will need to press on if you get into a situation like this, but if you're allowed to use parser generators like yacc/bison or SableCC then you will, in the long run, most likely find it a LOT easier to go down that route. That doesn't mean you SHOULD go down that route - I found that actually writing everything by hand provided some insight I probably wouldn't have gained otherwise - but it might be better for you to gain that insight in a different situation than your current one.
tl;dr version: Use kfgEdit from the AtoCC package.
For recursive descent parsing, it would be worth looking at ANTLR. However, I'm not sure it provides an exact answer to your question, which is finding the FIRST and FOLLOW sets for a given grammar.
The DMS Software Reengineering Toolkit has a parser generator that computes FIRST and FOLLOW sets; it will also let you inspect the L(AL)R state machine it generates.
However, if you have a legitimate context-free grammar, you don't have to "wrangle it" into LL shape; the DMS parser generator produces GLR parsers from any context-free grammar.
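For small grammars, the FIRST and FOLLOW sets asked about in the question can also be computed with a short script. The following is a minimal sketch of the textbook fixed-point iteration, not the algorithm of any of the tools mentioned above. The grammar representation (a dict of nonterminal to list of productions, with `[]` for an empty production) and the names `first_follow`, `EPS` and `END` are assumptions chosen for illustration.

```python
EPS = "eps"  # marker for the empty string (hypothetical name)
END = "$"    # end-of-input marker


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        # A symbol without an entry in `first` is a terminal: FIRST is itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the sequence is empty
    return result


def first_follow(grammar, start):
    """Iterate until neither the FIRST nor the FOLLOW sets change."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                new_first = first_of_sequence(prod, first)
                if not new_first <= first[nt]:
                    first[nt] |= new_first
                    changed = True
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    tail = first_of_sequence(prod[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:
                        new_follow |= follow[nt]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


# E -> T E' ;  E' -> + T E' | (empty) ;  T -> id | ( E )
grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["id"], ["(", "E", ")"]],
}
first, follow = first_follow(grammar, "E")
for nt in grammar:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

Note that this works on plain BNF only; EBNF operators such as ?, * and + would have to be expanded into extra rules first.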

Is it easier to write a recursive-descent parser using an EBNF or a BNF?

I've got a BNF and EBNF for a grammar. The BNF is obviously more verbose. I have a fairly good idea as far as using the BNF to build a recursive-descent parser; there are many resources for this. I am having trouble finding resources to convert an EBNF to a recursive-descent parser. Is this because it's more difficult? I recall from my CS theory classes that we went over EBNFs, but we didn't go over converting them into a recursive-descent parser. We did go over converting BNF's into a recursive-descent parser.
The reason I'm asking is because the EBNF is more compact.
From looking at the EBNF's in general, I notice that terms enclosed between { and } can be converted into a while loop. Are there any other guidelines or rules?
You should investigate so-called metacompilers, which essentially compile EBNF into recursive descent parsers. How they do it is exactly the answer to your question.
(It's pretty straightforward, but good to understand the details.)
A really wonderful paper is the "MetaII" paper by Val Schorre. This is metacompiler technology from honest-to-God 1964. In 10 pages, he shows you how to build a metacompiler, and provides not just that, but another compiler too and the output of both! There's an astonishing moment that you come to if you go build one of these, where you realize how the metacompiler compiles itself using its own grammar. This moment got me hooked on compilers back in about 1970 when I first tripped over this paper. This is one of those computer science papers that everybody in the software business should read.
James Neighbors (the inventor of the term "domain" in software engineering, and builder of the first program transformation system [based on these metacompilers]) has a great online MetaII tutorial, for those of you that don't want the do-it-from-scratch experience. (I have nothing to do with this except that Neighbors and I were undergraduates together.)
Both ways are a fine way to learn about metacompilers and generating parsers from EBNF.
The key ideas are that the left hand side of a rule creates a function that parses that nonterminal and returns true if match and advances the input stream; false if no match and the input stream doesn't advance.
The contents of the function is determined by the right hand side. Literal tokens are matched directly.
Nonterminals cause calls to other functions generated for the other rules.
Kleene* maps to while loops, alternations map to conditional branches. What EBNF doesn't address,
and the metacompilers do, is how parsing does anything other than say "matched" or not.
The secret is weaving output operations into the EBNF. The MetaII paper makes all this crystal clear.
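The ideas above can be sketched in a few lines of Python. This is not MetaII itself, just a hand-written illustration of what such a tool would generate for the toy grammar `expr = term { '+' term }` and `term = NUMBER | '(' expr ')'`. The class name, the tokenizer, and the `PUSH`/`ADD` output instructions are all invented for the example, and the tokenizer simply skips characters it doesn't recognize.

```python
import re


class Parser:
    def __init__(self, text):
        # Crude tokenizer: numbers, '+', '(' and ')'; anything else is skipped.
        self.tokens = re.findall(r"\d+|[+()]", text)
        self.pos = 0
        self.out = []  # output operations woven into the rules

    def literal(self, tok):
        """Match a literal token; advance the input only on success."""
        if self.pos < len(self.tokens) and self.tokens[self.pos] == tok:
            self.pos += 1
            return True
        return False

    def number(self):
        """Match a NUMBER token and emit an output operation for it."""
        if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
            self.out.append("PUSH " + self.tokens[self.pos])
            self.pos += 1
            return True
        return False

    def term(self):
        # term = NUMBER | '(' expr ')'     alternation -> conditional branches
        if self.number():
            return True
        if self.literal("("):
            # No backtracking here: once '(' matched, the rest must follow.
            if self.expr() and self.literal(")"):
                return True
            raise SyntaxError("expected expression and ')' after '('")
        return False

    def expr(self):
        # expr = term { '+' term }         repetition -> while loop
        if not self.term():
            return False
        while self.literal("+"):
            if not self.term():
                raise SyntaxError("expected term after '+'")
            self.out.append("ADD")  # output woven in after the operands
        return True


p = Parser("1 + (2 + 3)")
print(p.expr(), p.out)
```

Each rule is one function that returns True and advances on a match, or returns False and leaves the input alone; the appended instructions are what turns the recognizer into a (very small) compiler to postfix code.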
Neither is harder than the other. It is really the difference between implementing something iteratively and implementing something recursively. In BNF, everything is recursive. In EBNF, some of the recursion is expressed iteratively. There are different variations in EBNF syntax, so I'll just use the English... "zero or more" is a simple while loop as you have discovered. "One or more" is the same as one followed by "zero or more". "Zero or one times" is a simple if statement. That should cover most of the cases.
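As a rough illustration of those three cases, here is a small character-level recognizer in Python. The grammar (`digits`, `number`, `list`) and the function names are made up for the example; each function returns the index just past the match, or None if there is no match.

```python
def parse_digits(s, i):
    """digits = digit digit*   ("one or more" = one, then "zero or more")"""
    if i >= len(s) or not s[i].isdigit():  # the mandatory first digit
        return None
    i += 1
    while i < len(s) and s[i].isdigit():   # zero or more: a while loop
        i += 1
    return i


def parse_number(s, i):
    """number = [ '-' ] digits [ '.' digits ]"""
    if i < len(s) and s[i] == "-":         # zero or one: a plain if
        i += 1
    i = parse_digits(s, i)
    if i is None:
        return None
    if i < len(s) and s[i] == ".":         # optional fraction: another if
        i = parse_digits(s, i + 1)         # once '.' is seen, digits must follow
    return i


def parse_list(s, i=0):
    """list = number { ',' number }"""
    i = parse_number(s, i)
    while i is not None and i < len(s) and s[i] == ",":
        i = parse_number(s, i + 1)
    return i


print(parse_list("1,-2.5,30"))
print(parse_list("1,,2"))
```

The BNF version of `list` would instead call itself recursively for the tail; the EBNF braces just let you write that tail as a loop.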
The early metacompilers META II and TREE-META and their kin are not exactly recursive descent parsers. They were described as using recursive functions. That just meant the functions could call themselves.
We do not call C a recursive language. A C or C++ function is recursive in the same way the early meta compilers are recursive.
Recursion can be used; they were programming languages. Recursion is generally used only when analyzing nested language constructs, for example parenthesized expressions and nested blocks.
They are more of an LR/recursive descent combination. CWIC, the last documented one, has extensive backtracking and look-ahead features. The '-' (not) operator can match any language construct and inverts its success or failure: -term fails if a term is matched, for example. The input is never advanced. The '?' operator looks ahead and matches any language construct: ?expr, for example, would try to parse an expr. The construct matched by the look-ahead '?' is not kept, nor is the input advanced.
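The behaviour of those two operators can be mimicked with a pair of small wrappers. This is a Python sketch of the semantics as described above, not CWIC syntax or its implementation; `Input`, `literal`, `look_ahead` and `negate` are hypothetical names.

```python
class Input:
    """Input text plus a cursor; rules advance the cursor on success."""

    def __init__(self, text):
        self.text = text
        self.pos = 0


def literal(s):
    """Build a rule that matches the string s at the current position."""
    def rule(inp):
        if inp.text.startswith(s, inp.pos):
            inp.pos += len(s)
            return True
        return False
    return rule


def look_ahead(rule):
    """Like '?rule': succeed if rule matches here, but consume nothing."""
    def wrapped(inp):
        saved = inp.pos
        ok = rule(inp)
        inp.pos = saved      # restore the cursor whether or not it matched
        return ok
    return wrapped


def negate(rule):
    """Like '-rule': succeed only if rule does NOT match; consume nothing."""
    def wrapped(inp):
        saved = inp.pos
        ok = rule(inp)
        inp.pos = saved
        return not ok
    return wrapped


inp = Input("begin x")
print(look_ahead(literal("begin"))(inp), inp.pos)  # matched, cursor unchanged
print(negate(literal("begin"))(inp), inp.pos)      # inverted result
print(literal("begin")(inp), inp.pos)              # a normal match advances
```

Because both wrappers take an arbitrary rule, the wrapped construct can be as complex as a whole expression, which is the point made about '-' and '?' matching any language construct.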

What are the disadvantages of using ANTLR compared to Flex/Bison?

I worked with Flex and Bison a few years ago during my undergraduate studies. However, I don't remember much about them now. Recently, I have come to hear about ANTLR.
Would you recommend that I learn ANTLR, or would it be better to brush up on Flex/Bison?
Does ANTLR have more/less features than Flex/Bison?
ANTLRv3 is LL(k), and can be configured to be LL(*). The latter in particular is ridiculously easy to write parsers in, as you can essentially use EBNF as-is.
Also, ANTLR generates code that is quite a lot like recursive descent parser you'd write from scratch. It's very readable and easy to debug to see why the parse doesn't work, or works wrong.
The advantage of Flex/Bison (or any other LALR parser) is that it's faster.
ANTLR has a run-time library JAR that you must include in your project.
ANTLR's recursive-descent parsers are easier to debug than the "bottom-up" parsers generated by Flex/Bison, but the grammar rules are slightly different.
If you want a parser generator for Java in the Flex/Bison tradition, look at JavaCC (note that JavaCC generates top-down LL(k) parsers, not LALR ones).
We have decided to use ANTLR for some of our information processing requirements - parsing legacy files and natural language. The learning curve is steep but we are getting on top of it and I feel that it's a more modern and versatile approach for what we need to do. The disadvantages - since you ask - are mainly the learning curve, which seems to be inevitable.