I'm trying to use a composite grammar with ANTLR 3.1 and ANTLRWorks 1.4.2. When I put the import statement in, it says 'undefined import'. I've tried a number of different combinations of lexer grammar and parser grammar but can't get it to generate the code. Am I missing something obvious? An example is below.
grammar Tokens;
TOKEN : 'token';
grammar Parser;
import Tokens;//gives undefined import error
rule : TOKEN+;
I'm referencing the documentation from
http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars
Thanks
When separating lexer and parser grammars, you need to explicitly define what type of grammar each one is.
Try:
parser grammar Parser;
import Tokens;
rule : TOKEN+;
and:
lexer grammar Tokens;
TOKEN : 'token';
Note that from a combined grammar file Foo.g, the generated lexer and parser get a Lexer and Parser suffix by default: FooLexer.java and FooParser.java respectively. But in "explicit" grammars, the name of the .java file is that of the grammar itself: Parser.java and Tokens.java in your case. You might want to watch out when calling a class Parser, since that is the name of ANTLR's base parser class:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_parser.html
Also be sure to place the import statement below the options { ... } section but before any tokens { ... } section you may have defined; otherwise you might get strange errors.
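For example, a minimal sketch of that ordering (the output=AST option and the IMAGINARY token are only there for illustration, not something your grammar needs):

parser grammar Parser;

options {
  output=AST; // illustrative; the point is that options come before the import
}

import Tokens;

tokens {
  IMAGINARY; // imaginary tokens, if any, come after the import
}

rule : TOKEN+;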
Argghh, it was something stupid. ANTLRWorks will underline the import and highlight all the tokens as undefined syntax errors, but it will still let you generate the code if you try!
The reason it wasn't working the first time was that the import was above the options section; moving it below, as per Bart's suggestion, fixed it.
I use antlr4 with javascript target.
Here is a sample grammar:
P : T ;
T : [a-z]+ {console.log(this.text);} ;
start: P ;
When I run the generated parser, nothing is printed, although the input is matched. If I move the action to the token P, then it gets invoked. Why is that?
Actions are ignored in referenced rules. This was the original behavior of ANTLR 4, back when the lexer only supported a single action per token (and that action had to appear at the end of the token).
Several releases later the limitation of one-action-per-rule was lifted, allowing any number of actions to be executed for a token. However, we found that many existing users relied on the original behavior, and wrote their grammars assuming that actions in referenced rules were ignored. Many of these grammars used complicated logic in these rules, so changing the behavior would be a severe breaking change that would prevent people from using new versions of ANTLR 4.
Rather than break so many existing ANTLR 4 lexers, we decided to preserve the original behavior and only execute actions that appear in the same rule as the matched token. Newer versions do allow you to place multiple actions in each rule though.
tl;dr: We considered allowing actions in other rules to execute, but decided not to because it would break a lot of grammars already written and used by people.
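As a sketch of the workaround the question already hints at, the action has to live in the rule of the token that is actually emitted; something like the following should print the text (the grammar name Sample and the surrounding rules are only illustrative):

grammar Sample;

start : P ;

// The action is now in P, the rule whose token is actually emitted, so it runs.
P : T {console.log(this.text);} ;

// Any action placed here would be ignored whenever T is only referenced by P.
T : [a-z]+ ;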
I found that @init and @after actions will override this default behavior.
Change the example code to:
grammar Test;

ALPHA : [a-z]+ ;

p : t ;

t
@init {
    console.log(this.text);
}
@after {
    console.log(this.text);
}
    : ALPHA ;

start : p ;
I changed the parser rules to lower case because my Eclipse tool was complaining about the syntax otherwise. I also had to introduce the ALPHA lexer rule for [a-z]+ for the same reason. The above grammar compiled, but I haven't tried running the generated parser. However, I am successfully working around this issue with @init/@after in my larger parser.
Hope this is helpful.
I just started with ANTLR, and I am using 4.2. My guess was that it would work like ANTLR 3 in the basics, so I followed the accepted answer of this question. (But instead of Exp I used Java, meaning I want to parse Java.) Everything was fine until I tried to compile the ANTLRDemo.java example.
When I compile that, I get 4 errors:
ANTLRStringStream in = new ANTLRStringStream("some random text");
JavaLexer lexer = new JavaLexer(in);
first error:

constructor JavaLexer in class JavaLexer cannot be applied to given types;
    JavaLexer lexer = new JavaLexer(in);
  required: CharStream
  found: ANTLRStringStream
  reason: actual argument ANTLRStringStream cannot be converted to CharStream by method invocation conversion

(I know what this error is ;-)
CommonTokenStream tokens = new CommonTokenStream( lexer);
JavaParser parser = new JavaParser(tokens);
System.out.println(parser.eval());
To make it short, let's say every line has its own similar error. For example, "parser" does not have an "eval()" method.
What am I missing? I guess ANTLR 4 does not run like ANTLR 3. Any ideas? Please consider my beginner status.
In ANTLR 4, use ANTLRInputStream instead of the old ANTLRStringStream from ANTLR 3.
The eval() method exists when you have a parser rule in the grammar named eval. One such method is created for each rule in the grammar. If you do not intend to start parsing at rule eval, then you should replace that call with the name of the start rule for your particular grammar.
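A minimal sketch of what the corrected driver could look like, using the JavaLexer/JavaParser classes from the question; the rule name compilationUnit is only an assumption, so substitute whatever your grammar's actual start rule is called:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class ANTLRDemo {
    public static void main(String[] args) {
        // ANTLR 4 replacement for ANTLR 3's ANTLRStringStream
        ANTLRInputStream in = new ANTLRInputStream("some random text");
        JavaLexer lexer = new JavaLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        // There is no eval() unless the grammar defines a rule called eval;
        // compilationUnit is an assumed start rule name for a Java grammar.
        System.out.println(parser.compilationUnit().toStringTree(parser));
    }
}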
I'm creating my first grammar with ANTLR and ANTLRWorks 2. I have mostly finished the grammar itself (it recognizes the code written in the described language and builds correct parse trees), but I haven't started anything beyond that.
What worries me is that every first occurrence of a token in a parser rule is underlined with a yellow squiggle saying "Implicit token definition in parser rule".
For example, in this rule, the 'var' has that squiggle:
variableDeclaration: 'var' IDENTIFIER ('=' expression)?;
The odd thing is that ANTLR itself doesn't seem to mind these rules (when running the test rig, I can't see any of these warnings in the parser generator output, just something about an incorrect Java version being installed on my machine), so it's just ANTLRWorks complaining.
Is it something to worry about, or should I ignore these warnings? Should I declare all the tokens explicitly in lexer rules? Most examples in the official bible, The Definitive ANTLR Reference, seem to be done exactly the way I write the code.
I highly recommend correcting all instances of this warning in code of any importance.
This warning was created (by me actually) to alert you to situations like the following:
shiftExpr : ID (('<<' | '>>') ID)?;
Since ANTLR 4 encourages action code to be written in separate files in the target language instead of embedded directly in the grammar, it's important to be able to distinguish between << and >>. If tokens are not explicitly defined for these operators, they will be assigned arbitrary types and no named constants will be available for referencing them.
This warning also helps avoid the following problematic situations:
A parser rule contains a misspelled token reference. Without the warning, this could lead to silent creation of an additional token that may never be matched.
A parser rule contains an unintentional token reference, such as the following:
number : zero | INTEGER;
zero : '0'; // <-- this implicit definition causes 0 to get its own token
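A minimal sketch of the explicit-definition fix, assuming ID and INTEGER lexer rules already exist elsewhere in the grammar (the token names LSHIFT, RSHIFT and ZERO are only illustrative):

shiftExpr : ID ((LSHIFT | RSHIFT) ID)? ;

number : zero | INTEGER ;
zero   : ZERO ;

LSHIFT : '<<' ;
RSHIFT : '>>' ;
ZERO   : '0' ;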
If you're writing a lexer grammar that won't be used across multiple parser grammars, then you can ignore this warning shown by ANTLRWorks 2.
Let's suppose I have two grammars (and that there is a Lexer defined somewhere), ParserA and ParserB.
In ParserA I have the following code:
parser grammar ParserA;
classDeclaration
scope {
ST mList;
}
...
ParserB is something like:
parser grammar ParserB;
import ParserA;
methodDeclaration : something something { $classDeclaration::mList.add(...) };
The code in the action will fail to compile (by javac) since classDeclaration is in a different class (and file). Any tips on how to fix it?
Any tips on how to fix it?
No, there's (AFAIK) no ANTLR shortcut here: there's no communication possible between imported grammars (either by using scopes or by providing parameters to imported grammar rules).
I have an ANTLR-generated Java parser that uses the C target, and it works quite well. The problem is that I also want it to parse erroneous code and produce a meaningful AST. If I feed it a minimal Java class with one import after which a semicolon is missing, it produces two "Tree Error Node" objects where the "import" token and the tokens for the imported class should be.
But since it parses the following code correctly and produces the correct nodes for this code it must recover from the error by adding the semicolon or by resyncing. Is there a way to make antlr reflect this fixed input it produces internally in the AST? Or can I at least get the tokens/text that produced the "Tree Node Errors" somehow?
In the C target's antlr3commontreeadaptor.c, around line 200, the following fragment indicates that the C target only creates dummy error nodes so far:
static pANTLR3_BASE_TREE
errorNode (pANTLR3_BASE_TREE_ADAPTOR adaptor, pANTLR3_TOKEN_STREAM ctnstream, pANTLR3_COMMON_TOKEN startToken, pANTLR3_COMMON_TOKEN stopToken, pANTLR3_EXCEPTION e)
{
    // Use the supplied common tree node stream to get another tree from the factory
    // TODO: Look at creating the erronode as in Java, but this is complicated by the
    //       need to track and free the memory allocated to it, so for now, we just
    //       want something in the tree that isn't a NULL pointer.
    //
    return adaptor->createTypeText(adaptor, ANTLR3_TOKEN_INVALID, (pANTLR3_UINT8)"Tree Error Node");
}
Am I out of luck here, and would only the error nodes produced by the Java target allow me to retrieve the text of the erroneous nodes?
I haven't used ANTLR much, but typically the way you handle this type of error is to add rules for matching wrong syntax, make them produce error nodes, and try to fix up after errors so that you can keep parsing. Fixing up afterwards is the hard part, because you don't want one error to trigger more and more errors for each new token until the end.
I solved the problem by adding new alternate rules to the grammar for all possible erroneous statements.
Each Java import statement gets translated to an AST subtree with the artificial symbol IMPORT as the root, for example. To make sure that I can differentiate between ASTs from correct and erroneous code, the rules for the erroneous statements rewrite them to an AST whose root symbol carries the prefix ERR_, so in the example of the import statement the artificial root symbol would be ERR_IMPORT.
More different root symbols could be used to encode more detailed information about the parse error.
My parser is now as error tolerant as I need it to be, and it's very easy to add rules for new kinds of erroneous input whenever I need to. You have to watch out not to introduce any ambiguities into your grammar, though.
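A minimal sketch of what such an error alternative could look like in ANTLR 3 rewrite syntax, assuming output=AST is set and a qualifiedName rule exists elsewhere; IMPORT, ERR_IMPORT and the rule shapes are illustrative rather than the poster's actual grammar:

tokens {
  IMPORT;
  ERR_IMPORT;
}

importDeclaration
  : 'import' qualifiedName ';' -> ^(IMPORT qualifiedName)     // well-formed import
  | 'import' qualifiedName     -> ^(ERR_IMPORT qualifiedName) // missing ';': marked as an error subtree
  ;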