Did "!", "^" and "$" have a special meaning in ANTLR3? - antlr

I don't have any prior knowledge of ANTLR (I only recently learned a little about ANTLR4), but I have to translate an old grammar to a newer version. Eclipse tells me that there are no viable alternatives for those characters and shows the syntax error " '!' came as a complete surprise to me".
I already deleted those characters and it does not seem to cause a problem, but maybe they had a special function in ANTLR3.
Thanks in advance.
global_block:
DATABASE! IDENTIFIER!
| GLOBALS! define_section!+ END! GLOBALS!
| GLOBALS! STRING!
;
main_block: MAIN sequence? END em=MAIN
-> ^(MAIN MAIN '(' ')' sequence? $em)
;

!, ^ and -> are all related to AST construction and tree rewriting: https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687090/Tree+construction
ANTLR4 does not support it (v4 has listeners and visitors for tree traversal, but no rewriting anymore). Just remove all of these ! and -> ... in parser rules (do not remove the -> ... inside lexer rules like -> channel(...), which is still supported in v4).
So in your case, these rules would be valid in ANTLR4:
global_block
: DATABASE IDENTIFIER
| GLOBALS define_section+ END GLOBALS
| GLOBALS STRING
;
main_block
: MAIN sequence? END MAIN
;
The $ can still be used in ANTLR4: it references labeled sub-rules or tokens:
expression
: lhs=expression operator=(PLUS | MINUS) rhs=expression
| NUMBER
;
so that in an embedded code block you can write: $lhs.someField.someMethod(). In your case, you can also just remove the labels (such as em=), because they are probably only used in the tree rewrite rules.
EDIT
kaby76 has a Github page with some instructions for converting grammars to ANTLR4: https://github.com/kaby76/AntlrVSIX/blob/master/doc/Import.md#antlr3
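As a rough illustration of the manual cleanup described above, here is a minimal Python sketch that strips the v3 operators from the text of a single parser rule. The function name strip_v3_tree_ops is made up for this example, and the regexes are only a heuristic, not a real grammar converter: they assume the rule contains no lexer commands, no embedded actions, no '!' or '^' inside quoted literals, and no quoted '|' or ';' inside a rewrite.

```python
import re


def strip_v3_tree_ops(rule_text: str) -> str:
    """Remove ANTLR3 AST operators from one parser rule (heuristic sketch).

    Hypothetical helper, not part of any ANTLR tooling.
    Do NOT run it on lexer rules: '-> channel(HIDDEN)' and friends
    are still valid in ANTLR4 and would be deleted as well.
    """
    # 1. Drop rewrite clauses: everything from '->' up to the next
    #    alternative separator '|' or the rule terminator ';'.
    out = re.sub(r"->[^|;]*", "", rule_text)
    # 2. Drop the '!' and '^' suffix operators when they directly follow
    #    a rule/token name, a closing parenthesis or a quoted literal.
    out = re.sub(r"(?<=[\w)'])[!^]", "", out)
    # 3. Collapse runs of spaces/tabs left behind by the removals.
    out = re.sub(r"[ \t]+", " ", out)
    return out.strip()


if __name__ == "__main__":
    print(strip_v3_tree_ops("DATABASE! IDENTIFIER!"))
    print(strip_v3_tree_ops("orExpression : andExpression ( OR^ andExpression )* ;"))
```

Labels such as em= are left alone, since they are still valid in ANTLR4; whether you keep them is a separate decision.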

Related

Upgrading Grammar file to Antlr4

I am upgrading my ANTLR grammar file to the latest ANTLR4.
I have converted most of the file but am stuck on some syntax differences that I can't figure out. The three differences are:
equationset: equation* EOF!;
equation: variable ASSIGN expression -> ^(EQUATION variable expression)
;
orExpression
: andExpression ( OR^ andExpression )*
;
In the first one, the error is due to !. I am not sure whether EOF and EOF! are the same. Removing ! resolves the error, but I want to be sure that is the correct fix.
In the 2nd rule, -> and ^ give errors. I am not sure what the ANTLR4 equivalent is.
In the 3rd rule, ^ gives an error. Removing it fixes the error, but I can't find any migration guide that explains what the equivalent should be.
Can you please give me the ANTLR4 equivalent of these 3 rules and briefly explain the difference? A pointer to any other resource where I can find the answer is fine as well.
Thanks in advance.
Many of the ANTLR3 grammars contain syntax tree manipulations which are no longer supported with ANTLR4 (now we get a parse tree instead of a syntax tree). What you see here is exactly that.
EOF! means EOF should be matched but not appear in the AST. Since there is no AST anymore you cannot change that, so remove the exclamation mark.
The construct -> ^(EQUATION variable expression) rewrites the AST created by the equation rule. Since there is no AST anymore you cannot change that, so remove that part.
Finally, OR^ specifies that the OR operator should become the root of the generated AST. Since there is no AST anymore ..., you get the point now :-)
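To make the OR^ case concrete: in v3, andExpression ( OR^ andExpression )* on input a OR b OR c builds the tree ^(OR ^(OR a b) c). In v4 the parse tree just holds the children in order, and if you need such a structure you build it yourself, typically in a visitor. Here is a minimal Python sketch of that folding step. It uses plain strings and tuples instead of real ANTLR context objects, and the function name and tuple layout are assumptions for illustration only.

```python
def fold_left_operators(children):
    """Fold a flat [operand, op, operand, op, operand, ...] list into a
    left-nested (op, left, right) tuple, mimicking what OR^ produced in v3.

    `children` stands in for the child list of an ANTLR4 parse-tree node;
    this is a hypothetical helper, not an ANTLR API.
    """
    if len(children) % 2 == 0:
        raise ValueError("expected operand (op operand)*")
    root = children[0]
    # Each operator becomes the new root, with the tree built so far
    # as its left child and the next operand as its right child.
    for i in range(1, len(children), 2):
        operator, operand = children[i], children[i + 1]
        root = (operator, root, operand)
    return root


if __name__ == "__main__":
    print(fold_left_operators(["a", "OR", "b", "OR", "c"]))
```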

Antlr Arrow Syntax

I found this syntax in an Antlr parser for bash:
file_descriptor
: DIGIT -> ^(FILE_DESCRIPTOR DIGIT)
| DIGIT MINUS -> ^(FILE_DESCRIPTOR_MOVE DIGIT);
What does the -> syntax do?
What is it called such that I can google it to read about it?
The 'Definitive Guide to Antlr4' only has one page about it. It refers to "lexer command", but it never names the operator. The usage in the book differs from the usage in the bash parser.
In ANTLR3, -> is used in parser rules and signifies a tree rewrite rule, which is no longer supported in ANTLR4.
In ANTLR4, the -> is used in lexer rules and has nothing to do with the old v3 functionality.

ANTLR error(99) grammar has no rules

I previously posted about my first attempt at using ANTLR when I was having issues with left recursion.
Now that I have resolved those issues, I am getting the following error when I try to use org.antlr.v4.Tool to generate the code:
error(99): C:test.g4::: grammar 'test' has no rules
What are the possible reasons for this error? Using ANTLRWorks I can certainly see rules in the Parse Tree so why can't it see them? Is it because it cannot find a suitable START rule?
I think ANTLR expects the first rule name to start with a lowercase letter. I was getting the same error with my grammar:
grammar ee;
Condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
Changing the first production rule in the grammar to lowercase solved the issue, i.e. the grammar below worked for me.
grammar ee;
condition : LogicalExpression ;
LogicalExpression : BooleanLiteral ;
BooleanLiteral : True ;
True : 'true' ;
Note: It is my personal interpretation, I couldn't find this reasoning in the online documentation.
Edit: The production rules should begin with lower case letters as specified in the latest docs [1]
[1] https://github.com/antlr/antlr4/blob/master/doc/lexicon.md#identifiers
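The naming convention from the docs can be checked mechanically before running the tool. Below is a minimal Python sketch that lists rule names in a grammar and sorts them by the case of their first letter. The function name classify_rules and the regex are my own assumptions, not anything ANTLR ships; the regex only recognizes rules written as "name :" at the start of a line, so rules with "returns [...]" clauses or a colon on the following line are missed.

```python
import re

# Matches 'name :' (optionally preceded by 'fragment') at the start of a line.
RULE_HEAD = re.compile(r"^\s*(?:fragment\s+)?([A-Za-z_]\w*)\s*:", re.MULTILINE)


def classify_rules(grammar_text):
    """Return (parser_rules, lexer_rules) based on the first letter's case.

    Heuristic sketch: ANTLR4 treats names starting with an uppercase
    letter as lexer rules and names starting with lowercase as parser rules.
    """
    parser_rules, lexer_rules = [], []
    for name in RULE_HEAD.findall(grammar_text):
        if name[0].isupper():
            lexer_rules.append(name)
        else:
            parser_rules.append(name)
    return parser_rules, lexer_rules


if __name__ == "__main__":
    broken = "grammar ee;\nCondition : LogicalExpression ;\nTrue : 'true' ;\n"
    parser_rules, lexer_rules = classify_rules(broken)
    if not parser_rules:
        print("no parser rules found; lexer rules:", lexer_rules)
```

If the parser list comes back empty, every rule is being read as a lexer rule, which is consistent with the situation described in this answer.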
I'm not sure if you've found the solution for this, but I had the same problem and fixed it by changing my start symbol to 'prog'. So for example, the first two lines of your .g4 file would be:
grammar test;
prog : <...> ;
Where <...> will be your first derivation.
I just got that error as well (ANTLRWorks 2.1).
Switching from
RULE : THIS | THAT ; to rule : this | that ; for parser rules (i.e. from uppercase to lowercase) solved the problem!
EDIT
The above correction applies only to RULE; what follows after the : can be any combination of lexer/parser rules.
The most likely cause is just what the error message suggests. And the most likely reason for that is that you have not saved your grammar to the file--or if you're using ANTLRWorks2--ANTLRWorks hasn't saved your work to the file. I have no idea why ANTLRWorks doesn't save reliably.
I also got the same error but could not fix it.
I downloaded antlrworks-1.4.jar and it's working perfectly.
Changing the first rule to start with a lower case character worked for me.
I had the same problem. It means that your grammar has no syntactic (parser) rules, so to avoid this error you need to write at least one syntactic rule.

Antlr 3 keywords and identifiers colliding

Surprise, I am building an SQL-like language parser for a project.
I had it mostly working, but when I started testing it against real requests it would be handling, I realized it was behaving differently on the inside than I thought.
The main issue in the following grammar is that I define a lexer rule PCT_CONTAINS for the language keyword 'pct_contains'. This works fine, but if I try to match a field like 'attributes.pct_vac', I get the field having text of 'attributes.ac' and a pretty ANTLR error of:
line 1:15 mismatched character u'v' expecting 'c'
GRAMMAR
grammar Select;
options {
language=Python;
}
eval returns [value]
: field EOF
;
field returns [value]
: fieldsegments {print $field.text}
;
fieldsegments
: fieldsegment (DOT (fieldsegment))*
;
fieldsegment
: ICHAR+ (USCORE ICHAR+)*
;
WS : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};
ICHAR : ('a'..'z'|'A'..'Z');
PCT_CONTAINS : 'pct_contains';
USCORE : '_';
DOT : '.';
I have been reading everything I can find on the topic: how the lexer consumes input as it finds it even if it is wrong, and how you can use semantic predicates or lookahead to remove ambiguity. But nothing I have read has helped me fix this issue.
Honestly I don't see how it even CAN be an issue. I must be missing something super obvious, because other grammars I see have lexer rules like EXISTS, but that doesn't cause the parser to take a string like 'existsOrNot' and spit out an IDENTIFIER with the text 'rNot'.
What am I missing or doing completely wrong?
Convert your fieldsegment parser rule into a lexer rule. As it stands now it will accept input like
"abc
_ abc"
which is probably not what you want. The keyword "pct_contains" won't be matched by this rule since it is defined separately. If you want to accept the keyword as a regular identifier in certain sequences, you will have to include it in the rule that accepts identifiers.
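To show the idea outside of ANTLR, here is a minimal Python sketch of a tokenizer that matches a whole field segment as one token and only afterwards checks whether the text is a keyword. This is a plain-Python model of the approach, not generated ANTLR code; the token names FIELDSEGMENT and the KEYWORDS table are assumptions for the example.

```python
import re

# Hypothetical keyword table: exact text -> token type.
KEYWORDS = {"pct_contains": "PCT_CONTAINS"}

# A word is letters, optionally followed by '_letters' groups,
# mirroring ICHAR+ (USCORE ICHAR+)* from the question.
TOKEN_RE = re.compile(r"(?P<WORD>[A-Za-z]+(?:_[A-Za-z]+)*)|(?P<DOT>\.)|(?P<WS>\s+)")


def tokenize(text):
    """Return a list of (token_type, text) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "WS":
            continue
        if kind == "WORD":
            # The whole word is consumed first; only an exact match
            # against the table turns it into a keyword token.
            kind = KEYWORDS.get(match.group(), "FIELDSEGMENT")
        tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    print(tokenize("attributes.pct_vac"))
    print(tokenize("pct_contains"))
```

Because the full word is consumed before the keyword check, 'pct_vac' can never be cut off halfway through an attempt to match 'pct_contains'.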

How can I construct a clean, Python like grammar in ANTLR?

G'day!
How can I construct a simple ANTLR grammar handling multi-line expressions without the need for either semicolons or backslashes?
I'm trying to write a simple DSL for expressions:
# sh style comments
ThisValue = 1
ThatValue = ThisValue * 2
ThisOtherValue = (1 + 2 + ThisValue * ThatValue)
YetAnotherValue = MAX(ThisOtherValue, ThatValue)
Overall, I want my application to provide the script with some initial named values and pull out the final result. I'm getting hung up on the syntax, however. I'd like to support multiple line expressions like the following:
# Note: no backslashes required to continue expression, as we're in brackets
# Note: no semicolon required at end of expression, either
ThisValueWithAReallyLongName = (ThisOtherValueWithASimilarlyLongName
+AnotherValueWithAGratuitouslyLongName)
I started off with an ANTLR grammar like this:
exprlist
: ( assignment_statement | empty_line )* EOF!
;
assignment_statement
: assignment NL!?
;
empty_line
: NL;
assignment
: ID '=' expr
;
// ... and so on
It seems simple, but I'm already in trouble with the newlines:
warning(200): StackOverflowQuestion.g:11:20: Decision can match input such as "NL" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
Graphically, in org.antlr.works.IDE:
Decision Can Match NL Using Multiple Alternatives http://img.skitch.com/20090723-ghpss46833si9f9ebk48x28b82.png
I've kicked the grammar around, but always end up with violations of expected behavior:
A newline is not required at the end of the file
Empty lines are acceptable
Everything in a line from a pound sign onward is discarded as a comment
Assignments end with end-of-line, not semicolons
Expressions can span multiple lines if wrapped in brackets
I can find example ANTLR grammars with many of these characteristics. I find that when I cut them down to limit their expressiveness to just what I need, I end up breaking something. Others are too simple, and I break them as I add expressiveness.
Which angle should I take with this grammar? Can you point to any examples that aren't either trivial or full Turing-complete languages?
I would let your tokenizer do the heavy lifting rather than mixing your newline rules into your grammar:
Count parentheses, brackets, and braces, and don't generate NL tokens while there are unclosed groups. That'll give you line continuations for free without your grammar being any the wiser.
Always generate an NL token at the end of file whether or not the last line ends with a '\n' character, then you don't have to worry about a special case of a statement without a NL. Statements always end with an NL.
The second point would let you simplify your grammar to something like this:
exprlist
: ( assignment_statement | empty_line )* EOF!
;
assignment_statement
: assignment NL
;
empty_line
: NL
;
assignment
: ID '=' expr
;
How about this?
exprlist
: (expr)? (NL+ expr)* NL!? EOF!
;
expr
: assignment | ...
;
assignment
: ID '=' expr
;
I assume you chose to make NL optional, because the last statement in your input code doesn't have to end with a newline.
While it makes a lot of sense, you are making life a lot harder for your parser. Separator tokens (like NL) should be cherished, as they disambiguate and reduce the chance of conflicts.
In your case, the parser doesn't know if it should parse "assignment NL" or "assignment empty_line". There are many ways to solve it, but most of them are just band-aids for an unwise design choice.
My recommendation is an innocent hack: Make NL mandatory, and always append NL to the end of your input stream!
It may seem a little unsavory, but in reality it will save you a lot of future headaches.