define a grammar in Antlr - antlr

I have defined the following grammar.
grammar Sample_1;
#header {
package a;
}
#lexer::header {
package a;
}
program
:
define*
implement*
;
define
: IDENT '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (IDENT ','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
How to check in this grammar so that when I have the example
A=(1,1)
B=(1,2)
G=(A,B)
the result is successful but if I write
A=(1,1)
B=(1,2)
G=(A,E)
it gives an error that E is not defined
thanks
the result:
i got it working thanks a lot:
grammar Sample_1;
#members{
int level=0;
}
#header {
package a;
}
#lexer::header {
package a;
}
program
:
block
;
block
scope {
List symbols;
}
#init {
$block::symbols=new ArrayList();
level++;
}
#after {
System.err.println("Hello");
level--;
}
: (define* implement+)
;
define
: IDENT {$block::symbols.add($IDENT.text);} '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (a=IDENT
{if (!$block::symbols.contains($a.text)){
System.err.println("undefined");
}}','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};

Antlr supports actions, little snippets of code embedded in the grammar file.
An action for an assignment could store into a map. An action for a right-hand-side IDENT could try to pull a value from the map, and throw an exception if it fails.
Chapter 6 in Terrence Parr's "The Definitive ANTLR Reference" covers actions.

Related

Precedence in Antlr using parentheses

We are developing a DSL, and we're facing some problems:
Problem 1:
In our DSL, it's allowed to do this:
A + B + C
but not this:
A + B - C
If the user needs to use two or more different operators, he'll need to insert parentheses:
A + (B - C) or (A + B) - C.
Problem 2:
In our DSL, the most precedent operator must be surrounded by parentheses.
For example, instead of using this way:
A + B * C
The user needs to use this:
A + (B * C)
To solve the Problem 1 I've got a snippet of ANTLR that worked, but I'm not sure if it's the best way to solve it:
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr;
To solve the Problem 2, I have no idea where to start.
I appreciate your help to find out a better solution to the first problem and a direction to solve the seconde one. (Sorry for my bad english)
Below is the grammar that we have developed:
grammar TclGrammar;
options {
output=AST;
ASTLabelType=CommonTree;
}
#members {
public boolean isSum(String type) {
System.out.println("Tipo: " + type);
return "+".equals(type);
}
public boolean isSub(String type) {
System.out.println("Tipo: " + type);
return "-".equals(type);
}
}
prog
: exprMain ';' {System.out.println(
$exprMain.tree == null ? "null" : $exprMain.tree.toStringTree());}
;
exprMain
: exprQuando? (exprDeveSatis | exprDeveFalharCaso)
;
exprDeveSatis
: 'DEVE SATISFAZER' '{'! expr '}'!
;
exprDeveFalharCaso
: 'DEVE FALHAR CASO' '{'! expr '}'!
;
exprQuando
: 'QUANDO' '{'! expr '}'!
;
expr
: logicExpr
;
logicExpr
: boolExpr (('E'|'OU')^ boolExpr)*
;
boolExpr
: comparatorExpr
| emExpr
| 'VERDADE'
| 'FALSO'
;
emExpr
: FIELD 'EM' '[' (variable_lista | field_lista) comCruzamentoExpr? ']'
-> ^('EM' FIELD (variable_lista+)? (field_lista+)? comCruzamentoExpr?)
;
comCruzamentoExpr
: 'COM CRUZAMENTO' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COM CRUZAMENTO' FIELD+)
;
comparatorExpr
: sumExpr (('<'^|'<='^|'>'^|'>='^|'='^|'<>'^) sumExpr)?
| naoPreenchidoExpr
| preenchidoExpr
;
naoPreenchidoExpr
: FIELD 'NAO PREENCHIDO' -> ^('NAO PREENCHIDO' FIELD)
;
preenchidoExpr
: FIELD 'PREENCHIDO' -> ^('PREENCHIDO' FIELD)
;
sumExpr
#init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr
;
multExpr
: funcExpr(('*'^|'/'^) funcExpr)?
;
funcExpr
: 'QUANTIDADE'^ '('! FIELD ')'!
| 'EXTRAI_TEXTO'^ '('! FIELD ';' INTEGER ';' INTEGER ')'!
| cruzaExpr
| 'COMBINACAO_UNICA' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COMBINACAO_UNICA' FIELD+)
| 'EXISTE'^ '('! FIELD ')'!
| 'UNICO'^ '('! FIELD ')'!
| atom
;
cruzaExpr
: operadorCruzaExpr ('CRUZA COM'^|'CRUZA AMBOS'^) operadorCruzaExpr ondeExpr?
;
operadorCruzaExpr
: FIELD('('!field_lista')'!)?
;
ondeExpr
: ('ONDE'^ '('!expr')'!)
;
atom
: FIELD
| VARIABLE
| '('! expr ')'!
;
field_lista
: FIELD(';' field_lista)?
;
variable_lista
: VARIABLE(';' variable_lista)?
;
FIELD
: NONCONTROL_CHAR+
;
NUMBER
: INTEGER | FLOAT
;
VARIABLE
: '\'' NONCONTROL_CHAR+ '\''
;
fragment SIGN: '+' | '-';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SYMBOL: '_' | '.' | ',';
fragment FLOAT: INTEGER '.' '0'..'9'*;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;
This is similar to not having operator precedence at all.
expr
: funcExpr
( ('+' funcExpr)*
| ('-' funcExpr)*
| ('*' funcExpr)*
| ('/' funcExpr)*
)
;
I think the following should work. I'm assuming some lexer tokens with obvious names.
expr: sumExpr;
sumExpr: onlySum | subExpr;
onlySum: atom ( PLUS onlySum )?;
subExpr: onlySub | multExpr;
onlySub: atom ( MINUS onlySub )? ;
multExpr: atom ( STAR atomic )? ;
parenExpr: OPEN_PAREN expr CLOSE_PAREN;
atom: FIELD | VARIABLE | parenExpr
The only* rules match an expression if it only has one type of operator outside of parentheses. The *Expr rules match either a line with the appropriate type of operators or go to the next operator.
If you have multiple types of operators, then they are forced to be inside parentheses because the match will go through atom.

Using floats in range and grammar mistakes?

I am trying to convert a LALR grammar to LL using ANTLR and I am running into a few problems. So far, I think converting the expressions into a Top-Down approach is straight forward to me. The problem is when I include Range (1..10) and (1.0..10.0) with floats.
I have tried to use the answer found here and somehow it is not even running correctly with my code, let alone solving a range of float, i.e. (float..float).
Float literal and range parameter in ANTLR
Attached is a sample of my grammar that just focuses on this issue.
grammar Test;
options {
language = Java;
output = AST;
}
parse: 'in' rangeExpression ';'
;
rangeExpression : expression ('..' expression)?
;
expression : addingExpression (('=='|'!='|'<='|'<'|'>='|'>') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/'|'div') unaryExpression)*
;
unaryExpression: ('+'|'-')* primitiveElement;
primitiveElement : literalExpression
| id ('.' id)?
| '(' expression ')'
;
literalExpression : NUMBER
| BOOLEAN_LITERAL
| 'infinity'
;
id : IDENTIFIER
;
// L E X I C A L R U L E S
Range
: '..'
;
NUMBER
: (DIGITS Range) => DIGITS {$type=DIGITS;}
| (FloatLiteral) => FloatLiteral {$type=FloatLiteral;}
| DIGITS {$type=DIGITS;}
;
// fragments
fragment FloatLiteral : Float;
fragment Float
: DIGITS ( options {greedy = true; } : '.' DIGIT* EXPONENT?)
| '.' DIGITS EXPONENT?
| DIGITS EXPONENT
;
BOOLEAN_LITERAL : 'false'
| 'true'
;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Any reason why it is not even taking:
in 10;
or
in 10.0;
Thanks in advance!
The following things are not correct:
you're never matching a FloatLiteral in your literalExpression rule
in every alternative of NUMBER you're changing the type of the token, therefor a NUMBER token will never be created
Something like this should work for both 11..22 and 1.1..2.2:
...
literalExpression : INT
| BOOLEAN_LITERAL
| FLOAT
| 'infinity'
;
id : IDENTIFIER
;
// L E X I C A L R U L E S
Range
: '..'
;
INT
: (DIGITS Range)=> DIGITS
| DIGITS (('.' DIGITS EXPONENT? | EXPONENT) {$type=FLOAT;})?
;
BOOLEAN_LITERAL : 'false'
| 'true'
;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment FLOAT : ;
To your question about handling (1.0 .. 10.0):
Notice that parser rule primitiveElement defines an alternative as '(' expression ')', but rule expression can never reach rule rangeExpression.
Consider redefining expression and rangeExpression like so:
expression : rangeExpression
;
rangeExpression : compExpression ('..' compExpression)?
;
compExpression : addingExpression (('=='|'!='|'<='|'<'|'>='|'>') addingExpression)*
;
This ensures that the expression rule sits above all forms of expressions and will work as expected in parentheses.

How to resolve "The following alternatives can never be matched"

I have been struggling to resolve a "multiple alternatives" error in my parser for a couple of days now but with no success. I have been converting Bart Kiers excellent Tiny Language(TL) tutorial code to C# using Sam Harwell's port of ANTLR3 and VS2010. Kudos to both these guys for their excellent work. I believe I have followed Bart's tutorial accurately but as I am a newbie with ANTLR I can't be sure.
I did have the TL code working nicely on a pure math basis i.e. no "functions" or "if then else" or "while" (see screenshot of a little app)
but when I added the code for the missing pieces to complete the tutorial I get a parsing error in "functionCall" and in "list" (see the code below)
grammar Paralex2;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BLOCK;
RETURN;
STATEMENTS;
ASSIGNMENT;
FUNC_CALL;
EXP;
EXP_LIST;
ID_LIST;
IF;
TERNARY;
U_SUB;
NEGATE;
FUNCTION;
INDEXES;
LIST;
LOOKUP;
}
#lexer::namespace{Paralex2}
#parser::namespace{Paralex2}
/*
* Parser Rules
*/
#parser::header {using System; using System.Collections.Generic;}
#parser::members{
public SortedList<string, Function> functions = new SortedList<string, Function>();
private void defineFunction(string id, Object idList, Object block) {
// `idList` is possibly null! Create an empty tree in that case.
CommonTree idListTree = idList == null ? new CommonTree() : (CommonTree)idList;
// `block` is never null.
CommonTree blockTree = (CommonTree)block;
// The function name with the number of parameters after it the unique key
string key = id + idListTree.Children.Count();
functions.Add(key, new Function(id, idListTree, blockTree));
}
}
public parse
: block EOF -> block
;
block
: (statement | functionDecl)* (Return exp ';')? -> ^(BLOCK ^(STATEMENTS statement*) ^(RETURN exp?))
;
statement
: assignment ';' -> assignment
| functionCall ';' -> functionCall
| ifStatement
| forStatement
| whileStatement
;
assignment
: Identifier indexes? '=' exp
-> ^(ASSIGNMENT Identifier indexes? exp)
;
functionCall
: Identifier '(' expList? ')' -> ^(FUNC_CALL Identifier expList?)
| Assert '(' exp ')' -> ^(FUNC_CALL Assert exp)
| Size '(' exp ')' -> ^(FUNC_CALL Size exp)
;
ifStatement
: ifStat elseIfStat* elseStat? End -> ^(IF ifStat elseIfStat* elseStat?)
;
ifStat
: If exp Do block -> ^(EXP exp block)
;
elseIfStat
: Else If exp Do block -> ^(EXP exp block)
;
elseStat
: Else Do block -> ^(EXP block)
;
functionDecl
: Def Identifier '(' idList? ')' block End
{defineFunction($Identifier.text, $idList.tree, $block.tree);}
;
forStatement
: For Identifier '=' exp To exp Do block End
-> ^(For Identifier exp exp block)
;
whileStatement
: While exp Do block End -> ^(While exp block)
;
idList
: Identifier (',' Identifier)* -> ^(ID_LIST Identifier+)
;
expList
: exp (',' exp)* -> ^(EXP_LIST exp+)
;
exp
: condExp
;
condExp
: (orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
orExp
: andExp ('||'^ andExp)*
;
andExp
: equExp ('&&'^ equExp)*
;
equExp
: relExp (('==' | '!=')^ relExp)*
;
relExp
: addExp (('>=' | '<=' | '>' | '<')^ addExp)*
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: powExp ((Mul | Div)^ powExp)*
;
powExp
: unaryExp ('^'^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| '!' atom -> ^(NEGATE atom)
| atom
;
atom
: Nmber
| Bool
| Null
| lookup
;
list
: '[' expList? ']' -> ^(LIST expList?)
;
lookup
: list indexes? -> ^(LOOKUP list indexes?)
| functionCall indexes? -> ^(LOOKUP functionCall indexes?)
| Identifier indexes? -> ^(LOOKUP Identifier indexes?)
| String indexes? -> ^(LOOKUP String indexes?)
| '(' exp ')' indexes? -> ^(LOOKUP exp indexes?)
;
indexes
: ('[' exp ']')+ -> ^(INDEXES exp+)
;
/*
* Lexer Rules
*/
Assert : 'assert';
Size : 'size';
Def : 'def';
If : 'if';
Else : 'else';
Return : 'return';
For : 'for';
While : 'while';
To : 'to';
Do : 'do';
End : 'end';
In : 'in';
Null : 'null';
Or : '||';
And : '&&';
Equals : '==';
NEquals : '!=';
GTEquals : '>=';
LTEquals : '<=';
Pow : '^';
GT : '>';
LT : '<';
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Modulus : '%';
OBrace : '{';
CBrace : '}';
OBracket : '[';
CBracket : ']';
OParen : '(';
CParen : ')';
SColon : ';';
Assign : '=';
Comma : ',';
QMark : '?';
Colon : ':';
Bool
: 'true'
| 'false'
;
Nmber
: Int ('.' Digit*)?
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | Digit)*
;
String
#after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {Skip();}
| '/*' .* '*/' {Skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
The error messages I get are
Decision can match input such as "CParen" using multiple alternatives: 1, 2 : Line 79:20
and
Decision can match input such as "CBracket" using multiple alternatives: 1, 2 : Line 176:10
The errors relate to the functionCall and list rules. I have examined the parser file in ANTLRWorks 1.5 and confirmed the same errors there. The syntax diagrams for the two rules look like this;
and this;
I have tried several changes to try to solve the problem but I don't seem to be able to get the syntax right. I would appreciate any help you guys could provide and can email the images if that would help.
Thanks in advance
Ian Carson
You have an OR-operator too many in the condExp rule making the grammar ambiguous.
You have:
condExp
: ( orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
But it should be:
condExp
: ( orExp -> orExp)
( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:

Solving antlr left recursion

I'm trying to parse a language using ANTLR which can contain the following syntax:
someVariable, somVariable.someMember, functionCall(param).someMember, foo.bar.baz(bjork).buffalo().xyzzy
This is the ANTLR grammar which i've come up with so far, and the access_operation throws the error
The following sets of rules are mutually left-recursive [access_operation, expression]:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
LHS;
RHS;
CALL;
PARAMS;
}
start
: body? EOF
;
body
: expression (',' expression)*
;
expression
: function -> ^(CALL)
| access_operation
| atom
;
access_operation
: (expression -> ^(LHS)) '.'! (expression -> ^(RHS))
;
function
: (IDENT '(' body? ')') -> ^(IDENT PARAMS?)
;
atom
: IDENT
| NUMBER
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
IDENT : (LETTER)+ ;
NUMBER : (DIGIT)+ ;
SPACE : (' ' | '\t' | '\r' | '\n') { $channel=HIDDEN; };
What i could manage so far was to refactor the access_operation rule to '.' expression which generates an AST where the access_operation node only contains the right side of the operation.
What i'm looking for instead is something like this:
How can the left-recursion problem solved in this case?
By "wrong AST" I'll make a semi educated guess that, for input like "foo.bar.baz", you get an AST where foo is the root with bar as a child who in its turn has baz as a child, which is a leaf in the AST. You may want to have this reversed. But I'd not go for such an AST if I were you: I'd keep the AST as flat as possible:
foo
/ | \
/ | \
bar baz ...
That way, evaluating is far easier: you simply look up foo, and then walk from left to right through its children.
A quick demo:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BODY;
ACCESS;
CALL;
PARAMS;
}
start
: body EOF -> body
;
body
: expression (',' expression)* -> ^(BODY expression+)
;
expression
: atom
;
atom
: NUMBER
| (IDENT -> IDENT) ( tail -> ^(IDENT tail)
| call tail? -> ^(CALL IDENT call tail?)
)?
;
tail
: (access)+
;
access
: ('.' IDENT -> ^(ACCESS IDENT)) (call -> ^(CALL IDENT call))?
;
call
: '(' (expression (',' expression)*)? ')' -> ^(PARAMS expression*)
;
IDENT : LETTER+;
NUMBER : DIGIT+;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
which can be tested with:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "someVariable, somVariable.someMember, functionCall(param).someMember, " +
"foo.bar.baz(bjork).buffalo().xyzzy";
TestLexer lexer = new TestLexer(new ANTLRStringStream(src));
TestParser parser = new TestParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
The output of Main corresponds to the following AST:
EDIT
And since you indicated your ultimate goal is not evaluating the input, but that you rather need to conform the structure of the AST to some 3rd party API, here's a grammar that will create an AST like you indicated in your edited question:
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BODY;
ACCESS_OP;
CALL;
PARAMS;
LHS;
RHS;
}
start
: body EOF -> body
;
body
: expression (',' expression)* -> ^(BODY expression+)
;
expression
: atom
;
atom
: NUMBER
| (ID -> ID) ( ('(' params ')' -> ^(CALL ID params))
('.' expression -> ^(ACCESS_OP ^(LHS ^(CALL ID params)) ^(RHS expression)))?
| '.' expression -> ^(ACCESS_OP ^(LHS ID) ^(RHS expression))
)?
;
params
: (expression (',' expression)*)? -> ^(PARAMS expression*)
;
ID : LETTER+;
NUMBER : DIGIT+;
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : '0'..'9';
which creates the following AST if you run the Main class:
The atom rule may be a bit daunting, but you can't shorten it much since the left ID needs to be available to most of the alternatives. ANTLRWorks helps in visualizing the alternative paths this rule may take:
which means atom can be any of the 5 following alternatives (with their corresponding AST's):
+----------------------+--------------------------------------------------------+
| alternative | generated AST |
+----------------------+--------------------------------------------------------+
| NUMBER | NUMBER |
| ID | ID |
| ID params | ^(CALL ID params) |
| ID params expression | ^(ACCESS_OP ^(LHS ^(CALL ID params)) ^(RHS expression))|
| ID expression | ^(ACCESS_OP ^(LHS ID) ^(RHS expression) |
+----------------------+--------------------------------------------------------+

ANTLR no viable alternative at input '/'

Okay, I'm really confused about this error. I know in the past having a '/' as a token in a rule hasn't produced any errors. However, this is simply baffling. Here is my grammar:
grammar LilWildC;
options {
language = Java;
}
#header
{
package com.matthewkimber.lilwildc;
}
#lexer::header
{
package com.matthewkimber.lilwildc;
}
program
: global_variables procedure+
;
global_variables
: variable_definition*
;
variable_definition
: 'number' IDENT ';'
| 'number' '[' A_NUMBER ']' IDENT ';'
;
procedure
: 'procedure' IDENT '{' block '}'
;
block
: local_variables statement+
;
local_variables
: variable_definition*
;
statement
: variable_reference '=' numeric_expression ';'
;
variable_reference
: IDENT
| IDENT '[' numeric_expression ']'
;
numeric_expression
: multiply_expression
( '+' multiply_expression
| '-' multiply_expression
)*
;
multiply_expression
: negative_factor
( '*' negative_factor
| '/' negative_factor
| '%' negative_factor
)*
;
negative_factor
: '-'? factor
;
factor
: A_NUMBER
| variable_reference
| '(' numeric_expression ')'
;
A_NUMBER: (('0'..'9')+'.'?) | (('0'..'9')*'.'('0'..'9')+) ;
IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS: (' ' | '\t' | ('\r'?'\n'))+ { $channel = HIDDEN; } ;
When I run a test on the grammar with the following input:
procedure main
{
var = 10 / 1;
}
I get the following parse tree in the ANTLR eclipse plug-in:
What I don't get is that multiplication and modulo work fine, only divide throws this error. Is ANTLR skipping right over the '/' and not seeing it as a token or have I missed something? Any help is greatly appreciated.
There's nothing wrong with your grammar, the problem must be the Eclipse plugin. ANTLRWorks' debugger produces the tree:
And creating a little test myself (after fixing the typo grammary LilWildC; to grammar LilWildC;, and removing the packages) with a main class and ANTLR 3.3:
LilWildC.g
grammar LilWildC;
options {
language = Java;
}
program
: global_variables procedure+
;
global_variables
: variable_definition*
;
variable_definition
: 'number' IDENT ';'
| 'number' '[' A_NUMBER ']' IDENT ';'
;
p rocedure
: 'procedure' IDENT '{' block '}'
;
block
: local_variables statement+
;
local_variables
: variable_definition*
;
statement
: variable_reference '=' numeric_expression ';'
;
variable_reference
: IDENT
| IDENT '[' numeric_expression ']'
;
numeric_expression
: multiply_expression
( '+' multiply_expression
| '-' multiply_expression
)*
;
multiply_expression
: negative_factor
( '*' negative_factor
| '/' negative_factor
| '%' negative_factor
)*
;
negative_factor
: '-'? factor
;
factor
: A_NUMBER
| variable_reference
| '(' numeric_expression ')'
;
A_NUMBER: (('0'..'9')+'.'?) | (('0'..'9')*'.'('0'..'9')+) ;
IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS: (' ' | '\t' | ('\r'?'\n'))+ { $channel = HIDDEN; } ;
Main.java
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src =
"procedure main \n" +
"{ \n" +
" var = 10 / 1; \n" +
"} \n";
LilWildCLexer lexer = new LilWildCLexer(new ANTLRStringStream(src));
LilWildCParser parser = new LilWildCParser(new CommonTokenStream(lexer));
parser.program();
}
}
bart#hades:~/Programming/ANTLR/Demos/LilWildC$ java -cp antlr-3.3.jar org.antlr.Tool LilWildC.g
bart#hades:~/Programming/ANTLR/Demos/LilWildC$ javac -cp antlr-3.3.jar *.java
bart#hades:~/Programming/ANTLR/Demos/LilWildC$ java -cp .:antlr-3.3.jar Main
produces no errors or warnings.