I have to parse this definition of var agenda (it's Z language):
agenda : \nat \cross \nat \pfun \nat
I want \cross have precedence over \pfun, so if i code:
typeNorm returns [TreeNode node]
: a=typeNorm '\\cross' b=typeNorm
| a=typeNorm \pfun b=typeNorm
it works, produces agenda AST:
\pfun
\cross \nat
\nat \nat
but, if i code:
typeNorm returns [TreeNode node]
: a=typeNorm ('\\cross' b=typeNorm)
| a=typeNorm \pfun b=typeNorm
produces:
\cross
\nat \pfun
\nat \nat
I need to understand why parentesis change precedence
I cannot reproduce this.
For the input a \pfun b \pcross c, both:
typeNorm
: typeNorm '\\pcross' typeNorm
| typeNorm '\\pfun' typeNorm
| ID
;
ID : [a-zA-Z]+;
SPACES : [ \t\r\n]+ -> skip;
and:
typeNorm
: typeNorm ('\\pcross' typeNorm)
| typeNorm '\\pfun' typeNorm
| ID
;
ID : [a-zA-Z]+;
SPACES : [ \t\r\n]+ -> skip;
the following parse tree is produced:
Tested with ANTLR 4.9.2 and 4.9.3.
Related
I'm at a very beginning of learning ANTLR4 lexer rules. My goal is to create a simple grammar for Java properties files. Here is what I have so far:
lexer grammar PropertiesLexer;
LineComment
: ( LineCommentHash
| LineCommentExcl
)
-> skip
;
fragment LineCommentHash
: '#' ~[\r\n]*
;
fragment LineCommentExcl
: '!' ~[\r\n]*
;
fragment WrappedLine
: '\\'
( '\r' '\n'?
| '\n'
)
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
Key
: KeyLetterStart
( KeyLetter
| Escaped
)*
;
fragment KeyLetterStart
: ~[ \t\r\n:=]
;
fragment KeyLetter
: ~[\t\r\n:=]
;
fragment Escaped
: '\\' .?
;
Equal
: ( '\\'? ':'
| '\\'? '='
)
;
Value
: ValueLetterBegin
( ValueLetter
| Escaped
| WrappedLine
)*
;
fragment ValueLetterBegin
: ~[ \t\r\n]
;
fragment ValueLetter
: ~ [\r\n]+
;
Whitespace
: [ \t]+
-> skip
;
My test file is this one:
# comment 1
# comment 2
#
.key1= value1
key2\:sub=value2
key3 \= value3
key4=value41\
value42
# comment3
#comment4
key=value
When I run grun, I'm getting following output:
[#0,30:42='.key1= value1',<Value>,4:0]
[#1,45:60='key2\:sub=value2',<Value>,5:0]
[#2,63:76='key3 \= value3',<Value>,6:0]
[#3,81:102='key4=value41\\r\nvalue42',<Value>,8:0]
[#4,130:138='key=value',<Value>,13:0]
[#5,141:140='<EOF>',<EOF>,14:0]
I don't understand why the Value definition is matched. When commenting out the Value definition, however, it recognizes the Key and Equal definitions:
[#0,30:34='.key1',<Key>,4:0]
[#1,35:35='=',<Equal>,4:5]
[#2,37:42='value1',<Key>,4:7]
[#3,45:49='key2\',<Key>,5:0]
[#4,50:50=':',<Equal>,5:5]
[#5,51:53='sub',<Key>,5:6]
[#6,54:54='=',<Equal>,5:9]
[#7,55:60='value2',<Key>,5:10]
[#8,63:68='key3 \',<Key>,6:0]
[#9,69:69='=',<Equal>,6:6]
[#10,71:76='value3',<Key>,6:8]
[#11,81:84='key4',<Key>,8:0]
[#12,85:85='=',<Equal>,8:4]
[#13,86:93='value41\',<Key>,8:5]
[#14,96:102='value42',<Key>,9:0]
[#15,130:132='key',<Key>,13:0]
[#16,133:133='=',<Equal>,13:3]
[#17,134:138='value',<Key>,13:4]
[#18,141:140='<EOF>',<EOF>,14:0]
but how to let it recognize the Key, Equal and Value definitons?
ANTLR's lexer rules match as much characters as possible, that is why you're seeing all these Value tokens being created (they match the most characters).
Lexical modes seem like a good fit to use here. Something like this:
lexer grammar PropertiesLexer;
COMMENT
: [!#] ~[\r\n]* -> skip
;
KEY
: ( '\\' ~[\r\n] | ~[\r\n\\=:] )+
;
EQUAL
: [=:] -> pushMode(VALUE_MODE)
;
NL
: [\r\n]+ -> skip
;
mode VALUE_MODE;
VALUE
: ( ~[\\\r\n] | '\\' . )+
;
END_VALUE
: [\r\n]+ -> skip, popMode
;
I have tried to write a grammar to recognize expressions like:
(A + MAX(B) ) / ( C - AVERAGE(A) )
IF( A > AVERAGE(A), 0, 1 )
X / (MAX(X)
Unfortunately antlr3 fails with these errors:
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
error(211): DerivedKeywords.g:110:13: [fatal] rule booleanTerm has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
error(206): DerivedKeywords.g:110:13: Alternative 1: after matching input such as decision cannot predict what comes next due to recursion overflow to additiveExpression from formula
I have spent some hours trying to fix these, it would be great if anyone could at least help me fix the first problem. Thanks
Code:
grammar DerivedKeywords;
options {
output=AST;
//backtrack=true;
}
WS : ( ' ' | '\t' | '\n' | '\r' )
{ skip(); }
;
//for numbers
DIGIT
: '0'..'9'
;
//for both integer and real number
NUMBER
: (DIGIT)+ ( '.' (DIGIT)+ )?( ('E'|'e')('+'|'-')?(DIGIT)+ )?
;
// Boolean operatos
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
EQ : '=';
NEQ : '!=';
GT : '>';
LT : '<';
GTE : '>=';
LTE : '<=';
COMMA : ',';
// Token for Functions
IF : 'IF';
MAX : 'MAX';
MIN : 'MIN';
AVERAGE : 'AVERAGE';
VARIABLE : 'A'..'Z' ('A'..'Z' | '0'..'9')*
;
// OPERATORS
LPAREN : '(' ;
RPAREN : ')' ;
DIV : '/' ;
PLUS : '+' ;
MINUS : '-' ;
STAR : '*' ;
expression : formula;
formula
: functionExpression
| additiveExpression
| LPAREN! a=formula RPAREN! // First Problem
;
additiveExpression
: a=multiplicativeExpression ( (MINUS^ | PLUS^ ) b=multiplicativeExpression )*
;
multiplicativeExpression
: a=unaryExpression ( (STAR^ | DIV^ ) b=unaryExpression )*
;
unaryExpression
: MINUS^ u=unaryExpression
| primaryExpression
;
functionExpression
: f=functionOperator LPAREN e=formula RPAREN
| IF LPAREN b=booleanExpression COMMA p=formula COMMA s=formula RPAREN
;
functionOperator :
MAX | MIN | AVERAGE;
primaryExpression
: NUMBER
// Used for scientific numbers
| DIGIT
| VARIABLE
| formula
;
// Boolean stuff
booleanExpression
: orExpression;
orExpression : a=andExpression (OR^ b=andExpression )*
;
andExpression
: a=notExpression (AND^ b=notExpression )*
;
notExpression
: NOT^ t=booleanTerm
| booleanTerm
;
booleanOperator :
GT | LT | EQ | GTE | LTE | NEQ;
booleanTerm : a=formula op=booleanOperator b=formula
| LPAREN! booleanTerm RPAREN! // Second problem
;
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
- this means that if the parser enters unaryExpression rule, it has the possibility to match additiveExpression, primaryExpression, formula, multiplicativeExpression and unaryExpression again without ever consuming a single token from input - so it cannot decide whether to use those rules or not, because even if it uses the rules, the input will be the same.
You're probably trying to allow subexpressions in expressions by this sequence of rules - you need to make sure that path will consume the left parenthesis of the subexpression. Probably the formula alternative in primaryExpression should be changed to LPAREN formula RPAREN, and the rest of grammar be adjusted accordingly.
If I have expressions like:
(name = Paul AND age = 16) OR country = china;
And I want to get:
QUERY
|
|-------------|
() |
| |
AND OR
| |
|-------| |
name age country
| | |
Paul 16 china
How can I print the () and the condition (AND/OR) before the fields name, age country?
My grammar file is something like this:
parse
: block EOF -> block
;
block
: (statement)* (Return ID ';')?
-> ^(QUERY statement*)
;
statement
: assignment ';'
-> assignment
;
assignment
: expression (condition expression)*
-> ^(condition expression*)
| '(' expression (condition expression)* ')' (condition expression)*
-> ^(Brackets ^(condition expression*))
;
condition
: AND
| OR
;
Brackets: '()' ;
OR : 'OR' ;
AND : 'AND' ;
..
But it only prints the first condition that appears in the expression ('AND' in this example), and I can't group what is between brackets, and what is not...
Your grammar looks odd to me, and there are errors in it: if the parser does not match "()", you can't use Brackets inside a rewrite rule. And why would you ever want to have the token "()" inside your AST?
Given your example input:
(name = Paul AND age = 16) OR country = china;
here's possible way to construct an AST:
grammar T;
options {
output=AST;
}
query
: expr ';' EOF -> expr
;
expr
: logical_expr
;
logical_expr
: equality_expr ( logical_op^ equality_expr )*
;
equality_expr
: atom ( equality_op^ atom )*
;
atom
: ID
| INT
| '(' expr ')' -> expr
;
equality_op
: '='
| 'IS' 'NOT'?
;
logical_op
: 'AND'
| 'OR'
;
ID : ('a'..'z' | 'A'..'Z')+;
INT : '0'..'9'+;
WS : (' ' | '\t' | '\r' | '\n')+ {skip();};
which would result in this:
I'm a little confused. I have a grammar that works well and matches my language just as I want it to. Recently, I added a couple rules to the grammar and in the process of converting the new grammar rules to the tree grammar I am getting some strange errors. The first error I was getting was that the tree grammar was ambiguous.
The errors I received were:
[10:53:16] warning(200): ShiroDefinitionPass.g:129:17:
Decision can match input such as "'subjunctive node'" using multiple alternatives: 5, 6
As a result, alternative(s) 6 were disabled for that input
[10:53:16] warning(200): ShiroDefinitionPass.g:129:17:
Decision can match input such as "PORT_ASSIGNMENT" using multiple alternatives: 2, 6
and about 10 more similar errors.
I can't tell why the tree grammar is ambiguous. It was fine before I added the sNode, subjunctDeclNodeProd, subjunctDecl, and subjunctSelector rules.
My grammar is:
grammar Shiro;
options{
language = Java;
ASTLabelType=CommonTree;
output=AST;
}
tokens{
NEGATION;
STATE_DECL;
PORT_DECL;
PORT_INIT;
PORT_ASSIGNMENT;
PORT_TAG;
PATH;
PORT_INDEX;
EVAL_SELECT;
SUBJ_SELECT;
SUBJ_NODE_PROD;
ACTIVATION;
ACTIVATION_LIST;
PRODUCES;
}
shiro : statement+
;
statement
: nodestmt
| sNode
| graphDecl
| statestmt
| collection
| view
| NEWLINE!
;
view : 'view' IDENT mfName IDENT -> ^('view' IDENT mfName IDENT)
;
collection
: 'collection' IDENT orderingFunc path 'begin' NEWLINE
(collItem)+ NEWLINE?
'end'
-> ^('collection' IDENT orderingFunc path collItem+)
;
collItem: IDENT -> IDENT
;
orderingFunc
: IDENT -> IDENT
;
statestmt
: 'state' stateName 'begin' NEWLINE
stateHeader
'end' -> ^(STATE_DECL stateName stateHeader)
;
stateHeader
: (stateTimeStmt | stateCommentStmt | stateParentStmt | stateGraphStmt | activationPath | NEWLINE!)+
;
stateTimeStmt
: 'Time' time -> ^('Time' time)
;
stateCommentStmt
: 'Comment' comment -> ^('Comment' comment)
;
stateParentStmt
: 'Parent' stateParent -> ^('Parent' stateParent)
;
stateGraphStmt
: 'Graph' stateGraph -> ^('Graph' stateGraph)
;
stateName
: IDENT
;
time : STRING_LITERAL
;
comment : STRING_LITERAL
;
stateParent
: IDENT
;
stateGraph
: IDENT
;
activationPath
: l=activation ('.'^ (r=activation | activationList))*
;
activationList
: '<' activation (',' activation)* '>' -> ^(ACTIVATION_LIST activation+)
;
activation
: c=IDENT ('[' v=IDENT ']')? -> ^(ACTIVATION $c ($v)?)
;
graphDecl
: 'graph' IDENT 'begin' NEWLINE
graphLine+
'end'
-> ^('graph' IDENT graphLine+)
;
graphLine
: nodeProduction | portAssignment | NEWLINE!
;
nodeInternal
: (nodeProduction
| portAssignment
| portstmt
| nodestmt
| sNode
| NEWLINE!)+
;
nodestmt
: 'node'^ IDENT ('['! activeSelector ']'!)? 'begin'! NEWLINE!
nodeInternal
'end'!
;
sNode
: 'subjunctive node'^ IDENT '['! subjunctSelector ']'! 'begin'! NEWLINE!
(subjunctDeclNodeProd | subjunctDecl | NEWLINE!)+
'end'!
;
subjunctDeclNodeProd
: l=IDENT '->' r=IDENT 'begin' NEWLINE
nodeInternal
'end' -> ^(SUBJ_NODE_PROD $l $r nodeInternal )
;
subjunctDecl
: 'subjunct'^ IDENT ('['! activeSelector ']'!)? 'begin'! NEWLINE!
nodeInternal
'end'!
;
subjunctSelector
: IDENT -> ^(SUBJ_SELECT IDENT)
;
activeSelector
: IDENT -> ^(EVAL_SELECT IDENT)
;
nodeProduction
: path ('->'^ activationPath )+ NEWLINE!
;
portAssignment
: path '(' mfparams ')' NEWLINE -> ^(PORT_ASSIGNMENT path mfparams)
;
portDecl
: portType portName mfName -> ^(PORT_DECL ^(PORT_TAG portType) portName mfName)
;
portDeclInit
: portType portName mfCall -> ^(PORT_INIT ^(PORT_TAG portType) portName mfCall)
;
portstmt
: (portDecl | portDeclInit ) NEWLINE!
;
portName
: IDENT
;
portType: 'port'
| 'eval'
;
mfCall : mfName '(' mfparams ')' -> ^(mfName mfparams)
;
mfName : IDENT
;
mfparams: expression(',' expression)* -> expression+
;
// Path
path : (IDENT)('.' IDENT)*('[' pathIndex ']')? -> ^(PATH IDENT+ pathIndex?)
;
pathIndex
: portIndex -> ^(PORT_INDEX portIndex)
;
portIndex
: ( NUMBER |STRING_LITERAL )
;
// Expressions
term : path
| '(' expression ')' -> expression
| NUMBER
| STRING_LITERAL
;
unary : ('+'^ | '-'^)* term
;
mult : unary (('*'^ | '/'^ | '%'^) unary)*
;
add
: mult (( '+'^ | '-'^ ) mult)*
;
expression
: add (( '|'^ ) add)*
;
// LEXEMES
STRING_LITERAL
: '"' .* '"'
;
NUMBER : DIGIT+ ('.'DIGIT+)?
;
IDENT : (LCLETTER | UCLETTER | DIGIT)(LCLETTER | UCLETTER | DIGIT|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' NEWLINE?{$channel=HIDDEN;}
;
WS
: (' ' | '\t' | '\f')+ {$channel = HIDDEN;}
;
NEWLINE : '\r'? '\n'
;
fragment
LCLETTER
: 'a'..'z'
;
fragment
UCLETTER: 'A'..'Z'
;
fragment
DIGIT : '0'..'9'
;
My tree grammar for the section looks like:
tree grammar ShiroDefinitionPass;
options{
tokenVocab=Shiro;
ASTLabelType=CommonTree;
}
shiro
: statement+
;
statement
: nodestmt
| sNode
| graphDecl
| statestmt
| collection
| view
;
view : ^('view' IDENT mfName IDENT)
;
collection
: ^('collection' IDENT orderingFunc path collItem+)
;
collItem: IDENT
;
orderingFunc
: IDENT
;
statestmt
: ^(STATE_DECL stateHeader)
;
stateHeader
: (stateTimeStmt | stateCommentStmt | stateParentStmt| stateGraphStmt | activation )+
;
stateTimeStmt
: ^('Time' time)
;
stateCommentStmt
: ^('Comment' comment)
;
stateParentStmt
: ^('Parent' stateParent)
;
stateGraphStmt
: ^('Graph' stateGraph)
;
stateName
: IDENT
;
time : STRING_LITERAL
;
comment : STRING_LITERAL
;
stateParent
: IDENT
;
stateGraph
: IDENT
;
activationPath
: l=activation ('.' (r=activation | activationList))*
;
activationList
: ^(ACTIVATION_LIST activation+)
;
activation
: ^(ACTIVATION IDENT IDENT?)
;
// Graph Declarations
graphDecl
: ^('graph' IDENT graphLine+)
;
graphLine
: nodeProduction
| portAssignmen
;
// End Graph declaration
nodeInternal
: (nodeProduction
|portAssignment
|portstmt
|nodestmt
|sNode )+
;
nodestmt
: ^('node' IDENT activeSelector? nodeInternal)
;
sNode
: ^('subjunctive node' IDENT subjunctSelector (subjunctDeclNodeProd | subjunctDecl)*)
;
subjunctDeclNodeProd
: ^(SUBJ_NODE_PROD IDENT IDENT nodeInternal+ )
;
subjunctDecl
: ^('subjunct' IDENT activeSelector? nodeInternal )
;
subjunctSelector
: ^(SUBJ_SELECT IDENT)
;
activeSelector returns
: ^(EVAL_SELECT IDENT)
;
nodeProduction
: ^('->' nodeProduction)
| path
;
portAssignment
: ^(PORT_ASSIGNMENT path)
;
// Port Statement
portDecl
: ^(PORT_DECL ^(PORT_TAG portType) portName mfName)
;
portDeclInit
: ^(PORT_INIT ^(PORT_TAG portType) portName mfCall)
;
portstmt
: (portDecl | portDeclInit)
;
portName
: IDENT
;
portType returns
: 'port' | 'eval'
;
mfCall
: ^(mfName mfparams)
;
mfName
: IDENT
;
mfparams
: (exps=expression)+
;
// Path
path
: ^(PATH (id=IDENT)+ (pathIndex)? )
;
pathIndex
: ^(PORT_INDEX portIndex)
;
portIndex
: ( NUMBER
|STRING_LITERAL
)
;
// Expressions
expression
: ^('+' op1=expression op2=expression)
| ^('-' op1=expression op2=expression)
| ^('*' op1=expression op2=expression)
| ^('/' op1=expression op2=expression)
| ^('%' op1=expression op2=expression)
| ^('|' op1=expression op2=expression)
| NUMBER
| path
;
In your tree grammar, here is the declaration of rule nodeInternal:
nodeInternal
: (nodeProduction
|portAssignment
|portstmt
|nodestmt
|sNode)+
;
And here is your declaration of rule subjunctDeclNodeProd:
subjunctDeclNodeProd
: ^(SUBJ_NODE_PROD IDENT IDENT nodeInternal+ )
;
When subjunctDeclNodeProd is being processed, ANTLR doesn't know how to process an input such as this, with two PATH children:
^(SUBJ_NODE_PROD IDENT IDENT ^(PATH IDENT) ^(PATH IDENT))
Should it follow rule nodeInternal once and process nodeProduction, nodeProduction or should it follow nodeInternal twice and process nodeProduction each time?
Consider rewriting subjunctDeclNodeProd without the +:
subjunctDeclNodeProd
: ^(SUBJ_NODE_PROD IDENT IDENT nodeInternal)
;
I think that will take care of the problem.
I'm trying to built C-- compiler using ANTLR 3.4.
Full set of the grammar listed here,
program : (vardeclaration | fundeclaration)* ;
vardeclaration : INT ID (OPENSQ NUM CLOSESQ)? SEMICOL ;
fundeclaration : typespecifier ID OPENP params CLOSEP compoundstmt ;
typespecifier : INT | VOID ;
params : VOID | paramlist ;
paramlist : param (COMMA param)* ;
param : INT ID (OPENSQ CLOSESQ)? ;
compoundstmt : OPENCUR vardeclaration* statement* CLOSECUR ;
statementlist : statement* ;
statement : expressionstmt | compoundstmt | selectionstmt | iterationstmt | returnstmt;
expressionstmt : (expression)? SEMICOL;
selectionstmt : IF OPENP expression CLOSEP statement (options {greedy=true;}: ELSE statement)?;
iterationstmt : WHILE OPENP expression CLOSEP statement;
returnstmt : RETURN (expression)? SEMICOL;
expression : (var EQUAL expression) | sampleexpression;
var : ID ( OPENSQ expression CLOSESQ )? ;
sampleexpression: addexpr ( ( LOREQ | LESS | GRTR | GOREQ | EQUAL | NTEQL) addexpr)?;
addexpr : mulexpr ( ( PLUS | MINUS ) mulexpr)*;
mulexpr : factor ( ( MULTI | DIV ) factor )*;
factor : ( OPENP expression CLOSEP ) | var | call | NUM;
call : ID OPENP arglist? CLOSEP;
arglist : expression ( COMMA expression)*;
Used lexer rules as following,
ELSE : 'else' ;
IF : 'if' ;
INT : 'int' ;
RETURN : 'return' ;
VOID : 'void' ;
WHILE : 'while' ;
PLUS : '+' ;
MINUS : '-' ;
MULTI : '*' ;
DIV : '/' ;
LESS : '<' ;
LOREQ : '<=' ;
GRTR : '>' ;
GOREQ : '>=' ;
EQUAL : '==' ;
NTEQL : '!=' ;
ASSIGN : '=' ;
SEMICOL : ';' ;
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
OPENSQ : '[' ;
CLOSESQ : ']' ;
OPENCUR : '{' ;
CLOSECUR: '}' ;
SCOMMENT: '/*' ;
ECOMMENT: '*/' ;
ID : ('a'..'z' | 'A'..'Z')+/*(' ')*/ ;
NUM : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;};
COMMENT: '/*' .* '*/' {$channel = HIDDEN;};
But I try to save this it give me the error,
error(211): /CMinusMinus/src/CMinusMinus/CMinusMinus.g:33:13: [fatal] rule expression has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
|---> expression : (var EQUAL expression) | sampleexpression;
1 error
How can I resolve this problem?
As already mentioned: your grammar rule expression is ambiguous: both alternatives in that rule start, or can be, a var.
You need to "help" your parser a bit. If the parse can see a var followed by an EQUAL, it should choose alternative 1, else alternative 2. This can be done by using a syntactic predicate (the (var EQUAL)=> part in the rule below).
expression
: (var EQUAL)=> var EQUAL expression
| sampleexpression
;
More about predicates in this Q&A: What is a 'semantic predicate' in ANTLR?
The problem is this:
expression : (var EQUAL expression) | sampleexpression;
where you either start with var or sampleexpression. But sampleexpression can be reduced to var as well by doing sampleexpression->addExpr->MultExpr->Factor->var
So there is no way to find a k-length predicate for the compiler.
You can as suggested by the error message set backtrack=true to see whether this solves your problem, but it might lead not to the AST - parsetrees you would expect and might also be slow on special input conditions.
You could also try to refactor your grammar to avoid such recursions.