ANTLR: not able to validate nested boolean condition in if block

I'm facing an issue while validating the below formula with the given grammar rules.
if(2>3?ceil(loopup(12)):floor(matrix(2,3)))
However, I am able to inject the below formulas:
if(2>3?loopup(12):matrix(2,3))
if(2>3?ceil(12.2):floor(2.3))
ast
: expr+ EOF
;
expr: nestedexpr
| LOOKUP_FIELD '(' idrule ')'
| TIER_FIELD '(' idrule ',' idrule ')'
| MATRIX_FIELD '(' idrule ',' idrule ')'
| IF '(' conditionalrule '?' expr ':' expr')'
| ROUND '(' idrule ',' roundnumberrule ')'
;
nestedexpr:
nestedexpr ('*'|'/'|'+'|'-') nestedexpr
| '(' '-' nestedexpr ')'
| '(' nestedexpr ')'
| ROUND '(' expr ',' roundnumberrule ')'
| MATH_FUNCTION_FIELD '(' expr ')'
| DYNAMIC_FIELD_ID
;
arithematicexpr:
arithematicexpr ('*'|'/'|'+'|'-') arithematicexpr
| '(' '-' arithematicexpr ')'
| '(' arithematicexpr ')'
| DYNAMIC_FIELD_ID
;
orrule: OR '(' conditionalrule (',' conditionalrule)+ ')';
andrule: AND '(' conditionalrule (',' conditionalrule)+ ')';
conditionalrule: orrule | andrule | relationalrule;
relationalrule: DYNAMIC_FIELD_ID RELATIONAL_OPERATOR DYNAMIC_FIELD_ID;
idrule :
DYNAMIC_FIELD_ID
;
LOOKUP_FIELD: L O O K U P ;
TIER_FIELD: T I E R;
MATRIX_FIELD: M A T R I X;
IF: I F;
AND: A N D;
OR: O R;
ROUND : R O U N D;
MATH_FUNCTION_FIELD : C E I L | F L O O R;
RELATIONAL_OPERATOR: '<' | '>' | '<=' | '>=' | '<>' | '=';
BOOL_FIELD : T R U E | F A L S E;
DYNAMIC_FIELD_ID: {isDynamicFieldId()}? . ;
roundnumberrule: ROUND_NUMBER;
ROUND_NUMBER: [0-7];
WS : [ \t\r\n]+ -> skip ;
if(2>3?ceil(loopup(12)):floor(matrix(2,3)))
The above should get parsed by the mentioned grammar rule.

Related

Controlling parser rule alternatives precedence

I have an expression IF 1 THEN 2 ELSE 3 * 4. I want this parsed as IF 1 THEN 2 ELSE (3 * 4), however using my grammar (extract) below, it parses it as (IF 1 THEN 2 ELSE 3) * 4.
formula: expression EOF;
expression
: LPAREN expression RPAREN #parenthesisExp
| IF condition=expression THEN thenExpression=expression ELSE elseExpression=expression #ifExp
| left=expression BINARYOPERATOR right=expression #binaryoperationExp
| left=expression op=(TIMES|DIV) right=expression #muldivExp
| left=expression op=(PLUS|MINUS) right=expression #addsubtractExp
| left=expression op=(EQUALS|NOTEQUALS|LT|GT) right=expression #comparisonExp
| left=expression AMPERSAND right=expression #concatenateExp
| NOT expression #notExp
| STRINGLITERAL #stringliteralExp
| signedAtom #atomExp
;
My understanding is that because I have the ifExp alternative appearing before the muldivExp it should use that first, then because I have the muldivExp before atomExp (which handles numbers) it should do 3 * 4 to end the ELSE, rather than using just the 3. In which case I can't see why it's making the IF..THEN..ELSE a child of the multiplication.
I don't think the rest of the grammar is relevant here, but in case it is see below for the whole thing.
grammar AnaplanFormula;
formula: expression EOF;
expression
: LPAREN expression RPAREN #parenthesisExp
| IF condition=expression THEN thenExpression=expression ELSE elseExpression=expression #ifExp
| left=expression BINARYOPERATOR right=expression #binaryoperationExp
| left=expression op=(TIMES|DIV) right=expression #muldivExp
| left=expression op=(PLUS|MINUS) right=expression #addsubtractExp
| left=expression op=(EQUALS|NOTEQUALS|LT|GT) right=expression #comparisonExp
| left=expression AMPERSAND right=expression #concatenateExp
| NOT expression #notExp
| STRINGLITERAL #stringliteralExp
| signedAtom #atomExp
;
signedAtom
: PLUS signedAtom #plusSignedAtom
| MINUS signedAtom #minusSignedAtom
| func_ #funcAtom
| atom #atomAtom
;
atom
: SCIENTIFIC_NUMBER #numberAtom
| LPAREN expression RPAREN #expressionAtom // Do we need this?
| entity #entityAtom
;
func_: functionname LPAREN (expression (',' expression)*)? RPAREN #funcParameterised
| entity LSQUARE dimensionmapping (',' dimensionmapping)* RSQUARE #funcSquareBrackets
;
dimensionmapping: WORD COLON entity; // Could make WORD more specific here
functionname: WORD; // Could make WORD more specific here
entity: QUOTELITERAL #quotedEntity
| WORD+ #wordsEntity
| left=entity DOT right=entity #dotQualifiedEntity
;
WS: [ \r\n\t]+ -> skip;
/////////////////
// Fragments //
/////////////////
fragment NUMBER: DIGIT+ (DOT DIGIT+)?;
fragment DIGIT: [0-9];
fragment LOWERCASE: [a-z];
fragment UPPERCASE: [A-Z];
fragment WORDSYMBOL: [#?_£%];
//////////////////
// Tokens //
//////////////////
IF: 'IF' | 'if';
THEN: 'THEN' | 'then';
ELSE: 'ELSE' | 'else';
BINARYOPERATOR: 'AND' | 'and' | 'OR' | 'or';
NOT: 'NOT' | 'not';
WORD: (DIGIT* (LOWERCASE | UPPERCASE | WORDSYMBOL)) (LOWERCASE | UPPERCASE | DIGIT | WORDSYMBOL)*;
STRINGLITERAL: DOUBLEQUOTES (~'"' | ('""'))* DOUBLEQUOTES;
QUOTELITERAL: '\'' (~'\'' | ('\'\''))* '\'';
LSQUARE: '[';
RSQUARE: ']';
LPAREN: '(';
RPAREN: ')';
PLUS: '+';
MINUS: '-';
TIMES: '*';
DIV: '/';
COLON: ':';
EQUALS: '=';
NOTEQUALS: LT GT;
LT: '<';
GT: '>';
AMPERSAND: '&';
DOUBLEQUOTES: '"';
UNDERSCORE: '_';
QUESTIONMARK: '?';
HASH: '#';
POUND: '£';
PERCENT: '%';
DOT: '.';
PIPE: '|';
SCIENTIFIC_NUMBER: NUMBER (('e' | 'E') (PLUS | MINUS)? NUMBER)?;
Move your ifExp down near the end of your alternatives (in particular, below any alternative that you would wish to match your elseExpression).
Your “if ... then ... else ...” ends up below the muldivExp in the parse tree precisely because you've given it a higher priority. Items lower in the tree are evaluated before items higher in the tree, so higher-priority items belong lower in the tree.
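To make the priority argument concrete, here is a minimal sketch in Python of a toy precedence-climbing parser. It is not ANTLR's generated code; the names `parse`, `if_power`, and `BINARY_POWER` are made up for illustration, and the numeric binding powers are assumptions that stand in for the position of an alternative in the rule. A high `if_power` mimics listing the IF alternative first, and `if_power=0` mimics listing it last.

```python
import re

# Assumed binding powers: a larger number binds tighter.
BINARY_POWER = {"*": 20, "/": 20, "+": 10, "-": 10}


def tokenize(text):
    """Split the input into numbers, words (IF/THEN/ELSE) and operators."""
    return re.findall(r"\d+|[A-Za-z]+|[-+*/()]", text)


def parse(text, if_power):
    """Return a fully parenthesised string showing how the input was grouped."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expression(min_power):
        tok = take()
        if tok == "IF":
            cond = expression(0)
            take("THEN")
            then = expression(0)
            take("ELSE")
            # The ELSE operand only absorbs operators that bind
            # tighter than the IF construct itself.
            other = expression(if_power)
            left = f"(IF {cond} THEN {then} ELSE {other})"
        elif tok == "(":
            left = expression(0)
            take(")")
        else:
            left = tok
        # Standard precedence-climbing loop for left-associative operators.
        while peek() in BINARY_POWER and BINARY_POWER[peek()] > min_power:
            op = take()
            right = expression(BINARY_POWER[op])
            left = f"({left} {op} {right})"
        return left

    tree = expression(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


# IF treated as the tightest-binding construct (like listing it first):
print(parse("IF 1 THEN 2 ELSE 3 * 4", if_power=30))
# IF treated as the loosest-binding construct (like listing it last):
print(parse("IF 1 THEN 2 ELSE 3 * 4", if_power=0))
```

With the high power the multiplication wraps the whole IF, which is the grouping described in the question; with the lowest power the ELSE branch swallows `3 * 4`.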
With:
expression:
LPAREN expression RPAREN # parenthesisExp
| left = expression BINARYOPERATOR right = expression # binaryoperationExp
| left = expression op = (TIMES | DIV) right = expression # muldivExp
| left = expression op = (PLUS | MINUS) right = expression # addsubtractExp
| left = expression op = (EQUALS | NOTEQUALS | LT | GT) right = expression # comparisonExp
| left = expression AMPERSAND right = expression # concatenateExp
| NOT expression # notExp
| STRINGLITERAL # stringliteralExp
| signedAtom # atomExp
| IF condition = expression THEN thenExpression = expression ELSE elseExpression = expression # ifExp
;
I get the intended parse: IF 1 THEN 2 ELSE (3 * 4).

ANTLR4 no viable alternative at input 'do { return' error?

This ANTLR4 parser grammar reports a 'no viable alternative' error when I try to parse an input. The only rules I know of that match the part of the input with the error are 'retblock_expr' and 'block_expr'. I have put 'retblock_expr' in front of 'block_expr' and 'non_assign_expr' in front of 'retblock_expr', but it still throws the error.
input:
print(do { return a[3] })
full error:
line 1:11 no viable alternative at input '(do { return'
parser grammar:
parser grammar TestP;
options { tokenVocab=Test; }
program: ( ( block_expr | retblock_expr ) ( wsp ( block_expr | retblock_expr ) wsp )* wsp )? EOF;
retblock_expr
: isglobal DEF wsp fcreatable wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #FuncBlockA
| FUNC wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #CFuncBlockA
| LAMBDA wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #LambdaBlockA
| SWITCH wsp atompar_option wsp ( DO wsp )? ( LBC wsp (CASE wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )* ( DEFAULT wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )? wsp RBC | (CASE wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )* ( DEFAULT wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )? wsp END ) #SwitchBlockA
| DO wsp ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #DoBlockA
;
non_assign_expr
: ( iterable ( ( DOT | SUP | SIB ) iterable )+ | index ) #AccessExpr
| ( call | datat | LPR non_assign_expr RPR | LBC non_assign_expr RBC | LBR non_assign_expr RBR ) #BracketsExpr
| ( STR | KUN )+ indexable #UnpackExpr
| <assoc=right> non_assign_expr ( wsp POW wsp non_assign_expr )+ #PowExpr
| non_assign_expr ( wsp ( INC | DEC ) wsp non_assign_expr | INC | DEC )+ #CrementExpr
| ( PLS | MNS | BNT | EXC | LEN | NOT )+ non_assign_expr #UnaryExpr
| non_assign_expr EXC+ #FactExpr
| non_assign_expr ( wsp ( STR | DIV | PER | FDV | CDV ) wsp non_assign_expr | PER )+ # AdvExpr
| non_assign_expr ( wsp ( PLS | MNS ) wsp non_assign_expr )+ #BasicExpr
| non_assign_expr ( wsp CON wsp non_assign_expr )+ #ConcatExpr
| non_assign_expr ( wsp ( BLS | BRS ) wsp non_assign_expr )+ #ShiftExpr
| non_assign_expr ( wsp ( LET | LTE | GRT | GTE ) wsp non_assign_expr )+ #CompareExpr
| non_assign_expr ( wsp ( EQL | IS | NEQ | IS wsp NOT ) wsp non_assign_expr )+ #EqualExpr
| non_assign_expr ( wsp BND wsp non_assign_expr )+ #BitAnd
| non_assign_expr ( wsp BXR wsp non_assign_expr )+ #BitXor
| non_assign_expr ( wsp BOR wsp non_assign_expr )+ #BitOr
| <assoc=right> non_assign_expr ( wsp ( AND | TND ) wsp non_assign_expr wsp ( OR | TOR ) wsp non_assign_expr )+ #Ternary
| non_assign_expr ( wsp ( NND | AND ) wsp non_assign_expr )+ #AndExpr
| non_assign_expr ( wsp ( NXR | XOR ) wsp non_assign_expr )+ #XorExpr
| non_assign_expr ( wsp ( NOR | OR ) wsp non_assign_expr )+ #OrExpr
| retblock_expr #RBlockA
| typet LPR non_assign_expr RPR #TypeCastA
| atom #AtomNAE
;
block_expr
: IF wsp non_assign_expr wsp ( ( THEN wsp block_expr | LBC wsp block_expr wsp RBC ) wsp ( ( ELIF wsp non_assign_expr wsp THEN wsp block_expr | ELIF wsp non_assign_expr wsp LBC wsp block_expr wsp RBC )* ELSE wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) | ( ELIF wsp non_assign_expr wsp THEN wsp block_expr | ELIF wsp non_assign_expr wsp LBC wsp block_expr wsp RBC )*? ( ELIF wsp non_assign_expr wsp THEN wsp block_expr wsp END | ELIF wsp non_assign_expr wsp LBC wsp block_expr wsp RBC ) ) | ( THEN wsp block_expr wsp END | LBC wsp block_expr wsp RBC ) ) #IfBlock
| TRY wsp ( block_expr wsp ( EXCEPT (LPR wsp IDN wsp RPR)? wsp ( ( (DO wsp)? LBC wsp block_expr wsp RBC | DO wsp block_expr wsp END | block_expr ) wsp )? FINALLY wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) | EXCEPT (LPR wsp IDN wsp RPR)? wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) ) | ( block_expr wsp END | LBC wsp block_expr wsp RBC ) ) #DebugBlock
| FOR wsp av_var wsp TOR wsp av_inc wsp CMA wsp av_inc wsp ( CMA wsp av_inc wsp )? ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #RangeBlock
| FOR wsp av_var wsp CMA wsp non_assign_expr wsp CMA wsp non_assign_expr wsp CMA wsp non_assign_expr wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #ActionBlock
| FOR wsp IDN wsp ( TOR wsp non_assign_expr )? ( wsp CMA wsp ( IDN wsp ( TOR wsp non_assign_expr )? )? ( wsp CMA wsp IDN wsp TOR wsp non_assign_expr )* )? IN wsp iterable wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #IterationBlock
| ( WHILE wsp non_assign_expr wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) | DO wsp ( LBC wsp block_expr wsp RBC | block_expr ) wsp WHILE wsp non_assign_expr ) #WhileBlock
| ( ( DO | REPEAT ) wsp ( LBC wsp block_expr wsp RBC | block_expr ) wsp UNTIL wsp non_assign_expr | UNTIL wsp non_assign_expr wsp ( DO | REPEAT ) ( LBC wsp block_expr wsp RBC | block_expr wsp END ) ) #RepeatBlock
| isglobal DEF wsp fcreatable wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #FuncBlock
| FUNC wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #CFuncBlock
| LAMBDA wsp fcall wsp ( DO wsp )? ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #LambdaBlock
| SWITCH wsp atompar_option wsp ( DO wsp )? ( LBC wsp (CASE wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )* ( DEFAULT wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )? wsp RBC | (CASE wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )* ( DEFAULT wsp atompar_option wsp ( block_expr wsp END | LBC wsp block_expr wsp RBC ) )? wsp END ) #SwitchBlock
| DO wsp ( LBC wsp block_expr wsp RBC | block_expr wsp END ) #DoBlock
| LPR block_expr RPR #EnclosedBlockA
| LBC block_expr RBC #EnclosedBlockB
| block #OpenBlock
;
atompar_option
: LPR wsp atom wsp RPR
| atom
;
isglobal: ( GLOBAL wsp )?;
block: ( stat+ wsp (PASS | retstat)* )+ | PASS;
stat
: expression+ wsp SMC*
| expression* wsp SMC+
;
retstat: RETURN wsp non_assign_expr;
expression
: <assoc=right> isglobal var_list ( wsp aop wsp expression )+ #AssignExpr
| exp_list #ExpListA
| non_assign_expr #NonAssign
| atom #AtomEXPR
| IVC #InvalidCharacter
;
literal
: strt
| num
;
datat
: listd
| dictd
| setd
| tupled
;
wsp: WSP*;
listd
: LBR wsp exp_list wsp RBR
| EML
;
dictd
: LBC wsp kvpair wsp
(
CMA
wsp
kvpair
wsp
)*
RBC
;
setd
: LBC wsp exp_list wsp RBC
| EMS
;
indexable
: ( dictd | IDN | ( ( datat | IDN | strt ) ) LBR non_assign_expr RBR | ( datat | IDN | strt ) ( ( DOT | SUP | SIB ) ( datat | IDN | strt ) )+ ) fcall
| ( ( datat | IDN | strt ) ) LBR non_assign_expr RBR
| ( datat | IDN | strt ) ( ( DOT | SUP | SIB ) ( datat | IDN | strt ) )+
| IDN
| datat
;
iterable
: indexable
| strt
;
numidn
: num
| IDN
;
av_numidn
: numidn
| av_var
;
av_inc
: av_numidn
| call
;
tupled: LPR wsp (exp_list | CMA) wsp RPR;
kvpair: non_assign_expr wsp TOR wsp non_assign_expr;
index
: ( iterable ) LBR non_assign_expr RBR
| iterable ( ( DOT | SUP | SIB ) iterable )+
;
var_list: ( typet wsp )? av_var ( wsp CMA wsp ( typet wsp )? var_list)*;
av_var
: IDN
| index
;
exp_list: non_assign_expr (wsp CMA wsp non_assign_expr)*;
atom
: num
| av_var
| strt
| typet
| ckw
| val
| datat
;
aop
: A_FDV // '//='
| A_CDV // '*/='
| A_NOR // '||='
| A_FAC // '=!='
| A_LTE // '=<='
| A_GTE // '=>='
| A_EQL // '==='
| A_NEQ // '!=='
| A_CON // '..='
| A_NXR // '$$='
| A_BRS // '>>='
| A_NND // '&&='
| A_BLS // '<<='
| A_DCL // '::='
| A_CLD // ':.='
| A_KUN // '=**'
| A_VUN // '=*'
| A_DOT // '.='
| A_POW // '^='
| A_NOT // '=!'
| A_BNT // '=~'
| A_LEN // '=#'
| A_PER // '=%'
| A_MUL // '*='
| A_DIV // '/='
| A_MOD // '%='
| A_ADD // '+='
| A_SUB // '-='
| A_LET // '=<'
| A_GRT // '=>'
| A_BND // '&='
| A_BXR // '$='
| A_BOR // '|='
| A_TND // '?='
| A_TOR // ':='
| A_NML // '='
;
num
: exponential
| non_exponential
;
exponential
: PXI
| DXI
| PXF
| DXF
| PXB
| DXB
| PXD
| DXD
| PXP
| DXP
| PRX
| DEX
;
non_exponential
: IMG
| FLT
| DBL
| DCM
| PRC
| INT
;
fcreatable
: dictd
| av_var
;
callablets
: fcreatable
| retblock_expr
| ckw
;
fcall
: CLP
| LPR arg_list RPR
;
call: callablets fcall;
arg_list: arg_type ( wsp CMA wsp arg_type )*;
arg_type
: u_var_list wsp aop wsp u_exp_list
| unkeyed_var
;
unkeyed_var
: LPR var_list RPR
| LBR var_list RBR
| LBC var_list RBC
| var_list
;
u_var_list: unkeyed_var ( wsp aop wsp u_var_list )*;
u_exp_list: unkeyed_exp ( wsp CMA wsp unkeyed_exp )*;
unkeyed_exp
: tupled
| listd
| setd
| non_assign_expr
;
litidn
: literal
| IDN
;
typecast: typet LPR non_assign_expr RPR;
strt
: multi_line
| single_line
| char_string
;
multi_line
: SMT
| USM
| NMT
| UNM
;
single_line
: SST
| USS
| NST
| UNS
| NAS
;
char_string
: SCH
| USC
| NCH
| UNC
| NAC
;
typet
: STRT
| INTT
| NUMT
| DECIMALT
| FLOATT
| DOUBLET
| PRECISET
| EXPNT
| CHART
| IMAGT
| REALT
| HEXTY
| BINTY
| OCTTY
| LISTD
| SETD
| DICTD
| TUPLED
| TYPET
| BOOLT
;
bks_or_WSP
: WSP
| BKS
| SPC
;
emd
: EML
| EMS
;
sep
: SMC
| CMA
| TOR
;
kwr
: WHILE
| FOR
| DO
| DEL
| NEW
| IMPORT
| EXPORT
| DEF
| END
| GLOBAL
| BREAK
| CONTINUE
| NOT
| AND
| OR
| IN
| CASE
| DEFAULT
| RETURN
| TRY
| EXCEPT
| FINALLY
| ELIF
| IF
| ELSE
| AS
| CONST
| REPEAT
| UNTIL
| THEN
| GOTO
| LABEL
| USING
| PUBLIC
| PROTECTED
| PRIVATE
| SELF
| FROM
| XOR
| IMAGT
| REALT
| WHERE
| PASS
| G_G
| L_L
| MAP
| IS
;
ckw
: OPN
| OUT
| OUTF
| PRINT
| PRINTF
| LAMBDA
| FUNC
| ERR
| ERRF
| ASSERT
| ASSERTF
| FORMAT
| SWITCH
| ABS
| ASCII
| CALLABLE
| CHR
| DIR
| EVAL
| EXEC
| FILTER
| GET
| HASH
| ID
| INST
| SUB
| SUPER
| MAX
| MIN
| OBJ
| ORD
| POWF
| REV
| REPR
| ROUND
| FLOOR
| CEIL
| MUL
| SORT
| ADD
| ZIP
| WAIT
| SECS
| MILS
| BENCHMARK
;
val
: RMH // 'inf'
| IMH // 'infi'
| NAN // 'nan'
| IND // 'ind'
| UND // 'und'
| NIL // 'nil'
| NON // 'none'
| TRU // 'true'
| FLS // 'false'
;
opr
: NND // '&&'
| NXR // '$$'
| NOR // '||'
| CLP // '()'
| SUP // '::'
| SIB // ':.'
| KUN // '**'
| INC // '++'
| DEC // '+-'
| FDV // '//'
| CDV // '* /'
| CON // '..'
| BLS // '<<'
| BRS // '>>'
| LTE // '<='
| GTE // '>='
| EQL // '=='
| NEQ // '!='
| LPR // '('
| RPR // ')'
| LBR // '['
| RBR // ']'
| LBC // '{'
| RBC // '}'
| STR // '*'
| POW // '^'
| PLS // '+'
| MNS // '-'
| BNT // '~'
| EXC // '!'
| LEN // '#'
| PER // '%'
| DIV // '/'
| LET // '<'
| GRT // '>'
| BND // '&'
| BXR // '$'
| BOR // '|'
| TND // '?'
| TOR // ':'
| DOT // '.'
;
inl
: strt
| num
| ckw
| kwr
| val
| IDN
| bks_or_WSP
| sep
| emd
| aop
| opr
| typet
| IVC
;
Your PRINT token can only be matched by the block_expr rule, through a path such as: block_expr → block → stat → expression → atom → ckw → PRINT.
There is no path for retblock_expr to recognize anything that begins with the PRINT token.
As a result, it will not matter in which order you list block_expr or retblock_expr.
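The reachability claim above can be checked mechanically. The following is a minimal sketch in Python, assuming a hand-simplified table of which symbols can appear first in each rule. The table `LEADING` and the function `leading_path` are hypothetical names, wsp is ignored, and several alternatives from the posted grammar are deliberately left out, so treat it as an illustration rather than a full FIRST-set computation.

```python
# Hand-simplified subset of the posted grammar: for each rule, the symbols
# that can appear in leading position of one of its alternatives.
LEADING = {
    # isglobal is optional, so both GLOBAL and DEF can start the first alternative.
    "retblock_expr": ["GLOBAL", "DEF", "FUNC", "LAMBDA", "SWITCH", "DO"],
    "block_expr": ["block"],        # other alternatives omitted in this sketch
    "block": ["stat", "PASS"],
    "stat": ["expression", "SMC"],
    "expression": ["atom", "IVC"],  # other alternatives omitted in this sketch
    "atom": ["ckw"],                # other alternatives omitted in this sketch
    "ckw": ["PRINT", "PRINTF", "LAMBDA", "FUNC"],  # subset of the keyword list
}


def leading_path(rule, token, seen=None):
    """Return a chain of rules from `rule` down to `token`, or None."""
    seen = set() if seen is None else seen
    if rule in seen:
        return None  # already explored, avoid looping on recursive rules
    seen.add(rule)
    for symbol in LEADING.get(rule, []):
        if symbol == token:
            return [rule, symbol]
        if symbol in LEADING:  # symbol is itself a rule, so descend into it
            tail = leading_path(symbol, token, seen)
            if tail:
                return [rule] + tail
    return None


print(leading_path("block_expr", "PRINT"))
print(leading_path("retblock_expr", "PRINT"))
```

In this simplified table the first call finds a chain ending in PRINT, while the second returns None because every alternative of retblock_expr starts with a keyword token.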
There is no parser rule in your grammar that will match a PRINT token followed by an LPR token. A block_expr is matched by the program rule, which only matches (ignoring wsp) block_expr or retblock_expr. Neither of these has alternatives that begin with an LPR token, so ANTLR can't match that token.
print(...) would normally be matched as a function call expression that accepts 0 or more comma-separated parameters. You have no such rule/alternative defined. (I'd guess that it should be an alternative on either retblock_expr or block_expr.)
That's the immediate cause of this error. ANTLR really does not have any rule/alternative that can accept a LPR token in this position.

unknown comparison failure occurs with xtext formatter

I'm customizing the Xtext formatter of my DSL, and whilst testing I get a weird comparison failure I don't understand.
Here's the relevant part of my grammar :
EisModel:
'project' '=' project_name=STRING ';'
'plcname' '=' plc_name=STRING ';'
'author' '=' author_name=STRING ';'
testcases+=Testcase*;
Testcase:
"testcase" testcase_name=ID '{'
testblock=Testblock?
'}';
Testblock:
'testActive' '=' testActive=BoolConstant ';'
'blockType' '=' blockType=BlockConstant ';'
'description' '=' description=STRING ';'
define=DefineBlock?;
BoolConstant:
value=('true' | 'false');
BlockConstant:
value=('FC' | 'FB');
I assume the comparison failure I get has something to do with a terminal rule, since I am not doing anything extraordinary in the formatter.
This is the expected code of the JUnit failure trace:
74 4 S "true" BoolConstant:value='true'
78 0 H
78 1 S ";" Testblock:(';' )
79 2 H "\n\t" Whitespace:TerminalRule'WS'
81 9 S "blockType" Testblock:'blockType'
90 1 H " " Whitespace:TerminalRule'WS'
91 1 S "=" Testblock:( '=' )
92 1 H " " Whitespace:TerminalRule'WS'
93 2 S "FC" BlockConstant:value='FC'
And this the actual code:
B BoolConstant Testblock:testActive=BoolConstant path:Testblock/testActive=Testcase/testblock=EisModel/testcases[0]
74 4 S "true" BoolConstant:value='true'
E BoolConstant Testblock:testActive=BoolConstant path:Testblock/testActive=Testcase/testblock=EisModel/testcases[0]
78 0 H
78 1 S ";" Testblock:(';' )
79 2 H "\n\t" Whitespace:TerminalRule'WS'
81 9 S "blockType" Testblock:'blockType'
90 1 H " " Whitespace:TerminalRule'WS'
91 1 S "=" Testblock:( '=' )
92 1 H " " Whitespace:TerminalRule'WS'
B BlockConstant Testblock:blockType=BlockConstant path:Testblock/blockType=Testcase/testblock=EisModel/testcases[0]
93 2 S "FC" BlockConstant:value='FC'
E BlockConstant Testblock:blockType=BlockConstant path:Testblock/blockType=Testcase/testblock=EisModel/testcases[0]
The difference revolves around the lines 74 and 93.
And I don't know what is going wrong or even where I could tweak anything.
Could anyone please help?
Here's the test:
@Test def void testTestblock() {
assertFormatted[
toBeFormatted = '''
project="proj";plcname="name";author="Bob";
testcase One {testActive = true ; blockType = FC ;
description = "string" ; }
'''
expectation = '''
project = "proj";
plcname = "name";
author = "Bob";
testcase One {
testActive = true;
blockType = FC;
description = "string";
}
'''
]
}
The bug even occurs if I comment out my code in the formatter class which extends AbstractFormatter2, so I'll omit that here.
This sounds like a bug to me. Please report it at https://github.com/eclipse/xtext-core
Workaround:
BoolConstant:
value=BooleanValue;
BlockConstant:
value=BlockValue;
BlockValue:"FC"|"FB";
BooleanValue: "true"|"false";
Here's a little bit more of the grammar:
EisModel:
'project' '=' project_name=STRING ';'
'plcname' '=' plc_name=STRING ';'
'author' '=' author_name=STRING ';'
testcases+=Testcase*;
Testcase:
"testcase" testcase_name=ID '{'
testblock=Testblock?
'}';
Testblock:
'testActive' '=' testActive=BoolConstant ';'
'blockType' '=' blockType=BlockConstant ';'
'description' '=' description=STRING ';'
define=DefineBlock?;
BoolConstant:
value=BooleanValue;
BlockConstant:
value=BlockValue;
BlockValue:
'FC' | 'FB';
BooleanValue:
'true' | 'false';
DefineBlock:
'define' '{' direction=DirectionBlock '}' teststeps+=TeststepBlock*;
DirectionBlock:
input=Input & inout=InOut? & output=Output;
Input:
name='input' '[' inputVariables+=Variables* ']';
Output:
name='output' '[' outputVariables+=Variables* ']';
InOut:
name='inout' '[' inoutVariables+=Variables* ']';
And here another comparison failure.
expected:
123 5 S "input" Input:name='input'
128 0 H
128 1 S "[" Input:'['
129 0 H
129 1 S "]" Input:']'
130 0 H
130 6 S "output" Output:name='output'
136 0 H
136 1 S "[" Output:'['
137 0 H
137 1 S "]" Output:']'
actual:
B Input'input' DirectionBlock:input=Input path:DirectionBlock/input=DefineBlock/direction=Testblock/define=Testcase/testblock=EisModel/testcases[0]
123 5 S "input" Input:name='input'
128 0 H
128 1 S "[" Input:'['
129 0 H
129 1 S "]" Input:']'
E Input'input' DirectionBlock:input=Input path:DirectionBlock/input=DefineBlock/direction=Testblock/define=Testcase/testblock=EisModel/testcases[0]
130 0 H
B Output'output' DirectionBlock:output=Output path:DirectionBlock/output=DefineBlock/direction=Testblock/define=Testcase/testblock=EisModel/testcases[0]
130 6 S "output" Output:name='output'
136 0 H
136 1 S "[" Output:'['
137 0 H
137 1 S "]" Output:']'
E Output'output' DirectionBlock:output=Output path:DirectionBlock/output=DefineBlock/direction=Testblock/define=Testcase/testblock=EisModel/testcases[0]
The differences are now around the lines 123-129 and 130-137.
Edit
After activating the formatter within the editor, I received an error message which I don't understand. Maybe someone else does:
Message:
Unhandled event loop exception
Exception Stack Trace:
java.lang.StackOverflowError
at com.google.common.collect.RegularImmutableMap.get(RegularImmutableMap.java:123)
at com.google.common.collect.RegularImmutableMap.get(RegularImmutableMap.java:115)
at org.eclipse.xtext.formatting2.regionaccess.internal.NodeModelBasedRegionAccess.regionForEObject(NodeModelBasedRegionAccess.java:49)
at org.eclipse.xtext.formatting2.regionaccess.internal.NodeModelBasedRegionAccess.regionForEObject(NodeModelBasedRegionAccess.java:22)
at org.eclipse.xtext.formatting2.AbstractFormatter2.isInRequestedRange(AbstractFormatter2.java:358)
at org.eclipse.xtext.formatting2.AbstractFormatter2.shouldFormat(AbstractFormatter2.java:423)
at org.eclipse.xtext.formatting2.internal.FormattableDocument.format(FormattableDocument.java:186)
at org.example.eis.formatting2.EisFormatter._format(EisFormatter.java:224)
at org.example.eis.formatting2.EisFormatter.format(EisFormatter.java:346)
A look at the java classes:
223 protected void _format(final DirectionBlock directionblock, @Extension final IFormattableDocument document) {
224 document.<DirectionBlock>format(directionblock);
225 }

Why is this grammar ambiguous?

I'm using Antlr4. Here is my grammar:
assign : id '=' expr ;
id : 'A' | 'B' | 'C' ;
expr : expr '+' term
| expr '-' term
| term ;
term : term '*' factor
| term '/' factor
| factor ;
factor : expr '**' factor
| '(' expr ')'
| id ;
WS : [ \t\r\n]+ -> skip ;
I know this grammar is ambiguous and also I know I should add an element to the grammar but I don't know how to make the grammar unambiguous.
factor : expr '**' factor
Consider the input
A + B ** C
A + B is an expr, so we could analyse it as the left operand of '**' in a factor, which semantically gives (A + B) ** C.
But the other, more conventional interpretation, A + (B ** C), is also possible:
<expr> =>
<expr> + <term> =>
<term> + <term> =>
<factor> + <term> =>
A + <term> =>
A + <factor> =>
A + <expr> ** <factor> =>
A + <term> ** <factor> =>
A + <factor> ** <factor> =>
A + B ** <factor> =>
A + B ** C
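The two readings can also be counted mechanically. Below is a minimal sketch in Python that counts the distinct parse trees of a token list under the grammar from the question. The names `GRAMMAR`, `count_parses`, `derive`, and `seq` are made up for this illustration, and the counter relies on the fact that no symbol in this grammar can derive the empty string.

```python
from functools import lru_cache

# The grammar from the question, one list of symbols per alternative.
GRAMMAR = {
    "assign": [["id", "=", "expr"]],
    "id": [["A"], ["B"], ["C"]],
    "expr": [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["term", "/", "factor"], ["factor"]],
    "factor": [["expr", "**", "factor"], ["(", "expr", ")"], ["id"]],
}


def count_parses(tokens, start="expr"):
    """Count the distinct parse trees that derive `tokens` from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of parse trees deriving tokens[i:j] from a single symbol.
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(seq(tuple(alt), i, j) for alt in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def seq(symbols, i, j):
        # Number of ways a sequence of symbols derives tokens[i:j].
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        total = 0
        # Every symbol consumes at least one token, so leave room for `rest`.
        for k in range(i + 1, j - len(rest) + 1):
            left = derive(head, i, k)
            if left:
                total += left * seq(rest, k, j)
        return total

    return derive(start, 0, len(tokens))


print(count_parses("A + B ** C".split()))  # 2
print(count_parses("A + B".split()))       # 1
```

A count above 1 for any input means the grammar is ambiguous for that input; here the two trees correspond to (A + B) ** C and A + (B ** C).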

ANTLR4 Grammar Performance Very Poor

Given the grammar below, I'm seeing very poor performance when parsing longer strings, on the order of seconds. (This happens with both the Python and Go implementations.) Is there something in this grammar that is causing that?
Example output:
0.000061s LEXING "hello world"
0.014349s PARSING "hello world"
0.000052s LEXING 5 + 10
0.015384s PARSING 5 + 10
0.000061s LEXING FIRST_WORD(WORD_SLICE(contact.blerg, 2, 4))
0.634113s PARSING FIRST_WORD(WORD_SLICE(contact.blerg, 2, 4))
0.000095s LEXING (DATEDIF(DATEVALUE("01-01-1970"), date.now, "D") * 24 * 60 * 60) + ((((HOUR(date.now)+7) * 60) + MINUTE(date.now)) * 60))
1.552758s PARSING (DATEDIF(DATEVALUE("01-01-1970"), date.now, "D") * 24 * 60 * 60) + ((((HOUR(date.now)+7) * 60) + MINUTE(date.now)) * 60))
This is on Python. Though I don't expect blazing performance, I would expect sub-second parsing for any input. What am I doing wrong?
grammar Excellent;
parse
: expr EOF
;
expr
: atom # expAtom
| concatenationExpr # expConcatenation
| equalityExpr # expEquality
| comparisonExpr # expComparison
| additionExpr # expAddition
| multiplicationExpr # expMultiplication
| exponentExpr # expExponent
| unaryExpr # expUnary
;
path
: NAME (step)*
;
step
: LBRAC expr RBRAC
| PATHSEP NAME
| PATHSEP NUMBER
;
parameters
: expr (COMMA expr)* # functionParameters
;
concatenationExpr
: atom (AMP concatenationExpr)? # concatenation
;
equalityExpr
: comparisonExpr op=(EQ|NE) comparisonExpr # equality
;
comparisonExpr
: additionExpr (op=(LT|GT|LTE|GTE) additionExpr)? # comparison
;
additionExpr
: multiplicationExpr (op=(ADD|SUB) multiplicationExpr)* # addition
;
multiplicationExpr
: exponentExpr (op=(MUL|DIV) exponentExpr)* # multiplication
;
exponentExpr
: unaryExpr (EXP exponentExpr)? # exponent
;
unaryExpr
: SUB? atom # negation
;
funcCall
: function=NAME LPAR parameters? RPAR # functionCall
;
funcPath
: function=funcCall (step)* # functionPath
;
atom
: path # contextReference
| funcCall # atomFuncCall
| funcPath # atomFuncPath
| LITERAL # stringLiteral
| NUMBER # decimalLiteral
| LPAR expr RPAR # parentheses
| TRUE # true
| FALSE # false
;
NUMBER
: DIGITS ('.' DIGITS?)?
;
fragment
DIGITS
: ('0'..'9')+
;
TRUE
: [Tt][Rr][Uu][Ee]
;
FALSE
: [Ff][Aa][Ll][Ss][Ee]
;
PATHSEP
:'.';
LPAR
:'(';
RPAR
:')';
LBRAC
:'[';
RBRAC
:']';
SUB
:'-';
ADD
:'+';
MUL
:'*';
DIV
:'/';
COMMA
:',';
LT
:'<';
GT
:'>';
EQ
:'=';
NE
:'!=';
LTE
:'<=';
GTE
:'>=';
QUOT
:'"';
EXP
: '^';
AMP
: '&';
LITERAL
: '"' ~'"'* '"'
;
Whitespace
: (' '|'\t'|'\n'|'\r')+ ->skip
;
NAME
: NAME_START_CHARS NAME_CHARS*
;
fragment
NAME_START_CHARS
: 'A'..'Z'
| '_'
| 'a'..'z'
| '\u00C0'..'\u00D6'
| '\u00D8'..'\u00F6'
| '\u00F8'..'\u02FF'
| '\u0370'..'\u037D'
| '\u037F'..'\u1FFF'
| '\u200C'..'\u200D'
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
;
fragment
NAME_CHARS
: NAME_START_CHARS
| '0'..'9'
| '\u00B7' | '\u0300'..'\u036F'
| '\u203F'..'\u2040'
;
ERRROR_CHAR
: .
;
You can always try to parse with SLL(*) first and only if that fails you need to parse it with LL(*) (which is the default).
See this ticket on ANTLR's GitHub for further explanation, and here is an implementation that uses this strategy.
This method will save you (a lot of) time when parsing syntactically correct input.
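A minimal sketch of that two-stage strategy in Python, assuming the antlr4-python3-runtime package is installed and that `lexer_cls` and `parser_cls` are the classes ANTLR generated from the grammar (for example ExcellentLexer and ExcellentParser). The helper name `parse_two_stage` is made up for this illustration, and it relies on the runtime's `_interp` and `_errHandler` attributes.

```python
def parse_two_stage(text, lexer_cls, parser_cls, start_rule="parse"):
    """Try fast SLL prediction first; fall back to full LL only if that fails."""
    # Imported here so the sketch can be loaded without the runtime installed.
    from antlr4 import CommonTokenStream, InputStream
    from antlr4.atn.PredictionMode import PredictionMode
    from antlr4.error.ErrorStrategy import BailErrorStrategy, DefaultErrorStrategy
    from antlr4.error.Errors import ParseCancellationException

    tokens = CommonTokenStream(lexer_cls(InputStream(text)))
    parser = parser_cls(tokens)

    # Stage 1: SLL prediction, giving up at the first syntax error.
    parser._interp.predictionMode = PredictionMode.SLL
    parser._errHandler = BailErrorStrategy()
    try:
        return getattr(parser, start_rule)()
    except ParseCancellationException:
        # Stage 2: rewind the token stream and retry with full LL prediction
        # and the default error recovery, so real syntax errors are reported.
        tokens.seek(0)
        parser.reset()
        parser._errHandler = DefaultErrorStrategy()
        parser._interp.predictionMode = PredictionMode.LL
        return getattr(parser, start_rule)()
```

Usage would look like `tree = parse_two_stage('5 + 10', ExcellentLexer, ExcellentParser)`. Stage 1 may still print a syntax error through the default console listener before the fallback runs; removing the error listeners for the first stage is a common refinement.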
It seems this performance is due to the left recursion used in the addition, multiplication, etc. operators. Rewriting these to be binary rules instead yields performance that is instant (see below).
grammar Excellent;
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
DOT : '.';
PLUS : '+';
MINUS : '-';
TIMES : '*';
DIVIDE : '/';
EXPONENT : '^';
EQ : '=';
NEQ : '!=';
LTE : '<=';
LT : '<';
GTE : '>=';
GT : '>';
AMPERSAND : '&';
DECIMAL : [0-9]+('.'[0-9]+)?;
STRING : '"' (~["] | '""')* '"';
TRUE : [Tt][Rr][Uu][Ee];
FALSE : [Ff][Aa][Ll][Ss][Ee];
NAME : [a-zA-Z][a-zA-Z0-9_.]*; // variable names, e.g. contact.name or function names, e.g. SUM
WS : [ \t\n\r]+ -> skip; // ignore whitespace
ERROR : . ;
parse : expression EOF;
atom : fnname LPAREN parameters? RPAREN # functionCall
| atom DOT atom # dotLookup
| atom LBRACK expression RBRACK # arrayLookup
| NAME # contextReference
| STRING # stringLiteral
| DECIMAL # decimalLiteral
| TRUE # true
| FALSE # false
;
expression : atom # atomReference
| MINUS expression # negation
| expression EXPONENT expression # exponentExpression
| expression (TIMES | DIVIDE) expression # multiplicationOrDivisionExpression
| expression (PLUS | MINUS) expression # additionOrSubtractionExpression
| expression (LTE | LT | GTE | GT) expression # comparisonExpression
| expression (EQ | NEQ) expression # equalityExpression
| expression AMPERSAND expression # concatenation
| LPAREN expression RPAREN # parentheses
;
fnname : NAME
| TRUE
| FALSE
;
parameters : expression (COMMA expression)* # functionParameters
;