Antlr4 Grammar for Function Application - antlr

I'm trying to write a simple lambda calculus grammar (show below). The issue I am having is that function application seems to be treated as right associative instead of left associative e.g. "f 1 2" is parsed as (f (1 2)) instead of ((f 1) 2). ANTLR has an assoc option for tokens, but I don't see how that helps here since there is no operator for function application. Does anyone see a solution?
LAMBDA : '\\';
DOT : '.';
OPEN_PAREN : '(';
CLOSE_PAREN : ')';
fragment ID_START : [A-Za-z+\-*/_];
fragment ID_BODY : ID_START | DIGIT;
fragment DIGIT : [0-9];
ID : ID_START ID_BODY*;
NUMBER : DIGIT+ (DOT DIGIT+)?;
WS : [ \t\r\n]+ -> skip;
parse : expr EOF;
expr : variable #VariableExpr
| number #ConstantExpr
| function_def #FunctionDefinition
| expr expr #FunctionApplication
| OPEN_PAREN expr CLOSE_PAREN #ParenExpr
;
function_def : LAMBDA ID DOT expr;
number : NUMBER;
variable : ID;
Thanks!

this breaks 4.1's pattern matcher for left-recursion. cleaned up in main branch I believe. try downloading last master and build. CUrrently 4.1 generates:
expr[int _p]
: ( {} variable
| number
| function_def
| OPEN_PAREN expr CLOSE_PAREN
)
(
{2 >= $_p}? expr
)*
;
for that rule. expr ref in loop is expr[0] actually, which isn't right.

Related

Antlr 4.6.1 not generating errorNodes for inputstream

I have a simple grammar like :
grammar CellMath;
equation : expr EOF;
expr
: '-'expr #UnaryNegation // unary minus
| expr op=('*'|'/') expr #MultiplicativeOp // MultiplicativeOperation
| expr op=('+'|'-') expr #AdditiveOp // AdditiveOperation
| FLOAT #Float // Floating Point Number
| INT #Integer // Integer Number
| '(' expr ')' #ParenExpr // Parenthesized Expression
;
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
FLOAT
: DIGIT+ '.' DIGIT*
| '.' DIGIT+
;
INT : DIGIT+ ;
fragment
DIGIT : [0-9] ; // match single digit
//fragment
//ATSIGN : [#];
WS : [ \t\r\n]+ -> skip ;
ERRORCHAR : . ;
Not able to throw an exception in case of special char in between expression
[{Number}{SPLChar}{Chars}]
Ex:
"123#abc",
"123&abc".
I expecting an exception to throw
For Example:
Input stream : 123#abc Just like in ANTLR labs Image
But in my case Output : '123' without any errors
I'm using Listener pattern, Error nodes are just ignored not going through VisitTerminal([NotNull] ITerminalNode node) / VisitErrorNode([NotNull] IErrorNode node) in the BaseListener class. Also all the BaseErrorListener class methods has been overridden not even there.
Thanks in advance for your help.

Decision can match input such as "MULOP LETTER" using multiple alternatives: 1, 2

I'm getting this error
[22:52:55] warning(200): ProjLang.g:53:30:
Decision can match input such as "MULOP LETTER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
It seems like there could be some ambiguity in my grammar from my googling. I don't know how I can remove the ambiguity.
fragment LETTER
: ('a'..'z') | ('A'..'Z');
fragment DIGIT
: ('0'..'9');
ADDOP : ('+'|'-'|'|')
;
MULOP : ('*'|'/'|'&')
;
RELOP : ('<'|'=')
;
LPAR : '(';
RPAR : ')';
BOOL : 'true'|'false';
LCURL : '{';
RCURL : '}';
// parser rules: non terminal states should be lowercase
input : expr EOF
;
expr : 'if' expr 'then' expr 'else' expr
| 'let'( 'val' id RELOP expr 'in' expr 'end'|'fun' id LPAR id RPAR RELOP expr 'in' expr 'end')
| 'while' expr 'do' expr
| LCURL expr (';'expr)* RCURL
| '!'expr
| id ':=' expr
| relexpr
;
relexpr
: arithexpr (RELOP arithexpr)?
;
arithexpr
: term (MULOP term)*
;
term : factor (MULOP factor)*
;
factor : num
| BOOL
| id (LPAR expr RPAR)?
| LPAR expr RPAR
;
id : LETTER (LETTER | DIGIT)*;
num : DIGIT+;
I expect to write a grammar without error message so I can generate a lexer and a parser for it.
These rules do essentially the same:
arithexpr
: term (MULOP term)*
;
term : factor (MULOP factor)*
;
If you combine them you will get:
arithexpr: factor (MULOP factor)* (MULOP factor (MULOP factor)*)*
which contains an ambiquity (which of the two MULOP tokens should be matched after the initial factor?). But from the rewrite it's easy to see what to do:
arithexpr: factor (MULOP factor)*;
which replaces the original term and arithexpr rules.

ANTLR4 Unexpected Parse Behavior

I am trying to build a new language with ANTLR, and I have run into a problem. I am trying to support numerical expressions and mathematical operations on numbers(pretty important I reckon), but the parser doesn't seem to be acting how I expect. Here is my grammar:
grammar Lumos;
/*
* Parser Rules
*/
program : 'start' stat+ 'stop';
block : stat*
;
stat : assign
| numop
| if_stat
| while_stat
| display
;
assign : LET ID BE expr ;
display : DISPLAY expr ;
numop : add | subtract | multiply | divide ;
add : 'add' expr TO ID ;
subtract : 'subtract' expr 'from' ID ;
divide : 'divide' ID BY expr ;
multiply : 'multiply' ID BY expr ;
append : 'append' expr TO ID ;
if_stat
: IF condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
condition_block
: expr stat_block
;
stat_block
: OBRACE block CBRACE
| stat
;
while_stat
: WHILE expr stat_block
;
expr : expr POW<assoc=right> expr #powExpr
| MINUS expr #unaryExpr
| NOT expr #notExpr
| expr op=(TIMES|DIV|MOD) expr #multiplicativeExpr
| expr op=(PLUS|MINUS) expr #additiveExpr
| expr op=RELATIONALOPERATOR expr #relationalExpr
| expr op=EQUALITYOPERATOR expr #equalityExpr
| expr AND expr #andExpr
| expr OR expr #orExpr
//| ARRAY #arrayExpr
| atom #atomExpr
;
atom : LPAREN expr RPAREN #parExpr
| (INT|FLOAT) #numberExpr
| (TRUE|FALSE) #booleanAtom
| ID #idAtom
| STRING #stringAtom
| NIX #nixAtom
;
compileUnit : EOF ;
/*
* Lexer Rules
*/
fragment LETTER : [a-zA-Z] ;
MATHOP : PLUS
| MINUS
| TIMES
| DIV
| MOD
| POW
;
RELATIONALOPERATOR : LTEQ
| GTEQ
| LT
| GT
;
EQUALITYOPERATOR : EQ
| NEQ
;
LPAREN : '(' ;
RPAREN : ')' ;
LBRACE : '{' ;
RBRACE : '}' ;
OR : 'or' ;
AND : 'and' ;
BY : 'by' ;
TO : 'to' ;
FROM : 'from' ;
LET : 'let' ;
BE : 'be' ;
EQ :'==' ;
NEQ :'!=' ;
LTEQ :'<=' ;
GTEQ :'>=' ;
LT :'<' ;
GT :'>' ;
//Different statements will choose between these, but they are pretty much the
same.
PLUS :'plus' ;
ADD :'add' ;
MINUS :'minus' ;
SUBTRACT :'sub' ;
TIMES :'times' ;
MULT :'multiply' ;
DIV :'divide' ;
MOD :'mod' ;
POW :'pow' ;
NOT :'not' ;
TRUE :'true' ;
FALSE :'false' ;
NIX :'nix' ;
IF :'if' ;
THEN :'then' ;
ELSE :'else' ;
WHILE :'while' ;
DISPLAY :'display' ;
ARRAY : '['(INT|FLOAT)(','(INT|FLOAT))+']';
ID : [a-z]+ ;
WORD : LETTER+ ;
//NUMBER : INT | FLOAT ;
INT : [0-9]+ ;
FLOAT : [0-9]+ '.' [0-9]*
| '.'[0-9]+
;
COMMENT : '#' ~[\r\n]* -> channel(HIDDEN) ;
WS : [ \n\t\r]+ -> channel(HIDDEN) ;
STRING : '"' (~["{}])+ '"' ;
When given the input let foo be 5 times 3, the visitor sees let foo be 5 and an extraneous times 3. I thought I set up the expr rule so that it would recognize a multiplication expression before it recognizes atoms, so this wouldn't happen. I don't know where I went wrong, but it does not work how I expected.
If anyone has any idea where I went wrong or how I can fix this problem, I would appreciate your input.
You're using TIMES in your parser rules, but the MATHOP also matches TIMES and since MATHOP is defined before your TIMES rule, it gets precedence. That is why the TIMES rule in expr op=(TIMES|DIV|MOD) expr isn't matched.
I don't see you using this MATHOP rule anywhere in your parser rules, so I recommend just removing the MATHOP rule all together.

String interpolation: is it possible without adding members?

I would like to create an Antlr parser for custom language and decided to pick a simple calculator as an example. In my new grammar it should be possible to define a string, like this:
s = "Hello, I am a string"
and handle string interpolation.
Text in double quotes enclosed in persent should be treated as interpolated, e.g.
s = "Hello, did you know that %2 + 2% is 4?"
Double percent sign should not be processed, e.g.
s = "He wants 50%% of this deal."
But at the same time my calculator should support modulus operation:
x = 5 % 2
So far, I was able to craft a Lexer/Grammar, which could switch mode and parse simple strings, here they are:
lexer grammar CalcLexer;
EQ: '=';
PLUS: '+';
MINUS: '-';
MULT: '*';
DIV: '/';
LPAREN : '(' ;
RPAREN : ')' ;
SINGLE_PERCENT_POP: '%' -> popMode;
ID : [a-zA-Z]+ ;
INT : [0-9]+ ;
OPEN_DOUBLE_QUOTE: '"' -> pushMode(STRING_MODE);
NEWLINE:'\r'? '\n' ;
WS : [ \t]+ -> skip;
mode STRING_MODE;
DOUBLE_PERCENT: '%%';
SINGLE_PERCENT: '%' -> pushMode(DEFAULT_MODE);
TEXT: ~('%'|'\n'|'"')+;
CLOSE_DOUBLE_QUOTE: '"' -> popMode;
and
parser grammar CalcGrammar;
options { tokenVocab=CalcLexer; } // use tokens from CalcLexer.g4
prog: stat+ ;
stat: expr NEWLINE
| ID EQ (expr|text) NEWLINE
| NEWLINE
;
text: OPEN_DOUBLE_QUOTE content* CLOSE_DOUBLE_QUOTE;
content: DOUBLE_PERCENT | TEXT | SINGLE_PERCENT expr SINGLE_PERCENT_POP;
expr: expr (MULT|DIV) expr
| expr (PLUS|MINUS) expr
| INT
| ID
| LPAREN expr RPAREN
;
But only thing doesn't work and I'm not sure if it ever possible to implement without custom code (members) is modulus operation:
x = 5 % 2
There is no way I can ask Anltr to check for previous mode and safely pop mode.
But I hope my understanding is wrong and there is some way to treat % sign as operator in default mode?
I have found several sources for inspiration, probably they would help you as well:
Parsing string interpolation in ANTLR
ANTLR String interpolation
Parsing String Interpolations with ANTLR4
String interpolation and lexer modes
Murphy's law for StackOverflow: you will find an answer to your own question after several minutes you post detailed question to SO.
Instead of switching to DEFAULT_MODE, I should create separate one - STRING_INTERPOLATION. This way I have to define separate tokens for this mode, which will let use % sign in normal mode (and prohibit in interpolated).
Here is Lexer and Grammar which works for me:
lexer grammar CalcLexer;
EQ: '=';
PLUS: '+';
MINUS: '-';
MULT: '*';
DIV: '/';
MOD: '%';
LPAREN : '(' ;
RPAREN : ')' ;
ID : F_ID;
INT : F_INT;
fragment F_ID: [a-zA-Z]+ ;
fragment F_INT: [0-9]+ ;
OPEN_DOUBLE_QUOTE: '"' -> pushMode(STRING_MODE);
NEWLINE:'\r'? '\n' ;
WS : [ \t]+ -> skip;
mode STRING_MODE;
DOUBLE_PERCENT: '%%';
SINGLE_PERCENT: '%' -> pushMode(STRING_INTERPOLATION);
TEXT: ~('%'|'\n'|'"')+;
CLOSE_DOUBLE_QUOTE: '"' -> popMode;
mode STRING_INTERPOLATION;
SINGLE_PERCENT_POP: '%' -> popMode;
I_PLUS: PLUS -> type(PLUS);
I_MINUS: MINUS -> type(MINUS);
I_MULT: MULT -> type(MULT);
I_DIV: DIV -> type(DIV);
I_MOD: MOD -> type(MOD);
I_LPAREN: LPAREN -> type(LPAREN);
I_RPAREN: RPAREN -> type(RPAREN);
I_ID : F_ID -> type(ID);
I_INT : F_INT -> type(INT);
WS1 : [ \t]+ -> skip;
and
parser grammar CalcGrammar;
options { tokenVocab=CalcLexer; } // use tokens from CalcLexer.g4
prog: stat+ ;
stat: expr NEWLINE
| ID EQ (expr|text) NEWLINE
| NEWLINE
;
text: OPEN_DOUBLE_QUOTE content* CLOSE_DOUBLE_QUOTE;
content: DOUBLE_PERCENT | TEXT | SINGLE_PERCENT expr SINGLE_PERCENT_POP;
expr: expr (MULT|DIV|MOD) expr
| expr (PLUS|MINUS) expr
| INT
| ID
| LPAREN expr RPAREN
;
I hope this would help someone. Probably, future me.

Antlr Skip text outside tag

Im trying to skip/ignore the text outside a custom tag:
This text is a unique token to skip < ?compo \5+5\ ?> also this < ?compo \1+1\ ?>
I tried with the follow lexer:
TAG_OPEN : '<?compo ' -> pushMode(COMPOSER);
mode COMPOSER;
TAG_CLOSE : ' ?>' -> popMode;
NUMBER_DIGIT : '1'..'9';
ZERO : '0';
LOGICOP
: OR
| AND
;
COMPAREOP
: EQ
| NE
| GT
| GE
| LT
| LE
;
WS : ' ';
NEWLINE : ('\r\n'|'\n'|'\r');
TAB : ('\t');
...
and parser:
instructions
: (TAG_OPEN statement TAG_CLOSE)+?;
statement
: if_statement
| else
| else_if
| if_end
| operation_statement
| mnemonic
| comment
| transparent;
But it doesn't work (I test it by using the intelliJ tester on the rule "instructions")...
I have also add some skip rules outside the "COMPOSER" mode:
TEXT_SKIP : TAG_CLOSE .*? (TAG_OPEN | EOF) -> skip;
But i don't have any results...
Someone can help me?
EDIT:
I change "instructions" and now the parser tree is correctly builded for every instruction of every tag:
instructions : (.*? TAG_OPEN statement TAG_CLOSE .*?)+;
But i have a not recognized character error outside the the tags...
Below is a quick demo that worked for me.
Lexer grammar:
lexer grammar CompModeLexer;
TAG_OPEN
: '<?compo' -> pushMode(COMPOSER)
;
OTHER
: . -> skip
;
mode COMPOSER;
TAG_CLOSE
: '?>' -> popMode
;
OPAR
: '('
;
CPAR
: ')'
;
INT
: '0'
| [1-9] [0-9]*
;
LOGICOP
: 'AND'
| 'OR'
;
COMPAREOP
: [<>!] '='
| [<>=]
;
MULTOP
: [*/%]
;
ADDOP
: [+-]
;
SPACE
: [ \t\r\n\f] -> skip
;
Parser grammar:
parser grammar CompModeParser;
options {
tokenVocab=CompModeLexer;
}
parse
: tag* EOF
;
tag
: TAG_OPEN statement TAG_CLOSE
;
statement
: expr
;
expr
: '(' expr ')'
| expr MULTOP expr
| expr ADDOP expr
| expr COMPAREOP expr
| expr LOGICOP expr
| INT
;
A test with the input This text is a unique token to skip <?compo 5+5 ?> also this <?compo 1+1 ?> resulted in the following tree:
I found another solution (not elegant as the previous):
Create a generic TEXT token in the general context (so outside the tag's mode)
TEXT : ( ~[<] | '<' ~[?])+ -> skip;
Create a parser rule for handle a generic text
code
: TEXT
| (TEXT? instruction TEXT?)+;
Create a parser rule for handle an instruction
instruction
: TAG_OPEN statement TAG_CLOSE;