What's wrong with this antlr grammar? wildcard problem? - antlr

the grammar is simple enough:
expression: signal logic_op right_op;
signal: .+?;
right_op: .+?;
logic_op: GT|GTE|LT|LTE|NOT|EQ;
GT: '>';
GTE: '>=';
LT: '<';
LTE: '<=';
NOT: 'NOT';
EQ: '=';
WS : [ \t\r\n]+ -> skip ;
but I tested it with 'a > b', it gives me this error:
line 1:0 token recognition error at: 'a'
line 1:4 token recognition error at: 'b'
line 1:5 mismatched input '<EOF>' expecting {'>', '>=', '<', '<=', 'NOT', '=', WS}
Why?

There is no lexer rule that matches a and b. Add this:
ID : [a-zA-Z]+;
And if your follow up question is "but a and b are matched by .+?, right?", then the answer is "no". Inside a parser rule (like signal) a . (dot) matches any token. Inside a lexer rule the . matches any character.

Related

Ignore spaces, but allow text with spaces

I need to write a simple antlr4 grammar for expressions like this:
{paramName=simple text} //correct
{ paramName = simple text} //correct
{bad param=text} //incorrect
First two expression is almost equal. The difference is a space before and after parameter name. Third is incorrect, spaces not allowed in parameter name. I write a grammar:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : PARAM_NAME ;
paramValue : TEXT_WITH_SPACES ;
PARAM_NAME : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
TEXT_WITH_SPACES : (LETTERS_EN|' ')+ ;
WS : [ ]+ -> skip;
fragment LETTERS_EN : ([A-Za-z]) ;
So, the task is ignore spaces around parameter name, but allow spaces in parameter value. But when I add a space inside rule TEXT_WITH_SPACES, my second expression highlight as icorrect.
screenshot
What can I do? Thank you in advance!
Ignore all spaces, but consider them to be "end of word", and allow more words in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD+ ;
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
Update: To preserve spaces in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD | MULTIWORD ;
MULTIWORD : WORD ((' ')+ WORD)* ;
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
This is based on MULTIWORD matching multiple words with nothing but space in between them, and other cases being matched by sequence of WORD and WS.

How to fix extraneous input ' ' expecting, in antlr4

Hello when running antlr4 with the following input i get the following error
image showing problem
[
I have been trying to fix it by doing some changes here and there but it seems it only works if I write every component of whileLoop in a new line.
Could you please tell me what i am missing here and why the problem persits?
grammar AM;
COMMENTS :
'{'~[\n|\r]*'}' -> skip
;
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
anything : whileLoop | write ;
write : 'WRITE' '(' '"' sentance '"' ')' ;
read : 'READ' '(' '"' sentance '"' ')' ;
whileLoop : 'WHILE' expression 'DO' ;
block : 'BODY' anything 'END';
expression : 'TRUE'|'FALSE' ;
test : ID? {System.out.println("Done");};
logicalOperators : '<' | '>' | '<>' | '<=' | '>=' | '=' ;
numberExpressionS : (NUMBER numberExpression)* ;
numberExpression : ('-' | '/' | '*' | '+' | '%') NUMBER ;
sentance : (ID)* {System.out.println("Sentance");};
WS : [ \t\r\n]+ -> skip ;
NUMBER : [0-9]+ ;
ID : [a-zA-Z0-9]* ;
**`strong text`**
Your lexer rules produce conflicts:
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
vs
WS : [ \t\r\n]+ -> skip ;
The critical section is the ' '*. This defines an implicit lexer token. It matches spaces and it is defined above of WS. So any sequence of spaces is not handled as WS but as implicit token.
If I am right putting tabs between the components of whileloop will work, also putting more than one space between them should work. You should simply remove ' '*, since whitespace is to be skipped anyway.

"The following sets of rules are mutually left-recursive"

I have tried to write a grammar to recognize expressions like:
(A + MAX(B) ) / ( C - AVERAGE(A) )
IF( A > AVERAGE(A), 0, 1 )
X / (MAX(X)
Unfortunately antlr3 fails with these errors:
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
error(211): DerivedKeywords.g:110:13: [fatal] rule booleanTerm has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
error(206): DerivedKeywords.g:110:13: Alternative 1: after matching input such as decision cannot predict what comes next due to recursion overflow to additiveExpression from formula
I have spent some hours trying to fix these, it would be great if anyone could at least help me fix the first problem. Thanks
Code:
grammar DerivedKeywords;
options {
output=AST;
//backtrack=true;
}
WS : ( ' ' | '\t' | '\n' | '\r' )
{ skip(); }
;
//for numbers
DIGIT
: '0'..'9'
;
//for both integer and real number
NUMBER
: (DIGIT)+ ( '.' (DIGIT)+ )?( ('E'|'e')('+'|'-')?(DIGIT)+ )?
;
// Boolean operatos
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
EQ : '=';
NEQ : '!=';
GT : '>';
LT : '<';
GTE : '>=';
LTE : '<=';
COMMA : ',';
// Token for Functions
IF : 'IF';
MAX : 'MAX';
MIN : 'MIN';
AVERAGE : 'AVERAGE';
VARIABLE : 'A'..'Z' ('A'..'Z' | '0'..'9')*
;
// OPERATORS
LPAREN : '(' ;
RPAREN : ')' ;
DIV : '/' ;
PLUS : '+' ;
MINUS : '-' ;
STAR : '*' ;
expression : formula;
formula
: functionExpression
| additiveExpression
| LPAREN! a=formula RPAREN! // First Problem
;
additiveExpression
: a=multiplicativeExpression ( (MINUS^ | PLUS^ ) b=multiplicativeExpression )*
;
multiplicativeExpression
: a=unaryExpression ( (STAR^ | DIV^ ) b=unaryExpression )*
;
unaryExpression
: MINUS^ u=unaryExpression
| primaryExpression
;
functionExpression
: f=functionOperator LPAREN e=formula RPAREN
| IF LPAREN b=booleanExpression COMMA p=formula COMMA s=formula RPAREN
;
functionOperator :
MAX | MIN | AVERAGE;
primaryExpression
: NUMBER
// Used for scientific numbers
| DIGIT
| VARIABLE
| formula
;
// Boolean stuff
booleanExpression
: orExpression;
orExpression : a=andExpression (OR^ b=andExpression )*
;
andExpression
: a=notExpression (AND^ b=notExpression )*
;
notExpression
: NOT^ t=booleanTerm
| booleanTerm
;
booleanOperator :
GT | LT | EQ | GTE | LTE | NEQ;
booleanTerm : a=formula op=booleanOperator b=formula
| LPAREN! booleanTerm RPAREN! // Second problem
;
error(210): The following sets of rules are mutually left-recursive [unaryExpression, additiveExpression, primaryExpression, formula, multiplicativeExpression]
- this means that if the parser enters unaryExpression rule, it has the possibility to match additiveExpression, primaryExpression, formula, multiplicativeExpression and unaryExpression again without ever consuming a single token from input - so it cannot decide whether to use those rules or not, because even if it uses the rules, the input will be the same.
You're probably trying to allow subexpressions in expressions by this sequence of rules - you need to make sure that path will consume the left parenthesis of the subexpression. Probably the formula alternative in primaryExpression should be changed to LPAREN formula RPAREN, and the rest of grammar be adjusted accordingly.

ParserRule matching the wrong token

I'm trying to learn a bit ANTLR4 and define a grammar for some 4GL language.
This is what I've got:
compileUnit
:
typedeclaration EOF
;
typedeclaration
:
ID LPAREN DATATYPE INT RPAREN
;
DATATYPE
:
DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC
:
'A'
;
DATATYPE_NUMERIC
:
'N'
;
fragment
DIGIT
:
[0-9]
;
fragment
LETTER
:
[a-zA-Z]
;
INT
:
DIGIT+
;
ID
:
LETTER
(
LETTER
| DIGIT
)*
;
LPAREN
:
'('
;
RPAREN
:
')'
;
WS
:
[ \t\f]+ -> skip
;
What I want to be able to parse:
TEST (A10)
what I get:
typedeclaration:1:6: mismatched input 'A10' expecting DATATYPE
I am however able to write:
TEST (A 10)
Why do I need to put a whitespace in here? The LPAREN DATATYPE in itself is working, so there is no need for a space inbetween. Also the INT RPAREN is working.
Why is a space needed between DATATYPE and INT? I'm a bit confused on that one.
I guess that it's matching ID because it's the "longest" match, but there must be some way to force to be lazier here, right?
You should ignore 'A' and 'N' chats at first position of ID. As #CoronA noticed ANTLR matches token as long as possible (length of ID 'A10' more than length of DATATYPE_ALPHANUMERIC 'A'). Also read this: Priority rules. Try to use the following grammar:
grammar expr;
compileUnit
: typedeclaration EOF
;
typedeclaration
: ID LPAREN datatype INT RPAREN
;
datatype
: DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC
: 'A'
;
DATATYPE_NUMERIC
: 'N'
;
INT
: DIGIT+
;
ID
: [b-mo-zB-MO-Z] (LETTER | DIGIT)*
;
LPAREN
: '('
;
RPAREN
: ')'
;
WS
: [ \t\f]+ -> skip
;
fragment
DIGIT
: [0-9]
;
fragment
LETTER
: [a-zA-Z]
;
Also you can use the following grammar without id restriction. Data types will be recognized earlier than letters. it's not clear too:
grammar expr;
compileUnit
: typedeclaration EOF
;
typedeclaration
: id LPAREN datatype DIGIT+ RPAREN
;
id
: (datatype | LETTER) (datatype | LETTER | DIGIT)*
;
datatype
: DATATYPE_ALPHANUMERIC
| DATATYPE_NUMERIC
;
DATATYPE_ALPHANUMERIC: 'A';
DATATYPE_NUMERIC: 'N';
// List with another Data types.
LETTER: [a-zA-Z];
LPAREN
: '('
;
RPAREN
: ')'
;
WS
: [ \t\f]+ -> skip
;
DIGIT
: [0-9]
;

Antlr grammar won't work

My grammar won't accept inputs like "x = 4, xy = 1"
grammar Ex5;
prog : stat+;
stat : ID '=' NUMBER;
LETTER : ('a'..'z'|'A'..'Z');
DIGIT : ('0'..'9');
ID : LETTER+;
NUMBER : DIGIT+;
WS : (' '|'\t'| '\r'? '\n' )+ {skip();};
What do i have to do to make it accept a multitude of inputs like ID = NUMBER??
Thank you in advance.
You'll have to account for the comma, ',', inside your grammar. Also, since you (most probably) don't want LETTER and DIGIT tokens to be created since they are only used in other lexer rules, you should make these fragments:
grammar Ex5;
prog : stat (',' stat)*;
stat : ID '=' NUMBER;
ID : LETTER+;
NUMBER : DIGIT+;
fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment DIGIT : '0'..'9';
WS : (' '|'\t'| '\r'? '\n')+ {skip();};