Antlr grammar for parsing simple expression

I would like to parse the following expressions with ANTLR4:
termspannear ( xxx, xxx , 5 , true )
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )
where termspannear calls can be nested.
Here is my grammar:
// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start : TERMSPAN ;
TERMSPAN : TERMSPANNEAR | 'xxx' ;
TERMSPANNEAR: 'termspannear' OPENP BODY CLOSEP ;
BODY : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
After running:
antlr4 TermSpanNear.g4
javac TermSpanNear*.java
grun TermSpanNear start -gui
termspannear ( xxx, xxx , 5 , true )
^D
line 1:0 token recognition error at: 'termspannear '
line 1:13 extraneous input '(' expecting TERMSPAN
Can someone help me with this grammar, so that the parse tree contains all the parameters and nesting also works?
NOTE:
After the suggestion, I rewrote it to:
// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start : termspan EOF;
termspan : termspannear | 'xxx' ;
termspannear: 'termspannear' '(' body ')' ;
body : termspan ',' termspan ',' SLOP ',' ORDERED ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
I think it works now. I get a parse tree for each of these inputs:
termspannear ( xxx, xxx , 5 , true )
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )

You're using way too many lexer rules.
When you're defining a token like this:
BODY : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
then the tokenizer (lexer) will try to create a single token such as xxx,xxx,5,true. That is, it does not allow any spaces in between. Lexer rules (the ones starting with a capital letter) should really be the "atoms" of your language (the smallest parts). Whenever you start creating elements like a body, you glue atoms together in parser rules, not in lexer rules.
Try something like this:
grammar TermSpanNear;
// parser rules (the elements)
start : termspan EOF ;
termspan : termspannear | 'xxx' ;
termspannear : 'termspannear' OPENP body CLOSEP ;
body : termspan COMMA termspan COMMA SLOP COMMA ORDERED ;
// lexer rules (the atoms)
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ;
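To make the split between atoms and elements concrete outside ANTLR, here is a minimal hand-written Python sketch of the same idea. It is not what ANTLR generates; the function names (`tokenize`, `parse_termspan`, `parse`) and the tuple-shaped tree are assumptions made up for illustration. The lexer only knows the smallest pieces and throws whitespace away, and the parser glues those pieces together recursively.

```python
import re

# Lexer: only the "atoms". Leading whitespace is matched and dropped,
# which plays the role of `WS -> skip`.
TOKEN_RE = re.compile(r"\s*(termspannear|xxx|true|false|[0-9]+|[(),])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"token recognition error at index {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def at(tokens, i):
    # Current token, or a pseudo end-of-file marker.
    return tokens[i] if i < len(tokens) else "<EOF>"

def expect(tokens, i, wanted):
    if at(tokens, i) != wanted:
        raise SyntaxError(f"expected {wanted!r} but found {at(tokens, i)!r}")
    return i + 1

def parse_termspan(tokens, i=0):
    # termspan : termspannear | 'xxx' ;   returns (tree, next index)
    if at(tokens, i) == "xxx":
        return "xxx", i + 1
    i = expect(tokens, i, "termspannear")
    i = expect(tokens, i, "(")
    left, i = parse_termspan(tokens, i)      # nesting happens here
    i = expect(tokens, i, ",")
    right, i = parse_termspan(tokens, i)
    i = expect(tokens, i, ",")
    slop = at(tokens, i)
    if not slop.isdigit():
        raise SyntaxError(f"expected a number but found {slop!r}")
    i = expect(tokens, i + 1, ",")
    ordered = at(tokens, i)
    if ordered not in ("true", "false"):
        raise SyntaxError(f"expected true/false but found {ordered!r}")
    i = expect(tokens, i + 1, ")")
    return ("termspannear", left, right, int(slop), ordered == "true"), i

def parse(text):
    tokens = tokenize(text)
    tree, i = parse_termspan(tokens)
    expect(tokens, i, "<EOF>")               # mirrors `start : ... EOF`
    return tree
```

Calling `parse("termspannear ( xxx, xxx , 5 , true )")` goes through the tokenizer first, so the spaces never reach the parser, which is the behaviour the parser-rule version of `body` relies on.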

Related

ANTLR4: wrong lexer rule matches

I'm at the very beginning of learning ANTLR4 lexer rules. My goal is to create a simple grammar for Java properties files. Here is what I have so far:
lexer grammar PropertiesLexer;
LineComment
: ( LineCommentHash
| LineCommentExcl
)
-> skip
;
fragment LineCommentHash
: '#' ~[\r\n]*
;
fragment LineCommentExcl
: '!' ~[\r\n]*
;
fragment WrappedLine
: '\\'
( '\r' '\n'?
| '\n'
)
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
Key
: KeyLetterStart
( KeyLetter
| Escaped
)*
;
fragment KeyLetterStart
: ~[ \t\r\n:=]
;
fragment KeyLetter
: ~[\t\r\n:=]
;
fragment Escaped
: '\\' .?
;
Equal
: ( '\\'? ':'
| '\\'? '='
)
;
Value
: ValueLetterBegin
( ValueLetter
| Escaped
| WrappedLine
)*
;
fragment ValueLetterBegin
: ~[ \t\r\n]
;
fragment ValueLetter
: ~ [\r\n]+
;
Whitespace
: [ \t]+
-> skip
;
My test file is this one:
# comment 1
# comment 2
#
.key1= value1
key2\:sub=value2
key3 \= value3
key4=value41\
value42
# comment3
#comment4
key=value
When I run grun, I'm getting following output:
[#0,30:42='.key1= value1',<Value>,4:0]
[#1,45:60='key2\:sub=value2',<Value>,5:0]
[#2,63:76='key3 \= value3',<Value>,6:0]
[#3,81:102='key4=value41\\r\nvalue42',<Value>,8:0]
[#4,130:138='key=value',<Value>,13:0]
[#5,141:140='<EOF>',<EOF>,14:0]
I don't understand why the Value definition is matched. When commenting out the Value definition, however, it recognizes the Key and Equal definitions:
[#0,30:34='.key1',<Key>,4:0]
[#1,35:35='=',<Equal>,4:5]
[#2,37:42='value1',<Key>,4:7]
[#3,45:49='key2\',<Key>,5:0]
[#4,50:50=':',<Equal>,5:5]
[#5,51:53='sub',<Key>,5:6]
[#6,54:54='=',<Equal>,5:9]
[#7,55:60='value2',<Key>,5:10]
[#8,63:68='key3 \',<Key>,6:0]
[#9,69:69='=',<Equal>,6:6]
[#10,71:76='value3',<Key>,6:8]
[#11,81:84='key4',<Key>,8:0]
[#12,85:85='=',<Equal>,8:4]
[#13,86:93='value41\',<Key>,8:5]
[#14,96:102='value42',<Key>,9:0]
[#15,130:132='key',<Key>,13:0]
[#16,133:133='=',<Equal>,13:3]
[#17,134:138='value',<Key>,13:4]
[#18,141:140='<EOF>',<EOF>,14:0]
But how can I get it to recognize the Key, Equal and Value definitions together?

ANTLR's lexer rules match as many characters as possible, which is why you're seeing all these Value tokens being created (Value matches the most characters).
Lexical modes seem like a good fit to use here. Something like this:
lexer grammar PropertiesLexer;
COMMENT
: [!#] ~[\r\n]* -> skip
;
KEY
: ( '\\' ~[\r\n] | ~[\r\n\\=:] )+
;
EQUAL
: [=:] -> pushMode(VALUE_MODE)
;
NL
: [\r\n]+ -> skip
;
mode VALUE_MODE;
VALUE
: ( ~[\\\r\n] | '\\' . )+
;
END_VALUE
: [\r\n]+ -> skip, popMode
;

Ignore spaces, but allow text with spaces

I need to write a simple antlr4 grammar for expressions like this:
{paramName=simple text} //correct
{ paramName = simple text} //correct
{bad param=text} //incorrect
The first two expressions are almost identical; the only difference is the spaces before and after the parameter name. The third is incorrect, because spaces are not allowed in a parameter name. I wrote this grammar:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : PARAM_NAME ;
paramValue : TEXT_WITH_SPACES ;
PARAM_NAME : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
TEXT_WITH_SPACES : (LETTERS_EN|' ')+ ;
WS : [ ]+ -> skip;
fragment LETTERS_EN : ([A-Za-z]) ;
So the task is to ignore spaces around the parameter name but allow spaces in the parameter value. But when I add a space inside the TEXT_WITH_SPACES rule, my second expression is highlighted as incorrect.
What can I do? Thank you in advance!

Ignore all spaces, but consider them to be "end of word", and allow more words in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD+ ;
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
Update: To preserve spaces in the value:
grammar Test;
prog : '{' paramName '=' paramValue '}' ;
paramName : WORD ;
paramValue : WORD | MULTIWORD ;
MULTIWORD : WORD ((' ')+ WORD)+ ; // two or more words; a single word would otherwise tie with WORD and become MULTIWORD
WORD : [A-Za-zА-Яа-я_] [A-Za-zА-Яа-я_0-9]* ;
WS : [ ]+ -> skip;
This relies on MULTIWORD matching multiple words with nothing but spaces between them, while all other cases are matched as a sequence of WORD and WS tokens.
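If it helps to check your test inputs, the intended behaviour can be restated as one regular expression. This is only a sketch of which strings should be accepted, not of how ANTLR tokenizes them; `WORD`, `PROG` and `parse_param` are names I made up, and the pattern assumes plain spaces are the only whitespace, as in the WS rule.

```python
import re

# Same character classes as the WORD rule in the grammar.
WORD = r"[A-Za-zА-Яа-я_][A-Za-zА-Яа-я_0-9]*"

# Spaces around the name and around '=' are ignored; spaces between
# the words of the value are kept as part of the captured value.
PROG = re.compile(rf"\{{ *({WORD}) *= *({WORD}(?: +{WORD})*) *\}}")

def parse_param(text):
    """Return (name, value), or None if the text is not a valid expression."""
    m = PROG.fullmatch(text)
    return (m.group(1), m.group(2)) if m else None
```

With this, `{paramName=simple text}` and `{ paramName = simple text}` both give the name `paramName` and the value `simple text`, while `{bad param=text}` is rejected because a space inside the name cannot be matched.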

How to fix extraneous input ' ' expecting, in antlr4

Hello, when running ANTLR4 with the following input I get the following error.
I have been trying to fix it by making changes here and there, but it seems it only works if I write every component of whileLoop on a new line.
Could you please tell me what I am missing here and why the problem persists?
grammar AM;
COMMENTS :
'{'~[\n|\r]*'}' -> skip
;
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
anything : whileLoop | write ;
write : 'WRITE' '(' '"' sentance '"' ')' ;
read : 'READ' '(' '"' sentance '"' ')' ;
whileLoop : 'WHILE' expression 'DO' ;
block : 'BODY' anything 'END';
expression : 'TRUE'|'FALSE' ;
test : ID? {System.out.println("Done");};
logicalOperators : '<' | '>' | '<>' | '<=' | '>=' | '=' ;
numberExpressionS : (NUMBER numberExpression)* ;
numberExpression : ('-' | '/' | '*' | '+' | '%') NUMBER ;
sentance : (ID)* {System.out.println("Sentance");};
WS : [ \t\r\n]+ -> skip ;
NUMBER : [0-9]+ ;
ID : [a-zA-Z0-9]* ;

Your lexer rules produce conflicts:
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
vs
WS : [ \t\r\n]+ -> skip ;
The critical part is the ' '*. It defines an implicit lexer token that matches a single space, and implicit tokens are defined ahead of WS. So a single space is not handled as WS but as the implicit token.
If I am right, putting tabs between the components of whileLoop will work, and putting more than one space between them should work too. You should simply remove ' '*, since whitespace is skipped anyway.
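The tie-breaking can be imitated in a few lines of Python. This is a sketch of the rule "longest match wins, and on equal length the rule defined first wins", not ANTLR's actual code; `next_token` is a made-up helper and `T_SPACE` is a hypothetical name standing in for the implicit token that ' ' creates.

```python
import re

def next_token(rules, text, pos=0):
    """Return (rule name, matched text) for the winning rule, or None."""
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strictly longer matches replace the current best, so on a tie
        # the rule that appears earlier in the list is kept.
        if m and m.end() > pos and (best is None or m.end() > best[1]):
            best = (name, m.end())
    if best is None:
        return None
    return best[0], text[pos:best[1]]

RULES = [
    ("T_SPACE", r" "),           # implicit token from ' ' in the parser rule
    ("WS", r"[ \t\r\n]+"),       # the explicit WS rule, defined later
]
```

Here `next_token(RULES, " DO")` picks `T_SPACE` (both rules match one character, the earlier one wins), while `next_token(RULES, "  DO")` and `next_token(RULES, "\tDO")` pick `WS`, which matches the single-space versus tab or double-space behaviour described above.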

Guide or approval for ANTLR example

I have an AlgebraRelacional.g4 file with this. I need to read a file with a syntax like a CSV file, put the content in some memory tables and then resolve relational algebra operations with that. Can you tell me if I am doing it right?
Example data file to read:
cod_buy(char);name_suc(char);Import(int);date_buy(date)
“P-11”;”DC Med”;900;01/03/14
“P-14”;”Center”;1500;02/05/14
Current ANTLR grammar:
grammar AlgebraRelacional;
SEL : '\u03C3'
;
PRO : '\u220F'
;
UNI : '\u222A'
;
DIF : '\u002D'
;
PROC : '\u0058'
;
INT : '\u2229'
;
AND : 'AND'
;
OR : 'OR'
;
NOT : 'NOT'
;
EQ : '='
;
DIFERENTE : '!='
;
MAYOR : '>'
;
MENOR : '<'
;
SUMA : '+'
;
MULTI : '*'
;
IPAREN : '('
;
DPAREN : ')'
;
COMA : ','
;
PCOMA : ';'
;
Comillas: '"'
;
file : hdr row+ ;
hdr : row ;
row : field (',' field)* '\r'? '\n' ;
field : TEXT | STRING | ;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ;

I suggest that you read this document (http://is.muni.cz/th/208197/fi_b/bc_thesis.pdf). It contains useful information about how to write a parser for relational algebra. It does not use ANTLR, but you only have to translate the grammar from BNF to EBNF.

Intellij Antlr4 Plugin Left direct recursion doesn't work

I'm trying to write a parser for the SQL SELECT statement using ANTLR4. It contains the following part:
expr: '1' | expr('*'|'/'|'+'|'-'|'||') expr; // As the re-factored form of expression: compound expression;
WS :[ \t\r\n]+ -> skip ;
I expect this rule to accept inputs such as:
1
1+1
1+1-1
....
But the graph shows that they cannot be parsed.
Does anyone have an idea why they are not parsed the way I expected?

This slightly adjusted grammar works for me. I tested it in ANTLRWorks 2.1 on the input 1+1-1||1*1-1/1.
grammar myGrammar;
top : expr EOF ;
expr : '1'
| expr '+' expr
| expr '*' expr
| expr '/' expr
| expr '-' expr
| expr '||' expr
;
WS :[ \t\r\n]+ -> skip ;
One : '1' ;
Times : '*' ;
Div : '/' ;
Plus : '+' ;
Minus : '-' ;
Or : '||' ;
EDIT
I was able to get this to work, too, when matching the rule top:
grammar newEmptyCombinedGrammar;
top : expr EOF ;
expr: one
| expr op=(Times|Div|Plus|Minus|Or) expr
;
one : One ;
One : '1' ;
Times : '*' ;
Div : '/' ;
Plus : '+' ;
Minus : '-' ;
Or : '||' ;
WS :[ \t\r\n]+ -> skip ;
