Cannot load from object array because "this.decisionToDFA" is null - antlr

Update 2: It is as if the static block at the bottom of TestLexer.java is not running on my side.
I changed the static block
static {
_decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
_decisionToDFA[i] = new DFA(_ATN.getDecisionState(i), i);
}
}
to a static method, which I called makeDecisionToDFA():
protected static final DFA[] _decisionToDFA = makeDecisionToDFA();
private static DFA[] makeDecisionToDFA() {
DFA[] decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
decisionToDFA[i] = new DFA(_ATN.getDecisionState(i), i);
}
return decisionToDFA;
}
The exception then moves to another location:-
Exception in thread "main" java.lang.NullPointerException: Cannot load from object array because "this.decisionToDFA" is null
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:341)
at com.example.test.TestParser.statements(TestParser.java:209)
at com.example.test.TestParser.parse(TestParser.java:154)
at Main.main(Main.java:31)
Update 1: I have added TestLexer.g4 and TestParser.g4.
I am using ANTLR 4.9.2 to generate a Java parser and lexer.
My java version is:-
Java(TM) SE Runtime Environment (build 15.0.1+9-18)
I am getting the following exception when I run my program.
Exception in thread "main" java.lang.NullPointerException: Cannot load from object array because "this.decisionToDFA" is null
at org.antlr.v4.runtime.atn.LexerATNSimulator.match(LexerATNSimulator.java:109)
at org.antlr.v4.runtime.Lexer.nextToken(Lexer.java:141)
at org.antlr.v4.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:169)
at org.antlr.v4.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:152)
at org.antlr.v4.runtime.BufferedTokenStream.setup(BufferedTokenStream.java:254)
at org.antlr.v4.runtime.BufferedTokenStream.lazyInit(BufferedTokenStream.java:249)
at org.antlr.v4.runtime.CommonTokenStream.LT(CommonTokenStream.java:92)
at org.antlr.v4.runtime.Parser.enterRule(Parser.java:628)
at com.examlpe.test.TestParser.parse(MyParser.java:142)
at Main.main(Main.java:37)
I am not sure what exactly I am doing wrong; I might be missing something.
I would highly appreciate it if somebody can point me in the right direction.
public class Main {
public static void main(String[] args) throws Exception {
CharStream stream = CharStreams.fromFileName(args[0]);
TestLexer lexer = new TestLexer(stream);
TestParser parser = new TestParser(new CommonTokenStream(lexer));
parser.setBuildParseTree(true);
TestParser.ParseContext tree = parser.parse();
ParseTreeWalker walker = new ParseTreeWalker();
TestListener listener = new TestListener();
walker.walk(listener, tree);
}
}
TestLexer.g4
lexer grammar TestLexer;
@header {
package com.example.test;
}
OUTPUT:'output';
PACKAGE:'package';
STRUCT:'struct';
CLASS:'class';
// §3.11 Separators
LPAREN : '(';
RPAREN : ')';
LBRACE : '{';
RBRACE : '}';
LBRACK : '[';
RBRACK : ']';
SEMI : ';';
COMMA : ',';
DOT : '.';
LIST : 'List';
MAP : 'Map';
ID : ('a'..'z' | 'A'..'Z'| '1'..'9' | '#' | '*' | '<' | '>')+ ;
PACKAGE_NAME : ID ('.' ID)* ;
ANNOTATION_NAME : AT ID ;
// §3.12 Operators
BACKTICK : '`';
ASSIGN : '=';
GT : '>';
LT : '<';
BANG : '!';
TILDE : '~';
QUESTION : '?';
COLON : ':';
EQUAL : '==';
LE : '<=';
GE : '>=';
NOTEQUAL : '!=';
AND : '&&';
OR : '||';
INC : '++';
DEC : '--';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
BITAND : '&';
BITOR : '|';
CARET : '^';
MOD : '%';
ARROW : '->';
COLONCOLON : '::';
DOUBEQOATE : '"';
AT : '#';
ELLIPSIS : '...';
WS : [ \t\r\n\u000C]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
TestParser.g4
parser grammar TestParser;
options { tokenVocab=TestLexer; }
@header {
package com.example.test;
}
parse
:
statements* EOF
;
statements
: outputDecl
| packageDecl
| structDecl
| classDecl
;
outputDecl
: OUTPUT outputAnnotationDecl*?
;
packageDecl
: PACKAGE PACKAGE_NAME
;
outputAnnotationDecl
: name=ANNOTATION_NAME
;
structAnnotationDecl
: name=ANNOTATION_NAME
;
classAnnotationDecl
: name=ANNOTATION_NAME
;
structDecl
: structAnnotationDecl*? name=ID STRUCT LBRACE variableDecl+ RBRACE
;
variableDecl
: name=ID type=ID tagsDecl?
| name=ID LIST GT type=ID LT tagsDecl?
| name=ID MAP GT type=ID COMMA ID LT tagsDecl?
;
tagsDecl
: BACKTICK (tagDecl*?) BACKTICK
;
tagDecl
:name=ID COLON DOUBEQOATE (vale=ID (COMMA?))+ DOUBEQOATE
;
classDecl
: classAnnotationDecl*? name=ID CLASS LBRACE (functionDeclType)* RBRACE
;
functionDeclType
: name=ID COLON (functionDecl)*
;
functionDecl
: name=ID LPAREN (functionParameterDecl (COMMA)?)*? RPAREN (COLON returnType=ID)?
;
functionParameterDecl
: name=ID type=ID
;
Thank you in advance.
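Regarding Update 2: one Java rule worth keeping in mind when turning the generated static block into a field initializer is that static initializers run in textual order. The sketch below is not the generated lexer and does not claim to explain the exception above. StaticOrderSketch, ATN_LIKE and build() are made-up names, and it assumes the usual layout of ANTLR-generated classes, where _decisionToDFA is declared near the top of the file and _ATN is assigned near the bottom.

```java
final class StaticOrderSketch {
    // Declared before ATN_LIKE, so build() runs while ATN_LIKE is still null.
    static final int[] EARLY = build();

    // Hypothetical stand-in for a table that is assigned late in the class body.
    static final int[] ATN_LIKE = {10, 20, 30};

    // Declared after ATN_LIKE, so build() sees the populated array.
    static final int[] LATE = build();

    private static int[] build() {
        // Reading a later-declared static through a method compiles fine,
        // but yields the default value (null) until its initializer has run.
        return ATN_LIKE == null ? null : ATN_LIKE.clone();
    }

    public static void main(String[] args) {
        System.out.println("EARLY is null: " + (EARLY == null));
        System.out.println("LATE length: " + LATE.length);
    }
}
```

If the field that calls makeDecisionToDFA() sits above the _ATN declaration, it may be worth checking that _ATN is already assigned at that point.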

Using Antlr to parse formulas with multiple locales

I'm very new to Antlr, so forgive what may be a very easy question.
I am creating a grammar which parses Excel-like formulas and it needs to support multiple locales based on the list separator (, for en-US) and decimal separator (. for en-US). I would prefer not to choose between separate grammars to parse with based on locale.
Can I modify or inherit from the CommonTokenStream class to accomplish this, or is there another way to do this? Examples would be helpful.
I am using the Antlr v4.5.0-alpha003 NuGet package in my VS2015 C# project.
What you can do is add a locale (or custom decimal-separator and grouping-separator characters) to your lexer, and add a semantic predicate in front of the lexer rules that checks these characters, so that the tokens are matched dynamically.
I don't have ANTLR and C# running here, but the Java demo should be pretty similar:
grammar LocaleDemo;
@lexer::header {
import java.text.DecimalFormatSymbols;
import java.util.Locale;
}
@lexer::members {
private char decimalSeparator = '.';
private char groupingSeparator = ',';
public LocaleDemoLexer(CharStream input, Locale locale) {
this(input);
DecimalFormatSymbols dfs = new DecimalFormatSymbols(locale);
this.decimalSeparator = dfs.getDecimalSeparator();
this.groupingSeparator = dfs.getGroupingSeparator();
}
}
parse
: .*? EOF
;
NUMBER
: D D? ( DG D D D )* ( DS D+ )?
;
OTHER
: .
;
fragment D : [0-9];
fragment DS : {_input.LA(1) == decimalSeparator}? . ;
fragment DG : {_input.LA(1) == groupingSeparator}? . ;
To test the grammar above, run this class:
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.Token;
import java.util.Locale;
public class Main {
private static void tokenize(String input, Locale locale) {
LocaleDemoLexer lexer = new LocaleDemoLexer(new ANTLRInputStream(input), locale);
System.out.printf("\ninput='%s', locale=%s, tokens:\n", input, locale);
for (Token t : lexer.getAllTokens()) {
System.out.printf(" %-10s '%s'\n", LocaleDemoLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
}
public static void main(String[] args) throws Exception {
tokenize("1.23", Locale.ENGLISH);
tokenize("1.23", Locale.GERMAN);
tokenize("12.345.678,90", Locale.ENGLISH);
tokenize("12.345.678,90", Locale.GERMAN);
}
}
which would print:
input='1.23', locale=en, tokens:
NUMBER '1.23'
input='1.23', locale=de, tokens:
NUMBER '1'
OTHER '.'
NUMBER '23'
input='12.345.678,90', locale=en, tokens:
NUMBER '12.345'
OTHER '.'
NUMBER '67'
NUMBER '8'
OTHER ','
NUMBER '90'
input='12.345.678,90', locale=de, tokens:
NUMBER '12.345.678,90'
Related Q&A's:
What is a 'semantic predicate' in ANTLR?
What does "fragment" mean in ANTLR?
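For readers without ANTLR at hand, here is a rough hand-written Java equivalent of the NUMBER rule and its two predicates. It is only a sketch: LocaleNumberSketch and its methods are made-up names, not ANTLR output, and it assumes that java.text.DecimalFormatSymbols supplies the separators, as in the grammar's constructor.

```java
import java.text.DecimalFormatSymbols;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LocaleNumberSketch {
    /** Splits the input into NUMBER and OTHER tokens using the locale's separators. */
    static List<String> tokenize(String input, Locale locale) {
        DecimalFormatSymbols dfs = new DecimalFormatSymbols(locale);
        char ds = dfs.getDecimalSeparator();
        char dg = dfs.getGroupingSeparator();
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int end = matchNumber(input, pos, ds, dg);
            if (end > pos) {
                tokens.add("NUMBER '" + input.substring(pos, end) + "'");
                pos = end;
            } else {
                tokens.add("OTHER '" + input.charAt(pos) + "'");
                pos++;
            }
        }
        return tokens;
    }

    /** Mirrors NUMBER : D D? ( DG D D D )* ( DS D+ )? ; returns start if nothing matches. */
    private static int matchNumber(String s, int start, char ds, char dg) {
        int i = start;
        if (!isDigit(s, i)) return start;
        i++;                                          // D
        if (isDigit(s, i)) i++;                       // D?
        while (i < s.length() && s.charAt(i) == dg
                && isDigit(s, i + 1) && isDigit(s, i + 2) && isDigit(s, i + 3)) {
            i += 4;                                   // DG D D D
        }
        if (i < s.length() && s.charAt(i) == ds && isDigit(s, i + 1)) {
            i += 2;                                   // DS D
            while (isDigit(s, i)) i++;                // rest of D+
        }
        return i;
    }

    private static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12.345.678,90", Locale.ENGLISH));
        System.out.println(tokenize("12.345.678,90", Locale.GERMAN));
    }
}
```

The separator comparison plays the role of the {_input.LA(1) == ...}? predicates: the same character is a separator under one locale and an OTHER token under the other.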
As a follow-up to Bart's answer, this is the grammar I created with his suggestions:
grammar ExcelScript;
@lexer::header
{
using System;
using System.Globalization;
}
@lexer::members
{
private Int32 listseparator = 44; // UTF16 value for comma
private Int32 decimalseparator = 46; // UTF16 value for period
/// <summary>
/// Creates a new lexer object
/// </summary>
/// <param name="input">The input stream</param>
/// <param name="locale">The locale to use in parsing numbers</param>
/// <returns>A new lexer object</returns>
public ExcelScriptLexer (ICharStream input, CultureInfo locale)
: this(input)
{
this.listseparator = Convert.ToInt32(locale.TextInfo.ListSeparator[0]);
this.decimalseparator = Convert.ToInt32(locale.NumberFormat.NumberDecimalSeparator[0]);
// special case for 8 locales where the list separator is a , and the number separator is a , too
// Excel uses semicolon for list separator, so we will too
if (this.listseparator == 44 && this.decimalseparator == 44)
this.listseparator = 59; // UTF16 value for semicolon
}
}
/*
* Parser Rules
*/
formula
: numberLiteral
| Identifier
| '=' expression
;
expression
: primary # PrimaryExpression
| Identifier arguments # FunctionCallExpression
| ('+' | '-') expression # UnarySignExpression
| expression ('*' | '/' | '%') expression # MulDivModExpression
| expression ('+' | '-') expression # AddSubExpression
| expression ('<=' | '>=' | '>' | '<') expression # CompareExpression
| expression ('=' | '<>') expression # EqualCompareExpression
;
primary
: '(' expression ')' # ParenExpression
| literal # LiteralExpression
| Identifier # IdentifierExpression
;
literal
: numberLiteral # NumberLiteralRule
| booleanLiteral # BooleanLiteralRule
;
numberLiteral
: IntegerLiteral
| FloatingPointLiteral
;
booleanLiteral
: TrueKeyword
| FalseKeyword
;
arguments
: '(' expressionList? ')'
;
expressionList
: expression (ListSeparator expression)*
;
/*
* Lexer Rules
*/
AddOperator : '+' ;
SubOperator : '-' ;
MulOperator : '*' ;
DivOperator : '/' ;
PowOperator : '^' ;
EqOperator : '=' ;
NeqOperator : '<>' ;
LeOperator : '<=' ;
GeOperator : '>=' ;
LtOperator : '<' ;
GtOperator : '>' ;
ListSeparator : {_input.La(1) == listseparator}? . ;
DecimalSeparator : {_input.La(1) == decimalseparator}? . ;
TrueKeyword : [Tt][Rr][Uu][Ee] ;
FalseKeyword : [Ff][Aa][Ll][Ss][Ee] ;
Identifier
: Letter (Letter | Digit)*
;
fragment Letter
: [A-Z_a-z]
;
fragment Digit
: [0-9]
;
IntegerLiteral
: '0'
| [1-9] [0-9]*
;
FloatingPointLiteral
: [0-9]+ DecimalSeparator [0-9]* Exponent?
| DecimalSeparator [0-9]+ Exponent?
| [0-9]+ Exponent
;
fragment Exponent
: ('e' | 'E') ('+' | '-')? ('0'..'9')+
;
WhiteSpace
: [ \t]+ -> channel(HIDDEN)
;

ANTLR v3 grammar for boolean/conditional expression

I'm taking a first stab at creating a grammar for expressions like:
(foo = bar or (bar = "bar" and baz = 45.43)) and test = true
My grammar so far looks like:
grammar filter;
tokens {
TRUE = 'true';
FALSE = 'false';
AND = 'and';
OR = 'or';
LT = '<';
GT = '>';
EQ = '=';
NEQ = '!=';
PATHSEP = '/';
LBRACK = '[';
RBRACK = ']';
LPAREN = '(';
RPAREN = ')';
}
expression : or_expression EOF;
or_expression : and_expression (OR or_expression)*;
and_expression : term (AND term)*;
term : atom ( operator atom)? | LPAREN expression RPAREN;
atom : ID | INT | FLOAT | STRING | TRUE | FALSE;
operator : LT | GT | EQ | NEQ;
INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
But in ANTLRWorks 1.4.3, I get the parse tree:
But for the life of me I can't figure out what is wrong with my grammar. What token is it missing here?
Many thanks in advance.
Edit: To clarify the atom (operator atom)? alternative in the term production, I should perhaps mention that atoms should be able to stand on their own, without being compared to another atom. E.g. a or b is a valid expression.
I'm answering my own question here. I found two problems with my grammar. The first was easy to spot; I had put EOF at the end of my top-level rule:
expression : or_expression EOF;
The EOF was thus the missing token. My solution was to remove the EOF from the expression rule, and instead introduce a rule above it:
filter: expression EOF;
The second problem was that my or_expression rule should be:
or_expression : and_expression (OR and_expression)*;
and not
or_expression : and_expression (OR or_expression)*;
The full corrected grammar is:
grammar filter;
tokens {
TRUE = 'true';
FALSE = 'false';
AND = 'and';
OR = 'or';
LT = '<';
GT = '>';
EQ = '=';
NEQ = '!=';
PATHSEP = '/';
LBRACK = '[';
RBRACK = ']';
LPAREN = '(';
RPAREN = ')';
}
filter: expression EOF;
expression : or_expression;
or_expression : and_expression (OR and_expression)*;
and_expression : term (AND term)*;
term : atom (operator atom)? | LPAREN expression RPAREN;
atom : ID | INT | FLOAT | STRING | TRUE | FALSE;
operator : LT | GT | EQ | NEQ;
INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
And the resulting parse tree is:
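To make the shape of the corrected rules concrete, here is a hand-written recursive-descent sketch in Java that follows them one method per rule. It is not generated by ANTLR; FilterSketch, its regex-based tokenizer and the s-expression output format are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilterSketch {
    // One token per match: STRING, FLOAT, INT, ID/keyword, or an operator/parenthesis.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(\"[A-Za-z_ ]*\"|[0-9]+\\.[0-9]*|[0-9]+|[A-Za-z_][A-Za-z0-9_]*|!=|[<>=()])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private FilterSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        while (end < input.length() && m.region(end, input.length()).lookingAt()) {
            tokens.add(m.group(1));
            end = m.end();
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("cannot tokenize at offset " + end);
        }
    }

    /** filter : expression EOF ; returns the tree as an s-expression. */
    static String parse(String input) {
        FilterSketch p = new FilterSketch(input);
        String tree = p.orExpression();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return tree;
    }

    // or_expression : and_expression (OR and_expression)* ;
    private String orExpression() {
        String left = andExpression();
        while ("or".equals(peek())) {
            pos++;
            left = "(or " + left + " " + andExpression() + ")";
        }
        return left;
    }

    // and_expression : term (AND term)* ;
    private String andExpression() {
        String left = term();
        while ("and".equals(peek())) {
            pos++;
            left = "(and " + left + " " + term() + ")";
        }
        return left;
    }

    // term : atom (operator atom)? | LPAREN expression RPAREN ;
    private String term() {
        if ("(".equals(peek())) {
            pos++;
            String inner = orExpression();
            expect(")");
            return inner;
        }
        String left = atom();
        String op = peek();
        if (op != null && op.matches("!=|[<>=]")) {
            pos++;
            return "(" + op + " " + left + " " + atom() + ")";
        }
        return left;
    }

    // atom : ID | INT | FLOAT | STRING | TRUE | FALSE ;
    private String atom() {
        String t = peek();
        if (t == null || t.matches("and|or|!=|[<>=()]")) {
            throw new IllegalArgumentException("expected an atom but found: " + t);
        }
        pos++;
        return t;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private void expect(String text) {
        if (!text.equals(peek())) {
            throw new IllegalArgumentException("expected " + text + " but found: " + peek());
        }
        pos++;
    }
}
```

Each level loops over the next-tighter level, which is what (OR and_expression)* expresses. For example, FilterSketch.parse("a or b") returns (or a b), so a free-standing atom is accepted.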

How to resolve "The following alternatives can never be matched"

I have been struggling to resolve a "multiple alternatives" error in my parser for a couple of days now, but with no success. I have been converting Bart Kiers' excellent Tiny Language (TL) tutorial code to C# using Sam Harwell's port of ANTLR3 and VS2010. Kudos to both these guys for their excellent work. I believe I have followed Bart's tutorial accurately, but as I am a newbie with ANTLR I can't be sure.
I did have the TL code working nicely on a pure math basis i.e. no "functions" or "if then else" or "while" (see screenshot of a little app)
but when I added the code for the missing pieces to complete the tutorial I get a parsing error in "functionCall" and in "list" (see the code below)
grammar Paralex2;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BLOCK;
RETURN;
STATEMENTS;
ASSIGNMENT;
FUNC_CALL;
EXP;
EXP_LIST;
ID_LIST;
IF;
TERNARY;
U_SUB;
NEGATE;
FUNCTION;
INDEXES;
LIST;
LOOKUP;
}
@lexer::namespace{Paralex2}
@parser::namespace{Paralex2}
/*
* Parser Rules
*/
@parser::header {using System; using System.Collections.Generic;}
@parser::members{
public SortedList<string, Function> functions = new SortedList<string, Function>();
private void defineFunction(string id, Object idList, Object block) {
// `idList` is possibly null! Create an empty tree in that case.
CommonTree idListTree = idList == null ? new CommonTree() : (CommonTree)idList;
// `block` is never null.
CommonTree blockTree = (CommonTree)block;
// The function name followed by the number of parameters is the unique key
string key = id + idListTree.Children.Count();
functions.Add(key, new Function(id, idListTree, blockTree));
}
}
public parse
: block EOF -> block
;
block
: (statement | functionDecl)* (Return exp ';')? -> ^(BLOCK ^(STATEMENTS statement*) ^(RETURN exp?))
;
statement
: assignment ';' -> assignment
| functionCall ';' -> functionCall
| ifStatement
| forStatement
| whileStatement
;
assignment
: Identifier indexes? '=' exp
-> ^(ASSIGNMENT Identifier indexes? exp)
;
functionCall
: Identifier '(' expList? ')' -> ^(FUNC_CALL Identifier expList?)
| Assert '(' exp ')' -> ^(FUNC_CALL Assert exp)
| Size '(' exp ')' -> ^(FUNC_CALL Size exp)
;
ifStatement
: ifStat elseIfStat* elseStat? End -> ^(IF ifStat elseIfStat* elseStat?)
;
ifStat
: If exp Do block -> ^(EXP exp block)
;
elseIfStat
: Else If exp Do block -> ^(EXP exp block)
;
elseStat
: Else Do block -> ^(EXP block)
;
functionDecl
: Def Identifier '(' idList? ')' block End
{defineFunction($Identifier.text, $idList.tree, $block.tree);}
;
forStatement
: For Identifier '=' exp To exp Do block End
-> ^(For Identifier exp exp block)
;
whileStatement
: While exp Do block End -> ^(While exp block)
;
idList
: Identifier (',' Identifier)* -> ^(ID_LIST Identifier+)
;
expList
: exp (',' exp)* -> ^(EXP_LIST exp+)
;
exp
: condExp
;
condExp
: (orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
orExp
: andExp ('||'^ andExp)*
;
andExp
: equExp ('&&'^ equExp)*
;
equExp
: relExp (('==' | '!=')^ relExp)*
;
relExp
: addExp (('>=' | '<=' | '>' | '<')^ addExp)*
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: powExp ((Mul | Div)^ powExp)*
;
powExp
: unaryExp ('^'^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| '!' atom -> ^(NEGATE atom)
| atom
;
atom
: Nmber
| Bool
| Null
| lookup
;
list
: '[' expList? ']' -> ^(LIST expList?)
;
lookup
: list indexes? -> ^(LOOKUP list indexes?)
| functionCall indexes? -> ^(LOOKUP functionCall indexes?)
| Identifier indexes? -> ^(LOOKUP Identifier indexes?)
| String indexes? -> ^(LOOKUP String indexes?)
| '(' exp ')' indexes? -> ^(LOOKUP exp indexes?)
;
indexes
: ('[' exp ']')+ -> ^(INDEXES exp+)
;
/*
* Lexer Rules
*/
Assert : 'assert';
Size : 'size';
Def : 'def';
If : 'if';
Else : 'else';
Return : 'return';
For : 'for';
While : 'while';
To : 'to';
Do : 'do';
End : 'end';
In : 'in';
Null : 'null';
Or : '||';
And : '&&';
Equals : '==';
NEquals : '!=';
GTEquals : '>=';
LTEquals : '<=';
Pow : '^';
GT : '>';
LT : '<';
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Modulus : '%';
OBrace : '{';
CBrace : '}';
OBracket : '[';
CBracket : ']';
OParen : '(';
CParen : ')';
SColon : ';';
Assign : '=';
Comma : ',';
QMark : '?';
Colon : ':';
Bool
: 'true'
| 'false'
;
Nmber
: Int ('.' Digit*)?
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | Digit)*
;
String
@after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {Skip();}
| '/*' .* '*/' {Skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
The error messages I get are
Decision can match input such as "CParen" using multiple alternatives: 1, 2 : Line 79:20
and
Decision can match input such as "CBracket" using multiple alternatives: 1, 2 : Line 176:10
The errors relate to the functionCall and list rules. I have examined the parser file in ANTLRWorks 1.5 and confirmed the same errors there. The syntax diagrams for the two rules look like this;
and this;
I have tried several changes to try to solve the problem but I don't seem to be able to get the syntax right. I would appreciate any help you guys could provide and can email the images if that would help.
Thanks in advance
Ian Carson
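You have one OR operator (|) too many in the condExp rule, which makes the grammar ambiguous. With the extra |, the second alternative is an optional group that can match nothing, so exp (and therefore expList) can match empty input. An expList? followed by ')' or ']' can then be matched in two ways, which is what the two messages report.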
You have:
condExp
: ( orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
But it should be:
condExp
: ( orExp -> orExp)
( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:

adding (...) {...} function literals while abstaining from backtracking

Building off the answer found in How to have both function calls and parenthetical grouping without backtrack, I'd like to add function literals, which in a non-LL(*) way would be implemented like this:
...
tokens {
...
FN;
ID_LIST;
}
stmt
: expr SEMI // SEMI=';'
;
callable
: ...
| fn
;
fn
: OPAREN opt_id_list CPAREN compound_stmt
-> ^(FN opt_id_list compound_stmt)
;
compound_stmt
: OBRACE stmt* CBRACE
;
opt_id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
What I'd like to do is allow anonymous function literals that have an argument list (e.g. () or (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} is good. But (x)(y){} not so much. (Of course (x) * (y){} is "valid" in terms of the parser, just as ((y){})()[1].x would be.)
The parser needs a bit of extra look ahead. I guess it could be done without it, but it would definitely result in some horrible looking parser rule(s) that are a pain to maintain and a parser that would accept (a, 2, 3){...} (a function literal with an expression-list instead of an id-list), for example. This would cause you to do quite a bit of semantic checking after the AST has been created.
The (IMO) best way to solve this is by adding the function literal rule in the callable and adding a syntactic predicate in front of it which will tell the parser to make sure there really is such an alternative before actually matching it.
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
A demo:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
OBRACE = '{' ;
CBRACE = '}' ;
SEMI = ';' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
FN;
ID_LIST;
STATS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
fn_literal
: OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
;
id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
compound_stmt
: OBRACE stmt* CBRACE -> ^(STATS stmt*)
;
stmt
: expr SEMI
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
A parser generated by the grammar above would reject the input (x)(y){} while it properly parses the following 3 snippets of code:
1. (a, b, c){ a+b*c; }
2. (x) * (y){ x.y; }
3. ((y){})()[1].x
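The idea behind the (fn_literal)=> predicate, looking ahead before committing to an alternative, can be sketched in plain Java as follows. This is a simplification under stated assumptions: the real predicate speculatively matches the whole fn_literal rule, including the compound_stmt, whereas this sketch stops at the opening brace. FnLiteralLookahead and its methods are made-up names, and tokens are modelled as plain strings.

```java
import java.util.Arrays;
import java.util.List;

final class FnLiteralLookahead {
    /**
     * Returns true if the tokens from pos onwards start with OPAREN id_list CPAREN OBRACE.
     * Nothing is consumed; the caller decides which alternative to take.
     */
    static boolean looksLikeFnLiteral(List<String> tokens, int pos) {
        int i = pos;
        if (!at(tokens, i, "(")) return false;
        i++;
        if (isId(tokens, i)) {                 // optional: ID (COMMA ID)*
            i++;
            while (at(tokens, i, ",")) {
                if (!isId(tokens, i + 1)) return false;
                i += 2;
            }
        }
        return at(tokens, i, ")") && at(tokens, i + 1, "{");
    }

    private static boolean at(List<String> tokens, int i, String text) {
        return i < tokens.size() && tokens.get(i).equals(text);
    }

    // Matches the demo grammar's ID rule: 'a'..'z'+
    private static boolean isId(List<String> tokens, int i) {
        return i < tokens.size() && tokens.get(i).matches("[a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFnLiteral(Arrays.asList("(", "a", ",", "b", ",", "c", ")", "{"), 0));
        System.out.println(looksLikeFnLiteral(Arrays.asList("(", "x", ")", "(", "y", ")", "{", "}"), 0));
    }
}
```

For (x)(y){} the check fails at the first parenthesis, so (x) is taken as a parenthesized expression, (y) becomes a call, and the brace that follows has nowhere to go. An expression list such as (a, 2, 3){ fails the check as well.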

define a grammar in Antlr

I have defined the following grammar.
grammar Sample_1;
@header {
package a;
}
@lexer::header {
package a;
}
program
:
define*
implement*
;
define
: IDENT '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (IDENT ','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
How can I add a check to this grammar so that the input
A=(1,1)
B=(1,2)
G=(A,B)
is accepted, but the input
A=(1,1)
B=(1,2)
G=(A,E)
reports an error that E is not defined?
Thanks.
The result:
I got it working, thanks a lot:
grammar Sample_1;
@members{
int level=0;
}
@header {
package a;
}
@lexer::header {
package a;
}
program
:
block
;
block
scope {
List symbols;
}
@init {
$block::symbols=new ArrayList();
level++;
}
@after {
System.err.println("Hello");
level--;
}
: (define* implement+)
;
define
: IDENT {$block::symbols.add($IDENT.text);} '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (a=IDENT
{if (!$block::symbols.contains($a.text)){
System.err.println("undefined");
}}','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
ANTLR supports actions, little snippets of code embedded in the grammar file.
An action for an assignment could store into a map. An action for a right-hand-side IDENT could try to pull a value from the map, and throw an exception if it fails.
Chapter 6 in Terence Parr's "The Definitive ANTLR Reference" covers actions.
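The map idea could look roughly like this in plain Java. This is a sketch only: SymbolTableSketch and its method names are made up for illustration, and in a real grammar the calls would sit inside actions in the define and implement rules.

```java
import java.util.HashMap;
import java.util.Map;

final class SymbolTableSketch {
    private final Map<String, int[]> symbols = new HashMap<>();

    /** What an action in the define rule could do after matching IDENT '=(' INTEGER ',' INTEGER ')'. */
    void define(String name, int first, int second) {
        symbols.put(name, new int[] {first, second});
    }

    /** What an action on a right-hand-side IDENT could do. */
    int[] resolve(String name) {
        int[] value = symbols.get(name);
        if (value == null) {
            throw new IllegalStateException(name + " is not defined");
        }
        return value;
    }

    /** Returns the first name that is not defined, or null if all of them are. */
    String firstUndefined(String... names) {
        for (String name : names) {
            if (!symbols.containsKey(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.define("A", 1, 1);
        table.define("B", 1, 2);
        table.resolve("A");
        table.resolve("B");          // G=(A,B): both names resolve
        try {
            table.resolve("E");      // G=(A,E): E was never defined
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether to throw or to collect the error and keep parsing is a design choice; throwing stops at the first undefined name.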