Update 2: It looks as if the static block of code at the bottom of TestLexer.java is not running on my side.
I changed the static block
static {
_decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
_decisionToDFA[i] = new DFA(_ATN.getDecisionState(i), i);
}
}
to a static method, which I called makeDecisionToDFA():
protected static final DFA[] _decisionToDFA = makeDecisionToDFA();
private static DFA[] makeDecisionToDFA() {
DFA[] decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
decisionToDFA[i] = new DFA(_ATN.getDecisionState(i), i);
}
return decisionToDFA;
}
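For reference, Java runs static field initializers and static blocks in textual order, so the two forms above should populate the array the same way as long as whatever they depend on is initialized earlier in the class. The sketch below is only an illustration with hypothetical stand-ins (a plain int and a String array instead of the generated _ATN and DFA types); it does not reproduce or diagnose the exception.

```java
import java.util.Arrays;

public class StaticInitOrderDemo {
    // Hypothetical stand-in for _ATN.getNumberOfDecisions().
    static final int DECISIONS = 3;

    // Form 1: the field is initialized by a static method (the rewritten version).
    public static final String[] FROM_METHOD = makeTable();

    // Form 2: the field is assigned in a static block (the generated version).
    public static final String[] FROM_BLOCK;
    static {
        FROM_BLOCK = new String[DECISIONS];
        for (int i = 0; i < DECISIONS; i++) {
            FROM_BLOCK[i] = "dfa" + i;
        }
    }

    private static String[] makeTable() {
        String[] table = new String[DECISIONS];
        for (int i = 0; i < DECISIONS; i++) {
            table[i] = "dfa" + i;
        }
        return table;
    }

    public static void main(String[] args) {
        // Both arrays are filled during class initialization, before main runs.
        System.out.println(Arrays.toString(FROM_METHOD));
        System.out.println(Arrays.toString(FROM_BLOCK));
    }
}
```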
The exception then moves to another location:-
Exception in thread "main" java.lang.NullPointerException: Cannot load from object array because "this.decisionToDFA" is null
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:341)
at com.example.test.TestParser.statements(TestParser.java:209)
at com.example.test.TestParser.parse(TestParser.java:154)
at Main.main(Main.java:31)
Update 1: I have added TestLexer.g4 and TestParser.g4.
I am using ANTLR 4.9.2 to generate a Java parser and lexer.
My Java version is:
Java(TM) SE Runtime Environment (build 15.0.1+9-18)
I am getting the following exception when I run my program.
Exception in thread "main" java.lang.NullPointerException: Cannot load from object array because "this.decisionToDFA" is null
at org.antlr.v4.runtime.atn.LexerATNSimulator.match(LexerATNSimulator.java:109)
at org.antlr.v4.runtime.Lexer.nextToken(Lexer.java:141)
at org.antlr.v4.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:169)
at org.antlr.v4.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:152)
at org.antlr.v4.runtime.BufferedTokenStream.setup(BufferedTokenStream.java:254)
at org.antlr.v4.runtime.BufferedTokenStream.lazyInit(BufferedTokenStream.java:249)
at org.antlr.v4.runtime.CommonTokenStream.LT(CommonTokenStream.java:92)
at org.antlr.v4.runtime.Parser.enterRule(Parser.java:628)
at com.examlpe.test.TestParser.parse(MyParser.java:142)
at Main.main(Main.java:37)
I am not sure what exactly I am doing wrong; I might be missing something.
I would highly appreciate it if somebody can point me in the right direction.
public class Main {
public static void main(String[] args) throws Exception {
CharStream stream = CharStreams.fromFileName(args[0]);
TestLexer lexer = new TestLexer(stream);
TestParser parser = new TestParser(new CommonTokenStream(lexer));
parser.setBuildParseTree(true);
TestParser.ParseContext tree = parser.parse();
ParseTreeWalker walker = new ParseTreeWalker();
TestListener listener = new TestListener();
walker.walk(listener, tree);
}
}
TestLexer.g4
lexer grammar TestLexer;
@header {
package com.example.test;
}
OUTPUT:'output';
PACKAGE:'package';
STRUCT:'struct';
CLASS:'class';
// §3.11 Separators
LPAREN : '(';
RPAREN : ')';
LBRACE : '{';
RBRACE : '}';
LBRACK : '[';
RBRACK : ']';
SEMI : ';';
COMMA : ',';
DOT : '.';
LIST : 'List';
MAP : 'Map';
ID : ('a'..'z' | 'A'..'Z'| '1'..'9' | '#' | '*' | '<' | '>')+ ;
PACKAGE_NAME : ID ('.' ID)* ;
ANNOTATION_NAME : AT ID ;
// §3.12 Operators
BACKTICK : '`';
ASSIGN : '=';
GT : '>';
LT : '<';
BANG : '!';
TILDE : '~';
QUESTION : '?';
COLON : ':';
EQUAL : '==';
LE : '<=';
GE : '>=';
NOTEQUAL : '!=';
AND : '&&';
OR : '||';
INC : '++';
DEC : '--';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
BITAND : '&';
BITOR : '|';
CARET : '^';
MOD : '%';
ARROW : '->';
COLONCOLON : '::';
DOUBEQOATE : '"';
AT : '#';
ELLIPSIS : '...';
WS : [ \t\r\n\u000C]+ -> skip
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
TestParser.g4
parser grammar TestParser;
options { tokenVocab=TestLexer; }
@header {
package com.example.test;
}
parse
:
statements* EOF
;
statements
: outputDecl
| packageDecl
| structDecl
| classDecl
;
outputDecl
: OUTPUT outputAnnotationDecl*?
;
packageDecl
: PACKAGE PACKAGE_NAME
;
outputAnnotationDecl
: name=ANNOTATION_NAME
;
structAnnotationDecl
: name=ANNOTATION_NAME
;
classAnnotationDecl
: name=ANNOTATION_NAME
;
structDecl
: structAnnotationDecl*? name=ID STRUCT LBRACE variableDecl+ RBRACE
;
variableDecl
: name=ID type=ID tagsDecl?
| name=ID LIST GT type=ID LT tagsDecl?
| name=ID MAP GT type=ID COMMA ID LT tagsDecl?
;
tagsDecl
: BACKTICK (tagDecl*?) BACKTICK
;
tagDecl
:name=ID COLON DOUBEQOATE (vale=ID (COMMA?))+ DOUBEQOATE
;
classDecl
: classAnnotationDecl*? name=ID CLASS LBRACE (functionDeclType)* RBRACE
;
functionDeclType
: name=ID COLON (functionDecl)*
;
functionDecl
: name=ID LPAREN (functionParameterDecl (COMMA)?)*? RPAREN (COLON returnType=ID)?
;
functionParameterDecl
: name=ID type=ID
;
Thank you in advance.
I'm very new to Antlr, so forgive what may be a very easy question.
I am creating a grammar which parses Excel-like formulas and it needs to support multiple locales based on the list separator (, for en-US) and decimal separator (. for en-US). I would prefer not to choose between separate grammars to parse with based on locale.
Can I modify or inherit from the CommonTokenStream class to accomplish this, or is there another way to do this? Examples would be helpful.
I am using the Antlr v4.5.0-alpha003 NuGet package in my VS2015 C# project.
What you can do is pass a locale (or custom decimal- and grouping-separator characters) to your lexer, and put a semantic predicate in front of the lexer rules that inspects those characters, so that these tokens are matched dynamically.
I don't have ANTLR and C# running here, but the Java demo should be pretty similar:
grammar LocaleDemo;
@lexer::header {
import java.text.DecimalFormatSymbols;
import java.util.Locale;
}
@lexer::members {
private char decimalSeparator = '.';
private char groupingSeparator = ',';
public LocaleDemoLexer(CharStream input, Locale locale) {
this(input);
DecimalFormatSymbols dfs = new DecimalFormatSymbols(locale);
this.decimalSeparator = dfs.getDecimalSeparator();
this.groupingSeparator = dfs.getGroupingSeparator();
}
}
parse
: .*? EOF
;
NUMBER
: D D? ( DG D D D )* ( DS D+ )?
;
OTHER
: .
;
fragment D : [0-9];
fragment DS : {_input.LA(1) == decimalSeparator}? . ;
fragment DG : {_input.LA(1) == groupingSeparator}? . ;
To test the grammar above, run this class:
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.Token;
import java.util.Locale;
public class Main {
private static void tokenize(String input, Locale locale) {
LocaleDemoLexer lexer = new LocaleDemoLexer(new ANTLRInputStream(input), locale);
System.out.printf("\ninput='%s', locale=%s, tokens:\n", input, locale);
for (Token t : lexer.getAllTokens()) {
System.out.printf(" %-10s '%s'\n", LocaleDemoLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
}
public static void main(String[] args) throws Exception {
tokenize("1.23", Locale.ENGLISH);
tokenize("1.23", Locale.GERMAN);
tokenize("12.345.678,90", Locale.ENGLISH);
tokenize("12.345.678,90", Locale.GERMAN);
}
}
which would print:
input='1.23', locale=en, tokens:
NUMBER '1.23'
input='1.23', locale=de, tokens:
NUMBER '1'
OTHER '.'
NUMBER '23'
input='12.345.678,90', locale=en, tokens:
NUMBER '12.345'
OTHER '.'
NUMBER '67'
NUMBER '8'
OTHER ','
NUMBER '90'
input='12.345.678,90', locale=de, tokens:
NUMBER '12.345.678,90'
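The separators that the lexer constructor reads can also be inspected on their own, without ANTLR. This is a small sketch using only java.text.DecimalFormatSymbols; the class name SeparatorDemo and its helper methods are made up for illustration.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SeparatorDemo {
    // Decimal separator for the given locale, e.g. '.' for English.
    public static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // Grouping (thousands) separator for the given locale, e.g. ',' for English.
    public static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] { Locale.ENGLISH, Locale.GERMAN }) {
            System.out.printf("%s: decimal='%c', grouping='%c'%n",
                    locale, decimalSeparator(locale), groupingSeparator(locale));
        }
    }
}
```

For German the two characters are swapped relative to English, which is why "12.345.678,90" only lexes as a single NUMBER with the German locale.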
Related Q&A's:
What is a 'semantic predicate' in ANTLR?
What does "fragment" mean in ANTLR?
As a follow-up to Bart's answer, this is the grammar I created with his suggestions:
grammar ExcelScript;
@lexer::header
{
using System;
using System.Globalization;
}
@lexer::members
{
private Int32 listseparator = 44; // UTF16 value for comma
private Int32 decimalseparator = 46; // UTF16 value for period
/// <summary>
/// Creates a new lexer object
/// </summary>
/// <param name="input">The input stream</param>
/// <param name="locale">The locale to use in parsing numbers</param>
/// <returns>A new lexer object</returns>
public ExcelScriptLexer (ICharStream input, CultureInfo locale)
: this(input)
{
this.listseparator = Convert.ToInt32(locale.TextInfo.ListSeparator[0]);
this.decimalseparator = Convert.ToInt32(locale.NumberFormat.NumberDecimalSeparator[0]);
// special case for 8 locales where the list separator is a , and the number separator is a , too
// Excel uses semicolon for list separator, so we will too
if (this.listseparator == 44 && this.decimalseparator == 44)
this.listseparator = 59; // UTF16 value for semicolon
}
}
/*
* Parser Rules
*/
formula
: numberLiteral
| Identifier
| '=' expression
;
expression
: primary # PrimaryExpression
| Identifier arguments # FunctionCallExpression
| ('+' | '-') expression # UnarySignExpression
| expression ('*' | '/' | '%') expression # MulDivModExpression
| expression ('+' | '-') expression # AddSubExpression
| expression ('<=' | '>=' | '>' | '<') expression # CompareExpression
| expression ('=' | '<>') expression # EqualCompareExpression
;
primary
: '(' expression ')' # ParenExpression
| literal # LiteralExpression
| Identifier # IdentifierExpression
;
literal
: numberLiteral # NumberLiteralRule
| booleanLiteral # BooleanLiteralRule
;
numberLiteral
: IntegerLiteral
| FloatingPointLiteral
;
booleanLiteral
: TrueKeyword
| FalseKeyword
;
arguments
: '(' expressionList? ')'
;
expressionList
: expression (ListSeparator expression)*
;
/*
* Lexer Rules
*/
AddOperator : '+' ;
SubOperator : '-' ;
MulOperator : '*' ;
DivOperator : '/' ;
PowOperator : '^' ;
EqOperator : '=' ;
NeqOperator : '<>' ;
LeOperator : '<=' ;
GeOperator : '>=' ;
LtOperator : '<' ;
GtOperator : '>' ;
ListSeparator : {_input.La(1) == listseparator}? . ;
DecimalSeparator : {_input.La(1) == decimalseparator}? . ;
TrueKeyword : [Tt][Rr][Uu][Ee] ;
FalseKeyword : [Ff][Aa][Ll][Ss][Ee] ;
Identifier
: Letter (Letter | Digit)*
;
fragment Letter
: [A-Z_a-z]
;
fragment Digit
: [0-9]
;
IntegerLiteral
: '0'
| [1-9] [0-9]*
;
FloatingPointLiteral
: [0-9]+ DecimalSeparator [0-9]* Exponent?
| DecimalSeparator [0-9]+ Exponent?
| [0-9]+ Exponent
;
fragment Exponent
: ('e' | 'E') ('+' | '-')? ('0'..'9')+
;
WhiteSpace
: [ \t]+ -> channel(HIDDEN)
;
I'm taking a first stab at creating a grammar for expressions like:
(foo = bar or (bar = "bar" and baz = 45.43)) and test = true
My grammar so far looks like:
grammar filter;
tokens {
TRUE = 'true';
FALSE = 'false';
AND = 'and';
OR = 'or';
LT = '<';
GT = '>';
EQ = '=';
NEQ = '!=';
PATHSEP = '/';
LBRACK = '[';
RBRACK = ']';
LPAREN = '(';
RPAREN = ')';
}
expression : or_expression EOF;
or_expression : and_expression (OR or_expression)*;
and_expression : term (AND term)*;
term : atom ( operator atom)? | LPAREN expression RPAREN;
atom : ID | INT | FLOAT | STRING | TRUE | FALSE;
operator : LT | GT | EQ | NEQ;
INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
But in ANTLRWorks 1.4.3, I get the parse tree:
But for the life of me I can't figure out what is wrong with my grammar. What token is it missing here?
Many thanks in advance.
Edit: To clarify the atom (operator atom)? alternative in the term production, I should perhaps mention that atoms should be able to stand alone, without comparison to another atom. E.g. a or b is a valid expression.
I'm answering my own question here. I found two problems with my grammar. The first was easy to spot; I had put EOF at the end of my top-level rule:
expression : or_expression EOF;
The EOF was thus the missing token. My solution was to remove the EOF from the expression rule, and instead introduce a rule above it:
filter: expression EOF;
The second problem was that my or_expression rule should be:
or_expression : and_expression (OR and_expression)*;
and not
or_expression : and_expression (OR or_expression)*;
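To make the shape of the corrected rules concrete, here is a hand-written recursive-descent sketch in Java. It is not what ANTLR generates; the class FilterSketch, its regex tokenizer, and the parenthesized output format are assumptions made for illustration. Each method mirrors one rule, and the top-level parse method plays the role of the rule that ends in EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterSketch {
    // Simplified tokenizer: characters that match nothing are silently skipped.
    private static final Pattern TOKEN =
            Pattern.compile("\\(|\\)|!=|[<>=]|\"[^\"]*\"|[A-Za-z0-9_.]+");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private FilterSketch(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    /** filter : expression EOF ; returns a fully parenthesized form of the input. */
    public static String parse(String input) {
        FilterSketch p = new FilterSketch(input);
        String tree = p.orExpression();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    /** or_expression : and_expression (OR and_expression)* ; */
    private String orExpression() {
        StringBuilder sb = new StringBuilder(andExpression());
        boolean many = false;
        while (peek().equals("or")) {
            next();
            sb.append(" or ").append(andExpression());
            many = true;
        }
        return many ? "(" + sb + ")" : sb.toString();
    }

    /** and_expression : term (AND term)* ; */
    private String andExpression() {
        StringBuilder sb = new StringBuilder(term());
        boolean many = false;
        while (peek().equals("and")) {
            next();
            sb.append(" and ").append(term());
            many = true;
        }
        return many ? "(" + sb + ")" : sb.toString();
    }

    /** term : atom (operator atom)? | LPAREN expression RPAREN ; */
    private String term() {
        if (peek().equals("(")) {
            next();
            String inner = orExpression();
            if (!next().equals(")")) throw new IllegalArgumentException("expected )");
            return inner;
        }
        String left = atom();
        if (peek().matches("<|>|=|!=")) {
            String op = next();
            return "(" + left + " " + op + " " + atom() + ")";
        }
        return left;
    }

    /** atom : anything that is not a keyword, operator or parenthesis. */
    private String atom() {
        String t = next();
        if (t.matches("\\(|\\)|<|>|=|!=|and|or")) {
            throw new IllegalArgumentException("expected an atom but found: " + t);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(parse("a or b and c"));
        System.out.println(parse("(foo = bar or (bar = \"bar\" and baz = 45.43)) and test = true"));
    }
}
```

Because orExpression only ever calls andExpression, "and" binds tighter than "or", and the end-of-input check lives in exactly one place, outside the recursive rules.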
The full corrected grammar is:
grammar filter;
tokens {
TRUE = 'true';
FALSE = 'false';
AND = 'and';
OR = 'or';
LT = '<';
GT = '>';
EQ = '=';
NEQ = '!=';
PATHSEP = '/';
LBRACK = '[';
RBRACK = ']';
LPAREN = '(';
RPAREN = ')';
}
filter: expression EOF;
expression : or_expression;
or_expression : and_expression (OR and_expression)*;
and_expression : term (AND term)*;
term : atom (operator atom)? | LPAREN expression RPAREN;
atom : ID | INT | FLOAT | STRING | TRUE | FALSE;
operator : LT | GT | EQ | NEQ;
INT : '0'..'9'+;
FLOAT : ('0'..'9')+ '.' ('0'..'9')*;
STRING : '"' ('a'..'z'|'A'..'Z'|'_'|' ')* '"';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
And the resulting parse tree is:
I have been struggling to resolve a "multiple alternatives" error in my parser for a couple of days now, but with no success. I have been converting Bart Kiers' excellent Tiny Language (TL) tutorial code to C# using Sam Harwell's port of ANTLR3 and VS2010. Kudos to both of these guys for their excellent work. I believe I have followed Bart's tutorial accurately, but as I am a newbie with ANTLR I can't be sure.
I did have the TL code working nicely on a pure math basis i.e. no "functions" or "if then else" or "while" (see screenshot of a little app)
but when I added the code for the missing pieces to complete the tutorial I get a parsing error in "functionCall" and in "list" (see the code below)
grammar Paralex2;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
BLOCK;
RETURN;
STATEMENTS;
ASSIGNMENT;
FUNC_CALL;
EXP;
EXP_LIST;
ID_LIST;
IF;
TERNARY;
U_SUB;
NEGATE;
FUNCTION;
INDEXES;
LIST;
LOOKUP;
}
@lexer::namespace{Paralex2}
@parser::namespace{Paralex2}
/*
* Parser Rules
*/
@parser::header {using System; using System.Collections.Generic;}
@parser::members{
public SortedList<string, Function> functions = new SortedList<string, Function>();
private void defineFunction(string id, Object idList, Object block) {
// `idList` is possibly null! Create an empty tree in that case.
CommonTree idListTree = idList == null ? new CommonTree() : (CommonTree)idList;
// `block` is never null.
CommonTree blockTree = (CommonTree)block;
// The function name followed by the number of parameters is the unique key
string key = id + idListTree.Children.Count();
functions.Add(key, new Function(id, idListTree, blockTree));
}
}
public parse
: block EOF -> block
;
block
: (statement | functionDecl)* (Return exp ';')? -> ^(BLOCK ^(STATEMENTS statement*) ^(RETURN exp?))
;
statement
: assignment ';' -> assignment
| functionCall ';' -> functionCall
| ifStatement
| forStatement
| whileStatement
;
assignment
: Identifier indexes? '=' exp
-> ^(ASSIGNMENT Identifier indexes? exp)
;
functionCall
: Identifier '(' expList? ')' -> ^(FUNC_CALL Identifier expList?)
| Assert '(' exp ')' -> ^(FUNC_CALL Assert exp)
| Size '(' exp ')' -> ^(FUNC_CALL Size exp)
;
ifStatement
: ifStat elseIfStat* elseStat? End -> ^(IF ifStat elseIfStat* elseStat?)
;
ifStat
: If exp Do block -> ^(EXP exp block)
;
elseIfStat
: Else If exp Do block -> ^(EXP exp block)
;
elseStat
: Else Do block -> ^(EXP block)
;
functionDecl
: Def Identifier '(' idList? ')' block End
{defineFunction($Identifier.text, $idList.tree, $block.tree);}
;
forStatement
: For Identifier '=' exp To exp Do block End
-> ^(For Identifier exp exp block)
;
whileStatement
: While exp Do block End -> ^(While exp block)
;
idList
: Identifier (',' Identifier)* -> ^(ID_LIST Identifier+)
;
expList
: exp (',' exp)* -> ^(EXP_LIST exp+)
;
exp
: condExp
;
condExp
: (orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
orExp
: andExp ('||'^ andExp)*
;
andExp
: equExp ('&&'^ equExp)*
;
equExp
: relExp (('==' | '!=')^ relExp)*
;
relExp
: addExp (('>=' | '<=' | '>' | '<')^ addExp)*
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: powExp ((Mul | Div)^ powExp)*
;
powExp
: unaryExp ('^'^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| '!' atom -> ^(NEGATE atom)
| atom
;
atom
: Nmber
| Bool
| Null
| lookup
;
list
: '[' expList? ']' -> ^(LIST expList?)
;
lookup
: list indexes? -> ^(LOOKUP list indexes?)
| functionCall indexes? -> ^(LOOKUP functionCall indexes?)
| Identifier indexes? -> ^(LOOKUP Identifier indexes?)
| String indexes? -> ^(LOOKUP String indexes?)
| '(' exp ')' indexes? -> ^(LOOKUP exp indexes?)
;
indexes
: ('[' exp ']')+ -> ^(INDEXES exp+)
;
/*
* Lexer Rules
*/
Assert : 'assert';
Size : 'size';
Def : 'def';
If : 'if';
Else : 'else';
Return : 'return';
For : 'for';
While : 'while';
To : 'to';
Do : 'do';
End : 'end';
In : 'in';
Null : 'null';
Or : '||';
And : '&&';
Equals : '==';
NEquals : '!=';
GTEquals : '>=';
LTEquals : '<=';
Pow : '^';
GT : '>';
LT : '<';
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Modulus : '%';
OBrace : '{';
CBrace : '}';
OBracket : '[';
CBracket : ']';
OParen : '(';
CParen : ')';
SColon : ';';
Assign : '=';
Comma : ',';
QMark : '?';
Colon : ':';
Bool
: 'true'
| 'false'
;
Nmber
: Int ('.' Digit*)?
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | Digit)*
;
String
@after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {Skip();}
| '/*' .* '*/' {Skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {Skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
The error messages I get are
Decision can match input such as "CParen" using multiple alternatives: 1, 2 : Line 79:20
and
Decision can match input such as "CBracket" using multiple alternatives: 1, 2 : Line 176:10
The errors relate to the functionCall and list rules. I have examined the parser file in ANTLRWorks 1.5 and confirmed the same errors there. The syntax diagrams for the two rules look like this;
and this;
I have tried several changes to try to solve the problem but I don't seem to be able to get the syntax right. I would appreciate any help you guys could provide and can email the images if that would help.
Thanks in advance
Ian Carson
You have one OR-operator (|) too many in the condExp rule, which makes the grammar ambiguous.
You have:
condExp
: ( orExp -> orExp)
| ( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
But it should be:
condExp
: ( orExp -> orExp)
( '?' a=exp ':' b=exp -> ^(TERNARY orExp $a $b)
| In exp -> ^(In orExp exp)
)?
;
corresponding to:
Building off the answer found in How to have both function calls and parenthetical grouping without backtrack, I'd like to add function literals. Implemented naively, in a way that is not LL(*), they would look like this:
...
tokens {
...
FN;
ID_LIST;
}
stmt
: expr SEMI // SEMI=';'
;
callable
: ...
| fn
;
fn
: OPAREN opt_id_list CPAREN compound_stmt
-> ^(FN opt_id_list compound_stmt)
;
compound_stmt
: OBRACE stmt* CBRACE
;
opt_id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
What I'd like to do is allow anonymous function literals that have an argument list (e.g. () or (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} is good. But (x)(y){} not so much. (Of course (x) * (y){} is "valid" in terms of the parser, just as ((y){})()[1].x would be.)
The parser needs a bit of extra look ahead. I guess it could be done without it, but it would definitely result in some horrible looking parser rule(s) that are a pain to maintain and a parser that would accept (a, 2, 3){...} (a function literal with an expression-list instead of an id-list), for example. This would cause you to do quite a bit of semantic checking after the AST has been created.
The (IMO) best way to solve this is by adding the function literal rule in the callable and adding a syntactic predicate in front of it which will tell the parser to make sure there really is such an alternative before actually matching it.
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
A demo:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
OBRACE = '{' ;
CBRACE = '}' ;
SEMI = ';' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
FN;
ID_LIST;
STATS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
fn_literal
: OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
;
id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
compound_stmt
: OBRACE stmt* CBRACE -> ^(STATS stmt*)
;
stmt
: expr SEMI
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
A parser generated by the grammar above would reject the input (x)(y){} while it properly parses the following 3 snippets of code:
1. (a, b, c){ a+b*c; }
2. (x) * (y){ x.y; }
3. ((y){})()[1].x
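Conceptually, the syntactic predicate (fn_literal)=> makes the parser look ahead before committing to an alternative. The Java sketch below imitates that decision on a plain list of token texts. It is a simplification and an assumption on my part: it only checks the prefix "( id-list ) {" rather than speculatively matching the whole fn_literal rule as ANTLR would, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class FnLiteralLookahead {
    /** Returns true if the tokens from index start look like: ( ID (, ID)* ) { */
    public static boolean looksLikeFnLiteral(List<String> tokens, int start) {
        int i = start;
        if (!at(tokens, i, "(")) return false;
        i++;
        // Optional id-list: ID (COMMA ID)*
        if (isId(tokens, i)) {
            i++;
            while (at(tokens, i, ",")) {
                i++;
                if (!isId(tokens, i)) return false;
                i++;
            }
        }
        if (!at(tokens, i, ")")) return false;
        i++;
        // Only an opening brace right after the closing parenthesis makes it a literal.
        return at(tokens, i, "{");
    }

    private static boolean at(List<String> tokens, int i, String expected) {
        return i < tokens.size() && tokens.get(i).equals(expected);
    }

    // Mirrors the demo's ID rule: one or more lowercase letters.
    private static boolean isId(List<String> tokens, int i) {
        return i < tokens.size() && tokens.get(i).matches("[a-z]+");
    }

    public static void main(String[] args) {
        List<String> literal = Arrays.asList("(", "a", ",", "b", ",", "c", ")", "{", "}");
        List<String> grouping = Arrays.asList("(", "x", ")", "*", "(", "y", ")", "{", "}");
        System.out.println(looksLikeFnLiteral(literal, 0));  // function literal
        System.out.println(looksLikeFnLiteral(grouping, 0)); // parenthesized expression
        System.out.println(looksLikeFnLiteral(grouping, 4)); // function literal after '*'
    }
}
```

A list such as (a, 2, 3){...} fails the check because 2 is not an ID, so the parser falls through to the parenthesized-expression alternative and reports a syntax error there instead of accepting an ill-formed literal.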
I have defined the following grammar.
grammar Sample_1;
@header {
package a;
}
@lexer::header {
package a;
}
program
:
define*
implement*
;
define
: IDENT '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (IDENT ','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
How can I add a check to this grammar so that, for the input
A=(1,1)
B=(1,2)
G=(A,B)
parsing succeeds, but for the input
A=(1,1)
B=(1,2)
G=(A,E)
it reports an error that E is not defined?
Thanks.
The result:
I got it working, thanks a lot:
grammar Sample_1;
@members{
int level=0;
}
@header {
package a;
}
@lexer::header {
package a;
}
program
:
block
;
block
scope {
List symbols;
}
@init {
$block::symbols=new ArrayList();
level++;
}
@after {
System.err.println("Hello");
level--;
}
: (define* implement+)
;
define
: IDENT {$block::symbols.add($IDENT.text);} '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (a=IDENT
{if (!$block::symbols.contains($a.text)){
System.err.println("undefined");
}}','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
ANTLR supports actions: little snippets of code embedded in the grammar file.
An action for an assignment could store the defined name in a map. An action for a right-hand-side IDENT could try to pull a value from the map, and throw an exception if it fails.
Chapter 6 in Terence Parr's "The Definitive ANTLR Reference" covers actions.
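The bookkeeping such actions would perform can be sketched in plain Java, independent of the grammar. The class SymbolCheckSketch and its method names are hypothetical; in a real grammar the bodies of define and implement would sit inside the embedded actions of the corresponding rules.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymbolCheckSketch {
    // Maps each defined name to its pair of integers, e.g. A -> (1, 1).
    private final Map<String, int[]> defined = new HashMap<>();

    /** What an action on the define rule could do: remember the name. */
    public void define(String name, int first, int second) {
        defined.put(name, new int[] { first, second });
    }

    /** What an action on the implement rule could do: check every referenced name. */
    public void implement(String name, List<String> references) {
        for (String ref : references) {
            if (!defined.containsKey(ref)) {
                throw new IllegalStateException(ref + " is not defined");
            }
        }
    }

    public static void main(String[] args) {
        SymbolCheckSketch symbols = new SymbolCheckSketch();
        symbols.define("A", 1, 1);
        symbols.define("B", 1, 2);
        symbols.implement("G", Arrays.asList("A", "B")); // passes silently
        try {
            symbols.implement("G", Arrays.asList("A", "E"));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```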