I am new to ANTLR and I am trying to implement if-else, for and while loops, and logical operators, but I am not able to do so. Can anyone help me with this? Below is what I have done.
grammar BasForCCAL;
@header {
package basforccal;
import java.util.HashMap;
import java.util.Scanner;
}
@lexer::header{
package basforccal;
}
@members{
String programName;
HashMap memory = new HashMap();
public void checkName(String endName){
if(!endName.equals(programName)){
System.out.println("Wrong Program name in end of the program");
}
}
}
program : start programbody end;
start :'PROGRAM' ID {programName = $ID.text ; System.out.println("Checking program :"+$ID.text);};
programbody
: (devcar|ID'='(expr|CHAR)| ctrlStmt)*;
devcar : initInt var1|
intFloat var1|
intChar var1 ;
initInt : 'INT'
;
intFloat
: 'FLOAT'
;
intChar: 'CHAR';
var1 : idname (',' var1)* ;
idname : ID {Integer v = (Integer)memory.get($ID.text);
if(v!=null)
{System.err.println("Error: "+$ID.text+" already defined line:"+$ID.getLine());}
else
{memory.put($ID.text,new Integer('1'));}
}
;
expr
: (multExpr |'('expr')')
( '+' multExpr
| '-' multExpr
| '/' multExpr
| '*' multExpr
)*
;
logiExpr
: expr relOpr expr;
relOpr
: '<'
| '>'
| '<>'
| '<='
| '>='
;
ctrlStmt
: 'IF''('logiExpr')' 'THEN' (stat)+ 'ENDIF'
| 'WHILE''('logiExpr')' 'DO' (stat)+ 'ENDDO'
| 'FOR' ID '=' expr 'TO' expr 'LOOP' stat+ 'ENDLOOP';
stat
: ctrlStmt|multExpr
| ID '=' (expr|CHAR);
multExpr
: ID {
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ){}
else System.err.println("undefined variable "+$ID.text);
}
| INT
| FLOAT
;
end
: 'END' ID '.' {checkName($ID.text);};
Here is my Java code to check it:
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import java.io.IOException;
public class AntlrParser {
public static void main(String args[]) throws IOException, RecognitionException {
basforccal.BasForCCALLexer lexer = new basforccal.BasForCCALLexer(new ANTLRFileStream(args[0]));
CommonTokenStream token = new CommonTokenStream(lexer);
basforccal.BasForCCALParser parser = new basforccal.BasForCCALParser(token);
parser.program();
}
}
Below is the program in a file (prog1.bfcc) which I am trying to check using my Java code.
PROGRAM TESTIF
FLOAT A,B,C
A=1.0
C=1.0
IF(A>1.0)THEN
B=2.0
ENDIF
IF(B*C<=10)THEN
IF(A>0.0)THEN
C=5.0
ENDIF
ENDIF=
IF(3=4)THEN
A=1.0
B=2.0
C=3.0
ENDIF
END TESTIF.
Below is the error I get when checking it from Java:
Checking program :TESTIF
C:\Users\vivek\IdeaProjects\BasForCCal\prog1.bfcc line 16:4 mismatched input '=' expecting set null
Process finished with exit code 0
You have ENDIF= in your input, which looks suspicious. It should probably be just ENDIF, without the =. This is what the error message is trying to tell you.
Also, there is IF(3=4)THEN in your input, but your relOpr does not include the = operator. You should probably add it:
relOpr
: '='
| '<'
| '>'
| '<>'
| '<='
| '>='
;
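Once '=' is part of relOpr, actually evaluating a condition such as IF(3=4) means mapping each relOpr alternative to a comparison. Here is a minimal Java sketch of that mapping, assuming numeric operands stored as double and '<>' meaning "not equal"; RelOps and compare are hypothetical names, not code generated by ANTLR.

```java
// Hypothetical helper (not ANTLR-generated): evaluates "left <op> right" for the
// BasForCCAL relational operators, including the '=' alternative of relOpr.
// Assumes numeric operands held as double and '<>' meaning "not equal".
final class RelOps {
    private RelOps() {}

    static boolean compare(double left, String op, double right) {
        switch (op) {
            case "=":  return left == right;
            case "<>": return left != right;
            case "<":  return left < right;
            case ">":  return left > right;
            case "<=": return left <= right;
            case ">=": return left >= right;
            default:
                throw new IllegalArgumentException("unknown relational operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(compare(3, "=", 4));     // IF(3=4)             -> false
        System.out.println(compare(1.0, ">", 1.0)); // IF(A>1.0), A = 1.0  -> false
        System.out.println(compare(1.0, "<=", 10)); // a "<=" comparison   -> true
    }
}
```

An interpreter built on top of the parser could call such a helper from the action of a logiExpr rule, passing the values of the two expr operands and the text of the matched relOpr.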
Related
When running ANTLR 3 on the following code, I get this warning: warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;
I think one problem is that PLUS_PLUS as well as MINUS_MINUS will never be matched, as they are defined after the respective PLUS or MINUS token. Therefore the lexer will always output two PLUS tokens instead of one PLUS_PLUS token.
In order to avoid something like this, you have to define your PLUS_PLUS or MINUS_MINUS token before the PLUS or MINUS token, as the lexer processes them in the order they are defined and won't look any further once it has found a way to match the current input.
The same problem applies to QMARK_COLON, as it is defined after QMARK (this is only a problem because there is another token type, COLON, to match the following colon).
See if fixing the ambiguities resolves the error message.
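The following Java sketch models the first-match strategy described in this answer: token literals are tried strictly in declaration order, and the first one that matches at the current position wins. It is a simplified illustration under that assumption, not ANTLR's generated lexer (whose prediction logic may resolve such cases differently), and the class FirstMatchLexer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical first-match lexer: literals are tried in declaration order and
// the first literal that matches at the current position wins (no longest match).
final class FirstMatchLexer {
    static List<String> tokenize(String input, List<String> literals) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String matched = null;
            for (String lit : literals) {
                if (input.startsWith(lit, pos)) {
                    matched = lit;
                    break; // stop at the first match, as the answer describes
                }
            }
            if (matched == null) {
                throw new IllegalStateException("no token matches at offset " + pos);
            }
            tokens.add(matched);
            pos += matched.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // PLUS declared before PLUS_PLUS: "++" is split into two '+' tokens
        System.out.println(tokenize("++", List.of("+", "++")));      // [+, +]
        // PLUS_PLUS declared first: "++" becomes a single token
        System.out.println(tokenize("++", List.of("++", "+")));      // [++]
        // the same effect for QMARK declared before QMARK_COLON
        System.out.println(tokenize("?:", List.of("?", "?:", ":"))); // [?, :]
    }
}
```

Under this model, reordering the literal list is the whole fix, which is what the answer suggests doing in the tokens block.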
Hello, I'm trying to build a simple lexer to tokenize lines starting with a ';' character.
This is my lexer grammar:
lexer grammar TestLex;
options {
language = Java;
filter = true;
}
@header {
package com.ualberta.slmyers.cmput415.assign1;
}
IR : LINE+
;
LINE : SEMICOLON (~NEWLINE)* NEWLINE
;
SEMICOLON : ';'
;
NEWLINE : '\n'
;
WS : (' ' | '\t')+
{$channel = HIDDEN;}
;
And here is my java class to run my lexer:
package com.ualberta.slmyers.cmput415.assign1;
import java.io.IOException;
import org.antlr.runtime.*;
public class Test {
public static void main(String[] args) throws RecognitionException,
IOException {
// create an instance of the lexer
TestLex lexer = new TestLex(
new ANTLRFileStream(
"/home/linux/workspace/Cmput415Assign1/src/com/ualberta/slmyers/cmput415/assign1/test3.s"));
// wrap a token-stream around the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// when using ANTLR v3.3 or v3.4, un-comment the next line:
tokens.fill();
// traverse the tokens and print them to see if the correct tokens are
// created
int n = 1;
for (Object o : tokens.getTokens()) {
CommonToken token = (CommonToken) o;
System.out.println("token(" + n + ") = "
+ token.getText().replace("\n", "\\n"));
n++;
}
}
}
Credit to http://bkiers.blogspot.ca/2011/03/2-introduction-to-antlr.html for the code above, which I adapted.
This is my test file:
; token 1
; token 2
; token 3
; token 4
Note there is a newline character after the last '4'.
This is my output:
token(1) = ; token 1\n; token 2\n; token 3\n; token 4\n
token(2) = <EOF>
I'm expecting this as my output:
token(1) = ; token 1\n
token(2) = ; token 2\n
token(3) = ; token 3\n
token(4) = ; token 4\n
token(5) = <EOF>
OK, I figured it out. The problem was this line:
IR : LINE+
;
which returned a single token made up of all the lines. Removing the IR rule, so that LINE matches each line on its own, gives the expected output.
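Once IR no longer wraps LINE+, every LINE match is its own token. As a plain-Java illustration of that expected result (not ANTLR code, and it does not model filter mode; the class LineTokens is hypothetical), the following splits the input into one token per ';'-prefixed line, keeping the trailing newline, and prints the tokens in the same format as the Test class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the LINE rule: SEMICOLON (~NEWLINE)* NEWLINE.
// Each ';'-prefixed line, including its '\n', becomes one token.
final class LineTokens {
    static List<String> lines(String input) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int nl = input.indexOf('\n', start);
            if (nl < 0 || input.charAt(start) != ';') {
                // LINE needs a leading ';' and a terminating '\n'
                throw new IllegalArgumentException("not a LINE at offset " + start);
            }
            tokens.add(input.substring(start, nl + 1));
            start = nl + 1;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "; token 1\n; token 2\n; token 3\n; token 4\n";
        int n = 1;
        for (String t : lines(src)) {
            System.out.println("token(" + n + ") = " + t.replace("\n", "\\n"));
            n++;
        }
        System.out.println("token(" + n + ") = <EOF>");
    }
}
```

Running main prints token(1) = ; token 1\n through token(4) = ; token 4\n followed by token(5) = <EOF>, which is the output the question expected.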
I have the statement:
=MYFUNCTION_NAME(1,2,3)
My grammar is:
grammar Expression;
options
{
language=CSharp3;
output=AST;
backtrack=true;
}
tokens
{
FUNC;
PARAMS;
}
@parser::namespace { Expression }
@lexer::namespace { Expression }
public
parse : ('=' func )*;
func : funcId '(' formalPar* ')' -> ^(FUNC funcId formalPar);
formalPar : (par ',')* par -> ^(PARAMS par+);
par : INT;
funcId : complexId+ ('_'? complexId+)*;
complexId
: ID+
| ID+DIGIT+ ;
ID : ('a'..'z'|'A'..'Z'|'а'..'я'|'А'..'Я')+;
DIGIT : ('0'..'9')+;
INT : '-'? ('0'..'9')+;
In the tree I get:
[FUNC]
|
[MYFUNCTION] [_] [NAME] [PARAMS]
Why does the parser split the function's name into 3 nodes: "MYFUNCTION", "_" and "NAME"? How can I fix it?
Tree nodes are always created from tokens. Since the ID token cannot contain an _ character, the input MYFUNCTION_NAME becomes 3 separate tokens (MYFUNCTION, _ and NAME), which are then matched by the funcId parser rule, each keeping its own node. To create a single node for your function name, you'll need a lexer rule that matches the input MYFUNCTION_NAME as a single token.
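To illustrate why the fix belongs in the lexer, here is a hypothetical Java scanner (not ANTLR code) in which a run of letters forms one identifier token and every other character becomes a one-character token, much like the implicit '_' literal in the question's funcId rule. The underscoreInId flag stands in for an identifier lexer rule that also accepts '_'; the names are assumptions for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: a run of letters is one identifier token; when
// underscoreInId is true, '_' counts as an identifier character as well.
// Any other character becomes a one-character token.
final class IdScanner {
    static List<String> scan(String input, boolean underscoreInId) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            while (i < input.length() && isIdChar(input.charAt(i), underscoreInId)) {
                i++;
            }
            if (i == start) {
                i++; // not an identifier character: emit it on its own
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    private static boolean isIdChar(char c, boolean underscoreInId) {
        // Character.isLetter also accepts Cyrillic letters, like the ID rule's ranges
        return Character.isLetter(c) || (underscoreInId && c == '_');
    }

    public static void main(String[] args) {
        System.out.println(scan("MYFUNCTION_NAME", false)); // [MYFUNCTION, _, NAME]
        System.out.println(scan("MYFUNCTION_NAME", true));  // [MYFUNCTION_NAME]
    }
}
```

With '_' allowed inside the identifier token, the parser only ever sees one token for the function name, so the tree gets one node for it.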
The Java code that ANTLR generates is, in most cases, one method per rule. But for the following rule:
switchBlockLabels[ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts]
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel[_entity, _method, _preStmts]* switchDefaultLabel? switchCaseLabel*)
;
it generates a submethod named synpred125_TreeParserStage3_fragment(), in which the method switchCaseLabel(_entity, _method, _preStmts) is called:
synpred125_TreeParserStage3_fragment(){
......
switchCaseLabel(_entity, _method, _preStmts);//variable not found error
......
}
switchBlockLabels(ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts){
......
synpred125_TreeParserStage3_fragment();
......
}
The problem is that switchCaseLabel has parameters, and those parameters come from the parameters of the switchBlockLabels() method, so a "variable not found" error occurs.
How can I solve this problem?
My guess is that you've enabled global backtracking in your grammar like this:
options {
backtrack=true;
}
in which case you can't pass parameters to ambiguous rules. In order to communicate between ambiguous rules when global backtracking is enabled, you must use rule scopes: the generated "predicate methods" do have access to rule scope variables.
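A rough Java model of why this happens, assuming the generated code has the shape shown in the question: rule parameters are locals of the rule's method, so a separate synpred fragment method cannot see them, whereas ANTLR 3's Java target keeps rule scopes in a stack held by the parser object, which every method can reach. The class and member names below are hypothetical and heavily simplified.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a generated parser: the rule scope lives in a field,
// so a synpred-style method can read it without receiving any parameters.
final class ScopeModel {
    // one frame per active invocation of the rule that declares the scope
    static final class OuterScope { int n; }
    private final Deque<OuterScope> outerStack = new ArrayDeque<>();

    int outerRule(int count) {
        outerStack.push(new OuterScope());   // entering the rule: new scope frame
        try {
            outerStack.peek().n = count;     // roughly like $outer::n = count
            return synpredFragment();        // nothing passed as a parameter
        } finally {
            outerStack.pop();                // leaving the rule: discard the frame
        }
    }

    // Like a generated synpredN_fragment(): it cannot see outerRule's parameter
    // 'count', but it can read the current scope frame through the field.
    private int synpredFragment() {
        return outerStack.peek().n * 2;
    }

    public static void main(String[] args) {
        System.out.println(new ScopeModel().outerRule(21)); // 42
    }
}
```

Using a stack rather than a single field is what lets a scoped rule call itself recursively: each invocation pushes its own frame and sees its own values.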
A demo
Let's say we have this ambiguous grammar:
grammar Scope;
options {
backtrack=true;
}
parse
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName
: Number
| Name
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
(for the record, the atom+ and numberOrName+ make it ambiguous)
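To see concretely why atom+ combined with numberOrName+ is ambiguous: every gap between two adjacent tokens may either end an atom or not, so n tokens can be grouped into atoms in 2^(n-1) ways. The hypothetical Java helper below (not part of the grammar) enumerates those groupings; for the four tokens of the input foo 42 Bar 666 used in the test class of this answer, it finds 8.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates every way `atom+` (with `atom : numberOrName+`) could group a
// token sequence: each bit of `mask` decides whether an atom ends after token i.
final class AtomSplits {
    static List<List<List<String>>> splits(List<String> tokens) {
        List<List<List<String>>> result = new ArrayList<>();
        int gaps = tokens.size() - 1;
        for (int mask = 0; mask < (1 << gaps); mask++) {
            List<List<String>> atoms = new ArrayList<>();
            List<String> current = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                current.add(tokens.get(i));
                boolean last = i == tokens.size() - 1;
                if (last || (mask & (1 << i)) != 0) { // close the current atom here
                    atoms.add(current);
                    current = new ArrayList<>();
                }
            }
            result.add(atoms);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<List<String>>> all = splits(List.of("foo", "42", "Bar", "666"));
        System.out.println(all.size()); // 8
        for (List<List<String>> grouping : all) {
            System.out.println(grouping);
        }
    }
}
```

Any parse strategy for this grammar has to pick one of these groupings, which is exactly the choice backtracking resolves.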
If you now want to pass information between the parse and numberOrName rule, say an integer n, something like this will fail (which is the way you tried it):
grammar Scope;
options {
backtrack=true;
}
parse
@init{int n = 0;}
: (atom[++n])+ EOF
;
atom[int n]
: (numberOrName[n])+
;
numberOrName[int n]
: Number {System.out.println(n + " = " + $Number.text);}
| Name {System.out.println(n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
In order to do this using rule scopes, you could do it like this:
grammar Scope;
options {
backtrack=true;
}
parse
scope{int n; /* define the scoped variable */ }
@init{$parse::n = 0; /* important: initialize the variable! */ }
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName /* increment and print the scoped variable from the parse rule */
: Number {System.out.println(++$parse::n + " = " + $Number.text);}
| Name {System.out.println(++$parse::n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Test
If you now run the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "foo 42 Bar 666";
ScopeLexer lexer = new ScopeLexer(new ANTLRStringStream(src));
ScopeParser parser = new ScopeParser(new CommonTokenStream(lexer));
parser.parse();
}
}
you will see the following being printed to the console:
1 = foo
2 = 42
3 = Bar
4 = 666
P.S.
I don't know what language you're parsing, but enabling global backtracking is usually overkill and can have a significant impact on the performance of your parser. Computer languages are often ambiguous in just a few places. Instead of enabling global backtracking, you should look into adding syntactic predicates, or enabling backtracking only on the rules that are ambiguous. See The Definitive ANTLR Reference for more info.
I have a pretty basic math expression grammar for ANTLR here, and what's of interest is handling the implied * operator between parentheses, e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7), I'm trying to add the missing * operator to the AST while parsing. In the following grammar I think I've managed to achieve that, but I'm wondering if this is the correct, most elegant way?
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR and don't have much experience, so feedback would be really appreciated!
Thanks in advance! Carl
First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit language=Java and ASTLabelType=CommonTree: Java is the default target, and the default tree adaptor already creates CommonTree nodes. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for your AST: you'll probably want to create a binary tree. That way, all internal nodes are operators and the leaves are operands, which makes evaluating your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
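To show why the binary tree makes evaluation easy, here is a hand-written Java sketch (not ANTLR output) that mirrors the expression, mult and atom rules of grammar G and builds the same left-associative shape as the ^(MUL $atom $e) rewrite, then evaluates it with a simple recursive eval(). The class names are hypothetical, whitespace is simply stripped (standing in for the HIDDEN WS tokens), and integer division for '/' is an assumption.

```java
// Hand-written sketch (not ANTLR output): recursive descent mirroring rules
// expression, mult and atom of grammar G. Each extra parenthesized group in atom
// becomes a '*' node, like the ^(MUL $atom $e) rewrite.
final class ImplicitMul {
    interface Node { long eval(); }

    static final class Num implements Node {
        final long value;
        Num(long value) { this.value = value; }
        public long eval() { return value; }
        @Override public String toString() { return Long.toString(value); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public long eval() {
            long a = left.eval(), b = right.eval();
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                case '/': return a / b; // integer division (assumption)
                default: throw new IllegalStateException("bad operator " + op);
            }
        }
        @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
    }

    private final String src;
    private int pos;

    private ImplicitMul(String src) { this.src = src.replaceAll("\\s+", ""); }

    static Node parse(String input) {
        ImplicitMul p = new ImplicitMul(input);
        Node tree = p.expression();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return tree;
    }

    // expression : mult (('+' | '-') mult)*
    private Node expression() {
        Node left = mult();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            left = new BinOp(op, left, mult());
        }
        return left;
    }

    // mult : atom (('*' | '/') atom)*
    private Node mult() {
        Node left = atom();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            left = new BinOp(op, left, atom());
        }
        return left;
    }

    // atom : INTEGER | '(' expression ')' ('(' expression ')')*
    private Node atom() {
        if (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new Num(Long.parseLong(src.substring(start, pos)));
        }
        Node result = group();
        while (pos < src.length() && src.charAt(pos) == '(') {
            result = new BinOp('*', result, group()); // implied '*', left-associative
        }
        return result;
    }

    private Node group() {
        expect('(');
        Node inner = expression();
        expect(')');
        return inner;
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        Node tree = parse("(2-3)(4+5)(6*7)");
        System.out.println(tree);        // (* (* (- 2 3) (+ 4 5)) (* 6 7))
        System.out.println(tree.eval()); // -378
    }
}
```

Because every internal node has exactly two children, eval() needs no special cases for operator chains: (2-3)(4+5)(6*7) evaluates as (-1 * 9) * 42 = -378.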