I have modified the PL/SQL parser by [Porcelli](https://github.com/porcelli/plsql-parser) and am using it to parse PL/SQL files. I am having trouble parsing FOR loop statements, e.g.
for i in 1..l_line_tbl.count
LOOP
l_line_tbl(i).schedule_ship_date := l_max_ship_date;
l_line_tbl(i).ship_set_id := x_ship_set_id;
END LOOP;
The statement above does not parse; the parser throws an EarlyExitException.
If I put a space between the 1 and the double dot (..), the statement parses. I am not sure how to handle the first case.
for i in 1 ..l_line_tbl.count
LOOP
l_line_tbl(i).schedule_ship_date := l_max_ship_date;
l_line_tbl(i).ship_set_id := x_ship_set_id;
END LOOP;
Parser Grammar:
loop_statement
@init { int mode = 0; }
: label_name?
(while_key condition {mode = 1;} | for_key cursor_loop_param {mode = 2;})?
loop_key
seq_of_statements
end_key loop_key label_name?
-> {mode == 1}? ^(WHILE_LOOP[$while_key.start] label_name*
^(LOGIC_EXPR condition) seq_of_statements)
-> {mode == 2}? ^(FOR_LOOP[$for_key.start] label_name* cursor_loop_param seq_of_statements)
-> ^(loop_key label_name* seq_of_statements)
;
// $<Loop - Specific Clause
cursor_loop_param
@init { int mode = 0; }
: (index_name in_key reverse_key? lower_bound DOUBLE_PERIOD)=>
index_name in_key reverse_key? lower_bound DOUBLE_PERIOD upper_bound
-> ^(INDEXED_FOR index_name reverse_key? ^(SIMPLE_BOUND lower_bound upper_bound))
| record_name in_key ( cursor_name expression_list? {mode = 1;} | LEFT_PAREN
select_statement RIGHT_PAREN)
->{mode == 1}? ^(CURSOR_BASED_FOR record_name cursor_name expression_list?)
-> ^(SELECT_BASED_FOR record_name select_statement)
;
// $>
Lexer Grammar:
FOR_NOTATION
: UNSIGNED_INTEGER
{state.type = UNSIGNED_INTEGER; emit(); advanceInput();}
'..'
{state.type = DOUBLE_PERIOD; emit(); advanceInput();}
UNSIGNED_INTEGER
{state.type = UNSIGNED_INTEGER; emit(); advanceInput(); $channel=HIDDEN;}
;
fragment
UNSIGNED_INTEGER
: ('0'..'9')+
;
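No answer to this question is included here, so the following is only a guess at the cause. FOR_NOTATION as shown covers integer '..' integer, so for 1..l_line_tbl.count the lexer may instead treat "1." as the start of a decimal number. Below is a minimal hand-written Java sketch (RangeAwareScanner is a hypothetical name, not part of the plsql-parser project) of the lookahead idea: a dot after digits starts a fraction only when it is not followed by a second dot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone scanner; it only illustrates the lookahead,
// it is not how the plsql-parser lexer is implemented.
final class RangeAwareScanner {

    // Splits the input into numbers, "..", ".", identifiers and single characters.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                // A dot belongs to the number only if it is not the start of "..".
                boolean dot = i < input.length() && input.charAt(i) == '.';
                boolean doubleDot = dot && i + 1 < input.length() && input.charAt(i + 1) == '.';
                if (dot && !doubleDot) {
                    i++;
                    while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '.') {
                if (i + 1 < input.length() && input.charAt(i + 1) == '.') {
                    tokens.add("..");
                    i += 2;
                } else {
                    tokens.add(".");
                    i++;
                }
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) i++;
                tokens.add(input.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1..l_line_tbl.count")); // [1, .., l_line_tbl, ., count]
        System.out.println(tokenize("1.5"));                 // [1.5]
    }
}
```

In the grammar itself, the equivalent would be a check in the number rule that the character after the dot is not another dot, but that part is untested here.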
I am new to ANTLR and I am trying to implement if-else, for and while loops, and logical operators, but I have not been able to get them working. Can anyone help me with this? Below is what I have done.
grammar BasForCCAL;
@header {
package basforccal;
import java.util.HashMap;
import java.util.Scanner;
}
@lexer::header{
package basforccal;
}
@members{
String programName;
HashMap memory = new HashMap();
public void checkName(String endName){
if(!endName.equals(programName)){
System.out.println("Wrong Program name in end of the program");
}
}
}
program : start programbody end;
start :'PROGRAM' ID {programName = $ID.text ; System.out.println("Checking program :"+$ID.text);};
programbody
: (devcar|ID'='(expr|CHAR)| ctrlStmt)*;
devcar : initInt var1|
intFloat var1|
intChar var1 ;
initInt : 'INT'
;
intFloat
: 'FLOAT'
;
intChar: 'CHAR';
var1 : idname (',' var1)* ;
idname : ID {Integer v = (Integer)memory.get($ID.text);
if(v!=null)
{System.err.println("Error: "+$ID.text+" already defined line:"+$ID.getLine());}
else
{memory.put($ID.text,new Integer('1'));}
}
;
expr
: (multExpr |'('expr')')
( '+' multExpr
| '-' multExpr
| '/' multExpr
| '*' multExpr
)*
;
logiExpr
: expr relOpr expr;
relOpr
: '<'
| '>'
| '<>'
| '<='
| '>='
;
ctrlStmt
: 'IF''('logiExpr')' 'THEN' (stat)+ 'ENDIF'
| 'WHILE''('logiExpr')' 'DO' (stat)+ 'ENDDO'
| 'FOR' ID '=' expr 'TO' expr 'LOOP' stat+ 'ENDLOOP';
stat
: ctrlStmt|multExpr
| ID '=' (expr|CHAR);
multExpr
: ID {
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ){}
else System.err.println("undefined variable "+$ID.text);
}
| INT
| FLOAT
;
end
: 'END' ID '.' {checkName($ID.text);};
My Java code to check it.
import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import java.io.IOException;
public class AntlrParser {
public static void main(String args[]) throws IOException, RecognitionException {
basforccal.BasForCCALLexer lexer = new basforccal.BasForCCALLexer(new ANTLRFileStream(args[0]));
CommonTokenStream token = new CommonTokenStream(lexer);
basforccal.BasForCCALParser parser = new basforccal.BasForCCALParser(token);
parser.program();
}
}
Below is the program, in a file (prog1.bfcc), that I am trying to check with my Java code.
PROGRAM TESTIF
FLOAT A,B,C
A=1.0
C=1.0
IF(A>1.0)THEN
B=2.0
ENDIF
IF(B*C<=10)THEN
IF(A>0.0)THEN
C=5.0
ENDIF
ENDIF=
IF(3=4)THEN
A=1.0
B=2.0
C=3.0
ENDIF
END TESTIF.
Below is the error I get when checking it from Java.
Checking program :TESTIF
C:\Users\vivek\IdeaProjects\BasForCCal\prog1.bfcc line 16:4 mismatched input '=' expecting set null
Process finished with exit code 0
You have ENDIF= in your input, which looks suspicious. It should probably be ENDIF without the =. This is what the error message is trying to tell you.
Also, there is IF(3=4)THEN in your input, but your relOpr does not include the = operator. You should probably add it:
relOpr
: '='
| '<'
| '>'
| '<>'
| '<='
| '>='
;
When running ANTLR3 on the following code, I get the message - warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;
I think one problem is that PLUS_PLUS and MINUS_MINUS will never be matched, because they are defined after the respective PLUS and MINUS tokens. The lexer will therefore always output two PLUS tokens instead of one PLUS_PLUS token.
To avoid this, define PLUS_PLUS and MINUS_MINUS before PLUS and MINUS, since the lexer processes token definitions in the order they are defined and won't look any further once it has found a way to match the current input.
The same problem applies to QMARK_COLON, which is defined after QMARK (this is only a problem because there is another token type, COLON, to match the following colon).
See if fixing these ambiguities resolves the warning message.
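To make the ordering argument concrete, here is a small Java sketch of a strictly ordered, first-match lexer as described above. FirstMatchLexer is a hypothetical toy; whether ANTLR 3's generated lexer really treats literals from the tokens block this way is an assumption you should check against the generated code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a lexer that takes the first literal that matches, in definition order.
final class FirstMatchLexer {

    static List<String> tokenize(String input, List<String> literalsInOrder) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String matched = null;
            for (String literal : literalsInOrder) {
                if (input.startsWith(literal, pos)) {
                    matched = literal; // stop at the first hit, do not look further
                    break;
                }
            }
            if (matched == null) {
                throw new IllegalArgumentException("no token matches at position " + pos);
            }
            out.add(matched);
            pos += matched.length();
        }
        return out;
    }

    public static void main(String[] args) {
        // Shorter literal first: "++" comes out as two "+" tokens.
        System.out.println(tokenize("++", Arrays.asList("+", "++")));      // [+, +]
        // Longer literal first: "++" comes out as one token.
        System.out.println(tokenize("++", Arrays.asList("++", "+")));      // [++]
        // Same effect for "?:" when "?" and ":" are tried before "?:".
        System.out.println(tokenize("?:", Arrays.asList("?", ":", "?:"))); // [?, :]
    }
}
```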
I would like to be able to use a for loop to iterate over an array of typedef values, as demonstrated below:
typedef chanArray {
chan ch[5] = [1] of {bit};
}
chanArray comms[5];
active proctype Reliable() {
chanArray channel;
for ( channel in comms ) {
channel.ch[0] ! 0;
}
}
Spin gives the following error:
spin: test2.pml:8, Error: for ( channel in .channel_name ) { ... }
Is it possible to use a for loop in this form to loop through the array instead of having to use a for loop with an index pointer?
Try:
active proctype Reliable () {
byte index;
index = 0;
do
:: index < 5 -> channel.ch[index] ! 0; index++
:: else -> break
od
}
This is the only way. So the answer to your 'is it possible ...' question is 'no, it is not possible ...'
I'm new to Promela, but it seems that you are using
for '(' varref in channel ')' '{' sequence '}'
instead of
for '(' varref ':' expr '..' expr ')' '{' sequence '}'
Try with something like
int i;
for (i : 0..4 ) {...}
path[Scope sc] returns [Path p]
@init{
List<String> parts = new ArrayList<String>();
}
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex? )
{// ACTION CODE
// need to check if pathIndex has executed before running this code.
if ($pathIndex.index >=0 ){
p = new Path($sc, parts, $pathIndex.index);
}else if($pathIndex.pathKey != ""){
p = new Path($sc, parts, $pathIndex.pathKey);
}
}
;
Is there a way to detect whether pathIndex was executed? In my action code, I tried testing $pathIndex == null, but ANTLR doesn't let you do that. ANTLRWorks gives a syntax error saying "Missing attribute access on rule scope: pathIndex."
The reason why I need to do this is because in my action code I do:
$pathIndex.index
which returns 0 if the variable that $pathIndex is translated to is null. When you access an attribute, ANTLR generates pathIndex7!=null?pathIndex7.index:0. This causes a problem for me because it turns a value I have preset to -1 as an error flag into 0.
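For illustration, here is the same effect in plain Java, without ANTLR. PathIndexReturn is a hypothetical stand-in for the return scope ANTLR generates for pathIndex, and indexOrDefault copies the shape of the generated expression quoted above.

```java
// Plain-Java model of the generated null check; no ANTLR classes involved.
final class OptionalRuleDefault {

    // Hypothetical stand-in for the generated return scope of pathIndex.
    static final class PathIndexReturn {
        int index = -1; // the preset error flag
    }

    // Same shape as: pathIndex7!=null?pathIndex7.index:0
    static int indexOrDefault(PathIndexReturn r) {
        return r != null ? r.index : 0;
    }

    public static void main(String[] args) {
        // pathIndex matched but left the flag untouched: the -1 is visible.
        System.out.println(indexOrDefault(new PathIndexReturn())); // -1
        // pathIndex was skipped: the scope is null and the result is 0, not -1.
        System.out.println(indexOrDefault(null));                  // 0
    }
}
```

So when the optional rule is skipped, a test like index >= 0 passes even though no index was parsed.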
There are a couple of options:
1. Put your code inside the optional pathIndex:
rule
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {/*pathIndex cannot be null here!*/} )? )
;
2. Use a boolean flag to denote the presence (or absence) of pathIndex:
rule
@init{boolean flag = false;}
: ^(PATH (id=IDENT{parts.add($id.text);})+ (pathIndex {flag = true;} )? )
{
if(flag) {
// ...
}
}
;
EDIT
You could also make pathIndex match nothing so that you don't need to make it optional inside path:
path[Scope sc] returns [Path p]
: ^(PATH (id=IDENT{parts.add($id.text);})+ pathIndex)
{
// code
}
;
pathIndex returns [int index, String pathKey]
@init {
$index = -1;
$pathKey = "";
}
: ( /* some rules here */ )?
;
PS. Note that the expression $pathIndex.pathKey != "" compares object references rather than string contents, so it will most likely not behave as intended. To compare the contents of strings in Java, use their equals(...) method instead:
!$pathIndex.pathKey.equals("")
or, if $pathIndex.pathKey can be null, you can avoid an NPE by doing:
!"".equals($pathIndex.pathKey)
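A short stand-alone Java check of the difference (the class and method names are made up for this example):

```java
// Compares reference comparison against the null-safe equals form.
final class StringCompareDemo {

    // Reference comparison: true whenever key is a different object than the literal "".
    static boolean differsByReference(String key) {
        return key != "";
    }

    // Content comparison that also tolerates null without throwing.
    static boolean differsByEquals(String key) {
        return !"".equals(key);
    }

    public static void main(String[] args) {
        String empty = new String(""); // empty content, but a distinct object
        System.out.println(differsByReference(empty)); // true, although the string is empty
        System.out.println(differsByEquals(empty));    // false
        System.out.println(differsByEquals("key"));    // true
        System.out.println(differsByEquals(null));     // true, and no NullPointerException
    }
}
```

Note that the null-safe form reports null as "not empty", so check for null separately if that matters to you.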
More information would have been helpful. However, if I understand correctly, when a value for the index is not present in the input you want to test for $pathIndex.index == null. This code does that using the pathIndex rule to return the Integer $index to the path rule:
path
: ^(PATH IDENT+ pathIndex?)
{ if ($pathIndex.index == null)
System.out.println("path index is null");
else
System.out.println("path index = " + $pathIndex.index); }
;
pathIndex returns [Integer index]
: DIGIT
{ $index = Integer.parseInt($DIGIT.getText()); }
;
For testing, I created these simple parser and lexer rules:
path : 'path' IDENT+ pathIndex? -> ^(PATH IDENT+ pathIndex?)
;
pathIndex : DIGIT
;
/** lexer rules **/
DIGIT : '0'..'9' ;
IDENT : LETTER+ ;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
When the index is present in the input, as in path a b c 5, the output is:
Tree = (PATH a b c 5)
path index = 5
When the index is not present in the input, as in path a b c, the output is:
Tree = (PATH a b c)
path index is null
I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I don't understand is how you can get the 'idList' node on top of this tree (as well as the 'grammar' one, as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
@parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
 Id
  abc
 Id
  abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
 id
  abc
 id
  abc123
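If it helps to see the resulting structure without ANTLR, here is a plain-Java sketch of the tree those two rewrite rules describe. Node, id, idList and render are hypothetical helpers written for this example; they are not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-built model of a tree whose roots are imaginary nodes.
final class ImaginaryNodeDemo {

    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node kid : kids) children.add(kid);
        }
    }

    // Mirrors: id : ID -> ^(Id ID);
    static Node id(String tokenText) {
        return new Node("Id", new Node(tokenText));
    }

    // Mirrors: idList : id* -> ^(IdList id*);
    static Node idList(String... tokenTexts) {
        Node root = new Node("IdList");
        for (String text : tokenTexts) root.children.add(id(text));
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, int indent) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indent; i++) sb.append("  ");
        sb.append(node.text).append('\n');
        for (Node child : node.children) sb.append(render(child, indent + 1));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(idList("abc", "abc123"), 0));
    }
}
```

The point is that IdList and Id never appear in the input; they exist only because the rewrite rule puts them at the root of each subtree.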