I tried (honor+=NAME|honor+=DIGIT)+, after which $honor should be a list of tokens.
I then iterated over the $honor list:
for(int r = 0; r < list_honor.size(); r++)
honorstr = honorstr + list_honor.get(r).text;
input: test
output: [@752,2539:2585='test',<6>,19:11]
What is wrong?
I think the list is initialized in both alternatives:
rule
: ( honor+=NAME /* alternative 1 */
| honor+=DIGIT /* alternative 2 */
)+
;
Try something like this:
rule
: honor+=(NAME | DIGIT)+
;
or if that doesn't work, something like this:
rule
: honor+=sub_rule+
;
sub_rule
: NAME
| DIGIT
;
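The output in the question looks like a token's toString() representation rather than its text, which suggests the loop may be concatenating the token objects themselves. As a minimal sketch of that distinction, the Java below uses a hypothetical stand-in class `Tok` (not the ANTLR runtime type) and two made-up helpers, `joinRaw` and `joinText`. With the real runtime the equivalent would presumably be a cast to the token type followed by getText(), but that is an assumption about the questioner's setup.

```java
import java.util.Arrays;
import java.util.List;

public class HonorTextDemo {
    // Hypothetical stand-in for a token; NOT the ANTLR runtime class.
    static final class Tok {
        private final String text;
        private final int type;

        Tok(String text, int type) {
            this.text = text;
            this.type = type;
        }

        public String getText() {
            return text;
        }

        @Override
        public String toString() {
            return "[text='" + text + "',<" + type + ">]";
        }
    }

    // Appends the objects themselves, so toString() is used.
    static String joinRaw(List<?> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Object o : tokens) {
            sb.append(o);
        }
        return sb.toString();
    }

    // Casts each element and appends only its text.
    static String joinText(List<?> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Object o : tokens) {
            sb.append(((Tok) o).getText());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object> honor = Arrays.asList(new Tok("test", 6), new Tok("42", 7));
        System.out.println(joinRaw(honor));  // token dumps
        System.out.println(joinText(honor)); // plain text
    }
}
```

If the list really holds tokens, only the second form yields the matched characters.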
Related
When running ANTLR3 on the following code, I get the message - warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;
I think one problem is that PLUS_PLUS and MINUS_MINUS will never be matched, because they are defined after the PLUS and MINUS tokens respectively. The lexer will therefore always output two PLUS tokens instead of one PLUS_PLUS token.
To avoid this, define PLUS_PLUS and MINUS_MINUS before PLUS and MINUS: the lexer processes the tokens in the order they are defined and won't look any further once it has found a way to match the current input.
The same problem applies to QMARK_COLON, as it is defined after QMARK (this is only a problem because there is another token type, COLON, to match the following colon).
See if fixing these ambiguities resolves the warning.
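To illustrate the reasoning above, here is a toy tokenizer in Java that tries literals strictly in definition order and takes the first one that matches. It is only a sketch of a first-match strategy: the class and method names are made up for illustration, and whether the ANTLR 3 lexer really treats literals from the tokens block this way is an assumption worth checking against the generated lexer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstMatchLexer {
    // Builds an insertion-ordered map from (name, literal) pairs.
    static Map<String, String> ordered(String... nameLiteralPairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < nameLiteralPairs.length; i += 2) {
            map.put(nameLiteralPairs[i], nameLiteralPairs[i + 1]);
        }
        return map;
    }

    // Short literals defined before the longer ones, as in the question.
    static Map<String, String> shortFirst() {
        return ordered("PLUS", "+", "PLUS_PLUS", "++",
                       "QMARK", "?", "COLON", ":", "QMARK_COLON", "?:");
    }

    // Longer literals defined first, as the answer suggests.
    static Map<String, String> longFirst() {
        return ordered("PLUS_PLUS", "++", "PLUS", "+",
                       "QMARK_COLON", "?:", "QMARK", "?", "COLON", ":");
    }

    // Tries the literals in definition order and takes the first match.
    static List<String> tokenize(String input, Map<String, String> literals) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> e : literals.entrySet()) {
                if (input.startsWith(e.getValue(), pos)) {
                    out.add(e.getKey());
                    pos += e.getValue().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("No token at position " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("++", shortFirst())); // [PLUS, PLUS]
        System.out.println(tokenize("++", longFirst()));  // [PLUS_PLUS]
        System.out.println(tokenize("?:", shortFirst())); // [QMARK, COLON]
        System.out.println(tokenize("?:", longFirst()));  // [QMARK_COLON]
    }
}
```

A longest-match lexer would not depend on definition order at all, so it is worth confirming which strategy applies before reordering the tokens.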
I have modified the PL/SQL parser by Porcelli (https://github.com/porcelli/plsql-parser) and am using it to parse PL/SQL files. I am having trouble parsing FOR loop statements, e.g.
for i in 1..l_line_tbl.count
LOOP
l_line_tbl(i).schedule_ship_date := l_max_ship_date;
l_line_tbl(i).ship_set_id := x_ship_set_id;
END LOOP;
The statement above does not parse; it throws an EarlyExitException.
If I modify the statement and put a space between 1 and the double dot (..), it parses. I am not sure how to handle the first case.
for i in 1 ..l_line_tbl.count
LOOP
l_line_tbl(i).schedule_ship_date := l_max_ship_date;
l_line_tbl(i).ship_set_id := x_ship_set_id;
END LOOP;
Parser Grammar:
loop_statement
@init { int mode = 0; }
: label_name?
(while_key condition {mode = 1;} | for_key cursor_loop_param {mode = 2;})?
loop_key
seq_of_statements
end_key loop_key label_name?
-> {mode == 1}? ^(WHILE_LOOP[$while_key.start] label_name*
^(LOGIC_EXPR condition) seq_of_statements)
-> {mode == 2}? ^(FOR_LOOP[$for_key.start] label_name* cursor_loop_param seq_of_statements)
-> ^(loop_key label_name* seq_of_statements)
;
// $<Loop - Specific Clause
cursor_loop_param
@init { int mode = 0; }
: (index_name in_key reverse_key? lower_bound DOUBLE_PERIOD)=>
index_name in_key reverse_key? lower_bound DOUBLE_PERIOD upper_bound
-> ^(INDEXED_FOR index_name reverse_key? ^(SIMPLE_BOUND lower_bound upper_bound))
| record_name in_key ( cursor_name expression_list? {mode = 1;} | LEFT_PAREN
select_statement RIGHT_PAREN)
->{mode == 1}? ^(CURSOR_BASED_FOR record_name cursor_name expression_list?)
-> ^(SELECT_BASED_FOR record_name select_statement)
;
// $>
Lexer Grammar:
FOR_NOTATION
: UNSIGNED_INTEGER
{state.type = UNSIGNED_INTEGER; emit(); advanceInput();}
'..'
{state.type = DOUBLE_PERIOD; emit(); advanceInput();}
UNSIGNED_INTEGER
{state.type = UNSIGNED_INTEGER; emit(); advanceInput(); $channel=HIDDEN;}
;
fragment
UNSIGNED_INTEGER
: ('0'..'9')+
;
I would like to be able to use a for loop to loop through an array of typedef values, as demonstrated below:
typedef chanArray {
chan ch[5] = [1] of {bit};
}
chanArray comms[5];
active proctype Reliable() {
chanArray channel;
for ( channel in comms ) {
channel.ch[0] ! 0;
}
}
Spin gives the following error:
spin: test2.pml:8, Error: for ( channel in .channel_name ) { ... }
Is it possible to use a for loop in this form to loop through the array instead of having to use a for loop with an index pointer?
Try:
active proctype Reliable () {
byte index;
index = 0;
do
:: index < 5 -> channel.ch[index] ! 0; index++
:: else -> break
od
}
This is the only way. So the answer to your 'is it possible ...' question is 'no, it is not possible ...'.
I'm new to Promela, but it seems that you are using
for '(' varref in channel ')' '{' sequence '}'
instead of
for '(' varref ':' expr '..' expr ')' '{' sequence '}'
Try with something like
int i;
for (i : 0..4 ) {...}
I've read that you need to use the '^' and '!' operators in order to build a parse tree similar to the ones displayed in ANTLR Works (even though you don't need to use them to get a nice tree in ANTLR Works). My question then is how can I build such a tree? I've seen a few pages on tree construction using the two operators and rewrites, and yet say I have an input string abc abc123 and a grammar:
grammar test;
program : idList;
idList : id* ;
id : ID ;
ID : LETTER (LETTER | NUMBER)* ;
LETTER : 'a' .. 'z' | 'A' .. 'Z' ;
NUMBER : '0' .. '9' ;
ANTLR Works will output:
What I don't understand is how you can get the 'idList' node on top of this tree (as well as the grammar one, as a matter of fact). How can I reproduce this tree using rewrites and those operators?
You can't use ^ and ! alone. These operators only operate on existing tokens, while you want to create extra tokens (and make these the root of your sub trees). You can do that using rewrite rules and defining some imaginary tokens.
A quick demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
IdList;
Id;
}
@parser::members {
private static void walk(CommonTree tree, int indent) {
if(tree == null) return;
for(int i = 0; i < indent; i++, System.out.print(" "));
System.out.println(tree.getText());
for(int i = 0; i < tree.getChildCount(); i++) {
walk((CommonTree)tree.getChild(i), indent + 1);
}
}
public static void main(String[] args) throws Exception {
testLexer lexer = new testLexer(new ANTLRStringStream("abc abc123"));
testParser parser = new testParser(new CommonTokenStream(lexer));
walk((CommonTree)parser.program().getTree(), 0);
}
}
program : idList EOF -> idList;
idList : id* -> ^(IdList id*);
id : ID -> ^(Id ID);
ID : LETTER (LETTER | DIGIT)*;
SPACE : ' ' {skip();};
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
fragment DIGIT : '0' .. '9';
If you run the demo above, you will see the following being printed to the console:
IdList
Id
abc
Id
abc123
As you can see, imaginary tokens must also start with an upper case letter, just like lexer rules. If you want to give the imaginary tokens the same text as the parser rule they represent, do something like this instead:
idList : id* -> ^(IdList["idList"] id*);
id : ID -> ^(Id["id"] ID);
which will print:
idList
id
abc
id
abc123
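As a rough picture of what the two rewrite rules build, the following plain-Java sketch constructs the same tree shape by hand, without the ANTLR runtime. `Node`, `id`, `idList` and `render` are hypothetical names used only for this illustration; splitting the input on whitespace stands in for the lexer, and the two-space indentation is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

public class ImaginaryRootSketch {
    // Minimal tree node: a text label plus child nodes.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Mirrors: id : ID -> ^(Id ID);
    static Node id(String idText) {
        return new Node("Id", new Node(idText));
    }

    // Mirrors: idList : id* -> ^(IdList id*);
    static Node idList(String input) {
        Node root = new Node("IdList"); // the imaginary root, not present in the input
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                root.children.add(id(word));
            }
        }
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, int indent) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indent; i++) {
            sb.append("  ");
        }
        sb.append(node.text).append('\n');
        for (Node child : node.children) {
            sb.append(render(child, indent + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(idList("abc abc123"), 0));
    }
}
```

The point of the sketch is that the "IdList" and "Id" nodes never appear in the input: they are created by the rule itself, which is the role the imaginary tokens play in the grammar.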
The Java code generated by ANTLR usually has one method per rule. But for the following rule:
switchBlockLabels[ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts]
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel[_entity, _method, _preStmts]* switchDefaultLabel? switchCaseLabel*)
;
it generates a submethod named synpred125_TreeParserStage3_fragment(), in which the method switchCaseLabel(_entity, _method, _preStmts) is called:
synpred125_TreeParserStage3_fragment(){
......
switchCaseLabel(_entity, _method, _preStmts);//variable not found error
......
}
switchBlockLabels(ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts){
......
synpred125_TreeParserStage3_fragment();
......
}
The problem is that switchCaseLabel takes parameters, and those come from the parameters of the switchBlockLabels() method, which are not visible inside the fragment method, so a "variable not found" error occurs.
How can I solve this problem?
My guess is that you've enabled global backtracking in your grammar like this:
options {
backtrack=true;
}
in which case you can't pass parameters to ambiguous rules. In order to communicate between ambiguous rules when you have enabled global backtracking, you must use rule scopes. The "predicate-methods" do have access to rule scopes variables.
A demo
Let's say we have this ambiguous grammar:
grammar Scope;
options {
backtrack=true;
}
parse
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName
: Number
| Name
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
(for the record, the atom+ and numberOrName+ make it ambiguous)
If you now want to pass information between the parse and numberOrName rule, say an integer n, something like this will fail (which is the way you tried it):
grammar Scope;
options {
backtrack=true;
}
parse
@init{int n = 0;}
: (atom[++n])+ EOF
;
atom[int n]
: (numberOrName[n])+
;
numberOrName[int n]
: Number {System.out.println(n + " = " + $Number.text);}
| Name {System.out.println(n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
In order to do this using rule scopes, you could do it like this:
grammar Scope;
options {
backtrack=true;
}
parse
scope{int n; /* define the scoped variable */ }
@init{$parse::n = 0; /* important: initialize the variable! */ }
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName /* increment and print the scoped variable from the parse rule */
: Number {System.out.println(++$parse::n + " = " + $Number.text);}
| Name {System.out.println(++$parse::n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Test
If you now run the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "foo 42 Bar 666";
ScopeLexer lexer = new ScopeLexer(new ANTLRStringStream(src));
ScopeParser parser = new ScopeParser(new CommonTokenStream(lexer));
parser.parse();
}
}
you will see the following being printed to the console:
1 = foo
2 = 42
3 = Bar
4 = 666
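To see why a rule scope stays reachable where a rule parameter does not, it may help to picture the scope as a stack that lives in a field of the parser object. The hand-written Java below is a simplified sketch of that idea, not the code ANTLR actually generates: `ScopeSketch`, `ParseScope` and `parseStack` are illustrative names, and splitting on whitespace stands in for the lexer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ScopeSketch {
    // Holds the variables declared in the scope of the parse rule.
    static final class ParseScope {
        int n;
    }

    // A field of the parser, so every method can reach it without parameters.
    private final Deque<ParseScope> parseStack = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    // Stands in for the parse rule: push a scope, match the input, pop the scope.
    public List<String> parse(String src) {
        ParseScope scope = new ParseScope();
        scope.n = 0; // the @init action
        parseStack.push(scope);
        try {
            for (String word : src.trim().split("\\s+")) {
                numberOrName(word);
            }
        } finally {
            parseStack.pop();
        }
        return output;
    }

    // Stands in for numberOrName: no parameters, yet it can update the counter.
    private void numberOrName(String text) {
        ParseScope scope = parseStack.peek();
        output.add(++scope.n + " = " + text);
    }

    public static void main(String[] args) {
        for (String line : new ScopeSketch().parse("foo 42 Bar 666")) {
            System.out.println(line);
        }
    }
}
```

Because the stack is a field rather than a method argument, a helper method with an empty parameter list (such as a predicate fragment) can still read and update the scoped variable.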
P.S.
I don't know what language you're parsing, but enabling global backtracking is usually overkill and can have quite an impact on the performance of your parser. Computer languages often are ambiguous in just a few cases. Instead of enabling global backtracking, you really should look into adding syntactic predicates, or enabling backtracking on those rules that are ambiguous. See The Definitive ANTLR Reference for more info.