ANTLR Parser, need to know which parser rule is matched - antlr

In ANTLR, for a given token, is there a way to tell which parser rule is matched?
For example, from the ANTLR grammar:
tokens
{
ADD='Add';
SUB='Sub';
}
fragment
ANYDIGIT : '0'..'9';
fragment
UCASECHAR : 'A'..'Z';
fragment
LCASECHAR : 'a'..'z';
fragment
DATEPART : ('0'..'1') (ANYDIGIT) '/' ('0'..'3') (ANYDIGIT) '/' (ANYDIGIT) (ANYDIGIT) (ANYDIGIT) (ANYDIGIT);
fragment
TIMEPART : ('0'..'2') (ANYDIGIT) ':' ('0'..'5') (ANYDIGIT) ':' ('0'..'5') (ANYDIGIT);
SPACE : ' ';
NEWLINE : '\r'? '\n';
TAB : '\t';
FORMFEED : '\f';
WS : (SPACE|NEWLINE|TAB|FORMFEED)+ {$channel=HIDDEN;};
IDENTIFIER : (LCASECHAR|UCASECHAR|'_') (LCASECHAR|UCASECHAR|ANYDIGIT|'_')*;
TIME : '\'' (TIMEPART) '\'';
DATE : '\'' (DATEPART) (' ' (TIMEPART))? '\'';
STRING : '\''! (.)* '\''!;
DOUBLE : (ANYDIGIT)+ '.' (ANYDIGIT)+;
INT : (ANYDIGIT)+;
literal : INT|DOUBLE|STRING|DATE|TIME;
var : IDENTIFIER;
param : literal|fcn_call|var;
fcn_name : ADD |
SUB |
DIVIDE |
MOD |
DTSECONDSBETWEEN |
DTGETCURRENTDATETIME |
APPEND |
STRINGTOFLOAT;
fcn_call : fcn_name WS? '('! WS? ( param WS? ( ','! WS? param)*)* ')'!;
expr : fcn_call WS? EOF;
And in Java:
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
nodes.reset();
Object obj;
while((obj = nodes.nextElement()) != null)
{
if(nodes.isEOF(obj))
{
break;
}
System.out.println(obj);
}
So, what I want to know, at System.out.println(obj), did the node match the fcn_name rule, or did it match the var rule.
The reason being, I am trying to handle vars differently than fcn_names.

Add this to your listener/visitor:
String[] ruleNames;
public void loadParser(gramParser parser) { //get parser
ruleNames = parser.getRuleNames(); //load parser rules from parser
}
Call loadParser() from wherever you create your listener/visitor, e.g.:
MyParser parser = new MyParser(tokens);
MyListener listener = new MyListener();
listener.loadParser(parser); //so we can access rule names
Then inside each rule you can get the name of the rule like this:
String ruleName = ruleNames[ctx.getRuleIndex()];

No, you cannot get the name of a parser rule (at least, not without an ugly hack ➊).
But if tree is an instance of CommonTree, it means you've already invoked the expr rule of your parser, which means you already know expr matches first (which in its turn matches fcn_name).
➊ On a related note, see: Get active Antlr rule

Related

Why parser splits command name into different nodes

I have the statement:
=MYFUNCTION_NAME(1,2,3)
My grammar is:
grammar Expression;
options
{
language=CSharp3;
output=AST;
backtrack=true;
}
tokens
{
FUNC;
PARAMS;
}
@parser::namespace { Expression }
@lexer::namespace { Expression }
public
parse : ('=' func )*;
func : funcId '(' formalPar* ')' -> ^(FUNC funcId formalPar);
formalPar : (par ',')* par -> ^(PARAMS par+);
par : INT;
funcId : complexId+ ('_'? complexId+)*;
complexId
: ID+
| ID+DIGIT+ ;
ID : ('a'..'z'|'A'..'Z'|'а'..'я'|'А'..'Я')+;
DIGIT : ('0'..'9')+;
INT : '-'? ('0'..'9')+;
In the tree I get:
[**FUNC**]
|
[MYFUNCTION] [_] [NAME] [**PARAMS**]
Why does the parser split the function's name into 3 nodes: "MYFUNCTION", "_", "NAME"? How can I fix it?
The division is always performed based on tokens. Since the ID token cannot contain an _ character, the result is 3 separate tokens that are handled later by the funcId grammar rule. To create a single node for your function name, you'll need to create a lexer rule that can match the input MYFUNCTION_NAME as a single token.
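To make the token-level split concrete, here is a minimal sketch in plain Java (no ANTLR involved). It imitates the lexer with regular expressions; the class name TokenSplitDemo and both patterns are assumptions made for illustration, and the Cyrillic ranges of the original ID rule are left out for brevity. With a letters-only identifier pattern the input breaks into three tokens, while a pattern that allows '_' inside the name yields one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSplitDemo {
    // Imitates the original lexer: identifiers hold letters only,
    // so '_' can only be matched as a separate token.
    public static final Pattern LETTERS_ONLY =
            Pattern.compile("[A-Za-z]+|_|[0-9]+");

    // Hypothetical alternative: '_' and digits are allowed after the first letter,
    // so the whole function name is matched as one token.
    public static final Pattern WITH_UNDERSCORE =
            Pattern.compile("[A-Za-z][A-Za-z0-9_]*|[0-9]+");

    // Repeatedly matches one token at the current position until the input is consumed.
    public static List<String> tokenize(String input, Pattern tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("MYFUNCTION_NAME", LETTERS_ONLY));
        System.out.println(tokenize("MYFUNCTION_NAME", WITH_UNDERSCORE));
    }
}
```

The same idea carries over to the grammar: once a single lexer rule can match the full name, the parser receives one token and builds one node for it.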

ANTLR lexer/parser exception understanding

I have the following language I wish to parse using ANTLR 1.2.2.
TEST <name>
{
<param_name> = <param value>;
}
where <...> denotes a user-supplied value, not a language keyword.
For example:
TEST myTest
{
my_param = 1.0;
}
The value can be an integer, a real, or a quoted string:
my_param = 1.0;, my_param = 1; and my_param = "myStringValue"; are all valid inputs.
Here is the grammar for this parsing:
parse_test : TESTKEYWORD TEST_NAME '{' param_value_def '}';
param_value_def : ID EQUALS param_value ';';
param_value : REAL|INTEGER|QUOTED_STRING;
TESTKEYWORD : 'TEST';
QUOTED_STRING : '"' ~('"')* '"';
INTEGER : MINUS? DIGIT DIGIT*;
REAL : INTEGER '.' DIGIT DIGIT*;
EQUALS : '=';
fragment
MINUS : '-';
fragment
DIGIT : '0'..'9';
When I feed the sample input to the ANTLR interpreter, I get a MismatchedTokenException related to the param_value rule.
Can you help me decipher the error message and tell me what I am doing wrong?
Thanks
Although ANTLRWorks is not a well-written tool, you can use its debugger to see which token in the input leads to this exception, and from there work out which rules need to be revised (you did not post the full grammar, so it is hard to be more specific).
http://www.antlr.org/works/index.html

ANTLR Is it possible to make grammar with embed grammar inside?

ANTLR: Is it possible to make a grammar with an embedded grammar (with its own lexer) inside?
For example, in my language I have the ability to use embedded SQL:
var Query = [select * from table];
with Query do something ....;
Is it possible with ANTLR?
Is it possible to make a grammar with an embedded grammar (with its own lexer) inside?
If you mean whether it is possible to define two languages in a single grammar (using separate lexers), then the answer is: no, that's not possible.
However, if the question is whether it is possible to parse two languages into a single AST, then the answer is: yes, it is possible.
You simply need to:
define both languages in their own grammar;
create a lexer rule in your main grammar that captures the entire input of the embedded language;
use a rewrite rule that calls a custom method that parses the external AST and inserts it in the main AST using { ... } (see the expr rule in the main grammar (MyLanguage.g)).
MyLanguage.g
grammar MyLanguage;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
}
@members {
private CommonTree parseSQL(String sqlSrc) {
try {
MiniSQLLexer lexer = new MiniSQLLexer(new ANTLRStringStream(sqlSrc));
MiniSQLParser parser = new MiniSQLParser(new CommonTokenStream(lexer));
return (CommonTree)parser.parse().getTree();
} catch(Exception e) {
return new CommonTree(new CommonToken(-1, e.getMessage()));
}
}
}
parse
: assignment+ EOF -> ^(ROOT assignment+)
;
assignment
: Var Id '=' expr ';' -> ^('=' Id expr)
;
expr
: Num
| SQL -> {parseSQL($SQL.text)}
;
Var : 'var';
Id : ('a'..'z' | 'A'..'Z')+;
Num : '0'..'9'+;
SQL : '[' ~']'* ']';
Space : ' ' {skip();};
MiniSQL.g
grammar MiniSQL;
options {
output=AST;
ASTLabelType=CommonTree;
}
parse
: '[' statement ']' EOF -> statement
;
statement
: select
;
select
: Select '*' From ID -> ^(Select '*' From ID)
;
Select : 'select';
From : 'from';
ID : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "var Query = [select * from table]; var x = 42;";
MyLanguageLexer lexer = new MyLanguageLexer(new ANTLRStringStream(src));
MyLanguageParser parser = new MyLanguageParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
Run the demo
java -cp antlr-3.3.jar org.antlr.Tool MiniSQL.g
java -cp antlr-3.3.jar org.antlr.Tool MyLanguage.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Given the input:
var Query = [select * from table]; var x = 42;
the output of the Main class corresponds to the following AST:
And if you want to allow string literals inside your SQL (which could contain ]), and comments (which could contain ' and ]), then you could use the following SQL rule inside your main grammar:
SQL
: '[' ( ~(']' | '\'' | '-')
| '-' ~'-'
| COMMENT
| STR
)*
']'
;
fragment STR
: '\'' (~('\'' | '\r' | '\n') | '\'\'')+ '\''
| '\'\''
;
fragment COMMENT
: '--' ~('\r' | '\n')*
;
which would properly parse the following input in a single token:
[
select a,b,c
from table
where a='A''B]C'
and b='' -- some ] comment ] here'
]
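To illustrate what that lexer rule has to do, here is a rough hand-written equivalent in plain Java. It is a sketch only, not generated ANTLR code: the class SqlBlockScanner and its method names are hypothetical, and it approximates the rule rather than reproducing it exactly. It looks for the closing ] while skipping quoted strings (with '' as an escaped quote) and -- line comments.

```java
public class SqlBlockScanner {
    // Returns the index of the ']' that closes the block starting at src[start],
    // or -1 if src[start] is not '[' or the block is never closed.
    public static int findBlockEnd(String src, int start) {
        if (start >= src.length() || src.charAt(start) != '[') {
            return -1;
        }
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ']') {
                return i;                       // closing bracket outside strings/comments
            } else if (c == '\'') {
                i = skipString(src, i);         // jump past the quoted string
                if (i < 0) {
                    return -1;                  // unterminated string
                }
            } else if (c == '-' && i + 1 < src.length() && src.charAt(i + 1) == '-') {
                // line comment: skip everything up to the end of the line
                while (i < src.length() && src.charAt(i) != '\r' && src.charAt(i) != '\n') {
                    i++;
                }
            } else {
                i++;
            }
        }
        return -1;
    }

    // src[open] is the opening quote; returns the index just after the closing quote,
    // or -1 if the string is cut off by a line break or the end of input.
    private static int skipString(String src, int open) {
        int i = open + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\'') {
                if (i + 1 < src.length() && src.charAt(i + 1) == '\'') {
                    i += 2;                     // '' is an escaped quote inside the string
                    continue;
                }
                return i + 1;
            }
            if (c == '\r' || c == '\n') {
                return -1;
            }
            i++;
        }
        return -1;
    }
}
```

Called on the input above with start 0, the scan ignores the ] inside 'A''B]C' and the two ] characters in the comment, and stops at the final bracket.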
Just beware that trying to create a grammar for an entire SQL dialect (or even a large subset) is no trivial task! You may want to search for existing SQL parsers, or look at the ANTLR wiki for example-grammars.
Yes, in ANTLR this is called an island grammar.
You can find a working example in the v3 examples, inside the island-grammar folder: it shows how to use a grammar to parse Javadoc comments inside Java code.
You can also find some clues in the docs: Island Grammars Under Parser Control, and another one.

variable not passed to predicate method in ANTLR

The Java code generated by ANTLR usually maps one rule to one method. But for the following rule:
switchBlockLabels[ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts]
: ^(SWITCH_BLOCK_LABEL_LIST switchCaseLabel[_entity, _method, _preStmts]* switchDefaultLabel? switchCaseLabel*)
;
it generates a submethod named synpred125_TreeParserStage3_fragment(), in which the method switchCaseLabel(_entity, _method, _preStmts) is called:
synpred125_TreeParserStage3_fragment(){
......
switchCaseLabel(_entity, _method, _preStmts);//variable not found error
......
}
switchBlockLabels(ITdcsEntity _entity,TdcsMethod _method,List<IStmt> _preStmts){
......
synpred125_TreeParserStage3_fragment();
......
}
The problem is switchCaseLabel has parameters and the parameters come from the parameters of switchBlockLabels() method, so "variable not found error" occurs.
How can I solve this problem?
My guess is that you've enabled global backtracking in your grammar like this:
options {
backtrack=true;
}
in which case you can't pass parameters to ambiguous rules. In order to communicate between ambiguous rules when you have enabled global backtracking, you must use rule scopes. The "predicate-methods" do have access to rule scopes variables.
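The reason scopes work where parameters fail can be sketched in plain Java. The code below is not ANTLR output; it only mirrors the general shape, under the assumption that a rule scope is kept in a stack held in an instance field. All names (ScopeSketch, ParseScope, looksLikeNumberOrName) are hypothetical. A parameter is local to one method, so a separately generated helper method cannot see it, whereas a field is reachable from every method of the parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ScopeSketch {
    // Stand-in for the data declared in a rule scope.
    static final class ParseScope {
        int n;
    }

    // Instance field: visible from every method, including helper methods
    // that receive no parameters at all.
    private final Deque<ParseScope> parseScopes = new ArrayDeque<>();

    // Collected output, kept in a list so it is easy to inspect.
    public final List<String> log = new ArrayList<>();

    public void parse(String... tokens) {
        ParseScope scope = new ParseScope();
        scope.n = 0;                      // initialise the scoped variable
        parseScopes.push(scope);          // entering the rule pushes its scope
        try {
            for (String t : tokens) {
                if (looksLikeNumberOrName(t)) {
                    numberOrName(t);
                }
            }
        } finally {
            parseScopes.pop();            // leaving the rule pops it again
        }
    }

    // Plays the role of a predicate method: it takes no state from parse()
    // as arguments, yet it can still reach the current scope through the field.
    private boolean looksLikeNumberOrName(String t) {
        return parseScopes.peek() != null && t.matches("[0-9]+|[A-Za-z]+");
    }

    // Increments and records the scoped counter.
    private void numberOrName(String t) {
        ParseScope scope = parseScopes.peek();
        log.add(++scope.n + " = " + t);
    }

    public static void main(String[] args) {
        ScopeSketch s = new ScopeSketch();
        s.parse("foo", "42", "Bar", "666");
        s.log.forEach(System.out::println);
    }
}
```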
A demo
Let's say we have this ambiguous grammar:
grammar Scope;
options {
backtrack=true;
}
parse
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName
: Number
| Name
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
(for the record, the atom+ and numberOrName+ make it ambiguous)
If you now want to pass information between the parse and numberOrName rule, say an integer n, something like this will fail (which is the way you tried it):
grammar Scope;
options {
backtrack=true;
}
parse
@init{int n = 0;}
: (atom[++n])+ EOF
;
atom[int n]
: (numberOrName[n])+
;
numberOrName[int n]
: Number {System.out.println(n + " = " + $Number.text);}
| Name {System.out.println(n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
In order to do this using rule scopes, you could do it like this:
grammar Scope;
options {
backtrack=true;
}
parse
scope{int n; /* define the scoped variable */ }
@init{$parse::n = 0; /* important: initialize the variable! */ }
: atom+ EOF
;
atom
: numberOrName+
;
numberOrName /* increment and print the scoped variable from the parse rule */
: Number {System.out.println(++$parse::n + " = " + $Number.text);}
| Name {System.out.println(++$parse::n + " = " + $Name.text);}
;
Number : '0'..'9'+;
Name : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
Test
If you now run the following class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "foo 42 Bar 666";
ScopeLexer lexer = new ScopeLexer(new ANTLRStringStream(src));
ScopeParser parser = new ScopeParser(new CommonTokenStream(lexer));
parser.parse();
}
}
you will see the following being printed to the console:
1 = foo
2 = 42
3 = Bar
4 = 666
P.S.
I don't know what language you're parsing, but enabling global backtracking is usually overkill and can have quite an impact on the performance of your parser. Computer languages often are ambiguous in just a few cases. Instead of enabling global backtracking, you really should look into adding syntactic predicates, or enabling backtracking on those rules that are ambiguous. See The Definitive ANTLR Reference for more info.

How to pass CommonTree parameter to an Antlr rule

I am trying to do what I think is a simple parameter passing to a rule in Antlr 3.3:
grammar rule_params;
options
{
output = AST;
}
rule_params
: outer;
outer: outer_id '[' inner[$outer_id.tree] ']';
inner[CommonTree parent] : inner_id '[' ']';
outer_id : '#'! ID;
inner_id : '$'! ID ;
ID : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' )* ;
So the inner[CommonTree parent] generates the following:
inner4=inner((outer_id2!=null?((Object)outer_id2.tree):null));
resulting in this error:
The method inner(CommonTree) in the type rule_paramsParser is not applicable for the arguments (Object)
As best I can tell, this is exactly the same as the example in the ANTLR book:
classDefinition[CommonTree mod]
(Kindle Location 3993) - sorry I don't know the page number but it is in the middle of the book in chapter 9, section labeled "Creating Nodes with Arbitrary Actions".
Thanks for any help.
M
If you don't explicitly specify the tree to be used in your grammar, .tree (which is short for getTree()) will return a java.lang.Object and a CommonTree will be used as default Tree implementation. To avoid casting, set the type of tree in your options { ... } section:
options
{
output=AST;
ASTLabelType=CommonTree;
}