This is the part of my grammar that causes the error:
expr : func_name '(' constant (',' constant)* ')' ;
constant
: '"' (~'"')* '"';
WS : (' '|'\t')+ {skip();} ;
And the error is about this part of text:
"w9ygS99Qp_", "vuPfq6YcbX"
The ANTLRWorks interpreter gives me the following leaves, which have a constant node as their parent:
"
w9ygS99Qp_",
"
Then a NoViableAltException is raised.
Normally, it should have these leaves:
"
w9ygS99Qp_
"
Apparently, the problem is the _ before the ": I tried removing that _, but the same error appears when the parser meets the next _"
Your constant should be a lexer rule, not a parser rule. Inside a parser rule, ~'"' matches any token other than a double-quote token. It does not match any character other than the double-quote character.
Do it like this instead:
expr : func_name '(' Constant (',' Constant)* ')' ;
Constant
: '"' (~'"')* '"';
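The difference between matching tokens and matching characters can be illustrated without ANTLR. The following is a minimal plain-Java sketch (the class and method names are hypothetical, and this is not ANTLR-generated code) of what the lexer rule Constant does: it scans characters, so the _ and the , inside the quotes are ordinary characters and the whole quoted string becomes one unit.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedConstantSketch {
    // Hand-written stand-in for the lexer rule  Constant : '"' (~'"')* '"' ;
    // It works on characters, so '_' inside the quotes is just another character.
    static List<String> constants(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '"') {
                // (~'"')* : consume everything up to the next double quote
                int close = input.indexOf('"', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated constant at index " + i);
                }
                result.add(input.substring(i, close + 1));
                i = close + 1;
            } else {
                i++; // skip anything outside a constant (commas, whitespace)
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String c : constants("\"w9ygS99Qp_\", \"vuPfq6YcbX\"")) {
            System.out.println(c);
        }
    }
}
```

Each quoted string comes back as a single element, which corresponds to one Constant token per string in the corrected grammar.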
OK, so I have to generate the Java code from the grammar and then test it in Java, if I understood correctly?
Yes, or use ANTLRWorks' debugger instead. The debugger works like a charm.
To test in plain Java, do something like this:
import org.antlr.runtime.*;
public class Main {
    public static void main(String[] args) throws Exception {
        TLexer lexer = new TLexer(new ANTLRStringStream("name(\"w9ygS99Qp_\", \"vuPfq6YcbX\")"));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        parser.expr();
    }
}
Maybe I should use ANTLR4, to be able to use ANTLRWorks correctly?
If you have a choice to use either ANTLR3 or ANTLR4, then go for ANTLR4. Note that there's a new (rewritten) version of ANTLRWorks for ANTLR4 grammars: http://tunnelvisionlabs.com/products/demo/antlrworks
Related
I am trying to match a very basic ANTLR grammar, but ANTLR keeps telling me that it got the input '.' and expects '.'.
The full error is:
line 1:0 extraneous input '.' expecting '.'
line 1:2 missing '*' at '<EOF>'
With the grammar:
grammar regex;
@parser::header
{
package antlr;
}
@lexer::header
{
package antlr;
}
WHITESPACE : (' ' | '\t' | '\n' | '\r') -> channel(HIDDEN);
COMP : '.';
KLEENE : '*';
start : COMP KLEENE;
And input:
.*
Both files have the same charset:
regex.g: text/plain; charset=us-ascii
test.grammar: text/plain; charset=us-ascii
There should be no Lexer rule mix up. Why does this not work as expected?
Given your example grammar and this test class:
import org.antlr.v4.runtime.*;
public class Main {
    public static void main(String[] args) {
        String source = ".*";
        regexLexer lexer = new regexLexer(CharStreams.fromString(source));
        regexParser parser = new regexParser(new CommonTokenStream(lexer));
        System.out.println(parser.start().toStringTree(parser));
    }
}
the following is printed to my console:
(start . *)
My guess is that you have either simplified the grammar so much that the error in your original grammar disappeared, or you haven't regenerated the lexer/parser classes.
I'm trying to create a lexer with multiple modes using Antlr 4.7. My lexer currently is:
ACTIONONLY : 'AO';
BELIEFS : ':Initial Beliefs:' -> mode(INITIAL_BELIEFS);
NAME : ':name:';
WORD: ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+;
COMMENT : '/*' .*? '*/' -> skip ;
LINE_COMMENT : '//' ~[\n]* -> skip ;
NEWLINE:'\r'? '\n' -> skip ;
WS : (' '|'\t') -> skip ;
mode INITIAL_BELIEFS;
GOAL_IB : ':Initial Goal:' -> mode(GOALS);
IB_COMMENT : '/*' .*? '*/' -> skip ;
IB_LINE_COMMENT : '//' ~[\n]* -> skip ;
IB_NEWLINE:'\r'? '\n' -> skip ;
IB_WS : (' '|'\t') -> skip ;
BELIEF_BLOCK: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+;
mode REASONING_RULES;
R1: 'a';
R2: 'b';
mode GOALS;
GL_COMMENT : '/*' .*? '*/' -> skip ;
GL_LINE_COMMENT : '//' ~[\n]* -> skip ;
GL_NEWLINE:'\r'? '\n' -> skip ;
GL_WS : (' '|'\t') -> skip ;
GOAL_BLOCK: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+;
Note that there is no way, at present, to get into the REASONING_RULES mode (so, as I understand it, this should not have any effect on the operation of the lexer). Obviously I do want to use this mode, but this is the minimal version of the lexer that seems to display the problem I'm having.
My parser is:
grammar ActionOnly;
options { tokenVocab = ActionOnlyLexer; }
// Mas involving ActionOnly Agents
mas : aoagents;
aoagents: ACTIONONLY (aoagent)+;
// Agent stuff
aoagent :
(ACTIONONLY?)
NAME w=WORD
BELIEFS (bs=BELIEF_BLOCK )?
GOAL_IB gs=GOAL_BLOCK;
and I'm trying to parse:
AO
:name: robot
:Initial Beliefs:
abelief
:Initial Goal:
at(4, 2)
This fails with the error
line 35:0 mismatched input 'at(4,' expecting GOAL_BLOCK
which I'm assuming is because it isn't tokenising correctly.
If I omit rule R2 in the REASONING_RULES mode then it parses correctly (in general I seem to be able to have one rule in REASONING_RULES and it will work, but more than one rule and it fails to match GOAL_BLOCK)
I'm really struggling to see what I'm doing wrong here, but this is the first time I've tried to use lexer modes with Antlr.
I don't get that error when I try your grammars. I also tested with ANTLR 4.7.
Here's my test rig:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.Token;
public class Main {
    public static void main(String[] args) {
        String source = "AO\n" +
                "\n" +
                ":name: robot\n" +
                "\n" +
                ":Initial Beliefs:\n" +
                "\n" +
                "abelief\n" +
                "\n" +
                ":Initial Goal:\n" +
                "\n" +
                "at(4, 2)";
        ActionOnlyLexer lexer = new ActionOnlyLexer(CharStreams.fromString(source));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        System.out.println("[TOKENS]");
        for (Token t : tokens.getTokens()) {
            System.out.printf(" %-20s %s\n", ActionOnlyLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
        }
        System.out.println("\n[PARSE-TREE]");
        ActionOnlyParser parser = new ActionOnlyParser(tokens);
        ParserRuleContext context = parser.mas();
        System.out.println(" "+context.toStringTree(parser));
    }
}
And this is printed to my console:
[TOKENS]
ACTIONONLY AO
NAME :name:
WORD robot
BELIEFS :Initial Beliefs:
BELIEF_BLOCK abelief
GOAL_IB :Initial Goal:
GOAL_BLOCK at(4,
GOAL_BLOCK 2)
EOF <EOF>
[PARSE-TREE]
(mas (aoagents AO (aoagent :name: robot :Initial Beliefs: abelief :Initial Goal: at(4,)))
Perhaps you need to generate new lexer/parser classes?
PS. note that ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+ can be written as [a-zA-Z0-9_(),.]+
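The token dump above shows at(4, 2) split into two GOAL_BLOCK tokens, and the parse tree contains only the first one. A likely reason is that the character set does not contain a space, so the skipped whitespace ends the first token. The sketch below uses a Java regex with the same character set as a stand-in for the lexer rule; it is only an illustration, not ANTLR itself, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GoalBlockSketch {
    // Same character set as GOAL_BLOCK; note that it does not include a space.
    static final Pattern GOAL_BLOCK = Pattern.compile("[a-zA-Z0-9_(),.]+");

    // Collects every maximal run of GOAL_BLOCK characters in the input.
    static List<String> blocks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = GOAL_BLOCK.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // One block per line, to make the split visible.
        for (String block : blocks("at(4, 2)")) {
            System.out.println(block);
        }
    }
}
```

If a goal is meant to be a single token, the space would have to be part of the rule, or the input written without it.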
Given this g4 grammar:
grammar smaller;
root
: ( componentDefinition )* EOF;
componentDefinition
: Addr
Id?
Lbrace
Rbrace
Semi
;
ExprElem
: Num
| Id
;
Addr : 'addr' {System.out.println("addr");};
Lbrace : '{' ;
Rbrace : '}' ;
Semi : ';' ;
Id : [a-zA-z0-9_]+ {System.out.println("id");};
Num : [0-9]+;
//------------------------------------------------
// Whitespace and Comments
//------------------------------------------------
Wspace : [ \t]+ -> skip;
Newline : ('\r' '\n'?
| '\n'
) -> skip;
and this file to parse
addr basic {
};
this cmdline:
rm *.class *.java ; java -Xmx500M org.antlr.v4.Tool smaller.g4 ; javac *.java ; cat basic | java org.antlr.v4.runtime.misc.TestRig smaller root -tree
I get this error:
line 2:0 mismatched input 'addr' expecting {<EOF>, 'addr'}
(root addr basic { } ;)
If I remove the ExprElem (which is not used anywhere else in the grammar), the parser works:
addr
id
(root (componentDefinition addr basic { } ;) <EOF>)
Why? Note that this is a greatly reduced version of the grammar. Normally, the ExprElem does have a purpose.
Addr is a literal, so it shouldn't conflict with Id in the way that other questions like this usually do.
Your rule ExprElem is a lexer rule, not a parser rule (its name begins with an uppercase letter). Because it is defined before Addr and can match the same input, it masks the Addr rule, so the lexer never produces an Addr token :(
Also, because ExprElem is a lexer rule that relies on the Id and Num rules, when an Id is found the ANTLR lexer gives it the ExprElem token type and not the Id token type.
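The masking follows from how the ANTLR 4 lexer picks a rule: the longest match wins, and when several rules match the same length, the rule defined first wins. Here is a rough plain-Java sketch of that selection, using regexes as stand-ins for the lexer rules. The class and method names are hypothetical, the Id pattern uses A-Z, and ExprElem is written as the Id pattern alone because Id already covers everything Num matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RulePrioritySketch {
    // Returns the name of the rule that wins at the start of the input:
    // the longest match wins, and on a tie the rule defined first wins.
    static String winningRule(Map<String, String> rules, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strict '>' keeps the earlier rule when two matches have equal length.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    // Rules in definition order, with or without the ExprElem lexer rule.
    static Map<String, String> rules(boolean includeExprElem) {
        Map<String, String> rules = new LinkedHashMap<>();
        if (includeExprElem) {
            rules.put("ExprElem", "[a-zA-Z0-9_]+"); // Num | Id
        }
        rules.put("Addr", "addr");
        rules.put("Id", "[a-zA-Z0-9_]+");
        rules.put("Num", "[0-9]+");
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(rules(true), "addr"));
        System.out.println(winningRule(rules(false), "addr"));
        System.out.println(winningRule(rules(false), "basic"));
    }
}
```

With ExprElem present, every identifier-like input, including addr, is assigned to ExprElem; without it, addr goes to Addr and basic goes to Id.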
So you have two options. You can either rename your ExprElem rule to exprElem (assuming you want a parser rule):
exprElem : Num | Id;
or you can keep using the Id token inside ExprElem, but then you need something that differentiates ExprElem from Id (example below, but I really think you want a parser rule):
Addr : 'addr' {System.out.println("addr");};
ExprElem
: Sharp Num // This token use others but defines its own 'pattern'
| Sharp Id
;
Lbrace : '{' ;
Rbrace : '}' ;
Semi : ';' ;
Id : [a-zA-Z0-9_]+ {System.out.println("id");};
Num : [0-9]+;
Sharp : '#';
I suppose this is not what you want; I only put it here to illustrate how a lexer rule can reuse other lexer rules.
When you are in doubt about what your tokens do, do not hesitate to display the recognized tokens. Here is the Java code fragment I often use (I named your grammar Test in this case):
import org.antlr.v4.runtime.*;
public class Main {
    public static void main(String[] args) throws InterruptedException {
        String txt =
            "addr Basic {\n"
            + "\n"
            + "};";
        TestLexer lexer = new TestLexer(new ANTLRInputStream(txt));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        // Parsing pulls all tokens from the lexer into the token stream.
        parser.root();
        // Print every token, including its type, so mis-typed tokens are visible.
        for (Token t : tokens.getTokens()) {
            System.out.println(t);
        }
    }
}
NOTE: by the way, Num will never be recognized, because the Id rule is defined first and can match the same input. Try this instead:
Id : Letter (Letter | [0-9])*;
Num : [0-9]+;
fragment Letter : [a-zA-Z_];
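The overlap between Id and Num can be checked quickly with Java regexes standing in for the lexer rules. This is only a sketch with hypothetical names. It also shows why a range written as A-z, as in the grammar in the question, is risky: the range is by code point and covers the punctuation between Z and a, such as [.

```java
public class IdVersusNumSketch {
    static final String LOOSE_ID = "[a-zA-Z0-9_]+";           // Id as in the question (with A-Z)
    static final String NUM = "[0-9]+";
    static final String STRICT_ID = "[a-zA-Z_][a-zA-Z_0-9]*"; // Letter (Letter | [0-9])*

    // True when the text could be either an Id or a Num, so rule order decides.
    static boolean ambiguous(String idPattern, String text) {
        return text.matches(idPattern) && text.matches(NUM);
    }

    // True when the single-character range A-z accepts the given text.
    static boolean inRangeAToLowerZ(String text) {
        return text.matches("[A-z]");
    }

    public static void main(String[] args) {
        System.out.println(ambiguous(LOOSE_ID, "42"));   // true: Id would swallow numbers
        System.out.println(ambiguous(STRICT_ID, "42"));  // false: Id must start with a letter
        System.out.println(inRangeAToLowerZ("["));       // true: '[' lies between 'Z' and 'a'
    }
}
```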
I have a simple grammar to parse files containing identifiers and keywords between brackets (hopefully):
grammar Keyword;
// PARSER RULES
//
entry_point : ('['ID']')*;
// LEXER RULES
//
KEYWORD : '[Keyword]';
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
WS : ( ' ' | '\t' | '\r' | '\n' | '\r\n')
{
$channel = HIDDEN;
};
It works for input:
[Hi]
[Hi]
It returns a NoViableAltException error for input:
[Hi]
[Ki]
If I comment KEYWORD, then it works fine. Also, if I change my grammar to:
grammar Keyword;
// PARSER RULES
//
entry_point : ID*;
// LEXER RULES
//
KEYWORD : '[Keyword]';
ID : '[' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ']';
WS : ( ' ' | '\t' | '\r' | '\n' | '\r\n')
{
$channel = HIDDEN;
};
Then it works. Could you please help me figure out why?
Best regards.
The first grammar fails because whenever the lexer sees "[K", it enters the KEYWORD rule. If it then encounters something other than "eyword]" ("i" in your case), it tries to fall back on some other rule that can match "[K". But there is no other lexer rule that starts with "[K", so the lexer throws an exception. Note that the lexer does not give back the "K" and then try to match again (the lexer is a dumb machine)!
Your second grammar works because the lexer now has something to fall back on when "[Ki" is not matched by KEYWORD: ID now includes the "[".
Some keywords (string constants) in my grammar contain capital letters, e.g.
PREV_VALUE : 'PreviousValue';
This causes strange parsing behavior: other tokens that contain the same capital letters ('P', 'V') are parsed incorrectly.
Here's a simplified version of the lexer grammar:
lexer grammar ExpressionLexer;
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;
When I try to parse an expression like this:
var expression = "P"; // Capital 'P', which is also contained in the keyword 'PreviousValue'
var stringReader = new StringReader(expression);
var input = new ANTLRReaderStream(stringReader);
var expressionLexer = new ExpressionLexer(input);
var tokens = new CommonTokenStream(expressionLexer);
tokens._tokens collection contains one value
[0] = {[#0,1:1='<EOF>',<-1>,1:1]}
It's incorrect.
If I change expression to 'p' (lowercase letter)
tokens._tokens collection contains two values
[0] = {[#0,0:0='p',<0>,1:0]}
[1] = {[#1,1:1='<EOF>',<-1>,1:1]}
It's correct.
When the line PREV_VALUE : 'PreviousValue'; is removed from the grammar, both expressions are parsed correctly.
Is it possible to use different case in keywords?
Is there any example of using such keywords in ANTLR grammar?
I find it hard to believe a p token is created based on the grammar you posted. Lexer rules that have fragment in front of them will not produce tokens: these rules are only used by other lexer rules.
A simple demo shows this:
lexer grammar ExpressionLexer;
@lexer::members {
    public static void main(String[] args) throws Exception {
        ExpressionLexer lexer = new ExpressionLexer(new ANTLRStringStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // remove this line when using ANTLR 3.2 or an older version
        System.out.println(tokens);
    }
}
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;
Now generate the lexer and compile the .java source file:
java -cp antlr-3.3.jar org.antlr.Tool ExpressionLexer.g
javac -cp antlr-3.3.jar *.java
and run a few tests:
java -cp .:antlr-3.3.jar ExpressionLexer p
line 1:0 no viable alternative at character 'p'
which is correct since there is no (non-fragment) rule that starts with, or matches, a "p".
java -cp .:antlr-3.3.jar ExpressionLexer P
line 1:1 mismatched character '<EOF>' expecting 'r'
which is correct since the only (non-fragment) rule that starts with a "P" expects an "r" to be the next character (which isn't there).
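The two failures happen at different positions because the only candidate rule is a fixed literal. The toy sketch below is hand-written plain Java, not how ANTLR generates its lexer; the names and message texts are hypothetical. It matches the single keyword character by character and reports where matching stops.

```java
public class LiteralMatchSketch {
    static final String KEYWORD = "PreviousValue"; // the only non-fragment rule with letters

    // Describes what happens when the input is matched against the keyword literal.
    static String lex(String input) {
        // No rule can even start unless the first character is 'P'.
        if (input.isEmpty() || input.charAt(0) != KEYWORD.charAt(0)) {
            return "no rule starts here";
        }
        // Once inside the rule, every following character is fixed by the literal.
        for (int i = 1; i < KEYWORD.length(); i++) {
            char expected = KEYWORD.charAt(i);
            if (i >= input.length()) {
                return "at " + i + ": end of input, expecting '" + expected + "'";
            }
            if (input.charAt(i) != expected) {
                return "at " + i + ": found '" + input.charAt(i) + "', expecting '" + expected + "'";
            }
        }
        return "PREV_VALUE";
    }

    public static void main(String[] args) {
        System.out.println(lex("p"));
        System.out.println(lex("P"));
        System.out.println(lex("PreviousValue"));
    }
}
```

For single letters such as p or P to become tokens, the grammar would need a non-fragment rule that matches them, for example an identifier rule built from the LETTER fragment.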