I found the answer to "How to get ANTLR 3.2 to exit upon first error?", which is helpful.
However, I can't seem to add these '@' rules without my grammar freaking out. My grammar file is simple:
grammar Exp;
options {
output=AST;
}
program
: includes decls (procedure)* main -> ^(SMALLCPROGRAM includes decls (procedure)* main) //AST - PROGRAM root
;
//Lexer and Parser rules continue below as normal..tested thoroughly and works
But if I try to add any of these @ rules, I get errors such as:
grammar file Exp.g has no rules
and:
Exp.g:0:1: syntax error: assign.types: org.antlr.runtime.EarlyExitException
org\antlr\grammar\v3\DefineGrammarItemsWalker.g: node from line 202:4 required (...)+ loop did not match anything at input ';'
Anyone have an idea what the problem is? I simply want to change my grammar so that when I run it from my separate main class (passing input into it using ANTLRStringStream etc) it actually throws an error in the main class when there is a syntactic problem rather than just saying something like:
line 1:57 missing RPAREN at '{'
Before continuing to parse the rest of the input fine. Ultimately, my main class should refuse to parse any syntactically malformed input as defined by my grammar, and should report the errors to the user.
You probably have the sections/blocks in the wrong order. Make sure it looks like this:
grammar Exp;
options {
...
}
tokens {
...
}
@parser::header {
...
}
@lexer::header {
...
}
@parser::members {
...
}
@lexer::members {
...
}
I'm guessing you placed a @members or @header section before the tokens { ... } block. tokens { ... } should come directly after options { ... }.
I also remember that certain 3.x versions had an issue with empty sections: make sure every section contains at least something, or omit the empty section altogether.
I am trying to learn kframework, and as an exercise I wanted to attempt to create a high-level language which compiles down to a scripting language for a video game. This high level language does no real execution, just compiles down to the scripting language with rewrite rules.
Example of the original scripting language syntax below
variables {
0: 'message'
}
init {
SetVariable("message", "Test message");
}
rule("press button") {
conditions {
IsButtonPressed(EventPlayer, INTERACT_KEY);
}
actions {
SendMessage(EventPlayer, GetVariable("message"))
}
}
I wanted my high-level language to allow proper variable declarations, so I could write something like this instead, and it would compile down to the script above.
init {
var message = "Test message";
}
rule("press button") {
conditions {
IsButtonPressed(EventPlayer, INTERACT_KEY);
}
actions {
SendMessage(EventPlayer, message)
}
}
I know how to make a simple rewrite rule to replace variable declarations var x = y with SetVariable("x", y), but how could I also append to the variable declaration block at the top?
I could very well be misunderstanding the capabilities of K, or how I am supposed to be going about doing this. Any help would be appreciated.
Typically the way you translate one input program into another output program in K is to have an output cell containing the output program as you construct it and to have a sequence of rules that iteratively removes statements and declarations from the input cell and adds them to the output cell in whatever modified form you are expecting. If you have a situation like this where you want to insert something out of order, the way it is typically done is to have a second cell containing a portion of the output program, and the rule that processes the variable declaration modifies two output cells. And then some rule will match later on and combine the outputs together. In this case, that rule would probably apply when the input program has been exhausted.
Here is roughly what that will look like in K:
rule <k> var X:Id = E:Expr => . ... </k>
<output> init { D:Declarations => append(D, SetVariable(Id2String(X), E)) } </output>
<variables> variables { D:Declarations => append(D, !Y:Int : Id2String(X)) } </variables>
rule <k> . </k>
<output> P:Program => append(P2, P) </output>
<variables> P2:Program => . </variables>
Note that you would have to write the list append functions yourself. If you really care about performance, you should probably use either the List sort or else append to the front of the cons list and then reverse it afterwards, but I simplified for the purposes of explanation.
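The two-cell idea can also be sketched outside of K. The following is a minimal Java illustration, not K code: the class and method names are hypothetical, the two lists stand in for the <variables> and <output> cells, and the slot index is simply taken from the list size, which is an assumption standing in for the fresh !Y:Int above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two Java lists mirror the <variables> and <output> cells.
class TwoCellTranslator {
    private final List<String> variables = new ArrayList<>(); // role of <variables>
    private final List<String> init = new ArrayList<>();      // role of <output>

    // Analogue of the first rule: one declaration updates both cells.
    void declare(String name, String expr) {
        // Assumption: the slot number is the next free index in the list.
        variables.add(variables.size() + ": '" + name + "'");
        init.add("SetVariable(\"" + name + "\", " + expr + ");");
    }

    // Analogue of the second rule: input is exhausted, so merge the cells.
    String finish() {
        StringBuilder sb = new StringBuilder("variables {\n");
        for (String v : variables) sb.append("  ").append(v).append('\n');
        sb.append("}\ninit {\n");
        for (String s : init) sb.append("  ").append(s).append('\n');
        return sb.append("}\n").toString();
    }
}
```

Calling declare("message", "\"Test message\"") and then finish() yields a variables block followed by an init block, which is the out-of-order insertion the question asks about.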
We know Antlr4 is using the sync-and-return recovery mechanism. For example, I have the following simple grammar:
grammar Hello;
r : prefix body ;
prefix: 'hello' ':';
body: INT ID ;
INT: [0-9]+ ;
ID : [a-z]+ ;
WS : [ \t\r\n]+ -> skip ;
I use the following listener to grab the input:
public class HelloLoader extends HelloBaseListener {
String input;
public void exitR(HelloParser.RContext ctx) {
input = ctx.getText();
}
}
The main method in my HelloRunner looks like this:
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromStream(System.in);
HelloLexer lexer = new HelloLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
HelloParser parser = new HelloParser(tokens);
ParseTree tree = parser.r();
ParseTreeWalker walker = new ParseTreeWalker();
HelloLoader loader = new HelloLoader();
walker.walk(loader, tree);
System.out.println(loader.input);
}
Now if I enter a correct input "hello : 1 morning", I will get hello:1morning, as expected.
What about an incorrect input such as "hello ; 1 morning"? I get the following output:
line 1:6 token recognition error at: ';'
line 1:8 missing ':' at '1'
hello<missing ':'>1morning
It seems that Antlr4 automatically recognized the wrong token ";" and deleted it; however, it does not insert ":" in the corresponding place, but only reports <missing ':'>.
My question is: is there some way to make Antlr automatically fix an error when it finds one? How would I code this? Do we need other tools?
Typically the input for a parser comes from some source file that contains some code or text that (supposedly) conforms to some grammar. A typical use scenario for syntax errors is to alert the user so that the source file can be corrected.
As the commenter noted, you can insert your own error recovery system, but before trying to insert a single token into the token stream and recover, please consider that it would be a very limited solution. Why? Consider a much richer grammar where, for a given token, many (perhaps dozens or hundreds) of other tokens can legally follow it. How would a single-token replacement strategy work then?
The hello.g4 example is the epitome of a trivial grammar, the "hello world" of ANTLR. But most of the time, for non-trivial grammars, the best we can do with imperfect syntax is to simply alert the user so the syntax can be corrected.
Basically, I've extended the BaseErrorListener, and I need to know when the error is semantic and when it's syntactic. So I want the following to give me a failed predicate exception, but I'm getting a NoViableAltException instead (I know the counting is working, because I can print out the value of things, and it's correct). Is there a way I can re-work it to do what I want? In my example below, I want there to be a failed predicate exception if we don't end up with 6 things.
grammar Test;
@parser::members {
int things = 0;
}
.
.
.
samplerule : THING { things++; } ;
.
.
.
// Want this to be a failed predicate instead of NoViableAltException
anotherrule : ENDTOKEN { things == 6 }? ;
.
.
.
I'm already properly getting failed predicate exceptions with the following (for a different scenario):
somerule : { Integer.valueOf(getCurrentToken().getText()) < 256 }? NUMBER ;
.
.
.
NUMBER : [0-9]+ ;
In ANTLR 4, predicates should only be used in cases where your input leads to two different possible parse trees (ambiguous grammar) and the default handling is producing the wrong parse tree. You should create a listener or visitor implementation containing your logic for semantic validation of the source.
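As a rough illustration of moving the check out of the grammar, here is a self-contained Java sketch. It uses no ANTLR types: the class and its method names are hypothetical stand-ins for the exit methods a generated listener would expose for samplerule and anotherrule, and the error message text is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a listener; a real one would extend the generated base listener.
class ThingCountValidator {
    private int things = 0;
    private final List<String> semanticErrors = new ArrayList<>();

    // Would be invoked when the walker leaves samplerule.
    void exitSampleRule() {
        things++;
    }

    // Would be invoked when the walker leaves anotherrule.
    void exitAnotherRule() {
        if (things != 6) {
            semanticErrors.add("expected 6 things before ENDTOKEN, found " + things);
        }
    }

    // Semantic problems are collected here, separate from syntax errors.
    List<String> semanticErrors() {
        return semanticErrors;
    }
}
```

Because the count is checked after the parse, anything reported by the error listener remains purely syntactic, and anything in semanticErrors() is purely semantic.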
Due to 280Z28's answer and the apparent fact that predicates should not be used for what I was trying to do, I went a different route.
If you know what you're looking for, ANTLR4's documentation is actually pretty useful--visit Parser.getCurrentToken()'s documentation and poke around further to see what more you can do with my implementation below.
My driver ended up looking something like the following:
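// NameOfMyGrammar.java
import java.io.IOException;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;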
public class NameOfMyGrammar {
public static void main(String[] args) throws Exception {
String inputFile = args[0];
try {
ANTLRInputStream input = new ANTLRFileStream(inputFile);
NameOfMyGrammarLexer lexer = new NameOfMyGrammarLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
MyCustomParser parser = new MyCustomParser(tokens);
try {
// begin parsing at "start" rule
ParseTree tree = parser.start();
// can print out parse tree if you want..
} catch (RuntimeException e) {
// Handle errors if you want..
}
} catch (IOException e) {
System.err.println("Error: " + e);
}
}
// extend ANTLR-generated parser
private static class MyCustomParser extends NameOfMyGrammarParser {
// Constructor (my understanding is that you DO need this)
public MyCustomParser(TokenStream input) {
super(input);
}
@Override
public Token getCurrentToken() {
// Do your semantic checking as you encounter tokens here..
// Depending on how you want to handle your errors, you can
// throw exceptions, print out errors, etc.
// Make sure you end by returning the current token
return _input.LT(1);
}
}
}
I am using Antlr4, and here is a simplified grammar I wrote:
grammar BooleanExpression;
/*******************************
* Parser Rules
*******************************/
booleanTerm
: booleanLiteral (KW_OR booleanLiteral)+
| booleanLiteral
;
id
: IDENTIFIER
;
booleanLiteral
: KW_TRUE
| KW_FALSE
;
/*******************************
* Lexer Rules
*******************************/
KW_TRUE
: 'true'
;
KW_FALSE
: 'false'
;
KW_OR
: 'or'
;
IDENTIFIER
: (SIMPLE_LATIN)+
;
fragment
SIMPLE_LATIN
: 'A' .. 'Z'
| 'a' .. 'z'
;
WHITESPACE
: [ \t\n\r]+ -> skip
;
I used a BailErrorStrategy and a BailLexer like the ones below:
public class BailErrorStrategy extends DefaultErrorStrategy {
/**
* Instead of recovering from exception e, rethrow it wrapped in a generic
* IllegalArgumentException so it is not caught by the rule function catches.
* Exception e is the "cause" of the IllegalArgumentException.
*/
@Override
public void recover(Parser recognizer, RecognitionException e) {
throw new IllegalArgumentException(e);
}
/**
* Make sure we don't attempt to recover inline; if the parser successfully
* recovers, it won't throw an exception.
*/
@Override
public Token recoverInline(Parser recognizer) throws RecognitionException {
throw new IllegalArgumentException(new InputMismatchException(recognizer));
}
/** Make sure we don't attempt to recover from problems in subrules. */
@Override
public void sync(Parser recognizer) {
}
@Override
protected Token getMissingSymbol(Parser recognizer) {
throw new IllegalArgumentException(new InputMismatchException(recognizer));
}
}
public class BailLexer extends BooleanExpressionLexer {
public BailLexer(CharStream input) {
super(input);
//removeErrorListeners();
//addErrorListener(new ConsoleErrorListener());
}
@Override
public void recover(LexerNoViableAltException e) {
throw new IllegalArgumentException(e); // Bail out
}
@Override
public void recover(RecognitionException re) {
throw new IllegalArgumentException(re); // Bail out
}
}
Everything works okay except one case. I tried the following expression:
true OR false
I expected this expression to be rejected with an IllegalArgumentException, because the 'or' token should be lower case rather than upper case. Instead, Antlr4 accepted it. The input is tokenized as "KW_TRUE IDENTIFIER KW_FALSE" (which is expected, since upper-case 'OR' is treated as an IDENTIFIER), but the parser did not throw an error while processing this token stream: it produced a tree containing only "true" and discarded the remaining "IDENTIFIER KW_FALSE" tokens. I tried different prediction modes, but all of them behaved the same way. I did some debugging to find out why, and it eventually led me to this piece of code in Antlr:
ATNConfigSet reach = computeReachSet(previous, t, false);
if ( reach==null ) {
// if any configs in previous dipped into outer context, that
// means that input up to t actually finished entry rule
// at least for SLL decision. Full LL doesn't dip into outer
// so don't need special case.
// We will get an error no matter what so delay until after
// decision; better error message. Also, no reachable target
// ATN states in SLL implies LL will also get nowhere.
// If conflict in states that dip out, choose min since we
// will get error no matter what.
int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
if ( alt!=ATN.INVALID_ALT_NUMBER ) {
// return w/o altering DFA
return alt;
}
throw noViableAlt(input, outerContext, previous, startIndex);
}
The code "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returned the second alternative in booleanTerm (because "true" matches the second alternative "booleanLiteral") but since it is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown immediately. According to the Java comments there, "We will get an error no matter what, so delay until after decision" but it seems no error was thrown eventually.
I really have no idea how to make Antlr report an error in this case. Could someone shed some light on this? Any help is appreciated, thanks.
If your top-level rule does not end with an explicit EOF, then ANTLR is not required to parse to the end of the input sequence. Rather than throw an exception, it simply parsed the valid portion of the sequence you gave it.
The following start rule would force it to parse the entire input sequence as a single booleanTerm.
start : booleanTerm EOF;
Also, BailErrorStrategy is already provided by the ANTLR 4 runtime. It throws a ParseCancellationException, which is more informative than the IllegalArgumentException used in your example.
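To see why the trailing EOF matters, here is a small hand-written Java sketch of the same grammar. It is not ANTLR-generated code: the class name, the whitespace-based tokenizer, and the consumed() helper are all assumptions made for illustration. booleanTerm() stops at the first token it cannot use, while start() additionally insists that nothing is left over.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written stand-in for the booleanTerm grammar (illustrative only).
class BooleanTermSketch {
    private final List<String> tokens;
    private int pos = 0;

    BooleanTermSketch(String input) {
        // Assumption: tokens are separated by whitespace.
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private boolean isLiteral(int i) {
        return i < tokens.size()
            && (tokens.get(i).equals("true") || tokens.get(i).equals("false"));
    }

    // booleanTerm : booleanLiteral (KW_OR booleanLiteral)* ;
    void booleanTerm() {
        if (!isLiteral(pos)) {
            throw new IllegalArgumentException("expected boolean literal at token " + pos);
        }
        pos++;
        while (pos < tokens.size() && tokens.get(pos).equals("or")) {
            if (!isLiteral(pos + 1)) {
                throw new IllegalArgumentException("expected boolean literal after 'or'");
            }
            pos += 2;
        }
    }

    // start : booleanTerm EOF ;
    void start() {
        booleanTerm();
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("unexpected token '" + tokens.get(pos) + "'");
        }
    }

    // Number of tokens consumed so far.
    int consumed() {
        return pos;
    }
}
```

With the input "true OR false", booleanTerm() returns normally after consuming one token, whereas start() throws because "OR" is still unread. That mirrors the difference between the original entry rule and the EOF-terminated one.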
Looking at this file, in my opinion declarations alone should be enough.
Can anyone explain why rules are necessary in lexical analysis?
In my opinion they're only necessary in .y files...
By "rule" I mean blocks like:
rdels {
if ($this->smarty->auto_literal) {
$this->token = Smarty_Internal_Templateparser::TP_OTHER;
} else {
$this->token = Smarty_Internal_Templateparser::TP_RDEL;
$this->yypopstate();
}
}
When should I use yypopstate and yypushstate?
You enter a state when the same character input could otherwise have more than one meaning.
If the lexer encounters a " (quote), you might enter a state (yypushstate) called "string", in which any following character that would otherwise have a special meaning (e.g. +, -, etc.) is considered part of the string. The "string" state is finished (yypopstate) when the lexer encounters another ".
In flex, these states are called start conditions.
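The push/pop behaviour can be sketched with an explicit state stack. The Java below is only an illustration under assumptions: the class, the two states, and the token names are hypothetical, and it is not how the Smarty lexer or flex is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical lexer with a state stack, mimicking yypushstate/yypopstate.
class StateStackLexer {
    enum State { INITIAL, STRING }

    static List<String> tokenize(String src) {
        Deque<State> states = new ArrayDeque<>();
        states.push(State.INITIAL);
        List<String> out = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (char c : src.toCharArray()) {
            if (states.peek() == State.STRING) {
                if (c == '"') {
                    out.add("STRING(" + text + ")");
                    text.setLength(0);
                    states.pop();           // analogue of yypopstate()
                } else {
                    text.append(c);         // '+' and '-' are plain characters here
                }
            } else if (c == '"') {
                states.push(State.STRING);  // analogue of yypushstate(STRING)
            } else if (c == '+' || c == '-') {
                out.add("OP(" + c + ")");
            } else if (!Character.isWhitespace(c)) {
                out.add("CHAR(" + c + ")");
            }
        }
        return out;
    }
}
```

The same '+' character produces an operator token in the INITIAL state but is absorbed into the string while the STRING state is on top of the stack, which is exactly the ambiguity that states resolve.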