Antlr4 discards remaining tokens instead of bailing out - error-handling

I am using Antlr4, and here is a simplified grammar I wrote:
grammar BooleanExpression;
/*******************************
* Parser Rules
*******************************/
booleanTerm
: booleanLiteral (KW_OR booleanLiteral)+
| booleanLiteral
;
id
: IDENTIFIER
;
booleanLiteral
: KW_TRUE
| KW_FALSE
;
/*******************************
* Lexer Rules
*******************************/
KW_TRUE
: 'true'
;
KW_FALSE
: 'false'
;
KW_OR
: 'or'
;
IDENTIFIER
: (SIMPLE_LATIN)+
;
fragment
SIMPLE_LATIN
: 'A' .. 'Z'
| 'a' .. 'z'
;
WHITESPACE
: [ \t\n\r]+ -> skip
;
I used a BailErrorStrategy and BailLexer like below:
public class BailErrorStrategy extends DefaultErrorStrategy {
/**
* Instead of recovering from exception e, rethrow it wrapped in a generic
* IllegalArgumentException so it is not caught by the rule function catches.
* Exception e is the "cause" of the IllegalArgumentException.
*/
@Override
public void recover(Parser recognizer, RecognitionException e) {
throw new IllegalArgumentException(e);
}
/**
* Make sure we don't attempt to recover inline; if the parser successfully
* recovers, it won't throw an exception.
*/
@Override
public Token recoverInline(Parser recognizer) throws RecognitionException {
throw new IllegalArgumentException(new InputMismatchException(recognizer));
}
/** Make sure we don't attempt to recover from problems in subrules. */
@Override
public void sync(Parser recognizer) {
}
@Override
protected Token getMissingSymbol(Parser recognizer) {
throw new IllegalArgumentException(new InputMismatchException(recognizer));
}
}
public class BailLexer extends BooleanExpressionLexer {
public BailLexer(CharStream input) {
super(input);
//removeErrorListeners();
//addErrorListener(new ConsoleErrorListener());
}
@Override
public void recover(LexerNoViableAltException e) {
throw new IllegalArgumentException(e); // Bail out
}
@Override
public void recover(RecognitionException re) {
throw new IllegalArgumentException(re); // Bail out
}
}
Everything works okay except one case. I tried the following expression:
true OR false
I expected this expression to be rejected with an IllegalArgumentException, because the 'or' token must be lower case rather than upper case. Instead, Antlr4 accepted it. The expression is tokenized into "KW_TRUE IDENTIFIER KW_FALSE", which is expected, since upper-case 'OR' is treated as an IDENTIFIER. But the parser did not throw an error while processing this token stream: it produced a tree containing only "true" and discarded the remaining "IDENTIFIER KW_FALSE" tokens. I tried different prediction modes, and all of them behaved the same way. I did some debugging, which eventually led to this piece of code in Antlr:
ATNConfigSet reach = computeReachSet(previous, t, false);
if ( reach==null ) {
// if any configs in previous dipped into outer context, that
// means that input up to t actually finished entry rule
// at least for SLL decision. Full LL doesn't dip into outer
// so don't need special case.
// We will get an error no matter what so delay until after
// decision; better error message. Also, no reachable target
// ATN states in SLL implies LL will also get nowhere.
// If conflict in states that dip out, choose min since we
// will get error no matter what.
int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
if ( alt!=ATN.INVALID_ALT_NUMBER ) {
// return w/o altering DFA
return alt;
}
throw noViableAlt(input, outerContext, previous, startIndex);
}
The call "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returned the second alternative of booleanTerm (because "true" matches the second alternative, "booleanLiteral"). Since that value is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown at this point. The comment there says "We will get an error no matter what so delay until after decision", but no error was ever thrown.
I have no idea how to make Antlr report an error in this case. Could someone shed some light on this? Any help is appreciated, thanks.

If your top-level rule does not end with an explicit EOF, then ANTLR is not required to parse to the end of the input sequence. Rather than throw an exception, it simply parsed the valid portion of the sequence you gave it.
The following start rule would force it to parse the entire input sequence as a single booleanTerm.
start : booleanTerm EOF;
Also, BailErrorStrategy is provided by the ANTLR 4 runtime, and throws a more informative ParseCancellationException than the one shown in your example.
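To illustrate why the missing EOF matters, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code: the class PrefixVersusFullParse and its methods are made up for this illustration and only mimic the grammar from the question. The booleanTerm method stops at the first token it cannot use, while start additionally insists that nothing is left over.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for the grammar in the question; not ANTLR output. */
final class PrefixVersusFullParse {

    /** Splits on whitespace; this only mimics the lexer rules. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static boolean isBooleanLiteral(String token) {
        return token.equals("true") || token.equals("false");
    }

    /**
     * booleanTerm : booleanLiteral (KW_OR booleanLiteral)* ;
     * Returns how many tokens were consumed and ignores anything after them.
     */
    static int booleanTerm(List<String> tokens) {
        if (tokens.isEmpty() || !isBooleanLiteral(tokens.get(0))) {
            throw new IllegalArgumentException("expected a boolean literal at token 0");
        }
        int pos = 1;
        while (pos < tokens.size() && tokens.get(pos).equals("or")) {
            if (pos + 1 >= tokens.size() || !isBooleanLiteral(tokens.get(pos + 1))) {
                throw new IllegalArgumentException(
                        "expected a boolean literal at token " + (pos + 1));
            }
            pos += 2;
        }
        return pos;
    }

    /** start : booleanTerm EOF ; leftover tokens are an error here. */
    static int start(List<String> tokens) {
        int consumed = booleanTerm(tokens);
        if (consumed != tokens.size()) {
            throw new IllegalArgumentException(
                    "unexpected token '" + tokens.get(consumed) + "' at token " + consumed);
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("true OR false");
        System.out.println("booleanTerm consumed " + booleanTerm(tokens)
                + " of " + tokens.size() + " tokens");
        try {
            start(tokens);
        } catch (IllegalArgumentException e) {
            System.out.println("start rejected the input: " + e.getMessage());
        }
    }
}
```

With the input true OR false, booleanTerm consumes only the first token and returns normally, while start throws because 'OR' is left over. This is the same distinction as between calling parser.booleanTerm() and a start rule that ends in EOF.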

Related

how to catch minor errors?

I have a small ANTLR v4 grammar and I am implementing a visitor on it.
Let's say it is a simple calculator and every input must be terminated with a ";"
e.g. x=4+5;
If I leave out the ; at the end, it still works, but I get this output on the terminal:
line 1:56 missing ';' at '<EOF>'
It seems the parser still matches the rule and more or less ignores the missing terminal ";".
I would prefer a strict error or an exception instead of this soft warning.
The output is generated by the line
ParseTree tree = parser.input ()
Is there a way I can intensify the error-handling and check for that kind of error?
Yes, you can. Like you, I wanted a 100% perfect parse from user-submitted text and so created a strict error handler that prevents recovery from even simple errors.
The first step is in removing the default error listeners and adding your own STRICT error handler:
AntlrInputStream inputStream = new AntlrInputStream(stream);
BailLexer lexer = new BailLexer(inputStream); // TALK ABOUT THIS AT BOTTOM
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
LISBASICParser parser = new LISBASICParser(tokenStream);
parser.RemoveErrorListeners(); // UNHOOK ERROR HANDLER
parser.ErrorHandler = new StrictErrorStrategy(); // REPLACE WITH YOUR OWN
LISBASICParser.CalculationContext context = parser.calculation();
CalculationVisitor visitor = new CalculationVisitor();
visitor.VisitCalculation(context);
Here's my StrictErrorStrategy class. It inherits from the DefaultErrorStrategy class and overrides the two 'recovery' methods that are letting small errors like your semicolon error be recoverable:
public class StrictErrorStrategy : DefaultErrorStrategy
{
public override void Recover(Parser recognizer, RecognitionException e)
{
IToken token = recognizer.CurrentToken;
string message = string.Format("parse error at line {0}, position {1} right before {2} ", token.Line, token.Column, GetTokenErrorDisplay(token));
throw new Exception(message, e);
}
public override IToken RecoverInline(Parser recognizer)
{
IToken token = recognizer.CurrentToken;
string message = string.Format("parse error at line {0}, position {1} right before {2} ", token.Line, token.Column, GetTokenErrorDisplay(token));
throw new Exception(message, new InputMismatchException(recognizer));
}
public override void Sync(Parser recognizer) { }
}
Overriding these two methods allows you to stop (in this case with an exception that is caught elsewhere) on ANY parser error. And making the Sync method empty prevents the normal 're-sync after error' behavior from happening.
The final step is in catching all LEXER errors. You do this by creating a new class that inherits from your main lexer class; it overrides the Recover() method like so:
public class BailLexer : LISBASICLexer
{
public BailLexer(ICharStream input) : base(input) { }
public override void Recover(LexerNoViableAltException e)
{
string message = string.Format("lex error after token {0} at position {1}", _lasttoken.Text, e.StartIndex);
BasicEnvironment.SyntaxError = message;
BasicEnvironment.ErrorStartIndex = e.StartIndex;
throw new ParseCanceledException(BasicEnvironment.SyntaxError);
}
}
(Edit: In this code, BasicEnvironment is a high-level context object I used in the application to hold settings, errors, results, etc. So if you decide to use this, either do as another reader commented below, or substitute your own context/container.)
With this in place, even small errors during the lexing step will be caught as well. With these two overridden classes in place, the user of my app must supply absolutely perfect syntax to get a successful execution. There you go!
Because my ANTLR code is in Java, I am adding a Java answer here too. It is the same idea as the accepted answer.
TempParser parser = new TempParser (tokens);
parser.removeErrorListeners ();
parser.addErrorListener (new BaseErrorListener ()
{
@Override
public void syntaxError (final Recognizer <?,?> recognizer, Object sym, int line, int pos, String msg, RecognitionException e)
{
throw new AssertionError ("ANTLR - syntax-error - line: " + line + ", position: " + pos + ", message: " + msg);
}
});

Exceptions thrown while soft asserting fail the subsequent tests

As per title, I'm trying to run a test case in a loop. To be able to calculate the number of failed assertions, I'm expecting that if AssertJ is trying to assert the returned value from a method call, it should softly fail a single iteration and carry on. Otherwise, it defies the purpose of soft assertions. Here's a snippet illustrating this:
public static void main(String[] args) {
SoftAssertions softAssertions = new SoftAssertions();
softAssertions.assertThat(throwException(10)).isTrue();
softAssertions.assertThat(throwException(10)).isTrue();
softAssertions.assertThat(throwException(1)).isTrue();
softAssertions.assertAll();
}
private static boolean throwException(int stuff){
if(stuff == 1){
throw new RuntimeException();
}
return true;
}
The output:
Exception in thread "main" java.lang.RuntimeException
at eLCMUpdate.throwException(MyClass.java:101)
at eLCMUpdate.main(MyClass.java:95)
I'm missing something here. Am I doing something wrong?
The problem with softAssertions.assertThat(throwException(10)).isTrue(); is that the argument is evaluated before assertThat is called, so if throwException throws, assertThat is never executed at all.
What you need is to evaluate the code you pass to the assertion lazily. You can do this with AssertJ's assertThatCode, as below:
final SoftAssertions softAssertions = new SoftAssertions();
softAssertions.assertThatCode(() -> throwException(10)).doesNotThrowAnyException();
softAssertions.assertThatCode(() -> throwException(1)).isInstanceOf(RuntimeException.class);
softAssertions.assertAll();
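The difference between eager and lazy evaluation can also be sketched without AssertJ. The LazySoftChecks class below is a hypothetical stand-in, not AssertJ's API. Because each condition is passed as a lambda, it runs inside the collector's try block, so an exception is recorded as a failure instead of aborting the remaining checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical mini collector for illustration; this is not AssertJ. */
final class LazySoftChecks {

    private final List<String> failures = new ArrayList<>();

    /** Runs the condition inside a try block and records any failure. */
    void checkTrue(String label, BooleanSupplier condition) {
        try {
            if (!condition.getAsBoolean()) {
                failures.add(label + ": expected true but was false");
            }
        } catch (RuntimeException e) {
            failures.add(label + ": threw " + e.getClass().getSimpleName());
        }
    }

    List<String> failures() {
        return failures;
    }

    /** Same behavior as throwException in the question. */
    static boolean throwException(int stuff) {
        if (stuff == 1) {
            throw new RuntimeException();
        }
        return true;
    }

    public static void main(String[] args) {
        LazySoftChecks checks = new LazySoftChecks();
        // Each lambda is only invoked inside checkTrue, not at the call site.
        checks.checkTrue("first", () -> throwException(10));
        checks.checkTrue("second", () -> throwException(1));
        checks.checkTrue("third", () -> throwException(10));
        checks.failures().forEach(System.out::println);
    }
}
```

Here the second check throws, yet the third check still runs and only one failure is recorded. Writing checks.checkTrue("second", throwException(1)) would not even compile against this signature, which is one way to make eager evaluation impossible by design.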
According to my understanding soft assertions work on boolean values and not on exceptions.
Also: if you throw an exception before calling softAssertions.assertAll(), obviously this method will also never be executed. This is actually the cause of the behaviour you reported.
Just try to debug through your code and you will see that the softAssertions.assertAll() is never called.
Soft assertions will work properly if you change your code to:
@Test
void soft_assertions() {
SoftAssertions softAssertions = new SoftAssertions();
softAssertions.assertThat(checkCondition(10)).isTrue();
softAssertions.assertThat(checkCondition(10)).isTrue();
softAssertions.assertThat(checkCondition(1)).isTrue();
softAssertions.assertThat(checkCondition(2)).isTrue();
softAssertions.assertThat(checkCondition(20)).isTrue();
softAssertions.assertAll();
}
private static boolean checkCondition(int stuff){
if(stuff == 1 || stuff == 2){
return false;
}
return true;
}
This will output the result of multiple assertions and not stop on the evaluation of the first failed assertion.
Output:
org.assertj.core.api.SoftAssertionError:
The following 2 assertions failed:
1)
Expecting:
<false>
to be equal to:
<true>
but was not.
at JsonStewardshipCustomerConversionTest.soft_assertions(JsonStewardshipCustomerConversionTest.java:301)
2)
Expecting:
<false>
to be equal to:
<true>
but was not.
at JsonStewardshipCustomerConversionTest.soft_assertions(JsonStewardshipCustomerConversionTest.java:302)
Update
SoftAssertion does not seem to fit your purpose.
I suggest you use JUnit 5's assertAll instead. According to my tests, it evaluates all conditions in an assertAll block and survives exceptions too. The catch is that you need JUnit 5, which is probably not widely adopted yet.
Here is an example with a failure on a boolean condition and also an exception. Both are reported in the console.
@Test
void soft_assertions() {
assertAll("Check condition",
() -> assertThat(checkCondition(9)).isTrue(),
() -> assertThat(checkCondition(10)).isTrue(),
() -> assertThat(checkCondition(11)).isTrue(),
() -> assertThat(checkCondition(2)).isTrue(), // Throws exception
() -> assertThat(checkCondition(3)).isFalse(), // fails
() -> assertThrows(IllegalArgumentException.class, () -> {
checkCondition(1);
})
);
}
private static boolean checkCondition(int stuff) {
if (stuff == 1 || stuff == 2) {
throw new IllegalArgumentException();
}
return true;
}
You will see this in the output:
org.opentest4j.MultipleFailuresError: Check condition (2 failures)
<no message> in java.lang.IllegalArgumentException
Expecting:
<true>
to be equal to:
<false>
but was not.

After Antlr4 recognizes an error, how can the application automatically fix it?

We know Antlr4 is using the sync-and-return recovery mechanism. For example, I have the following simple grammar:
grammar Hello;
r : prefix body ;
prefix: 'hello' ':';
body: INT ID ;
INT: [0-9]+ ;
ID : [a-z]+ ;
WS : [ \t\r\n]+ -> skip ;
I use the following listener to grab the input:
public class HelloLoader extends HelloBaseListener {
String input;
public void exitR(HelloParser.RContext ctx) {
input = ctx.getText();
}
}
The main method in my HelloRunner looks like this:
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromStream(System.in);
HelloLexer lexer = new HelloLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
HelloParser parser = new HelloParser(tokens);
ParseTree tree = parser.r();
ParseTreeWalker walker = new ParseTreeWalker();
HelloLoader loader = new HelloLoader();
walker.walk(loader, tree);
System.out.println(loader.input);
}
Now if I enter the correct input "hello : 1 morning", I get hello:1morning, as expected.
What about an incorrect input such as "hello ; 1 morning"? I get the following output:
line 1:6 token recognition error at: ';'
line 1:8 missing ':' at '1'
hello<missing ':'>1morning
It seems that Antlr4 automatically recognized the wrong token ";" and deleted it; however, it does not add ":" in the corresponding place, but only reports <missing ':'>.
My question is: is there some way to make Antlr automatically fix an error when it finds one? How would this be coded? Do we need other tools?
Typically the input for a parser comes from some source file that contains some code or text that (supposedly) conforms to some grammar. A typical use scenario for syntax errors is to alert the user so that the source file can be corrected.
As the commenter noted, you can insert your own error recovery system, but before trying to insert a single token into the token stream and recover, please consider that it would be a very limited solution. Why? Consider a much richer grammar where for a given token, many -- perhaps dozens or hundreds -- of other tokens can legally follow it. How would a single-token replacement strategy work then?
The hello.g4 example is the epitome of a trivial grammar, the "hello world" of ANTLR. But most of the time, for non-trivial grammars, the best we can do with imperfect syntax is to simply alert the user so the syntax can be corrected.

Force semantic error (failed predicate) in ANTLR4

Basically, I've extended the BaseErrorListener, and I need to know when the error is semantic and when it's syntactic. So I want the following to give me a failed predicate exception, but I'm getting a NoViableAltException instead (I know the counting is working, because I can print out the value of things, and it's correct). Is there a way I can re-work it to do what I want? In my example below, I want there to be a failed predicate exception if we don't end up with 6 things.
grammar Test;
@parser::members {
int things = 0;
}
.
.
.
samplerule : THING { things++; } ;
.
.
.
// Want this to be a failed predicate instead of NoViableAltException
anotherrule : ENDTOKEN { things == 6 }? ;
.
.
.
I'm already properly getting failed predicate exceptions with the following (for a different scenario):
somerule : { Integer.valueOf(getCurrentToken().getText()) < 256 }? NUMBER ;
.
.
.
NUMBER : [0-9]+ ;
In ANTLR 4, predicates should only be used in cases where your input leads to two different possible parse trees (ambiguous grammar) and the default handling is producing the wrong parse tree. You should create a listener or visitor implementation containing your logic for semantic validation of the source.
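As a rough sketch of that separation, assume the parse has already succeeded and the node types have been collected (for example by a listener). The semantic check can then run as its own pass and raise its own error type, which makes it easy to tell syntactic and semantic errors apart. ThingCountValidation and SemanticException are hypothetical names for this illustration and are not part of ANTLR.

```java
import java.util.Arrays;
import java.util.List;

/** Post-parse semantic check; a stand-in for logic that would live in a listener. */
final class ThingCountValidation {

    /** Hypothetical error type, kept distinct from ANTLR's syntax exceptions. */
    static final class SemanticException extends RuntimeException {
        SemanticException(String message) {
            super(message);
        }
    }

    /** Counts THING entries; a real listener would increment a counter in a callback. */
    static int countThings(List<String> nodeTypes) {
        int things = 0;
        for (String type : nodeTypes) {
            if ("THING".equals(type)) {
                things++;
            }
        }
        return things;
    }

    /** Semantic rule from the question: exactly 6 things are required. */
    static void validate(List<String> nodeTypes) {
        int things = countThings(nodeTypes);
        if (things != 6) {
            throw new SemanticException("expected 6 things but found " + things);
        }
    }

    public static void main(String[] args) {
        List<String> nodeTypes = Arrays.asList("THING", "THING", "THING", "ENDTOKEN");
        try {
            validate(nodeTypes);
        } catch (SemanticException e) {
            System.out.println("semantic error: " + e.getMessage());
        }
    }
}
```

Because the check runs after parsing, any exception reaching the error listener is syntactic, and anything thrown by validate is semantic.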
Due to 280Z28's answer and the apparent fact that predicates should not be used for what I was trying to do, I went a different route.
If you know what you're looking for, ANTLR4's documentation is actually pretty useful--visit Parser.getCurrentToken()'s documentation and poke around further to see what more you can do with my implementation below.
My driver ended up looking something like the following:
// NameOfMyGrammar.java
public class NameOfMyGrammar {
public static void main(String[] args) throws Exception {
String inputFile = args[0];
try {
ANTLRInputStream input = new ANTLRFileStream(inputFile);
NameOfMyGrammarLexer lexer = new NameOfMyGrammarLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
MyCustomParser parser = new MyCustomParser(tokens);
try {
// begin parsing at "start" rule
ParseTree tree = parser.start();
// can print out parse tree if you want..
} catch (RuntimeException e) {
// Handle errors if you want..
}
} catch (IOException e) {
System.err.println("Error: " + e);
}
}
// extend ANTLR-generated parser
private static class MyCustomParser extends NameOfMyGrammarParser {
// Constructor (my understanding is that you DO need this)
public MyCustomParser(TokenStream input) {
super(input);
}
@Override
public Token getCurrentToken() {
// Do your semantic checking as you encounter tokens here..
// Depending on how you want to handle your errors, you can
// throw exceptions, print out errors, etc.
// Make sure you end by returning the current token
return _input.LT(1);
}
}
}

Force antlr3 to immediately exit when a rule fails

I've got a rule like this:
declaration returns [RuntimeObject obj]:
DECLARE label value { $obj = new RuntimeObject($label.text, $value.text); };
Unfortunately, it throws an exception in the RuntimeObject constructor because $label.text is null. Examining the debug output and some other things reveals that the match against "label" actually failed, but the Antlr runtime "helpfully" continues with the match for the purpose of giving a more helpful error message (http://www.antlr.org/blog/antlr3/error.handling.tml).
Okay, I can see how this would be useful for some situations, but how can I tell Antlr to stop doing that? The defaultErrorHandler=false option from v2 seems to be gone.
I don't know much about Antlr, so this may be way off base, but the section entitled "Error Handling" on this migration page looks helpful.
It suggests you can either use @rulecatch { } to disable error handling entirely, or override the mismatch() method of the BaseRecognizer with your own implementation that doesn't attempt to recover. From your problem description, the example on that page seems like it does exactly what you want.
You could also override the reportError(RecognitionException) method, to make it rethrow the exception instead of print it, like so:
@parser::members {
@Override
public void reportError(RecognitionException e) {
throw new RuntimeException(e);
}
}
However, I'm not sure you want this (or the solution by ire_and_curses), because you will only get one error per parse attempt, which you can then fix, just to find the next error. If you try to recover (ANTLR does it okay) you can get multiple errors in one try, and fix all of them.
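The trade-off can be sketched in plain Java without the ANTLR runtime. Everything below is made up for illustration, including the toy rule that a statement is valid only if it ends with ';'. One method bails out on the first problem, the other keeps going and reports all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration of bail-out versus recovery; not ANTLR code. */
final class FailFastVersusCollect {

    /** Toy validity rule, assumed only for this sketch. */
    static boolean isValid(String statement) {
        return statement.endsWith(";");
    }

    /** Bail-out style: throws at the first invalid statement. */
    static void checkFailFast(List<String> statements) {
        for (int i = 0; i < statements.size(); i++) {
            if (!isValid(statements.get(i))) {
                throw new IllegalArgumentException("statement " + (i + 1) + ": missing ';'");
            }
        }
    }

    /** Recovery style: records every invalid statement and continues. */
    static List<String> checkAndCollect(List<String> statements) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < statements.size(); i++) {
            if (!isValid(statements.get(i))) {
                errors.add("statement " + (i + 1) + ": missing ';'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("x=1;", "y=2", "z=3");
        try {
            checkFailFast(input);
        } catch (IllegalArgumentException e) {
            System.out.println("fail-fast reported: " + e.getMessage());
        }
        System.out.println("collecting reported: " + checkAndCollect(input));
    }
}
```

For this input the fail-fast check reports only statement 2, so the problem in statement 3 shows up on the next attempt, while the collecting check reports both in one run.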
You need to override the mismatch and recoverFromMismatchedSet methods to ensure an exception is thrown immediately (examples are for Java):
@members {
protected void mismatch(IntStream input, int ttype, BitSet follow) throws RecognitionException {
throw new MismatchedTokenException(ttype, input);
}
public Object recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow) throws RecognitionException {
throw e;
}
}
then you need to change how the parser deals with those exceptions so they're not swallowed:
@rulecatch {
catch (RecognitionException e) {
throw e;
}
}
(The bodies of all the rule-matching methods in your parser will be enclosed in try blocks, with this as the catch block.)
For comparison, the default implementation of recoverFromMismatchedSet inherited from BaseRecognizer:
public Object recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow) throws RecognitionException {
if (mismatchIsMissingToken(input, follow)) {
reportError(e);
return getMissingSymbol(input, e, Token.INVALID_TOKEN_TYPE, follow);
}
throw e;
}
and the default rulecatch:
catch (RecognitionException re) {
reportError(re);
recover(input,re);
}