Consuming Error tokens in antlr4 - antlr

Here is my grammar i am trying to give input as
alter table ;
everything works fine but when i give
altasder table; alter table ;
it gives me an error on first string as expected but i want is to parse the second command ignoring the first 'altasder table;'
grammar Hello;
start : compilation;
compilation : sql*;
sql : altercommand;
altercommand : ALTER TABLE SEMICOLON;
ALTER: 'alter';
TABLE: 'table';
SEMICOLON : ';';
how can i achieve it???
I have used the DefualtError stategy but still its not wotking
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.misc.IntervalSet;
public class CustomeErrorHandler extends DefaultErrorStrategy {
#Override
public void recover(Parser recognizer, RecognitionException e) {
// TODO Auto-generated method stub
super.recover(recognizer, e);
TokenStream tokenStream = (TokenStream)recognizer.getInputStream();
if (tokenStream.LA(1) == HelloParser.SEMICOLON )
{
IntervalSet intervalSet = getErrorRecoverySet(recognizer);
tokenStream.consume();
consumeUntil(recognizer, intervalSet);
}
}
}
main class :
public class Main {
public static void main(String[] args) throws IOException {
ANTLRInputStream ip = new ANTLRInputStream("altasdere table ; alter table ;");
HelloLexer lex = new HelloLexer(ip);
CommonTokenStream token = new CommonTokenStream(lex);
HelloParser parser = new HelloParser(token);
parser.setErrorHandler(new CustomeErrorHandler());
System.out.println(parser.start().toStringTree(parser));
}
}
myoutput :
line 1:0 token recognition error at: 'alta'
line 1:4 token recognition error at: 's'
line 1:5 token recognition error at: 'd'
line 1:6 token recognition error at: 'e'
line 1:7 token recognition error at: 'r'
line 1:8 token recognition error at: 'e'
line 1:9 token recognition error at: ' '
(start compilation)
why its not moving to second command ?

Need to use the DefaultErrorStrategy to control how the parser behaves in response to recognition errors. Extend as necessary, modifying the #recover method, to consume tokens up to the desired parsing restart point in the token stream.
A naive implementation of #recover would be:
#Override
public void recover(Parser recognizer, RecognitionException e) {
if (e instanceof InputMismatchException) {
int ttype = recognizer.getInputStream().LA(1);
while (ttype != Token.EOF && ttype != HelloParser.SEMICOLON) {
recognizer.consume();
ttype = recognizer.getInputStream().LA(1);
}
} else {
super.recover(recognizer, e);
}
}
Adjust the while condition as necessary to identify the next valid point to resume recognition.
Note, the error messages are due to the lexer being unable to match extraneous input characters. To remove the error messages, add as the last lexer rule:
ERR_TOKEN : . ;

Related

ANTLR4. Accessing context objects with actions

Parsing input files with size bigger than computer memory requires three steps:
The use of unbuffered character and token streams
Copy text out of sliding buffer and store in tokens
Ask the parser not to create parse trees
The objective is a simplified grammar to parse a file composed of many lines, each of one composed of many words.
With this requirement in mind, the following code uses #members action and sub classing the parser generated by ANTLR
The method printPagesAndWords receives a List of LineContext objects.
It prints the total number of lines (with the start method provided by LineContext), but it cannot access to the number of WORDs per line, whith the method WORD() provided by the LineContext object.
This is the output obtained:
Number of lines in page: 3
Line start token: word1
Number of words in line: 0
Line start token: 2323
Number of words in line: 0
Line start token: 554545
Number of words in line: 0
Furthermore, If I try to get the WORDs in a line, for example changing the line
System.out.println("Number of words in line:\t"+row.WORD().size()+"\n");
by the line
System.out.println("Number of words in line:\t"+row.WORD(1).getText()+"\n");
The following exception is thrown:
Exception in thread "main" java.lang.NullPointerException
at TestContext.printPagesAndWords(TestContext.java:11)
at ContextParser.read(ContextParser.java:130)
at Main.main(Main.java:11)
The following is the complete set of files:
ContextLexer.g4
lexer grammar ContextLexer;
NL
: '\r'? '\n'
;
WORD
: ~[ \t\n\r]+
;
SPACE
: [ \t] -> skip
;
ContextParser.g4
parser grammar ContextParser;
options {
tokenVocab=ContextLexer;
}
#members{
void printPagesAndWords(List<LineContext> rows){};
}
read
: dataLine+=line* {printPagesAndWords($dataLine);}
;
line
: WORD* NL
;
TestContext which extends ContextParser
import org.antlr.v4.runtime.TokenStream;
import java.util.List;
public class TestContext extends ContextParser {
public TestContext(TokenStream input) {
super(input);
}
void printPagesAndWords(List<LineContext> rows){
System.out.println("Number of lines in page:\t" + rows.size()+"\n");
for(LineContext row: rows){
System.out.println("Line start token:\t\t\t"+row.start.getText());
System.out.println("Number of words in line:\t"+row.WORD().size()+"\n");
}
};
}
The Main Class:
import org.antlr.v4.runtime.*;
import java.io.IOException;
public class Main {
public static void main(String[] args) throws IOException {
String source = "word1 word2 number anothernumber \n 2323 55r ere\n554545 lll 545\n";
ContextLexer lexer = new ContextLexer(CharStreams.fromString(source));
lexer.setTokenFactory(new CommonTokenFactory(true));
TokenStream tokens = new UnbufferedTokenStream<CommonToken>(lexer);
TestContext parser = new TestContext(tokens);
parser.setBuildParseTree(false);
parser.read();
}
}

how to report grammar ambiguity in antlr4

According to the antlr4 book (page 159), and using the grammar Ambig.g4, grammar ambiguity can be reported by:
grun Ambig stat -diagnostics
or equivalently, in code form:
parser.removeErrorListeners();
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
The grun command reports the ambiguity properly for me, using antlr-4.5.3. But when I use the code form, I dont get the ambiguity report. Here is the command trace:
$ antlr4 Ambig.g4 # see the book's page.159 for the grammar
$ javac Ambig*.java
$ grun Ambig stat -diagnostics < in1.txt # in1.txt is as shown on page.159
line 1:3 reportAttemptingFullContext d=0 (stat), input='f();'
line 1:3 reportAmbiguity d=0 (stat): ambigAlts={1, 2}, input='f();'
$ javac TestA_Listener.java
$ java TestA_Listener < in1.txt # exits silently
The TestA_Listener.java code is the following:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.*; // for PredictionMode
import java.util.*;
public class TestA_Listener {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat();
}
}
Can somebody please point out how the above java code should be modified, to print the ambiguity report?
For completeness, here is the code Ambig.g4 :
grammar Ambig;
stat: expr ';' // expression statement
| ID '(' ')' ';' // function call statement
;
expr: ID '(' ')'
| INT
;
INT : [0-9]+ ;
ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;
And here is the input file in1.txt :
f();
Antlr4 is a top-down parser, so for the given input, the parse match is unambiguously:
stat -> expr -> ID -> ( -> ) -> stat(cnt'd) -> ;
The second stat alt is redundant and never reached, not ambiguous.
To resolve the apparent redundancy, a predicate might be used:
stat: e=expr {isValidExpr($e)}? ';' #exprStmt
| ID '(' ')' ';' #funcStmt
;
When isValidExpr is false, the function statement alternative will be evaluated.
I waited for several days for other people to post their answers. Finally after several rounds of experimenting, I found an answer:
The following line should be deleted from the above code. Then we get the same ambiguity report as given by grun.
parser.removeErrorListeners(); // remove ConsoleErrorListener
The following code will be work
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
//parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new org.antlr.v4.runtime.DiagnosticErrorListener()); // add ours
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat(); // parse as usual
}

antlr displayRecognitionError

Using antlr 3 with java
I'm having trouble printing the text from the token here:
#lexer::members {
public void displayRecognitionError(String[] tokenNames, RecognitionException e) {
System.err.println("Encountered an illegal char " + getText() + " on line "+getLine()+":"+getCharPositionInLine ());
}
}
I'm making a more detailed error report for the lexer grammar.
The thing is that when the error occurs (the user enters a token like : which isn't defined) it only shows "Encountered an illegal char on line x:y", instead it should show the invalid character : between char and on line x:y.
What can I do to show the invalid character, line and column?

antlr, how to fix grammar so (varname)? fails parsing if it exists but is wrong

I am rewriting this post(same question though, less words).
I rewrote my grammar to just have at the top level this
statement: SELECT EOF!;
#removed all other rules and tokens and still not working
SELECT : ('S'|'s')('E'|'e')('L'|'l')('E'|'e')('C'|'c')('T'|'t');
I then have the following java code(There is no parse method on my Parser like I see on everyone else's posts :( :( )...
#Test
public void testGrammar() throws RecognitionException {
String sql = "select p FROM TABLE p left join p.security s where p.numShares = :shares and s.securityType = :type";
ANTLRStringStream stream = new ANTLRStringStream(sql);
NoSqlLexer lexer = new NoSqlLexer(stream);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
NoSqlParser parser = new NoSqlParser(tokenStream);
try {
CommonTree tree = (CommonTree) parser.statement().getTree();
Assert.fail("should fail parsing");
} catch(Exception e) {
log.info("Exception", e);
}
}
I should get an Exception on parsing but I do not :( :(. This is a very simple grammar with an EOF but it just is not working. Any ideas why? Am I calling the wrong parsing statement?
NOTE: I am trying to make a much bigger problem simpler by cutting the problem down to this simple thing which seems not to work.
full (very small) grammar file is here
grammar NoSql;
options {
output = AST;
language = Java;
}
tokens {
}
#header {
package com.alvazan.orm.parser.antlr;
}
#lexer::header {
package com.alvazan.orm.parser.antlr;
}
#rulecatch {
catch (RecognitionException e)
{
throw e;
}
}
statement: SELECT EOF!;
SELECT : ('S'|'s')('E'|'e')('L'|'l')('E'|'e')('C'|'c')('T'|'t');
WS :
( ' '
| '\t'
| '\r'
| '\n'
)+
{ $channel=HIDDEN; }
;
ESC :
'\\' ('"'|'\''|'\\')
;
thanks,
Dean
I should get an Exception on parsing but I do not :( :(. This is a very simple grammar with an EOF but it just is not working.
If you didn't "tell" ANTLR to throw an exception, then ANTLR might not throw an exception/error in case of invalid input. In many cases, ANTLR tries to recover from mismatched/invalid input and tries to continue parsing. You will see output being written on the stderr in your case, but you might not get an exception.
If you do want an exception in your case, see: ANTLR not throwing errors on invalid input

ANTLR, heterogeneous AST problem

I examine heterogeneous trees in ANTLR (using ANTLRWorks 1.4.2).
Here is the example of what I have already done in ANTLR.
grammar test;
options {
language = java;
output = AST;
}
tokens {
PROGRAM;
VAR;
}
#members {
class Program extends CommonTree {
public Program(int ttype) {
token = new CommonToken(ttype, "<start>");
}
}
}
start
: program var function
// Works fine:
//-> ^(PROGRAM program var function)
// Does not work (described below):
-> ^(PROGRAM<Program> program var function)
;
program
: 'program'! ID ';'!
;
var
: TYPE^ ID ';'!
;
function
: ID '('! ')'! ';'!
;
TYPE
: 'int'
| 'string'
;
ID
: ('a'..'z' | 'A'..'Z')+
;
WHITESPACE
: (' ' | '\t' '\n'| '\r' | '\f')+ {$channel = HIDDEN;}
;
Sample input:
program foobar;
int foo;
bar();
When I use rewrite rule ^(PROGRAM<Program> program var function), ANTLR stumbles over and I get AST like this:
Whereas when I use this rewrite rule ^(PROGRAM program var function) it works:
Could anyone explain where am I wrong, please? Frankly, I do not really get the idea of heterogeneous trees and how do I use <…> syntax in ANTLR.
What do r0 and r1 mean (first picture)?
I have no idea what these r0 and r1 mean: I don't use ANTLRWorks for debugging, so can't comment on that.
Also, language = java; causes ANTLR 3.2 to produce the error:
error(10): internal error: no such group file java.stg
error(20): cannot find code generation templates java.stg
error(10): internal error: no such group file java.stg
error(20): cannot find code generation templates java.stg
ANTLR 3.2 expects it to be language = Java; (capital "J"). But, by default the target is Java, so, mind as well remove the language = ... entirely.
Now, as to you problem: I cannot reproduce it. As I mentioned, I tested it with ANTLR 3.2, and removed the language = java; part from your grammar, after which everything went as (I) expected.
Enabling the rewrite rule -> ^(PROGRAM<Program> program var function) produces the following ATS:
and when enabling the rewrite rule -> ^(PROGRAM program var function) instead, the following AST is created:
I tested both rewrite rules this with the following class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("program foobar; int foo; bar();");
testLexer lexer = new testLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
testParser parser = new testParser(tokens);
testParser.start_return returnValue = parser.start();
CommonTree tree = (CommonTree)returnValue.getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
And the images are produced using graph.gafol.net (and the output of the Main class, of course).