What is the purpose of # in an ANTLR grammar?

I have a statement in an ANTLR4 grammar like:
expression : DEFAULT #primitive_expression
;
I don't know what # means here.

They're alternative labels:
Alternative Labels
As we saw in Section 7.4, Labeling Rule Alternatives for Precise Event Methods, we can get more precise parse-tree listener events by labeling the outermost alternatives of a rule using the # operator. All alternatives within a rule must be labeled, or none of them. Here are two rules with labeled alternatives.
grammar T;
stat: 'return' e ';' # Return
| 'break' ';' # Break
;
e : e '*' e # Mult
| e '+' e # Add
| INT # Int
;
Alternative labels do not have to be at the end of the line and there does not have to be a space after the # symbol. ANTLR generates a rule context class definition for each label. For example, here is the listener that ANTLR generates:
public interface AListener extends ParseTreeListener {
void enterReturn(AParser.ReturnContext ctx);
void exitReturn(AParser.ReturnContext ctx);
void enterBreak(AParser.BreakContext ctx);
void exitBreak(AParser.BreakContext ctx);
void enterMult(AParser.MultContext ctx);
void exitMult(AParser.MultContext ctx);
void enterAdd(AParser.AddContext ctx);
void exitAdd(AParser.AddContext ctx);
void enterInt(AParser.IntContext ctx);
void exitInt(AParser.IntContext ctx);
}
From: https://github.com/antlr/antlr4/blob/master/doc/parser-rules.md#alternative-labels

Related

Unable to parse APL Symbol using ANTLR

I am trying to parse APL expressions using ANTLR; it is a sort of APL source code parser. It parses normal characters but fails to parse special symbols (like '←').
expression = N←0
Lexer
/* Lexer Tokens. */
NUMBER:
(DIGIT)+ ( '.' (DIGIT)+ )?;
ASSIGN:
'←'
;
DIGIT :
[0-9]
;
Output:
[#0,0:1='99',<NUMBER>,1:0]
[#1,4:6='â??',<'â??'>,2:0]
[#2,7:6='<EOF>',<EOF>,2:3]
Can someone help me parse special characters from the APL language?
I am following the steps below:
Wrote the grammar.
Used "antlr4.bat" to generate the parser from the grammar.
Used "grun.bat" to generate the tokens.
That just means your terminal cannot display the character properly. There is nothing wrong with the generated parser or lexer: they are able to recognise ←.
Just don't use the bat file, but rather test your lexer and parser by writing a small class yourself using your favourite IDE (which can display the characters properly).
Something like this:
grammar T;
expression
: ID ARROW NUMBER
;
ID : [a-zA-Z]+;
ARROW : '←';
NUMBER : [0-9]+;
SPACE : [ \t\r\n]+ -> skip;
and a main class:
import org.antlr.v4.runtime.*;
public class Main {
public static void main(String[] args) {
TLexer lexer = new TLexer(CharStreams.fromString("N ← 0"));
TParser parser = new TParser(new CommonTokenStream(lexer));
System.out.println(parser.expression().toStringTree(parser));
}
}
which will display:
(expression N ← 0)
EDIT
You could also try using the Unicode escape for the arrow like this:
grammar T;
expression
: ID ARROW NUMBER
;
ID : [a-zA-Z]+;
ARROW : '\u2190';
NUMBER : [0-9]+;
SPACE : [ \t\r\n]+ -> skip;
and the Java class:
import org.antlr.v4.runtime.*;
public class Main {
public static void main(String[] args) {
String source = "N \u2190 0";
TLexer lexer = new TLexer(CharStreams.fromString(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
System.out.println(source + ": " + parser.expression().toStringTree(parser));
}
}
which will print:
N ← 0: (expression N ← 0)
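For what it's worth, here is a small sketch of where output like 'â??' can come from. The token in the question spans indices 4:6, that is, three characters rather than one, which is what you would see if the UTF-8 bytes of ← were decoded with a single-byte charset somewhere along the way. Which charset was actually in play on the asker's machine is an assumption; ISO-8859-1 is used here only because it maps every byte to exactly one char. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class ArrowEncodingDemo {
    /** Encodes s as UTF-8, then decodes those bytes as ISO-8859-1 (one char per byte). */
    public static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String arrow = "\u2190";            // the APL assignment arrow
        String wrong = misdecode(arrow);    // what a mis-configured reader would see
        System.out.println(arrow.length()); // 1
        System.out.println(wrong.length()); // 3
        // Print the code points: U+00E2 is 'â', the other two are control characters
        for (char c : wrong.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

If that is what is happening, passing an explicit encoding may be worth trying: both the ANTLR tool and grun (TestRig) accept an -encoding option.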

Antlr 4: Is getting this form of output possible?

Within the context of scanning, what do I need to override, extend, listen to, or visit to be able to print this kind of informative output while my text is being scanned?
-- Example output only ---------
DEBUG ... current mode: DEFAULT_MODE
DEBUG ... matching text '#' on rule SHARP ; pushing and switching to DIRECTIVE_MODE
DEBUG ... matching text 'IF' on rule IF ; pushing and switching to IF_MODE
DEBUG ... matching text ' ' on rule WS; skipping
DEBUG ... no match for text %
DEBUG ... no match for text &
DEBUG ... matching text '\r\n' on rule EOL; popping mode; current mode: DIRECTIVE_MODE
...
thanks
The solution was a lot simpler than I thought.
You just need to subclass the generated lexer and override methods such as popMode() and pushMode() to get the printout you want. If you do this, you should also override the emit() methods to get properly sequential and contextual information.
Here's an example in C#:
class ExtendedLexer : MyGeneratedLexer
{
public ExtendedLexer(ICharStream input)
: base(input) { }
public override int PopMode()
{
Console.WriteLine($"Mode is being popped: Line: {Line} Column:{Column} ModeName: {ModeNames[ModeStack.Peek()]}");
return base.PopMode();
}
public override void PushMode(int m)
{
Console.WriteLine($"Mode is being pushed: Line: {Line} Column:{Column} ModeName: {ModeNames[m]}");
base.PushMode(m);
}
public override void Emit(IToken t)
{
Console.WriteLine($"[#{t.TokenIndex},{t.StartIndex}:{t.StopIndex}, <{Vocabulary.GetSymbolicName(t.Type)}> = '{t.Text}']");
base.Emit(t);
}
}
And the output would be something like:
Mode is being pushed: Line: 4 Column:3 ModeName: IF_MODE
[#-1,163:165, <IF> = '#IF']
Mode is being pushed: Line: 4 Column:4 ModeName: CONDITION_MODE
[#-1,166:166, <LPAREN> = '(']
[#-1,167:189, <EXP> = '#setStartDateAndEndDate']
Mode is being popped: Line: 4 Column:28 ModeName: IF_MODE
[#-1,190:190, <RPAREN> = ')']

Consuming Error tokens in antlr4

Here is my grammar. When I give it the input
alter table ;
everything works fine, but when I give it
altasder table; alter table ;
it reports an error on the first string, as expected. What I want is for it to parse the second command and ignore the first 'altasder table;'.
grammar Hello;
start : compilation;
compilation : sql*;
sql : altercommand;
altercommand : ALTER TABLE SEMICOLON;
ALTER: 'alter';
TABLE: 'table';
SEMICOLON : ';';
How can I achieve this?
I have used a DefaultErrorStrategy, but it is still not working:
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.misc.IntervalSet;
public class CustomeErrorHandler extends DefaultErrorStrategy {
@Override
public void recover(Parser recognizer, RecognitionException e) {
// TODO Auto-generated method stub
super.recover(recognizer, e);
TokenStream tokenStream = (TokenStream)recognizer.getInputStream();
if (tokenStream.LA(1) == HelloParser.SEMICOLON )
{
IntervalSet intervalSet = getErrorRecoverySet(recognizer);
tokenStream.consume();
consumeUntil(recognizer, intervalSet);
}
}
}
main class :
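import java.io.IOException;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
public class Main {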
public static void main(String[] args) throws IOException {
ANTLRInputStream ip = new ANTLRInputStream("altasdere table ; alter table ;");
HelloLexer lex = new HelloLexer(ip);
CommonTokenStream token = new CommonTokenStream(lex);
HelloParser parser = new HelloParser(token);
parser.setErrorHandler(new CustomeErrorHandler());
System.out.println(parser.start().toStringTree(parser));
}
}
My output:
line 1:0 token recognition error at: 'alta'
line 1:4 token recognition error at: 's'
line 1:5 token recognition error at: 'd'
line 1:6 token recognition error at: 'e'
line 1:7 token recognition error at: 'r'
line 1:8 token recognition error at: 'e'
line 1:9 token recognition error at: ' '
(start compilation)
Why is it not moving on to the second command?
You need to use the DefaultErrorStrategy to control how the parser behaves in response to recognition errors. Extend it as necessary, modifying the #recover method to consume tokens up to the desired parsing restart point in the token stream.
A naive implementation of #recover would be:
@Override
public void recover(Parser recognizer, RecognitionException e) {
if (e instanceof InputMismatchException) {
int ttype = recognizer.getInputStream().LA(1);
while (ttype != Token.EOF && ttype != HelloParser.SEMICOLON) {
recognizer.consume();
ttype = recognizer.getInputStream().LA(1);
}
} else {
super.recover(recognizer, e);
}
}
Adjust the while condition as necessary to identify the next valid point to resume recognition.
Note, the error messages are due to the lexer being unable to match extraneous input characters. To remove the error messages, add as the last lexer rule:
ERR_TOKEN : . ;
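To make the "skip ahead to the restart point" idea concrete outside of ANTLR, here is a toy sketch that works on a plain list of token strings. It is not ANTLR's API: the class name, the method, and the choice to also consume the ';' (the #recover above stops in front of it) are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipToSemicolonDemo {
    /**
     * Counts well-formed "alter table ;" commands. Anything else is discarded
     * up to and including the next ';' and recorded in errors.
     */
    public static int countCommands(List<String> tokens, List<String> errors) {
        int ok = 0;
        int i = 0;
        while (i < tokens.size()) {
            boolean matches = i + 3 <= tokens.size()
                    && tokens.get(i).equals("alter")
                    && tokens.get(i + 1).equals("table")
                    && tokens.get(i + 2).equals(";");
            if (matches) {
                ok++;
                i += 3;
                continue;
            }
            int start = i;
            // Panic mode: drop tokens until the synchronisation token ';' or end of input
            while (i < tokens.size() && !tokens.get(i).equals(";")) {
                i++;
            }
            if (i < tokens.size()) {
                i++; // consume the ';' itself so the next command starts cleanly
            }
            errors.add(String.join(" ", tokens.subList(start, i)));
        }
        return ok;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<String> input = Arrays.asList("altasder", "table", ";", "alter", "table", ";");
        System.out.println(countCommands(input, errors)); // 1
        System.out.println(errors);                       // [altasder table ;]
    }
}
```

The point is only that recovery has to leave the input positioned at the start of the next command; where exactly that is depends on the grammar.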

how to report grammar ambiguity in antlr4

According to the antlr4 book (page 159), and using the grammar Ambig.g4, grammar ambiguity can be reported by:
grun Ambig stat -diagnostics
or equivalently, in code form:
parser.removeErrorListeners();
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
The grun command reports the ambiguity properly for me, using antlr-4.5.3. But when I use the code form, I don't get the ambiguity report. Here is the command trace:
$ antlr4 Ambig.g4 # see the book's page.159 for the grammar
$ javac Ambig*.java
$ grun Ambig stat -diagnostics < in1.txt # in1.txt is as shown on page.159
line 1:3 reportAttemptingFullContext d=0 (stat), input='f();'
line 1:3 reportAmbiguity d=0 (stat): ambigAlts={1, 2}, input='f();'
$ javac TestA_Listener.java
$ java TestA_Listener < in1.txt # exits silently
The TestA_Listener.java code is the following:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.*; // for PredictionMode
import java.util.*;
public class TestA_Listener {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new DiagnosticErrorListener());
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat();
}
}
Can somebody please point out how the above java code should be modified, to print the ambiguity report?
For completeness, here is the code Ambig.g4 :
grammar Ambig;
stat: expr ';' // expression statement
| ID '(' ')' ';' // function call statement
;
expr: ID '(' ')'
| INT
;
INT : [0-9]+ ;
ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;
And here is the input file in1.txt :
f();
Antlr4 is a top-down parser, so for the given input, the parse match is unambiguously:
stat -> expr -> ID -> ( -> ) -> stat(cnt'd) -> ;
The second stat alt is redundant and never reached, not ambiguous.
To resolve the apparent redundancy, a predicate might be used:
stat: e=expr {isValidExpr($e)}? ';' #exprStmt
| ID '(' ')' ';' #funcStmt
;
When isValidExpr is false, the function statement alternative will be evaluated.
I waited for several days for other people to post their answers. Finally after several rounds of experimenting, I found an answer:
The following line should be deleted from the above code. Then we get the same ambiguity report as given by grun.
parser.removeErrorListeners(); // remove ConsoleErrorListener
The following code will work:
public static void main(String[] args) throws IOException {
CharStream input = CharStreams.fromStream(System.in);
AmbigLexer lexer = new AmbigLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
AmbigParser parser = new AmbigParser(tokens);
//parser.removeErrorListeners(); // remove ConsoleErrorListener
parser.addErrorListener(new org.antlr.v4.runtime.DiagnosticErrorListener()); // add ours
parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.stat(); // parse as usual
}

SableCC not hitting interpreter methods

I am new to SableCC. I just ran the calculator example at http://sablecc.sourceforge.net/thesis/thesis.html#PAGE26. I used the grammar file and interpreter file as they are, and tried to parse a simple arithmetic expression like "45 * 5 + 2". The problem is that the interpreter method caseAMultFactor does not seem to be hit. I see it hit caseAPlusExpr, or caseAMinusExpr if I change the "+" to "-". So does the Start.apply(DepthFirstAdapter) method only go through the top-most node? How can I iterate through all nodes, as the sample code seems to do? I am using Java 1.7 and hope that's not a problem.
For your convenience I have pasted the grammar and interpreter codes here. Thanks for your help.
### Grammar:
Package postfix;
Tokens
number = ['0' .. '9']+;
plus = '+';
minus = '-';
mult = '*';
div = '/';
mod = '%';
l_par = '(';
r_par = ')';
blank = (' ' | 13 | 10)+;
Ignored Tokens
blank;
Productions
expr =
{factor} factor |
{plus} expr plus factor |
{minus} expr minus factor;
factor =
{term} term |
{mult} factor mult term |
{div} factor div term |
{mod} factor mod term;
term =
{number} number |
{expr} l_par expr r_par;
### Interpreter:
package postfix.interpret;
import postfix.analysis.DepthFirstAdapter;
import postfix.node.ADivFactor;
import postfix.node.AMinusExpr;
import postfix.node.AModFactor;
import postfix.node.AMultFactor;
import postfix.node.APlusExpr;
import postfix.node.TNumber;
public class Interpreter extends DepthFirstAdapter
{
public void caseTNumber(TNumber node)
{// When we see a number, we print it.
System.out.print(node);
}
public void caseAPlusExpr(APlusExpr node)
{
System.out.println(node);
}
public void caseAMinusExpr(AMinusExpr node)
{
System.out.println(node);
}
public void caseAMultFactor(AMultFactor node)
{// out of alternative {mult} in Factor, we print the mult.
System.out.print(node.getMult());
}
public void outAMultFactor(AMultFactor node)
{// out of alternative {mult} in Factor, we print the mult.
System.out.print(node.getMult());
}
public void outADivFactor(ADivFactor node)
{// out of alternative {div} in Factor, we print the div.
System.out.print(node.getDiv());
}
public void outAModFactor(AModFactor node)
{// out of alternative {mod} in Factor, we print the mod.
System.out.print(node.getMod());
}
}
What you posted looks fine. You did not post any of the output, nor did you post the code to run the interpreter.
Here's my code (I'm omitting the code for Interpreter as it's the same as yours):
package postfix;
import postfix.parser.*;
import postfix.lexer.*;
import postfix.node.*;
import java.io.*;
public class Compiler {
public static void main(String[] arguments) {
try {
Parser p = new Parser(new Lexer(new PushbackReader(
new StringReader("(45 + 36/2) * 3 + 5 * 2"), 1024)));
Start tree = p.parse();
tree.apply(new Interpreter());
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
and when run, it produces this:
45 36 2 / + 3 * 5 2 * +
Note the * is displayed, as expected.
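One thing that may be worth ruling out: as far as I know, in the DepthFirstAdapter that SableCC generates, each caseXxx method is what calls inXxx, applies the children, and then calls outXxx. Overriding a caseXxx method without walking the children yourself therefore stops the descent at that node, whereas overriding outXxx leaves the traversal intact. The sketch below is a toy model of that behaviour, not SableCC's generated code; every class and method name in it is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseVsOutDemo {
    interface Node { void apply(Walker w); }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public void apply(Walker w) { w.caseNum(this); }
    }

    static final class Mult implements Node {
        final Node left, right;
        Mult(Node left, Node right) { this.left = left; this.right = right; }
        public void apply(Walker w) { w.caseMult(this); }
    }

    static final class Plus implements Node {
        final Node left, right;
        Plus(Node left, Node right) { this.left = left; this.right = right; }
        public void apply(Walker w) { w.casePlus(this); }
    }

    /** Toy stand-in for a depth-first adapter: the case methods drive the descent. */
    static class Walker {
        final List<String> log = new ArrayList<>();
        void caseNum(Num n) { log.add(String.valueOf(n.value)); }
        void caseMult(Mult n) { n.left.apply(this); n.right.apply(this); outMult(n); }
        void casePlus(Plus n) { n.left.apply(this); n.right.apply(this); outPlus(n); }
        void outMult(Mult n) { }
        void outPlus(Plus n) { }
    }

    private static List<String> walk(Walker w) {
        // Tree for 45 * 5 + 2
        Node tree = new Plus(new Mult(new Num(45), new Num(5)), new Num(2));
        tree.apply(w);
        return w.log;
    }

    /** Overrides only the out methods: the whole tree is still visited. */
    public static List<String> overridingOut() {
        return walk(new Walker() {
            @Override void outMult(Mult n) { log.add("*"); }
            @Override void outPlus(Plus n) { log.add("+"); }
        });
    }

    /** Overrides casePlus without descending: the Mult node below is never reached. */
    public static List<String> overridingCase() {
        return walk(new Walker() {
            @Override void casePlus(Plus n) { log.add("+"); }
            @Override void outMult(Mult n) { log.add("*"); }
        });
    }

    public static void main(String[] args) {
        System.out.println(overridingOut());  // [45, 5, *, 2, +]
        System.out.println(overridingCase()); // [+]
    }
}
```

If the Interpreter in use overrides caseAPlusExpr rather than outAPlusExpr, this would be consistent with caseAMultFactor never being hit for "45 * 5 + 2".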
UPDATE 2015-03-09
First, please copy/paste this grammar into a file named postfix.grammar. It should be the same as the one you have, but just copy/paste anyway:
Package postfix;
Tokens
number = ['0' .. '9']+;
plus = '+';
minus = '-';
mult = '*';
div = '/';
mod = '%';
l_par = '(';
r_par = ')';
blank = (' ' | 13 | 10)+;
Ignored Tokens
blank;
Productions
expr =
{factor} factor |
{plus} expr plus factor |
{minus} expr minus factor;
factor =
{term} term |
{mult} factor mult term |
{div} factor div term |
{mod} factor mod term;
term =
{number} number |
{expr} l_par expr r_par;
Next, run this from a command line (make any necessary directory changes, of course):
java -jar "C:\Program Files\Java\sablecc-3.2\lib\sablecc.jar" src\postfix.grammar
Please ensure that you only have the Java classes from this invocation of SableCC (i.e. make sure any previously generated Java classes are deleted). Then using the Compiler class that I previously posted, try again. I cannot think of any problem with the grammar or problem with version 3.2 of SableCC that would cause the problem you're having. I'm hoping a fresh start will fix the problem.