This does not compile in ANTLR 4:
Number options { backtrack=true; }
: (IntegerLiteral Range)=> IntegerLiteral { $type = IntegerLiteral; }
| (FloatLiteral)=> FloatLiteral { $type = FloatLiteral; }
| IntegerLiteral { $type = IntegerLiteral; }
;
because of backtrack=true. What happened to it?
What should I use in ANTLR 4 instead?
At the moment, there are no rule-level options in ANTLR v4. Note that backtrack=true is no longer needed since the new parsing algorithm has no need for backtracking. Also note that in ANTLR v3, backtrack=true was not valid inside lexer rules, only parser rules.
I'm playing around with Antlr, designing a toy language, which I think is where most people start! - I had a question on how best to think about switching on token type.
Consider a 'function call' in the language, where a function can consume a string, a number or a variable, as in the example below (project() is the function call):
project("ABC") vs project(123) vs project($SOME_VARIABLE)
I have the alternation operator in my grammar, so the grammar parses the right thing, but in the visitor code it would be nice to tell the three versions above apart.
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    try {
        s1 = ctx.STRING_LITERAL().getText();
    } catch (Exception e) {}
    try {
        s2 = ctx.NUM().getText();
    } catch (Exception e) {}
    System.out.println("Created Project via => " + ctx.getChild(1).toString());
}
The code above worked: depending on whether s1 or s2 is null, I can infer how I was called (with a string literal or a number; I haven't shown the variable case above). Still, I'm interested in whether there is a better or more elegant way, for example switching on the token type inside the visitor code to actually process the language.
The grammar I had for the above was
createproj: 'project('WS?(STRING_LITERAL|NUM)')';
When I use the IntelliJ ANTLR plugin, it seems to know the token type of the argument to the project() function, but I don't seem to be able to get to it from my code.
You could do something like this:
createproj
: 'project' '(' WS? param ')'
;
param
: STRING_LITERAL
| NUM
;
and in your visitor code:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    switch (ctx.param().start.getType()) {
        case YourLexerName.STRING_LITERAL:
            ...
        case YourLexerName.NUM:
            ...
        ...
    }
}
So by inlining the token in my original grammar, I've lost the opportunity to inspect it in the visitor code?
Not really, you could also do it like this:
createproj
: 'project' '(' WS? param_token=(STRING_LITERAL | NUM) ')'
;
and could then do this:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    switch (ctx.param_token.getType()) {
        case YourLexerName.STRING_LITERAL:
            ...
        case YourLexerName.NUM:
            ...
        ...
    }
}
Just make sure you don't mix lexer rules (tokens) and parser rules in your set param_token=( ... ). When it's a parser rule, ctx.param_token.getType() will fail (it must then be ctx.param_token.start.getType()). That is why I recommended adding an extra parser rule, because this would then still work:
param
: STRING_LITERAL
| NUM
| some_parser_rule
;
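The switch-on-type pattern from the answers above can also be tried out without ANTLR. The following is only a stand-alone sketch: Tok, the STRING_LITERAL and NUM constants, and describe are hypothetical stand-ins for ANTLR's Token objects and the constants a generated lexer would define, not part of any real API.

```java
// Stand-alone sketch: dispatch on a token's type instead of probing with try/catch.
// Tok and the type constants are hypothetical stand-ins for ANTLR's Token
// and for the constants a generated lexer would define.
class TokenTypeDispatch {
    static final int STRING_LITERAL = 1;
    static final int NUM = 2;

    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // One branch per token type; the default branch catches anything unexpected.
    static String describe(Tok t) {
        switch (t.type) {
            case STRING_LITERAL:
                return "string argument " + t.text;
            case NUM:
                return "numeric argument " + t.text;
            default:
                return "unexpected token type " + t.type;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Tok(STRING_LITERAL, "\"ABC\"")));
        System.out.println(describe(new Tok(NUM, "123")));
    }
}
```

In a real visitor the token would come from ctx.param_token or ctx.param().start, as shown in the answers, rather than being constructed by hand.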
I have a simple grammar like so:
grammar Test;
generator : expression;
expression
: NUMBER # Number
| ID # String
| expression '+' expression # Add
;
NUMBER: [0-9]+ [0-9]*;
ID : [a-zA-Z_]+ [a-zA-Z0-9_]* ;
I want the expression 5xx to be considered an error (since it should be 5+xx or 5 or xx). ANTLR 4.6 reports an error for it, but ANTLR 4.7 does not.
Here's my full test:
@Test
public void doATest() {
    TestLexer lexer = new TestLexer(new ANTLRInputStream("5xx"));
    TestParser parser = new TestParser(new CommonTokenStream(lexer));
    // Walk the tree and throw if there are any error nodes.
    ParseTreeWalker.DEFAULT.walk(new TestBaseListener() {
        @Override public void visitErrorNode(ErrorNode node) {
            // Throws with 4.6, not with 4.7
            throw new RuntimeException("Hit error node: " + node);
        }
    }, parser.generator());
}
The other odd observation is that including the expression '+' expression rule matters: without it, 4.6 won't generate an error either.
Is there some special flag that I need to set somewhere to indicate that an input stream should be exactly one generator and not have any trailing tokens?
Yes, that's exactly what the EOF token does:
generator : expression EOF;
This way you'll always get an error on extra tokens, regardless of the version of ANTLR or whether or not you include the expression '+' expression rule.
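To see why the trailing EOF matters, here is a small hand-written sketch (not ANTLR-generated code) of a parser for the same expression shape. The names EofDemo, tokenize, parseExpression and accepts are made up for this illustration. Without the end-of-input check, the parser is satisfied after matching a valid prefix, which is roughly what a start rule without EOF is allowed to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch, not generated by ANTLR: shows why a start rule should end in EOF.
class EofDemo {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isIdStart(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }

    // Split the input into NUMBER, ID and '+' tokens, mirroring the lexer rules of the Test grammar.
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            int start = i;
            char c = in.charAt(i);
            if (isDigit(c)) {
                while (i < in.length() && isDigit(in.charAt(i))) i++;
            } else if (isIdStart(c)) {
                while (i < in.length() && (isIdStart(in.charAt(i)) || isDigit(in.charAt(i)))) i++;
            } else if (c == '+') {
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            out.add(in.substring(start, i));
        }
        return out;
    }

    // expression : atom ('+' atom)* ; returns how many tokens were consumed.
    static int parseExpression(List<String> toks) {
        if (toks.isEmpty() || toks.get(0).equals("+")) {
            throw new IllegalStateException("expected NUMBER or ID");
        }
        int pos = 1;
        while (pos + 1 < toks.size() && toks.get(pos).equals("+") && !toks.get(pos + 1).equals("+")) {
            pos += 2;
        }
        return pos;
    }

    // With requireEof the whole token list must be consumed, like "expression EOF".
    static boolean accepts(String input, boolean requireEof) {
        List<String> toks = tokenize(input);
        int consumed = parseExpression(toks);
        return !requireEof || consumed == toks.size();
    }

    public static void main(String[] args) {
        System.out.println(accepts("5xx", false));  // true: "5" alone is a valid expression
        System.out.println(accepts("5xx", true));   // false: "xx" is left over
        System.out.println(accepts("5+xx", true));  // true: every token is consumed
    }
}
```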
I'm trying to implement a lexer rule for Oracle's Q-quoted string mechanism, where we have something like q'$some string$'.
Here you can have any character in place of $ other than whitespace, (, {, [, <, but the string must start and end with the same character. Some examples of accepted tokens would be:
q'!some string!'
q'ssome strings'
Notice how s is the custom delimiter, yet it is fine to have s inside the string as well, because the literal only ends at s'.
Here's how I was trying to implement the rule:
Q_QUOTED_LITERAL: Q_QUOTED_LITERAL_NON_TERMINATED . QUOTE-> type(QUOTED_LITERAL);
Q_QUOTED_LITERAL_NON_TERMINATED:
Q QUOTE ~[ ({[<'"\t\n\r] { setDelimChar( (char)_input.LA(-1) ); }
( . { !isValidEndDelimChar() }? )*
;
I have already checked the value returned by !isValidEndDelimChar(), and the predicate does evaluate to false at the right place, so everything should work, but ANTLR simply ignores the predicate. I've also tried moving the predicate around, putting that part in a separate rule, and a bunch of other things. After a day and a half of research I'm finally raising this issue.
I have also tried to implement it in other ways, but there doesn't seem to be a way to implement a string with a custom delimiter character in ANTLR 4 (the ANTLR 3 version used to work).
Not sure why the { ... } action isn't invoked, but it's not needed. The following grammar worked for me (note that the predicate goes in front of the . wildcard):
grammar Test;
@lexer::members {
  boolean isValidEndDelimChar() {
    return (_input.LA(1) == getText().charAt(2)) && (_input.LA(2) == '\'');
  }
}
parse
: .*? EOF
;
Q_QUOTED_LITERAL
: 'q\'' ~[ ({[<'"\t\n\r] ( {!isValidEndDelimChar()}? . )* . '\''
;
SPACE
: [ \t\f\r\n] -> skip
;
If you run the class:
import org.antlr.v4.runtime.*;

public class Main {
    public static void main(String[] args) {
        Lexer lexer = new TestLexer(CharStreams.fromString("q'ssome strings' q'!foo!'"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        for (Token t : tokens.getTokens()) {
            System.out.printf("%-20s %s\n", TestLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
        }
    }
}
the following output will be printed:
Q_QUOTED_LITERAL     q'ssome strings'
Q_QUOTED_LITERAL     q'!foo!'
EOF                  <EOF>
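The matching rule that the predicate enforces (the literal ends at the first occurrence of the opening delimiter followed by a single quote) can be sketched in plain Java, independent of ANTLR. QQuoteScanner and matchQQuoted are hypothetical names used only for this illustration, and the set of disallowed delimiters is copied from the lexer rule above.

```java
// Plain-Java sketch of the q'<delim>...<delim>' matching rule; no ANTLR involved.
class QQuoteScanner {
    // Characters that may not serve as the delimiter, copied from the lexer rule above.
    static final String FORBIDDEN = " ({[<'\"\t\n\r";

    // Returns the index just past the literal that starts at 'start', or -1 if none matches.
    static int matchQQuoted(String s, int start) {
        if (!s.startsWith("q'", start) || start + 2 >= s.length()) {
            return -1;
        }
        char delim = s.charAt(start + 2);
        if (FORBIDDEN.indexOf(delim) >= 0) {
            return -1;
        }
        // Same check as isValidEndDelimChar(): current char is the delimiter and the next one is a quote.
        for (int i = start + 3; i + 1 < s.length(); i++) {
            if (s.charAt(i) == delim && s.charAt(i + 1) == '\'') {
                return i + 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "q'ssome strings'";
        int end = matchQQuoted(input, 0);
        System.out.println(input.substring(0, end)); // prints q'ssome strings'
    }
}
```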
I am not sure but I think the Antlr backtrack option is not working properly or something...
Here is my grammar:
grammar Test;
options {
backtrack=true;
memoize=true;
}
prog: (code)+;
code
: ABC {System.out.println("ABC");}
| OTHER {System.out.println("OTHER");}
;
ABC : 'ABC';
OTHER : .;
If the input stream is "ABC" then I'll see ABC printed.
If the input stream is "ACD" then I'll see 3 times OTHER printed.
But if the input stream is "ABD" then I'll see
line 1:2 mismatched character 'D' expecting 'C'
line 1:3 required (...)+ loop did not match anything at input ''
but I expect to see three times OTHER, since the input should match the second rule if the first rule fails.
That doesn't make any sense to me. Why didn't the parser backtrack when it saw that the last character was not 'C'? It had no problem with "ACD".
Could someone please help me solve this issue???
Thanks for your time!!!
The option backtrack=true applies to parser rules only, not lexer rules.
EDIT
The only workaround I am aware of is to let "AB" followed by any character other than "C" be matched in the same ABC rule, and then to emit the OTHER tokens manually.
A demo:
grammar Test;
@lexer::members {
  List<Token> tokens = new ArrayList<Token>();
  public void emit(int type, String text) {
    state.token = new CommonToken(type, text);
    tokens.add(state.token);
  }
  public Token nextToken() {
    super.nextToken();
    if(tokens.size() == 0) {
      return Token.EOF_TOKEN;
    }
    return tokens.remove(0);
  }
}
prog
: code+
;
code
: ABC {System.out.println("ABC");}
| OTHER {System.out.println("OTHER");}
;
ABC
: 'ABC'
| 'AB' t=~'C'
{
emit(OTHER, "A");
emit(OTHER, "B");
emit(OTHER, String.valueOf((char)$t));
}
;
OTHER
: .
;
Another solution, which might be simpler: I made use of "syntactic predicates".
grammar ABC;
@lexer::header {package org.inanme.antlr;}
@parser::header {package org.inanme.antlr;}
prog: (code)+ EOF;
code: ABC {System.out.println($ABC.text);}
| OTHER {System.out.println($OTHER.text);};
ABC : ('ABC') => 'ABC' | 'A';
OTHER : .;
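The behaviour the question expects (try the keyword first, otherwise fall back to a single-character token) can be written out by hand as below. This is only a sketch of the desired tokenization, not a description of how the ANTLR 3 lexer works internally. KeywordOrChar and tokenize are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the tokenization the question expects; not ANTLR-generated code.
class KeywordOrChar {
    // Emits "ABC" when the full keyword is present at the current position,
    // otherwise emits "OTHER" for a single character.
    static List<String> tokenize(String in) {
        List<String> types = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("ABC", i)) {
                types.add("ABC");
                i += 3;
            } else {
                types.add("OTHER");
                i++;
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ABC")); // [ABC]
        System.out.println(tokenize("ACD")); // [OTHER, OTHER, OTHER]
        System.out.println(tokenize("ABD")); // [OTHER, OTHER, OTHER]
    }
}
```

The important detail is that the check for the full keyword happens before anything is consumed, so a failed match costs nothing and the fallback sees the same starting position.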
I have a problem while building AST in ANTLR (I'm using ANTLR 3.2, ANTLRWorks 1.4).
This is my grammar:
classDeclaration
:
(
'class' n=IDENTIFIER ('extends' e=IDENTIFIER)?
'{'
…
'}'
)
-> ^(CLASSDECLARATION ^(NAME $n) ^(EXTENDS $e))
;
The problem occurs with the optional part of the class: ('extends' e=IDENTIFIER)?.
The grammar works well with this class declaration:
class Test1 extends AbstractTest1 {
…
}
It fails when I omit the extends part, as follows:
class Test2 {
…
}
ANTLR just stops before this fragment and gives this exception in console:
javax.swing.text.BadLocationException: Position not represented by view
How can I tell ANTLR to treat the rewrite rule ^(EXTENDS $e) as optional?
Got the problem solved. Nothing tricky: I just had to use the usual regex-style ? operator on the subtree:
^(EXTENDS $e)?