Antlr: how to switch on token type in Visitor implementation - antlr

I'm playing around with ANTLR, designing a toy language, which I think is where most people start. I have a question about how best to think about switching on token type.
Consider a 'function call' in the language, where a function can consume a string, a number, or a variable, for example (project() is the function call):
project("ABC") vs project(123) vs project($SOME_VARIABLE)
I have the alternation operator in my grammar, so the grammar parses the right thing, but in the visitor code it would be nice to tell the difference between the three versions above.
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    try {
        s1 = ctx.STRING_LITERAL().getText();
    } catch (Exception e) {}
    try {
        s2 = ctx.NUM().getText();
    } catch (Exception e) {}
    System.out.println("Created Project via => " + ctx.getChild(1).toString());
}
The code above worked: depending on whether s1 or s2 is null, I can infer how I was called (with a literal or a number; I haven't shown the variable case above). But I'm interested in whether there is a better or more elegant way, for example switching on token type inside the visitor code to actually process the language.
The grammar I had for the above was
createproj: 'project('WS?(STRING_LITERAL|NUM)')';
And when I use the IntelliJ ANTLR plugin, it seems to know the token type of the argument to the project() function, but I don't seem to be able to get to it from my code.

You could do something like this:
createproj
: 'project' '(' WS? param ')'
;
param
: STRING_LITERAL
| NUM
;
and in your visitor code:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    switch (ctx.param().start.getType()) {
        case YourLexerName.STRING_LITERAL:
            ...
        case YourLexerName.NUM:
            ...
        ...
    }
}
So by inlining the token in the grammar I had originally, I've lost the opportunity to inspect it in the visitor code?
Not really, you could also do it like this:
createproj
: 'project' '(' WS? param_token=(STRING_LITERAL | NUM) ')'
;
and could then do this:
@Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
    switch (ctx.param_token.getType()) {
        case YourLexerName.STRING_LITERAL:
            ...
        case YourLexerName.NUM:
            ...
        ...
    }
}
Just make sure you don't mix lexer rules (tokens) and parser rules in your set param_token=( ... ). When it's a parser rule, ctx.param_token.getType() will fail (it must then be ctx.param_token.start.getType()). That is why I recommended adding an extra parser rule, because this would then still work:
param
: STRING_LITERAL
| NUM
| some_parser_rule
;
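For illustration, here is a stand-alone sketch of the same dispatch that compiles without the generated parser. Tok and the two type constants are hypothetical stand-ins for ANTLR's Token and the constants of your generated lexer; in real visitor code you would switch on ctx.param().start.getType() or ctx.param_token.getType() as shown above.

```java
// Stand-alone sketch: Tok, STRING_LITERAL and NUM are hypothetical stand-ins
// for ANTLR's Token and the type constants of the generated lexer.
final class TokenSwitchSketch {
    static final int STRING_LITERAL = 1; // assumed values; ANTLR assigns the real ones
    static final int NUM = 2;

    // Minimal stand-in for a token: just a type and its text.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // One branch per token type, like the switch in visitCreateproj.
    static String describe(Tok paramToken) {
        switch (paramToken.type) {
            case STRING_LITERAL:
                // Drop the surrounding double quotes of the literal.
                return "string:" + paramToken.text.substring(1, paramToken.text.length() - 1);
            case NUM:
                return "number:" + Integer.parseInt(paramToken.text);
            default:
                throw new IllegalArgumentException("unexpected token type " + paramToken.type);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Tok(STRING_LITERAL, "\"ABC\"")));
        System.out.println(describe(new Tok(NUM, "123")));
    }
}
```

The default branch is a deliberate choice: if the grammar later gains a third alternative (the variable case, say), the visitor fails loudly instead of silently ignoring it.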

Xbase Interpreter: Could not access field on instance: null

I am testing the idea of making my DSL JVM-compatible, and I wanted to test the possibility of extending Xbase and using the interpreter. I have tried to make a minimal test project to use with the interpreter, but I am getting a runtime error. I think I understand the general concepts of adapting Xbase, but I am unsure about the setup and entry points for the interpreter, and I could not find any information about the error I am getting or how to resolve it. Here are the relevant files for my situation:
Text.xtext:
import "http://www.eclipse.org/xtext/xbase/Xbase" as xbase
import "http://www.eclipse.org/xtext/common/JavaVMTypes" as types
Program returns Program:
{Program}
'program' name=ID '{'
variables=Var_Section?
run=XExpression?
'}'
;
Var_Section returns VarSection:
{VarSection}
'variables' '{'
decls+=XVariableDeclaration+
'}'
;
@Override // Change syntax
XVariableDeclaration returns xbase::XVariableDeclaration:
    type=JvmTypeReference name=ID '=' right=XLiteral ';'
;
@Override // Do not allow declarations outside of variable region
XExpressionOrVarDeclaration returns xbase::XExpression:
    XExpression;
TestJvmModelInferrer:
def dispatch void infer(Program element, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(element.toClass(element.fullyQualifiedName)) [
documentation = element.documentation
if (element.variables !== null) {
for (decl : element.variables.decls) {
members += decl.toField(decl.name, decl.type) [
static = true
initializer = decl.right
visibility = JvmVisibility.PUBLIC
]
}
}
if (element.run !== null) {
members += element.run.toMethod('main', typeRef(Void::TYPE)) [
parameters += element.run.toParameter("args", typeRef(String).addArrayTypeDimension)
visibility = JvmVisibility.PUBLIC
static = true
body = element.run
]
}
]
}
Test case:
@Inject ParseHelper<Program> parseHelper
@Inject extension ValidationTestHelper
@Inject XbaseInterpreter interpreter

@Test
def void basicInterpret() {
    val result = parseHelper.parse('''
        program program1 {
            variables {
                int var1 = 0;
                double var2 = 3.4;
            }
            var1 = 13
        }
    ''')
    result.assertNoErrors
    var interpretResult = interpreter.evaluate(result.run)
    println(interpretResult.result)
}
Partial stack trace:
java.lang.IllegalStateException: Could not access field: program1.var1 on instance: null
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._assignValueTo(XbaseInterpreter.java:1262)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.assignValueTo(XbaseInterpreter.java:1221)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:1213)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:216)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:190)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:180)
The interpreter only supports expressions; it does not work with types that are created by a JvmModelInferrer. Your code tries to work with fields of such an inferred type.
Rather than using the interpreter, I'd recommend using an InMemoryCompiler in your test. The domainmodel example may serve as inspiration: https://github.com/eclipse/xtext-eclipse/blob/c2b15c3ec118c4c200e2b28ea72d8c9116fb6800/org.eclipse.xtext.xtext.ui.examples/projects/domainmodel/org.eclipse.xtext.example.domainmodel.tests/xtend-gen/org/eclipse/xtext/example/domainmodel/tests/XbaseIntegrationTest.java
You may find this project interesting, which (among other things) implements an interpreter for Xtend based on the Xbase interpreter. It might be a bit outdated, though, and will not fully support all Xtend concepts. But it could be a starting point, and your contributions are welcome :-)
https://github.com/kbirken/xtendency

No way to implement a q quoted string with custom delimiters in Antlr4

I'm trying to implement a lexer rule for Oracle's Q-quoted string mechanism, where we have something like q'$some string$'
Here you can have any character in place of $ other than whitespace, (, {, [, <, but the string must start and end with the same character. Some examples of accepted tokens would be:
q'!some string!'
q'ssome strings'
Notice how s is the custom delimiter but it is fine to have that in the string as well because we would only end at s'
Here's how I was trying to implement the rule:
Q_QUOTED_LITERAL: Q_QUOTED_LITERAL_NON_TERMINATED . QUOTE-> type(QUOTED_LITERAL);
Q_QUOTED_LITERAL_NON_TERMINATED:
Q QUOTE ~[ ({[<'"\t\n\r] { setDelimChar( (char)_input.LA(-1) ); }
( . { !isValidEndDelimChar() }? )*
;
I have already checked the value I get from !isValidEndDelimChar(), and the predicate evaluates to false at the right place, so everything should work, but ANTLR simply ignores the predicate. I've also tried moving the predicate around, putting that part in a separate rule, and a bunch of other things. After a day and a half of research, I'm finally raising this issue.
I have also tried to implement it in other ways but there doesn't seem to be a way to implement a custom char delimited string in antlr4 (The antlr3 version used to work).
Not sure why the { ... } action isn't invoked, but it's not needed. The following grammar worked for me (put the predicate in front of the .!):
grammar Test;
@lexer::members {
    boolean isValidEndDelimChar() {
        return (_input.LA(1) == getText().charAt(2)) && (_input.LA(2) == '\'');
    }
}
parse
: .*? EOF
;
Q_QUOTED_LITERAL
: 'q\'' ~[ ({[<'"\t\n\r] ( {!isValidEndDelimChar()}? . )* . '\''
;
SPACE
: [ \t\f\r\n] -> skip
;
If you run the class:
import org.antlr.v4.runtime.*;
public class Main {
public static void main(String[] args) {
Lexer lexer = new TestLexer(CharStreams.fromString("q'ssome strings' q'!foo!'"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-20s %s\n", TestLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
}
}
the following output will be printed:
Q_QUOTED_LITERAL     q'ssome strings'
Q_QUOTED_LITERAL     q'!foo!'
EOF                  <EOF>
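To see what the predicate is checking, here is a plain-Java sketch of the same end-of-literal test, written without ANTLR. The class and method names (QQuoteSketch, matchQQuoted) are made up for this illustration; the logic mirrors isValidEndDelimChar(): stop at the first position where the delimiter is immediately followed by a single quote.

```java
// Plain-Java sketch of the scan that the lexer rule performs.
// QQuoteSketch and matchQQuoted are hypothetical names, not ANTLR API.
final class QQuoteSketch {
    // Characters that may not be used as the custom delimiter
    // (same set as in the lexer rule's ~[ ... ] block).
    private static final String FORBIDDEN = " ({[<'\"\t\n\r";

    // Returns the index just past the closing quote of a q'...' literal
    // starting at `start`, or -1 if no well-formed literal starts there.
    static int matchQQuoted(String input, int start) {
        if (!input.startsWith("q'", start) || start + 2 >= input.length()) {
            return -1;
        }
        char delim = input.charAt(start + 2);
        if (FORBIDDEN.indexOf(delim) >= 0) {
            return -1;
        }
        // Same test as the predicate: current char is the delimiter
        // and the next char is the closing quote.
        for (int i = start + 3; i + 1 < input.length(); i++) {
            if (input.charAt(i) == delim && input.charAt(i + 1) == '\'') {
                return i + 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "q'ssome strings' q'!foo!'";
        int end = matchQQuoted(input, 0);
        System.out.println(input.substring(0, end));
    }
}
```

Because the scan only stops on the pair delimiter-plus-quote, the delimiter character can appear freely inside the string, which is the behaviour the question asks for.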

Capturing content which can start with Parser keywords in Xtext

The following is a simplified version of my actual grammar:
grammar org.hello.World
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate world "http://www.hello.org/World"
Model:
content=AnyContent greetings+=Greeting*;
AnyContent:
(ID | ANY_OTHER)*
;
Greeting:
'<hello>' name=ID '</hello>';
terminal ID:
('a'..'z'|'A'..'Z')+
;
terminal ANY_OTHER:
.
;
Using the above grammar, if my input is:
<hi><hello>world</hello>
then I get a syntax error saying: mismatched character 'i' expecting 'e' at column 2.
My requirement is that AnyContent should match "<hi>". Can anyone guide me on how to achieve that?
If you want to do this with Xtext, I advise you to split your problem in two. The first problem is syntactic: you need to parse your file. The second problem is semantic: you want to give "meaning" to your objects and determine which element contains which. Defining containers and containment for XML can't be done inside your grammar.
Make a custom Ecore model and a simple grammar with start and end tags. You don't really care about the names of your tags.
Example :
Model returns XmlFile: (StartTag|EndTag|Text)+;
Text returns Text: text=STRING;
StartTag returns StartTag: '<' name=ID '>';
EndTag returns EndTag: '</' name=ID '>';
Change the TokenSource. The token source feeds tokens to your parser. You can override the type of a token, or merge or split tokens.
The idea here is to merge all tokens that lie between a ">" and the next "</".
These tokens represent a Text, so you can create a single token for everything contained between those two delimiters. Example:
class CustomTokenSource extends XtextTokenStream {
    new(TokenSource tokenSource, ITokenDefProvider tokenDefProvider) {
        super(tokenSource, tokenDefProvider)
    }

    override LT(int k) {
        var Token token = super.LT(k)
        // tokenOverride is the hook for your custom merge/split logic
        if (token !== null && token.text !== null) token.tokenOverride(k)
        token
    }
}
In this example, you need to add your custom code in the method "tokenOverride".
Add your custom token source to your parser:
class XDSLParser extends DSLParser{
override protected XtextTokenStream createTokenStream(TokenSource tokenSource) {
return new CustomTokenSource(tokenSource, getTokenDefProvider());
}
}
Compute the containment: the containment of your elements can be computed after parsing. Once parsing is done, you can get your model and change it as you wish. To do so, override the method "doParse" of your parser "XDSLParser" as follows:
override protected IParseResult doParse(String ruleName, CharStream in, NodeModelBuilder nodeModelBuilder, int initialLookAhead) {
var IParseResult result = super.doParse( ruleName, in, nodeModelBuilder, initialLookAhead)
//Give you model
result.rootASTElement;
return result
}
Note: The model that you obtain after parsing will be flat. The XmlFile object will contain all the elements in the correct order. You need to write an algorithm to build the containment on your AST model.
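One possible shape for that containment algorithm is a stack of currently open tags. The sketch below is plain Java and does not use EMF or Xtext; Item, Kind and Node are hypothetical stand-ins for the flat StartTag/EndTag/Text objects and for whatever tree type you define in your Ecore model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: rebuild nesting from a flat, ordered list of start tags,
// end tags and text. All types here are hypothetical stand-ins.
final class ContainmentSketch {
    enum Kind { START, END, TEXT }

    static final class Item {
        final Kind kind;
        final String value; // tag name, or the text content
        Item(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Node(String name) { this.name = name; }
    }

    static Node build(List<Item> flat) {
        Node root = new Node("#root"); // artificial container for top-level elements
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        for (Item item : flat) {
            switch (item.kind) {
                case START: {
                    // A new element becomes a child of the innermost open element.
                    Node child = new Node(item.value);
                    open.peek().children.add(child);
                    open.push(child);
                    break;
                }
                case END: {
                    // An end tag must close the innermost open element.
                    if (open.size() == 1 || !open.peek().name.equals(item.value)) {
                        throw new IllegalStateException("unbalanced end tag: " + item.value);
                    }
                    open.pop();
                    break;
                }
                case TEXT:
                    open.peek().text.append(item.value);
                    break;
            }
        }
        if (open.size() != 1) {
            throw new IllegalStateException("unclosed tag: " + open.peek().name);
        }
        return root;
    }
}
```

For the flat sequence start(hi), start(hello), text(world), end(hello), end(hi), build returns a root whose single child hi contains hello, which in turn holds the text "world".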
This will require a lot of tweaking in the grammar due to the nature of the ANTLR lexer that Xtext uses. The lexer will not roll back for the keyword <hello>: as soon as it sees a < followed by an h, it'll try to consume the hello token. Something along these lines could work, though:
Model:
content=AnyContent greetings+=Greeting*;
AnyContent:
(ID | ANY_OTHER | '<' (ID | ANY_OTHER | '/' | '>') | '/' | '>' | 'hello')*
;
Greeting:
'<' 'hello' '>' name=ID '<' '/' 'hello' '>';
terminal ID:
('a'..'z'|'A'..'Z')+
;
terminal ANY_OTHER:
.
;
The approach won't scale to real-world grammars, but maybe it helps you get on a working track.

antlr rule boolean parameter showing up in syntactic predicate code one level higher, causing compilation errors

I have a grammar that can parse expressions like 1+2-4 or 1+x-y, creating an appropriate structure on the fly which later, given a Map<String, Integer> with appropriate content, can be evaluated numerically (after parsing is complete, i.e. for x or y only known later).
Inside the grammar, there are also places where an expression that can be evaluated on the spot, i.e. does not contain variables, should occur. I figured I could parse these with the same logic, adding a boolean parameter variablesAllowed to the rule, like so:
grammar MiniExprParser;
INT : ('0'..'9')+;
ID : ('a'..'z'| 'A'..'Z')('a'..'z'| 'A'..'Z'| '0'..'9')*;
PLUS : '+';
MINUS : '-';
numexpr returns [Double val]:
expr[false] {$val = /* some evaluation code */ 0.;};
varexpr /*...*/:
expr[true] {/*...*/};
expr[boolean varsAllowed] /*...*/:
e=atomNode[varsAllowed] {/*...*/}
(PLUS e2=atomNode[varsAllowed] {/*...*/}
|MINUS e2=atomNode[varsAllowed] {/*...*/}
)* ;
atomNode[boolean varsAllowed] /*...*/:
(n=INT {/*...*/})
|{varsAllowed}?=> ID {/*...*/}
;
result:
(numexpr) => numexpr {System.out.println("Numeric result: " + $numexpr.val);}
|varexpr {System.out.println("Variable expression: " + $varexpr.text);};
However, the generated Java code does not compile. In the part apparently responsible for the final rule's syntactic predicate, varsAllowed occurs even though the variable is never defined at this level.
/* ... */
else if ( (LA3_0==ID) && ((varsAllowed))) {
int LA3_2 = input.LA(2);
if ( ((synpred1_MiniExprParser()&&(varsAllowed))) ) {
alt3=1;
}
else if ( ((varsAllowed)) ) {
alt3=2;
}
/* ... */
Am I using it wrong? (I am using Eclipse's AntlrIDE 2.1.2 with ANTLR 3.5.2.)
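This problem comes from the predicate hoisting the parser uses for prediction: the gated predicate is copied into the decision code of the calling rule, where the rule parameter is not in scope. I encountered the same problem and ended up with a member var (or static var for the C target) instead of a parameter.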

How to get the evaluation result from the parser expression when using antlr 3?

I'm using ANTLR 3.5. I would like to build a grammar that evaluates boolean expressions like
x=true;
b=false;
c=true;
a=x&&b||c;
and get back the evaluation result via a Java call (for example, ExprParser.eval() on the input above would return true).
I'm looking forward to an example.
You can do something like below (using the context of the grammar that I linked to in the comments to the question):
First of all, declare a member to store the latest evaluation result:
@members {
    private int __value;
}
Then, set it whenever you compute something
stat: expr NEWLINE { __value = $expr.value; } | // rest of the stat entry
And, finally, return it when all the stats are computed:
// will return 0 if no expr blocks were evaluated
public prog returns [int value]: stat+ {$value = __value;};
In C#, I used a slightly different approach: I added an event to the parser and raised it whenever an expression result could be computed. A client can subscribe to this event and receive all the computation results.
@members
{
    public event Action<int> Computed;
}
stat: expr NEWLINE { Computed($expr.value); }
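Since the question asks for Java, the same subscribe-and-notify idea could look roughly like the sketch below. It is not generated ANTLR code: ComputedCallbackSketch is a hypothetical stand-in for the parser, and as a placeholder for real expression evaluation it only sums '+'-separated integers on each line. In an actual grammar, the listener list would live in @members and the stat action would call the notification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Sketch of the event approach in Java. This class is a hypothetical
// stand-in for the generated parser, not ANTLR output.
final class ComputedCallbackSketch {
    private final List<IntConsumer> listeners = new ArrayList<>();

    // Clients subscribe here, like "Computed += handler" in C#.
    void onComputed(IntConsumer listener) {
        listeners.add(listener);
    }

    // Placeholder for parsing: each non-empty line is one "stat" whose
    // value is the sum of its '+'-separated integers.
    void run(String program) {
        for (String line : program.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int value = 0;
            for (String term : line.split("\\+")) {
                value += Integer.parseInt(term.trim());
            }
            // Equivalent of raising the event inside the stat action.
            for (IntConsumer listener : listeners) {
                listener.accept(value);
            }
        }
    }

    public static void main(String[] args) {
        ComputedCallbackSketch parser = new ComputedCallbackSketch();
        parser.onComputed(v -> System.out.println("computed " + v));
        parser.run("1+2\n10+20+30\n");
    }
}
```

Compared with storing only the latest value in a member, a callback lets the client see every intermediate result without the grammar having to decide which one matters.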