How to highlight an Xtext cross-reference differently for targets of different types? - eclipse-plugin

I have an Xtext grammar which reads (in part):
grammar mm.ecxt.MMLanguage hidden(WS, COMMENT)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
...
Statement:
ConstantStatement |
VariableStatement |
LabeledStatement |
...
LabeledStatement:
EssentialHypothesisStatement |
...
ConstantStatement:
DOLLAR_C (constants+=ConstDecl)+ DOLLAR_DOT;
VariableStatement:
DOLLAR_V (variables+=VarDecl)+ DOLLAR_DOT;
EssentialHypothesisStatement:
name=LABEL DOLLAR_E (symbols+=[Decl|MATHSYMBOL])+ DOLLAR_DOT;
Decl: ConstDecl | VarDecl;
ConstDecl returns ConstDecl: name=MATHSYMBOL;
VarDecl returns VarDecl: name=MATHSYMBOL;
MATHSYMBOL: PARENOPEN | PARENCLOSE | QUESTIONMARK | COMPRESSED | TLABEL | WORD;
...
(The full grammar is MMLanguage.xtext from current commit 328a5e7 of https://github.com/marnix/metamath-eclipse-xtext/.)
My question: How do I highlight the symbols in an EssentialHypothesisStatement, by using a different color for constants and variables? So if the MATHSYMBOL refers to a ConstDecl, then it should be highlighted one way, and some other way for a VarDecl.
I've tried to create an ISemanticHighlightingCalculator in all kinds of ways, but I can't seem to detect what the actual reference type is, neither through the node model nor through the Ecore model. On the one hand, the grammar-related methods only tell me that the reference goes to a Decl. On the other hand, the Ecore model's EReferences tell me whether the target is a ConstDecl or a VarDecl, but there I can't find the location of the source MATHSYMBOL.
Note that I prefer to use the node model (as opposed to the Ecore model) since I also want to highlight comments, and for performance reasons I cannot afford multiple passes over the document.
What is a good/canonical/efficient/simple way to achieve this?

from EObject perspective have a look at org.eclipse.xtext.nodemodel.util.NodeModelUtils.findNodesForFeature(EObject, EStructuralFeature)
from the node model perspective you can use EObjectAtOffsetHelper
sample grammar
Model:
defs+=Def*
uses+=Use*
;
Def:
ADef | BDef;
ADef:
"adef" name=ID
;
BDef:
"bdef" name=ID
;
Use:
"use" def=[Def]
;
And here the Impl
public class MyDslSemanticHighlightingCalculator implements ISemanticHighlightingCalculator {
#Inject
private MyDslGrammarAccess ga;
#Inject
private EObjectAtOffsetHelper helper;
#Override
public void provideHighlightingFor(XtextResource resource, IHighlightedPositionAcceptor accoptor, CancelIndicator cancelIndicator) {
if (resource == null)
return;
IParseResult parseResult = resource.getParseResult();
if (parseResult == null || parseResult.getRootNode() == null)
return;
BidiTreeIterable<INode> tree = parseResult.getRootNode().getAsTreeIterable();
for (INode node : tree) {
if (cancelIndicator.isCanceled()) {
return;
}
if (node.getGrammarElement() instanceof CrossReference) {
if (ga.getUseAccess().getDefDefCrossReference_1_0() == node.getGrammarElement()) {
EObject target = helper.resolveElementAt(resource, node.getOffset());
if (target instanceof ADef) {
accoptor.addPosition(node.getOffset(), node.getLength(), HighlightingStyles.COMMENT_ID);
} else if (target instanceof BDef) {
accoptor.addPosition(node.getOffset(), node.getLength(), HighlightingStyles.STRING_ID);
}
}
}
}
}
}

Related

Antlr: how to switch on token type in Visitor implementation

I'm playing around with Antlr, designing a toy language, which I think is where most people start! - I had a question on how best to think about switching on token type.
consider a 'function call' in the language, where a function can consume a string, number or variable - for example like the below (project() is the function call)
project("ABC") vs project(123) vs project($SOME_VARIABLE)
I have the alteration operator in my grammar, so the grammar parses the right thing, but in the visitor code, it would be nice to tell the difference between the three versions of the above.
#Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
try {
s1 = ctx.STRING_LITERAL().getText();
}catch(Exception e){}
try{
s2 = ctx.NUM().getText();
}catch(Exception e){}
System.out.println("Created Project via => " + ctx.getChild(1).toString());
}
The code above worked, depending on whether s1 or s2 are null, I can infer how I was called (with a literal or a number, I haven't shown the variable case above), but I'm interested if there is a better or more elegant way - for example switching on token type inside the visitor code to actually process the language.
The grammar I had for the above was
createproj: 'project('WS?(STRING_LITERAL|NUM)')';
and when I use the intellij antlr plugin, it seems to know the token type of the argument to the project() function - but I don't seem to be able to get to it from my code.
You could do something like this:
createproj
: 'project' '(' WS? param ')'
;
param
: STRING_LITERAL
| NUM
;
and in your visitor code:
#Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
switch(ctx.param().start.getType()) {
case YourLexerName.STRING_LITERAL:
...
case YourLexerName.NUM:
...
...
}
}
so by inlining the token in the grammar I had originally, I've lost the opportunity to inspect it in the visitor code?
No really, you could also do it like this:
createproj
: 'project' '(' WS? param_token=(STRING_LITERAL | NUM) ')'
;
and could then do this:
#Override
public ASTRoot visitCreateproj(projectmgmtParser.CreateprojContext ctx) {
switch(ctx.param_token.getType()) {
case YourLexerName.STRING_LITERAL:
...
case YourLexerName.NUM:
...
...
}
}
Just make sure you don't mix lexer rules (tokens) and parser rules in your set param_token=( ... ). When it's a parser rule, ctx.param_token.getType() will fail (it must then be ctx.param_token.start.getType()). That is why I recommended adding an extra parser rule, because this would then still work:
param
: STRING_LITERAL
| NUM
| some_parser_rule
;

Xbase Interpreter: Could not access field on instance: null

I am testing the idea of making my dsl Jvm compatible and I wanted to test the possibility of extending Xbase and using the interpreter. I have tried to make a minimal test project to use with the interpreter but I am getting a runtime error. I think I understand the general concepts of adapting Xbase, but am unsure about how the setup/entrypoints for the interpreter and could not find any information regarding the error I am getting or how to resolve. Here are the relevant files for my situation:
Text.xtext:
import "http://www.eclipse.org/xtext/xbase/Xbase" as xbase
import "http://www.eclipse.org/xtext/common/JavaVMTypes" as types
Program returns Program:
{Program}
'program' name=ID '{'
variables=Var_Section?
run=XExpression?
'}'
;
Var_Section returns VarSection:
{VarSection}
'variables' '{'
decls+=XVariableDeclaration+
'}'
;
#Override // Change syntax
XVariableDeclaration returns xbase::XVariableDeclaration:
type=JvmTypeReference name=ID '=' right=XLiteral ';'
;
#Override // Do not allow declarations outside of variable region
XExpressionOrVarDeclaration returns xbase::XExpression:
XExpression;
TestJvmModelInferrer:
def dispatch void infer(Program element, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(element.toClass(element.fullyQualifiedName)) [
documentation = element.documentation
if (element.variables !== null) {
for (decl : element.variables.decls) {
members += decl.toField(decl.name, decl.type) [
static = true
initializer = decl.right
visibility = JvmVisibility.PUBLIC
]
}
}
if (element.run !== null) {
members += element.run.toMethod('main', typeRef(Void::TYPE)) [
parameters += element.run.toParameter("args", typeRef(String).addArrayTypeDimension)
visibility = JvmVisibility.PUBLIC
static = true
body = element.run
]
}
]
}
Test case:
#Inject ParseHelper<Program> parseHelper
#Inject extension ValidationTestHelper
#Inject XbaseInterpreter interpreter
#Test
def void basicInterpret() {
val result = parseHelper.parse('''
program program1 {
variables {
int var1 = 0;
double var2 = 3.4;
}
var1 = 13
}
''')
result.assertNoErrors
var interpretResult = interpreter.evaluate(result.run)
println(interpretResult.result)
Partial stack trace:
java.lang.IllegalStateException: Could not access field: program1.var1 on instance: null
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._assignValueTo(XbaseInterpreter.java:1262)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.assignValueTo(XbaseInterpreter.java:1221)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter._doEvaluate(XbaseInterpreter.java:1213)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.doEvaluate(XbaseInterpreter.java:216)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.internalEvaluate(XbaseInterpreter.java:204)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:190)
at org.eclipse.xtext.xbase.interpreter.impl.XbaseInterpreter.evaluate(XbaseInterpreter.java:180)
The interpreter does only support expressions, but does not work with types that are created by a JvmModelInferrer. Your code tries to work with fields of such an inferred type.
Rather than using the interpreter, I'd recommend to use an InMemoryCompiler in your test. The domainmodel example may serve as an inspiration: https://github.com/eclipse/xtext-eclipse/blob/c2b15c3ec118c4c200e2b28ea72d8c9116fb6800/org.eclipse.xtext.xtext.ui.examples/projects/domainmodel/org.eclipse.xtext.example.domainmodel.tests/xtend-gen/org/eclipse/xtext/example/domainmodel/tests/XbaseIntegrationTest.java
You may find this project interesting, which (among other stuff) implements an interpreter for Xtend based on the Xbase interpreter. It might be a bit outdated, though, and also will not fully support all Xtend concepts. But it could be a starting point, and your contrbutions are welcome :-)
https://github.com/kbirken/xtendency

specifying a grammar rule of "appearing in any order but only at most once"

Say I have three symbols A, B, C.
In ANTLR, how do I specify that in a sentence, A, B, and C can appear at most once and that they can occur in any order. (For eg., ABC, BCA are both legit)
I tried
(A | B | C)*
knowing it'd only take care of the "any order" part, but couldn't think of a way to say it can only appear at most once.
Edited: I've tried using boolean flags, which worked but seems too hairy - there has to be a simpler way, yes?
myrule;
{
boolean aSeen = false;
boolean bSeen = false;
boolean cSeen = false;
}
:
( A { if (aSeen) throw RuntimeException("alraedy seen") else aSeen = true; }
| B { if (bSeen) throw RuntimeException("alraedy seen") else bSeen = true; }
| C { if (cSeen) throw RuntimeException("alraedy seen") else cSeen = true; }
)*
;
Since you mention there could be many, many permutations, I would instead opt to keep the grammar dead simple and handle this in the visitor or listener, for example:
public class ValuesListener : ValuesBaseListener
{
bool isASeen = false; // "seen flag here"
public override void ExitA(ValuesParser.AContext context)
{
if (isASeen) // already parsed this once
<throw exception to stop and inform user>
else // first time parsing this, so process and set flag so it won't happen again
{
isASeen = true; // never gets reset during this tree walk
<perform normal processing here>
}
}
}
Then your grammar can be something like
myrule: someothertoken myRuleOptions* ;
myRuleOptions
: A
| B
| C
| ...etc.
My reason? There are ways to do it with predicates as suggested above, but for readability and maintainability by engineers inexperienced in ANTLR4 but very experienced in the target language, I would consider this approach. In my environment, I often hand off ANTLR projects to engineers who simply follow a pattern that I establish, and who don't really understand ANTLR. This is easier for them to follow.

ANTLR4 change listener during parse

I have an ANTLR4 listener which handles a standard and well-formed grammar, however am struggling with how to deal the non-standard implementations. Although all of the variants go through the lexer without problems the parse stage is a lot trickier.
A traditional way of doing this would be something like
// Header of document
variant = STANDARD;
if (header.indexOf("microsoft") != -1) {
variant = MICROSOFT;
} else if (header.indexOf("google") != -1) {
variant = GOOGLE;
}
...
// Parsing a particular element
if (variant.equals(MICROSOFT)) {
// Microsoft-specific stuff
} else if (variant.equals(GOOGLE)) {
// Google-specific stuff
} else {
// Standard stuff
}
but this quickly becomes unmaintainable. The obvious solution is to have a ParseTreeListener for the standard implementation and then subclass it for each variant, but I don't know which variant it is until I've started the parse.
So how can I either switch from one listener to another part-way through the parse, or restart the parse with a new listener once I know which variant I'm dealing with?
If these variants occur frequently, you might want to consider embedding custom code to handle context sensitive parsing by using predicates (the {...}? construct in the following pseudo grammar):
rule
: { boolean-expression-a }? a-alternative
| { boolean-expression-b }? b-alternative
| /* fall through */ not-a-or-b-alternative
;
Let's say you want to parse a file containing chunks. A chunk consists of a header and a data row. In the header you can set your variant. The data of a normal variant contains 3 NUMBERs, Google's variant contains 2 NUMBERs and Microsoft's variant contains a single NUMBER. An example of such a file would look like this:
header: none
data: 1 2 3
header: google
data: 4 5
header: microsoft
data: 6
And here's a demo of a context sensitive ANTLR v4 grammar able to parse this:
grammar T;
#parser::members {
enum Variant {
GOOGLE,
MICROSOFT,
OTHER;
public static Variant tryValueOf(String name) {
try {
return Variant.valueOf(name.toUpperCase());
}
catch(Exception e) {
return OTHER;
}
}
}
private Variant variant = Variant.OTHER;
}
parse
: chunk+ EOF
;
chunk
: header data
;
header
: K_HEADER COLON NAME {variant = Variant.tryValueOf($NAME.text);}
;
data
: {variant == Variant.MICROSOFT}? K_DATA COLON NUMBER #MicrosoftData
| {variant == Variant.GOOGLE}? K_DATA COLON NUMBER NUMBER #GoogleData
| K_DATA COLON NUMBER NUMBER NUMBER #OtherData
;
K_DATA : 'data';
K_HEADER : 'header';
NAME : [a-zA-Z]+;
NUMBER : [0-9]+;
COLON : ':';
SPACE : [ \t\r\n] -> skip;
Resulting in the following parse:

How do I rewrite a subtree with a composite root using ANTLR?

I have an antlr grammer with subtrees like this:
^(type ID)
that I want to convert to:
^(type DUMMY ID)
where type is 'a'|'b'.
Note: what I really want to do is convert anonymous instantiations to explicit by generating dummy names.
I've narrowed it down to the grammars below, but I'm getting this:
(a bar) (b bar)
got td
got bu
Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule type
at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:157)
at org.antlr.runtime.tree.RewriteRuleSubtreeStream.nextNode(RewriteRuleSubtreeStream.java:77)
at Pattern.bu(Pattern.java:382)
The error message continues. My debug so far:
The input made it through the initial grammar generating two trees. a bar and b bar.
The second grammar does match the trees. it's printing td and bu.
The rewrite crashes, but I have no idea why? What does RewriteEmptyStreamException mean.
What the proper way to do this kind of a rewrite?
My main grammer Rewrite.g:
grammar Rewrite;
options {
output=AST;
}
#members{
public static void main(String[] args) throws Exception {
RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.test().getTree();
System.out.println(tree.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
Pattern p = new Pattern(nodes);
CommonTree newtree = (CommonTree) p.downup(tree);
}
}
type
: 'a'
| 'b'
;
test : id+;
id : type ID -> ^(type ID["bar"]);
DUMMY : 'dummy';
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
and Pattern.g
tree grammar Pattern;
options {
tokenVocab = Rewrite;
ASTLabelType=CommonTree;
output=AST;
filter=true; // tree pattern matching mode
}
topdown
: td
;
bottomup
: bu
;
type
: 'a'
| 'b'
;
td
: ^(type ID) { System.out.println("got td"); }
;
bu
: ^(type ID) { System.out.println("got bu"); }
-> ^(type DUMMY ID)
;
to do compile:
java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Rewrite.g
java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Pattern.g
javac -cp ../jar/antlr-3.4-complete-no-antlrv2.jar *.java
java -classpath .:../jar/antlr-3.4-complete-no-antlrv2.jar RewriteParser
EDIT 1: I have also tried using antlr4 and I get the same crash.
There are two small problems to address to get the rewrite to work, one problem in Rewrite and the other in Pattern.
The Rewrite grammar produces ^(type ID) as root elements in the output AST, as shown in the output (a bar) (b bar). A root element can't be transformed because transforming is actually a form of child-swapping: the element's parent drops the element and replaces it with the new, "transformed" version. Without a parent, you'll get the error Can't set single child to a list. Adding the root is a matter of creating an imaginary token ROOT or whatever name you like and referencing it in your entry-level rule's AST generation like so: test : id+ -> ^(ROOT id+);.
The Pattern grammar, the one producing the error you're getting, is confused by the type rule: type : 'a' | 'b' ; as part of the rewrite. I don't know the low-level details here, but apparently a tree parser doesn't maintain the state of a visited root rule like type in ^(type ID) when writing a transform (or maybe it can't or shouldn't, or maybe it's some other limitation). The easiest way to address this is with the following two changes:
Let text "a" and "b" match rule ID in the lexer by changing rule type in Rewrite from type: 'a' | 'b'; to just type: ID;.
Let rule bu in Pattern match against ^(ID ID) and transform to ^(ID DUMMY ID).
Now with a couple of minor debugging changes to Rewrite's main, input "a foo\nb bar" produces the following output:
(ROOT (a foo) (b bar))
got td
got bu
(a foo) -> (a DUMMY foo)
got td
got bu
(b bar) -> (b DUMMY bar)
(ROOT (a DUMMY foo) (b DUMMY bar))
Here are the files as I've changed them:
Rewrite.g
grammar Rewrite;
options {
output=AST;
}
tokens {
ROOT;
}
#members{
public static void main(String[] args) throws Exception {
RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.test().getTree();
System.out.println(tree.toStringTree());
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
Pattern p = new Pattern(nodes);
CommonTree newtree = (CommonTree) p.downup(tree, true); //print the transitions to help debugging
System.out.println(newtree.toStringTree()); //print the final result
}
}
type : ID;
test : id+ -> ^(ROOT id+);
id : type ID -> ^(type ID);
DUMMY : 'dummy';
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
Pattern.g
tree grammar Pattern;
options {
tokenVocab = Rewrite;
ASTLabelType=CommonTree;
output=AST;
filter=true; // tree pattern matching mode
}
topdown
: td
;
bottomup
: bu
;
td
: ^(ID ID) { System.out.println("got td"); }
;
bu
: ^(ID ID) { System.out.println("got bu"); }
-> ^(ID DUMMY ID)
;
I don't have much experience with tree-patterns, with or without rewrites. But when using rewrite rules in them, I believe your options should also include rewrite=true;. The Definitive ANTLR Reference doesn't handle them, so I'm not entirely sure (have a look at the ANTLR wiki for more info).
However, for such (relatively) simple rewrites, you don't really need a separate grammar. You could make DUMMY an imaginary token and inject it in some other parser rule, like this:
grammar T;
options {
output=AST;
}
tokens {
DUMMY;
}
test : id+;
id : type ID -> ^(type DUMMY["dummy"] ID);
type
: 'a'
| 'b'
;
ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};
which would parse the input:
a bar
b foo
into the following AST:
Note that if your lexer is also meant to tokenize the input "dummy" as a DUMMY token, change the tokens { ... } block into this:
tokens {
DUMMY='dummy';
}
and you'd still be able to inject a DUMMY in other rules.