Xtext and ANTLR - Eclipse plugin

My current project focuses on code generation from a DSL (i.e., a high-level specification). More specifically, developers write high-level specifications, and these specifications are parsed to generate Java and Android code.
For parsing I use an ANTLR grammar, and for code generation I use StringTemplate files.
However, developers currently write the high-level specifications in Notepad, so I am not able to provide syntax highlighting, coloring, and error handling. To provide this support, I am thinking of using Xtext.
I am thinking of integrating Xtext as follows:
Developers will write high-level specifications in the editor support provided by Xtext (basically, I will write an Xtext grammar and generate the editor support). The Xtext editor will handle syntax coloring, syntax highlighting, and error handling.
I will then take these specifications as .txt input, have ANTLR parse the files, and generate the Java and Android code.
I need your suggestions on the following questions:
(1) How can I extract the files written in the Xtext editor and provide them as input to the ANTLR parser? OR (2) Should I stick with Xtext and try to integrate the ANTLR parser with it? (Kindly advise how I could integrate Xtext and ANTLR, with a simple example.) OR (3) Should I use only ANTLR and StringTemplate files and try to create an Eclipse plugin out of that?
Other alternative suggestions are also welcome.

You don't need to integrate XText and ANTLR; XText already uses ANTLR for actual parsing.

Xtext is based on ANTLR, so there is no need to integrate ANTLR and Xtext.
I advise you to create an Xtext project in Eclipse and to generate the artifacts using the mwe2 file. Then, in the src-gen folder, you can find the ANTLR grammar generated from your Xtext grammar.
If you want to generate code from your Xtext grammar, you can use Xtend; it already provides everything that you need. See: https://eclipse.org/Xtext/documentation/207_template.html.
Otherwise, if you already have an ANTLR grammar and a generator, you will need to (re)write it in Xtext.

For example:

import java.io.InputStream;

import org.eclipse.core.commands.AbstractHandler;
import org.eclipse.core.commands.ExecutionEvent;
import org.eclipse.core.commands.ExecutionException;
import org.eclipse.core.resources.IFile;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.jface.viewers.ISelection;
import org.eclipse.jface.viewers.IStructuredSelection;
import org.eclipse.ui.handlers.HandlerUtil;

public class CustomGenerator extends AbstractHandler {
    @Override
    public Object execute(ExecutionEvent event) throws ExecutionException {
        ISelection selection = HandlerUtil.getCurrentSelection(event);
        // If your selection is an IFile
        // (selection from the Project Explorer)
        if (selection instanceof IStructuredSelection) {
            IStructuredSelection structuredSelection = (IStructuredSelection) selection;
            Object element = structuredSelection.getFirstElement();
            if (element instanceof IFile) {
                IFile file = (IFile) element;
                try {
                    InputStream contentOfYourFile = file.getContents();
                    // do your job with the file contents
                } catch (CoreException e) {
                    throw new ExecutionException("Could not read file", e);
                }
            }
        }
        return null;
    }
}
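Once you have the InputStream from the IFile, reading it into a String is all that is needed before handing it to an existing ANTLR parser. A minimal sketch of that reading step, using only the JDK (the SpecReader name and the commented-out parser hookup are placeholders, not part of any Eclipse or ANTLR API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the editor file's contents into a String,
// which can then be wrapped in an ANTLR input stream and passed to the
// parser you already have.
public class SpecReader {
    public static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString(StandardCharsets.UTF_8.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for IFile.getContents() from the handler above
        InputStream contents = new ByteArrayInputStream(
                "entity Person { name }".getBytes(StandardCharsets.UTF_8));
        String spec = readAll(contents);
        System.out.println(spec);
        // spec can now be fed to your existing ANTLR lexer/parser, e.g.
        // new MyLexer(new ANTLRStringStream(spec));  // hypothetical names
    }
}
```

This answers question (1) directly: the Xtext editor saves ordinary workspace files, so extracting their text and forwarding it to your current ANTLR pipeline requires no special integration.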

Related

Xtext based language within Intellij Idea

I want to make a plugin for a language for the IntelliJ IDEA IDE. The language has been developed using Eclipse Xtext and is open source. A plugin already exists for Eclipse.
My goal is to port this language to IntelliJ IDEA. I want to be able to use IntelliJ to create source files, to have the specific syntax highlighting, and to be able to compile and run programs written in this language.
Is there a simple way to generate the IntelliJ IDEA plugin using the Xtext project?
If not, is there an efficient solution for getting the specific syntax highlighting in IntelliJ? (An automatic way, if possible; I would prefer not to rewrite everything every time the Xtext project is updated.)
Short answer
Yes, with a bit of work.
Long Answer
Sadly, Xtext uses ANTLR in the background, while IntelliJ uses its own Grammar-Kit based on parsing expression grammars. As such, the parsing and editor code generated by Xtext, as you might have guessed, will not work.
In order to get your language working in IntelliJ you will need to:
1. Create a grammar *.bnf file
2. Generate the lexer *.flex file, possibly tweak it, and then run the JFlex generator
3. Implement helper classes to provide, among other things, file recognition via file extension, syntax highlighting, a color settings page, folding, etc.
The *.flex file is generated from the bnf. Luckily, most of the classes in step 3 follow a very similar structure, so they can be easily generated (more on that later). So basically, if you manage to generate the *.bnf file, you are 80% there.
Although they come from different technologies, the syntax of bnf files is very similar to that of Xtext files. I recently migrated some ANTLR grammars to IntelliJ's bnf and had to make only very small changes. Thus, it should be possible to autogenerate the bnf files from your Xtext ones.
That brings me back to point 3. Using Xtend, Epsilon's EGL, or similar, it would be easy to generate all the boilerplate classes. As part of the migration I mentioned before, I did this as well. I am in the process of making the code public, so I will post it here when done and add some details.
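To illustrate the kind of boilerplate generation meant in point 3, here is a minimal sketch in plain Java (the class names, the template, and the list of class kinds are invented for illustration; a real generator in Xtend or EGL would be driven by the Xtext grammar model rather than a hardcoded list):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: generating the repetitive IntelliJ plugin helper classes from
// a simple template. Real template engines work the same way: one
// template per class kind, parameterized by the language name.
public class PluginBoilerplateGenerator {
    static final List<String> CLASS_KINDS = Arrays.asList(
            "FileType", "SyntaxHighlighter", "ColorSettingsPage", "FoldingBuilder");

    static String generate(String languageName, String kind) {
        return "public class " + languageName + kind + " extends " + kind + "Base {\n"
             + "    // generated for " + languageName + "\n"
             + "}\n";
    }

    public static void main(String[] args) {
        for (String kind : CLASS_KINDS) {
            System.out.println(generate("MyDsl", kind));
        }
    }
}
```

Because the helper classes differ only in a few identifiers, a generator like this only needs to be written once and re-run whenever the Xtext project is updated.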

"Human-readable" ANTLR-generated code?

I've been learning ANTLR for a few days now. My goal in learning it was to generate parsers and lexers and then personally hand-translate them from Java into my target language (not C/C++/Java/C#/Python; no tool supports it). I chose ANTLR because its About page says: ANTLR is widely used because it's easy to understand, powerful, flexible, generates human-readable output[...]
In learning this tool, I decided to start with a simple lexer for a simple grammar: JSON. However, once I generated the .java file for this lexer using ANTLR 4, I was caught wildly off-guard. I got a huge mess of far-from-human-readable serialized code, followed by:
public static final ATN _ATN =
    ATNSimulator.deserialize(_serializedATN.toCharArray());
static {
    _decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
}
A few Google searches were unable to provide me a way to disable this behavior.
Is there a way to disable this behavior and produce human-readable code, or am I going to have to hand-write my lexers and parsers for this target programming language?
ANTLR 4 uses a new algorithm for prediction. Terence Parr is currently working on a tech report describing the algorithm in detail. The human-readable output refers to the generated parsers.
ANTLR 4 lexers use a DFA recognizer for a massive speed and memory usage improvement over previous releases of ANTLR. For parsers, the _ATN field is a data structure used within calls to adaptivePredict (you'll notice lines in the generated code calling that method).
You won't be able to manually translate the generated Java code of an ANTLR 4 lexer to another programming language. You might be able to manually translate the code of a generated parser provided the grammar is strictly LL(1) (i.e. the generated code does not contain any calls to adaptivePredict). However, you will lose the error recovery ability that draws from information encoded in the serialized ATN.
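For contrast, this is roughly the shape of a hand-written lexer for a small fragment of JSON. It is not what ANTLR 4 emits, but it shows the kind of straightforward code you would end up writing (or translating, for a strictly LL(1) grammar) in a target language no tool supports:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-written lexer for a JSON fragment: braces, brackets,
// commas, colons, and double-quoted strings (no escapes). Code like
// this is trivial to port by hand to any target language.
public class TinyJsonLexer {
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace
            } else if ("{}[],:".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c)); // single-character token
                i++;
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1); // find closing quote
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                tokens.add(input.substring(i, end + 1));
                i = end + 1;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"key\": \"value\"}"));
        // prints: [{, "key", :, "value", }]
    }
}
```

The trade-off is exactly the one described above: code this simple gives up ANTLR 4's adaptive prediction and its ATN-based error recovery.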

Purpose of antlr in xtext

I'm new to Xtext and wondering what the purpose of ANTLR in Xtext is. As far as I understand, ANTLR generates a parser based on the grammar, and the parser then deals with the text models. Right?
And what about the other generated artifacts, like the editor or the Ecore model? Are there other components behind Xtext that generate them?
Xtext needs a parser generator to produce a parser for the language you define. They could have built one of their own. They chose to use ANTLR instead.
I don't know what other third party machinery they might have chosen to use.
I've been hacking on one Xtext-based plugin, and from what I saw I think it works like this:
Xtext has its own BNF syntax, which is very similar to ANTLR's. In fact, it's a subset of it.
Xtext takes your grammar and generates an ANTLR grammar (.g file) from it. The generated ANTLR grammar adds specific actions to your BNF rules. The action code interacts with the Xtext runtime and (maybe) with Eclipse itself. The .g file is processed using an older version of ANTLR, and a .java file is generated. This file is then compiled.
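To make that pipeline concrete, here is a tiny Xtext rule (from the canonical "Hello World" example in the Xtext documentation) next to the rough shape of the ANTLR rule Xtext generates from it. The generated rule shown is illustrative only, not the exact output; the real .g file contains considerably more action code wiring into the Xtext runtime:

```
// Xtext rule:
Greeting:
    'Hello' name=ID '!';

// Approximate shape of the generated ANTLR rule in the .g file;
// the embedded action builds the EMF model object and sets its
// 'name' feature via the Xtext runtime.
ruleGreeting returns [EObject current=null]:
    'Hello' lv_name=RULE_ID { /* create Greeting, set 'name' */ } '!';
```

So ANTLR is responsible only for recognizing the text; everything model-related happens in the injected actions, and the editor and Ecore artifacts are produced by other Xtext generator fragments.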

Is there a way to use one ANTLR grammar that targets multiple languages?

I am developing a language service in Visual Studio using an ANTLR grammar for a custom language. However, the grammar is filled with C++ code to handle preprocessor directives and for compiler efficiency reasons.
Language services for Visual Studio are a pain to write in C++, so I need a C# parser for the same language. That means I have to set language=CSharp2 and strip all the C++ code from the grammar.
I am thinking of writing a little exporter that strips away all the C++ code from the grammar and converts simple statements like { $channel = HIDDEN; } to { $channel = TokenChannels.Hidden; }.
Is there a cleverer method to do this? For example, through templates, or little tricks to embed both languages in the grammar?
I'd break the problem up into two phases using an AST. Keep your parser in a target-language-neutral grammar (one that produces an AST) and use ANTLR's target option to generate the parser in the target language of your choice (C++, C#, Java, etc.).
Then implement AST walkers in each target language with your actions. The benefit of this is that once you get one AST walker finished, you can copy it and just change the actions for another target language.
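A sketch of that separation in Java (the Node and walker types are invented for illustration; a real ANTLR setup would use tree grammars, or listeners/visitors in ANTLR 4):

```java
import java.util.Arrays;
import java.util.List;

// Language-neutral AST plus a walker whose action (here, rendering to a
// StringBuilder) is isolated in one place. Porting to another target
// language means rewriting only the action, not the traversal.
public class AstWalkerDemo {
    // Minimal AST node: a label and a list of children.
    static class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // The walker: this traversal structure stays identical across
    // targets; only the per-node action differs per language.
    static void walk(Node node, StringBuilder out) {
        out.append(node.label);
        if (!node.children.isEmpty()) {
            out.append("(");
            for (int i = 0; i < node.children.size(); i++) {
                if (i > 0) out.append(",");
                walk(node.children.get(i), out);
            }
            out.append(")");
        }
    }

    public static void main(String[] args) {
        Node ast = new Node("+", new Node("1"), new Node("2"));
        StringBuilder out = new StringBuilder();
        walk(ast, out);
        System.out.println(out); // prints: +(1,2)
    }
}
```

The grammar itself then contains no embedded C++ or C# at all, which sidesteps the language=CSharp2 stripping problem entirely.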

Does anyone know of a way to debug tree grammars in ANTLRWorks

The recommended pattern for ANTLR usage is to have the Parser construct an Abstract Syntax Tree, and then build Tree walkers (AKA tree grammars) to process them.
I'm trying to get to the bottom of why my tree grammar isn't working and would love to use ANTLRWorks' debugger the same way I used it for the parser itself. The input to the parser is the "source code", but the input to a tree parser is the AST result of the parser. I don't see how to make that available as input to test the tree grammar.
It's not clear that there is a way to test a tree grammar in ANTLRWorks. If it can be done, a pointer in the right direction would really be appreciated.
The ANTLRWorks debugger should work fine with your tree grammar. If I recall correctly, you need to use the ANTLR code generation tool with the "-debug" flag (I'm using the Java target), then, where you create your tree parser instance, use the debug constructor that takes a port as an argument. In my case, the default port didn't work, so I arbitrarily picked 35505.
Fire up ANTLRWorks, open your tree grammar, click "Run"->"Debug Remote...", set the port to the same value used in the constructor for your tree parser, and you should be able to connect the debugger to your running application. See the ANTLR 3 Debugging FAQ for details.
[Update] Assuming you're using the Java target (let us know if that's not the case), here's more detailed information on getting started:
When you're testing your non-tree parser in ANTLRWorks, there's a behind-the-scenes process that generates Java code from your grammar file, then uses that code to parse your input. When you use your parser in your own application, you have to use ANTLR (specifically, the class org.antlr.Tool) to generate Java code that you can then include in your application. ANTLRWorks has a menu option for this, which should get you started. In my case, I have a target in my ant build file that generates the Java code from my grammars and puts those Java source files in a place where the rest of my application can find them. My ant target looks something like this:
<java classpath="${antlr.tool.classpath}" classname="org.antlr.Tool" failonerror="true">
    <arg value="-o" />
    <arg value="${antlr.out.dir}" />
    <arg value="${grammar.dir}/GrammarName.g" />
</java>
The property antlr.tool.classpath needs to contain stringtemplate.jar and antlr.jar, and antlr.out.dir needs to point to the directory where you want the generated source code to go (e.g., build/antlr/src/org/myorg/antlr/parser, if your parser grammars specify the package org.myorg.antlr.parser).
Then, when you compile the rest of your application, you can use something like:
<javac destdir="${build.classes.dir}" debug="on" optimize="on" deprecation="${javac.deprecation}" source="${javac.source}" target="${javac.target}">
    <classpath refid="stdclasspath"/>
    <src path="${src.dir}" />
    <src path="${antlr.src.dir}" />
</javac>
Here, we compile our application sources (in src.dir) along with the generated ANTLR code (in antlr.src.dir, which in this example would be build/antlr/src).
As far as using the generated code in your application (i.e., outside ANTLRWorks), you'll need to do something like:
String sourceText = "a + b = foo";
ANTLRStringStream inStream = new ANTLRStringStream(sourceText);
// your generated lexer class
MyLexer lexer = new MyLexer(inStream);
CommonTokenStream tokens = new CommonTokenStream(lexer);
// your generated parser class
MyParser parser = new MyParser(tokens);
// run the toplevel rule (in this case, `program`)
MyParser.program_return prog = parser.program();
// get the resulting AST (a CommonTree instance, in this case)
CommonTree tree = (CommonTree) prog.getTree();
// run a tree parser rule on the AST
MyTreeParser treeParser = new MyTreeParser(new CommonTreeNodeStream(tree));
treeParser.program();
I strongly recommend getting a copy of The Definitive ANTLR Reference if you're going to be using ANTLR. All of this is covered pretty thoroughly, with plenty of examples to get you started.
There is a way to use ANTLRWorks:
1. Write your grammar in ANTLRWorks.
2. Generate its code (this is the same as running ANTLR from the command line without the debug flag).
3. Write yourself a stub similar to what is suggested in the Debugging with AntlrWorks FAQ.
4. Write your tree grammar.
5. Select Debug in ANTLRWorks (this is the same as running ANTLR from the command line with the debug flag).
6. Run the stub program. The program will block until ANTLRWorks is connected, so you can debug the tree grammar.
7. Go back to the ANTLRWorks instance that has your tree grammar open, and choose Debug Remote.
8. Solve issues.... :)
If you're sure that the AST you're building is fine (verified with the ANTLRWorks debugger), testing the tree walking is no different from testing any other app. If you're emitting Java code, for example, use Eclipse's debugger to test it, or plain log messages...