I am trying to implement language support plugin for a basic language. I am following jetbrains's tutorial for simple language support (.properties file support basically) and on the side I have rust plugin for reference. However the complexity gap between them is huge so some questions are hard to find answers to from both sources.
Here is my question: What is the best way to allow spaces between tokens which DO NOT require spaces, however force spacing between tokens which do?
i.e. class Foo{ <- here first space is mandatory, but the second one (before '{' symbol) can be ommited.
Related
I want to make a plugin for a language for the Intellij Idea IDE. The language has been developped using Eclipse Xtext and is open source. A plugin already exists for Eclipse.
My goal is to port this language to Intellij Idea. I want to be able to use Intellij to create source files, to have the specific syntax highlighting and to be able to compile and run programs written with this language.
Is there a simple way to generate the Intellij Idea plugin using the Xtext project?
If not is there an efficient solution to be able to have the specific syntax highlighting in Intellij? (an automatic way if possible, I would prefer not rewriting everything everytime the Xtext project is updated)
Short answer
Yes, with a bit of work.
Long Answer
Sadly, Xtext uses antlr in the background and IntelliJ use their own grammar kit based on Parsing Expression Grammars. As such, the parsing and editor code generated by XText, as you might have guessed, will not work.
In order to get your language working in IntelliJ you will need to:
Create grammar *.bnf file
Generate lexer *.flex file, possibly tweak it and then run JFlex generator
Implement helper classes to provide, among others, file recognition via file extension, syntax highlighting, color settings page, folding, etc.
The *.flex file is generated from the bnf. Luckily, most of the classes in step 3 follow a very similar structure so they can be easily generated (more on that later). So basically, if you manage to generate the *.bnf file, you are 80% there.
Although from different technologies, the syntax of bnf files is very similar to XText files. I recently migrated some antlr grammars to IntelliJ's bnf and I had to do very small changes. Thus, it should be possible to autogenerate the bnf files from your XText ones.
That brings me back to point 3. Using XTend, Epsilon's EGL, or similar, it would be easy to generate all the boiler plate classes. As part of the migration I mentioned before I also did this. I am in the process of making the code public, so I will post it here when done and add some details.
This post about the antlr simple example shows how to create and us a grammar for java.
However, this intermixes the grammar and the Java source code in the Exp.g source.
My Question is, Is it possible to decouple the grammar file from the target language, so that the one grammar file can be used for generating multiple Java, Scala, C++, etc Lexers/Parsers?
It depends mostly on the reason why target code is used in the grammar. Is it only action code to do something with the found tokens (e.g. building a symbol table or alternative tree representation) then is indeed no problem do remove such native code and do the processing afterwards (using a parse tree walker or visitor).
However, predicates are a different. They are used to guide the parser and also require native code. What you can do is to move all the native code into a base class from which your generated parser derives. You then only need to re-write this base class in your target language and keep the grammar mostly free of native code (except for a single function call, which invokes the native code).
This approach has the advantage that no additional library reference is necessary (#include in C/C++, import in other languages), which also is native code preventing use for multiple targets.
I am creating a custom language plugin for IntelliJ.
I would like it to be possible for a file in the new language to contain fragments of text in other languages.
The specific languages I would like to support are HTML, JS, CSS, and SQL.
I would also like to support other custom languages (i.e. languages I would define the syntax for).
The main feature I want is syntax coloring, but if I can get stuff like "go to declaration" and refactoring out of the box then all the better.
My last requirement is that it would be possible to use my own code to tell IntelliJ which language a fragment contains; fragments containing different languages will not be distinguishable at the lexer / parser level.
In short, I would like to implement something similar to what PhpStorm does when it detects, say, SQL inside a string:
I looked at IntelliJ's source code and found the ILazyParseableElementType interface which seemed relevant, but I'm not sure if this is the way to go (and if so - how to use it in my code exactly...)
Any pointers would be highly appreciated...
The Intellij feature/terminology you are looking for is Language Injections.
Here is a github PR which implements a very similar feature for JFLex files.
In short you need to implement a LanguageInjector and add it to your plugin.xml as a <languageInjector implementation="YourImplClass">.
I've been learning ANTLR for a few days now. My goal in learning it was that I would be able to generate parsers and lexers, and then personally hand-translate them from Java into my target language (neither C/C++/Java/C#/Python, no tool has support for it). I chose ANTLR because from its About page: ANTLR is widely used because it's easy to understand, powerful, flexible, generates human-readable output[...]
In learning this tool, I decided to start with a simple lexer for a simple grammar: JSON. However, once I generated the .java file for this lexer using ANTLR4 I was caught widely off-guard. I got a huge mess of far-from-human-readable serialized code, followed by:
public static final ATN _ATN =
ATNSimulator.deserialize(_serializedATN.toCharArray());
static {
_decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
}
A few Google searches were unable to provide me a way to disable this behavior.
Is there a way to disable this behavior and produce human-readable code, or am I going to have to hand-write my lexers and parsers for this target programming language?
ANTLR 4 uses a new algorithm for prediction. Terence Parr is currently working on a tech report describing the algorithm in detail. The human-readable output refers to the generated parsers.
ANTLR 4 lexers use a DFA recognizer for a massive speed and memory usage improvement over previous releases of ANTLR. For parsers, the _ATN field is a data structure used within calls to adaptivePredict (you'll notice lines in the generated code calling that method).
You won't be able to manually translate the generated Java code of an ANTLR 4 lexer to another programming language. You might be able to manually translate the code of a generated parser provided the grammar is strictly LL(1) (i.e. the generated code does not contain any calls to adaptivePredict). However, you will lose the error recovery ability that draws from information encoded in the serialized ATN.
In the past I've used Doxygen for C and C++, but now I've been thrown on Fortran project and I would like to get a quick all encompassing look at the architecture.
In the past I've found reverse engineering tools to be useful where no documentation of the architecture exists.
So, is there a tool out there that will reverse engineer Fortran code?
I tried to use Doxygen, but didn't have any luck. I will be working with two different projects - one Fortran 90 and I think is in Fortran 77.
Thanks for any insights and feedback.
Tools which may help with reverse engineering:
SciTools Understand
Link with some more tools (search "fortran")
Also, maybe some of these unit testing frameworks will be helpful (I haven't used them, so I cannot comment on the pros and cons of any of them):
FUnit
FRUIT
Ftnunit
(these links link to fortranwiki, where you can find a tidbit on every one of them, and from there there are links to their home sites).
Doxygen 1.6.1 will generate documentation, call graphs, etc. for Fortran source code in free-format (F90) format. You are out of luck for auto-documenting fixed-format (F77) code with doxygen.
All is not lost, however. The conversion from fixed to free format is straightforward and can be automated to a great degree - change comment characters to '!', change continuation characters to '&', and append '&' to lines to be continued. In fact, if the appended continuation character is placed in column 73, it should be ignored by standard F77 compilers (which still only recognize code in columns 1 through 72) but will be recognized by F9x/F2003/F2008 compilers. This allows the same code to be recognized as both in fixed and free format, which lets you gracefully migrate from one format to the other.
Conveniently, there are about a thousand small programs that will do this format adjustment to some degree or another. Realistically, if you're going to be maintaining the code, you might as well move it away from the 1928 spec for Hollerith (IBM) punched cards. :)