Antlr - How to generate exact input file to the output? (source-to-source transformation) - antlr

Let's say I have a source code file. I want to give this file to the ANTLR and generate the same code and save it to an output file.
Usage:
To beautify the input file.
To add some comments to the input file.
To inject some code into the input file.
Is it possible to do such a thing by exploiting ANTLR?
Basically, I am trying to do a source-to-source transformation with ANTLR from C/C++ to C/C++.
I am interested to add, delete, replace, or modify some lines of the code and generate an output that complies with C/C++ language rules.
P.S.: Please let me know if you know any other tool (other than CLANG) that does the same thing. Parsing C/C++ (or even Fortran) and providing some events to the user and let the user modify the source code.

Related

Xtext based language within Intellij Idea

I want to make a plugin for a language for the Intellij Idea IDE. The language has been developped using Eclipse Xtext and is open source. A plugin already exists for Eclipse.
My goal is to port this language to Intellij Idea. I want to be able to use Intellij to create source files, to have the specific syntax highlighting and to be able to compile and run programs written with this language.
Is there a simple way to generate the Intellij Idea plugin using the Xtext project?
If not is there an efficient solution to be able to have the specific syntax highlighting in Intellij? (an automatic way if possible, I would prefer not rewriting everything everytime the Xtext project is updated)
Short answer
Yes, with a bit of work.
Long Answer
Sadly, Xtext uses antlr in the background and IntelliJ use their own grammar kit based on Parsing Expression Grammars. As such, the parsing and editor code generated by XText, as you might have guessed, will not work.
In order to get your language working in IntelliJ you will need to:
Create grammar *.bnf file
Generate lexer *.flex file, possibly tweak it and then run JFlex generator
Implement helper classes to provide, among others, file recognition via file extension, syntax highlighting, color settings page, folding, etc.
The *.flex file is generated from the bnf. Luckily, most of the classes in step 3 follow a very similar structure so they can be easily generated (more on that later). So basically, if you manage to generate the *.bnf file, you are 80% there.
Although from different technologies, the syntax of bnf files is very similar to XText files. I recently migrated some antlr grammars to IntelliJ's bnf and I had to do very small changes. Thus, it should be possible to autogenerate the bnf files from your XText ones.
That brings me back to point 3. Using XTend, Epsilon's EGL, or similar, it would be easy to generate all the boiler plate classes. As part of the migration I mentioned before I also did this. I am in the process of making the code public, so I will post it here when done and add some details.

antlr - generate grammar from java source code

I am wondering if I can generate ANTLR grammar from java source code. I want to do some kind of research project, but I am just exploring different open sources to see which one is best.
For ANTLR, do I always have to write a grammar and pass it to the ANTLR?
Is there a way to generate grammar from an existing Java source code?
Not easily. ANTLR generate a recursive descent parser from your grammar, encoding the tests into procedural code, as well as lots of other bookkeeping stuff.
Knowing how the code is generated, you might be able to take it apart but you'll have to reach into the middle of generated statements and that isn't easy without a full parser for the generated language. (Hint: regex won't work).
I don't see a lot of point of this exercise. Why don't you just use the original grammar?

Generate source code from AST with Antlr4 and StringTemplates

If I have an AST and modify it, can I use StringTemplates to generate the source code for the modified AST?
I have successfully implemented my grammar for Antlr4. It generates the AST of a source code and I use the Visitor Class to perform the desired actions. I then modify something in the AST and I would like to generate the source code for that modified AST. (I believe it is called pretty-printing?).
Does Antlr's built in StringTemplates have all the functionality to do this? Where should one start (practical advice is very welcome)?
You can walk the tree and use string templates (or even plain out string prints) to spit out text equivalents that to some extent reproduce the source text.
But you will find reproducing the source text in a realistic way harder to do than this suggests. If you want back code that the original programmer will not reject, you need to:
Preserve comments. I don't think ANTLR ASTs do this.
Generate layout that preserves the original indentation.
Preserve the radix, leading-zero count, and other "format" properties of literal values
Renerate strings with reasonable escapes
Doing all of this well is tricky. See my SO answer How to compile an AST back to source code for more details. (Weirdly, the ANTLR guy suggests not using an AST at all; I'm guessing this is because string templates only work on ANTLR parse trees whose structure ANTLR understands, vs. ASTs which are whatever you home-rolled.)
If you get all of this right, what you are likely to discover is that modifying the parse tree/AST is harder than it looks. For almost any interesting task on complex languages, you need information which is not trivial to extract from the tree (e.g., what is the meaning of this identifier?, where is this variable used?,...) I call this the problem of Life After Parsing. My main point is that it takes a lot of machinery to modify ASTs and regenerate code; be aware of the size of your project.

How (if possible) to use PostgreSQL's parser (in C) independently?

I need a parser (mainly for the "select" type of queries) and avoid the hassle of doing it from scratch. Does anybody know how to use the scan.l/gram.y of pgsql for this purpose? I've looked up pgpool too, but it seems similar. Currently, it might be very helpful if someone could give instructions to compile the parser (using the makefile provided maybe) without errors so that it can be supplied (valid?) queries and outputs the parse tree (in whatever form)!
You probably cannot take any file from postgres source tarball and compile it separately. Parser use internal OOP structures (implemented in C). But there is some possibility (not simple) - ecpg preprocessor try to transform PostgreSQL gram file to secondary gram file - and you can use same mechanism. It use a small utility parse.pl (it is part of PostgreSQL source code (src/postgresql/src/interfaces/ecpg/preproc))
PostgreSQL compiles the language parser using yacc. Presumably you could take the yacc files and create a compatible parser with very little effort. Note you must have flex and yacc installed to do this.
Note this is not taking a .c file from source and transplanting it into your system. All you are getting is the parser, not the planner or anything else.
Given the level of detail in the question no more detail can be possible. Perhaps you could start there and post another question when you get stuck.

ANTLR and content assist in Eclipse

I have a project in Eclipse where I have an editor for a custom language. I am using ANTLR to generate the compiler for it. What I need is to add content assist to the editor.
The input is a source code in the custom language, and the position of the character where the user requested content assist. The source code is most of time incomplete as the user can ask for content assist any time. What I need is to calculate the list of possible tokens that are valid for the given position.
It is possible to write a custom code to do the calculation, but that code would have to be manually kept in sync with the grammar. I figured the parser is doing something similar. It has to be able to determine at a given context what are the acceptable tokens. Is it possible to "reuse" that? What is the best practice in creating content assist anyway?
Thanks,
Balint
Have a look at Xtext. Xtext uses Antlr3 under the hood and provides content assist for the Antlr based languages. Have a look especially into package org.eclipse.xtext.ui.editor.contentassist.
You may consider to redefine your grammar with Xtext, which would provide the content assist out-of-the-box. It is not possible to reuse the Antlr grammar of a custom language.