Simple programmatic generation and compilation of grammar in ANTLR? - antlr

I want to programmatically take a grammar in the form of a String and generate the Java for it as a String or Strings. I want to do this all in memory, with no files involved. I took a look at the org.antlr.Tool source, but I was hoping there would be some simpler way to do what I want rather than rewriting Tool without files. Does something already exist?

Does something already exist?
No, not AFAIK. Neither ANTLR's public API nor any existing third-party tool can do this.

Related

Generate source code from AST with Antlr4 and StringTemplates

If I have an AST and modify it, can I use StringTemplates to generate the source code for the modified AST?
I have successfully implemented my grammar for Antlr4. It generates the AST of a source code and I use the Visitor Class to perform the desired actions. I then modify something in the AST and I would like to generate the source code for that modified AST. (I believe it is called pretty-printing?).
Does Antlr's built in StringTemplates have all the functionality to do this? Where should one start (practical advice is very welcome)?
You can walk the tree and use string templates (or even plain string prints) to spit out text equivalents that to some extent reproduce the source text.
But you will find reproducing the source text in a realistic way harder to do than this suggests. If you want back code that the original programmer will not reject, you need to:
Preserve comments. I don't think ANTLR ASTs do this.
Generate layout that preserves the original indentation.
Preserve the radix, leading-zero count, and other "format" properties of literal values.
Regenerate strings with reasonable escapes.
Doing all of this well is tricky. See my SO answer How to compile an AST back to source code for more details. (Weirdly, the ANTLR guy suggests not using an AST at all; I'm guessing this is because string templates only work on ANTLR parse trees whose structure ANTLR understands, vs. ASTs which are whatever you home-rolled.)
If you get all of this right, what you are likely to discover is that modifying the parse tree/AST is harder than it looks. For almost any interesting task on complex languages, you need information which is not trivial to extract from the tree (e.g., what is the meaning of this identifier?, where is this variable used?,...) I call this the problem of Life After Parsing. My main point is that it takes a lot of machinery to modify ASTs and regenerate code; be aware of the size of your project.
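The tree walk with plain string prints mentioned at the start of this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node` is a hypothetical, home-rolled tree type (not an ANTLR class), and the printer fully parenthesizes everything, so it makes no attempt to keep comments or the original layout. The one formatting property it does keep is the literal's original spelling, by storing the lexeme instead of the parsed value.

```java
import java.util.List;

// Hypothetical minimal tree; a real ANTLR parse tree or custom AST will look different.
final class SourcePrinterSketch {

    /** A leaf carries its original lexeme; an inner node carries an operator symbol. */
    static final class Node {
        final String text;          // lexeme for leaves, operator symbol otherwise
        final List<Node> children;  // empty for leaves

        Node(String text, Node... children) {
            this.text = text;
            this.children = List.of(children);
        }
    }

    /** Walks the tree depth-first and emits a text equivalent of each node. */
    static String print(Node n) {
        if (n.children.isEmpty()) {
            // Emit the stored lexeme verbatim, so 0x1F is not rewritten as 31.
            return n.text;
        }
        // Inner node: print the children separated by the operator, fully parenthesized.
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) {
                sb.append(' ').append(n.text).append(' ');
            }
            sb.append(print(n.children.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Node tree = new Node("+",
                new Node("0x1F"),
                new Node("*", new Node("x"), new Node("007")));
        System.out.println(print(tree)); // prints "(0x1F + (x * 007))"
    }
}
```

Even in this toy, the output carries parentheses the programmer may never have written, which hints at how much extra bookkeeping faithful regeneration needs.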

How (if possible) to use PostgreSQL's parser (in C) independently?

I need a parser (mainly for the "select" type of queries) and want to avoid the hassle of writing one from scratch. Does anybody know how to use the scan.l/gram.y of pgsql for this purpose? I've looked at pgpool too, but it seems similar. For now, it would be very helpful if someone could give instructions for compiling the parser (perhaps using the provided makefile) without errors, so that it can be given (valid?) queries and output the parse tree (in whatever form)!
You probably cannot take a single file from the PostgreSQL source tarball and compile it separately: the parser uses internal OOP-style structures (implemented in C). There is one possibility, though it is not simple: the ecpg preprocessor transforms the PostgreSQL grammar file into a secondary grammar file, and you can use the same mechanism. It uses a small utility, parse.pl, which is part of the PostgreSQL source code (src/postgresql/src/interfaces/ecpg/preproc).
PostgreSQL compiles the language parser using yacc. Presumably you could take the yacc files and create a compatible parser with very little effort. Note you must have flex and yacc installed to do this.
Note this is not taking a .c file from source and transplanting it into your system. All you are getting is the parser, not the planner or anything else.
Given the level of detail in the question, a more detailed answer is not possible. Perhaps you could start there and post another question when you get stuck.

Checkstyle DetailAST, StringLiteral

public void visitToken(DetailAST aAST) {}
I am trying to write a custom Checkstyle rule. I am interested in TokenTypes.STRING_LITERAL. The problem with this approach is that a string might be a concatenated string, built with a StringBuffer or StringBuilder, or produced within a method.
Bear with me, as I am a newbie to Checkstyle coding.
How do I get the full string if it is concatenated? The aAST seems to be spitting the parts out as individual string literals.
Is there another way to grab a complete string?
Any pointers, greatly appreciated.
This is hard to do in Checkstyle, because Checkstyle works purely on the AST. It is not a compiler, so it does not know about runtime types or what the code means.
So, in order to do this using Checkstyle, you would have to analyze the AST manually and build your concatenated String by hand. If parts of the String are generated by, say, static methods, or by using a StringBuilder/StringBuffer, then I would say the task of finding the complete String by AST analysis becomes virtually impossible.
Instead, you might want to look at other static code analysis tools which might be better suited to your task. FindBugs, for instance, works on the compiled code and is generally able to perform quite sophisticated checks. However, it takes more resources to run than Checkstyle, and on older machines you may not be able to have FindBugs run automatically on save in your IDE.
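The manual AST analysis described above might look roughly like the following sketch. It deliberately does not use Checkstyle's classes: `Node` and `Kind` are hypothetical stand-ins for a DetailAST and its token type, so that the example stays self-contained. It folds a `+` chain only when every operand is a string literal and gives up as soon as it meets anything else (a method call, a variable, a StringBuilder), which is exactly where the approach stops working. Escape sequences are left undecoded.

```java
import java.util.Optional;

// Hypothetical stand-in for an AST; not Checkstyle's DetailAST API.
final class ConcatFoldSketch {

    enum Kind { STRING_LITERAL, PLUS, OTHER }

    static final class Node {
        final Kind kind;
        final String text;   // source text of the token, quotes included for literals
        final Node left;
        final Node right;

        private Node(Kind kind, String text, Node left, Node right) {
            this.kind = kind;
            this.text = text;
            this.left = left;
            this.right = right;
        }

        static Node lit(String quoted)    { return new Node(Kind.STRING_LITERAL, quoted, null, null); }
        static Node plus(Node l, Node r)  { return new Node(Kind.PLUS, "+", l, r); }
        static Node other(String text)    { return new Node(Kind.OTHER, text, null, null); }
    }

    /** Returns the concatenated value, or empty if any operand is not a plain literal. */
    static Optional<String> fold(Node n) {
        switch (n.kind) {
            case STRING_LITERAL: {
                // Strip the surrounding quotes; escapes are not decoded in this sketch.
                return Optional.of(n.text.substring(1, n.text.length() - 1));
            }
            case PLUS: {
                Optional<String> l = fold(n.left);
                Optional<String> r = fold(n.right);
                return l.isPresent() && r.isPresent()
                        ? Optional.of(l.get() + r.get())
                        : Optional.empty();
            }
            default:
                // Method calls, identifiers, etc. cannot be resolved from the AST alone.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Node expr = Node.plus(Node.plus(Node.lit("\"Hello, \""), Node.lit("\"world\"")), Node.lit("\"!\""));
        System.out.println(fold(expr).orElse("<not a constant>")); // prints "Hello, world!"
    }
}
```

In a real check, the same recursion would start from the outermost concatenation node rather than from each individual literal, so that each full string is reported once.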

ANTLR and content assist in Eclipse

I have a project in Eclipse where I have an editor for a custom language. I am using ANTLR to generate the compiler for it. What I need is to add content assist to the editor.
The input is source code in the custom language, plus the position of the character where the user requested content assist. The source code is incomplete most of the time, as the user can ask for content assist at any moment. What I need is to calculate the list of possible tokens that are valid at the given position.
It is possible to write custom code to do the calculation, but that code would have to be manually kept in sync with the grammar. I figured the parser is doing something similar: it has to be able to determine, in a given context, which tokens are acceptable. Is it possible to "reuse" that? What is the best practice for creating content assist anyway?
Thanks,
Balint
Have a look at Xtext. Xtext uses Antlr3 under the hood and provides content assist for Antlr-based languages. Have a look especially at the package org.eclipse.xtext.ui.editor.contentassist.
You may consider redefining your grammar with Xtext, which would provide content assist out of the box. It is not possible to reuse the Antlr grammar of a custom language.

Batch source-code aware spell check

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.
As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with its spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.
This seems a little old, but it seems to do a good job:
Source Code Spell Checker