I understand that one can generate a lexer and parser from an antlr4 grammar, but is there a way to generate a builder from the same grammar? That way the client can use the builder to construct the structures the grammar allows, while the server uses the generated parser to parse them.
There is, yes. Such a sentence generator can walk the ATN and create sentences according to the grammar (see my antlr4-vscode extension for how this can be implemented). However, unless you have a very simple grammar with no recursion or iteration, you will probably not be able to generate a fixed set of sentences, since there are infinitely many possible combinations.
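To illustrate the idea (and why recursion has to be bounded), here is a minimal sketch of a random sentence generator. It does not use ANTLR's ATN at all: the grammar is a hypothetical hand-written rule table, the class name `SentenceGenerator` is made up for this example, and the "pick the last alternative when the depth budget runs out" trick only works because the last alternative of each toy rule is non-recursive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical toy grammar table, NOT ANTLR's ATN.
// Each rule maps to a list of alternatives; each alternative is a symbol sequence.
// Symbols that have no entry in the table are treated as terminals.
class SentenceGenerator {
    private final Map<String, List<List<String>>> rules = new HashMap<>();
    private final Random random;

    SentenceGenerator(long seed) {
        random = new Random(seed);
        // expr : term '+' expr | term ;
        rules.put("expr", Arrays.asList(
                Arrays.asList("term", "+", "expr"),
                Arrays.asList("term")));
        // term : '(' expr ')' | NUM ;
        rules.put("term", Arrays.asList(
                Arrays.asList("(", "expr", ")"),
                Arrays.asList("NUM")));
    }

    // Produces one random sentence; maxDepth bounds the recursion.
    String generate(String start, int maxDepth) {
        List<String> out = new ArrayList<>();
        expand(start, maxDepth, out);
        return String.join(" ", out);
    }

    private void expand(String symbol, int depth, List<String> out) {
        List<List<String>> alts = rules.get(symbol);
        if (alts == null) {
            // Terminal: NUM becomes a random digit, anything else is emitted literally.
            out.add(symbol.equals("NUM") ? Integer.toString(random.nextInt(10)) : symbol);
            return;
        }
        // Once the depth budget is used up, force the last alternative,
        // which in this toy grammar is always the non-recursive one.
        List<String> alt = depth <= 0
                ? alts.get(alts.size() - 1)
                : alts.get(random.nextInt(alts.size()));
        for (String s : alt) {
            expand(s, depth - 1, out);
        }
    }
}
```

Calling `new SentenceGenerator(42L).generate("expr", 6)` yields one random sentence per call; without the depth bound the recursive `expr` rule could expand forever, which is the "infinitely many combinations" problem in miniature.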
I am wondering if I can generate ANTLR grammar from java source code. I want to do some kind of research project, but I am just exploring different open sources to see which one is best.
For ANTLR, do I always have to write a grammar and pass it to the ANTLR?
Is there a way to generate grammar from an existing Java source code?
Not easily. ANTLR generates a recursive descent parser from your grammar, encoding the tests into procedural code, as well as lots of other bookkeeping stuff.
Knowing how the code is generated, you might be able to take it apart but you'll have to reach into the middle of generated statements and that isn't easy without a full parser for the generated language. (Hint: regex won't work).
I don't see much point in this exercise. Why don't you just use the original grammar?
If I have an AST and modify it, can I use StringTemplates to generate the source code for the modified AST?
I have successfully implemented my grammar for Antlr4. It generates the AST of a source code and I use the Visitor Class to perform the desired actions. I then modify something in the AST and I would like to generate the source code for that modified AST. (I believe it is called pretty-printing?).
Does Antlr's built in StringTemplates have all the functionality to do this? Where should one start (practical advice is very welcome)?
You can walk the tree and use string templates (or even plain string prints) to spit out text equivalents that to some extent reproduce the source text.
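As a rough sketch of the "plain string prints" variant: the tree below is a hypothetical hand-rolled one (`PpNode`, `PpAssign`, `PpBlock` are names invented here, not ANTLR classes), and the printer simply recurses and tracks an indent level. With a real ANTLR parse tree you would do the same thing from a visitor.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-rolled tree, not an ANTLR parse tree.
abstract class PpNode {
    abstract void print(StringBuilder out, int indent);

    // Emits four spaces per indent level.
    static void pad(StringBuilder out, int indent) {
        for (int i = 0; i < indent; i++) {
            out.append("    ");
        }
    }
}

// name = valueText;   (valueText is kept exactly as it appeared in the source)
class PpAssign extends PpNode {
    final String name;
    final String valueText;

    PpAssign(String name, String valueText) {
        this.name = name;
        this.valueText = valueText;
    }

    @Override
    void print(StringBuilder out, int indent) {
        pad(out, indent);
        out.append(name).append(" = ").append(valueText).append(";\n");
    }
}

// header { ...children... }
class PpBlock extends PpNode {
    final String header;
    final List<PpNode> body;

    PpBlock(String header, List<PpNode> body) {
        this.header = header;
        this.body = body;
    }

    @Override
    void print(StringBuilder out, int indent) {
        pad(out, indent);
        out.append(header).append(" {\n");
        for (PpNode child : body) {
            child.print(out, indent + 1);
        }
        pad(out, indent);
        out.append("}\n");
    }
}

class PrettyPrinter {
    static String render(PpNode root) {
        StringBuilder sb = new StringBuilder();
        root.print(sb, 0);
        return sb.toString();
    }
}
```

For example, `PrettyPrinter.render(new PpBlock("while (x)", Arrays.asList(new PpAssign("y", "0x1F"))))` gives a three-line `while` block with the assignment indented one level. Note that this imposes the printer's layout, not the original author's.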
But you will find reproducing the source text in a realistic way harder to do than this suggests. If you want back code that the original programmer will not reject, you need to:
Preserve comments. I don't think ANTLR ASTs do this.
Generate layout that preserves the original indentation.
Preserve the radix, leading-zero count, and other "format" properties of literal values.
Regenerate strings with reasonable escapes.
Doing all of this well is tricky. See my SO answer How to compile an AST back to source code for more details. (Weirdly, the ANTLR guy suggests not using an AST at all; I'm guessing this is because string templates only work on ANTLR parse trees whose structure ANTLR understands, vs. ASTs which are whatever you home-rolled.)
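To make the literal-format point from the list above concrete, here is a small sketch of a literal node that remembers how it was written, so a modified value can be re-rendered in the same style. Everything here is an assumption for illustration: the class name `IntLiteral`, the restriction to non-negative decimal and `0x` hex literals, and the treatment of leading zeros as padding (in C a leading zero would mean octal, which this sketch ignores).

```java
// Hypothetical literal node: keeps radix, prefix, digit count and letter case
// so that a changed value can be printed the way the original was written.
class IntLiteral {
    final long value;
    final int radix;
    final String prefix;   // "", "0x" or "0X"
    final int digits;      // number of digits after the prefix, including leading zeros
    final boolean upper;   // true if the original hex digits used upper case

    IntLiteral(long value, int radix, String prefix, int digits, boolean upper) {
        this.value = value;
        this.radix = radix;
        this.prefix = prefix;
        this.digits = digits;
        this.upper = upper;
    }

    // Assumes non-negative decimal or 0x-prefixed hex text.
    static IntLiteral parse(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            String d = text.substring(2);
            boolean upper = !d.equals(d.toLowerCase());
            return new IntLiteral(Long.parseLong(d, 16), 16, text.substring(0, 2), d.length(), upper);
        }
        return new IntLiteral(Long.parseLong(text, 10), 10, "", text.length(), false);
    }

    // Returns a literal with a new value but the original formatting properties.
    IntLiteral withValue(long newValue) {
        return new IntLiteral(newValue, radix, prefix, digits, upper);
    }

    String toSource() {
        String d = Long.toString(value, radix);
        if (upper) {
            d = d.toUpperCase();
        }
        StringBuilder sb = new StringBuilder(prefix);
        for (int i = d.length(); i < digits; i++) {
            sb.append('0'); // restore the original leading-zero padding
        }
        return sb.append(d).toString();
    }
}
```

So `IntLiteral.parse("0x00FF").withValue(16).toSource()` produces `0x0010` rather than `16`. A real tool needs the same care for floats, suffixes, character literals and string escapes.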
If you get all of this right, what you are likely to discover is that modifying the parse tree/AST is harder than it looks. For almost any interesting task on complex languages, you need information which is not trivial to extract from the tree (e.g., what is the meaning of this identifier?, where is this variable used?,...) I call this the problem of Life After Parsing. My main point is that it takes a lot of machinery to modify ASTs and regenerate code; be aware of the size of your project.
Where do we start to manually build a CST from scratch? Or does ANTLR4 always require the lex/parse process as our input step?
I have some visual elements in my program that represent code structures.
e.g. a square represents a class, while a circle embedded within that square represents a method.
Now I want to turn those into code. How do I use ANTLR4 to do this, at runtime (using ANTLR4.js)? Most of the ANTLR examples seem to rely on lexing and parsing existing code to get to a syntax tree. So rather than:
input code->lex->parse->syntax tree->output code (1)
I want
manually create syntax tree->output code (2)
(Later, as the user adds code to that class and its methods, then ANTLR will be used as in (1).)
EDIT Maybe I'm misunderstanding this. Do I create some custom data structure and then run the parser over it? i.e. write structures to some in-memory format->parse->output code (3)?
IIUC, you could use StringTemplate directly.
By way of background, Antlr itself builds an in-memory parse-tree and then walks it, incrementally calling StringTemplate to output code snippets qualified by corresponding parse-tree node data. That Antlr uses an internal parse-tree is just a convenience for simplifying walking (since Antlr is built using Antlr).
If you have your own data structure, regardless of its specific implementation, procedurally process it to progressively call ST templates to emit the corresponding code. And, you can directly use the same templates that Antlr uses (JavaScript.stg), if they meet your requirements.
Of course, if your data structure is of a nature that can be lex'd/parsed into a standard Antlr parse-tree, you can then use a standard Antlr visitor to call and populate node-specific templates.
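A rough sketch of route (2), going straight from your own data structure to code with no lexing or parsing involved. To keep it dependency-free this uses a plain `StringBuilder` where the answer suggests StringTemplate, and it is written in Java rather than ANTLR4.js; `ClassShape`, `MethodShape` and `CodeEmitter` are hypothetical names standing in for the square/circle model from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the visual elements: a circle inside a square.
class MethodShape {
    final String name;

    MethodShape(String name) {
        this.name = name;
    }
}

class ClassShape {
    final String name;
    final List<MethodShape> methods;

    ClassShape(String name, List<MethodShape> methods) {
        this.name = name;
        this.methods = methods;
    }
}

// Walks the model and emits a JavaScript class skeleton.
// A StringTemplate group could replace the hard-coded snippets below.
class CodeEmitter {
    static String emit(ClassShape c) {
        StringBuilder sb = new StringBuilder();
        sb.append("class ").append(c.name).append(" {\n");
        for (MethodShape m : c.methods) {
            sb.append("    ").append(m.name).append("() {\n");
            sb.append("    }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

For instance, `CodeEmitter.emit(new ClassShape("Shape", Arrays.asList(new MethodShape("area"))))` returns the text of an empty `Shape` class with one `area()` method. Once the user starts editing that text by hand, the usual lex/parse route (1) takes over.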
I've been learning ANTLR for a few days now. My goal in learning it was that I would be able to generate parsers and lexers, and then personally hand-translate them from Java into my target language (which is none of C/C++/Java/C#/Python, and no tool supports it). I chose ANTLR because from its About page: ANTLR is widely used because it's easy to understand, powerful, flexible, generates human-readable output[...]
In learning this tool, I decided to start with a simple lexer for a simple grammar: JSON. However, once I generated the .java file for this lexer using ANTLR4 I was caught wildly off guard. I got a huge mess of far-from-human-readable serialized code, followed by:
public static final ATN _ATN =
    ATNSimulator.deserialize(_serializedATN.toCharArray());
static {
    _decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
}
A few Google searches were unable to provide me a way to disable this behavior.
Is there a way to disable this behavior and produce human-readable code, or am I going to have to hand-write my lexers and parsers for this target programming language?
ANTLR 4 uses a new algorithm for prediction. Terence Parr is currently working on a tech report describing the algorithm in detail. The human-readable output refers to the generated parsers.
ANTLR 4 lexers use a DFA recognizer for a massive speed and memory usage improvement over previous releases of ANTLR. For parsers, the _ATN field is a data structure used within calls to adaptivePredict (you'll notice lines in the generated code calling that method).
You won't be able to manually translate the generated Java code of an ANTLR 4 lexer to another programming language. You might be able to manually translate the code of a generated parser provided the grammar is strictly LL(1) (i.e. the generated code does not contain any calls to adaptivePredict). However, you will lose the error recovery ability that draws from information encoded in the serialized ATN.
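If you do end up hand-writing the lexer for your target language, a token set like JSON's is small enough to scan directly. Below is a minimal sketch in Java that should port line by line to most languages. It is unrelated to ANTLR's generated code, `JsonLexer` is a name invented here, and it is deliberately loose: number scanning accepts some invalid forms and string escapes are skipped rather than validated.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written scanner for a JSON-like token set (simplified, not spec-complete).
class JsonLexer {
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if ("{}[]:,".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == '"') {
                int start = i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character without validating it
                    }
                    i++;
                }
                if (i >= src.length()) {
                    throw new IllegalArgumentException("unterminated string at " + start);
                }
                i++; // consume the closing quote
                tokens.add(src.substring(start, i));
            } else if (c == '-' || Character.isDigit(c)) {
                int start = i++;
                // Loose number scan: accepts digits, '.', exponent markers and signs.
                while (i < src.length()
                        && (Character.isDigit(src.charAt(i)) || ".eE+-".indexOf(src.charAt(i)) >= 0)) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetter(src.charAt(i))) {
                    i++;
                }
                String word = src.substring(start, i);
                if (!word.equals("true") && !word.equals("false") && !word.equals("null")) {
                    throw new IllegalArgumentException("unknown keyword '" + word + "' at " + start);
                }
                tokens.add(word);
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }
}
```

`JsonLexer.tokenize("{\"a\": [1, true]}")` returns the tokens `{`, `"a"`, `:`, `[`, `1`, `,`, `true`, `]`, `}` in order.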
As a project assignment, I need to parse a plain-C grammar from Java to generate AST output. As a starting point, I am using the file c.jj that I found among the grammar files at
http://java.net/projects/javacc/sources/svn/
but I found that it only has syntactic and lexical actions and no real semantics for parsing C source. Is there some other source that incorporates typedefs, variables, function constructs, and include files?
You could go looking for a complete grammar. Will you learn much this way?
You could ask your lecturer which would impress them more: implementing some small subset of the C grammar by writing your own rules, or searching Google for a complete set of existing rules?
I trust writing your own rules - and even your own hand-crafted parser - will be a more useful exercise. Even if it's only parsing expressions.
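To give a feel for how small a hand-crafted expression parser can be, here is a minimal recursive descent sketch. The grammar is my own choice for illustration (integers, `+ - * /`, parentheses), it evaluates directly instead of building an AST, and `ExprParser` is a made-up name. Each grammar rule becomes one method.

```java
// Recursive descent parser/evaluator for:
//   expr   : term (('+' | '-') term)* ;
//   term   : factor (('*' | '/') factor)* ;
//   factor : NUMBER | '(' expr ')' ;
class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        this.src = src;
    }

    static int evaluate(String src) {
        ExprParser p = new ExprParser(src);
        int value = p.expr();
        if (p.peek() != '\0') {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private int expr() {
        int value = term();
        while (true) {
            char c = peek();
            if (c == '+') { pos++; value += term(); }
            else if (c == '-') { pos++; value -= term(); }
            else return value;
        }
    }

    private int term() {
        int value = factor();
        while (true) {
            char c = peek();
            if (c == '*') { pos++; value *= factor(); }
            else if (c == '/') { pos++; value /= factor(); }
            else return value;
        }
    }

    private int factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return value;
        }
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
                pos++;
            }
            return Integer.parseInt(src.substring(start, pos));
        }
        throw new IllegalArgumentException("expected number or '(' at " + pos);
    }
}
```

`ExprParser.evaluate("1 + 2 * (3 - 1)")` respects precedence because `term` binds tighter than `expr`. Turning the `int` results into node objects is the natural next step toward the AST the assignment asks for.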