Convert simple Antlr grammar to Xtext - antlr

I want to convert a very simple Antlr grammar to Xtext, so no syntactic predicates, no fancy features of Antlr not provided by Xtext. Consider this grammar
grammar simple; // Antlr3
foo: number+;
number: NUMBER;
NUMBER: '0'..'9'+;
and its Xtext counterpart
grammar Simple; // Xtext
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate Simple "http://www.example.org/Simple"
Foo: dummy=Number+;
Number: NUMBER_TOKEN;
terminal NUMBER_TOKEN: '0'..'9'+;
Xtext uses Antlr behind the scenes, but the two formats are not exactly the same. There are quite a few annoying (and partly understandable) things I have to modify, including:
Prefix terminals with the terminal keyword
Include import "http://www.eclipse.org/emf/2002/Ecore" as ecore to make terminals work
Add a feature to the top-level rule, e.g. foo: dummy=number+
Keep in mind that rule and terminal names have to be unique, even when compared case-insensitively.
Optionally, capitalize the first letter of rule names to follow Java convention.
Is there a tool to make this conversion automatically at least for simple cases? If not, is there a more complete checklist of such required modifications?

It's basically not possible to do this conversion automatically, since the Antlr grammar lacks information that the Xtext grammar requires. Xtext uses the rule names to create classes, and the assignments in a rule become getters and setters in those classes. However, assignments should not be used for every rule call, because Xtext has special patterns that reduce the noise in the resulting AST. Details like these make an automatic transformation hardly feasible. It's usually straightforward, though, to copy the Antlr grammar into the Xtext editor and fix the issues manually.
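The purely mechanical items from the question's checklist (the terminal keyword, capitalized rule names, the case-insensitive uniqueness check) can still be scripted, even though the assignments have to be chosen by hand. The following is only a minimal sketch under stated assumptions: GrammarChecklist and its methods are hypothetical names, not part of Antlr or Xtext; it expects one "name: body;" rule per line; and it relies on the Antlr convention that lexer rule names start with an upper-case letter. It does not rename rule references inside rule bodies and does not add any assignments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Antlr or Xtext. */
class GrammarChecklist {

    /** Antlr lexer rule names start with an upper-case letter. */
    static boolean isLexerRule(String name) {
        return Character.isUpperCase(name.charAt(0));
    }

    /** Rewrites the head of one "name: body;" line; other lines pass through unchanged. */
    static String rewriteRuleHead(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return line; // not a rule definition (e.g. the "grammar simple;" header)
        }
        String name = line.substring(0, colon).trim();
        if (name.isEmpty()) {
            return line;
        }
        String rest = line.substring(colon);
        if (isLexerRule(name)) {
            return "terminal " + name + rest; // checklist item: prefix terminals
        }
        // Checklist item: capitalize parser rule names (rule body is left untouched).
        return Character.toUpperCase(name.charAt(0)) + name.substring(1) + rest;
    }

    /** Reports pairs of names that differ only in case. */
    static List<String> caseInsensitiveClashes(List<String> names) {
        Map<String, String> seen = new HashMap<>();
        List<String> clashes = new ArrayList<>();
        for (String name : names) {
            String previous = seen.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
            if (previous != null && !previous.equals(name)) {
                clashes.add(previous + " / " + name);
            }
        }
        return clashes;
    }

    public static void main(String[] args) {
        System.out.println(rewriteRuleHead("foo: number+;"));
        System.out.println(rewriteRuleHead("NUMBER: '0'..'9'+;"));
        System.out.println(caseInsensitiveClashes(List.of("foo", "number", "NUMBER")));
    }
}
```

For the grammar in the question, the clash check flags number and NUMBER, which is why the Xtext version renames the terminal to NUMBER_TOKEN. Everything else, in particular deciding where assignments such as dummy= belong, remains manual work.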

Related

Is there a way to generate builder using the antlr4 grammar?

I understand that one can generate a lexer and parser from an antlr4 grammar, but is there a way to generate a builder from the antlr4 grammar? That way the client could use the builder to construct the possible structures specified in the grammar, while the server uses the generated parser to parse them.
There is, yes. Such a sentence generator can walk the ATN and create sentences according to the grammar (see my antlr4-vscode extension for how this can be implemented). However, unless you have a very simple grammar with no recursions or iterations, you will probably not be able to generate a fixed set of sentences, since there are infinitely many possible combinations.
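To illustrate why recursion rules out a fixed set of sentences, here is a minimal sketch of a bounded generator. It does not use the ANTLR runtime or its ATN: the grammar is a hand-written map, and SentenceGenerator, RULES and the toy rule foo : NUMBER foo | NUMBER are assumptions made up for this example. The depth parameter caps how many nested rule invocations are followed; raising it yields ever more sentences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical bounded sentence generator over a toy grammar (not the ANTLR ATN). */
class SentenceGenerator {

    // Toy grammar, assumed for illustration:  foo : NUMBER foo | NUMBER ;
    static final Map<String, List<List<String>>> RULES = Map.of(
            "foo", List.of(List.of("NUMBER", "foo"), List.of("NUMBER")));

    /** Returns every sentence derivable from symbol using at most depth nested rule calls. */
    static List<String> generate(String symbol, int depth) {
        List<List<String>> alternatives = RULES.get(symbol);
        if (alternatives == null) {
            return List.of(symbol); // a terminal stands for itself
        }
        List<String> result = new ArrayList<>();
        if (depth == 0) {
            return result; // bound reached: this derivation is dropped
        }
        for (List<String> alternative : alternatives) {
            List<String> partial = List.of("");
            for (String element : alternative) {
                List<String> next = new ArrayList<>();
                for (String prefix : partial) {
                    for (String tail : generate(element, depth - 1)) {
                        next.add(prefix.isEmpty() ? tail : prefix + " " + tail);
                    }
                }
                partial = next;
            }
            result.addAll(partial);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 3; depth++) {
            System.out.println(depth + ": " + generate("foo", depth));
        }
    }
}
```

With this toy rule, each increase of the bound adds one longer sentence, so the set never closes without an explicit limit.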

antlr - generate grammar from java source code

I am wondering if I can generate ANTLR grammar from java source code. I want to do some kind of research project, but I am just exploring different open sources to see which one is best.
For ANTLR, do I always have to write a grammar and pass it to the ANTLR?
Is there a way to generate grammar from an existing Java source code?
Not easily. ANTLR generates a recursive descent parser from your grammar, encoding the tests into procedural code, as well as lots of other bookkeeping stuff.
Knowing how the code is generated, you might be able to take it apart but you'll have to reach into the middle of generated statements and that isn't easy without a full parser for the generated language. (Hint: regex won't work).
I don't see a lot of point in this exercise. Why don't you just use the original grammar?

"Human-readable" ANTLR-generated code?

I've been learning ANTLR for a few days now. My goal in learning it was to be able to generate parsers and lexers, and then personally hand-translate them from Java into my target language (which is none of C/C++/Java/C#/Python; no tool supports it). I chose ANTLR because, according to its About page: ANTLR is widely used because it's easy to understand, powerful, flexible, generates human-readable output[...]
In learning this tool, I decided to start with a simple lexer for a simple grammar: JSON. However, once I generated the .java file for this lexer using ANTLR4, I was caught wildly off guard. I got a huge mess of far-from-human-readable serialized code, followed by:
public static final ATN _ATN =
ATNSimulator.deserialize(_serializedATN.toCharArray());
static {
_decisionToDFA = new DFA[_ATN.getNumberOfDecisions()];
}
A few Google searches were unable to provide me a way to disable this behavior.
Is there a way to disable this behavior and produce human-readable code, or am I going to have to hand-write my lexers and parsers for this target programming language?
ANTLR 4 uses a new algorithm for prediction. Terence Parr is currently working on a tech report describing the algorithm in detail. The human-readable output refers to the generated parsers.
ANTLR 4 lexers use a DFA recognizer for a massive speed and memory usage improvement over previous releases of ANTLR. For parsers, the _ATN field is a data structure used within calls to adaptivePredict (you'll notice lines in the generated code calling that method).
You won't be able to manually translate the generated Java code of an ANTLR 4 lexer to another programming language. You might be able to manually translate the code of a generated parser provided the grammar is strictly LL(1) (i.e. the generated code does not contain any calls to adaptivePredict). However, you will lose the error recovery ability that draws from information encoded in the serialized ATN.
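If hand-writing turns out to be the route for the lexer, the core of it for a small language is a short loop. The following is a minimal sketch, not ANTLR output and not a translation of it: MiniJsonLexer is a hypothetical class, and it covers only a subset of JSON assumed for illustration (punctuation, unsigned integers, and strings without escape sequences; no true, false, null, fractions or exponents).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written lexer for a small JSON subset. */
class MiniJsonLexer {

    /** Splits text into tokens, each rendered as TYPE(text). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++; // whitespace separates tokens and is otherwise ignored
            } else if ("{}[],:".indexOf(c) >= 0) {
                tokens.add("PUNCT(" + c + ")");
                pos++;
            } else if (c >= '0' && c <= '9') {
                int start = pos;
                while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
                    pos++;
                }
                tokens.add("NUMBER(" + text.substring(start, pos) + ")");
            } else if (c == '"') {
                // Simplification: the next quote ends the string (no escapes).
                int end = text.indexOf('"', pos + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + pos);
                }
                tokens.add("STRING(" + text.substring(pos + 1, end) + ")");
                pos = end + 1;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\"a\": [1, 22]}"));
    }
}
```

A loop like this ports easily to other languages because it carries no serialized tables, at the cost of writing and maintaining every token rule by hand.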

Purpose of antlr in xtext

I'm new to Xtext and wondering what the purpose of antlr in xtext is. As I understand it so far, antlr generates a parser based on the grammar, and the parser then deals with the text models. Right?
And what about the other generated stuff, like the editor or the ecore? Are there other components behind xtext which generate them?
Xtext needs a parser generator to produce a parser for the language you define. They could have built one of their own. They chose to use ANTLR instead.
I don't know what other third party machinery they might have chosen to use.
I've been hacking on an Xtext-based plugin, and from what I saw I think it works like this:
Xtext has its own BNF syntax, which is very similar to ANTLR's. In fact, it's a subset of it.
Xtext takes your grammar and generates an ANTLR grammar from it (a .g file). The generated ANTLR grammar adds specific actions to your BNF rules. The action code interacts with the Xtext runtime and (maybe) with Eclipse itself. The .g file is processed by some older version of ANTLR, which generates a .java file. This file is then compiled.

Source for parsing C grammar using JavaCC

As a project assignment, I need to parse a plain-C grammar from Java to generate AST output. As a starting point, I am using the file c.jj that I found among the grammar files at
http://java.net/projects/javacc/sources/svn/
but I found that it only has syntactic and lexical actions and no real semantics for parsing C source. Is there some other source that incorporates typedef, variables, construct functions, include files?
You could go looking for a complete grammar. Will you learn much this way?
You could ask your lecturer which would impress them more: implementing some small subset of C grammar by writing your own rules, or by searching google for alternative complete rules?
I trust writing your own rules - and even your own hand-crafted parser - will be a more useful exercise, even if it's only parsing expressions.
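To give an idea of the scale of such an exercise, here is a minimal sketch of a hand-crafted recursive-descent parser for integer expressions that emits its AST in prefix notation. It is not JavaCC output and covers nothing of C beyond + - * / and parentheses on unsigned integers; ExprParser and its grammar comments are assumptions made up for this example.

```java
/** Hypothetical hand-crafted parser: integer expressions to a prefix-notation AST. */
class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        // Simplification: spaces are stripped up front, so "1 2" reads as 12;
        // a real front end would tokenize first.
        this.src = src.replace(" ", "");
    }

    /** Parses the whole input and returns the AST as a string such as (+ 1 2). */
    static String parse(String text) {
        ExprParser parser = new ExprParser(text);
        String ast = parser.expr();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + parser.pos);
        }
        return ast;
    }

    // One character of lookahead; '\0' marks the end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr : term (('+' | '-') term)* ;
    private String expr() {
        String left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            left = "(" + op + " " + left + " " + term() + ")";
        }
        return left;
    }

    // term : factor (('*' | '/') factor)* ;
    private String term() {
        String left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            left = "(" + op + " " + left + " " + factor() + ")";
        }
        return left;
    }

    // factor : NUMBER | '(' expr ')' ;
    private String factor() {
        if (peek() == '(') {
            pos++;
            String inner = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ) at " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (peek() >= '0' && peek() <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected number at " + pos);
        }
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3"));   // prints (+ 1 (* 2 3))
        System.out.println(parse("(1 + 2) * 3")); // prints (* (+ 1 2) 3)
    }
}
```

Each grammar rule maps to one method, and operator precedence falls out of which method calls which. Declarations, typedef handling and a symbol table would be the next steps toward real C and are not attempted here.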