Generate only a Lexer from Antlr

I am attempting to use Antlr to tokenize and classify the tokens of an input stream. Does anyone know of a way to generate only a Lexer from Antlr using a grammar with only Lexer rules?

You can specify the type of grammar you want using the grammar title line.
grammar MyGrammar;
for combined grammars.
lexer grammar MyLexer;
for a lexer grammar (etc.). Of course in a pure lexer grammar you may only use lexer rules.
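
As a minimal sketch of such a pure lexer grammar, written in ANTLR 4 syntax: the file name MyLexer.g4 and the token names INT, ID and WS are assumptions chosen for illustration, not something taken from the question.

```antlr
// MyLexer.g4 (hypothetical file; the file name must match the grammar name)
lexer grammar MyLexer;

// A lexer grammar may contain only lexer rules (names start with an upper-case letter).
INT : [0-9]+ ;                      // integer literals
ID  : [a-zA-Z_] [a-zA-Z_0-9]* ;     // identifiers
WS  : [ \t\r\n]+ -> skip ;          // whitespace is matched and then discarded
```

Running the ANTLR tool on a file like this should produce a lexer class and a tokens file, with no parser. Each token the lexer returns carries a token type (here INT or ID), which is what classifies it.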

You can generate a parser, extend the listener class, and then push the tokens onto a stack inside every exit method (the generated exit...() callbacks).
You can't generate only a lexer. If you are not familiar with ANTLR 4 grammars or the steps required to generate a parser, I advise you to spend ten minutes reading the book "The Definitive ANTLR 4 Reference".

Related

Is there a way to generate builder using the antlr4 grammar?

I understand that one can generate a lexer and a parser from an antlr4 grammar, but is there a way to generate a builder from the grammar as well? That way the client could use the builder to construct the possible structures specified in the grammar, while the server uses the generated parser to parse them.
There is, yes. Such a sentence generator can walk the ATN and create sentences according to the grammar (see my antlr4-vscode extension for how this can be implemented). However, unless you have a very simple grammar with no recursion or iteration, you will probably not be able to generate a fixed set of sentences, since there are infinitely many possible combinations.
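
To illustrate why recursion and iteration matter here, the following is a small sketch in ANTLR 4 syntax. The grammar name and all rule names are hypothetical and exist only for this example.

```antlr
grammar Sentences;   // hypothetical grammar name

// Finite: two alternatives followed by two alternatives give exactly four sentences.
greeting : ('hello' | 'hi') ('world' | 'there') ;

// Infinite: the (...)* loop may repeat any number of times,
// so a generator has to cap the number of iterations.
list : ITEM (',' ITEM)* ;

// Infinite: the rule calls itself and can nest to any depth,
// so a generator has to cap the recursion depth.
expr : '(' expr ')' | ITEM ;

ITEM : [a-z]+ ;
WS   : [ \t\r\n]+ -> skip ;
```

Only a rule like greeting yields a fixed set of sentences that a builder could enumerate completely. For list and expr, a generator can only produce samples up to some chosen bound.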

Difference in grammar parsed by JavaCC and ANTLR

The Wikipedia pages for JavaCC and ANTLR say that both parser generators work with grammars written in EBNF format. Does this mean that any grammar that can be parsed by JavaCC can also be parsed by ANTLR without modifying the grammar structure? If yes, why do we have a grammar repository for ANTLR (https://github.com/antlr/grammars-v4)? My current understanding is that some differences do exist between the grammars parsed by JavaCC and ANTLR. Can someone please point out the differences?

Purpose of antlr in xtext

I'm new to Xtext and wondering what the purpose of ANTLR is in Xtext. As I understand it so far, ANTLR generates a parser based on the grammar, and the parser then deals with the text models. Right?
And what about the other generated artifacts, like the editor or the Ecore model? Are there other components behind Xtext that generate them?
Xtext needs a parser generator to produce a parser for the language you define. They could have built one of their own. They chose to use ANTLR instead.
I don't know what other third party machinery they might have chosen to use.
I've been hacking one Xtext based plugin and from what I saw I think it works like this:
Xtext has its own BNF syntax, which is very similar to ANTLR's. In fact, it is a subset of it.
Xtext takes your grammar and generates an ANTLR grammar from it (a .g file). The generated ANTLR grammar adds specific actions to your BNF rules. The action code interacts with the Xtext runtime and (maybe) with Eclipse itself. The .g file is processed using some older version of ANTLR to generate a .java file, which is then compiled.

Pretty print ANTLR grammar

Some tools output an Antlr grammar in a human-unreadable form, at least with ugly placing of parens and indentation. I'd like to transform the grammar into a more readable (standard?) form. The only reference I found is ANTLR pretty printer which is quite old, and looking at its source, it seems to be removing parts of a grammar rather than pretty print it.
How can I format/pretty print a grammar file?
I know of no tool that does this. The one you mentioned, prettyPrinter, is written in - and seems to handle only - ANTLR v2.x grammars, making it unsuitable for v3 grammars.
If you're going to write your own, I'd recommend using the grammar of ANTLR v3 itself to parse a .g grammar file and emit it in a readable form. Terence Parr has posted the grammar here: http://www.antlr.org/grammar/ANTLR
I just installed an Antlr plugin for Eclipse. It can do a lot more than syntax highlighting and code formatting...

Convert simple Antlr grammar to Xtext

I want to convert a very simple Antlr grammar to Xtext, so no syntactic predicates, no fancy features of Antlr not provided by Xtext. Consider this grammar
grammar simple; // Antlr3
foo: number+;
number: NUMBER;
NUMBER: '0'..'9'+;
and its Xtext counterpart
grammar Simple; // Xtext
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate Simple "http://www.example.org/Simple"
Foo: dummy=Number+;
Number: NUMBER_TOKEN;
terminal NUMBER_TOKEN: '0'..'9'+;
Xtext uses Antlr behind the scenes, but the two formats are not exactly the same. There are quite a few annoying (and partly understandable) things I have to modify, including:
Prefix terminals with the terminal keyword
Include import "http://www.eclipse.org/emf/2002/Ecore" as ecore to make terminals work
Add a feature to the top-level rule, e.g. foo: dummy=number+
Keep in mind that rule and terminal names have to be unique even when compared case-insensitively.
Optionally, capitalize the first letter of rule names to follow Java convention.
Is there a tool to make this conversion automatically at least for simple cases? If not, is there a more complete checklist of such required modifications?
It's basically not possible to do this conversion automatically, since the Antlr grammar lacks information that is required in the Xtext grammar. The rule names in Xtext are used to create classes. Assignments in Xtext become getters and setters in those classes. However, these assignments should not be used for every rule call, since there are special patterns in Xtext that make it possible to reduce the noise in the resulting AST. Details like these make it hardly possible to do the transformation automatically. However, it's usually straightforward to copy the Antlr grammar into the Xtext editor and fix the issues manually.