Composite grammars: accessing an imported grammar's scopes in an action - ANTLR

Let's suppose I have two grammars, ParserA and ParserB (and that there is a lexer defined somewhere).
In ParserA I have the following code:
parser grammar ParserA;
classDeclaration
scope {
ST mList;
}
...
ParserB is something like:
parser grammar ParserB;
import ParserA;
methodDeclaration : something something { $classDeclaration::mList.add(...) };
The code in the action fails to compile (under javac), since classDeclaration ends up in a different class (and file). Any tips on how to fix it?

Any tips on how to fix it?
No, there's (AFAIK) no ANTLR shortcut here: there's no communication possible between imported grammars (either by using scopes or by providing parameters to imported grammar rules).
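One manual workaround (nothing ANTLR-specific, just plain Java; the SharedState helper below is hypothetical, not something ANTLR generates) is to move the shared data out of the rule scopes and into a class that actions in both grammars can reference. A minimal sketch, assuming StringTemplate 4's ST:
// Hypothetical helper: because it is ordinary Java, it compiles no matter which
// generated parser class an action ends up in.
import org.stringtemplate.v4.ST;   // adjust the import to your StringTemplate version

public class SharedState {
    public static ST mList;   // set/reset this from your driver code around each parse
}
The action in ParserB would then reference SharedState.mList instead of $classDeclaration::mList, at the cost of managing that state (for example, resetting it per class declaration) yourself.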

Related

Code indentor using ANTLR 4

I'm writing a code indentor using ANTLR 4 and Java. I have successfully generated the lexer and the parser, and the approach I am using is to walk through the generated parse tree.
ParseTreeWalker mywalker = new ParseTreeWalker();
mywalker.walk(myListener, myTree);
The auto-generated *BaseListener has methods like below...
@Override public void enterEveryRule(ParserRuleContext ctx) { }
I'm very new to ANTLR. But, as I understand it, I need to extend *BaseListener, override the relevant methods, and write code to indent. So my question is: which methods should I override to indent the input code file? Or, if there is an alternate approach I should take, please let me know.
Thanks!
None. You don't need a parser for this task, and by requiring a parser you limit yourself to valid code (you cannot reformat code with a syntax error). Instead, take the lexer and iterate over all tokens. Keep some state to know where you are (a block, a function, whatever) and indent according to that.
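A minimal sketch of that token-iteration approach, assuming an ANTLR 4 generated lexer named MyLexer (hypothetical), whitespace skipped in the lexer, and purely brace/semicolon driven indentation; the exact placement of closing braces is left rough here:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Token;

public class Indenter {
    public static void main(String[] args) {
        // Lex the raw text; no parser involved, so even syntactically broken code lexes fine.
        MyLexer lexer = new MyLexer(CharStreams.fromString("class A { void f ( ) { return ; } }"));
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (Token t = lexer.nextToken(); t.getType() != Token.EOF; t = lexer.nextToken()) {
            String text = t.getText();
            if (text.equals("}")) depth--;      // a closing brace ends a block
            out.append(text).append(' ');
            if (text.equals("{")) depth++;      // an opening brace starts one
            if (text.equals("{") || text.equals("}") || text.equals(";")) {
                // start a new line at the current indentation level
                out.append('\n').append("    ".repeat(Math.max(depth, 0)));
            }
        }
        System.out.println(out);
    }
}
Real indentation logic would also track things like comments and continuation lines, but the skeleton stays the same: a plain token loop plus a depth counter, no parse tree needed.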

my lexer token action is not invoked

I use ANTLR 4 with the JavaScript target.
Here is a sample grammar:
P : T ;
T : [a-z]+ {console.log(this.text);} ;
start: P ;
When I run the generated parser, nothing is printed, although the input is matched. If I move the action to the token P, then it gets invoked. Why is that?
Actions are ignored in referenced rules. This was the original behavior of ANTLR 4, back when the lexer only supported a single action per token (and that action had to appear at the end of the token).
Several releases later the limitation of one-action-per-rule was lifted, allowing any number of actions to be executed for a token. However, we found that many existing users relied on the original behavior, and wrote their grammars assuming that actions in referenced rules were ignored. Many of these grammars used complicated logic in these rules, so changing the behavior would be a severe breaking change that would prevent people from using new versions of ANTLR 4.
Rather than break so many existing ANTLR 4 lexers, we decided to preserve the original behavior and only execute actions that appear in the same rule as the matched token. Newer versions do allow you to place multiple actions in each rule though.
tl;dr: We considered allowing actions in other rules to execute, but decided not to because it would break a lot of grammars already written and used by people.
I found that @init and @after actions will override this default behavior.
Change the example code to:
grammar Test;
ALPHA : [a-z]+ ;
p : t ;
t
@init {
    console.log(this.text);
}
@after {
    console.log(this.text);
}
    : ALPHA ;
start : p ;
I changed parser rules to LOWER case as my Eclipse tool was complaining about the syntax otherwise. I also had to introduce the lexer rule ALPHA for [a-z]+ for the same reason. The above text compiled, but I haven't tried running the generated parser. However, I am successfully working around this issue with @init/@after in my larger parser.
Hope this is helpful.

Is "Implicit token definition in parser rule" something to worry about?

I'm creating my first grammar with ANTLR and ANTLRWorks 2. I have mostly finished the grammar itself (it recognizes the code written in the described language and builds correct parse trees), but I haven't started anything beyond that.
What worries me is that every first occurrence of a token in a parser rule is underlined with a yellow squiggle saying "Implicit token definition in parser rule".
For example, in this rule, the 'var' has that squiggle:
variableDeclaration: 'var' IDENTIFIER ('=' expression)?;
The odd thing is that ANTLR itself doesn't seem to mind these rules (when running a TestRig test, I can't see any of these warnings in the parser generator output, just something about an incorrect Java version being installed on my machine), so it's just ANTLRWorks complaining.
Is it something to worry about, or should I ignore these warnings? Should I declare all the tokens explicitly in lexer rules? Most examples in the official bible, The Definitive ANTLR Reference, seem to be written exactly the way I write my code.
I highly recommend correcting all instances of this warning in code of any importance.
This warning was created (by me actually) to alert you to situations like the following:
shiftExpr : ID (('<<' | '>>') ID)?;
Since ANTLR 4 encourages action code to be written in separate files in the target language instead of embedded directly in the grammar, it's important to be able to distinguish between << and >>. If tokens are not explicitly created for these operators, they will be assigned arbitrary types, and no named constants will be available for referencing them.
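For example, here is a listener sketch assuming explicit lexer rules LSHIFT : '<<'; and RSHIFT : '>>'; and the rule written as shiftExpr : ID ((LSHIFT | RSHIFT) ID)?; in a hypothetical grammar named Expr:
public class ShiftListener extends ExprBaseListener {
    @Override
    public void exitShiftExpr(ExprParser.ShiftExprContext ctx) {
        // LSHIFT()/RSHIFT() accessors exist because the tokens were declared explicitly;
        // ExprParser.LSHIFT and ExprParser.RSHIFT are the matching named constants.
        if (ctx.LSHIFT() != null) {
            System.out.println("left shift");
        } else if (ctx.RSHIFT() != null) {
            System.out.println("right shift");
        }
    }
}
Without explicit token definitions, those operators would only exist as arbitrary, unnamed token types, which is exactly the situation the warning points at.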
This warning also helps avoid the following problematic situations:
A parser rule contains a misspelled token reference. Without the warning, this could lead to silent creation of an additional token that may never be matched.
A parser rule contains an unintentional token reference, such as the following:
number : zero | INTEGER;
zero : '0'; // <-- this implicit definition causes 0 to get its own token
If you're writing a lexer grammar that won't be used across multiple parser grammars, then you can ignore this warning shown by ANTLRWorks 2.

Antlr undefined import in Antlrworks using composite grammars

I'm trying to use a composite grammar with ANTLR 3.1 and ANTLRWorks 1.4.2. When I put the import statement in, it says 'undefined import'. I've tried a number of different combinations of lexer grammar and parser grammar but can't get it to generate the code. Am I missing something obvious? An example is below.
grammar Tokens;
TOKEN : 'token';
grammar Parser;
import Tokens;//gives undefined import error
rule : TOKEN+;
I'm referencing the documentation from
http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars
Thanks
When separating lexer and parser grammars, you need to explicitly state what type of grammar each one is.
Try:
parser grammar Parser;
import Tokens;//gives undefined import error
rule : TOKEN+;
and:
lexer grammar Tokens;
TOKEN : 'token';
Note that from a combined grammar file Foo.g, the generated lexer and parser get a Lexer and Parser suffix by default: FooLexer.java and FooParser.java, respectively. But with "explicit" grammars, the name of the generated .java file is that of the grammar itself: Parser.java and Tokens.java in your case. You might want to watch out when calling a class Parser, since that is the name of ANTLR's base parser class:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_parser.html
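As a sketch of how the two generated classes would then be wired up (ANTLR 3 Java runtime; the Main class and the input string are made up, and it assumes you generate code for both grammars):
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public class Main {
    public static void main(String[] args) throws RecognitionException {
        // Tokens.java is generated from "lexer grammar Tokens;"
        Tokens lexer = new Tokens(new ANTLRStringStream("tokentoken"));
        // Parser.java is generated from "parser grammar Parser;"; avoid importing
        // org.antlr.runtime.* wholesale or the two Parser classes become ambiguous.
        Parser parser = new Parser(new CommonTokenStream(lexer));
        parser.rule();   // rule : TOKEN+;
    }
}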
Also take care to place the import statement below the options { ... } section, but before any tokens { ... } section you may have defined; otherwise you might get strange errors.
Argghh, it was something stupid. ANTLRWorks will underline the import and highlight all the tokens as undefined syntax errors, but it will still allow you to generate the code if you try!
The reason it wasn't working the first time was that the import was above the options section, just the situation Bart's answer warns about.

Writing a TemplateLanguage/ViewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those "if I had time / do it to learn something new" kinds of projects.
I've spent time looking at CoCo/R and ANTLR, and honestly, it makes my brain hurt, but some of CoCo/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a gander at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just test-replace template syntax with c# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the HTML syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the XML spec.
The markup rules refer to a limited subset of C# syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough C# to adjust single-quotes around strings to double-quotes, match curly braces, etc.
The individual rules themselves are of the delegate type ParseAction<TValue>, which accepts a Position and returns a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance that has been advanced past the content which produced the TValue.
That isn't very useful on its own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
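A rough Java analogue of that delegate idea (hypothetical types sketching the same shape, not Spark's actual API): a parse action takes a position and either fails or returns the parsed value plus the position just past it, and small operators combine such actions.
import java.util.function.Function;

record Position(String input, int offset) {
    Position advance(int n) { return new Position(input, offset + n); }
}
record ParseResult<T>(T value, Position rest) {}
interface ParseAction<T> extends Function<Position, ParseResult<T>> {}
class Combinators {
    // Matches a literal string at the current position; returns null on failure.
    static ParseAction<String> literal(String s) {
        return pos -> pos.input().startsWith(s, pos.offset())
                ? new ParseResult<String>(s, pos.advance(s.length()))
                : null;
    }
    // Ordered choice, as in a parsing expression grammar: try 'first', fall back to 'second'.
    static <T> ParseAction<T> or(ParseAction<T> first, ParseAction<T> second) {
        return pos -> {
            ParseResult<T> r = first.apply(pos);
            return r != null ? r : second.apply(pos);
        };
    }
}
From there you keep adding operators (sequence, repetition, mapping the parsed value) until whole syntax constructs can be described, which is essentially what the MarkupGrammar/CodeGrammar layers do.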
Step 1. Use regular expressions (regexp substitution) to split your input template string into a token list; a Java sketch of this step appears after the token list below. For example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
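Here is roughly what that step could look like in Java (a sketch; the [...] directive syntax and the string form of the tokens simply follow the example above):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    private static final Pattern DIRECTIVE = Pattern.compile("\\[(.*?)\\]");

    static List<String> tokenize(String template) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DIRECTIVE.matcher(template);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {           // literal text before the directive
                tokens.add("write('" + template.substring(last, m.start()) + "')");
            }
            String d = m.group(1).trim();     // e.g. "if foo", "bar", "else", "end"
            if (d.startsWith("if ")) {
                tokens.add("if('" + d.substring(3) + "')");
            } else if (d.equals("else")) {
                tokens.add("else()");
            } else if (d.equals("end")) {
                tokens.add("end()");
            } else {
                tokens.add("substitute('" + d + "')");
            }
            last = m.end();
        }
        if (last < template.length()) {       // trailing literal text
            tokens.add("write('" + template.substring(last) + "')");
        }
        return tokens;
    }
}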
Step 2. Convert your token list to a syntax tree:
* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')
class Instruction {
}
class Write : Instruction {
    string text;
}
class Substitute : Instruction {
    string varname;
}
class Sequence : Instruction {
    Instruction[] items;
}
class If : Instruction {
    string condition;
    Instruction then;
    Instruction elseBranch;  // 'else' is a reserved word, so use a different field name
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
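A minimal Java sketch of such an interpreter; the node names mirror the class sketch above, while the variable map and the "truthiness" rule for If are assumptions:
import java.util.List;
import java.util.Map;

interface Instruction {
    void execute(Map<String, String> vars, StringBuilder out);
}
record Write(String text) implements Instruction {
    public void execute(Map<String, String> vars, StringBuilder out) { out.append(text); }
}
record Substitute(String varname) implements Instruction {
    public void execute(Map<String, String> vars, StringBuilder out) {
        out.append(vars.getOrDefault(varname, ""));
    }
}
record Sequence(List<Instruction> items) implements Instruction {
    public void execute(Map<String, String> vars, StringBuilder out) {
        for (Instruction item : items) item.execute(vars, out);   // the recursive walk
    }
}
record If(String condition, Instruction then, Instruction elseBranch) implements Instruction {
    public void execute(Map<String, String> vars, StringBuilder out) {
        // Assumption: a condition is "true" when the variable exists and is non-empty.
        boolean truthy = !vars.getOrDefault(condition, "").isEmpty();
        (truthy ? then : elseBranch).execute(vars, out);
    }
}
Building the example tree by hand and calling execute on the root Sequence is a quick way to test this before wiring up steps 1 and 2.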
Another, alternative approach (instead of steps 1-3), if your language supports eval() (such as Perl, Python, or Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many things to do. But it does work for one simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time invested in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than on the parser/lexer. Maybe in version 2 I can have a go at the Spark way, when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.