In a file like this one, in my opinion the declarations alone should be enough.
Can anyone explain why rules are necessary in lexical analysis?
In my opinion they are only necessary in .y files.
By "rule" I mean blocks like this:
rdels {
    if ($this->smarty->auto_literal) {
        $this->token = Smarty_Internal_Templateparser::TP_OTHER;
    } else {
        $this->token = Smarty_Internal_Templateparser::TP_RDEL;
        $this->yypopstate();
    }
}
When should I call yypopstate and yypushstate?
You enter a state when the same input characters could otherwise have ambiguous meanings.
If the lexer encounters a " (quote), you might enter a state (yypushstate) called "string", in which any following character that would otherwise have a special meaning (e.g. + or -) is considered part of the string. The "string" state ends (yypopstate) when the lexer encounters another ".
In flex, these states are called start conditions.
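To make the push/pop idea concrete, here is a minimal sketch of a hand-written lexer with a state stack. It is not the Smarty lexer and not flex; the token names (DIGIT, OP, STRING, OTHER) and the function name tokenize are made up for illustration.

```python
def tokenize(text):
    """Toy lexer with a state stack, mimicking yypushstate/yypopstate."""
    states = ["default"]  # stack of lexer states; the top one is active
    tokens = []
    buf = []              # characters collected while inside a string

    for ch in text:
        if states[-1] == "default":
            if ch == '"':
                states.append("string")       # like yypushstate(STRING)
            elif ch in "+-":
                tokens.append(("OP", ch))     # operators are special here
            elif ch.isdigit():
                tokens.append(("DIGIT", ch))
            elif not ch.isspace():
                tokens.append(("OTHER", ch))
        else:  # active state is "string"
            if ch == '"':
                tokens.append(("STRING", "".join(buf)))
                buf.clear()
                states.pop()                  # like yypopstate()
            else:
                buf.append(ch)                # '+' and '-' are plain text here

    if states[-1] != "default":
        raise ValueError("unterminated string")
    return tokens


print(tokenize('1+"a+b"'))
```

The same + character yields an OP token in the default state but is simply appended to the string while the "string" state is on top of the stack, which is why the lexer needs rules with actions and not only token declarations.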
I am trying to learn the K framework, and as an exercise I wanted to create a high-level language that compiles down to a scripting language for a video game. This high-level language does no real execution; it just compiles down to the scripting language via rewrite rules.
Here is an example of the original scripting language syntax:
variables {
    0: 'message'
}
init {
    SetVariable("message", "Test message");
}
rule("press button") {
    conditions {
        IsButtonPressed(EventPlayer, INTERACT_KEY);
    }
    actions {
        SendMessage(EventPlayer, GetVariable("message"))
    }
}
I wanted my high-level language to allow proper variable declarations, so I could write something like this instead, and it would compile down to the script above.
init {
    var message = "Test message";
}
rule("press button") {
    conditions {
        IsButtonPressed(EventPlayer, INTERACT_KEY);
    }
    actions {
        SendMessage(EventPlayer, message)
    }
}
I know how to make a simple rewrite rule to replace variable declarations var x = y with SetVariable("x", y), but how could I also append to the variable declaration block at the top?
I could very well be misunderstanding the capabilities of K, or how I am supposed to be going about doing this. Any help would be appreciated.
Typically the way you translate one input program into another output program in K is to have an output cell containing the output program as you construct it and to have a sequence of rules that iteratively removes statements and declarations from the input cell and adds them to the output cell in whatever modified form you are expecting. If you have a situation like this where you want to insert something out of order, the way it is typically done is to have a second cell containing a portion of the output program, and the rule that processes the variable declaration modifies two output cells. And then some rule will match later on and combine the outputs together. In this case, that rule would probably apply when the input program has been exhausted.
Here is roughly what that will look like in K:
rule <k> var X:Id = E:Expr => . ... </k>
     <output> init { D:Declarations => append(D, SetVariable(Id2String(X), E)) } </output>
     <variables> variables { D:Declarations => append(D, !Y:Int : Id2String(X)) } </variables>

rule <k> . </k>
     <output> P:Program => append(P2, P) </output>
     <variables> P2:Program => . </variables>
Note that you would have to write the list append functions yourself. If you really care about performance, you should probably use either the List sort or else append to the front of the cons list and then reverse it afterwards, but I simplified for the purposes of explanation.
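Outside of K, the same two-buffer idea can be sketched in a few lines of Python. This is only an illustration of the strategy, not K code: the regular expression, the function name translate, and the assumption that each statement sits on its own line are all hypothetical, and the sketch handles only the init block (it does not rewrite later uses of the variable into GetVariable calls).

```python
import re

# Hypothetical, simplified statement form: var NAME = EXPR;
VAR_DECL = re.compile(r"^\s*var\s+(\w+)\s*=\s*(.+?);\s*$")


def translate(init_lines):
    """Translate the body of an init block, collecting declarations separately."""
    variables = []  # plays the role of the <variables> cell
    output = []     # plays the role of the <output> cell

    for line in init_lines:  # consuming the input, like the <k> cell
        match = VAR_DECL.match(line)
        if match:
            name, expr = match.groups()
            # One input statement updates both buffers.
            variables.append(f"    {len(variables)}: '{name}'")
            output.append(f'    SetVariable("{name}", {expr});')
        else:
            output.append("    " + line.strip())

    # Input exhausted: combine the two partial outputs, variables first.
    return "\n".join(["variables {", *variables, "}", "init {", *output, "}"])


print(translate(['var message = "Test message";']))
```

The final join corresponds to the second K rule, which fires once the input is empty and prepends the contents of the variables buffer to the main output.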
ANTLR 4 is known to use a sync-and-return error recovery mechanism. For example, I have the following simple grammar:
grammar Hello;
r : prefix body ;
prefix: 'hello' ':';
body: INT ID ;
INT: [0-9]+ ;
ID : [a-z]+ ;
WS : [ \t\r\n]+ -> skip ;
I use the following listener to grab the input:
public class HelloLoader extends HelloBaseListener {
    String input;

    public void exitR(HelloParser.RContext ctx) {
        input = ctx.getText();
    }
}
The main method in my HelloRunner looks like this:
public static void main(String[] args) throws IOException {
    CharStream input = CharStreams.fromStream(System.in);
    HelloLexer lexer = new HelloLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    HelloParser parser = new HelloParser(tokens);
    ParseTree tree = parser.r();
    ParseTreeWalker walker = new ParseTreeWalker();
    HelloLoader loader = new HelloLoader();
    walker.walk(loader, tree);
    System.out.println(loader.input);
}
Now if I enter the correct input "hello : 1 morning", I get hello:1morning, as expected.
What if I enter the incorrect input "hello ; 1 morning"? I get the following output:
line 1:6 token recognition error at: ';'
line 1:8 missing ':' at '1'
hello<missing ':'>1morning
It seems that ANTLR 4 automatically recognized the wrong token ";" and deleted it; however, it does not actually insert ":" in the corresponding place, it only reports <missing ':'>.
My question is: is there some way to make ANTLR automatically fix an error when it finds one? How would this be coded? Do we need other tools?
Typically the input for a parser comes from some source file that contains some code or text that (supposedly) conforms to some grammar. A typical use scenario for syntax errors is to alert the user so that the source file can be corrected.
As the commenter noted, you can plug in your own error recovery strategy, but before trying to insert a single token into the token stream and recover, please consider that this would be a very limited solution. Why? Consider a much richer grammar where, for a given token, many (perhaps dozens or hundreds) of other tokens can legally follow it. How would a single-token replacement strategy work then?
The hello.g4 example is the epitome of a trivial grammar, the "hello world" of ANTLR. But most of the time, for non-trivial grammars, the best we can do with imperfect syntax is to simply alert the user so the syntax can be corrected.
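The limitation can be illustrated with a toy sketch. The table below is hand-written and purely hypothetical (it is not produced by ANTLR and does not use the ANTLR API); it only shows that inserting a missing token is unambiguous when exactly one token may follow, and a guess otherwise.

```python
# Hypothetical "what may legally follow" table for a toy grammar.
FOLLOW = {
    "hello": {":"},  # as in hello.g4: only ':' can follow 'hello'
    "ID": {"+", "-", "*", "/", "(", ")", ";", ",", "="},  # a richer grammar
}


def insertion_candidates(previous_token, follow=FOLLOW):
    """Return every token that could be inserted after previous_token."""
    return sorted(follow.get(previous_token, set()))


def try_auto_insert(previous_token, follow=FOLLOW):
    """Return the token to insert, or None when the choice is ambiguous."""
    candidates = insertion_candidates(previous_token, follow)
    return candidates[0] if len(candidates) == 1 else None


print(try_auto_insert("hello"))  # exactly one candidate, so ':' is returned
print(try_auto_insert("ID"))     # nine candidates, so None is returned
```

With the trivial grammar the repair is obvious, but as soon as several tokens are legal the tool cannot know which one the author intended, so reporting the error is the safer behavior.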
Suppose I have a really long escaped property value to enter in LESS. I can get it to output with newlines in the final formatted CSS by setting a variable that holds a newline, like @nl: `"\n"`;, and using it in my escaped string to emit newlines in the property value. So this:
@nl: `"\n"`;
.test {
    property: ~"one,@{nl} two";
}
Will output this:
.test {
property: one,
two;
}
The question is whether there is any way to input it with newlines, something like:
.test {
property: ~"one,
two";
}
This throws a Parse error: Unrecognised input in LESS. A long set of property values like one might have with a progid:DXImageTransform.Microsoft.Matrix would benefit from being able to code it with newlines to begin with.
Though I posted an answer to my own question that I discovered worked, I'm open to a simpler one if it exists.
Well, one solution I have found is to put each line in its own escaped string. So this seems to work reasonably well for both input and output (depending on what the tab setting is):
Input LESS
@nl: `"\n\t\t\t"`;
.test {
    property: ~"@{nl}"
        ~"one,@{nl}"
        ~"two";
}
Output CSS
.test {
property:
one,
two;
}
In short, no. Less does not support any kind of multiline strings (Recently this feature was proposed and rejected for various reasons). Though, speaking of IE hacks, the following code compiles fine since v1.4.2:
.rotate(@angle) {
    @cos: cos(@angle);
    @sin: sin(@angle);
    @nsin: (0-sin(@angle));
    -ms-filter: progid:DXImageTransform.Microsoft.Matrix(
        M11=@cos,
        M12=@sin,
        M21=@nsin,
        M22=@cos,
        sizingMethod='auto expand'
    );
}
test {
    .rotate(20deg);
}
The only problem there is the combination of =, - and func() that needs some special handling.
Update: as for the per-line escaped strings example, I think the following reads a bit more clearly:
@nl: ~`"\n "`;
.test {
    property: @nl
        ~"one", @nl
        ~"two";
}
I want to create an interactive version of the ANTLR calculator example, which tells the user what to type next. For instance, in the beginning, the ID, INT, NEWLINE, and WS tokens are possible. Ignoring WS, a suggestion message could be:
Type an identifier, a number, or newline.
After parsing a number, the message should be
Type +, -, *, or newline.
and so on. How can I do this?
Edit
What I have tried so far:
private void accept(String sentence) {
    ANTLRInputStream is = new ANTLRInputStream(sentence);
    OperationLexer l = new OperationLexer(is);
    CommonTokenStream cts = new CommonTokenStream(l);
    final OperationParser parser = new OperationParser(cts);
    parser.addParseListener(new OperationBaseListener() {
        @Override
        public void enterEveryRule(ParserRuleContext ctx) {
            ATNState state = parser.getATN().states.get(parser.getState());
            System.out.print("RULE " + parser.ruleNames[state.ruleIndex] + " ");
            IntervalSet following = parser.getATN().nextTokens(state, ctx);
            for (Integer token : following.toList()) {
                System.out.print(parser.tokenNames[token] + " ");
            }
            System.out.println();
        }
    });
    parser.prog();
}
This prints the right suggestion for the first token, but for all other tokens it prints the current token. I guess capturing the state in enterEveryRule() is too early.
Accurately gathering this information in an LL(k) parser, where k>1, requires a thorough understanding of the parser internals. Several years ago, I faced this problem with ANTLR 3, and found the only real solution was so complex that it resulted in me becoming a co-author of ANTLR 4 specifically so I could handle this issue.
ANTLR (including ANTLR 4) disambiguates the parse tree during the parsing phase, which means if your grammar is not LL(1) then performing this analysis in the parse tree means you have already lost information necessary to be accurate. You'll need to write your own version of ParserATNSimulator (or a custom interpreter which wraps it) which does not lose the information.
I found the answer to "How to get ANTLR 3.2 to exit upon first error?", which is helpful.
However, I can't seem to add these '@' sections without my grammar breaking. My grammar file is simple:
grammar Exp;
options {
output=AST;
}
program
: includes decls (procedure)* main -> ^(SMALLCPROGRAM includes decls (procedure)* main) //AST - PROGRAM root
;
//Lexer and Parser rules continue below as normal..tested thoroughly and works
But if I try to add any of these @ sections, I get errors such as:
grammar file Exp.g has no rules
and:
Exp.g:0:1: syntax error: assign.types: org.antlr.runtime.EarlyExitException
org\antlr\grammar\v3\DefineGrammarItemsWalker.g: node from line 202:4 required (...)+ loop did not match anything at input ';'
Anyone have an idea what the problem is? I simply want to change my grammar so that when I run it from my separate main class (passing input into it using ANTLRStringStream etc) it actually throws an error in the main class when there is a syntactic problem rather than just saying something like:
line 1:57 missing RPAREN at '{'
and then continuing to parse the rest of the input as if nothing were wrong. Ultimately, my main class should refuse to parse any syntactically malformed input as defined by my grammar, and should report the errors to the user.
You probably have the sections/blocks in the wrong order. Be sure it looks like this:
grammar Exp;
options {
...
}
tokens {
...
}
@parser::header {
...
}
@lexer::header {
...
}
@parser::members {
...
}
@lexer::members {
...
}
I'm guessing you placed an @members or @header section before the tokens { ... } block. tokens { ... } should come directly after options { ... }.
I also remember that certain 3.x versions had an issue with empty sections: make sure there is at least something in every section, or omit the empty section altogether.