I am currently using the #noescape directive in Apache Velocity, but it does not work and throws a parse error exception.
{
tree: #noescape()$!{__NavTree__}#end
}
The error message is
Encountered "#end" at .... Was expecting one of:
<EOF>
"(" ...
<RPAREN> ...
<ESCAPE_DIRECTIVE> ...
<SET_DIRECTIVE> ...
"##" ...
"\\\\" ...
"\\" ...
<TEXT> ...
"*#" ...
"*#" ...
"]]#" ...
<STRING_LITERAL> ...
<IF_DIRECTIVE> ...
<INTEGER_LITERAL> ...
<FLOATING_POINT_LITERAL> ...
<WORD> ...
<BRACKETED_WORD> ...
<IDENTIFIER> ...
<DOT> ...
"{" ...
"}" ...
<EMPTY_INDEX> ...
Does anyone have a solution to this problem?
The #noescape() directive is not a standard one, so I don't know its definition. Most probably it is not a block directive, in which case you should call it as:
#noescape($!{__NavTree__})
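For reference, whether a custom Velocity directive is a block directive (terminated by #end) or a line directive is determined by its getType() implementation. Below is a minimal sketch, assuming a hypothetical NoEscapeDirective class; the rendering body is a placeholder, since the real definition of #noescape is unknown here:
import java.io.IOException;
import java.io.Writer;
import org.apache.velocity.context.InternalContextAdapter;
import org.apache.velocity.exception.MethodInvocationException;
import org.apache.velocity.exception.ParseErrorException;
import org.apache.velocity.exception.ResourceNotFoundException;
import org.apache.velocity.runtime.directive.Directive;
import org.apache.velocity.runtime.parser.node.Node;

// Hypothetical sketch of a custom #noescape directive.
public class NoEscapeDirective extends Directive {

    @Override
    public String getName() {
        return "noescape";
    }

    @Override
    public int getType() {
        // LINE: called as #noescape($x), no #end allowed.
        // BLOCK: written as #noescape() ... #end.
        return LINE;
    }

    @Override
    public boolean render(InternalContextAdapter context, Writer writer, Node node)
            throws IOException, ResourceNotFoundException,
                   ParseErrorException, MethodInvocationException {
        // Placeholder body: write the first argument as-is (escaping logic omitted).
        Object value = node.jjtGetChild(0).value(context);
        if (value != null) {
            writer.write(value.toString());
        }
        return true;
    }
}
If getType() returns LINE, the template must call it as #noescape($!{__NavTree__}); only a BLOCK directive may be closed with #end, which would explain why the parser rejects the original template.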
I am trying to match input against a very basic ANTLR grammar, but ANTLR keeps telling me that it got the input '.' while expecting '.'.
The full error is:
line 1:0 extraneous input '.' expecting '.'
line 1:2 missing '*' at '<EOF>'
With the grammar:
grammar regex;
@parser::header
{
package antlr;
}
@lexer::header
{
package antlr;
}
WHITESPACE : (' ' | '\t' | '\n' | '\r') -> channel(HIDDEN);
COMP : '.';
KLEENE : '*';
start : COMP KLEENE;
And input:
.*
Both files have the same charset:
regex.g: text/plain; charset=us-ascii
test.grammar: text/plain; charset=us-ascii
There should be no lexer rule mix-up. Why does this not work as expected?
Given your example grammar and this test class:
import org.antlr.v4.runtime.*;
public class Main {
public static void main(String[] args) {
String source = ".*";
regexLexer lexer = new regexLexer(CharStreams.fromString(source));
regexParser parser = new regexParser(new CommonTokenStream(lexer));
System.out.println(parser.start().toStringTree(parser));
}
}
the following is printed to my console:
(start . *)
My guess is that you have either dumbed down the grammar too much, causing the error in your original grammar to disappear, or you haven't regenerated the lexer/parser classes.
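If you want to rule out a lexer mix-up directly, you can also dump the raw token stream. Here is a small sketch against the same generated classes (class and token names taken from the grammar above):
import org.antlr.v4.runtime.*;

public class DumpTokens {
    public static void main(String[] args) {
        regexLexer lexer = new regexLexer(CharStreams.fromString(".*"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // force the lexer to tokenize the whole input
        for (Token t : tokens.getTokens()) {
            // expected output: COMP '.', KLEENE '*', then EOF
            System.out.println(regexLexer.VOCABULARY.getSymbolicName(t.getType())
                    + " '" + t.getText() + "'");
        }
    }
}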
I need to tokenize everything that is "outside" any comment, until end of line. For instance:
take me */ and me /* but not me! */ I'm in! // I'm not...
Tokenized as (STR is the "outside" string, BC is block-comment and LC is single-line-comment):
{
STR: "take me */ and me ", // note the "*/" in the string!
BC : " but not me! ",
STR: " I'm in! ",
LC : " I'm not..."
}
And:
/* starting with don't take me */ ...take me...
Tokenized as:
{
BC : " starting with don't take me ",
STR: " ...take me..."
}
The problem is that STR can be anything except the comments, and since the comment openers are not single-character tokens, I can't use a negation rule for STR.
I thought maybe to do something like:
STR : { IsNextSequenceTerminatesThe_STR_rule(); }?;
But I don't know how to look ahead for characters in lexer actions.
Is it even possible to accomplish with the ANTLR4 lexer, if yes then how?
Yes, it is possible to perform the tokenization you are attempting.
Based on what you have described, you want nested comments. These can be handled in the lexer alone, without actions, predicates, or any other code. For nested comments it is easier not to rely on ANTLR's greedy/non-greedy options; you need to encode the nesting directly in the lexer grammar. Below are the three lexer rules you will need, together with the STR definition.
I added a parser rule for testing. I've not tested this, but it should do everything you mentioned. Also, it is not limited to 'end of line'; you can make that modification if you need to.
/*
All 3 COMMENTS are Mutually Exclusive
*/
DOC_COMMENT
: '/**'
( [*]* ~[*/] // Cannot START/END Comment
( DOC_COMMENT
| BLK_COMMENT
| INL_COMMENT
| .
)*?
)?
'*'+ '/' -> channel( DOC_COMMENT )
;
BLK_COMMENT
: '/*'
(
( /* Must never match an '*' in position 3 here, otherwise
there is a conflict with the definition of DOC_COMMENT
*/
[/]? ~[*/] // No START/END Comment
| DOC_COMMENT
| BLK_COMMENT
| INL_COMMENT
)
( DOC_COMMENT
| BLK_COMMENT
| INL_COMMENT
| .
)*?
)?
'*/' -> channel( BLK_COMMENT )
;
INL_COMMENT
: '//'
( ~[\n\r*/] // No NEW_LINE
| INL_COMMENT // Nested Inline Comment
)* -> channel( INL_COMMENT )
;
STR // Consume everything up to the start of a COMMENT
: ( ~'/' // Any Char not used to START a Comment
| '/' ~[*/] // Cannot START a Comment
)+
;
start
: DOC_COMMENT
| BLK_COMMENT
| INL_COMMENT
| STR
;
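To see what these rules actually produce, a small driver along these lines can print each token together with the channel it was sent to. The grammar name Comments and the generated class CommentsLexer are assumptions, not part of the answer above, and depending on your ANTLR version the custom channel names may need to be declared in a channels { ... } block of a separate lexer grammar:
import org.antlr.v4.runtime.*;

public class CommentsDemo {
    public static void main(String[] args) {
        String source = "take me */ and me /* but not me! */ I'm in! // I'm not...";
        CommentsLexer lexer = new CommentsLexer(CharStreams.fromString(source));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill(); // lex the whole input
        for (Token t : tokens.getTokens()) {
            // print token type, channel number, and matched text
            System.out.printf("%-12s channel=%d '%s'%n",
                    CommentsLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getChannel(),
                    t.getText());
        }
    }
}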
Try something like this:
grammar T;
@lexer::members {
// Returns true iff either "//" or "/*" is ahead in the char stream.
boolean startCommentAhead() {
return _input.LA(1) == '/' && (_input.LA(2) == '/' || _input.LA(2) == '*');
}
}
// other rules
STR
: ( {!startCommentAhead()}? . )+
;
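As a standalone illustration of what the predicate checks (this is plain runtime code, not generated lexer code), _input.LA(1) and _input.LA(2) simply peek at the next one and two characters without consuming them:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;

public class LookaheadDemo {
    public static void main(String[] args) {
        CharStream input = CharStreams.fromString("// a comment");
        // Same test as startCommentAhead(): is "//" or "/*" next in the stream?
        boolean commentAhead =
                input.LA(1) == '/' && (input.LA(2) == '/' || input.LA(2) == '*');
        System.out.println(commentAhead); // prints: true
    }
}
Keep in mind that STR only refuses to consume a comment start; the comment rules themselves (for example the BLK_COMMENT and INL_COMMENT rules from the previous answer) still need to exist alongside it.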
I am using antlr 3.1.3 and generating a python target. My lexer and parser accept very large files. Based on command-line or dynamic run-time controlled parameters, I would like to capture a portion of the recognized input and stop parsing early. For example, if my language consists of a header and a body, and the body might have gigabytes of tokens, and I am only interested in the header, I would like to have a rule that stops the lexer and parser without raising an exception. For performance reasons, I don't want to read the entire body.
grammar Example;
options {
language=Python;
k=2;
}
language:
header
body
EOF
;
header:
HEAD
(STRING)*
;
body:
BODY { if stopearly: help() }
(STRING)*
;
// string literals
STRING: '"'
(
'"' '"'
| NEWLINE
| ~('"'|'\n'|'\r')
)*
'"'
;
// Whitespace -- ignored
WS:
( ' '
| '\t'
| '\f'
| NEWLINE
)+ { $channel=HIDDEN }
;
HEAD: 'head';
BODY: 'body';
fragment NEWLINE: '\r' '\n' | '\r' | '\n';
What about:
body:
BODY {not stopearly}?=> (STRING)*
;
?
That's using a gated semantic predicate to enable certain language parts. I often use that to toggle language parts depending on a version number. I'm not 100% certain, though; it might be that you have to move the predicate and the code following it into a rule of its own.
This is a Python-specific answer. I added this to my parser:
@parser::header
{
class QuitEarlyException(Exception):
def __init__(self, value):
self.value = value
def __str__(self):
return repr(self.value)
}
and changed this:
body:
BODY { if stopearly: raise QuitEarlyException('ok') }
(STRING)*
;
Now I have a "try" block around my parser:
try:
parser.language()
except QuitEarlyException as e:
print "stopped early"
I have a grammar to parse some source code:
document
: header body_block* EOF
-> body_block*
;
header
: header_statement*
;
body_block
: '{' block_contents '}'
;
block_contents
: declaration_list
| ... other things ....
It's legal for a document to have a header without a body or a body without a header.
If I try to parse a document that looks like
int i;
then ANTLR complains that it found int when it was expecting EOF. This is true, but I'd like it to say that it was expecting {. That is, if the input contains something between the header and the EOF that's not a body_block, then I'd like to suggest to the user that they meant to enclose that text inside a body_block.
I've made a couple of almost-working attempts at this that I can post if that's illuminating, but I'm hoping I've just missed something easy.
Not pretty, but something like this would do it:
body_block
: ('{')=> '{' block_contents '}'
| t=.
{
String message;
if (!$t.text.equals("{")) {
message = "expected a '{' on line " + $t.getLine() + " near '" + $t.text + "'";
}
else {
message = "encountered a '{' without a '}' on line " + $t.getLine();
}
throw new RuntimeException(message);
}
;
(not tested, may contain syntax errors!)
So, whenever '{' ... '}' is not matched, it falls through to the second alternative, t=., and produces a more understandable error message. Note that a . in a parser rule matches any token, not any character!
I have sentences like:
" a"
"a "
" a "
I would like to catch all these examples (with lex), but I don't know how to express the beginning of the line.
I'm not totally sure what exactly you're looking for, but the regex symbol to specify matching the beginning of a line in a lex definition is the caret:
^
If I understand correctly, you're trying to pull the "a" out as the token, but you don't want to grab any of the whitespace? If this is the case, then you just need something like the following:
[\n\t\r ]+ {
// do nothing
}
"a" {
assignYYText( yylval );
return aToken;
}