Unity3D Convert Key Event to plain text - unity3d-gui

I am implementing a virtual console for my game and I am piping Key up events thro an event pipe (Observable Subject) and handling them on the virtual console side with
public void ReceiveInput (Event e)
{
Debug.Log ("ReceiveInput: " + e.ToString ());
if (e.keyCode == KeyCode.Backspace) {
if (ConsoleInputLine.LineText.Length > 2)
ConsoleInputLine.RemoveLastChar ();
return;
}
if (e.keyCode == KeyCode.Return || e.keyCode == KeyCode.KeypadEnter) {
ConsoleInputLine.LineText = "$ ";
return;
}
var chr = e.keyCode.ToString ();
if (e.capsLock || e.shift)
chr = chr.ToUpper ();
else
chr = chr.ToLower ();
ConsoleInputLine.AppendText ("" + chr);
}
While this works for simple A-Z and a-z letters, when you get to the numbers and other characters I will get the raw key name, such as "Alpha1" for the horizontal "1" key above the "Q" key, "Alpha2" for the "2" above "W", etc.
Is there a way to easily get the "rendered" text out of the key events without building a switch of all the possible keycode results?
PS: I forgot to mention that, shift+Alpha1 is "!" in US Querty, shift+Alpha2 is "#", etc, but it will differ in keyboards of different countries and its not feasible to make switch statements for every keyboard in the world.

This is not the correct answer!!!
Leaving it here in the spirit of assisting further research or get someone out of a jam.
For US Querty only keyboards, this works for now, until someone replies with a proper answer :
public static char ToChar (this Event e)
{
var key = e.keyCode;
char c = '\0';
if ((key >= KeyCode.A) && (key <= KeyCode.Z))
return (char)((int)((e.shift || e.capsLock) ? 'A' : 'a') + (int)(key - KeyCode.A));
if ((key >= KeyCode.Alpha0) && (key <= KeyCode.Alpha9)) {
if (e.shift) {
var specials = new char[] {'!', '#', '#', '$', '%', '^', '&', '*', '(', ')'};
return specials [(int)(key - KeyCode.Alpha0)];
}
return (char)((int)'0' + (int)(key - KeyCode.Alpha0));
}
if (key == KeyCode.Space)
return ' ';
if (key == KeyCode.Quote)
return (e.shift) ? '"' : '\'';
if (key == KeyCode.LeftBracket)
return (e.shift) ? '{' : '[';
if (key == KeyCode.RightBracket)
return (e.shift) ? '}' : ']';
if (key == KeyCode.BackQuote)
return (e.shift) ? '~' : '`';
if (key == KeyCode.Backslash)
return (e.shift) ? '|' : '\\';
if (key == KeyCode.Equals)
return (e.shift) ? '+' : '=';
if (key == KeyCode.Minus)
return (e.shift) ? '_' : '-';
if (key == KeyCode.Semicolon)
return (e.shift) ? ':' : ';';
if (key == KeyCode.Comma)
return (e.shift) ? '<' : ',';
if (key == KeyCode.Period)
return (e.shift) ? '>' : '.';
if (key == KeyCode.Slash)
return (e.shift) ? '?' : '/';
// implement more
return c;
}

Related

While loop multliple conditions out of range in kotlin

I wanna create a function that replaces a specific part of a string sentence "2.x^56+5.x-2.x^3+x^4+2" and I had a problem with the while loop, I used multiple condition in the while loop but it still working even when the condition is match,
fun replacePow() {
while (newString.contains('^')) {
for (i in 0 until newString.lastIndex) {
if (newString[i] == '^') {
m = i
while (newString[m] != '.' || newString[m] != '+' || newString[m]!= '-' ) {
m -= 1
}
if (newString[m] == '.' || newString[m] != '+' || newString[m]!= '-') {
previousLim = m
}
o = i
while (newString[o] != '+' || newString[o]!= '-') {
o += 1
}
if (newString[o] == '+'|| newString[o]== '-') {
nextLim = o
}
newString = newString.replaceRange(m + 1, o, powred[r].toString())
r += 1
}
if (i == newString.lastIndex){
break
}
}
}
println(newString)
}
if there is a better way to write this code just write it in the comment, And thank you anyway

Calculate the numbers without separating them

I can't solve this problem, how to separate the numbers and do the required arithmetic
https://codeforces.com/group/MWSDmqGsZm/contest/219158/problem/O
This is my code , I don't know how true it is
fun main(args:Array<String>) {
val scanner = Scanner(System.`in`)
var s: String = read()!!
var c: Char? = null
var a: String = ""
var b: String = ""
var t: Boolean = true
for (i in 1..s.length) {
if (s[i] == '+' || s[i] == '-' || s[i] == '*' || s[i] == '/') {
c = s[i]
t = false
}else{
if(t){
a+=s[i]
}else{
b+=s[i]
}
}
}
if(c=='+'){
println(a+b)
}else if (c=='-'){
println( "$a - $b" )
}else if (c=='*'){
println("$a * $b")
}else{
println("$a / $b")
}
Try this code:
fun main() {
val str = readLine()!!
val (index, operator) = str.findAnyOf(listOf("+", "-", "*", "/"))!!
val a = str.substring(0, index).toInt()
val b = str.substring(index + 1).toInt()
val result = when (operator) {
"+" -> a + b
"-" -> a - b
"*" -> a * b
"/" -> a / b
else -> -1
}
println(result)
}
findAnyOf documentation:
Finds the first occurrence of any of the specified [strings] in this char sequence.
Returns A pair of an index of the first occurrence of matched string from [strings] and the string matched or null if none of [strings] are found.
Rest of the logic is quite straightforward.

Grammar to parse sql statements delimited by semicolon in antlr4

I'm looking to separate sql statements that could have comments /* */ or strings 'test' or line comments --line comment (sql style) separated by semicolons. An example would be:
Blah blah 'string ; ' ;
More text /* semicolon(;) inside comment */
Some more text
in multiple lines
the text above should retrieve only two statements since the semicolon inside the string ' ' and the comment /* */ should not count as a delimiter.
The current grammar I have is :
grammar SqlStatements;
sql_stmts:
text (';' text)* EOF
;
text:
SINGLE_LINE_COMMENT*
| STRING*
| TEXT*
;
TEXT:
~['--';\''/*']*
;
STRING
:
'\'' ('\'\''|~'\'')* '\''
;
SINGLE_LINE_COMMENT
: '--' ~[\r\n]*
;
MULTILINE_COMMENT
: '/*' .*? ( '*/' | EOF )
;
The code above crashes when typing *.
The common approach for parsing SQL is to first split the individual statements. That might involve handling of delimiter switches, which is needed e.g. when you have a stored procedure in the dump which must be handled as a whole but needs the semicolon as internal statement delimiter.
This can be done very quickly with an optimized loop that jumps over comments and strings. Here's code how this is handled in MySQL Workbench:
/**
* A statement splitter to take a list of sql statements and split them into individual statements,
* return their position and length in the original string (instead the copied strings).
*/
size_t MySQLParserServicesImpl::determineStatementRanges(const char *sql, size_t length,
const std::string &initial_delimiter,
std::vector<std::pair<size_t, size_t> > &ranges,
const std::string &line_break)
{
_stop = false;
std::string delimiter = initial_delimiter.empty() ? ";" : initial_delimiter;
const unsigned char *delimiter_head = (unsigned char*)delimiter.c_str();
const unsigned char keyword[] = "delimiter";
const unsigned char *head = (unsigned char *)sql;
const unsigned char *tail = head;
const unsigned char *end = head + length;
const unsigned char *new_line = (unsigned char*)line_break.c_str();
bool have_content = false; // Set when anything else but comments were found for the current statement.
while (!_stop && tail < end)
{
switch (*tail)
{
case '/': // Possible multi line comment or hidden (conditional) command.
if (*(tail + 1) == '*')
{
tail += 2;
bool is_hidden_command = (*tail == '!');
while (true)
{
while (tail < end && *tail != '*')
tail++;
if (tail == end) // Unfinished comment.
break;
else
{
if (*++tail == '/')
{
tail++; // Skip the slash too.
break;
}
}
}
if (!is_hidden_command && !have_content)
head = tail; // Skip over the comment.
}
else
tail++;
break;
case '-': // Possible single line comment.
{
const unsigned char *end_char = tail + 2;
if (*(tail + 1) == '-' && (*end_char == ' ' || *end_char == '\t' || is_line_break(end_char, new_line)))
{
// Skip everything until the end of the line.
tail += 2;
while (tail < end && !is_line_break(tail, new_line))
tail++;
if (!have_content)
head = tail;
}
else
tail++;
break;
}
case '#': // MySQL single line comment.
while (tail < end && !is_line_break(tail, new_line))
tail++;
if (!have_content)
head = tail;
break;
case '"':
case '\'':
case '`': // Quoted string/id. Skip this in a local loop.
{
have_content = true;
char quote = *tail++;
while (tail < end && *tail != quote)
{
// Skip any escaped character too.
if (*tail == '\\')
tail++;
tail++;
}
if (*tail == quote)
tail++; // Skip trailing quote char to if one was there.
break;
}
case 'd':
case 'D':
{
have_content = true;
// Possible start of the keyword DELIMITER. Must be at the start of the text or a character,
// which is not part of a regular MySQL identifier (0-9, A-Z, a-z, _, $, \u0080-\uffff).
unsigned char previous = tail > (unsigned char *)sql ? *(tail - 1) : 0;
bool is_identifier_char = previous >= 0x80
|| (previous >= '0' && previous <= '9')
|| ((previous | 0x20) >= 'a' && (previous | 0x20) <= 'z')
|| previous == '$'
|| previous == '_';
if (tail == (unsigned char *)sql || !is_identifier_char)
{
const unsigned char *run = tail + 1;
const unsigned char *kw = keyword + 1;
int count = 9;
while (count-- > 1 && (*run++ | 0x20) == *kw++)
;
if (count == 0 && *run == ' ')
{
// Delimiter keyword found. Get the new delimiter (everything until the end of the line).
tail = run++;
while (run < end && !is_line_break(run, new_line))
run++;
delimiter = base::trim(std::string((char *)tail, run - tail));
delimiter_head = (unsigned char*)delimiter.c_str();
// Skip over the delimiter statement and any following line breaks.
while (is_line_break(run, new_line))
run++;
tail = run;
head = tail;
}
else
tail++;
}
else
tail++;
break;
}
default:
if (*tail > ' ')
have_content = true;
tail++;
break;
}
if (*tail == *delimiter_head)
{
// Found possible start of the delimiter. Check if it really is.
size_t count = delimiter.size();
if (count == 1)
{
// Most common case. Trim the statement and check if it is not empty before adding the range.
head = skip_leading_whitespace(head, tail);
if (head < tail)
ranges.push_back(std::make_pair<size_t, size_t>(head - (unsigned char *)sql, tail - head));
head = ++tail;
have_content = false;
}
else
{
const unsigned char *run = tail + 1;
const unsigned char *del = delimiter_head + 1;
while (count-- > 1 && (*run++ == *del++))
;
if (count == 0)
{
// Multi char delimiter is complete. Tail still points to the start of the delimiter.
// Run points to the first character after the delimiter.
head = skip_leading_whitespace(head, tail);
if (head < tail)
ranges.push_back(std::make_pair<size_t, size_t>(head - (unsigned char *)sql, tail - head));
tail = run;
head = run;
have_content = false;
}
}
}
}
// Add remaining text to the range list.
head = skip_leading_whitespace(head, tail);
if (head < tail)
ranges.push_back(std::make_pair<size_t, size_t>(head - (unsigned char *)sql, tail - head));
return 0;
}
This works well also for large sql scripts and can split a dump containing 1 million lines in about 1 second (depends of course on the box you run this on). The var _stop is a flag used to allow breaking the split process. The code is handling MySQL code, so it properly handles hidden commands (version comments).
With the start and length info per query you can now go to your parser.
Even when Mike's answer was fine, I needed to create the grammar in antlr. The following grammar worked for me:
sql_stmts:
sql_stmt (';'+ sql_stmt)*
;
sql_stmt:
TEXT*
;
TEXT:
~[']
| STRING
;
BLOCK_COMMENT
: '/*' .*? ( '*/' | EOF ) -> channel(HIDDEN)
;
LINE_COMMENT
: '--' ~[\r\n]* -> channel(HIDDEN)
;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
STRING
:
'\'' ('\'\''|~'\'')* '\''
;
First, don't ignore the warning and error messages generated when compiling the grammar.
Second, the TEXT rule does not do what you think it does -- quotes don't work there. See the doc.
Third, your first line of input is actually TEXT STRING TEXT SEMI. That second TEXT is the space before your SEMI rule, yet your rule only allows for a single non-consecutive occurrence of TEXT before the SEMI.

Validate user input - Objective-C

I am trying to validate user input. Here's the code:
do{
NSLog(#"Please select from the following options: D/ W/ T/ Q");
res = scanf("%c", &s1);
if(res ==0) {
NSLog(#"Invalid entry.");
}
}while (res ==0);
I want to improve the above code such that it will not allow the user to input anything (such as a number, a string, or any negative number) but only one single character (to be specific, only one of the option given in the prompt).
The current code doesn't do that.
boolean bValid = true;
do {
NSLog(#"Please select from the following options: D/ W/ T/ Q");
res = scanf("%c", &s1);
if(res == 'D' || res == 'W' || res == 'T' || res == 'Q'){
bValid = false;
}
else{
//Error message
}
} while (bValid == true);
You can use this code.
Just check it out.
Well one option is first to read the keyboard as a string
char buffer[128];
fgets( buffer, sizeof(buffer), stdin );
once you have the line, then check whether it is one of the options, seems only the first letter is significant in your case:
switch( toupper( buffer[0] ) )
{
case 'D': {...} ; // do whatever u need to do
case 'W': {...} ;
case 'T': {...} ;
case 'Q': {...} ;
default: {...} ;
}

Different lexer rules in different state

I've been working on a parser for some template language embeded in HTML (FreeMarker), piece of example here:
${abc}
<html>
<head>
<title>Welcome!</title>
</head>
<body>
<h1>
Welcome ${user}<#if user == "Big Joe">, our beloved
leader</#if>!
</h1>
<p>Our latest product:
${latestProduct}!
</body>
</html>
The template language is between some specific tags, e.g. '${' '}', '<#' '>'. Other raw texts in between can be treated like as the same tokens (RAW).
The key point here is that the same text, e.g. an integer, will mean differently thing for the parser depends on whether it's between those tags or not, and thus needs to be treated as different tokens.
I've tried with the following ugly implementation, with a self-defined state to indicate whether it's in those tags. As you see, I have to check the state almost in every rule, which drives me crazy...
I also thinked about the following two solutions:
Use multiple lexers. I can switch between two lexers when inside/outside those tags. However, the document for this is poor for ANTLR3. I don't know how to let one parser share two different lexers and switch between them.
Move the RAW rule up, after the NUMERICAL_ESCAPE rule. Check the state there, if it's in the tag, put back the token and continue trying the left rules. This would save lots of state checking. However, I don't find any 'put back' function and ANTLR complains about some rules can never be matched...
Is there an elegant solution for this?
grammar freemarker_simple;
#lexer::members {
int freemarker_type = 0;
}
expression
: primary_expression ;
primary_expression
: number_literal | identifier | parenthesis | builtin_variable
;
parenthesis
: OPEN_PAREN expression CLOSE_PAREN ;
number_literal
: INTEGER | DECIMAL
;
identifier
: ID
;
builtin_variable
: DOT ID
;
string_output
: OUTPUT_ESCAPE expression CLOSE_BRACE
;
numerical_output
: NUMERICAL_ESCAPE expression CLOSE_BRACE
;
if_expression
: START_TAG IF expression DIRECTIVE_END optional_block
( START_TAG ELSE_IF expression loose_directive_end optional_block )*
( END_TAG ELSE optional_block )?
END_TAG END_IF
;
list : START_TAG LIST expression AS ID DIRECTIVE_END optional_block END_TAG END_LIST ;
for_each
: START_TAG FOREACH ID IN expression DIRECTIVE_END optional_block END_TAG END_FOREACH ;
loose_directive_end
: ( DIRECTIVE_END | EMPTY_DIRECTIVE_END ) ;
freemarker_directive
: ( if_expression | list | for_each ) ;
content : ( RAW | string_output | numerical_output | freemarker_directive ) + ;
optional_block
: ( content )? ;
root : optional_block EOF ;
START_TAG
: '<#'
{ freemarker_type = 1; }
;
END_TAG : '</#'
{ freemarker_type = 1; }
;
DIRECTIVE_END
: '>'
{
if(freemarker_type == 0) $type=RAW;
freemarker_type = 0;
}
;
EMPTY_DIRECTIVE_END
: '/>'
{
if(freemarker_type == 0) $type=RAW;
freemarker_type = 0;
}
;
OUTPUT_ESCAPE
: '${'
{ if(freemarker_type == 0) freemarker_type = 2; }
;
NUMERICAL_ESCAPE
: '#{'
{ if(freemarker_type == 0) freemarker_type = 2; }
;
IF : 'if'
{ if(freemarker_type == 0) $type=RAW; }
;
ELSE : 'else' DIRECTIVE_END
{ if(freemarker_type == 0) $type=RAW; }
;
ELSE_IF : 'elseif'
{ if(freemarker_type == 0) $type=RAW; }
;
LIST : 'list'
{ if(freemarker_type == 0) $type=RAW; }
;
FOREACH : 'foreach'
{ if(freemarker_type == 0) $type=RAW; }
;
END_IF : 'if' DIRECTIVE_END
{ if(freemarker_type == 0) $type=RAW; }
;
END_LIST
: 'list' DIRECTIVE_END
{ if(freemarker_type == 0) $type=RAW; }
;
END_FOREACH
: 'foreach' DIRECTIVE_END
{ if(freemarker_type == 0) $type=RAW; }
;
FALSE: 'false' { if(freemarker_type == 0) $type=RAW; };
TRUE: 'true' { if(freemarker_type == 0) $type=RAW; };
INTEGER: ('0'..'9')+ { if(freemarker_type == 0) $type=RAW; };
DECIMAL: INTEGER '.' INTEGER { if(freemarker_type == 0) $type=RAW; };
DOT: '.' { if(freemarker_type == 0) $type=RAW; };
DOT_DOT: '..' { if(freemarker_type == 0) $type=RAW; };
PLUS: '+' { if(freemarker_type == 0) $type=RAW; };
MINUS: '-' { if(freemarker_type == 0) $type=RAW; };
TIMES: '*' { if(freemarker_type == 0) $type=RAW; };
DIVIDE: '/' { if(freemarker_type == 0) $type=RAW; };
PERCENT: '%' { if(freemarker_type == 0) $type=RAW; };
AND: '&' | '&&' { if(freemarker_type == 0) $type=RAW; };
OR: '|' | '||' { if(freemarker_type == 0) $type=RAW; };
EXCLAM: '!' { if(freemarker_type == 0) $type=RAW; };
OPEN_PAREN: '(' { if(freemarker_type == 0) $type=RAW; };
CLOSE_PAREN: ')' { if(freemarker_type == 0) $type=RAW; };
OPEN_BRACE
: '{'
{ if(freemarker_type == 0) $type=RAW; }
;
CLOSE_BRACE
: '}'
{
if(freemarker_type == 0) $type=RAW;
if(freemarker_type == 2) freemarker_type = 0;
}
;
IN: 'in' { if(freemarker_type == 0) $type=RAW; };
AS: 'as' { if(freemarker_type == 0) $type=RAW; };
ID : ('A'..'Z'|'a'..'z')+
//{ if(freemarker_type == 0) $type=RAW; }
;
BLANK : ( '\r' | ' ' | '\n' | '\t' )+
{
if(freemarker_type == 0) $type=RAW;
else $channel = HIDDEN;
}
;
RAW
: .
;
EDIT
I found the problem similar to How do I lex this input? , where a "start condition" is needed. But unfortunately, the answer uses a lot of predicates as well, just like my states.
Now, I tried to move the RAW higher with a predicate. Hoping to eliminate all the state checks after RAW rule. However, my example input failed, the first line end is recogonized as BLANK instead of RAW it should be.
I guess something wrong is about the rule priority:
After CLOSE_BRACE is matched, the next token is matched from rules after the CLOSE_BRACE rule, rather than start from the begenning again.
Any way to resolve this?
New grammar below with some debug outputs:
grammar freemarker_simple;
#lexer::members {
int freemarker_type = 0;
}
expression
: primary_expression ;
primary_expression
: number_literal | identifier | parenthesis | builtin_variable
;
parenthesis
: OPEN_PAREN expression CLOSE_PAREN ;
number_literal
: INTEGER | DECIMAL
;
identifier
: ID
;
builtin_variable
: DOT ID
;
string_output
: OUTPUT_ESCAPE expression CLOSE_BRACE
;
numerical_output
: NUMERICAL_ESCAPE expression CLOSE_BRACE
;
if_expression
: START_TAG IF expression DIRECTIVE_END optional_block
( START_TAG ELSE_IF expression loose_directive_end optional_block )*
( END_TAG ELSE optional_block )?
END_TAG END_IF
;
list : START_TAG LIST expression AS ID DIRECTIVE_END optional_block END_TAG END_LIST ;
for_each
: START_TAG FOREACH ID IN expression DIRECTIVE_END optional_block END_TAG END_FOREACH ;
loose_directive_end
: ( DIRECTIVE_END | EMPTY_DIRECTIVE_END ) ;
freemarker_directive
: ( if_expression | list | for_each ) ;
content : ( RAW | string_output | numerical_output | freemarker_directive ) + ;
optional_block
: ( content )? ;
root : optional_block EOF ;
START_TAG
: '<#'
{ freemarker_type = 1; }
;
END_TAG : '</#'
{ freemarker_type = 1; }
;
OUTPUT_ESCAPE
: '${'
{ if(freemarker_type == 0) freemarker_type = 2; }
;
NUMERICAL_ESCAPE
: '#{'
{ if(freemarker_type == 0) freemarker_type = 2; }
;
RAW
:
{ freemarker_type == 0 }?=> .
{System.out.printf("RAW \%s \%d\n",getText(),freemarker_type);}
;
DIRECTIVE_END
: '>'
{ if(freemarker_type == 1) freemarker_type = 0; }
;
EMPTY_DIRECTIVE_END
: '/>'
{ if(freemarker_type == 1) freemarker_type = 0; }
;
IF : 'if'
;
ELSE : 'else' DIRECTIVE_END
;
ELSE_IF : 'elseif'
;
LIST : 'list'
;
FOREACH : 'foreach'
;
END_IF : 'if' DIRECTIVE_END
;
END_LIST
: 'list' DIRECTIVE_END
;
END_FOREACH
: 'foreach' DIRECTIVE_END
;
FALSE: 'false' ;
TRUE: 'true' ;
INTEGER: ('0'..'9')+ ;
DECIMAL: INTEGER '.' INTEGER ;
DOT: '.' ;
DOT_DOT: '..' ;
PLUS: '+' ;
MINUS: '-' ;
TIMES: '*' ;
DIVIDE: '/' ;
PERCENT: '%' ;
AND: '&' | '&&' ;
OR: '|' | '||' ;
EXCLAM: '!' ;
OPEN_PAREN: '(' ;
CLOSE_PAREN: ')' ;
OPEN_BRACE
: '{'
;
CLOSE_BRACE
: '}'
{ if(freemarker_type == 2) {freemarker_type = 0;} }
;
IN: 'in' ;
AS: 'as' ;
ID : ('A'..'Z'|'a'..'z')+
{ System.out.printf("ID \%s \%d\n",getText(),freemarker_type);}
;
BLANK : ( '\r' | ' ' | '\n' | '\t' )+
{
System.out.printf("BLANK \%d\n",freemarker_type);
$channel = HIDDEN;
}
;
My input results with the output:
ID abc 2
BLANK 0 <<< incorrect, should be RAW when state==0
RAW < 0 <<< correct
ID html 0 <<< incorrect, should be RAW RAW RAW RAW
RAW > 0
EDIT2
Also tried the 2nd approach with Bart's grammar, still didn't work the 'html' is recognized as an ID, which should be 4 RAWs. When mmode=false, shouldn't RAW get matched first? Or the lexer still chooses the longest match here?
grammar freemarker_bart;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
OUTPUT;
RAW_BLOCK;
}
#parser::members {
// merge a given list of tokens into a single AST
private CommonTree merge(List tokenList) {
StringBuilder b = new StringBuilder();
for(int i = 0; i < tokenList.size(); i++) {
Token token = (Token)tokenList.get(i);
b.append(token.getText());
}
return new CommonTree(new CommonToken(RAW, b.toString()));
}
}
#lexer::members {
private boolean mmode = false;
}
parse
: content* EOF -> ^(FILE content*)
;
content
: (options {greedy=true;}: t+=RAW)+ -> ^(RAW_BLOCK {merge($t)})
| if_stat
| output
;
if_stat
: TAG_START IF expression TAG_END raw_block TAG_END_START IF TAG_END -> ^(IF expression raw_block)
;
output
: OUTPUT_START expression OUTPUT_END -> ^(OUTPUT expression)
;
raw_block
: (t+=RAW)* -> ^(RAW_BLOCK {merge($t)})
;
expression
: eq_expression
;
eq_expression
: atom (EQUALS^ atom)*
;
atom
: STRING
| ID
;
// these tokens denote the start of markup code (sets mmode to true)
OUTPUT_START : '${' {mmode=true;};
TAG_START : '<#' {mmode=true;};
TAG_END_START : '</' ('#' {mmode=true;} | ~'#' {$type=RAW;});
RAW : {!mmode}?=> . ;
// these tokens denote the end of markup code (sets mmode to false)
OUTPUT_END : '}' {mmode=false;};
TAG_END : '>' {mmode=false;};
// valid tokens only when in "markup mode"
EQUALS : '==';
IF : 'if';
STRING : '"' ~'"'* '"';
ID : ('a'..'z' | 'A'..'Z')+;
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
You could let lexer rules match using gated semantic predicates where you test for a certain boolean expression.
A little demo:
freemarker_simple.g
grammar freemarker_simple;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
OUTPUT;
RAW_BLOCK;
}
#parser::members {
// merge a given list of tokens into a single AST
private CommonTree merge(List tokenList) {
StringBuilder b = new StringBuilder();
for(int i = 0; i < tokenList.size(); i++) {
Token token = (Token)tokenList.get(i);
b.append(token.getText());
}
return new CommonTree(new CommonToken(RAW, b.toString()));
}
}
#lexer::members {
private boolean mmode = false;
}
parse
: content* EOF -> ^(FILE content*)
;
content
: (options {greedy=true;}: t+=RAW)+ -> ^(RAW_BLOCK {merge($t)})
| if_stat
| output
;
if_stat
: TAG_START IF expression TAG_END raw_block TAG_END_START IF TAG_END -> ^(IF expression raw_block)
;
output
: OUTPUT_START expression OUTPUT_END -> ^(OUTPUT expression)
;
raw_block
: (t+=RAW)* -> ^(RAW_BLOCK {merge($t)})
;
expression
: eq_expression
;
eq_expression
: atom (EQUALS^ atom)*
;
atom
: STRING
| ID
;
// these tokens denote the start of markup code (sets mmode to true)
OUTPUT_START : '${' {mmode=true;};
TAG_START : '<#' {mmode=true;};
TAG_END_START : '</' ('#' {mmode=true;} | ~'#' {$type=RAW;});
// these tokens denote the end of markup code (sets mmode to false)
OUTPUT_END : {mmode}?=> '}' {mmode=false;};
TAG_END : {mmode}?=> '>' {mmode=false;};
// valid tokens only when in "markup mode"
EQUALS : {mmode}?=> '==';
IF : {mmode}?=> 'if';
STRING : {mmode}?=> '"' ~'"'* '"';
ID : {mmode}?=> ('a'..'z' | 'A'..'Z')+;
SPACE : {mmode}?=> (' ' | '\t' | '\r' | '\n')+ {skip();};
RAW : . ;
which parses your input:
test.html
${abc}
<html>
<head>
<title>Welcome!</title>
</head>
<body>
<h1>
Welcome ${user}<#if user == "Big Joe">, our beloved leader</#if>!
</h1>
<p>Our latest product: ${latestProduct}!</p>
</body>
</html>
into the following AST:
as you can test yourself with the class:
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
freemarker_simpleLexer lexer = new freemarker_simpleLexer(new ANTLRFileStream("test.html"));
freemarker_simpleParser parser = new freemarker_simpleParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
EDIT 1
When I run your example input with a parser generated from the second grammar you posted, the following are wthe first 5 lines being printed to the console (not counting the many warnings that are generated):
ID abc 2
RAW
0
RAW < 0
ID html 0
...
EDIT 2
Bood wrote:
Also tried the 2nd approach with Bart's grammar, still didn't work the 'html' is recognized as an ID, which should be 4 RAWs. When mmode=false, shouldn't RAW get matched first? Or the lexer still chooses the longest match here?
Yes, that is correct: ANTLR chooses the longer match in that case.
But now that I (finally :)) see what you're trying to do, here's a last proposal: you could let the RAW rule match characters as long as the rule can't see one of the following character sequences ahead: "<#", "</#" or "${". Note that the rule must still stay at the end in the grammar. This check is performed inside the lexer. Also, in that case you don't need the merge(...) method in the parser:
grammar freemarker_simple;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
OUTPUT;
RAW_BLOCK;
}
#lexer::members {
private boolean mmode = false;
private boolean rawAhead() {
if(mmode) return false;
int ch1 = input.LA(1), ch2 = input.LA(2), ch3 = input.LA(3);
return !(
(ch1 == '<' && ch2 == '#') ||
(ch1 == '<' && ch2 == '/' && ch3 == '#') ||
(ch1 == '$' && ch2 == '{')
);
}
}
parse
: content* EOF -> ^(FILE content*)
;
content
: RAW
| if_stat
| output
;
if_stat
: TAG_START IF expression TAG_END RAW TAG_END_START IF TAG_END -> ^(IF expression RAW)
;
output
: OUTPUT_START expression OUTPUT_END -> ^(OUTPUT expression)
;
expression
: eq_expression
;
eq_expression
: atom (EQUALS^ atom)*
;
atom
: STRING
| ID
;
OUTPUT_START : '${' {mmode=true;};
TAG_START : '<#' {mmode=true;};
TAG_END_START : '</' ('#' {mmode=true;} | ~'#' {$type=RAW;});
OUTPUT_END : '}' {mmode=false;};
TAG_END : '>' {mmode=false;};
EQUALS : '==';
IF : 'if';
STRING : '"' ~'"'* '"';
ID : ('a'..'z' | 'A'..'Z')+;
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
RAW : ({rawAhead()}?=> . )+;
The grammar above will produce the following AST from the input posted at the start of this answer: