Extracting tokens out of ES6 template literals with Ragel


JavaScript contains the following syntax:
`hello ${name}`
I'm wondering how a Ragel machine would split the syntax above. The way I see it, the type of the closing curly brace depends on the parsing state. For example, in the code below the curly brace is instead part of the string token, since the ${ token isn't there:
`hello name}`
Finally, it becomes trickier when you consider that the right curly brace can also appear within the substituted expression itself, i.e.:
`hello ${() => { return name }()}`
How would a similar context-dependent grammar be implemented with Ragel?

The syntax inside backticks is not normally something you would handle entirely with your lexical analyzer. It is better to send it to your parser as a sequence of literal text and/or tokens. So you'd send "`" as the opening token, "hello " as literal text, then the tokens "(", ")", etc. To know when to stop and go back to literal text, you either need feedback from your parser to your scanner, or the scanner itself needs to balance the braces.
Note: I've never actually written a parser for JavaScript; I'm just going on what you provided above.
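Below is a minimal sketch of the brace-balancing approach described in this answer. It is hand-written Java, not Ragel output, and the class name TemplateScanner and the token names (OPEN, TEXT, SUBST_START, EXPR, SUBST_END, CLOSE) are hypothetical. It assumes a single template literal as input, keeps a brace-depth counter while inside ${...}, and, for brevity, emits the whole expression as one raw EXPR token instead of tokenizing it further. Braces inside string literals, comments, or nested template literals within the expression are not handled. In Ragel, the same two modes could be expressed as separate machines with an action that maintains the counter, or with fcall/fret to push and pop scanner states.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch (not Ragel output): a two-mode scanner for one template literal.
// Text mode collects literal characters; "${" switches to expression mode, where a
// brace-depth counter decides which "}" closes the substitution.
class TemplateScanner {
    static List<String> scan(String src) {
        if (src.isEmpty() || src.charAt(0) != '`') {
            throw new IllegalArgumentException("expected opening backtick");
        }
        List<String> tokens = new ArrayList<>();
        tokens.add("OPEN");
        StringBuilder text = new StringBuilder();
        int i = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && i + 1 < src.length()) {
                // Escape sequence such as \` or \$: keep both characters as literal text.
                text.append(c).append(src.charAt(i + 1));
                i += 2;
            } else if (c == '`') {
                flushText(tokens, text);
                tokens.add("CLOSE");
                return tokens;
            } else if (c == '$' && i + 1 < src.length() && src.charAt(i + 1) == '{') {
                flushText(tokens, text);
                tokens.add("SUBST_START");
                i += 2;
                int start = i;
                int depth = 1; // the "${" itself counts as one open brace
                while (i < src.length()) {
                    char d = src.charAt(i);
                    if (d == '{') {
                        depth++;
                    } else if (d == '}' && --depth == 0) {
                        break; // this "}" closes the substitution
                    }
                    i++;
                }
                if (i >= src.length()) {
                    throw new IllegalArgumentException("unterminated ${");
                }
                // Simplification: the whole expression becomes one raw token; a real lexer
                // would tokenize it and skip braces inside strings and comments.
                tokens.add("EXPR(" + src.substring(start, i) + ")");
                tokens.add("SUBST_END");
                i++; // consume the closing "}"
            } else {
                // Outside "${...}", a "}" is just literal text.
                text.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated template literal");
    }

    private static void flushText(List<String> tokens, StringBuilder text) {
        if (text.length() > 0) {
            tokens.add("TEXT(" + text + ")");
            text.setLength(0);
        }
    }
}
```

For example, scanning `hello ${name}` yields [OPEN, TEXT(hello ), SUBST_START, EXPR(name), SUBST_END, CLOSE], while `hello name}` yields [OPEN, TEXT(hello name}), CLOSE], because a } only closes a substitution when the depth counter returns to zero.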

Related

Bison: syntax error processing, unexpected and undefined <token>

I want to handle undefined and unexpected token errors in the yyerror function (or in another function, if that's possible).
For example, I get this error message from Bison:
...
LAC: checking lookahead EXECSQL: S4
Error: popping nterm component_list ()
Stack now 0
Cleanup: discarding lookahead token $undefined ()
Stack now 0
ERRSTAT = "%X0000002C"
But I want to print which token wasn't found, along with the line number. Is this possible in Bison, and how?

The special token $undefined is reported when yylex returns a token number which doesn't appear in any parser rule. Most of the time, that's the result of the lexer fallback rule:
. { return yytext[0]; }
But it can also happen if you declare a token in your parser file, and the lexer returns that token, but the token is never actually used in any rule.
Unused tokens don't have names, in the sense that the array of names which Bison includes in your parser doesn't include unused tokens, and so there's no way to look up what the token name originally was. You can, however, often get the token number from the variable yychar. If that number is greater than 0 and less than 256, then the token is probably a single-character token, and you could use that to print an additional error message. However, there's no simple way to modify the error message generated by Bison's verbose error messages; if you're using that feature, you'll still see the invalid token message.
In order to print line numbers, you only need to enable line number counting in the lexical scanner, using
%option yylineno
in your Flex (.l) file. Then you can print the value of yylineno in yyerror. (If you're using a "pure" (reentrant) scanner, the line number is stored in the yyscan_t scanner object and can be read with yyget_lineno(). In the normal use case, where that object is an extra parser argument, it will also be available inside yyerror.)
I know that the above is a bit confusing because there are a lot of different code-generation options with slightly different behaviours. You didn't specify the particular options you're using, so the answer is a bit generic.

Can't get key of object that is numeric

I'm working with an API that returns an array of objects. I can access all the keys except two whose keys start with numbers; accessing those gives me an error.
I really don't know why I can't access those keys.
Is there something different because they are numbers?
BTW, I'm using axios.

If you're using dot notation, you should switch to bracket notation to access properties whose names start with a digit.
The code below uses dot notation and throws an error:
const test = {"1h" : "test value"};
console.log(test.1h); // error
Why:
In the object.property syntax, the property must be a valid JavaScript identifier.
An identifier is a sequence of characters in the code that identifies a variable, function, or property.
In JavaScript, identifiers are case-sensitive and can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit.
The code below uses bracket notation and works fine:
const test = {"1h" : "test value"};
console.log(test["1h"]); // works
Why:
In the object[property_name] syntax, the property_name is just a string or Symbol. So, it can be any string, including '1foo', '!bar!', or even ' ' (a space).
See the MDN documentation on property accessors for details.

Velocity with double curly braces

Why does Velocity give the following output for this string?
VelocityEngine ve = new VelocityEngine();
ve.init();
VelocityContext vc = new VelocityContext();
vc.put("foo", "bar");
String inString = "THis is ${{foo}} and this is ${foo}.Hello and ${foo}-Hello";
StringWriter sw = new StringWriter();
ve.evaluate(vc, sw, "Tag", inString);
Output:
THis is ${{} and this is bar.Hello and bar-Hello
I was expecting it to print either ${{foo}} or {bar}, so why ${{}? Does a double curly brace act as an escape character?
I'm running this with strict reference mode set to true. I neither see an exception nor see the input printed as-is, and that's what confuses me.

Well, you made me look into the code, and I'm not sure I understood it correctly. The problem seems to be that in ${...}, the content between the braces is treated as an ASTReference, which is tokenized differently from a standalone string such as "{bar}". Specifically, it gets tokenized into three tokens: {, bar and }. The engine then tries to find the so-called root of the reference (in ${x}, the root is x), does not recognize the pattern, and falls back to a reference type called RUNT, for which only the first token, i.e. "{", matters. This way "{bar}" becomes "{".
In other words, the expression ${{bar}} does not make sense, and Velocity fails to throw an error here. For other nonsensical combinations, such as ${[bar]}, it does throw an error.

Velocity Variables or VTL Identifier
Must start with an alphabetic character (a .. z or A .. Z). The rest of the characters are limited to the following types of characters:
alphabetic (a .. z, A .. Z)
numeric (0 .. 9)
hyphen ("-")
underscore ("_")
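The identifier rule quoted above can be checked with a single regular expression. This is a small illustrative sketch based on the rule as stated in this answer. It is not Velocity's own parser code, newer Velocity versions may differ, and the class name VtlIdentifier is hypothetical. It shows why {foo} is rejected: its first character is {, not a letter.

```java
import java.util.regex.Pattern;

// Sketch of the VTL identifier rule as quoted above:
// first char a-z or A-Z, then letters, digits, '-' or '_'.
class VtlIdentifier {
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9_-]*");

    static boolean isValid(String name) {
        // matches() requires the whole string to fit the pattern
        return IDENTIFIER.matcher(name).matches();
    }
}
```

With this check, VtlIdentifier.isValid("foo") returns true, while VtlIdentifier.isValid("{foo}") returns false.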
You are using formal reference notation, ${varName}.
With ${{foo}}, Velocity tries to get the variable {foo}, which is not a valid VTL identifier, so it doesn't try to load the variable.
It probably then tries to treat it as a JSON-style map such as {"a":"b"} and fails again; probably only { is accepted, so you are left with:
${{}
I tested your template with the new Velocity 2.0, and the issue does not reproduce (in either strict or non-strict mode).
Output:
THis is ${{foo}} and this is bar.Hello and bar-Hello
So now you have a reason to upgrade to Velocity 2.0.

Issues with error handling in ANTLR3

I implemented error reporting in the following manner:
@members {
public String getErrorMessage(RecognitionException e, String[] tokenNames) {
    List stack = getRuleInvocationStack(e, this.getClass().getName());
    String msg = null;
    if (e instanceof NoViableAltException) {
        <some code>
    } else {
        msg = super.getErrorMessage(e, tokenNames);
    }
    // Split on any line ending, not only Windows-style "\r\n"
    String[] inputLines = e.input.toString().split("\\r?\\n");
    String line = "";
    if (e.token.getCharPositionInLine() == 0)
        line = "at \"" + inputLines[e.token.getLine() - 2]; // error at line start: show the previous line
    else if (e.token.getCharPositionInLine() > 0)
        line = "at \"" + inputLines[e.token.getLine() - 1];
    // Cut the default message at " at " (splitting on "at" alone would also cut "mismatched")
    return ": " + msg.split(" at ")[0] + " " + line + "\" => [" + stack.get(stack.size() - 1) + "]";
}

public String getTokenErrorDisplay(Token t) {
    return t.toString();
}
}
Errors are now displayed as follows:
line 6:7 : missing CLOSSB at "int a[6;" => [var_declaration]
line 8:0 : missing SEMICOL at "int p" => [var_declaration]
line 8:5 : missing CLOSB at "get(2;" => [call]
I have two questions:
1) Is there a proper way to do the same thing I have done?
2) I want to replace CLOSSB, SEMICOL, CLOSB etc. with their real symbols. How can I do that using the map in .g file?
Thank you.

1) Is there a proper way to do the same thing I have done?
I don't know if there is a defined proper way of showing errors. My take on showing errors is a litmus test: if the user can figure out how to fix the error based on what you have given them, then it is good. If the user is confused by the error message, then the message needs more work. In the examples given in the question, the symbols are only char constants.
My favorite way of displaying errors is to show the offending line with an arrow pointing at the location, e.g.:
Expected closing square bracket on line 6.
int a[6;
       ^
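The line-plus-caret style described above can be produced with a small helper. This is a sketch only: the class and method names are hypothetical, and it assumes the column is 0-based, consistent with the "line 6:7" output in the question pointing at the ; in int a[6;. Inside getErrorMessage, the source line and column could come from inputLines and e.token.getCharPositionInLine(), as in the question's code.

```java
// Hypothetical helper: formats an error message, the offending source line,
// and a caret under an (assumed 0-based) column.
class CaretError {
    static String format(String message, String sourceLine, int column) {
        StringBuilder sb = new StringBuilder();
        sb.append(message).append('\n').append(sourceLine).append('\n');
        for (int i = 0; i < column; i++) {
            // Copy tabs from the source line so the caret stays aligned with tabbed code.
            boolean isTab = i < sourceLine.length() && sourceLine.charAt(i) == '\t';
            sb.append(isTab ? '\t' : ' ');
        }
        return sb.append('^').toString();
    }
}
```

Copying tabs rather than always emitting spaces keeps the caret under the right character when the source line is indented with tabs, whatever tab width the user's terminal uses.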
2) I want to replace CLOSSB, SEMICOL, CLOSB etc. with their real symbols. How can I do that using the map in .g file?
You will have to read the separately generated token file and then make a map, i.e. a dictionary data structure, to translate the token name into the token character(s).
EDIT
First, we have to clarify what is meant by "symbol". If you limit the definition of a symbol to tokens that are defined in the tokens file with a char or string literal, e.g. '!'=13 or 'public'=92, then this can be done. If, however, you define a symbol to be any text associated with a token, then that is something other than what I am addressing.
When ANTLR generates its token map, it uses three different sources:
The char or string constants in the lexer
The char or string constants in the parser
Internal tokens such as Invalid, Down, Up
Since the tokens in the lexer are not the complete set, one should use the tokens file as a starting point. If you look at the tokens file, you will note that the lowest value is 4. If you look at the TokenTypes file (this is the name in the C# version), you will find the remaining defined tokens.
If you find names like T__ in the tokens file, those are the names ANTLR generated for the char/string literals in the parser.
If you are using string and/or char literals in parser rules, then ANTLR must create a new set of lexer rules that include all of the string and/or char literals in the parser rules. Remember that the parser can only see tokens and not raw text. So string and/or char literals cannot be passed to the parser.
To see the new set of lexer rules, use org.antlr.Tool -Xsavelexer, and then open the created grammar file. The name may be like.g . If you have string and/or char literals in your parser rules, you will see lexer rules with names starting with T__.
Now that you know all of the tokens and their values you can create a mapping table from the info given in the error to the string you want to output instead for the symbol.
The code at http://markmail.org/message/2vtaukxw5kbdnhdv#query:+page:1+mid:2vtaukxw5kbdnhdv+state:results is an example.
However, the token mapping can change when, for example, you change rules in the lexer or change char/string literals in the parser. So if the message suddenly outputs the wrong string for a symbol, you will have to update the mapping table by hand.
While this is not a perfect solution, it is a possible solution depending on how you define symbol.
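Reading the tokens file and building that mapping table can be sketched as follows. Assumptions: the tokens file contents are passed in as a string, with one NAME=number entry per line, where literal entries look like '!'=13 and named entries look like CLOSSB=7; the class name TokenSymbolMap is hypothetical. The sketch joins the two kinds of entries on the token number, so a name is mapped to a literal only when the grammar defines that token by a single char or string literal. Tokens defined by a rule such as FOO : (X | Y)+; are simply left out. Escape sequences inside the quotes are not unescaped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a token-name -> literal map from ANTLR 3 tokens-file text.
class TokenSymbolMap {
    static Map<String, String> build(String tokensFile) {
        Map<Integer, String> literalByType = new HashMap<>();
        Map<String, Integer> typeByName = new HashMap<>();
        for (String raw : tokensFile.split("\\R")) {
            String line = raw.trim();
            // Use the last '=' so a literal like '='=10 is split correctly.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) continue;
            String key = line.substring(0, eq);
            int type;
            try {
                type = Integer.parseInt(line.substring(eq + 1).trim());
            } catch (NumberFormatException e) {
                continue; // skip lines that are not NAME=number
            }
            if (key.length() >= 2 && key.startsWith("'") && key.endsWith("'")) {
                // Literal entry such as ']'=7; escapes are left as written.
                literalByType.put(type, key.substring(1, key.length() - 1));
            } else {
                // Named entry such as CLOSSB=7 or T__12=12.
                typeByName.put(key, type);
            }
        }
        // Join on the token number: name -> literal, only where a literal exists.
        Map<String, String> symbolByName = new HashMap<>();
        for (Map.Entry<String, Integer> e : typeByName.entrySet()) {
            String literal = literalByType.get(e.getValue());
            if (literal != null) symbolByName.put(e.getKey(), literal);
        }
        return symbolByName;
    }
}
```

In the question's getErrorMessage, one hypothetical way to use the map is to build a copy of tokenNames in which each name that has an entry is replaced by its literal, and to pass that copy to super.getErrorMessage, so that "missing CLOSSB" could read "missing ]" instead. Reading the tokens file at startup, rather than hard-coding the table, is one way to reduce the hand updates mentioned above, provided the file is regenerated along with the grammar.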
Note: Last time I looked, ANTLR 4.x creates this table automatically for access within the parser, because it was such a problem for so many ANTLR 3.x users.

Bhathiya wrote:
1) Is there a proper way to do the same thing I have done?
There is no single way to do this. Note that proper error-handling and reporting is tricky. Terence Parr spends a whole chapter on this in The Definitive ANTLR Reference (chapter 10). I recommend you get hold of a copy and read it.
Bhathiya wrote:
2) I want to replace CLOSSB, SEMICOL, CLOSB etc. with their real symbols. How can I do that using the map in .g file?
You can't. For SEMICOL this may seem easy to do, but how would you get this information for a token like FOO:
FOO : (X | Y)+;
fragment X : '4'..'6';
fragment Y : 'a' | 'bc' | . ;

How to output ${expression} in Freemarker without it being interpreted?

I'm trying to use Freemarker in conjunction with jQuery Templates.
Both frameworks use dollar sign/curly brackets to identify expressions for substitution (or as they're called in freemarker, "interpolations") , e.g. ${person.name} .
So when I define a jQuery Template with expressions in that syntax, Freemarker tries to interpret them (and fails).
I've tried various combinations of escaping the ${ sequence to pass it through Freemarker to no avail - \${, \$\{, $\{, etc.
Inserting a freemarker comment in between the dollar and the curly (e.g. $<#-- -->{expression}) DOES work - but I'm looking for a more concise and elegant solution.
Is there a simpler way to get a Freemarker template to output the character sequence ${?

This should print ${person.name}:
${r"${person.name}"}
From the FreeMarker docs:
A special kind of string literals is the raw string literals. In raw string literals, backslash and ${ have no special meaning, they are considered as plain characters. To indicate that a string literal is a raw string literal, you have to put an r directly before the opening quotation mark or apostrophe-quote

For longer sections without FreeMarker markup, use <#noparse>...</#noparse>.
Starting with FreeMarker 2.3.28, configure FreeMarker to use square bracket syntax ([=exp]) instead of brace syntax (${exp}) by setting the interpolation_syntax configuration option to square_bracket.
Note that unlike the tag syntax, the interpolation syntax cannot be specified inside the template. Changing the interpolation syntax requires calling the Java API:
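import freemarker.template.Configuration;

Configuration cfg = new Configuration(Configuration.VERSION_2_3_28);
// ...
cfg.setInterpolationSyntax(Configuration.SQUARE_BRACKET_INTERPOLATION_SYNTAX);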
Then FreeMarker will consider ${exp} to be static text.
Do not confuse interpolation syntax with tag syntax, which also can have square_bracket value, but is independent of the interpolation syntax.
When using FreeMarker-based file PreProcessor (FMPP), either configure the setting via config.fmpp or on the command-line, such as:
fmpp --verbose --interpolation-syntax squareBracket ...
This will call the appropriate Java API prior to processing the file.
See also:
https://freemarker.apache.org/docs/dgui_misc_alternativesyntax.html
http://fmpp.sourceforge.net/settings.html#templateSyntax

Another option is to use #include with the parse=false option. That is, put your jQuery Templates in a separate include file and use parse=false so that FreeMarker doesn't try to parse them.
This would be a good option when the templates are larger and contain double quotes.

I had to spend some time figuring out how to escape ${expression} in the following scenarios:
In Freemarker assignment:
<#assign var = r"${expression}">
In html attribute:
Some link
In Freemarker concatenation:
<#assign x = "something&"+r"${expression}"/>

If ${ is your only problem, then you could use the alternate syntax in the jQuery Templates plugin like this: {{= person.name}}
Maybe a little cleaner than escaping it.

Did you try $$?
I found from the Freemarker manual that ${r"${person.name}"} will print out ${person.name} without attempting to render it.
Perhaps you should also take a look at Freemarker escaping freemarker

I can confirm that, for example,
${r"${item.id}"}
is the correct way.
So a fuller example would look like:
<span> Remove </span>
and the output will be:
<span> Remove </span>

In the case when you want to use non-raw strings so that you can escape double quotes, apostrophes, etc, you can do the following:
Imagine that you want to use the string ${Hello}-"My friend's friend" inside of a string. You cannot do that with raw strings. What I have used that works is:
${"\x0024{Hello}-\"My friend's friend\""}
I have not escaped the apostrophe since I used double quotes.

