Lex Yacc, should i tokenize character literals?

I know this is a poorly worded question, but I'm not sure how else to ask it.
I always seem to end up in the error branch regardless of what I enter, and I can't figure out where I'm going wrong. I'm using a flavor of Lex/YACC called GPPG, which sets all of this up for use with C#.
Here is my Y file:
method : L_METHOD L_VALUE ')' { System.Diagnostics.Debug.WriteLine("Found a method: Name:" + $1.Data ); }
| error { System.Diagnostics.Debug.WriteLine("Not valid in this statement context ");/*Throw new exception*/ }
;
Here is my Lex file:
\'[^']*\' {this.yylval.Data = yytext.Replace("'",""); return (int)Tokens.L_VALUE;}
[a-zA-Z0-9]+\( {this.yylval.Data = yytext; return (int)Tokens.L_METHOD;}
The idea is that I should be able to pass
Method('value') to it and have it recognized as correct syntax.
Ultimately, the plan is to execute the method, passing the various parameters as values.
I've also tried several variations, for example:
method : L_METHOD '(' L_VALUE ')' { System.Diagnostics.Debug.WriteLine("Found a method: Name:" + $1.Data ); }
| error { System.Diagnostics.Debug.WriteLine("Not valid in this statement context: ");/*Throw new exception*/ }
;
\'[^']*\' {this.yylval.Data = yytext.Replace("'",""); return (int)Tokens.L_VALUE;}
[a-zA-Z0-9]+ {this.yylval.Data = yytext; return (int)Tokens.L_METHOD;}

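You need a lex rule that returns the punctuation tokens as-is so that the yacc grammar can recognize them. Something like:
[()] { return *yytext; }
added to your second example should do the trick. (That action is written in C style; in GPPG/GPLEX, where yytext is a C# string, the equivalent should be something like return yytext[0];.)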
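To illustrate why the punctuation rule matters, here is a minimal Python sketch of the same idea, not GPPG itself. The token names mirror L_METHOD and L_VALUE from the question; the PUNCT and SKIP rules and the function names are assumptions made for this example. Parentheses are returned with the character itself as the token type, which is what the grammar rule method : L_METHOD '(' L_VALUE ')' expects.

```python
import re

# Hypothetical token table mirroring the question's L_METHOD and L_VALUE.
TOKEN_SPEC = [
    ("L_VALUE", r"'[^']*'"),        # quoted literal, quotes stripped below
    ("L_METHOD", r"[a-zA-Z0-9]+"),  # method name
    ("PUNCT", r"[()]"),             # punctuation, returned as itself
    ("SKIP", r"\s+"),               # whitespace is ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "PUNCT":
            kind = value  # the token type is the character itself
        elif kind == "L_VALUE":
            value = value.strip("'")
        tokens.append((kind, value))
    return tokens


def parse_method(tokens):
    """Accept exactly: L_METHOD '(' L_VALUE ')'."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["L_METHOD", "(", "L_VALUE", ")"]:
        raise SyntaxError("Not valid in this statement context")
    return tokens[0][1], tokens[2][1]


print(parse_method(tokenize("Method('value')")))
```

Without the PUNCT rule the tokenizer would fail on the first parenthesis, which corresponds to the parser always ending up in the error branch.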

Related

Do strings need to be escaped inside parametrized queries?

I'm learning Express by building a simple CRUD app without an ORM.
The issue is that I'm not able to find any record through the Model.findBy() function:
class User {
static async findBy(payload) {
try {
let attr = Object.keys(payload)[0]
let value = Object.values(payload)[0]
let user = await pool.query(
`SELECT * from users WHERE $1::text = $2::text LIMIT 1;`,
[attr, value]
);
return user.rows; // empty :-(
} catch (err) {
throw err
}
}
}
User.findBy({ email: 'foo@bar.baz' }).then(console.log);
User.findBy({ name: 'Foo' }).then(console.log);
I have no issue in psql if I surround the value with single quotes, like:
SELECT * FROM users WHERE email = 'foo@bar.baz' LIMIT 1;
That's not possible inside parametrized queries, though. I've tried things like '($2::text)' (and escaped variations), but that looks far from what the documentation recommends.
I must be missing something. Is the emptiness of user.rows related to the way I fetch attr and value? Or is some kind of escaping required when passing string parameters?
"Answer":
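As stated in the comment section, the issue isn't related to string escaping, but to dynamic column names.
Column names are identifiers, not values, and therefore cannot be set dynamically using a query parameter. With $1 bound to 'email', the query compares the string 'email' with the string value, which is never equal, so no rows come back.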
See: https://stackoverflow.com/a/50813577/11509906
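One common way around this is to check the column name against a whitelist and interpolate it into the SQL text, while still binding the value as a parameter. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (so placeholders are ? rather than $1); the ALLOWED_COLUMNS set, the find_by helper, and the sample row are assumptions made for illustration.

```python
import sqlite3

# Assumed whitelist of columns that callers may filter on (hypothetical).
ALLOWED_COLUMNS = {"email", "name"}


def find_by(conn, payload):
    """Look up one user by a single whitelisted column."""
    attr, value = next(iter(payload.items()))
    if attr not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {attr!r}")
    # The column name is interpolated only after the whitelist check;
    # the value still travels as a bound parameter.
    sql = f"SELECT * FROM users WHERE {attr} = ? LIMIT 1"
    return conn.execute(sql, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Foo", "foo@example.com"))

# Binding the column name compares two string values, so nothing matches.
broken = conn.execute(
    "SELECT * FROM users WHERE ? = ? LIMIT 1", ("name", "Foo")
).fetchall()
fixed = find_by(conn, {"name": "Foo"})

print(broken)  # no rows
print(fixed)   # the matching row
```

The whitelist is what keeps the interpolation safe: an attacker-controlled key never reaches the SQL text unless it is one of the known column names.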

Is there a way to get back source code from antlr4ts parse tree after modifications ctx.removeLastChild/ctx.addChild? [duplicate]

I want to keep the white space when I read the text attribute of a token. Is there any way to do it?
Here is the situation:
We have the following code:
IF L > 40 THEN;
ELSE
IF A = 20 THEN
PUT "HELLO";
In this case, I want to transform it into:
if (!(L>40)){
if (A=20)
put "hello";
}
The rule in Antlr is that:
stmt_if_block: IF expr
THEN x=stmt
(ELSE y=stmt)?
{
if ($x.text.equalsIgnoreCase(";"))
{
WriteLn("if(!(" + $expr.text +")){");
WriteLn($stmt.text);
WriteLn("}");
}
}
But the result looks like:
if(!(L>40))
{
ifA=20put"hello";
}
The reason is that the white space in $stmt was removed. I was wondering if there is any way to keep this white space.
Thank you so much.
Update: If I add
SPACE: [ ] -> channel(HIDDEN);
the spaces are preserved, and the result looks like the following, with many spaces between tokens:
IF SUBSTR(WNAME3,M-1,1) = ')' THEN M = L; ELSE M = L - 1;

This is the C# extension method I use for exactly this purpose:
public static string GetFullText(this ParserRuleContext context)
{
if (context.Start == null || context.Stop == null || context.Start.StartIndex < 0 || context.Stop.StopIndex < 0)
return context.GetText(); // Fallback
return context.Start.InputStream.GetText(Interval.Of(context.Start.StartIndex, context.Stop.StopIndex));
}
Since you're using Java, you'll have to translate it, but that should be straightforward, since the API is the same.
Explanation: get the first token, get the last token, and take the text from the input stream between the first character of the first token and the last character of the last token.
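The idea does not depend on ANTLR, so it can be sketched in plain Python. The Token class, lex, get_text, and get_full_text below are hypothetical stand-ins for the runtime's tokens and contexts, assuming only that each token remembers its start and stop character index in the original input.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start: int  # index of the token's first character in the input
    stop: int   # index of the token's last character (inclusive)


def lex(source):
    """Split on whitespace, remembering where each token came from."""
    return [Token(m.group(), m.start(), m.end() - 1) for m in re.finditer(r"\S+", source)]


def get_text(tokens):
    """Concatenate token texts, the way a getText()-style call does."""
    return "".join(t.text for t in tokens)


def get_full_text(source, tokens):
    """Slice the original input between the first and last token."""
    if not tokens:
        return ""
    return source[tokens[0].start : tokens[-1].stop + 1]


source = '  IF A = 20 THEN PUT "HELLO";  '
tokens = lex(source)
print(get_text(tokens))
print(get_full_text(source, tokens))
```

Because the second function reads from the original input rather than from the tokens, it only reflects the source as it was parsed, not any later changes made to the tree.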

@Lucas's solution, but in Java, in case you have trouble translating it:
private String getFullText(ParserRuleContext context) {
if (context.start == null || context.stop == null || context.start.getStartIndex() < 0 || context.stop.getStopIndex() < 0)
return context.getText();
return context.start.getInputStream().getText(Interval.of(context.start.getStartIndex(), context.stop.getStopIndex()));
}

It looks like the InputStream is not always updated after removeLastChild/addChild operations. This solution helped me for one grammar, but it doesn't work for another.
Works for this grammar.
Doesn't work for the modern Groovy grammar (for some reason inputStream.getText contains the old text).
I am trying to implement function name replacement like this:
enterPostfixExpression(ctx: PostfixExpressionContext) {
// Get identifierContext from ctx
...
const token = CommonTokenFactory.DEFAULT.createSimple(GroovyParser.Identifier, 'someNewFnName');
const node = new TerminalNode(token);
identifierContext.removeLastChild();
identifierContext.addChild(node);
Update: I used the visitor pattern for the first implementation.

How to get the full user-written statements (including the spaces) in ANTLR

I have a "statement" definition from the Java language definition as follows.
statement
: block
| ASSERT expression (':' expression)? ';'
| 'if' parExpression statement ('else' statement)?
| 'for' '(' forControl ')' statement
| 'while' parExpression statement
| 'do' statement 'while' parExpression ';'
| 'try' block
( catches 'finally' block
| catches
| 'finally' block
)
| 'switch' parExpression switchBlock
| 'synchronized' parExpression block
| 'return' expression? ';'
| 'throw' expression ';'
| 'break' Identifier? ';'
| 'continue' Identifier? ';'
| ';'
| statementExpression ';'
| Identifier ':' statement
;
While parsing, I also want to print the full user-written statements (including the spaces in the statements), such as:
Object o = Ma.addToObj(r1);
if(h.isFull() && !h.contains(true)) h.update(o);
But when I call getText() in exitStatement, I only get the statements with all the spaces removed, such as:
Objecto=Ma.addToObj(r1);
if(h.isFull()&&!h.contains(true))h.update(o);
How can I get the full user-written statements (including the spaces) in an easy way? Thanks a lot!
The full code is as follows:
public class PrintStatements {
public static class GetStatements extends sdlParserBaseListener {
StringBuilder statements = new StringBuilder();
public void exitStatement(sdlParserParser.StatementContext ctx){
statements.append(ctx.getText());
statements.append("\n");
}
}
public static void main(String[] args) throws Exception{
String inputFile = null;
if ( args.length>0 ) inputFile = args[0];
InputStream is = System.in;
if ( inputFile!=null ) {
is = new FileInputStream(inputFile);
}
ANTLRInputStream input = new ANTLRInputStream(is);
sdlParserLexer lexer = new sdlParserLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
sdlParserParser parser = new sdlParserParser(tokens);
ParseTree tree = parser.s();
// create a standard ANTLR parse tree walker
ParseTreeWalker walker = new ParseTreeWalker();
// create listener then feed to walker
GetStatements loader = new GetStatements();
walker.walk(loader, tree); // walk parse tree
System.out.println(loader.statements.toString());
}
}

I solved this problem by calling tokens.getText() at the level above the statement, like this:
public void exitE(sdlParserParser.EContext ctx) {
TokenStream tokens = parser.getTokenStream();
String Stmt = null;
Stmt = tokens.getText(ctx.statement());
...
}
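A rough model of why asking the token stream works where ctx.getText() does not: the tree only contains the tokens the parser saw, while the token stream still holds the hidden-channel whitespace between them. The following Python sketch assumes whitespace is sent to a hidden channel rather than skipped; Token, lex, parser_view, and stream_text are hypothetical names, not ANTLR API.

```python
import re
from dataclasses import dataclass

DEFAULT, HIDDEN = 0, 1


@dataclass
class Token:
    text: str
    channel: int
    index: int  # position in the full token list


def lex(source):
    """Tokenize, keeping whitespace on the hidden channel instead of dropping it."""
    tokens = []
    for match in re.finditer(r"\s+|\w+|[^\w\s]", source):
        text = match.group()
        channel = HIDDEN if text.isspace() else DEFAULT
        tokens.append(Token(text, channel, len(tokens)))
    return tokens


def parser_view(tokens):
    """What the parser sees: default-channel tokens only."""
    return [t for t in tokens if t.channel == DEFAULT]


def stream_text(tokens, first, last):
    """Text of every token, hidden ones included, from first to last."""
    return "".join(t.text for t in tokens[first.index : last.index + 1])


source = "Object o = Ma.addToObj(r1);"
tokens = lex(source)
visible = parser_view(tokens)

print("".join(t.text for t in visible))
print(stream_text(tokens, visible[0], visible[-1]))
```

If the lexer discards whitespace entirely instead of hiding it, there is nothing left in the stream to recover, and slicing the original character input is the remaining option.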

I'm pretty new to ANTLR, so I may be wrong about something.
I don't know an easy way to do this, but you can try something like the following.
In your grammar file you probably have something like this:
WS : (' '|'\r'|'\t'|'\u000C'|'\n')
{
if (!preserveWhitespacesAndComments) {
skip();
} else {
$channel = HIDDEN;
}
}
This lexer rule tells the parser to ignore whitespace. More precisely, these tokens are either skipped or sent to the HIDDEN channel, so the parser doesn't see them. If you comment out these lines of code
WS : (' '|'\r'|'\t'|'\u000C'|'\n')
{
if (!preserveWhitespacesAndComments) {
// skip();
} else {
// $channel = HIDDEN;
}
}
all whitespace will be sent to the parser, but then you need to rewrite the parser rules so that they expect whitespace in any position:
Object(EXPECT WHITESPACE)o(EXPECT WHITESPACE)=(EXPECT WHITESPACE)Ma.addToObj(r1);
Otherwise the parser will report errors.

You need one of two things:
The ability to take file position data for the first and last tokens accepted by a statement parse (either the lexemes or the tree nodes should do), and go to the source file, and extract the text. That will get you the original whitespace.
A prettyprinter, which will regenerate text from the AST, inserting appropriate whitespacing. See my SO answer on how to build a prettyprinter here.

In terms of ANTLR4 and Python 3, the code looks as follows:
def exitSomeDecl(self, ctx: yourParser.SomeDeclContext):
start_index = ctx.start.tokenIndex
stop_index = ctx.stop.tokenIndex
user_text = self.token_stream.getText(interval=(start_index, stop_index))
Here, self.token_stream (a CommonTokenStream) is assigned during init:
input_stream = FileStream(file_name)
lexer = sdplLexer(input_stream)
token_stream = CommonTokenStream(lexer)

bindValue and bindParam in mysqli and PDO ignore variable type

I'm having trouble understanding the point of binding certain variable types in PDO and mysqli when the type given seems, in my case, to be meaningless. In the following code, the bound type (like i or s) gets ignored. The table column "wert_sortierung" in the database is INT(11). Regardless of whether $val_int is really an integer, and whether I bind it via i, s / PDO::PARAM_INT or _STR, the query always works: no break, no error, and no warning that the types in the binding, the database, or the variable itself don't match.
<?
class PDOTest {
protected $pdo;
function __construct(){
$usr="usr";
$pwd="pwd";
$host="localhost";
$db="db";
$val_int="I'm a string";
$val_str="OP";
$querystring="SELECT wert_langtext FROM TB_wert WHERE wert_sortierung = ? AND wert_CD = ?";
try {
$db_info = "mysql:host=$host;dbname=$db"; // usually provided via require_once and during construction
$this->pdo = new PDO($db_info, $usr, $pwd);
$this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$this->pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $this->pdo->prepare($querystring);
$stmt->bindValue(1,$val_int,PDO::PARAM_INT);
$stmt->bindValue(2,$val_str,PDO::PARAM_STR);
$stmt->execute();
$row_return = $stmt->fetchAll(PDO::FETCH_ASSOC);
$this->varprint($row_return);
$this->pdo = NULL;
}
catch (PDOException $ex) {
printf ('Es spricht:');
$this->printerror("Fehla! (" . $ex->getMessage() . ")");
$this->pdo = NULL;
exit();
}
printf("<br />-------<br />");
//Added for comparison
$mysqli = new mysqli($host, $usr, $pwd, $db);
$m_stmt = $mysqli->prepare($querystring);
$m_stmt->bind_param('is',$val_int, $val_str);
$m_stmt->execute();
$m_stmt->bind_result($row_return);
$m_stmt->fetch();
$this->varprint($row_return);
$m_stmt->close();
$mysqli->close();
}
private function printerror($txt) {
printf("<p><font color=\"#ff0000\">%s</font></p>\n",
htmlentities($txt));
}
private function varprint($var) {
echo "<br />";
echo "<pre>";
echo var_dump($var);
echo "</pre>";
}
}
new PDOTest();
?>
Can anyone please point out the error in my reasoning?

It is actually MySQL's loose typing that deceived you.
As a matter of fact, regular MySQL queries accept strings for numeric values just fine:
SELECT wert_langtext FROM TB_wert WHERE wert_sortierung = '1' AND wert_CD = '1';
and prepared statements simply follow this behavior.
However, the opposite situation is not that harmless. Comparing a string column with a number
SELECT wert_langtext FROM TB_wert WHERE wert_sortierung = 1;
will cause a flood of warnings if wert_sortierung is of a string type, as well as some unexpected behavior, like matching ALL the rows.
So, as general advice, I would suggest always using 's' by default. The only drawback is with PDO's emulated prepares, and it can be easily worked around.
So, to answer your question explicitly: prepared statements just allow the same behavior as regular queries, adding nothing to it. Everything that is possible with a regular query is possible with a prepared statement as well. And no, binding does not validate your data (though it should; in my class I test integer placeholders and throw an exception if a non-numeric value is given).
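The last point, validating integer placeholders yourself, can be sketched as a small check that runs before the value is bound. This is not the answerer's actual class; bind_int is a hypothetical helper, written in Python for illustration, and the rule it applies (an optional sign followed by ASCII digits) is an assumption about what should count as numeric.

```python
import re

_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def bind_int(value):
    """Return value as an int, or raise if it is not an integer literal."""
    if isinstance(value, bool):
        raise TypeError("booleans are not accepted as integer parameters")
    if isinstance(value, int):
        return value
    if isinstance(value, str) and _INT_PATTERN.fullmatch(value.strip()):
        return int(value)
    raise ValueError(f"not an integer: {value!r}")


print(bind_int("42"))
try:
    bind_int("I'm a string")
except ValueError as error:
    print(error)
```

A check like this makes the mismatch visible in application code instead of relying on the database, which will quietly coerce the string.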

getting user defined error messages using antlr3

numberrange returns [String value]
: numberrangesub
{
String numberRange = ($numberrangesub.text);
String [] v = numberRange.split(",");
if ( Integer.parseInt(v[0].trim()) < Integer.parseInt(v[1].trim())) $value =numberRange;
else throw new RecognitionException();
}
;
Please look at the ANTLR code above. Here I want to throw a user-friendly error message like "from value should be less than to value in BETWEEN clause".
I expected to be able to write RecognitionException("from value should be less than to value in BETWEEN clause"); but ANTLR does not accept that.
In the Java class that calls the parser generated by ANTLR, I handle it as follows:
try
{
parser.numberRangeCheck();
}
catch (RecognitionException e)
{
throw createException("Invalid Business logic syntax at " + parser.getErrorHeader(e) + ", " + parser.getErrorMessage(e, null), Level.INFO, logger);
}
Any help will be appreciated.

Why not simply throw a RuntimeException with your custom error message?
// ...
else throw new RuntimeException("from value should be less than to value in BETWEEN clause");
// ...

As Terence Parr wrote in the error chapter excerpt of "The Definitive ANTLR Reference":
To avoid forcing English-only error messages and to generally make
things as flexible as possible, the recognizer does not create exception
objects with string messages. Instead, it tracks the information necessary to generate an error.
So there is no error message supplied to RecognitionException's constructor. But you can define an additional field in your recognizer to hold a user-friendly error message, to be shown when the RecognitionException is handled:
numberrange returns [String value]
: numberrangesub
{
String numberRange = ($numberrangesub.text);
String [] v = numberRange.split(",");
if ( Integer.parseInt(v[0].trim()) < Integer.parseInt(v[1].trim()))
$value = numberRange;
else {
this.errorMessage = "from value should be less than to value in BETWEEN clause";
throw new RecognitionException(this.input);
}
}
;
And then override the getErrorMessage method:
public String getErrorMessage(RecognitionException e, String[] tokenNames) {
String msg = this.errorMessage;
// ...
}
This works similarly to the paraphrase mechanism explained in the same excerpt.
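The pattern itself (store the message on the recognizer, raise a message-less exception, then build the text in an overridable method) can be shown outside ANTLR. The Python sketch below is only an analogy: RecognitionError, NumberRangeChecker, and the "syntax error" fallback are hypothetical names and values, not part of the ANTLR 3 runtime.

```python
class RecognitionError(Exception):
    """Stand-in for a recognition exception that carries no message."""


class NumberRangeChecker:
    def __init__(self):
        self.error_message = None  # set just before raising

    def number_range(self, text):
        """Accept 'low, high' only when low is less than high."""
        low, high = (int(part.strip()) for part in text.split(","))
        if low < high:
            return text
        self.error_message = "from value should be less than to value in BETWEEN clause"
        raise RecognitionError()

    def get_error_message(self, error):
        # Prefer the custom message; fall back to a generic one.
        return self.error_message or "syntax error"


checker = NumberRangeChecker()
try:
    checker.number_range("9, 2")
except RecognitionError as error:
    print(checker.get_error_message(error))
```

Keeping the text out of the exception leaves the caller free to format or translate it, which is the motivation given in the quoted excerpt.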
