multiple return values in ANTLR - arraylist

I use ANTLR4 with Java and I would like to store the values that a rule returns while parsing the input. I use a grammar like this:
db : 'DB' '(' 'ID' '=' ID ',' query* ')'
{
System.out.println("creating db");
System.out.println("Number of queries -> "+$query.qs.size());
}
;
query returns [ArrayList<Query> qs]
@init
{
$qs = new ArrayList<Query>();
}
: 'QUERY' '(' 'ID' '=' ID ',' smth ')'
{
System.out.println("creating query with id "+$ID.text);
Query query = new Query();
query.setId($ID.text);
$qs.add(query);
}
;
But the number of queries printed ($query.qs.size()) is always one. Each time a QUERY element is recognized in the input, the query rule instantiates a new $qs ArrayList and adds just that one query to it. When all the queries have been recognized, the action for the db rule is invoked, but $query.qs refers to the list returned by the last query invocation, so it holds only the last query. I solved this problem by maintaining global ArrayLists that store the queries. But is there another way to do this with ANTLR, using the values the rules return, without my own global ArrayLists?
Many thanks in advance,
Dimos.

Well, the problem is resolved. I just added the ArrayList to the db rule like this:
db [ArrayList queries] : 'DB' ....
and then at query rule:
$db::queries.add(query)
So, everything is fine!
Thanks for looking, anyway!
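To see why $query.qs always has size 1, here is a plain-Java sketch that mimics the two designs. It is not ANTLR-generated code, and the class and method names are hypothetical. In the first design, a rule creates and returns a fresh list on every invocation. In the second, the parent rule owns one list that each invocation appends to. It assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (NOT ANTLR-generated code) of the two designs.
public class QueryCollector {

    // Hypothetical stand-in for the asker's Query class.
    public static class Query {
        final String id;
        Query(String id) { this.id = id; }
    }

    // Mimics `query returns [ArrayList<Query> qs]` with an @init block:
    // a brand-new list is created on every invocation.
    static List<Query> queryReturningList(String id) {
        List<Query> qs = new ArrayList<>();
        qs.add(new Query(id));
        return qs;
    }

    // Mimics `db` matching query*: $query refers to the LAST invocation only.
    public static int dbWithReturnValues(List<String> ids) {
        List<Query> last = new ArrayList<>();
        for (String id : ids) {
            last = queryReturningList(id); // previous lists are discarded
        }
        return last.size();
    }

    // Mimics the fix: the parent owns one list and each query appends to it
    // (what `db[ArrayList queries]` plus `$db::queries.add(query)` achieves).
    static void queryAppending(String id, List<Query> shared) {
        shared.add(new Query(id));
    }

    public static int dbWithSharedList(List<String> ids) {
        List<Query> queries = new ArrayList<>();
        for (String id : ids) {
            queryAppending(id, queries);
        }
        return queries.size();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("q1", "q2", "q3");
        System.out.println(dbWithReturnValues(ids)); // prints 1
        System.out.println(dbWithSharedList(ids));   // prints 3
    }
}
```

The shared list works because all invocations write into the same object. A per-invocation return value only ever describes one match.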

Related

tdbc::tokenize documentation and use

I'm using tdbc::odbc to connect to a Pervasive (btrieve type) database, and am unable to pass variables to the driver. A short test snippet:
set customer "100000"
set st [pvdb prepare {
INSERT INTO CUSTOMER_TEMP_EMPTY
SELECT * FROM CUSTOMER_MASTER
WHERE CUSTOMER = :customer
}]
$st execute
This returns:
[Pervasive][ODBC Client Interface]Parameter number out of range.
(binding the 'customer' parameter)
Works fine if I replace :customer with "100000", and I have tried using a variable with $, #, wrapping in apostrophes, quotes, braces. I believe that tdbc::tokenize is the answer I'm looking for, but the man page gives no useful information on its use. I've experimented with tokenize with no progress at all. Can anyone comment on this?
The tdbc::tokenize command is a helper for writing TDBC drivers. It's used for working out what bound variables are inside an SQL string, so that the binding map can be supplied to the low-level driver or, in the case of particularly stupid drivers, string substitutions can be performed (I hope there are no drivers that need this; it'd be annoyingly difficult to get right). The parser knows enough to handle weird cases, such as things that look like bound variables inside strings and comments (those aren't bound variables).
If we feed it the example SQL you've got (its calling syntax is trivial), we get this result:
{
INSERT INTO CUSTOMER_TEMP_EMPTY
SELECT * FROM CUSTOMER_MASTER
WHERE CUSTOMER = } :customer {
}
That's a list of three items (the last element just has a newline in it) that simplifies processing a lot; each item is either trivially a bound variable or trivially not.
Other examples (bear in mind in the second case that bound variables may also start with $ or #):
% tdbc::tokenize {':abc' = :abc = ":abc" -- :abc}
{':abc' = } :abc { = ":abc" -- :abc}
% tdbc::tokenize {foo + $bar - #grill}
{foo + } {$bar} { - } #grill
% tdbc::tokenize {foo + :bar + [:grill]}
{foo + } :bar { + [:grill]}
Note that the tokenizer does not fully understand SQL! It makes no attempt to parse the other bits; it's just looking for what is a bound variable.
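The tokenizer's job can be illustrated with a deliberately simplified Java sketch. This is not tdbc's implementation. The class name is hypothetical. The set of skipped constructs is an assumption modelled on the examples above: quoted strings, [...], -- and /* */ comments, and :: casts. So are the variable prefixes :, $ and #. The sketch splits SQL into literal chunks and bound-variable tokens and drops empty literal chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified bound-variable tokenizer (illustrative only, NOT tdbc's code).
public class BindTokenizer {

    static boolean isIdentChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Index just past the closing delimiter, or the end of text if unterminated.
    static int skipPast(String text, int from, String close) {
        int end = text.indexOf(close, from);
        return end < 0 ? text.length() : end + close.length();
    }

    public static List<String> tokenize(String sql) {
        List<String> tokens = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            int next;
            if (c == '\'' || c == '"') {
                next = skipPast(sql, i + 1, String.valueOf(c)); // quoted string
            } else if (c == '[') {
                next = skipPast(sql, i + 1, "]");               // bracketed name
            } else if (sql.startsWith("--", i)) {
                next = skipPast(sql, i + 2, "\n");              // line comment
            } else if (sql.startsWith("/*", i)) {
                next = skipPast(sql, i + 2, "*/");              // block comment
            } else if (sql.startsWith("::", i)) {
                next = i + 2;                                   // cast, not a variable
            } else if ((c == ':' || c == '$' || c == '#')
                    && i + 1 < n && isIdentChar(sql.charAt(i + 1))) {
                int end = i + 1;
                while (end < n && isIdentChar(sql.charAt(end))) end++;
                if (literal.length() > 0) {                     // flush preceding literal
                    tokens.add(literal.toString());
                    literal.setLength(0);
                }
                tokens.add(sql.substring(i, end));              // the bound variable
                i = end;
                continue;
            } else {
                next = i + 1;                                   // ordinary character
            }
            literal.append(sql, i, next);
            i = next;
        }
        if (literal.length() > 0) tokens.add(literal.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo + :bar + [:grill]"));
        System.out.println(tokenize("':abc' = :abc = \":abc\" -- :abc"));
    }
}
```

On the examples above, it splits the inputs the same way as the tdbc output shown, e.g. "foo + $bar - #grill" becomes the four tokens "foo + ", "$bar", " - " and "#grill".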
I've no idea what use the tokenizer could be to you if you're not writing a DB driver.
Still could not get the driver to accept the variable, but looking at your first example of the tokenized return, I came up with:
set customer "100000"
set v [tdbc::tokenize "$customer"]
set query "INSERT INTO CUSTOMER_TEMP_EMPTY SELECT * FROM CUSTOMER_MASTER WHERE CUSTOMER = $v"
set st [pvdb prepare $query]
$st execute
as a test command, and it did indeed successfully pass the command through the driver.

Select $id from ids field containing $id1,$id2,$id3

My model in Laravel has a linked_ids string field like this:
echo $model->linked_ids
1,2,3,4,5
I want to make a query that gets me all records with a given id in linked_ids.
Currently I have:
Model::where('linked_ids', 'LIKE', '%' . $model->id . '%');
but this selects more than I want (e.g. if $model->id is 3, it also selects 1,32,67).
How can I avoid this, given that I don't know at what position the id will appear and the ids aren't ordered? I would like to do this in Eloquent but can also use something like DB::raw() to run SQL queries.
It's a bad way to store your ids, but if you really can't change it, you could take advantage of LazyCollections and filter in PHP.
I'm sure there's a way to do it directly in MySQL (or whatever dbms you're using) but this is what I have.
$id = 3;
Model::cursor()
->filter(function ($model) use ($id) {
return in_array($id, explode(',', $model->linked_ids));
})
// then chain ONE of these methods, for example ->collect():
->collect(); // returns an Illuminate\Support\Collection of the results after the filtering
// ->first();   returns the first match or null
// ->all();     returns an array of Models after the filtering
// ->toArray(); returns an array and transforms the models to arrays as well
// ->toJson();  returns a JSON string
Note that this still runs a SELECT * FROM table without any filtering (unless you chain some where() calls before cursor()), but it won't load all the models into memory at once (which is usually the bottleneck for big queries in Laravel).
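The difference between the LIKE query and the explode()/in_array() filter is substring matching versus whole-item matching. The sketch below is a small Java illustration of that distinction, not Laravel code. The class and method names are hypothetical, and trimming spaces around items is an added assumption.

```java
import java.util.Arrays;

// Illustrates why LIKE '%3%' over-matches and splitting on commas does not.
public class LinkedIds {

    // Roughly what LIKE '%3%' does: a raw substring test.
    public static boolean substringMatch(String linkedIds, int id) {
        return linkedIds.contains(String.valueOf(id));
    }

    // What the explode()/in_array() filter does: split on commas and
    // compare whole items (trimmed, in case of "1, 2, 3" style values).
    public static boolean exactMatch(String linkedIds, int id) {
        String target = String.valueOf(id);
        return Arrays.stream(linkedIds.split(","))
                .map(String::trim)
                .anyMatch(target::equals);
    }

    public static void main(String[] args) {
        System.out.println(substringMatch("1,32,67", 3)); // true: "32" contains "3"
        System.out.println(exactMatch("1,32,67", 3));     // false
        System.out.println(exactMatch("1,2,3,4,5", 3));   // true
    }
}
```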

ANTLR4 Correctly continuing to parse sections after error

I'm trying to write some tooling (validation, possibly autocomplete) for a SQL-esque query language. However, the parser handles invalid/incomplete input in a way that makes it more difficult to work with.
I've reduced my scenario to its simplest reproducible form. Here is my minimized grammar:
grammar SOQL;
WHITE_SPACE : ( ' '|'\r'|'\t'|'\n' ) -> channel(HIDDEN) ;
FROM : 'FROM' ;
SELECT : 'SELECT' ;
/********** SYMBOLS **********/
COMMA : ',' ;
ID: ( 'A'..'Z' | 'a'..'z' | '_' | '$') ( 'A'..'Z' | 'a'..'z' | '_' | '$' | '0'..'9' )* ;
soql_query: select_clause from_clause;
select_clause: SELECT field ( COMMA field )*;
from_clause: FROM table;
field : ID;
table : ID;
When I run the following code (using antlr4ts, but it should be similar to any other port):
const input = 'SELECT ID, Name, Website, Contact, FROM Account'; //invalid trailing ,
let inputStream = new ANTLRInputStream(input);
let lexer = new SOQLLexer(inputStream);
let tokenStream = new CommonTokenStream(lexer);
let parser = new SOQLParser(tokenStream);
let qry = parser.soql_query();
let select = qry.select_clause();
console.log('FIELDS: ', select.field().map(field => field.text));
console.log('FROM: ', qry.from_clause().text);
Console Log
line 1:35 extraneous input 'FROM' expecting ID
line 1:47 mismatched input '<EOF>' expecting 'FROM'
FIELDS: Array(5) ["ID", "Name", "Website", "Contact", "FROMAccount"]
FROM:
I get errors (which is expected), but I was hoping it would still be able to correctly pick out the FROM clause.
It was my understanding that since FROM is an identifier, it's not a valid field in the select_clause (maybe I'm just misunderstanding)?
Is there some way to set up the grammar or parser so that it will continue on to properly identify the FROM clause in this scenario (and other common WIP query states)?
It was my understanding that since FROM is an identifier, it's not a valid
field in the select_clause (maybe I'm just misunderstanding)?
All the parser sees is a discrete stream of typed tokens coming from the lexer. The parser has no intrinsic way to tell if a token is intended to be an identifier, or for that matter, have any particular semantic nature.
In designing a fault-tolerant grammar, plan the parser to be fairly permissive to syntax errors and expect to use several tree-walkers to progressively identify and where possible resolve the syntax and semantic ambiguities.
Two ANTLR features particularly useful to this end include:
1) implement a lexer TokenFactory and custom token, typically extending CommonToken. The custom token provides a convenient space for flags and logic for identifying the correct syntactic/semantic use and expected context for a particular token instance.
2) implement a parser error strategy, extending or expanding on the DefaultErrorStrategy. The error strategy will allow modest modifications to the parser operation on the token stream when an attempted match results in a recognition error. If the error cannot be fully resolved and appropriately fixed upon examining the surrounding (custom) tokens, at least those same custom tokens can be suitably annotated to ease problem resolution during the subsequent tree-walks.

ANTLR ambiguous reference - how to get output?

So I have a rule for statement which can lead to more statements:
statement returns[String txt]
: '{'{
$txt="{";
}
(statement{
$txt+=$statement.txt;
})*
'}'{
$txt+="}";
}
| ... //more rules // ...
;
I am getting
reference $statement is ambiguous; rule statement is enclosing rule and referenced in the production (assuming enclosing rule)
but don't know how to resolve it. Somehow I would need to tell ANTLR that I need the return txt of statement inside parent statement. Please help me out :)
If you use $statement, ANTLR doesn't know if you mean the rule itself, or the statement inside ( ... )*.
Try something like this:
statement returns[String txt]
: '{'{
$txt="{";
}
(s=statement{
$txt+=$s.txt;
})*
'}'{
$txt+="}";
}
| ...
;

SQL Injection: is this secure?

I have this site with the following parameters:
http://www.example.com.com/pagination.php?page=4&order=comment_time&sc=desc
I use the values of each of the parameters as a value in a SQL query.
I am trying to test my application and ultimately hack my own application for learning purposes.
I'm trying to inject this statement:
http://www.example.com.com/pagination.php?page=4&order=comment_time&sc=desc' or 1=1 --
But It fails, and MySQL says this:
Warning: mysql_fetch_assoc() expects parameter 1 to be resource,
boolean given in /home/dir/public_html/pagination.php on line 132
Is my application completely free from SQL injection, or is it still possible?
EDIT: Is it possible for me to find a valid sql injection statement to input into one of the parameters of the URL?
An application that is secure against SQL injection never produces invalid queries.
So obviously you still have some issues.
A well-written application produces valid, expected output for any input.
That's completely vulnerable, and the fact that you can cause a syntax error proves it.
There is no function to escape column names or order by directions. Those functions do not exist because it is bad style to expose the DB logic directly in the URL, because it makes the URLs dependent on changes to your database logic.
I'd suggest something like an array mapping the "order" parameter values to column names:
$order_cols = array(
'time' => 'comment_time',
'popular' => 'comment_score',
// ... and so on ...
);
if (!isset($order_cols[$_GET['order']])) {
$_GET['order'] = 'time';
}
$order = $order_cols[$_GET['order']];
Restrict "sc" manually:
if ($_GET['sc'] == 'asc' || $_GET['sc'] == 'desc') {
$order .= ' ' . $_GET['sc'];
} else {
$order .= ' desc';
}
Then you're guaranteed safe to append that to the query, and the URL is not tied to the DB implementation.
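The same whitelist approach can be sketched in Java. This is a translation for illustration, not code from the question. The table name comments, the page = ? condition and the method names are assumptions. Request parameters only choose among constant strings, so nothing the client sends is copied into the SQL text. The page value would be bound through a PreparedStatement placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Whitelist-based ORDER BY construction (illustrative sketch).
public class OrderClause {

    // Hypothetical URL values mapped to real column names, mirroring $order_cols.
    private static final Map<String, String> ORDER_COLUMNS = new HashMap<>();
    static {
        ORDER_COLUMNS.put("time", "comment_time");
        ORDER_COLUMNS.put("popular", "comment_score");
    }
    private static final String DEFAULT_COLUMN = "comment_time";

    // Raw request values are only used as lookup keys, never copied into SQL.
    // HashMap tolerates a null key, so a missing parameter falls back to the default.
    public static String orderBy(String orderParam, String sortParam) {
        String column = ORDER_COLUMNS.getOrDefault(orderParam, DEFAULT_COLUMN);
        String direction = "asc".equals(sortParam) ? "ASC" : "DESC"; // default desc
        return column + " " + direction;
    }

    // "page" is a data value, so it stays a placeholder to be bound with
    // PreparedStatement.setInt(1, page); "comments" is a hypothetical table.
    public static String buildQuery(String orderParam, String sortParam) {
        return "SELECT * FROM comments WHERE page = ? ORDER BY "
                + orderBy(orderParam, sortParam);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("popular", "asc"));
        System.out.println(buildQuery("comment_time; DROP TABLE comments", "desc' or 1=1 --"));
    }
}
```

The injected values in the second call simply fall back to comment_time DESC, because they never match a whitelist entry.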
I'm not 100% certain, but I'd say it still seems vulnerable to me -- the fact that it's accepting the single-quote (') as a delimiter and then generating an error off the subsequent injected code says to me that it's passing things it shouldn't on to MySQL.
Any data that could possibly be taken from somewhere other than your application itself should go through mysql_real_escape_string() first. This way the whole ' or 1=1 part gets passed as a value to MySQL... unless you're passing "sc" straight through for the sort order, such as
$sql = "SELECT * FROM foo WHERE page='{$_REQUEST['page']}' ORDER BY data {$_REQUEST['sc']}";
... which you also shouldn't be doing. Try something along these lines:
$page = mysql_real_escape_string($_REQUEST['page']);
if ($_REQUEST['sc'] == "desc")
$sortorder = "DESC";
else
$sortorder = "ASC";
$sql = "SELECT * FROM foo WHERE page='{$page}' ORDER BY data {$sortorder}";
I still couldn't say it's TOTALLY injection-proof, but it's definitely more robust.
I am assuming that your generated query does something like
select <some number of fields>
from <some table>
where sc=desc
order by comment_time
Now, if I were to attack the order by statement instead of the WHERE, I might be able to get some results... Imagine I added the following
comment_time; select top 5 * from sysobjects
the query being returned to your front end would be the top 5 rows from sysobjects, rather than the query you tried to generate (depending a lot on the front end)...
It really depends on how PHP validates those arguments. If MySQL is giving you a warning, it means that an attacker has already got past your first line of defence, which is your PHP script.
Use if(!preg_match('/^regex_pattern$/', $your_input)) to filter all your inputs before passing them to MySQL.