Preg_replace solution for prepared statements - sql

I have a command class that abstracts almost all database-specific functions. (We run exactly the same application on MSSQL 2005, using both ODBC and the native mssql library, on MySQL and on Oracle.) Sometimes we have problems with our prepare method, which, when executed, replaces all placeholders with their respective values. The replacement currently looks like this:
if(is_array($Parameter['Value']))
{
$Statement = str_ireplace(':'.$Name, implode(', ', $this->Adapter->QuoteValue($Parameter['Value'])), $Statement);
}
else
{
$Statement = str_ireplace(':'.$Name, $this->Adapter->QuoteValue($Parameter['Value']), $Statement);
}
The problem arises when two or more parameter names are similar, for example session_browser and session_browser_version: the first one partially replaces the second.
Of course we learned to work around this by specifying the parameters in a specific order, but now that I have some "free" time I want to do it properly, so I am thinking of switching to preg_replace. I am not good with regular expressions; can anyone help with a regex to replace a string like ':parameter_name'?
Best Regards,
Bruno B B Magalhaes

You should use the \b metacharacter to match the word boundary, so you don't accidentally match a short parameter name within a longer parameter name.
Also, you don't need to special-case arrays if you coerce a scalar Value to an array of one entry:
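// preg_quote() guards against regex metacharacters in the name; the result must be assigned back
$Statement = preg_replace('/:' . preg_quote($Name, '/') . '\b/',
implode(",", $this->Adapter->QuoteValue( (array) $Parameter['Value'] )),
$Statement);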
Note, however, that this can make false positive matches when an identifier or a string literal contains a pattern that looks like a parameter placeholder:
SELECT * FROM ":Name";
SELECT * FROM Table WHERE column = ':Name';
This gets even more complicated when quoted identifiers and string literals can contain escaped quotes.
SELECT * FROM Table WHERE column = 'word\':Name';
You might want to reconsider interpolating variables into SQL strings during prepare, because you're defeating any benefits of prepared statements with respect to security or performance.
I understand why you're doing what you're doing, because not all RDBMS back-ends support named parameters, and SQL parameters can't be used for lists of values in an IN() predicate. But you're creating an awfully leaky abstraction.
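For readers who want to see the word-boundary idea in isolation, here is a minimal sketch in Python rather than PHP. The names bind_named and quote_literal are hypothetical; quote_literal is only a stand-in for the adapter's QuoteValue, and the sketch shares the false-positive problem with quoted strings described above.

```python
import re

def quote_literal(value):
    # Hypothetical stand-in for the adapter's QuoteValue: numbers pass through,
    # everything else becomes a single-quoted literal with '' doubling.
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def bind_named(statement, params, quote=quote_literal):
    """Replace :name placeholders; \\b stops :a from matching inside :a_b."""
    for name, value in params.items():
        # Coerce scalars to a one-element list so lists need no special case.
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = ", ".join(quote(v) for v in values)
        pattern = re.compile(":" + re.escape(name) + r"\b")
        # A function replacement keeps backslashes in the value literal.
        statement = pattern.sub(lambda _match, text=rendered: text, statement)
    return statement

sql = ("SELECT * FROM s WHERE a = :session_browser "
       "AND b = :session_browser_version AND c IN (:ids)")
bound = bind_named(sql, {
    "session_browser": "Firefox",
    "session_browser_version": "3.5",
    "ids": [1, 2, 3],
})
```

Because the shorter name no longer matches inside the longer one, the order in which the parameters are supplied stops mattering.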

Related

Escaping single quotes in the PLACEHOLDER clause of a HANA SQL statement

I noticed an inconsistency in how "HANA SQL" escapes single quotes in the context of the PLACEHOLDER clause. For example, consider the following PLACEHOLDER clause snippet:
('PLACEHOLDER' = ('$$CC_PARAM$$','''foo'',''an escaped single quote \'' '''))
The PLACEHOLDER clause above assigns multiple values to the CC_PARAM parameter. Inside the second argument there is a single quote that's escaped with a backslash. However, the single quotes that delimit each argument are escaped with another single quote (i.e. we write '' instead of \''). It's possible to use the \'' format for the first case, but it's not possible to use the '' format in the second case.
Why is there this discrepancy? It makes escaping quotes in multi-input input parameters tricky. I'm looking to programmatically craft SQL queries for HANA. Am I missing something here? Is it safe to use \'' over '' in all cases? Or do I need logic that can tell where a single quote occurs and escape as appropriate?
The implicit rule here - given by how the software is implemented - is that for parameter values of calculation views, the backslash \ is used to escape the single quotation mark.
For all standard SQL string occurrences, using the single-quotation mark twice '' is the correct way to differentiate between syntax element and string literal.
As for the why:
The PLACEHOLDER syntax is not SQL but a HANA-specific command extension, so there is no general standard that the current implementation violates.
That said, this command extension is embedded into (or rather clamped onto) the standard SQL syntax and has to be handled by the same parser.
But the parameters are not only parsed once, by the SQL parser but again by the component that instantiates the calculation scenario based on the calculation view. With a bit of squinting it's not hard to see that the parameters interface is a general key-value interface that allows for all sorts of information to be handed over to the calc. engine.
One might argue that the whole approach of providing parameters via key-value pairs is not consistent with the general SQL syntax approach and be correct. On the flip side, this approach allows for general flexibility for adding new command elements to the HANA-specific parts, without structurally changing the syntax (and with it the parser).
The clear downside of this is that both the key names, as well as the values, are string-typed. To avoid losing the required escaping for the "inner string" an escape string different from the main SQL escape string needs to be used.
And here we are with two different ways of handing over a string value to be used as a filter condition.
Funny enough, both approaches may still lead to the same query execution plan.
As a matter of fact, in many scenarios with input parameters, the string value will be internally converted into a SQL conforming form. This is the case when the input parameter is used for filtering or in expressions in the calc. view that can be converted into SQL expressions.
For example
SELECT
"AAA"
FROM "_SYS_BIC"."sp/ESC"
('PLACEHOLDER' = ('$$IP_TEST$$', 'this is a test\''s test'));
shows the following execution plan on my system
OPERATOR_NAME   OPERATOR_DETAILS
PROJECT         TEST.AAA
COLUMN TABLE    FILTER CONDITION: TEST.AAA = 'this is a test's test'
                (DETAIL: ([SCAN] TEST.AAA = 'this is a test's test'))
Note how the escape-\' has been removed.
All in all: when using PLACEHOLDER values, the \' escaping needs to be used and in all other cases, the '' escaping.
That should not be terribly difficult to implement for a query builder as you can consider this when dealing with the PLACEHOLDER syntax.
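A query builder could keep the two escaping rules in two small helpers. The sketch below is an assumption based only on the examples in this thread: it backslash-escapes quotes for the calculation engine and then doubles them for the SQL parser, which reproduces the \'' sequence in the IP_TEST example. The function names are hypothetical, the code has not been checked against a HANA system, and it does not address how literal backslashes in the value should be handled.

```python
def escape_sql_literal(value):
    # Standard SQL string literals: a quote is escaped by doubling it.
    return value.replace("'", "''")

def escape_placeholder_value(value):
    # Assumed two-layer rule: \' for the calculation engine first,
    # then '' doubling because the value sits inside an SQL string.
    inner = value.replace("'", "\\'")
    return escape_sql_literal(inner)

def placeholder_clause(name, value):
    # Builds ('PLACEHOLDER' = ('$$NAME$$', '<escaped value>'))
    return ("('PLACEHOLDER' = ('$$" + name + "$$', '"
            + escape_placeholder_value(value) + "'))")

clause = placeholder_clause("IP_TEST", "this is a test's test")
```

Keeping the two rules in separate functions means the builder only has to know which context it is emitting into, not where each quote occurs.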

SQL injection if brackets and semicolons are filtered

I have a statement like this:
SELECT * FROM TABLE WHERE COLUMN = 123456
The value 123456 is provided by the user, so the statement is vulnerable to SQL injection. If I strip all semicolons and brackets, is it still possible for an attacker to run any statements other than SELECT (such as DROP, UPDATE or INSERT)?
I am already using prepared statements, but I am curious: if the input is stripped of statement terminators and brackets, can an attacker still modify the DB in any way?
Use sql parameters. Attempting to "sanitize" input is an extremely bad idea. Try googling some complex sql injection snippets, you won't believe how creative black hat hackers are.
In general it's very difficult to be 100% certain that you are safe from this type of attack by trying to strip out specific characters - there are just too many ways to get around your code (by using character encodings etc.)
A better option is to pass parameters to a stored procedure, like this:
CREATE PROCEDURE usp_MyStoredProcedure
    @MyParam int
AS
BEGIN
    SELECT * FROM TABLE WHERE COLUMN = @MyParam
END
GO
That way SQL will treat the value passed in as a parameter, and nothing else, no matter what it contains. And in this case it would only accept a value of type int anyway.
If you don't want, or can't, use a stored procedure, then I'd suggest changing your code so that the input parameter can only contain a pre-defined list of characters - in this case numeric characters. That way you can be certain that the value is safe to use.
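Both suggestions, an allow-list of characters and passing the value as a parameter, can be combined. The following is a minimal sketch using Python's sqlite3 module in place of SQL Server; the table items, its columns and the function lookup are made up for illustration.

```python
import re
import sqlite3

def lookup(conn, raw_value):
    # Allow-list: accept ASCII digits only and reject everything else outright.
    if not re.fullmatch(r"[0-9]+", raw_value):
        raise ValueError("expected digits only")
    # The value is bound as a parameter, never spliced into the SQL text.
    cursor = conn.execute("SELECT name FROM items WHERE id = ?", (int(raw_value),))
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("INSERT INTO items VALUES (123456, 'widget')")

rows = lookup(conn, "123456")
```

Rejecting unexpected input is simpler to reason about than trying to strip dangerous characters from it.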

Lowercase or uppercase for SQL Statements

Is there a fundamental difference between these statements
"select * from users where id = 1"
OR
"SELECT* FROM users WHERE id = 1"
OR
'select * from users where id = 1'
OR
'SELECT * FROM users WHERE id = 1'
I was just wondering... I always used the 2nd method, but some colleagues insist on using other methods. What is the most common convention?
You have two separate issues:
Whether you use uppercase or lowercase for SQL keywords. This is a matter of taste. Documentation usually uses uppercase, but I suggest you just agree on one or the other within a particular project.
How to quote SQL in your host language. Since strings are quoted with single quotes (') in SQL, I suggest you use double quotes (") to quote SQL statements in your host language (or triple quotes (""") if your host language is Python). SQL does use double quotes, but only for names of tables, columns and functions that contain special characters, which you can usually avoid needing.
In C/C++ there is a third option for quoting. If your compiler supports variadic macros (introduced in C99), you can define:
#define SQL(...) #__VA_ARGS__
and use SQL like SQL(SELECT * FROM users WHERE id = 1). The advantage is that this way the string can span multiple lines, while quoted strings must have each line quoted separately. The triple quotes in Python serve the same purpose.
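Both points are easy to check with a small experiment. The sketch below uses Python's sqlite3 module with a made-up users table: the lowercase and uppercase statements return the same rows, and a triple-quoted string holds a multi-line statement that contains a single-quoted SQL literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# Keyword case makes no difference to the result.
lower = conn.execute("select * from users where id = 1").fetchall()
upper = conn.execute("SELECT * FROM users WHERE id = 1").fetchall()

# Triple quotes let the statement span lines and contain SQL string literals.
multi_line = conn.execute("""
    SELECT name
    FROM users
    WHERE name = 'alice'
""").fetchall()
```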
There is no difference in terms of how it will work, but usually I put SQL keywords in uppercase so it would be:
SELECT * FROM users WHERE id = 1
No.
There is no difference. It's all about your habits.
It's like asking whether there is any difference between these code blocks in C++:
int main() {
}
OR
int
main()
{
}
OR
int main()
{
}
Anyway, the most common convention is to write SQL keywords in UPPERCASE.
There is no fundamental difference between any of the statements above. However, the official SQL standards up to and including the latest, SQL:2008, have always used upper case to distinguish fields from operators.

Odd formatting in SQL Injection exploits?

I have been trying to learn more about SQL Injection attacks. I understand the principle, but when I actually look at some of these attacks I don't know how they are doing what they are doing.
There often seem to be uneven quotes, and a lot of obfuscation via HEX characters.
I don't get the HEX characters..., surely they are translated back to ASCII by the browser, so what is the point?
However, I am mainly confused by the odd quoting. I am having trouble finding an example right now, however it usually seems that the quoting will end at some point before the end of the statement, when I would have thought it would be at the end?
Perhaps an example is the common use of '1 or 1=1
What is such a statement doing?
"I don't get the HEX characters..., surely they are translated back to ASCII by the browser"
Nope.
"However, I am mainly confused by the odd quoting. I am having trouble finding an example right now, however it usually seems that the quoting will end at some point before the end of the statement, when I would have thought it would be at the end?"
Imagine you're building inline SQL instead of using parameter substitution as you should. We'll use a make-believe language that looks a lot like PHP for no particular reason.
$sql = "delete from foo where bar = '" + $param + "'";
So, now, imagine that $param gets set by the browser as such...
$param = "' or 1=1 --"
(We're pretending that -- is the SQL comment sequence here. If it's not, there are ways around that too.)
So, now, what is your SQL after the string substitution is done?
delete from foo where bar = '' or 1=1 --'
Which will delete every record in foo.
This has been purposefully simple, but it should give you a good idea of what the uneven quotes are for.
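The same experiment can be run safely against an in-memory database. This sketch uses Python's sqlite3 module, where -- really is the comment sequence; the table foo and its three rows are made up for the demonstration. It compares the concatenated statement with one that binds the value as a parameter.

```python
import sqlite3

def make_db():
    # Fresh in-memory table with three rows for each experiment.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (bar TEXT)")
    conn.executemany("INSERT INTO foo VALUES (?)", [("a",), ("b",), ("c",)])
    return conn

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM foo").fetchone()[0]

param = "' or 1=1 --"

# Inline SQL: the quote in param closes the literal and "or 1=1" becomes SQL.
unsafe = make_db()
unsafe.execute("delete from foo where bar = '" + param + "'")
unsafe_remaining = count_rows(unsafe)

# Parameter substitution: param is compared as a plain string value.
safe = make_db()
safe.execute("delete from foo where bar = ?", (param,))
safe_remaining = count_rows(safe)
```

With concatenation every row is deleted; with the bound parameter no row matches the odd-looking string, so nothing is deleted.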
Let's say we have a form with a name field, and the submitted name ends up in a variable, $Name. You then run this query:
INSERT INTO Students VALUES ( '$Name' )
If the submitted name is Robert' ); DROP TABLE STUDENTS; -- then the query will be translated into:
INSERT INTO Students VALUES ( 'Robert' ); DROP TABLE STUDENTS; --')
The -- is a comment delimiter; everything after it is ignored. The ' is used to delimit string literals.
There are several reasons to use hex characters in an attack. One is obfuscation; another is to bypass some naive security measures.
There are cases where quote marks are prohibited in an SQL injection. In this case an attacker has to use an encoding method, such as hex encoding, for his strings. For instance '/etc/passwd' can be written as 0x2f6574632f706173737764, which doesn't require quote marks. Here is an example of a vulnerable query where quote marks are prohibited:
mysql_query("select name from users where id=".addslashes($_GET[id]));
If you want to use a mysql function like load_file(), then you have to use hex encoding.
PoC:/vuln.php?id=1 union select load_file(0x2f6574632f706173737764)
In this case /etc/passwd is being read and will be the 2nd row.
Here is a variation on the hex encode function that I use in my MySQL SQL Injection exploits:
function charEncode($string){
$char="char(";
$size=strlen($string);
for($x=0;$x<$size;$x++){
$char.=ord($string[$x]).",";
}
$char[strlen($char)-1]=")%00";
return $char;
}
I use this exact method for exploiting HLStats 1.35. I have also used this function in my php nuke exploit to bypass xss filters for writing <?php?> to the disk using into outfile. It's important to note that into outfile is a query operator that does not accept the output of a function or a hex-encoded string; it will only accept a quoted string as a path, so in the vulnerable query above into outfile cannot be used by an attacker. load_file(), by contrast, is a function call, so a hex encoding can be used.
As for the uneven quoting: SQL injection occurs where the coder didn't sanitize user input for dangerous characters such as single quotes. Consider the following statement:
SELECT * FROM admin WHERE username='$_GET["user"]' and password='$_GET["pass"]'
If I know that a valid user is 'admin' and inject ' or 1=1 into the password, I will get the following:
SELECT * FROM admin WHERE username='admin' and password='something' or 1=1
This makes the WHERE clause always true, because the right-hand side of the OR (1=1) is always true, regardless of the value of the password.
This is the simplest example of SQL injection. Often you will find that the attacker doesn't need to use the quote at all, or will comment out the rest of the query with a comment delimiter such as -- or /*, if more parameters are passed after the injection point.
As for the hex encoding, there may be several reasons: avoiding filtering, or simply that hex-encoded values are easier to use, because you don't need to worry about quoting all the values in a query.
This is useful, for instance, if you want to use concat to concatenate two fields, like so:
inject.php?id=1 and 1=0 union select 1,2,concat(username,0x3a3a,password) from admin
Which would, provided the 3rd column is the visible one, return for instance admin::admin. If I didn't use hex encoding, I would have to do this:
inject.php?id=1 and 1=0 union select 1,2,concat(username,'::',password) from admin
This could be a problem with the aforementioned addslashes function, but also with poorly written regex sanitization functions, or if you have a very complicated query.
SQL injection is a very wide topic, and what I've covered is hardly even an introduction, though.
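When reading logs or a captured payload, it helps to turn the hex literals quoted in this thread back into text. A minimal sketch in Python (the function name decode_hex_literal is made up, and it assumes the payload is plain ASCII):

```python
def decode_hex_literal(literal):
    """Decode a MySQL-style 0x... literal into the ASCII text it stands for."""
    if not literal.lower().startswith("0x"):
        raise ValueError("expected a literal starting with 0x")
    return bytes.fromhex(literal[2:]).decode("ascii")

path = decode_hex_literal("0x2f6574632f706173737764")
separator = decode_hex_literal("0x3a3a")
```

The first literal decodes to the /etc/passwd path and the second to the :: separator used in the concat example, which shows why a filter that only looks for quote characters never sees these strings.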

Regex for parsing SQL parameters

If I have a query such as SELECT * from authors where name = @name_param, is there a regex to parse out the parameter names (specifically the "name_param")?
Thanks
This is tricky because params can also occur inside quoted strings.
SELECT * FROM authors WHERE name = @name_param
AND string = 'don\'t use @name_param';
How would the regular expression know to use the first @name_param but not the second?
It's a problem that can be solved, but it's not practical to do it in a single regular expression. I had to handle this in Zend_Db, and what I did was first strip out all quoted strings and delimited identifiers, and then use regular expressions on the remainder.
You can see the code, because it's open-source.
See functions _stripQuoted() and _parseParameters().
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L200
https://github.com/zendframework/zf1/blob/136735e776f520b081cd374012852cb88cef9a88/library/Zend/Db/Statement.php#L140
Provided you have no quoted strings or comments with parameters in them, the required regex would be quite trivial:
@([_a-zA-Z]+) /* match group 1 contains the name only */
I go with Bill Karwin's recommendation to be cautious, knowing that the naïve approach has its pitfalls. But if you know the data you deal with, this regex would be all you need.
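The two answers can be combined: strip quoted strings and delimited identifiers first, then apply the simple pattern to what is left. The sketch below is written in Python and only follows that general idea; it is not the Zend_Db implementation. The function name parameter_names is made up, the marker character is an argument because it differs between systems, and SQL comments are not handled.

```python
import re

# Single- or double-quoted runs, allowing backslash escapes and doubled quotes.
_QUOTED = re.compile(r"""'(?:[^'\\]|\\.|'')*'|"(?:[^"\\]|\\.|"")*\"""")

def parameter_names(sql, marker="@"):
    """Return parameter names that appear outside quoted strings."""
    stripped = _QUOTED.sub("", sql)
    pattern = re.escape(marker) + r"([A-Za-z_][A-Za-z0-9_]*)"
    return re.findall(pattern, stripped)

sql = ("SELECT * FROM authors WHERE name = @name_param "
       "AND string = 'don\\'t use @other_param'")
names = parameter_names(sql)
```

Only the placeholder outside the string literal is reported; the one inside the quotes is removed before the name pattern ever runs.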