How can you query a SQL database for malicious or suspicious data? - sql

Lately I have been doing a security pass on a PHP application and I've already found and fixed one XSS vulnerability (both in validating input and encoding the output).
How can I query the database to make sure there isn't any malicious data still residing in it? The fields in question should be text with allowable symbols (-, #, spaces) but shouldn't have any special html characters (<, ", ', >, etc).
I assume I should use regular expressions in the query; does anyone have prebuilt regexes especially for this purpose?

If you only care about non-alphanumerics and it's SQL Server you can use:
SELECT *
FROM MyTable
WHERE MyField LIKE '%[^a-z0-9]%'
This will show you any row where MyField has anything except a-z and 0-9.
EDIT:
Updated pattern would be: LIKE '%[^a-z0-9!-# ]%' ESCAPE '!'
I had to add the ESCAPE char since you want to allow dashes -.

For the same reason that you shouldn't be validating input against a black-list (i.e. list of illegal characters), I'd try to avoid doing the same in your search. I'm commenting without knowing the intent of the fields holding the data (i.e. name, address, "about me", etc.), but my suggestion would be to construct your query to identify what you do want in your database then identify the exceptions.
Reason being there are just simply so many different character patterns used in XSS. Take a look at the XSS Cheat Sheet and you'll start to get an idea. Particularly when you get into character encoding, just looking for things like angle brackets and quotes is not going to get you too far.

Related

No space before ORDER BY - Why does this work?

I came across some SQL in an application which had no space before the "ORDER BY" clause. I was surprised that this even works.
Given a table of numbers, called [counter] where there is simply one column, counter_id that is an incrementing list of integers this SQL works fine in Microsoft SQL Server 2012
select
*
FROM [counter] c
where c.counter_id = 1000ORDER by counter_id
This also works with strings, e.g.:
WHERE some_string = 'test'ORDER BY something
My question is, are there any potential pitfalls or dangers with this query? And conversely, are there any benefits? Other than saving, what, 8 bits of network traffic for that whitespace (whcih may well be a consideration in some applications)
Let me explain the reason why this works with numbers and strings.
The reason is because numbers cannot start identifiers, unless the name is escaped. Basically, the first things that happens to a SQL query is tokenization. That is, the components of the query are broken into identifiers and keywords, which are then analyzed.
In SQL Server, keywords and identifiers and function names (and so on) cannot start with a digit (unless the name is escaped, of course). So, when the tokenizer encounters a digit, it knows that it has a number. The number ends when a non-digit character is encountered. So, a sequence of characters such as 1000ORDER BY is easily turned into three tokens, 1000, ORDER, and BY.
Similarly, the first time that a single quote is encountered, it always represents a string literal. The string literal ends when the final single quote is encountered. The next set of characters represents another token.
Let me add that there is exactly zero reason to ever use these nuances. First, these rules are properties of SQL Server's tokenization and do not necessarily apply to other databases. Second, the purpose of SQL is for humans to be able to express queries. It is way, way more important that we read them.
As jarlh mentioned there might be difference during scanning and parsing the tokens but it creates correctly during execution plan, hence it might not be huge difference in advantages or disadvantages
When parser examines characters ,it checks for keywords,identifiers,string constants and match overall semantic and syntactic structure of the language. Since 'Order by' is a keyword and sql parser knows its possible syntactic location in a query, it will interpret it accordingly without throwing any error. This is the reason why your order by will not throw any error.
Parsing sql query
Parsing SQL

What are pros and cons of using special characters in SQL identifiers?

Should I avoid special characters like "é á ç" in SQL table names and column names?
What are the pros and cons of using special characters?
As you can guess, there are pros and cons. This is more or less a subjective question.
SQL (unlike most programming languages) allows you to use special characters, whitespace, punctuation, or reserved words in your table or column identifiers.
It's pretty nice that people have the choice to use appropriate characters for their native language.
Especially in cases where a word changes its meaning significantly when spelled with the closest ASCII characters: e.g. año vs. ano.
But the downside is that if you do this, you have to use "delimited identifiers" every time you reference the table with special characters. In standard SQL, delimited identifiers use double-quotes.
SELECT * FROM "SELECT"
This is actually okay! If you want to use an SQL reserved word as a table name, you can do it. But it might cause some confusion for some readers of the code.
Likewise if you use special non-ASCII characters, it might make it hard for English-speaking programmers to maintain the code, because they are not familiar with the key sequence to type those special characters. Or they might forget that they have to delimit the table names.
SELECT * FROM "año"
Then there's non-standard delimited identifiers. Microsoft uses square-brackets by default:
SELECT * FROM [año]
And MySQL uses back-ticks by default:
SELECT * FROM `año`
Though both can use the standard double-quotes as identifier delimiters if you enable certain options, you can't always rely on that, and if the option gets disabled, your code will stop working. So users of Microsoft and MySQL are kind of stuck using the non-standard delimiters, unfortunately.
Maintaining the code is simpler in some ways if you can stick with ASCII characters. But there are legitimate reasons to want to use special characters too.

sql injection when single quote in input is always replaced with double quote

Suppose sql query invoked by the login form is:
SELECT * FROM Users WHERE Name ='user input data' AND Pass ='user input data'
Now all single quotes in user input data are replaced with double single quotes (in other words, ' is replaced with ''.).
I can think of this possible sql injection: set user as the intended user and set the password as '\' or 1=1
but I can't think of how I am going to avoid the last ' from disrupting my sql injection.
Avoiding the last ' from disrupting your SQL injection is as simple as using a comment, for example (-- works great for single-line SQL). Alternatively, you can always add something like this:
someHack and '' = '
Now, is it possible to get around your single-quote to double-quote replacement? That most likely depends heavily on the actual SQL variant and the SQL engine you're using. For example, some SQL engines might treat endlines in input in a way that would give you trouble. Unicode is very likely to give you trouble as well, with its very complicated rules. In particular, if parts of your code are unicode aware and others aren't, it's quite easy to pass a harmless unicode character to someone who expects ASCII, and actually use your replacement to add the (unescaped) quote. In case you think this is a weird edge case, it's actually a very real and used SQL injection vector for PHP programmers who thought replace (or addslashes) is good enough to prevent SQL injection :D
If possible, try to use parameters instead of slapping text together. It will likely help your performance as well :) The simplest way to avoid parser bugs is to avoid using the parser.

"¿" (inverted question mark) character in oracle

I found "¿" character (inverted question mark) in the database tables in place of single quote (') character.
Can any one let me know how i can avoid this character from the table.
There are many rows which contains the text with this character but not all single quotes are turning to this ¿ symbol.
I am not even able to filter the rows to update this character (¿) with single quote again.
When i user Like "%¿%" it filters me the text containing ordinary question mark (?)
In general there are two possibilities:
Your database tables really have ¿ characters caused by wrong NLS_LANG settings when data was inserted (or the database character set does not support the special character). In such case the LIKE '%¿%' condition should work. However, this also means you have corrupt data in your database and it is almost impossible to correct them because¿ stands for any wrong character.
Your client (e.g. SQL*Plus) is not able to display the special character caused by wrong NLS_LANG settings or the font does not support the special character.
Which client do you use (SQL*Plus, TOAD, SQL Developer, etc.)?
What is your NLS_LANG Environment variable, resp. your Registry key HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG or HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG?
What do you get when you select DUMP(... , 1016) from your table?
Was it really a simple quote (ASCII 39)? If they actually were some sort of "smart quotes", those do have mappings in windows-1252 but there is no ISO-8859 mapping for them, so if your database charset is ISO-8859-1 and you try to insert some windows-1252 text, Oracle tries to translete from windows-1252 to ISO-8859-1 and uses chr(191) to signal an unmappable character. chr(191) happens to be the opening question mark.
You can check that by executing this (copy the select to preserve the smart quotes):
select dump('‘’“”') from dual
This behaviour is basically "correct", as what you are asking Oracle to do cannot be done, though it is not very intuitive.
See windows-1252, ISO8859-1 and ISO-8859-15 charsets for comparison. Note that windows uses the range 127-159 that is not used in the ISO charsets

Odd formatting in SQL Injection exploits?

I have been trying to learn more about SQL Injection attacks. I understand the principle, but when I actually look at some of these attacks I don't know how they are doing what they are doing.
There often seem to be uneven quotes, and a lot of obfuscation via HEX characters.
I don't get the HEX characters..., surely they are translated back to ASCII by the browser, so what is the point?
However, I am mainly confused by the odd quoting. I am having trouble finding an example right now, however it usually seems that the quoting will end at some point before the end of the statement, when I would have thought it would be at the end?
Perhaps an example is the common use of '1 or 1=1
What is such a statement doing?
I don't get the HEX characters...,
surely they are translated back to
ASCII by the browser
Nope.
However, I am mainly confused by the
odd quoting. I am having trouble
finding an example right now, however
it usually seems that the quoting will
end at some point before the end of
the statement, when I would have
thought it would be at the end?
Imagine you're building inline SQL instead of using parameter substitution as you should. We'll use a make-believe language that looks a lot like PHP for no particular reason.
$sql = "delete from foo where bar = '" + $param + "'";
So, now, imagine that $param gets set by the browser as such...
$param = "' or 1=1 --"
(We're pretending like -- is the SQL comment sequence here. If it's not there's ways around that too)
So, now, what is your SQL after the string substitution is done?
delete from foo where bar = '' or 1=1 --'
Which will delete every record in foo.
This has been purposefully simple, but it should give you a good idea of what the uneven quotes are for.
Let's say that we have a form where we submit a form with a name field. name was used in a variable, $Name. You then run this query:
INSERT INTO Students VALUES ( '$Name' )
It will be translated into:
INSERT INTO Students VALUES ( 'Robert' ); DROP TABLE STUDENTS; --')
The -- is a comment delimiter. After that everything will be ignored. The ' is used to delimit string literals.
To use hex chars in an attack has some reasons. One of the is the obfuscation, other one to bypass some naive security measures.
There are cases where quote marks are prohibited with sql injection. In this case an attacker has to use an encoding method, such as hex encoding for his strings. For instance '/etc/passwd' can be written as0x2f6574632f706173737764 which doesn't require quote marks. Here is an example of a vulnerable query where quote marks are prohibited. :
mysql_query("select name from users where id=".addslashes($_GET[id]));
If you want to use a mysql function like load_file(), then you have to use hex encoding.
PoC:/vuln.php?id=1 union select load_file(0x2f6574632f706173737764)
In this case /etc/passwd is being read and will be the 2nd row.
Here is a variation on the hex encode function that I use in my MySQL SQL Injection exploits:
function charEncode($string){
$char="char(";
$size=strlen($string);
for($x=0;$x<$size;$x++){
$char.=ord($string[$x]).",";
}
$char[strlen($char)-1]=")%00";
return $char;
}
I use this exact method for exploiting HLStats 1.35. I have also used this function in my php nuke exploit to bypass xss filters for writing <?php?> to the disk using into outfile. Its important to note that into outfile is a query operator that does not accept the output to a function or a hex encoded string, it will only accept a quoted string as a path, thus in the vulnerable query above into outfile cannot be used by an attacker. Where as load_file() is a function call and a hex encoding can be used.
As for the uneven quoting; sql injection occurs where the coder didn't sanitize user input for dangerous characters such as single quotes - consider following statement
SELECT * FROM admin WHERE username='$_GET["user"]' and password='$_GET["pass"]'
if I know that valid user is 'admin' and insert the 'or 1=1 I will get following
SELECT * FROM admin WHERE username='admin' and password='something' or 1=1
This will result in returning the query alway true, because the left side of the expression will always be true, regardless of the value of the password.
This is the most simple example of sql injection, and ofter you will find that the attacker won't need to use the quote at all, or maybe comment the rest of the query out with comment delimiter such as -- or /*, if there are more parameters passed after the injection point.
As for the HEX encoding, there may be several reasons, avoiding filtering, it is simply easier to use hex encoded values, because you don't need to worry about quoting all your values in a query.
This is for instance useful if you want to use concat to co-notate two fields together like so:
inject.php?id=1 and 1=0 union select 1,2,concat(username,0x3a3a,password) from admin
Which would providing 3rd row is visible one, return for isntance admin::admin. If I didn't use hex encoding, I would have to do this:
inject.php?id=1 and 1=0 union select 1,2,concat(username,'::',password) from admin
This could be problem with aforementioned addslashes function, but also with poorly written regex sanitization functions or if you have very complicated query.
Sql injection is very wide topic, and what I've covered is hardly even an introduction through.