Does quoting of identifiers improve security? - sql

I know that in MySQL we can quote identifiers with the backtick symbol, whereas in Oracle (and other RDBMSs that follow the standard) we can use double quotes around table names and field names. I wonder whether this somehow improves security, and whether we should use this technique in real-world applications alongside traditional methods such as prepared SQL statements.

I wonder whether it somehow improves security
If you're accepting user input for column or table names and putting them into a query then yes, you will need to correctly encode them to avoid SQL injection. Specifically, wrap them in double-quotes, and replace any double-quote character in the name with a doubled-double-quote. (In MySQL, backtick and doubled-backtick, or set the ANSI_QUOTES config to make it use the standard double quote.)
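A minimal sketch of the ANSI-style encoding described above, in Python. The helper name quote_identifier and the table t are made up for illustration, and SQLite is used only because it accepts double-quoted identifiers and ships with Python; check the rules of your own DBMS before relying on this.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Double any embedded double quote, then wrap the name in double quotes."""
    return '"' + name.replace('"', '""') + '"'


# A hostile "column name" that tries to break out of the quoting.
hostile = 'name" FROM secrets; --'

conn = sqlite3.connect(":memory:")
# Because the embedded quote is doubled, the whole string stays one identifier.
conn.execute(f"CREATE TABLE t ({quote_identifier(hostile)} TEXT)")
conn.execute(f"INSERT INTO t ({quote_identifier(hostile)}) VALUES (?)", ("ok",))
rows = conn.execute(f"SELECT {quote_identifier(hostile)} FROM t").fetchall()
```

The value still goes through a bound parameter (?); only the identifier is interpolated, and only after encoding.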
It's really unusual to be accepting arbitrary schema names from user input though, and rarely a good idea. Typically where you allow particular columns to be identified (eg for a sort=something parameter) it's better to permit only a whitelist of known-good columns.
When you are only writing fixed queries there is no particular security need to quote schema names, although it's probably a good idea to ensure that your queries still work reliably in the face of different DBMSs with different reserved words.

Related

Should you use quotes on system identifiers in your query?

It may depend on the database type, but is there a preference (by the database and not the coder) or is it better to use quotes? Is it:
Faster?
Less error prone?
Helps prevent injection (if using a PDO or not)?
Assume there is nothing requiring the use (spaces, reserved words, etc.).
MySQL:
SELECT `id` FROM `table` WHERE `name` = '$name';
ANSI:
SELECT "id" FROM "table" WHERE "name" = '$name';
vs:
SELECT id FROM table WHERE name = $name;
This answer talks about the requirement to use quotes in MySQL, but I'm interested in when it's not required by the db, but it might be preferred/better for the aspects (and perhaps more) that I listed above.
Quotes around identifiers are used only during the query parsing stage - a stage that goes through the SQL statement, and figures out its syntactic elements. Compared to other stages (query optimization, query execution, and passing the results back to the caller) the parsing stage is relatively short. Therefore, you should not expect any measurable speedup or slowdown from using quotes around your identifiers, regardless of your particular RDBMS.
As far as being more or less error prone goes, missing quotes around multipart identifiers become apparent very quickly during the development stage, so the practice of placing quotes everywhere is not worth the trouble, because readability for humans suffers significantly.
Finally, adding quotes around identifiers would not help you prevent injection attacks; same goes for not placing quotes around all identifiers. Many SQL script generators take this route to avoid if statements all over the script testing if the identifier is multipart or not.
The only situation where quoting all identifiers is a good idea is when you generate SQL programmatically, and the results are not intended for human readers.
I am not a fan of using quoted identifiers. They take too long to type and they clutter the code, making it harder for humans to read.
Also, I prefer to discourage the use of reserved words as identifiers in SQL. To me, this is just good practice. I mean, who wants to read something like:
select `select`, `from` as `as`
from <some really messed up table>
I use the bottom version because I find it easier to read, but it depends on the way you echoed your query. There is no difference in performance, as the parser reads past the quotes unless they are needed, so you really won't notice it unless you are writing a WHERE clause that contains more than one variable.
I'd say it's faster NOT to use quotes for numeric identifiers; why add the potential overhead of a data type conversion to a query you want (presumably) to be as performant as possible?

Should underscores be used in column names?

Technically, the underscore character (_) can be used in column names. But is it good practice to use underscores in column names ? It seems to make the name more readable but I'm concerned about technical issues which may arise from using them. Column names will not be prefixed with an underscore.
There is no direct technical issue with using an underscore in the name. In fact, I do it quite often and find it helpful. Ruby even auto-generates underscores in column names, and SQL Server's own system objects use underscores too.
In general, it is a good idea to have some naming convention that you stick to in the database, and if that includes underscores, no big deal.
Any character can be used in the name, if you put square brackets or quotes around the name when referring to it. I try to avoid spaces though, since it makes things harder to read.
There are a few things you want to avoid when coming up with a naming convention for SQL Server. They are:
Don't prefix stored procedures with sp_ unless you are planning to make them system wide.
Don't prefix columns with their data type (since you may want to change it).
Avoid putting stuff in the sys schema (you can with hacking, but you shouldn't).
Pretend your code is case sensitive, even when it isn't. You never know when you end up on a server that has tempdb set up to be case sensitive.
When creating a temp table, always specify collation for string types.
There is no problem with this, as long as it makes the column name clearer.
If you check PostgreSQL documentation you may find that almost all the objects are named with Snake Case.
Moreover, a lot of system objects in MySQL, MS SQL Server, Oracle DB, and aforementioned PostgreSQL use Snake Case.
From my personal experience it is not a big deal to use underscores for objects naming.
But there is a caveat.
The underscore is a wildcard that matches any single character in the SQL LIKE operator:
SELECT * FROM FileList WHERE Extention LIKE 'ex_'
It is a potential issue when there is a lot of dynamic SQL code, especially if we are talking about autogenerated object names. And such bugs are quite hard to find.
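To make the caveat concrete, here is a small sketch in Python with SQLite standing in for the real database. The escape_like helper and the sample rows are invented for illustration. ESCAPE is standard SQL, but support and default escape characters vary between DBMSs.

```python
import sqlite3


def escape_like(text: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so the text matches literally (escape char first)."""
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FileList (Extention TEXT)")
conn.executemany("INSERT INTO FileList VALUES (?)", [("ex_",), ("exe",), ("ex1",)])

# Unescaped: "_" matches any single character, so every row matches.
loose = sorted(r[0] for r in conn.execute(
    "SELECT Extention FROM FileList WHERE Extention LIKE 'ex_'"))

# Escaped: only the row containing a literal underscore matches.
strict = sorted(r[0] for r in conn.execute(
    "SELECT Extention FROM FileList WHERE Extention LIKE ? ESCAPE '\\'",
    (escape_like("ex_"),)))
```

This only matters when a name containing an underscore ends up inside a LIKE pattern; exact comparisons with = are unaffected.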
Personally I would rather avoid underscores in naming. But at the same time there is no need to rewrite all the existing code if this type of naming has already been used.
Forewarned is forearmed.

Are input sanitization and parameterized queries mutually exclusive?

I'm working on updating some legacy code that does not properly handle user input. The code does do a minimal amount of sanitization, but does not cover all known threats.
Our newer code uses parameterized queries. As I understand it, the queries are precompiled, and the input is treated simply as data which cannot be executed. In that case, sanitization is not necessary. Is that right?
To put it another way, if I parameterize the queries in this legacy code, is it OK to eliminate the sanitization that it currently does? Or am I missing some additional benefit of sanitization on top of parameterization?
It's true that SQL query parameters are a good defense against SQL injection. Embedded quotes or other special characters can't make mischief.
But some components of SQL queries can't be parameterized. E.g. table names, column names, SQL keywords.
$sql = "SELECT * FROM MyTable ORDER BY {$columnname} {$ASC_or_DESC}";
So there are some examples of dynamic content you may need to validate before interpolating into an SQL query. Whitelisting values is also a good technique.
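A rough sketch of the whitelisting idea, written in Python rather than PHP. The column list is hypothetical and the function name build_order_query is made up. The point is just that only values from a fixed set ever reach the SQL string.

```python
# Hypothetical set of columns the application is willing to sort by.
ALLOWED_SORT_COLUMNS = {"id", "name", "created"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_order_query(column: str, direction: str) -> str:
    """Interpolate the column and direction only if both are on the whitelist."""
    direction = direction.upper()
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {column!r}")
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    return f"SELECT * FROM MyTable ORDER BY {column} {direction}"


query = build_order_query("name", "desc")
```

Anything that is an ordinary value (a WHERE comparison, a LIMIT in most drivers) should still go through query parameters; the whitelist is only for the parts that cannot be parameterized.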
Also you could have values that are permitted by the data type of a column but would be nonsensical. For these cases, it's often easier to use application code to validate than to try to validate in SQL constraints.
Suppose you store a credit card number. There are valid patterns for credit card numbers, and libraries to recognize a valid one from an invalid one.
Or how about when a user defines her password? You may want to ensure sufficient password strength, or validate that the user entered the same string in two password-entry fields.
Or if they order a quantity of merchandise, you may need to store the quantity as an integer but you'd want to make sure it's greater than zero and perhaps if it's greater than 1000 you'd want to double-check with the user that they entered it correctly.
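For the quantity case, the application-side check might look something like this Python sketch. The function name check_quantity is invented; the threshold of 1000 comes from the example above, while the ASCII-digits-only rule is an assumption about what the form should accept.

```python
def check_quantity(raw: str, confirm_above: int = 1000):
    """Return (quantity, needs_confirmation); raise ValueError on unusable input."""
    text = raw.strip()
    # Accept plain ASCII digits only, so "1_000", "-3" or "abc" are rejected.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("quantity must be a whole number")
    quantity = int(text)
    if quantity <= 0:
        raise ValueError("quantity must be greater than zero")
    # Large but valid orders are flagged so the UI can ask the user to confirm.
    return quantity, quantity > confirm_above
```

The database column would still be an integer; this just catches values that are valid for the type but make no sense for the business.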
Parameterized queries will help prevent SQL injection, but they won't do diddly against cross-site scripting. You need other measures, like HTML encoding or HTML detection/validation, to prevent that. If all you care about is SQL injection, parameterized queries are probably sufficient.
There are many different reasons to sanitize and validate, including preventing cross-site scripting, and simply wanting the correct content for a field (no names in phone numbers). Parameterized queries eliminate the need to manually sanitize or escape against SQL injection.
See one of my previous answers on this.
You are correct, SQL parameters are not executable code so you don't need to worry about that.
However, you should still do a bit of validation. For example, if you expect a varchar(10) and the user inputs something longer than that, you will end up with an exception.
In short no. Input sanitization and the use of parameterized queries are not mutually exclusive, they are independent: you can use neither, either one alone, or both. They prevent different types of attacks. Using both is the best course.
It is important to note, as a minor point, that sometimes it is useful to write stored procedures which contain dynamic SQL. In this case, the fact that the inputs are parameterized is no automatic defense against SQL injection. This may seem a fairly obvious point, but I often run into people who think that because their inputs are parameterized they can just stop worrying about SQL Injection.

Using backquote/backticks for mysql queries

I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from the Phpmyadmin exports, and from what I understood, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and also, the code looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out if there is any other implication of this practice, such as SQL (MySQL in my case) interpretation speed, etc. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column—not that uncommon.
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way—bad practice anyway)
they are not standardized sql and will only work in mysql; other dbms will use " instead
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any[1] sequence of BMP Unicode characters except U+0000 as an identifier, they can simply
Replace all backticks with double backticks
Surround that with single backticks
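The two steps above can be sketched in Python roughly as follows. The function name quote_mysql_identifier is made up, the BMP and U+0000 checks simply mirror the restriction stated above, and the result has not been run against a MySQL server here.

```python
def quote_mysql_identifier(name: str) -> str:
    """Backtick-quote a name: double embedded backticks, then wrap in backticks."""
    if "\x00" in name:
        raise ValueError("U+0000 is not permitted in an identifier")
    if any(ord(ch) > 0xFFFF for ch in name):
        raise ValueError("only BMP characters are permitted in an identifier")
    return "`" + name.replace("`", "``") + "`"


quoted_plain = quote_mysql_identifier("My Field")
quoted_tricky = quote_mysql_identifier("odd`name")
```

A generator that always calls a helper like this never needs a list of reserved words, which is presumably why generated SQL is full of backticks.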
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
1. Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that the backticks were primarily used to prevent erroneous queries which utilized common SQL identifiers, i.e. LIMIT and COUNT.

Why can you have a column named ORDER in DB2?

In DB2, you can name a column ORDER and write SQL like
SELECT ORDER FROM tblWHATEVER ORDER BY ORDER
without even needing to put any special characters around the column name. This is causing me pain that I won't get into, but my question is: why do databases allow the use of SQL keywords for object names? Surely it would make more sense to just not allow this?
I largely agree with the sentiment that keywords shouldn't be allowed as identifiers. Most modern computing languages have 20 or maybe 30 keywords, in which case imposing a moratorium on their use as identifiers is entirely reasonable. Unfortunately, SQL comes from the old COBOL school of languages ("computing languages should be as similar to English as possible"). Hence, SQL (like COBOL) has several hundred keywords.
I don't recall if the SQL standard says anything about whether reserved words must be permitted as identifiers, but given the extensive (excessive!) vocabulary it's unsurprising that several SQL implementations permit it.
Having said that, using keywords as identifiers isn't half as silly as the whole concept of quoted identifiers in SQL (and these aren't DB2 specific). Permitting case sensitive identifiers is one thing, but quoted identifiers permit all sorts of nonsense including spaces, diacriticals and in some implementations (yes, including DB2), control characters! Try the following for example:
CREATE TABLE "My
Tablé" ( A INTEGER NOT NULL );
Yes, that's a line break in the middle of an identifier along with an e-acute at the end... (which leads to interesting speculation on what encoding is used for database meta-data and hence whether a non-Unicode database would permit, say, a table definition containing Japanese column names).
Many SQL parsers (especially DB2/z, which I use) are smarter than some of the regular parsers which sometimes separate lexical and semantic analysis totally (this separation is mostly a good thing).
The SQL parsers can figure out based on context whether a keyword is valid or should be treated as an identifier.
Hence you can get columns called ORDER or GROUP or DATE (that's a particularly common one).
It does annoy me with some of the syntax coloring editors when they brand an identifier with the keyword color. Their parsers aren't as 'smart' as the ones in DB2.
Because object names are ... names. All database systems let you use quoted names to stop you from running into trouble.
If you are running into issues, the fault lies not with the practice of permitting object names to be names, but with faulty implementations, or with faulty code libraries which don't automatically quote everything or cannot be made to quote names as-needed.
Interestingly, you can use keywords as field names in SQL Server as well. The only difference is that you need to put square brackets around the name of the field
so you can do something like
create table [order](
id int,
[order] varchar(50) )
and then :)
select
[order]
from
[order]
order by [order]
That is of course a rather extreme example, but at least with the square brackets you can see that [order] is not a keyword.
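If you want to try this without a SQL Server instance, SQLite happens to accept the same square-bracket quoting for compatibility, so the example can be run from Python as a stand-in. This is only a sketch: the sample rows are invented, and SQLite is not SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted, ORDER is rejected as a table name by SQLite's parser.
try:
    conn.execute("CREATE TABLE order (id INT)")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# Bracket-quoted, the same word works as both table and column name.
conn.execute("CREATE TABLE [order] (id INT, [order] VARCHAR(50))")
conn.executemany("INSERT INTO [order] VALUES (?, ?)", [(1, "b"), (2, "a")])
rows = conn.execute("SELECT [order] FROM [order] ORDER BY [order]").fetchall()
```

Unlike the DB2 behaviour in the question, SQLite refuses the unquoted form here, which shows how much this varies from one parser to another.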
The reason I would see people using names already reserved by keywords is when there is a direct mapping between column names, or names of the tables and the data presentation. You can call that being lazy or convenient.