It may depend on the database type, but does the database itself (not the coder) have a preference, or is it simply better to quote identifiers? Is quoting:
Faster?
Less error prone?
Helpful in preventing injection (whether or not PDO is used)?
Assume nothing actually requires the quotes (spaces, reserved words, etc.).
MySQL:
SELECT `id` FROM `table` WHERE `name` = '$name';
ANSI:
SELECT "id" FROM "table" WHERE "name" = '$name';
vs:
SELECT id FROM table WHERE name = $name;
This answer talks about the requirement to use quotes in MySQL, but I'm interested in when it's not required by the db, but it might be preferred/better for the aspects (and perhaps more) that I listed above.
Quotes around identifiers are used only during the query parsing stage - a stage that goes through the SQL statement, and figures out its syntactic elements. Compared to other stages (query optimization, query execution, and passing the results back to the caller) the parsing stage is relatively short. Therefore, you should not expect any measurable speedup or slowdown from using quotes around your identifiers, regardless of your particular RDBMS.
As far as being more or less error prone goes, missing quotes around multipart identifiers become apparent very quickly during the development stage, so the practice of placing quotes everywhere is not worth the trouble, because readability for humans suffers significantly.
Finally, adding quotes around identifiers would not help you prevent injection attacks; neither would leaving them unquoted.
The only situation where quoting all identifiers is a good idea is when you generate SQL programmatically, and the results are not intended for human readers. Many SQL script generators take this route to avoid if statements all over the script testing whether the identifier is multipart or not.
I am not a fan of using quoted identifiers. They take too long to type and they clutter the code, making it harder for humans to read.
Also, I prefer to discourage the use of reserved words as identifiers in SQL. To me, this is just good practice. I mean, who wants to read something like:
select `select`, `from` as `as`
from <some really messed up table>
I use the bottom version because I find it easier to read, but it depends on how you echo your query. There is no difference in performance, since the parser simply reads past the quotes unless they are needed, so you will not really notice it unless you are writing a WHERE clause that contains more than one variable.
I'd say it's faster NOT to use quotes for numeric values; why add the potential overhead of a data type conversion to a query you want (presumably) to be as performant as possible?
I know that in MySQL we can quote identifiers with the backtick symbol, whereas in Oracle (and other RDBMSs that follow the standard) we can use double quotes around table names and field names. I wonder whether it somehow improves security, and whether we should use this technique in real-world applications alongside traditional methods such as prepared SQL statements.
I wonder whether it somehow improves security
If you're accepting user input for column or table names and putting them into a query then yes, you will need to correctly encode them to avoid SQL injection. Specifically, wrap them in double quotes, and replace any double-quote character in the name with a doubled double quote. (In MySQL, use a backtick and a doubled backtick, or enable the ANSI_QUOTES SQL mode to make it accept the standard double quote.)
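As a rough illustration of the encoding rule described above, here is a minimal sketch in Python, with the built-in sqlite3 module standing in for whatever database you actually use. The helper name quote_identifier and the hostile column name are made up for the example; the NUL check is an extra precaution, not something every DBMS requires.

```python
import sqlite3


def quote_identifier(name: str, quote: str = '"') -> str:
    """Quote an SQL identifier by doubling any embedded quote character.

    Pass quote='`' for MySQL's default backtick style.
    """
    if "\x00" in name:
        raise ValueError("identifier must not contain a NUL character")
    return quote + name.replace(quote, quote * 2) + quote


# A deliberately hostile column name (hypothetical) to show it stays an identifier.
hostile = 'name" FROM users; --'

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE t ({quote_identifier(hostile)} TEXT)")
conn.execute(f"INSERT INTO t ({quote_identifier(hostile)}) VALUES (?)", ("ok",))
rows = conn.execute(f"SELECT {quote_identifier(hostile)} FROM t").fetchall()
print(rows)
```

The quoting only keeps the name from breaking out of the identifier position; it says nothing about whether the user should be allowed to name that column at all.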
It's really unusual to be accepting arbitrary schema names from user input though, and rarely a good idea. Typically where you allow particular columns to be identified (eg for a sort=something parameter) it's better to permit only a whitelist of known-good columns.
When you are only writing fixed queries there is no particular security need to quote schema names, although it's probably a good idea to ensure that your queries still work reliably in the face of different DBMSs with different reserved words.
I know this could be a trivial question, but I keep hearing the voice of one of my teachers saying:
don't use SELECT * within a stored procedure; it affects performance, and it returns data that could break its clients if its schema changes, causing unknown ripple effects
I can't find any article confirming that concept, and I think that should be noticeable if true.
In most modern professional SQL implementations (Oracle, SQL Server, DB2, etc.), the use of SELECT * has a negative impact only in a top-level SELECT. In all other cases the SQL compiler should perform column-optimization anyway, eliminating any columns that are not used.
And the negative effect of * in a top-level SELECT is almost entirely related to returning all of the columns when you probably do not need all of them.
IMHO, in all other cases(**), including most ad-hoc cases, the use of * is perfectly fine and has no detrimental effects (and obvious beneficial conveniences). The widespread universal pronouncements against using * are largely an archaic holdover from the time (10-15 years ago) when most SQL compilers did not have effective column-elimination optimization techniques.
(** - one exception is in VIEW definitions in SQL Server, because it doesn't automatically notice if the bound column list changes.)
The other reason that you sometimes see for not using SELECT * has nothing to do with performance; it is a matter of coding practice. That is, it's generally better to write your SQL code to be explicit about which columns you (or your client code) expect and thus depend on. If you use * then the dependency is implicit, and someone reading your SQL code cannot easily tell whether your application truly depends on a certain column or not. (And IMHO, this is the more valid reason.)
I found this quote in a paper about what happens when we use SELECT *:
“[…] real harm occurs when a sort is required. Every SELECTed column, with the sorting columns repeated, makes up the width of the sort work file wider. The wider and longer the file, the slower the sort is.” In http://www.quest.com/whitepapers/10_SQL_Tips.pdf
This paper is about the DB2 engine, but it likely applies to other engines too.
Technically, the underscore character (_) can be used in column names. But is it good practice to use underscores in column names? It seems to make the names more readable, but I'm concerned about technical issues that may arise from using them. Column names will not be prefixed with an underscore.
There is no direct technical issue with using an underscore in the name. In fact, I do it quite often and find it helpful. Ruby even auto-generates underscores in column names, and SQL Server's own system objects use underscores too.
In general, it is a good idea to have some naming convention that you stick to in the database, and if that includes underscores, no big deal.
Any character can be used in the name, if you put square brackets or quotes around the name when referring to it. I try to avoid spaces though, since it makes things harder to read.
There are a few things you want to avoid when coming up with a naming convention for SQL Server. They are:
Don't prefix stored procedures with sp_ unless you are planning to make them system wide.
Don't prefix columns with their data type (since you may want to change it).
Avoid putting stuff in the sys schema (you can with hacking, but you shouldn't).
Pretend your code is case sensitive, even when it isn't. You never know when you end up on a server that has tempdb set up to be case sensitive.
When creating a temp table, always specify the collation for string types.
There is no problem with this, as long as it makes the column name clearer.
If you check the PostgreSQL documentation, you will find that almost all objects are named in snake case.
Moreover, a lot of system objects in MySQL, MS SQL Server, Oracle DB, and aforementioned PostgreSQL use Snake Case.
In my personal experience, using underscores in object names is not a big deal.
But there is a caveat.
The underscore is a wildcard that matches any single character in the SQL LIKE operator:
SELECT * FROM FileList WHERE Extention LIKE 'ex_'
It is a potential issue when there is a lot of dynamic SQL code, especially if we are talking about autogenerated object names. And such bugs are quite hard to find.
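To make the caveat concrete, here is a small sketch using Python's built-in sqlite3 module. The table, the object names, and the escape_like helper are invented for the example; the ESCAPE clause itself is standard SQL, though the choice of backslash as the escape character is just an assumption made here.

```python
import sqlite3


def escape_like(text: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so that % and _ match only themselves."""
    # The escape character itself must be handled first.
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT)")
conn.executemany(
    "INSERT INTO objects VALUES (?)", [("user_log",), ("userXlog",)]
)

# Unescaped: the underscore also matches the 'X' in userXlog.
naive = conn.execute(
    "SELECT name FROM objects WHERE name LIKE ?", ("user_log",)
).fetchall()

# Escaped: only the literal name matches.
strict = conn.execute(
    "SELECT name FROM objects WHERE name LIKE ? ESCAPE '\\'",
    (escape_like("user_log"),),
).fetchall()

print(len(naive), strict)
```

When an exact name is wanted, a plain = comparison avoids the problem entirely; the escaping only matters when LIKE is genuinely needed.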
Personally I would rather avoid underscores in naming. But at the same time there is no need to rewrite all the existing code if this type of naming is already being used.
Forewarned is forearmed.
I'm working on updating some legacy code that does not properly handle user input. The code does do a minimal amount of sanitization, but does not cover all known threats.
Our newer code uses parameterized queries. As I understand it, the queries are precompiled, and the input is treated simply as data which cannot be executed. In that case, sanitization is not necessary. Is that right?
To put it another way, if I parameterize the queries in this legacy code, is it OK to eliminate the sanitization that it currently does? Or am I missing some additional benefit of sanitization on top of parameterization?
It's true that SQL query parameters are a good defense against SQL injection. Embedded quotes or other special characters can't make mischief.
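A quick way to see the difference is to run the same hostile input both ways. This is only a sketch, using Python's built-in sqlite3 module in place of the real database; the users table and the input string are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "x' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes the query logic.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Parameterized: the input is bound as a value and compared literally.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The spliced version returns every row because the embedded quote ends the string literal early; the parameterized version finds no user with that odd name.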
But some components of SQL queries can't be parameterized. E.g. table names, column names, SQL keywords.
$sql = "SELECT * FROM MyTable ORDER BY {$columnname} {$ASC_or_DESC}";
So there are some examples of dynamic content you may need to validate before interpolating into an SQL query. Whitelisting values is also a good technique.
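For the ORDER BY case above, whitelisting could look roughly like the following Python sketch. The set of allowed columns is hypothetical; in practice it would list whichever columns of MyTable you are willing to sort by.

```python
# Hypothetical whitelist: only these columns may appear in ORDER BY.
ALLOWED_COLUMNS = {"id", "name", "created_at"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_sorted_query(column: str, direction: str) -> str:
    """Build the query text only from values found in the whitelists."""
    direction = direction.upper()
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported sort column: {column!r}")
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Safe to interpolate: both parts are known-good constants by now.
    return f"SELECT * FROM MyTable ORDER BY {column} {direction}"


print(build_sorted_query("name", "desc"))
```

Anything outside the lists is rejected rather than escaped, so no user-supplied text ever reaches the SQL string.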
Also you could have values that are permitted by the data type of a column but would be nonsensical. For these cases, it's often easier to use application code to validate than to try to validate in SQL constraints.
Suppose you store a credit card number. There are valid patterns for credit card numbers, and libraries to recognize a valid one from an invalid one.
Or how about when a user defines her password? You may want to ensure sufficient password strength, or validate that the user entered the same string in two password-entry fields.
Or if they order a quantity of merchandise, you may need to store the quantity as an integer but you'd want to make sure it's greater than zero and perhaps if it's greater than 1000 you'd want to double-check with the user that they entered it correctly.
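The quantity example might be handled in application code along these lines. This is just a sketch in Python; the function name and the return shape are invented, and the threshold of 1000 comes from the example above.

```python
def check_quantity(raw: str, confirm_above: int = 1000) -> tuple[int, bool]:
    """Return (quantity, needs_confirmation); raise ValueError if invalid."""
    quantity = int(raw)  # raises ValueError for input that is not an integer
    if quantity <= 0:
        raise ValueError("quantity must be greater than zero")
    # Large orders are valid but worth double-checking with the user.
    return quantity, quantity > confirm_above


print(check_quantity("3"))
print(check_quantity("5000"))
```

None of this is about SQL injection; the value would still be passed to the database as a bound parameter afterwards.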
Parameterized queries will help prevent SQL injection, but they won't do diddly against cross-site scripting. You need other measures, like HTML encoding or HTML detection/validation, to prevent that. If all you care about is SQL injection, parameterized queries are probably sufficient.
There are many different reasons to sanitize and validate, including preventing cross-site scripting, and simply wanting the correct content for a field (no names in phone numbers). Parameterized queries eliminate the need to manually sanitize or escape against SQL injection.
See one of my previous answers on this.
You are correct, SQL parameters are not executable code so you don't need to worry about that.
However, you should still do a bit of validation. For example, if you expect a varchar(10) and the user inputs something longer than that, you will end up with an exception.
In short, no. Input sanitization and the use of parameterized queries are not mutually exclusive; they are independent: you can use neither, either one alone, or both. They prevent different types of attacks. Using both is the best course.
It is important to note, as a minor point, that sometimes it is useful to write stored procedures which contain dynamic SQL. In this case, the fact that the inputs are parameterized is no automatic defense against SQL injection. This may seem a fairly obvious point, but I often run into people who think that because their inputs are parameterized they can just stop worrying about SQL Injection.
I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from phpMyAdmin exports, and from what I understood, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and the code also looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out whether this practice has other implications, such as for SQL (MySQL in my case) interpretation speed. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column—not that uncommon.
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way—bad practice anyway)
they are not standardized sql and will only work in mysql; other dbms will use " instead
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any[1] sequence of BMP Unicode characters except U+0000 as an identifier, they can simply:
Replace all backticks with double backticks
Surround that with single backticks
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
1. Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that backticks were primarily used to prevent erroneous queries in which common SQL keywords, e.g. LIMIT and COUNT, are used as identifiers.