Technically, the underscore character (_) can be used in column names. But is it good practice to use underscores in column names? It seems to make the name more readable, but I'm concerned about technical issues which may arise from using them. Column names will not be prefixed with an underscore.
There is no direct technical issue with using an underscore in the name. In fact, I do it quite often and find it helpful. Ruby even auto-generates underscores in column names, and SQL Server's own system objects use underscores too.
In general, it is a good idea to have some naming convention that you stick to in the database, and if that includes underscores, no big deal.
Any character can be used in the name, if you put square brackets or quotes around the name when referring to it. I try to avoid spaces though, since it makes things harder to read.
There are a few things you want to avoid when coming up with a naming convention for SQL Server. They are:
Don't prefix stored procedures with sp_ unless you are planning to make them system wide.
Don't prefix columns with their data type (since you may want to change it).
Avoid putting stuff in the sys schema (you can with hacking, but you shouldn't).
Pretend your code is case sensitive, even when it isn't. You never know when you end up on a server that has tempdb set up to be case sensitive.
When creating a temp table, always specify the collation for string types.
There is no problem with this, as long as it makes the column name clearer.
If you check the PostgreSQL documentation, you will find that almost all the objects are named in snake case.
Moreover, a lot of system objects in MySQL, MS SQL Server, Oracle DB, and the aforementioned PostgreSQL use snake case.
From my personal experience, it is not a big deal to use underscores in object names.
But there is a caveat.
The underscore is a wildcard that matches any single character in the SQL LIKE operator:
SELECT * FROM FileList WHERE Extention LIKE 'ex_'
It is a potential issue when there is a lot of dynamic SQL code, especially if we are talking about autogenerated object names. And such bugs are quite hard to find.
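To make the caveat concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `object_names` and its rows are made up for illustration, and the `ESCAPE` clause is shown as SQLite accepts it; other engines have their own escaping rules, so check your DBMS before relying on this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding autogenerated object names.
conn.execute("CREATE TABLE object_names (name TEXT)")
conn.executemany(
    "INSERT INTO object_names (name) VALUES (?)",
    [("user_log",), ("userXlog",), ("user-log",)],
)

# Unescaped: "_" matches any single character, so all three names match.
wildcard_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM object_names WHERE name LIKE 'user_log'"
    )
]

# Escaped: with ESCAPE '\', the sequence "\_" matches a literal underscore only.
literal_matches = [
    row[0]
    for row in conn.execute(
        r"SELECT name FROM object_names WHERE name LIKE 'user\_log' ESCAPE '\'"
    )
]

print(sorted(wildcard_matches))
print(literal_matches)
```

The first query is the kind of bug that slips through: it looks like an exact match but quietly returns extra rows.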
Personally, I would rather avoid underscores in naming. But at the same time, there is no need to rewrite all the existing code if this type of naming is already being used.
Forewarned is forearmed.
Related
I know that in MySQL we can quote identifiers with back tick symbol, whereas in Oracle (and other RDBMS that follow standard) we can use double quotes around table names and field names. I wonder whether it somehow improves security and should we use this technique in real world applications in parallel with traditional methods like preparation of sql statements?
I wonder whether it somehow improves security
If you're accepting user input for column or table names and putting them into a query then yes, you will need to correctly encode them to avoid SQL injection. Specifically, wrap them in double quotes, and replace any double-quote character in the name with two double quotes. (In MySQL, use a backtick and a doubled backtick instead, or enable the ANSI_QUOTES SQL mode to make it accept the standard double quote.)
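As a rough sketch of that encoding rule, assuming a standard-style DBMS that uses double quotes for identifiers: the helper name `quote_identifier` is my own, not from any library, and SQLite is only used here to check that the quoted name survives a round trip.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote.

    Hypothetical helper for illustration; rejects empty names and NUL bytes.
    """
    if not name or "\x00" in name:
        raise ValueError("invalid identifier")
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")

# A hostile "column name" that tries to break out of the quotes.
hostile = 'x"; DROP TABLE t; --'
conn.execute(f"CREATE TABLE t ({quote_identifier(hostile)} INTEGER)")

# The whole string ended up as a single column name; nothing was dropped.
columns = [row[1] for row in conn.execute("PRAGMA table_info(t)")]
print(columns)
```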
It's really unusual to be accepting arbitrary schema names from user input though, and rarely a good idea. Typically where you allow particular columns to be identified (eg for a sort=something parameter) it's better to permit only a whitelist of known-good columns.
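A whitelist check for a sort parameter might look something like this. The column names, the `articles` table and the function name are all assumptions for the sake of the example.

```python
# Hypothetical set of columns that callers are allowed to sort by.
ALLOWED_SORT_COLUMNS = {"title", "created_at"}


def build_sorted_query(sort: str) -> str:
    """Return a query ordered by `sort`, refusing anything not whitelisted."""
    if sort not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort!r}")
    # Safe to interpolate: `sort` is now one of our own known-good strings.
    return f"SELECT id, title FROM articles ORDER BY {sort}"


print(build_sorted_query("title"))
```

Because the value that reaches the SQL text is always one of your own constants, no quoting or escaping of user input is needed at all.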
When you are only writing fixed queries there is no particular security need to quote schema names, although it's probably a good idea to ensure that your queries still work reliably in the face of different DBMSs with different reserved words.
It may depend on the database type, but is there a preference (by the database and not the coder) or is it better to use quotes? Is it:
Faster?
Less error prone?
Helps prevent injection (if using a PDO or not)?
Assume there is nothing requiring the use (spaces, reserved words, etc.).
MySQL:
SELECT `id` FROM `table` WHERE `name` = '$name';
ANSI:
SELECT "id" FROM "table" WHERE "name" = '$name';
vs:
SELECT id FROM table WHERE name = $name;
This answer talks about the requirement to use quotes in MySQL, but I'm interested in when it's not required by the db, but it might be preferred/better for the aspects (and perhaps more) that I listed above.
Quotes around identifiers are used only during the query parsing stage - a stage that goes through the SQL statement, and figures out its syntactic elements. Compared to other stages (query optimization, query execution, and passing the results back to the caller) the parsing stage is relatively short. Therefore, you should not expect any measurable speedup or slowdown from using quotes around your identifiers, regardless of your particular RDBMS.
As far as being more or less error prone goes, missing quotes around multipart identifiers become apparent very quickly during the development stage, so the practice of placing quotes everywhere is not worth the trouble, because the readability to humans suffers significantly.
Finally, adding quotes around identifiers will not help you prevent injection attacks, and neither will leaving them off.
The only situation where quoting all identifiers is a good idea is when you generate SQL programmatically and the results are not intended for human readers. Many SQL script generators take this route to avoid scattering if statements throughout the script to test whether each identifier is multipart.
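To illustrate the injection point, here is a small sketch with Python's sqlite3 module. The `users` table and the malicious string are invented for the example. It shows that quoting identifiers does nothing for an interpolated value, while a bound parameter handles it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

malicious = "nobody' OR '1'='1"

# Every identifier is quoted, but the value is pasted into the SQL text,
# so the attacker's quote still ends the string literal early.
unsafe_sql = f"""SELECT "id" FROM "users" WHERE "name" = '{malicious}'"""
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# No identifier quoting at all, but the value is passed as a bound parameter.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The quoted-identifier query returns every row, and the parameterized one returns none, which is the behavior you want for a name that does not exist.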
I am not a fan of using quoted identifiers. They take too long to type and they clutter the code, making it harder for humans to read.
Also, I prefer to discourage the use of reserved words as identifiers in SQL. To me, this is just good practice. I mean, who wants to read something like:
select `select`, `from` as `as`
from <some really messed up table>
I use the bottom version because I find it easier to read, but it depends on the way you echo your query. There is no difference in performance, since the parser simply reads past the quotes unless they are needed, so you won't really notice it unless you are writing a WHERE clause that contains more than one variable.
I'd say it's faster NOT to use quotes for numeric identifiers; why add the potential overhead of a data type conversion to a query you want (presumably) to be as performant as possible?
I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from phpMyAdmin exports, and from what I understand, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and the code looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out whether this practice has other implications, such as SQL (MySQL in my case) interpretation speed. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column—not that uncommon.
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way—bad practice anyway)
they are not standardized sql and will only work in mysql; other dbms will use " instead
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any[1] sequence of BMP Unicode characters except U+0000 as an identifier, they can simply:
Replace all backticks with double backticks
Surround that with single backticks
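Those two steps fit in a few lines. This is only a sketch: `quote_mysql_identifier` is a made-up name, and the BMP and U+0000 checks simply mirror the restriction stated above rather than any particular MySQL version's exact rules.

```python
def quote_mysql_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier (hypothetical helper)."""
    if not name:
        raise ValueError("identifier must not be empty")
    for ch in name:
        # Mirror the stated restriction: BMP characters only, no U+0000.
        if ch == "\x00" or ord(ch) > 0xFFFF:
            raise ValueError(f"character not allowed in identifier: {ch!r}")
    # Step 1: double every backtick. Step 2: wrap the result in backticks.
    return "`" + name.replace("`", "``") + "`"


print(quote_mysql_identifier("My Field"))
print(quote_mysql_identifier("odd`name"))
```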
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
[1] Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that the backticks were primarily used to prevent erroneous queries which utilized common SQL identifiers, i.e. LIMIT and COUNT.
I always see examples this way, but why? Is this a good practice?
So they're distinguishable from the rest of the query (which is typically written in upper case).
As for whether or not it's a best practice...if you're writing queries in all upper case, then yes it definitely makes your queries easier to read and understand.
I use lower case for the names invented by me.
These are table names, column names, my function names, aliases, etc.
The upper case is for the names invented by somebody else
That is reserved words, built-in functions, etc.
dual and dummy in Oracle are notable exceptions to this rule, but they are a table name and a column name, so I just treat like with like.
Convention is always a good practice, so it is good to follow what your dev team has agreed upon. Many people subscribe to putting keywords in UPPER case, so differentiating aliases from keywords by making them lower is common.
I think that, as with other casing questions in SQL, it's all personal preference. I like all lowercase in my queries, so I tend to just keep it that way with aliases as well.
As said before, I think it's personal preference.
I mostly use lower case, except aliases which I always capitalize.
I write queries only in stored procedures, so I write only the important parts of my query (and other T-SQL "commands" like BEGIN, END, IF, ELSE, WHILE, etc.) in upper case.
All aliases are capitalized so I can see at a glance to which table an attribute belongs.
If someone joins my team (project) he has to do the same, as I do when I join someone else's project.
As I try to make it more readable, I think that line breaks and indentation are more important than case (as long as it stays the same through the whole project).
I think there are several reasons to write them in lowercase:
Lowercase looks less like "shouting". Lots of UPPERCASE characters look like shouting (most people on a forum won't like posts in UPPERCASE). Some write keywords like IF in upper case, when also writing your aliases in uppercase you might get confused.
If you start a tablename with a Capital and your aliases with lower case characters you can keep them apart. Otherwise you might get confused when there are also tables with shorter names.
But no matter what standard you use in a team, as long as everybody is using the same rules you can read each other's code and it will look less messy. Sometimes code can be so complex that you don't want to get distracted by code which violates the "rules".
In DB2, you can name a column ORDER and write SQL like
SELECT ORDER FROM tblWHATEVER ORDER BY ORDER
without even needing to put any special characters around the column name. This is causing me pain that I won't get into, but my question is: why do databases allow the use of SQL keywords for object names? Surely it would make more sense to just not allow this?
I largely agree with the sentiment that keywords shouldn't be allowed as identifiers. Most modern computing languages have 20 or maybe 30 keywords, in which case imposing a moratorium on their use as identifiers is entirely reasonable. Unfortunately, SQL comes from the old COBOL school of languages ("computing languages should be as similar to English as possible"). Hence, SQL (like COBOL) has several hundred keywords.
I don't recall if the SQL standard says anything about whether reserved words must be permitted as identifiers, but given the extensive (excessive!) vocabulary it's unsurprising that several SQL implementations permit it.
Having said that, using keywords as identifiers isn't half as silly as the whole concept of quoted identifiers in SQL (and these aren't DB2 specific). Permitting case sensitive identifiers is one thing, but quoted identifiers permit all sorts of nonsense including spaces, diacriticals and in some implementations (yes, including DB2), control characters! Try the following for example:
CREATE TABLE "My
Tablé" ( A INTEGER NOT NULL );
Yes, that's a line break in the middle of an identifier along with an e-acute at the end... (which leads to interesting speculation on what encoding is used for database meta-data and hence whether a non-Unicode database would permit, say, a table definition containing Japanese column names).
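If you don't have DB2 at hand, the same experiment can be tried against SQLite through Python's sqlite3 module. This is a different engine, so it only shows that SQLite's quoted identifiers are similarly permissive, not how DB2 behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table name contains a line break and an e-acute, as in the DB2 example.
conn.execute('CREATE TABLE "My\nTablé" (A INTEGER NOT NULL)')

# Read the name back from SQLite's catalog table.
names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(names)
```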
Many SQL parsers (especially DB2/z, which I use) are smarter than some of the regular parsers, which sometimes separate lexical and semantic analysis totally (this separation is mostly a good thing).
The SQL parsers can figure out based on context whether a keyword is valid or should be treated as an identifier.
Hence you can get columns called ORDER or GROUP or DATE (that's a particularly common one).
It does annoy me with some of the syntax coloring editors when they brand an identifier with the keyword color. Their parsers aren't as 'smart' as the ones in DB2.
Because object names are ... names. All database systems let you use quoted names to stop you from running into trouble.
If you are running into issues, the fault lies not with the practice of permitting object names to be names, but with faulty implementations, or with faulty code libraries which don't automatically quote everything or cannot be made to quote names as-needed.
Interestingly, you can use keywords as field names in SQL Server as well. The only difference is that you need to put square brackets around the name of the field,
so you can do something like
create table [order](
id int,
[order] varchar(50) )
and then :)
select
[order]
from
[order]
order by [order]
That is of course a somewhat extreme example, but at least with the use of square brackets you can see that [order] is not a keyword.
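For anyone who wants to try this without a SQL Server instance: SQLite also accepts square-bracket identifiers (for compatibility with SQL Server and MS Access), so the same statements can be run through Python's sqlite3 module. This is a stand-in for illustration, with column types adapted to SQLite, not a claim about SQL Server internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [order] (id INTEGER, [order] TEXT)")
conn.executemany("INSERT INTO [order] VALUES (?, ?)", [(1, "b"), (2, "a")])

# Bracketed names are treated as identifiers, so this parses and runs.
rows = conn.execute("SELECT [order] FROM [order] ORDER BY [order]").fetchall()

# Without the brackets, ORDER is read as a keyword and the parse fails.
try:
    conn.execute("SELECT order FROM [order]")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

print(rows, unquoted_ok)
```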
The reason I would see people using names already reserved by keywords is when there is a direct mapping between column names, or names of the tables and the data presentation. You can call that being lazy or convenient.