In DB2, you can name a column ORDER and write SQL like
SELECT ORDER FROM tblWHATEVER ORDER BY ORDER
without even needing to put any special characters around the column name. This is causing me pain that I won't get into, but my question is: why do databases allow the use of SQL keywords for object names? Surely it would make more sense to just not allow this?
I largely agree with the sentiment that keywords shouldn't be allowed as identifiers. Most modern computing languages have 20 or maybe 30 keywords, in which case imposing a moratorium on their use as identifiers is entirely reasonable. Unfortunately, SQL comes from the old COBOL school of languages ("computing languages should be as similar to English as possible"). Hence, SQL (like COBOL) has several hundred keywords.
I don't recall if the SQL standard says anything about whether reserved words must be permitted as identifiers, but given the extensive (excessive!) vocabulary it's unsurprising that several SQL implementations permit it.
Having said that, using keywords as identifiers isn't half as silly as the whole concept of quoted identifiers in SQL (and these aren't DB2 specific). Permitting case sensitive identifiers is one thing, but quoted identifiers permit all sorts of nonsense including spaces, diacriticals and in some implementations (yes, including DB2), control characters! Try the following for example:
CREATE TABLE "My
Tablé" ( A INTEGER NOT NULL );
Yes, that's a line break in the middle of an identifier along with an e-acute at the end... (which leads to interesting speculation on what encoding is used for database meta-data and hence whether a non-Unicode database would permit, say, a table definition containing Japanese column names).
Many SQL parsers (especially DB2/z, which I use) are smarter than some regular parsers, which sometimes separate lexical and semantic analysis totally (this separation is mostly a good thing).
The SQL parsers can figure out based on context whether a keyword is valid or should be treated as an identifier.
Hence you can get columns called ORDER or GROUP or DATE (that's a particularly common one).
It does annoy me with some of the syntax coloring editors when they brand an identifier with the keyword color. Their parsers aren't as 'smart' as the ones in DB2.
Because object names are ... names. All database systems let you use quoted names to stop you from running into trouble.
If you are running into issues, the fault lies not with the practice of permitting object names to be names, but with faulty implementations, or with faulty code libraries which don't automatically quote everything or cannot be made to quote names as-needed.
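As a rough illustration of "quote names as needed", here is a minimal Python sketch. It runs against SQLite (through the standard `sqlite3` module) only because that is easy to try out; it is not DB2. The helper name `quote_ident` and the table name `tbl_whatever` are made up for the example. It applies the standard SQL rule of wrapping the name in double quotes and doubling any embedded double quote.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier the standard-SQL way (hypothetical helper)."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL")
    # Double any embedded double quote, then wrap the whole name.
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
table, column = "tbl_whatever", "order"  # "order" is an SQL keyword

conn.execute(f"CREATE TABLE {quote_ident(table)} ({quote_ident(column)} INTEGER)")
conn.executemany(
    f"INSERT INTO {quote_ident(table)} ({quote_ident(column)}) VALUES (?)",
    [(3,), (1,), (2,)],
)

# Quoted equivalent of the question's SELECT ORDER ... ORDER BY ORDER.
query = (
    f"SELECT {quote_ident(column)} FROM {quote_ident(table)} "
    f"ORDER BY {quote_ident(column)}"
)
rows = [r[0] for r in conn.execute(query)]
print(rows)
```

A library that quotes every name this way never has to know which words a given database reserves.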
Interestingly, you can use keywords as field names in SQL Server as well. The only difference is that you need to put square brackets around the name of the field,
so you can do something like
create table [order](
id int,
[order] varchar(50) )
and then :)
select
[order]
from
[order]
order by [order]
That is of course a rather extreme example, but at least with the square brackets you can see that [order] is not a keyword.
The reason I would see people using names already reserved by keywords is when there is a direct mapping between column names, or names of the tables and the data presentation. You can call that being lazy or convenient.
I've been looking at a SQL query & output and am trying to figure out the impact the "$" would have in the query if any.
I cannot find anything about the use of the "$" symbol and want to make sure. I've searched W3Schools, here, and the Oracle documentation (as I'm using an Oracle database).
select * from v$example_users
The above is the code that I'm looking at. Does the "$" symbol in the middle of the table name do anything? I.e. is the table called "v$example_users" or does the "$" somehow affect the table?
There's no special functionality to the $ character.
The v$ views are public synonyms of Oracle's dynamic performance views. They are given these "unconventional" names to make them easy to recognize.
Most Oracle identifiers (table names, column names, etc.) you see in the wild include only alphanumeric characters. This is largely because they're easy to type and they avoid confusion with math operators, parentheses, and other delimiters. This is also true for all databases I know.
However, you can actually use other symbols for table names (or other "objects") if you want to. If you do, you will probably need to quote the name -- every time. That is, when you create the table, when you insert a row, when you delete and update it, and on every single select query where you use it. It's perfectly legal, so if you need it, use it.
And, there's no performance penalty to pay. For Oracle there's no difference.
For example:
create table "Car+Price" (
id number(6),
"list&price" number(6),
"%discount" number(3),
"used?" number(1)
);
But then a query would be:
select id, "list&price", "%discount"
from "Car+Price"
where "used?" = 1;
See? Works! Does it look nice? er... not really. Besides, it looks quite error prone to me.
In my opinion, I would strongly recommend against using it (if possible) since it makes code harder to read, it's more difficult to debug, some ORMs don't really deal well with it, and a myriad of other reasons.
Just stay away from it, as long as you can.
Quest Software\Knowledge Xpert states:
If two identical SQL statements vary because an identical table has two different aliases, then the SQL is different and will not be shared.
What sense does this make?
I understand that if I have table A and table B and I fail to alias an ambiguous column what I'm trying to do is mathematically ambiguous, but the names of the aliases themselves shouldn't matter should they? Why would SQL/Oracle care that table A's alias is FOO in one statement and BAR in another when determining for caching purposes if they are identical?
On a similar line why should whitespace or word case matter at all?
"SQL cannot be shared within the SGA unless it is absolutely identical. Statement components that must be the same include:
Word case (uppercase and lowercase characters)
Whitespace
Underlying schema objects"
Underlying schema objects makes sense, because after all mathematically that's something different. Is the idea I might be an idiot and have columns named "Foo" "FOO" and "foo" and we don't want to accidentally cache?
I think it's to avoid the extra overhead of "normalizing" each SQL statement before creating a SQL_ID.
The SQL_ID is a hash of the SQL statement. In order to do what you are asking, it would require the SQL parser to do extra work (for limited benefit) in order to make a uniform SQL statement that would compare exactly with another statement that was equivalent, but had mixed case, extra spaces, etc.
I think these restrictions are due to the SQL processing mechanism Oracle uses. It calculates a hash value of the query text, and if this hash matches one stored in the SGA, it can skip the hard parsing steps. More details are here.
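To make the hashing argument concrete, here is a small Python sketch. It is not Oracle's actual SQL_ID algorithm; it only assumes that the lookup key is a hash of the raw statement text, as the answers above describe. The `emp` table, both function names, and the choice of MD5 are illustrative.

```python
import hashlib
import re


def text_hash(sql: str) -> str:
    """Hash the raw statement text (stand-in for a SQL_ID-style lookup key)."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


def normalize(sql: str) -> str:
    """Naive normalization: collapse whitespace and lowercase everything.

    Deliberately simplistic: lowercasing would corrupt string literals and
    quoted identifiers, which is part of why doing this properly costs work.
    """
    return re.sub(r"\s+", " ", sql.strip()).lower()


a = "SELECT * FROM emp e WHERE e.id = 1"
b = "select *  from emp e where e.id = 1"   # different case and whitespace
c = "SELECT * FROM emp x WHERE x.id = 1"    # different alias

same_raw = text_hash(a) == text_hash(b)
same_normalized = text_hash(normalize(a)) == text_hash(normalize(b))
alias_still_differs = text_hash(normalize(a)) != text_hash(normalize(c))
print(same_raw, same_normalized, alias_still_differs)  # False True True
```

Even with the extra normalization pass, the alias difference survives; treating `e` and `x` as the same would need a real parse, which is exactly the work the hash lookup is meant to avoid.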
Technically, the underscore character (_) can be used in column names. But is it good practice to use underscores in column names ? It seems to make the name more readable but I'm concerned about technical issues which may arise from using them. Column names will not be prefixed with an underscore.
There are no direct technical issues with using an underscore in the name. In fact, I do it quite often and find it helpful. Ruby even auto-generates underscores in column names, and SQL Server's own system objects use underscores too.
In general, it is a good idea to have some naming convention that you stick to in the database, and if that includes underscores, no big deal.
Any character can be used in the name, if you put square brackets or quotes around the name when referring to it. I try to avoid spaces though, since it makes things harder to read.
There are a few things you want to avoid when coming up with a naming convention for SQL Server. They are:
Don't prefix stored procedures with sp_ unless you are planning to make them system wide.
Don't prefix columns with their data type (since you may want to change it).
Avoid putting stuff in the sys schema (you can with hacking, but you shouldn't).
Pretend your code is case sensitive, even when it isn't. You never know when you end up on a server that has tempdb set up to be case sensitive.
When creating a temp table, always specify the collation for string types.
There is no problem with this, as long as it makes the column name clearer.
If you check PostgreSQL documentation you may find that almost all the objects are named with Snake Case.
Moreover, a lot of system objects in MySQL, MS SQL Server, Oracle DB, and aforementioned PostgreSQL use Snake Case.
From my personal experience it is not a big deal to use underscores for objects naming.
But there is a caveat.
The underscore is a wildcard that matches any single character in the SQL LIKE operator:
SELECT * FROM FileList WHERE Extention LIKE 'ex_'
It is a potential issue when there is a lot of dynamic SQL code, especially if we are talking about autogenerated object names. And such bugs are quite hard to find.
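A short Python sketch of the pitfall, run against SQLite through the standard `sqlite3` module (chosen only because it is easy to run; the table and column names follow the query above, the sample rows are made up). It also shows the `ESCAPE` clause, which is one way to match a literal underscore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FileList (Extention TEXT)")
conn.executemany(
    "INSERT INTO FileList (Extention) VALUES (?)",
    [("exe",), ("ex_",), ("txt",)],
)

# Unescaped: "_" matches any single character, so both "exe" and "ex_" match.
loose = sorted(
    r[0] for r in conn.execute(
        "SELECT Extention FROM FileList WHERE Extention LIKE 'ex_'"
    )
)

# Escaped: with an ESCAPE character, "\_" matches a literal underscore only.
strict = sorted(
    r[0] for r in conn.execute(
        r"SELECT Extention FROM FileList WHERE Extention LIKE 'ex\_' ESCAPE '\'"
    )
)
print(loose, strict)  # ['ex_', 'exe'] ['ex_']
```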
Personally I would rather avoid underscores in naming. But at the same time there is no need to rewrite all the existing code if this type of naming has already been used.
Forewarned is forearmed.
I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from phpMyAdmin exports, and from what I understood, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and the code looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out if there is other implication in this practice such as SQL (MySQL in my case) interpretation speed, etc. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column—not that uncommon.
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way—bad practice anyway)
they are not standardized sql and will only work in mysql; other dbms will use " instead
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any[1] sequence of BMP Unicode characters except U+0000 as an identifier, they can simply:
Replace all backticks with double backticks
Surround that with single backticks
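The two steps above can be sketched in a few lines of Python. The function name `quote_mysql_identifier` is made up for the example, and the character check simply mirrors the "BMP characters except U+0000" wording above rather than any particular library's behavior.

```python
def quote_mysql_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier using the two steps listed above."""
    if any(ch == "\x00" or ord(ch) > 0xFFFF for ch in name):
        raise ValueError("only BMP characters other than U+0000 are allowed")
    # Step 1: replace every backtick with a double backtick.
    escaped = name.replace("`", "``")
    # Step 2: surround the result with single backticks.
    return "`" + escaped + "`"


print(quote_mysql_identifier("count"))     # `count`
print(quote_mysql_identifier("My Field"))  # `My Field`
print(quote_mysql_identifier("odd`name"))  # `odd``name`
```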
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
[1] Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that the backticks were primarily used to prevent erroneous queries when an identifier collides with a common SQL keyword, e.g. LIMIT or COUNT.
Is SQL case sensitive? I've used MySQL and SQL Server which both seem to be case insensitive. Is this always the case? Does the standard define case-sensitivity?
The SQL keywords are case insensitive (SELECT, FROM, WHERE, etc), but they are often written in all caps. However, in some setups, table and column names are case sensitive.
MySQL has a configuration option to enable or disable this for table names (the lower_case_table_names system variable). Usually case-sensitive table names are the default on Linux MySQL, and case-insensitive names used to be the default on Windows, but now the installer asks about this during setup. For SQL Server it is a function of the database's collation setting.
Here is the MySQL page about name case-sensitivity
Here is the article in MSDN about collations for SQL Server
This isn't strictly SQL language, but in SQL Server if your database collation is case-sensitive, then all table names are case-sensitive.
The SQL-92 specification states that identifiers may be either quoted or unquoted. If both identifiers being compared are unquoted, the comparison is always case insensitive, e.g., table_name == TAble_nAmE.
However, quoted identifiers are case sensitive, e.g., "table_name" != "TAble_naME". Also based on the specification if you wish to compare unquoted identifiers with quoted ones, then unquoted and quoted identifiers can be considered the same, if the unquoted characters are uppercased, e.g. TABLE_NAME == "TABLE_NAME", but TABLE_NAME != "table_name" or TABLE_NAME != "TAble_NaMe".
Here is the relevant part of the specification (section 5.2.13):
A <regular identifier> and a <delimited identifier> are equivalent if the <identifier body> of the <regular identifier> (with
every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters) and the <delimited identifier
body> of the <delimited identifier> (with all occurrences of
<quote> replaced by <quote symbol> and all occurrences of <doublequote symbol> replaced by <double quote>), considered as
the repetition of a <character string literal> that specifies a
<character set specification> of SQL_TEXT and an implementation-
defined collation that is sensitive to case, compare equally
according to the comparison rules in Subclause 8.2, "<comparison
predicate>".
Note that, just like with other parts of the SQL standard, not all databases follow this section fully. PostgreSQL, for example, folds all unquoted identifiers to lower case instead of upper case, so table_name == "table_name" (which is exactly the opposite of the standard). Also, some databases are case insensitive all the time, or their case sensitivity depends on some setting in the DB or on properties of the system, usually whether the file system is case sensitive or not.
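The comparison rule can be modeled in a few lines of Python. This is only a sketch of the rule as described above, not any database's implementation: the function names are made up, the collation details in the specification text are ignored, and the `fold` parameter is an assumption added here to show the PostgreSQL-style variant next to the standard one.

```python
def parse_identifier(token: str, fold=str.upper) -> str:
    """Return the name an identifier token denotes.

    Delimited identifiers ("...") keep their exact case, with "" read as one
    double quote. Regular identifiers are case-folded with `fold`.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1].replace('""', '"')
    return fold(token)


def equivalent(a: str, b: str, fold=str.upper) -> bool:
    """Compare two identifier tokens after applying the folding rule."""
    return parse_identifier(a, fold) == parse_identifier(b, fold)


# Standard folding (upper case).
print(equivalent("table_name", "TAble_nAmE"))     # True
print(equivalent("TABLE_NAME", '"TABLE_NAME"'))   # True
print(equivalent("TABLE_NAME", '"table_name"'))   # False
# PostgreSQL-style folding (lower case).
print(equivalent("table_name", '"table_name"', fold=str.lower))  # True
```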
Note that some database tools might send identifiers quoted all the time, so in instances where you mix queries generated by some tool (like a CREATE TABLE query generated by Liquibase or other DB migration tool), with hand made queries (like a simple JDBC select in your application) you have to make sure that the cases are consistent, especially on databases where quoted and unquoted identifiers are different (DB2, PostgreSQL, etc.)
In SQL Server it is an option. Turning it on sucks.
I'm not sure about MySQL.
Identifiers and reserved words should not be case sensitive, although many follow a convention to use capitals for reserved words and upper camel case for identifiers.
See SQL-92 Sec. 5.2
My understanding is that the SQL standard calls for case-insensitivity. I don't believe any databases follow the standard completely, though.
MySQL has a configuration setting as part of its "strict mode" (a grab bag of several settings that make MySQL more standards-compliant) for case sensitive or insensitive table names. Regardless of this setting, column names are still case-insensitive, although I think it affects how the column-names are displayed. I believe this setting is instance-wide, across all databases within the RDBMS instance, although I'm researching today to confirm this (and hoping the answer is no).
I like how Oracle handles this far better. In straight SQL, identifiers like table and column names are case insensitive. However, if for some reason you really desire to get explicit casing, you can enclose the identifier in double-quotes (which are quite different in Oracle SQL from the single-quotes used to enclose string data). So:
SELECT fieldName
FROM tableName;
will query FIELDNAME from TABLENAME (Oracle folds unquoted identifiers to upper case), but
SELECT "fieldName"
FROM "tableName";
will query fieldName from tableName.
I'm pretty sure you could even use this mechanism to insert spaces or other non-standard characters into an identifier.
So if for some reason you find explicitly cased table and column names desirable, the option is available, but it is still something I would strongly caution against.
My convention when I used Oracle on a daily basis was that in code I would put all Oracle SQL keywords in uppercase and all identifiers in lowercase. In documentation I would put all table and column names in uppercase. It was very convenient and readable to be able to do this (although sometimes a pain to type so many capitals in code -- I'm sure I could've found an editor feature to help, here).
In my opinion MySQL is particularly bad for differing on this across platforms. We need to be able to dump databases on Windows and load them into Unix, and doing so is a disaster if the installer on Windows forgot to put the RDBMS into case-sensitive mode. (To be fair, part of the reason this is a disaster is that our coders made the bad decision, long ago, to rely on the case-sensitivity of MySQL on UNIX.) The people who wrote the Windows MySQL installer made it really convenient and Windows-like, and it was great to move toward giving people a checkbox to say "Would you like to turn on strict mode and make MySQL more standards-compliant?" But it is very inconvenient for MySQL to differ so significantly from the standard, and then make matters worse by differing from its own de facto standard on different platforms. I'm sure that across Linux distributions this may be further compounded, as packagers for different distros have probably at times incorporated their own preferred MySQL configuration settings.
Here's another Stack Overflow question that gets into discussing if case-sensitivity is desirable in an RDBMS.
No. MySQL is not case sensitive, and neither is the SQL standard. It's just common practice to write the commands upper-case.
Now, if you are talking about table/column names, then yes they are, but not the commands themselves.
So
SELECT * FROM foo;
is the same as
select * from foo;
but not the same as
select * from FOO;
I found this blog post to be very helpful (I am not the author). Summarizing (please read, though):
...delimited identifiers are case sensitive ("table_name" != "Table_Name"), while non quoted identifiers are not, and are transformed to upper case (table_name => TABLE_NAME).
He found DB2, Oracle and Interbase/Firebird are 100% compliant:
PostgreSQL ... lowercases every unquoted identifier, instead of uppercasing it. MySQL ... file system dependent. SQLite and SQL Server ... case of the table and field names are preserved on creation, but they are completely ignored afterwards.
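The SQLite part of that summary is easy to check with Python's standard `sqlite3` module. A minimal sketch; the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FooBar (ColA INTEGER)")
conn.execute("INSERT INTO foobar (cola) VALUES (42)")  # different case, same table

# The name is stored with the case used at creation time...
stored_name = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]

# ...but lookups ignore case entirely.
value = conn.execute("SELECT COLA FROM FOOBAR").fetchone()[0]

# A second table differing only in case is rejected as a duplicate.
try:
    conn.execute("CREATE TABLE FOOBAR (x INTEGER)")
    duplicate_rejected = False
except sqlite3.OperationalError:
    duplicate_rejected = True

print(stored_name, value, duplicate_rejected)  # FooBar 42 True
```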
I don't think SQL Server is case sensitive, at least not by default.
When I'm querying manually via SQL Server Management Studio, I mess up case all the time and it cheerfully accepts it:
select cOL1, col2 FrOM taBLeName WheRE ...
SQL keywords are case insensitive themselves.
Names of tables, columns, etc., have a case sensitivity which is database dependent - you should probably assume that they are case sensitive unless you know otherwise (in many databases they aren't though; in MySQL table names are sometimes case sensitive, but most other names are not).
Comparing data using =, >, <, etc., has a case awareness which is dependent on the collation settings which are in use on the individual database, table or even column in question. It's normal however, to keep collation fairly consistent within a database. We have a few columns which need to store case sensitive values; they have a collation specifically set.
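A per-column collation can be sketched with SQLite and Python's standard `sqlite3` module. Note the assumptions: SQLite's default is the reverse of the setup described above (comparisons are case sensitive unless a column says otherwise), and the `users` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " code TEXT,"                  # default BINARY collation: case sensitive
    " email TEXT COLLATE NOCASE"   # case-insensitive comparisons (ASCII only)
    ")"
)
conn.execute("INSERT INTO users VALUES ('Abc', 'Bob@Example.com')")

# The same "=" operator behaves differently depending on the column's collation.
code_matches, email_matches = conn.execute(
    "SELECT code = 'abc', email = 'bob@example.com' FROM users"
).fetchone()
print(code_matches, email_matches)  # 0 1
```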
Have the best of both worlds
These days you can just write all your SQL statements in lowercase and if you ever need to have it formatted then just install a plugin that will do it for you. This is only applicable if your code editor has those plug-ins available. Visual Studio Code has many extensions that can do this.
Here's a couple you can use: vscode-sql-formatter and SqlFormatter-VSCode