Is SQL syntax case sensitive? - sql

Is SQL case sensitive? I've used MySQL and SQL Server which both seem to be case insensitive. Is this always the case? Does the standard define case-sensitivity?

The SQL keywords are case insensitive (SELECT, FROM, WHERE, etc.), but they are often written in all caps. However, in some setups, table and column names are case sensitive.
MySQL has a configuration option, lower_case_table_names, to enable or disable it for table (and database) names; column names are case insensitive on every platform. Case-sensitive table names are usually the default on Linux, and case-insensitive ones used to be the default on Windows, but now the installer asks about this during setup. For SQL Server, it is a function of the database's collation setting.
Here is the MySQL page about name case-sensitivity
Here is the article in MSDN about collations for SQL Server
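To make the MySQL table-name setting concrete, here is a toy Python model of the three documented values of the lower_case_table_names server variable (0 is the usual default on Linux/Unix, 1 on Windows, 2 on macOS). The class TableCatalog and its methods are hypothetical; this only mimics the documented lookup behaviour, not MySQL's code, and it ignores column names, which MySQL treats as case insensitive regardless of the setting.

```python
from typing import Dict, Optional


class TableCatalog:
    """Toy model of MySQL's lower_case_table_names (0, 1 or 2).

    Hypothetical illustration of the documented lookup rules, not MySQL code.
    """

    def __init__(self, lower_case_table_names: int = 0) -> None:
        if lower_case_table_names not in (0, 1, 2):
            raise ValueError("lower_case_table_names must be 0, 1 or 2")
        self.mode = lower_case_table_names
        self._tables: Dict[str, str] = {}  # lookup key -> name as stored

    def _key(self, name: str) -> str:
        # 0: names are compared exactly; 1 and 2: compared in lowercase.
        return name if self.mode == 0 else name.lower()

    def create_table(self, name: str) -> None:
        # 1: stored in lowercase; 0 and 2: stored with the case given.
        stored = name.lower() if self.mode == 1 else name
        self._tables[self._key(name)] = stored

    def find_table(self, name: str) -> Optional[str]:
        # Returns the stored name, or None if the lookup fails.
        return self._tables.get(self._key(name))


linux = TableCatalog(0)    # usual default on Linux/Unix
windows = TableCatalog(1)  # usual default on Windows
macos = TableCatalog(2)    # usual default on macOS
for catalog in (linux, windows, macos):
    catalog.create_table("Customers")

print(linux.find_table("customers"))    # None
print(windows.find_table("CUSTOMERS"))  # customers
print(macos.find_table("customers"))    # Customers
```

This is also why a dump taken on one platform can fail to load on another: the same query text resolves differently depending only on this server setting.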

This isn't strictly SQL language, but in SQL Server if your database collation is case-sensitive, then all table names are case-sensitive.

The SQL-92 specification states that identifiers may be quoted or unquoted. If both sides are unquoted, they are always case insensitive, e.g., table_name == TAble_nAmE.
However, quoted identifiers are case sensitive, e.g., "table_name" != "TAble_naME". The specification also covers comparing unquoted identifiers with quoted ones: they are considered the same if the unquoted characters, once uppercased, match the quoted ones exactly, e.g. TABLE_NAME == "TABLE_NAME", but TABLE_NAME != "table_name" and TABLE_NAME != "TAble_NaMe".
Here is the relevant part of the specification (section 5.2.13):
A <regular identifier> and a <delimited identifier> are equivalent if the <identifier body> of the <regular identifier> (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters) and the <delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of <doublequote symbol> replaced by <double quote>), considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".
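The rule can be sketched in a few lines of Python, assuming each token is either a bare identifier or one wrapped in double quotes. The function names are made up, and Python's case-sensitive string equality stands in for the "implementation-defined collation that is sensitive to case"; str.upper() follows Unicode case mapping, which can turn one letter into several (e.g. "ß" into "SS"), much like the "letter or letters" wording.

```python
def comparison_form(identifier: str) -> str:
    """Form that SQL-92 compares for a single identifier token (sketch)."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Delimited identifier: drop the outer quotes, turn each doubled
        # quote ("") back into a single quote, and keep the case as written.
        return identifier[1:-1].replace('""', '"')
    # Regular identifier: replace every lower-case letter with its upper-case form.
    return identifier.upper()


def same_identifier(a: str, b: str) -> bool:
    # The two comparison forms are compared case-sensitively.
    return comparison_form(a) == comparison_form(b)


print(same_identifier("table_name", "TAble_nAmE"))      # True
print(same_identifier('"table_name"', '"TAble_naME"'))  # False
print(same_identifier("TABLE_NAME", '"TABLE_NAME"'))    # True
print(same_identifier("TABLE_NAME", '"table_name"'))    # False
```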
Note that, just like with other parts of the SQL standard, not all databases follow this section fully. PostgreSQL, for example, stores all unquoted identifiers lowercased instead of uppercased, so table_name == "table_name" (which is exactly the opposite of the standard). Also, some databases are case insensitive all the time, or their case sensitivity depends on a setting in the DB or on properties of the system, usually whether the file system is case sensitive.
Note that some database tools send identifiers quoted all the time. So when you mix queries generated by such a tool (like a CREATE TABLE query generated by Liquibase or another DB migration tool) with handmade queries (like a simple JDBC select in your application), you have to make sure that the cases are consistent, especially on databases where quoted and unquoted identifiers differ (DB2, PostgreSQL, etc.).
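The pitfall with tool-generated quoted names can be modelled in Python. This is a simplification that ignores schemas and each product's real catalog code; the table name Customer and the fold parameter are made up for illustration, with fold="upper" standing for standard behaviour (e.g. DB2, Oracle) and fold="lower" for PostgreSQL.

```python
def stored_name(identifier: str, fold: str) -> str:
    """Name a catalog records for one identifier token (simplified)."""
    if len(identifier) >= 2 and identifier[0] == identifier[-1] == '"':
        # Quoted identifiers are kept exactly as written, whatever the fold.
        return identifier[1:-1].replace('""', '"')
    if fold == "upper":
        return identifier.upper()
    if fold == "lower":
        return identifier.lower()
    raise ValueError("fold must be 'upper' or 'lower'")


def resolves(created_as: str, queried_as: str, fold: str) -> bool:
    # The hand-written query finds the object only if both tokens
    # end up as the same stored name.
    return stored_name(created_as, fold) == stored_name(queried_as, fold)


# A migration tool quoted the name in CREATE TABLE; the application query did not.
for created in ('"Customer"', '"CUSTOMER"', '"customer"'):
    print(created,
          "standard:", resolves(created, "Customer", "upper"),
          "postgres:", resolves(created, "Customer", "lower"))
```

Only an all-uppercase quoted name works with unquoted queries under the standard rule, and only an all-lowercase one works on PostgreSQL; a mixed-case quoted name works with neither.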

In SQL Server it is an option (you choose a case-sensitive collation). Turning it on sucks.
I'm not sure about MySQL.

Unquoted identifiers and reserved words should not be case sensitive, although many people follow a convention of using capitals for reserved words and upper camel case for identifiers.
See SQL-92 Sec. 5.2

My understanding is that the SQL standard calls for case-insensitivity. I don't believe any databases follow the standard completely, though.
MySQL has a configuration setting, lower_case_table_names, that makes table names case sensitive or insensitive. (It is separate from "strict mode", the grab bag of sql_mode settings that make MySQL more standards-compliant.) Regardless of this setting, column names are always case insensitive. The setting is instance-wide, applying to all databases within the RDBMS instance.
I like Oracle's handling of this far better. In straight SQL, identifiers like table and column names are case insensitive. However, if for some reason you really want explicit casing, you can enclose the identifier in double quotes (which in Oracle SQL are quite different from the single quotes used to enclose string data). So:
SELECT fieldName
FROM tableName;
will query FIELDNAME from TABLENAME (Oracle uppercases unquoted identifiers, so the casing you type does not matter), but
SELECT "fieldName"
FROM "tableName";
will query fieldName from tableName.
I'm pretty sure you could even use this mechanism to insert spaces or other non-standard characters into an identifier.
So if for some reason you find explicitly cased table and column names desirable, the option is there, but it is still something I would strongly caution against.
My convention when I used Oracle on a daily basis was that in code I would put all Oracle SQL keywords in uppercase and all identifiers in lowercase. In documentation I would put all table and column names in uppercase. It was very convenient and readable to be able to do this (although sometimes a pain to type so many capitals in code -- I'm sure I could've found an editor feature to help, here).
In my opinion MySQL is particularly bad for behaving differently on different platforms. We need to be able to dump databases on Windows and load them into Unix, and doing so is a disaster if the installer on Windows forgot to put the RDBMS into case-sensitive mode. (To be fair, part of the reason this is a disaster is that our coders made the bad decision, long ago, to rely on the case sensitivity of MySQL on Unix.) The people who wrote the Windows MySQL installer made it really convenient and Windows-like, and it was great that they moved toward giving people a checkbox to say "Would you like to turn on strict mode and make MySQL more standards-compliant?" But it is very inconvenient for MySQL to differ so significantly from the standard, and then make matters worse by differing from its own de facto standard on different platforms. I suspect this is further compounded across Linux distributions, as packagers for different distros have probably at times incorporated their own preferred MySQL configuration settings.
Here's another Stack Overflow question that gets into discussing if case-sensitivity is desirable in an RDBMS.

No. MySQL keywords are not case sensitive, and neither are the keywords of the SQL standard. It's just common practice to write the commands in upper case.
Now, if you are talking about table names, then yes, they can be (for example MySQL on Linux by default; column names in MySQL are not), but not the commands themselves.
So
SELECT * FROM foo;
is the same as
select * from foo;
but, on a server where table names are case sensitive, not the same as
select * from FOO;

I found this blog post to be very helpful (I am not the author). Summarizing (please read, though):
...delimited identifiers are case sensitive ("table_name" != "Table_Name"), while non quoted identifiers are not, and are transformed to upper case (table_name => TABLE_NAME).
He found DB2, Oracle and Interbase/Firebird to be 100% compliant. The others differ:
PostgreSQL ... lowercases every unquoted identifier, instead of uppercasing it. MySQL ... file system dependent. SQLite and SQL Server ... case of the table and field names are preserved on creation, but they are completely ignored afterwards.
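The SQLite behaviour is easy to check with Python's built-in sqlite3 module. The table and column names here are made up; note that SQLite ignores case even for quoted identifiers, unlike the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Create the table and column with mixed-case names.
conn.execute("CREATE TABLE MyTable (MyCol INTEGER)")
conn.execute("INSERT INTO MyTable (MyCol) VALUES (42)")

# Lookups ignore case, whether the identifier is unquoted or quoted.
unquoted = conn.execute("SELECT mycol FROM mytable").fetchall()
quoted = conn.execute('SELECT "MYCOL" FROM "MYTABLE"').fetchall()

# ...but the catalog still remembers the case used at creation time.
table_name = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
column_name = conn.execute("PRAGMA table_info(mytable)").fetchone()[1]

print(unquoted, quoted)         # [(42,)] [(42,)]
print(table_name, column_name)  # MyTable MyCol
conn.close()
```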

I don't think SQL Server is case sensitive, at least not by default.
When I'm querying manually via SQL Server Management Studio, I mess up case all the time and it cheerfully accepts it:
select cOL1, col2 FrOM taBLeName WheRE ...

SQL keywords are case insensitive themselves.
Names of tables, columns, etc., have a case sensitivity which is database dependent - you should probably assume that they are case sensitive unless you know otherwise (in many databases they aren't though; in MySQL table names are sometimes case sensitive, but most other names are not).
Comparing data using =, >, <, etc., has a case awareness which is dependent on the collation settings which are in use on the individual database, table or even column in question. It's normal, however, to keep collation fairly consistent within a database. We have a few columns which need to store case sensitive values; they have a collation specifically set.
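A small illustration of column-level and expression-level collation, using SQLite only because it ships with Python. SQLite has no database- or server-level collation, so only those two levels are shown; the users table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# login uses SQLite's default BINARY collation (case sensitive);
# email is declared with the built-in NOCASE collation.
conn.execute("CREATE TABLE users (login TEXT, email TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

by_login = conn.execute(
    "SELECT COUNT(*) FROM users WHERE login = 'ALICE'"
).fetchone()[0]
by_email = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = 'ALICE@EXAMPLE.COM'"
).fetchone()[0]
# A COLLATE clause on the expression overrides the column's collation.
by_login_nocase = conn.execute(
    "SELECT COUNT(*) FROM users WHERE login = 'ALICE' COLLATE NOCASE"
).fetchone()[0]

print(by_login, by_email, by_login_nocase)  # 0 1 1
conn.close()
```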

Have the best of both worlds
These days you can write all your SQL statements in lowercase and, if you ever need them formatted, install a plugin that does it for you. This only works if your code editor has such plugins available; Visual Studio Code has many extensions that can do this.
Here are a couple you can use: vscode-sql-formatter and SqlFormatter-VSCode

Related

Collation issue in SQL query

I have been reading about collation in SQL, but I am still confused. Why is it that this code works fine:
...case WHEN _AccountID not in ('00000000P','0000000P9','899') THEN 'blah'
but the following does not work and produces an error message
"Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CS_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation"
...case WHEN _AccountID not in (select _AccountID from tMyTable) THEN 'blah'
especially when the rest of the query is exactly the same!
Actually, I can write other queries where even the latter syntax works fine (so I wouldn't think it's because of actual column values, right?), but both examples above come from otherwise identical queries. I can't work out what to look for in my data to tell the queries in which it works apart from the ones in which it doesn't.
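Collations are used to determine things like sort order and the handling of case sensitivity. In SQL Server, collations can be set at the server, database, column and expression level, so two columns in the same table could potentially have different collations. In your error message, one collation is case insensitive (CI) and one is case sensitive (CS). What we don't know yet from the information you've posted is in which server/database/tables the two columns called _AccountID are stored. Nevertheless, they have different collations. That also explains why the first version works: a string literal has the lowest collation precedence, so the comparison simply uses the column's collation, whereas two columns with different collations have equal precedence and SQL Server cannot pick one. CI and CS are addressed in BOL thusly: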
Distinguishes between uppercase and lowercase letters. If selected, lowercase letters sort ahead of their uppercase versions. If this option is not selected, the collation is case-insensitive. That is, SQL Server considers the uppercase and lowercase versions of letters to be identical for sorting purposes. You can explicitly select case insensitivity by specifying _CI.
One workaround, assuming the first _AccountID has a different collation from the database's default collation (and the second one uses the database default), might be:
...case WHEN _AccountID collate database_default not in (select _AccountID from tMyTable) THEN 'blah'
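Why the literal list works, the subquery fails, and the COLLATE workaround helps can be sketched with a heavily simplified Python model of SQL Server's collation precedence: a COLLATE clause is explicit, a column reference is implicit, and a literal is coercible-default (it carries the database default collation). Operand, PRECEDENCE and comparison_collation are hypothetical names; which _AccountID is case sensitive is an assumption, as in the answer; and real SQL Server has more cases than this covers.

```python
from typing import NamedTuple, Optional


class Operand(NamedTuple):
    collation: str
    label: str  # "explicit", "implicit" or "coercible-default"


# Higher number = higher collation precedence.
PRECEDENCE = {"coercible-default": 0, "implicit": 1, "explicit": 2}


class CollationConflict(Exception):
    pass


def comparison_collation(left: Operand, right: Operand) -> str:
    """Collation used to compare two operands (simplified model)."""
    if left.collation == right.collation:
        return left.collation
    left_rank, right_rank = PRECEDENCE[left.label], PRECEDENCE[right.label]
    if left_rank != right_rank:
        # The operand with the higher precedence label decides.
        return left.collation if left_rank > right_rank else right.collation
    # Same label but different collations (e.g. two columns): no winner.
    raise CollationConflict(
        f'Cannot resolve the collation conflict between "{left.collation}" '
        f'and "{right.collation}" in the equal to operation.'
    )


CS = "SQL_Latin1_General_CP1_CS_AS"
CI = "SQL_Latin1_General_CP1_CI_AS"  # assumed to be the database default

account_id = Operand(CS, "implicit")        # column with a non-default collation
literal = Operand(CI, "coercible-default")  # '00000000P' and friends
subquery_column = Operand(CI, "implicit")   # tMyTable._AccountID
workaround = Operand(CI, "explicit")        # _AccountID collate database_default

print(comparison_collation(account_id, literal))  # SQL_Latin1_General_CP1_CS_AS
conflict: Optional[str] = None
try:
    comparison_collation(account_id, subquery_column)
except CollationConflict as exc:
    conflict = str(exc)
print(conflict)
print(comparison_collation(workaround, subquery_column))  # SQL_Latin1_General_CP1_CI_AS
```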
As an aside, assuming you're using SQL Server, you might want to consider using
WHERE NOT EXISTS (SELECT * FROM tMyTable tbl WHERE tbl._AccountID = <the_other>._AccountID)
...which often performs at least as well as WHERE NOT IN (SELECT...) and, unlike NOT IN, still returns rows when the subquery produces a NULL.
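The NULL difference between the two forms is standard three-valued logic (SQL Server follows it with its default ANSI_NULLS ON), and it can be reproduced with Python's built-in SQLite. The tables accounts and excluded are hypothetical stand-ins for the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id TEXT);
    CREATE TABLE excluded (account_id TEXT);
    INSERT INTO accounts VALUES ('A1'), ('A2');
    INSERT INTO excluded VALUES ('A1'), (NULL);
""")

# 'A2' NOT IN ('A1', NULL) evaluates to UNKNOWN, so no row survives.
not_in = conn.execute("""
    SELECT account_id FROM accounts
    WHERE account_id NOT IN (SELECT account_id FROM excluded)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so 'A2' is kept.
not_exists = conn.execute("""
    SELECT a.account_id FROM accounts a
    WHERE NOT EXISTS (SELECT * FROM excluded e WHERE e.account_id = a.account_id)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('A2',)]
conn.close()
```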

Is it defined anywhere (in the SQL spec, or ODBC spec, or something similar) what collation is used to resolve identifiers?

Not sure if this question is a good fit for stackoverflow as it's not specific to any particular database or api, but I can't really google an answer, or find one in, for example, the SQL-92 spec (it talks about collation, but seemingly only for data, not identifiers).
My question is, does SQL/ODBC take into account the possibility of different collations being used by different databases to resolve identifiers (i.e. columns, tables, scalar function names, etc.)? I've seen mentions of doing case-insensitive matching (for example for ODBC catalog function arguments), but that's only a special case of collations (along with its implicit opposite, the binary collation), or is some 'default' collation assumed? It feels like an oversight...
The answer is likely to be implementation-dependent, so there's probably no dependable consistent behavior.
I can answer for MySQL.
Objects that map to filesystem entries (like database/schema names and table names and view names and partition names), are case-sensitive, unless you're on an operating system where the filesystem is case-insensitive.
Other objects like column names, triggers, stored procedures, and so on are case-insensitive. The INFORMATION_SCHEMA uses a collation of utf8_general_ci (case-insensitive).
For more details, read http://dev.mysql.com/doc/refman/5.7/en/charset-collation-information-schema.html

Why are table aliases not compiled out of existence when sharing SQL statements (on Oracle DBMS)

Quest Software\Knowledge Xpert states:
If two identical SQL statements vary because an identical table has two different aliases, then the SQL is different and will not be shared.
What sense does this make?
I understand that if I have table A and table B and I fail to alias an ambiguous column what I'm trying to do is mathematically ambiguous, but the names of the aliases themselves shouldn't matter should they? Why would SQL/Oracle care that table A's alias is FOO in one statement and BAR in another when determining for caching purposes if they are identical?
Along similar lines, why should whitespace or word case matter at all?
"SQL cannot be shared within the SGA unless it is absolutely identical. Statement components that must be the same include:
Word case (uppercase and lowercase characters)
Whitespace
Underlying schema objects"
Underlying schema objects make sense because, mathematically, that's something different. Is the idea that I might be an idiot and have columns named "Foo", "FOO" and "foo", and we don't want to accidentally cache?
I think it's to avoid the extra overhead of "normalizing" each SQL statement before creating a SQL_ID.
The SQL_ID is a hash of the SQL statement. In order to do what you are asking, it would require the SQL parser to do extra work (for limited benefit) in order to make a uniform SQL statement that would compare exactly with another statement that was equivalent, but had mixed case, extra spaces, etc.
I think these restrictions are due to the SQL processing mechanism Oracle uses. It calculates a hash value of the query text, and if this hash matches one already stored in the SGA, the hard parsing steps can be skipped. More details are here.
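A toy Python cache keyed by a hash of the exact statement text shows why case, whitespace and alias names all defeat sharing. This is a hypothetical illustration, not Oracle's SQL_ID algorithm; the emp table and the :1 bind variable are made up.

```python
import hashlib


class StatementCache:
    """Toy shared-SQL cache keyed by a hash of the exact statement text."""

    def __init__(self) -> None:
        self._plans = {}
        self.hard_parses = 0

    def execute(self, sql_text: str) -> str:
        # No normalization: case, whitespace and alias names all change the key.
        key = hashlib.sha256(sql_text.encode("utf-8")).hexdigest()
        if key not in self._plans:
            self.hard_parses += 1  # stand-in for the expensive hard parse
            self._plans[key] = f"plan #{self.hard_parses}"
        return self._plans[key]


cache = StatementCache()
cache.execute("SELECT * FROM emp e WHERE e.id = :1")
cache.execute("SELECT * FROM emp e WHERE e.id = :1")   # identical text: reused
cache.execute("select * from emp e where e.id = :1")   # case differs: hard parse
cache.execute("SELECT * FROM emp  e WHERE e.id = :1")  # extra space: hard parse
cache.execute("SELECT * FROM emp x WHERE x.id = :1")   # alias differs: hard parse
print(cache.hard_parses)  # 4
```

Normalizing the text before hashing would let more statements share a plan, but that work would have to happen on every execution, and a naive version (say, lowercasing everything) would be wrong anyway, since string literals and quoted identifiers such as "Foo" and "foo" are case sensitive.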

Using backquote/backticks for mysql queries

I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from the Phpmyadmin exports, and from what I understood, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and the code looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out if there are other implications of this practice, such as SQL (MySQL in my case) interpretation speed. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column (not that uncommon).
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way, which is bad practice anyway)
they are not standard sql and only work in mysql; other dbms use " instead (as does mysql with the ANSI_QUOTES sql mode)
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any¹ sequence of BMP Unicode characters except U+0000 as an identifier, they can simply:
Replace all backticks with double backticks
Surround that with single backticks
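Those two steps translate into a small Python helper. The function name is made up; the U+0000 and BMP checks reflect the rule stated above, and the sketch does not check MySQL's other rules (such as the 64-character length limit, or the exception described in the bug report linked in the footnote).

```python
def quote_mysql_identifier(name: str) -> str:
    """Quote any string as a MySQL identifier using backticks (sketch)."""
    if "\x00" in name:
        raise ValueError("U+0000 cannot appear in an identifier")
    if any(ord(ch) > 0xFFFF for ch in name):
        raise ValueError("only BMP characters are allowed in an identifier")
    # Step 1: double every backtick. Step 2: wrap in single backticks.
    return "`" + name.replace("`", "``") + "`"


print(quote_mysql_identifier("My Field"))  # `My Field`
print(quote_mysql_identifier("odd`name"))  # `odd``name`
```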
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
1. Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that the backticks were primarily used to prevent erroneous queries that use common SQL keywords as identifiers, e.g. LIMIT and COUNT.

Why can you have a column named ORDER in DB2?

In DB2, you can name a column ORDER and write SQL like
SELECT ORDER FROM tblWHATEVER ORDER BY ORDER
without even needing to put any special characters around the column name. This is causing me pain that I won't get into, but my question is: why do databases allow the use of SQL keywords for object names? Surely it would make more sense to just not allow this?
I largely agree with the sentiment that keywords shouldn't be allowed as identifiers. Most modern computing languages have 20 or maybe 30 keywords, in which case imposing a moratorium on their use as identifiers is entirely reasonable. Unfortunately, SQL comes from the old COBOL school of languages ("computing languages should be as similar to English as possible"). Hence, SQL (like COBOL) has several hundred keywords.
I don't recall if the SQL standard says anything about whether reserved words must be permitted as identifiers, but given the extensive (excessive!) vocabulary it's unsurprising that several SQL implementations permit it.
Having said that, using keywords as identifiers isn't half as silly as the whole concept of quoted identifiers in SQL (and these aren't DB2 specific). Permitting case sensitive identifiers is one thing, but quoted identifiers permit all sorts of nonsense including spaces, diacriticals and in some implementations (yes, including DB2), control characters! Try the following for example:
CREATE TABLE "My
Tablé" ( A INTEGER NOT NULL );
Yes, that's a line break in the middle of an identifier along with an e-acute at the end... (which leads to interesting speculation on what encoding is used for database meta-data and hence whether a non-Unicode database would permit, say, a table definition containing Japanese column names).
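SQLite accepts the same kind of identifier, which makes it easy to try from Python's standard library (this demonstrates SQLite only, not DB2 itself).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The quoted identifier really contains a line break and an e-acute (é).
conn.execute('CREATE TABLE "My\nTablé" ( A INTEGER NOT NULL )')
conn.execute('INSERT INTO "My\nTablé" (A) VALUES (1)')

rows = conn.execute('SELECT A FROM "My\nTablé"').fetchall()
stored = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
print(rows)          # [(1,)]
print(repr(stored))  # 'My\nTablé'
conn.close()
```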
Many SQL parsers (especially DB2/z, which I use) are smarter than some of the regular parsers which sometimes separate lexical and semantic analysis totally (this separation is mostly a good thing).
The SQL parsers can figure out based on context whether a keyword is valid or should be treated as an identifier.
Hence you can get columns called ORDER or GROUP or DATE (that's a particularly common one).
It does annoy me with some of the syntax coloring editors when they brand an identifier with the keyword color. Their parsers aren't as 'smart' as the ones in DB2.
Because object names are ... names. All database systems let you use quoted names to stop you from running into trouble.
If you are running into issues, the fault lies not with the practice of permitting object names to be names, but with faulty implementations, or with faulty code libraries which don't automatically quote everything or cannot be made to quote names as-needed.
Interestingly, you can use keywords as field names in SQL Server as well. The only difference is that you need to put square brackets around the name of the field,
so you can do something like
create table [order](
id int,
[order] varchar(50) )
and then :)
select
[order]
from
[order]
order by [order]
That is of course a bit of an extreme example, but at least with the square brackets you can see that [order] is not a keyword.
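SQLite happens to accept SQL Server-style square brackets as identifier quotes (as well as standard double quotes), so the example can be run from Python. This shows SQLite's behaviour, not SQL Server's; as in SQL Server, leaving ORDER unquoted is a parse error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table [order] (id int, [order] varchar(50))")
conn.executemany("insert into [order] values (?, ?)", [(1, "b"), (2, "a")])

bracketed = conn.execute("select [order] from [order] order by [order]").fetchall()
double_quoted = conn.execute('select "order" from "order" order by "order"').fetchall()

# Without quoting, ORDER is read as the keyword and the statement fails to parse.
unquoted_error = None
try:
    conn.execute("select order from [order]")
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(bracketed)       # [('a',), ('b',)]
print(double_quoted)   # [('a',), ('b',)]
print(unquoted_error)
conn.close()
```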
The main reason I see people using names that are already reserved keywords is when there is a direct mapping between the column or table names and the data presentation. You can call that being lazy or convenient.