Once I had spent hours in debugging a simple SQL query using mysql_query() in PHP/MySQL only to realise that I had missed bactick around the table name. From then I had been always using it around table names.
But when I used the same in SQLite/C++, the symbol is not even recognized. It's confusing, whether to use this or not? What does standard say about usage of it?
Also, it would be helpful if anyone could tell me when to use quotes and when not. I mean around values and field names.
The SQL standard (current version is ISO/IEC 9075:2011, in multiple parts) says nothing about the 'back-tick' or 'back-quote' symbol (Unicode U+0060 or GRAVE ACCENT); it doesn't recognize it as a character with special meaning that can appear in SQL.
The Standard SQL mechanism for quoting identifiers is with delimited identifiers enclosed in double quotes:
SELECT "select" FROM "from" WHERE "where" = "group by";
In MySQL, that might be written:
SELECT `select` FROM `from` WHERE `where` = `group by`;
In MS SQL Server, that might be written:
SELECT [select] FROM [from] WHERE [where] = [group by];
The trouble with the SQL Standard notation is that C programmers are used to enclosing strings in double quotes, so most DBMS use double quotes as an alternative to the single quotes recognized by the standard. But that then leaves you with a problem when you want to enclose identifiers.
Microsoft took one approach; MySQL took another; Informix allows interchangeable use of single and double quotes, but if you want delimited identifiers, you set an environment variable and then you have to follow the standard (single quotes for strings, double quotes for identifiers); DB2 only follows the standard, AFAIK; SQLite appears to follow the standard; Oracle also appears to follow the standard; Sybase appears to allow either double quotes (standard) or square brackets (as with MS SQL Server — which means SQL Server might allow double quotes too). This page (link AWOL since 2013 — now available in The Wayback Machine) documents documented all these servers (and was helpful filling out the gaps in my knowledge) and notes whether the strings inside delimited identifiers are case-sensitive or not.
As to when to use a quoting mechanism around identifiers, my attitude is 'never'. Well, not quite never, but only when absolutely forced into doing so.
Note that delimited identifiers are case-sensitive; that is, "from" and "FROM" refer to different columns (in most DBMS — see URL above). Most of SQL is not case-sensitive; it is a nuisance to know which case to use. (The SQL Standard has a mainframe orientation — it expects names to be converted to upper-case; most DBMS convert names to lower-case, though.)
In general, you must delimit identifiers which are keywords to the version of SQL you are using. That means most of the keywords in Standard SQL, plus any extras that are part of the particular implementation(s) that you are using.
One continuing source of trouble is when you upgrade the server, where a column name that was not a keyword in release N becomes a keyword in release N+1. Existing SQL that worked before the upgrade stops working afterwards. Then, at least as a short-term measure, you may be forced into quoting the name. But in the ordinary course of events, you should aim to avoid needing to quote identifiers.
Of course, my attitude is coloured by the fact that Informix (which is what I work with mostly) accepts this SQL verbatim, whereas most DBMS would choke on it:
CREATE TABLE TABLE
(
DATE INTEGER NOT NULL,
NULL FLOAT NOT NULL,
FLOAT INTEGER NOT NULL,
NOT DATE NOT NULL,
INTEGER FLOAT NOT NULL
);
Of course, the person who produces such a ridiculous table for anything other than demonstration purposes should be hung, drawn, quartered and then the residue should be made to fix the mess they've created. But, within some limits which customers routinely manage to hit, keywords can be used as identifiers in many contexts. That is, of itself, a useful form of future-proofing. If a word becomes a keyword, there's a moderate chance that the existing code will continue to work unaffected by the change. However, the mechanism is not perfect; you can't create a table with a column called PRIMARY, but you can alter a table to add such a column. There is a reason for the idiosyncrasy, but it is hard to explain.
Trailing underscore
You said:
it would be helpful if anyone could tell me when to use quotes and when not
Years ago I surveyed several relational database products looking for commands, keywords, and reserved words. Shockingly, I found over a thousand distinct words.
Many of them were surprisingly counter-intuitive as a "database word". So I feared there was no simple way to avoid unintentional collisions with reserved words while naming my tables, columns, and such.
Then I found this tip some where on the internets:
Use a trailing underscore in all your SQL naming.
Turns out the SQL specification makes an explicit promise to never use a trailing underscore in any SQL-related names.
Being copyright-protected, I cannot quote the SQL spec directly. But section 5.2.11 <token> and <separator> from a supposed-draft of ISO/IEC 9075:1992, Database Language SQL (SQL-92) says (in my own re-wording):
In the current and future versions of the SQL spec, no keyword will end with an underscore
➥ Though oddly dropped into the SQL spec without discussion, that simple statement to me screams out “Name your stuff with a trailing underscore to avoid all naming collisions”.
Instead of:
person
name
address
…use:
person_
name_
address_
Since adopting this practice, I have found a nice side-effect. In our apps we generally have classes and variables with the same names as the database objects (tables, columns, etc.). So an inherent ambiguity arises as to when referring to the database object versus when referring to the app state (classes, vars). Now the context is clear: When seeing a trailing underscore on a name, the database is specifically indicated. No underscore means the app programming (Java, etc.).
Further tip on SQL naming: For maximum portability, use all-lowercase with underscore between words, as well as the trailing underscore. While the SQL spec requires (not suggests) an implementation to store identifiers in all uppercase while accepting other casing, most/all products ignore this requirement. So after much reading and experimenting, I learned the all-lowercase with underscores will be most portable.
If using all-lowercase, underscores between words, plus a trailing underscore, you may never need to care about enquoting with single-quotes, double-quotes, back-ticks, or brackets.
Related
When using the Query tool in pgAdmin4 for Postgres, I have to use double quotes "" if I want to reference columns in a query.
Can this be altered so that double quotes are not needed? I have my database setup in Manjaro yet I have the same setup on another system in Ubuntu and I am 99% sure that on that install, I do not need to use double quotes in the query tool.
Does anyone know if this is a setting that could be amended as it is really annoying having to put all column references into double quotes all the time
This simple select query fails:
SELECT saleDate,qty,saleAmount FROM sales
and I get the following error:
HINT: Perhaps you meant to reference the column "sales.saleDate". SQL state: 42703 Character: 8
Yet this works fine:
SELECT "saleDate", "qty", "saleAmount" FROM sales
Would just be nice not to have to reference every single column with ""'s
You only have to use double quotes for case sensitive identifiers or identifiers including special characters or that are reserved words.
Simply avoid using such identifiers when creating objects, then you don't need to double quote them later on.
The identifiers in the database don't need to be "pretty" after all. The presentation layer should handle that.
4.1.1. Identifiers and Key Words:
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)
I came across some SQL in an application which had no space before the "ORDER BY" clause. I was surprised that this even works.
Given a table of numbers, called [counter] where there is simply one column, counter_id that is an incrementing list of integers this SQL works fine in Microsoft SQL Server 2012
select
*
FROM [counter] c
where c.counter_id = 1000ORDER by counter_id
This also works with strings, e.g.:
WHERE some_string = 'test'ORDER BY something
My question is, are there any potential pitfalls or dangers with this query? And conversely, are there any benefits? Other than saving, what, 8 bits of network traffic for that whitespace (whcih may well be a consideration in some applications)
Let me explain the reason why this works with numbers and strings.
The reason is because numbers cannot start identifiers, unless the name is escaped. Basically, the first things that happens to a SQL query is tokenization. That is, the components of the query are broken into identifiers and keywords, which are then analyzed.
In SQL Server, keywords and identifiers and function names (and so on) cannot start with a digit (unless the name is escaped, of course). So, when the tokenizer encounters a digit, it knows that it has a number. The number ends when a non-digit character is encountered. So, a sequence of characters such as 1000ORDER BY is easily turned into three tokens, 1000, ORDER, and BY.
Similarly, the first time that a single quote is encountered, it always represents a string literal. The string literal ends when the final single quote is encountered. The next set of characters represents another token.
Let me add that there is exactly zero reason to ever use these nuances. First, these rules are properties of SQL Server's tokenization and do not necessarily apply to other databases. Second, the purpose of SQL is for humans to be able to express queries. It is way, way more important that we read them.
As jarlh mentioned there might be difference during scanning and parsing the tokens but it creates correctly during execution plan, hence it might not be huge difference in advantages or disadvantages
When parser examines characters ,it checks for keywords,identifiers,string constants and match overall semantic and syntactic structure of the language. Since 'Order by' is a keyword and sql parser knows its possible syntactic location in a query, it will interpret it accordingly without throwing any error. This is the reason why your order by will not throw any error.
Parsing sql query
Parsing SQL
Should I avoid special characters like "é á ç" in SQL table names and column names?
What are the pros and cons of using special characters?
As you can guess, there are pros and cons. This is more or less a subjective question.
SQL (unlike most programming languages) allows you to use special characters, whitespace, punctuation, or reserved words in your table or column identifiers.
It's pretty nice that people have the choice to use appropriate characters for their native language.
Especially in cases where a word changes its meaning significantly when spelled with the closest ASCII characters: e.g. año vs. ano.
But the downside is that if you do this, you have to use "delimited identifiers" every time you reference the table with special characters. In standard SQL, delimited identifiers use double-quotes.
SELECT * FROM "SELECT"
This is actually okay! If you want to use an SQL reserved word as a table name, you can do it. But it might cause some confusion for some readers of the code.
Likewise if you use special non-ASCII characters, it might make it hard for English-speaking programmers to maintain the code, because they are not familiar with the key sequence to type those special characters. Or they might forget that they have to delimit the table names.
SELECT * FROM "año"
Then there's non-standard delimited identifiers. Microsoft uses square-brackets by default:
SELECT * FROM [año]
And MySQL uses back-ticks by default:
SELECT * FROM `año`
Though both can use the standard double-quotes as identifier delimiters if you enable certain options, you can't always rely on that, and if the option gets disabled, your code will stop working. So users of Microsoft and MySQL are kind of stuck using the non-standard delimiters, unfortunately.
Maintaining the code is simpler in some ways if you can stick with ASCII characters. But there are legitimate reasons to want to use special characters too.
How do I create a table in H2 with a column named GROUP? I saw an example that used something like [*] a while ago, but I can't seem to find it.
Trailing Underscore
Add a trailing underscore: GROUP_
The SQL spec explicitly promises† that no keyword will ever have a trailing underscore. So you are guaranteed that any naming you create with a trailing underscore will never collide with a keyword or reserved word.
I name all my columns, constraints, etc. in the database with a trailing underscore. Seems a bit weird at first, but you get used to seeing it. Turns out to have a nice side-effect: In all the programming as well as notes and emails, when I see the trailing underscore I know the context is the database as opposed to a programming variable or a business term.
Another benefit is peace-of-mind. Such a relief to eliminate an entire class of possible bugs and weird problems due to keyword collision. If you are thinking, "No big deal - what's a few SQL keywords to memorize and avoid", think again. There are a zillion keywords and reserved words, a zillion being over a thousand.
The answer by Shiva is correct as well: Adding quotes around the name, "GROUP", does solve the problem. The downside is that remembering to add those quotes will be tiresome and troublesome.
Further tip: For maximum compatibility across various SQL databases, do your naming in all lowercase. The SQL spec says that all names should be stored in uppercase while tolerating lowercase. But unfortunately some (most?) databases fail to follow the spec in that regard. After hours of study of various databases, I concluded that all-lowercase gives you maximum portability.
So I actually suggest you name your column: group_
Multiple word names look like this: given_name_ and date_of_first_contact_
† I cannot quote the SQL spec because it is copyright protected, unfortunately. In the SQL:2011 spec, read section 5.4 Names and identifiers under the heading Syntax Rules item 3, NOTE 111. In SQL-92 see section 5.2, item 11. Just searching for the word underscore will work.
I've been facing the same problem recently, my table has columns "key" and "level", both of which are keywords. So instead of renaming the actual tables or bastardising the DB/configuration in any other way, just for the coughing test, the fix was to put the following in the driver configuration in application.properties:
jdbc.url=jdbc:h2:mem:db;NON_KEYWORDS=KEY,LEVEL
And beyond that, I did not have to change a thing in Hibernate/entity settings and JPA was happy and never complained again.
see details here:
https://www.h2database.com/html/commands.html#set_non_keywords
You have to surround the reserved word column name in quotes, like so
"GROUP"
Source (direct link): h2database.com
Keywords / Reserved Words
There is a list of keywords that can't be used as identifiers (table
names, column names and so on), unless they are quoted (surrounded
with double quotes). The list is currently:
CROSS, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, DISTINCT,
EXCEPT, EXISTS, FALSE, FOR, FROM, FULL, GROUP, HAVING, INNER,
INTERSECT, IS, JOIN, LIKE, LIMIT, MINUS, NATURAL, NOT, NULL, ON,
ORDER, PRIMARY, ROWNUM, SELECT, SYSDATE, SYSTIME, SYSTIMESTAMP, TODAY,
TRUE, UNION, UNIQUE, WHERE
Certain words of this list are keywords because they are functions
that can be used without '()' for compatibility, for example
CURRENT_TIMESTAMP.
I've been having this problem with SQL generated by JPA... Turned out I was using a variable name called limit.
Caused by: org.h2.jdbc.JdbcSQLSyntaxErrorException: Syntax error in SQL statement "CREATE TABLE EXPENSE_LIMIT (ID BIGINT NOT NULL, LIMIT[*] DECIMAL(19,2), ACCOUNT_ID BIGINT, EXPENSE_CATEGORY_ID BIGINT, PERIOD_ID BIGINT, PRIMARY KEY (ID)) "; expected "identifier"; SQL statement:
Where my model class had a field called limit.
The fix is to specify column name as
#Column(name = "`limit`")
In one of my project, it was required to have a table with space in between. Some suggest me not to include spaces because it is not a good technique.
we can still implement it using single-double quotes for table name in queries. But i need a solid backing for not opting spaces. Please help.
It makes it harder to read, creates complexity if you ever want to do dynamic SQL. Spaces in the tables names on the other hand add no value whatsoever.
Mr. Anderson points out that its tedious. This is true enough, but more importantly it adds unnecessary tediousness.
I would never use spaces (nor other special characters) in table or column names.
Out of lazyness is one point (so typing SQL queries is a lot easier because you don't need those dreaded quotes)
Secondly a lot of tools out there might still have problems with non-standard table names.
Btw: the quote character for non-standard object names is a double quote (")
If you really go down that road, I would highly recommend to put MySQL into "ANSI Mode" in order to be compatible with the rest of the (DBMS) world.
(Single quotes are for character literals, double quotes for "escaping" non-standard names)