In one of my projects, a table name was required to contain a space. Some people advised me not to include spaces because it is not good practice.
We can still make it work by quoting the table name in queries, but I need a solid argument against using spaces. Please help.
It makes queries harder to read and creates complexity if you ever want to write dynamic SQL. Spaces in table names, on the other hand, add no value whatsoever.
Mr. Anderson points out that it's tedious. This is true enough, but more importantly, the tediousness is unnecessary.
I would never use spaces (nor other special characters) in table or column names.
Laziness is one reason: typing SQL queries is a lot easier when you don't need those dreaded quotes.
Secondly, a lot of tools out there might still have problems with non-standard table names.
By the way, the standard quote character for non-standard object names is the double quote (")
If you really go down that road, I would highly recommend putting MySQL into "ANSI Mode" in order to be compatible with the rest of the (DBMS) world.
(Single quotes are for character literals, double quotes for "escaping" non-standard names)
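To see what the space costs in practice, here is a minimal sketch. It assumes Python's built-in sqlite3 module rather than MySQL (where the double quotes below only work in ANSI mode), and the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Double quotes delimit the identifiers; single quotes delimit the string literal.
conn.execute('CREATE TABLE "customer orders" (id INTEGER, "item name" TEXT)')
conn.execute("""INSERT INTO "customer orders" (id, "item name") VALUES (1, 'widget')""")

quoted_rows = conn.execute(
    'SELECT "item name" FROM "customer orders" WHERE id = 1'
).fetchall()

# Without the quotes the name is no longer read as one identifier, so the query fails.
try:
    conn.execute("SELECT id FROM customer orders")
    unquoted_failed = False
except sqlite3.OperationalError:
    unquoted_failed = True

print(quoted_rows, unquoted_failed)
```

Every query that touches the table has to carry the quotes, which is the tediousness mentioned above.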
I am participating in an online program to become a data analyst. I am using SQL to construct queries along with the instruction videos. I feel there is no consistency using SQL regarding the use of backticks and apostrophes. When should I use backticks while constructing SQL queries, and when should I use apostrophes? Any help you can provide me is greatly appreciated. Thank you very much.
Greg
Backticks are typically used by MySQL as delimiters for reserved words or names containing special characters such as a space. Most other platforms support "double quotes" as identifier delimiters, or have their own proprietary syntax.
Single quotes (') are used to define 'string literal' values in almost all database platforms.
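The difference is easy to check. The sketch below assumes SQLite through Python's sqlite3 module (not the MySQL setup from the question) and a made-up people table; SQLite accepts backticks for MySQL compatibility, so both identifier styles can be tried side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people (name) VALUES ('Greg')")

# Single quotes: a string literal, returned as-is for every row.
as_literal = conn.execute("SELECT 'name' FROM people").fetchall()

# Double quotes (standard) and backticks (MySQL style): the column called name.
as_identifier = conn.execute('SELECT "name" FROM people').fetchall()
as_backtick = conn.execute("SELECT `name` FROM people").fetchall()

print(as_literal, as_identifier, as_backtick)
```

The first query returns the text name itself, while the other two return the value stored in the column.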
Should I avoid special characters like "é á ç" in SQL table names and column names?
What are the pros and cons of using special characters?
As you can guess, there are pros and cons. This is more or less a subjective question.
SQL (unlike most programming languages) allows you to use special characters, whitespace, punctuation, or reserved words in your table or column identifiers.
It's pretty nice that people have the choice to use appropriate characters for their native language.
Especially in cases where a word changes its meaning significantly when spelled with the closest ASCII characters: e.g. año vs. ano.
But the downside is that if you do this, you have to use "delimited identifiers" every time you reference the table with special characters. In standard SQL, delimited identifiers use double-quotes.
SELECT * FROM "SELECT"
This is actually okay! If you want to use an SQL reserved word as a table name, you can do it. But it might cause some confusion for some readers of the code.
Likewise if you use special non-ASCII characters, it might make it hard for English-speaking programmers to maintain the code, because they are not familiar with the key sequence to type those special characters. Or they might forget that they have to delimit the table names.
SELECT * FROM "año"
Then there's non-standard delimited identifiers. Microsoft uses square-brackets by default:
SELECT * FROM [año]
And MySQL uses back-ticks by default:
SELECT * FROM `año`
Though both can use the standard double-quotes as identifier delimiters if you enable certain options, you can't always rely on that, and if the option gets disabled, your code will stop working. So users of Microsoft and MySQL are kind of stuck using the non-standard delimiters, unfortunately.
Maintaining the code is simpler in some ways if you can stick with ASCII characters. But there are legitimate reasons to want to use special characters too.
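If you generate SQL for more than one product, the three delimiter styles can be hidden behind one small function. This is only a sketch: quote_identifier and the dialect labels "standard", "mysql" and "mssql" are hypothetical names, not part of any library. It doubles an embedded closing delimiter, which is how each of the three styles escapes it. SQLite (used here through Python's sqlite3 module) happens to accept all three styles, so each spelling can be tried against the same table.

```python
import sqlite3

# Hypothetical dialect labels mapped to (opening, closing) delimiters.
DELIMITERS = {
    "standard": ('"', '"'),
    "mysql": ("`", "`"),
    "mssql": ("[", "]"),
}

def quote_identifier(name: str, dialect: str = "standard") -> str:
    """Wrap name in the dialect's delimiters, doubling any embedded closer."""
    opening, closing = DELIMITERS[dialect]
    return opening + name.replace(closing, closing * 2) + closing

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {quote_identifier('año')} (n INTEGER)")
conn.execute(f"INSERT INTO {quote_identifier('año')} VALUES (2024)")

# Query the same table once per delimiter style.
results = {}
for dialect in DELIMITERS:
    sql = f"SELECT n FROM {quote_identifier('año', dialect)}"
    results[dialect] = conn.execute(sql).fetchall()

print(results)
```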
I know that I should use it when I deal with data of TEXT type (and I guess the ones that fall back to TEXT), but is it the only case?
Example:
UPDATE names SET name='Mike' WHERE id=3
I'm writing an SQL query auto generation in C++, so I want to make sure I don't miss cases, when I have to add quotes.
Single quotes (') denote textual data, as you noted (e.g., 'Mike' in your example). Numeric data (e.g., 3 in your example), object (table, column, etc) names and syntactic elements (e.g., update, set, where) should not be wrapped in quotes.
The single quote is the delimiter for the string. It lets the parser know where the string starts and ends, and that it is a string. You will find that some databases let you get away with a double quote too.
The only way to be certain you don't miss any cases is to escape the input; otherwise the query will be vulnerable to abuse whenever a single quote ends up in the text.
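For a query generator, that comes down to two options, sketched here in Python with the built-in sqlite3 module (an assumption; the question is about C++, but the idea carries over). quote_literal is a hypothetical helper that applies the standard SQL rule of doubling embedded single quotes. The second option, parameter binding, is a different technique from escaping: the driver carries the values separately, so the generator does not quote them at all.

```python
import sqlite3

def quote_literal(value: str) -> str:
    """Standard SQL string literal: wrap in single quotes, double embedded ones."""
    return "'" + value.replace("'", "''") + "'"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, name TEXT)")
conn.execute("INSERT INTO names (id, name) VALUES (3, 'Mike')")

# Option 1: build the literal by hand with the helper above.
tricky = "O'Brien"
conn.execute(f"UPDATE names SET name={quote_literal(tricky)} WHERE id=3")
after_escape = conn.execute("SELECT name FROM names WHERE id=3").fetchone()[0]

# Option 2: let the driver bind the values, so no quoting is needed.
conn.execute("UPDATE names SET name=? WHERE id=?", ("D'Arcy", 3))
after_binding = conn.execute("SELECT name FROM names WHERE id=3").fetchone()[0]

print(quote_literal(tricky), after_escape, after_binding)
```

Doubling the quote is enough for standard SQL string literals, but some products recognise extra escape characters (MySQL treats a backslash as an escape by default), so binding is usually the safer choice where the driver supports it.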
I once spent hours debugging a simple SQL query using mysql_query() in PHP/MySQL, only to realise that I had missed a backtick around the table name. Since then I have always used backticks around table names.
But when I used the same in SQLite/C++, the symbol was not even recognized. It's confusing whether to use it or not. What does the standard say about it?
Also, it would be helpful if anyone could tell me when to use quotes and when not. I mean around values and field names.
The SQL standard (current version is ISO/IEC 9075:2011, in multiple parts) says nothing about the 'back-tick' or 'back-quote' symbol (Unicode U+0060 or GRAVE ACCENT); it doesn't recognize it as a character with special meaning that can appear in SQL.
The Standard SQL mechanism for quoting identifiers is with delimited identifiers enclosed in double quotes:
SELECT "select" FROM "from" WHERE "where" = "group by";
In MySQL, that might be written:
SELECT `select` FROM `from` WHERE `where` = `group by`;
In MS SQL Server, that might be written:
SELECT [select] FROM [from] WHERE [where] = [group by];
The trouble with the SQL Standard notation is that C programmers are used to enclosing strings in double quotes, so most DBMS use double quotes as an alternative to the single quotes recognized by the standard. But that then leaves you with a problem when you want to enclose identifiers.
Microsoft took one approach; MySQL took another. Informix allows interchangeable use of single and double quotes, but if you want delimited identifiers, you set an environment variable and then you have to follow the standard (single quotes for strings, double quotes for identifiers). DB2 only follows the standard, AFAIK; SQLite appears to follow the standard; Oracle also appears to follow the standard. Sybase appears to allow either double quotes (standard) or square brackets (as with MS SQL Server, which means SQL Server might allow double quotes too). This page (link AWOL since 2013; now available in The Wayback Machine) documented all these servers (and was helpful in filling out the gaps in my knowledge) and notes whether the strings inside delimited identifiers are case-sensitive or not.
As to when to use a quoting mechanism around identifiers, my attitude is 'never'. Well, not quite never, but only when absolutely forced into doing so.
Note that delimited identifiers are case-sensitive; that is, "from" and "FROM" refer to different columns (in most DBMS — see URL above). Most of SQL is not case-sensitive; it is a nuisance to know which case to use. (The SQL Standard has a mainframe orientation — it expects names to be converted to upper-case; most DBMS convert names to lower-case, though.)
In general, you must delimit identifiers which are keywords to the version of SQL you are using. That means most of the keywords in Standard SQL, plus any extras that are part of the particular implementation(s) that you are using.
One continuing source of trouble is when you upgrade the server, where a column name that was not a keyword in release N becomes a keyword in release N+1. Existing SQL that worked before the upgrade stops working afterwards. Then, at least as a short-term measure, you may be forced into quoting the name. But in the ordinary course of events, you should aim to avoid needing to quote identifiers.
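The keyword rule is easy to try out. The sketch below assumes SQLite via Python's sqlite3 module; other products reserve different words, so the same names may behave differently elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted, the keyword cannot be used as a table name.
try:
    conn.execute("CREATE TABLE select (id INTEGER)")
    keyword_rejected = False
except sqlite3.OperationalError:
    keyword_rejected = True

# Delimited, the same kind of name is accepted.
conn.execute('CREATE TABLE "select" ("from" INTEGER, "where" TEXT)')
conn.execute("""INSERT INTO "select" ("from", "where") VALUES (1, 'here')""")
rows = conn.execute(
    """SELECT "from" FROM "select" WHERE "where" = 'here'"""
).fetchall()

print(keyword_rejected, rows)
```

Once a table is created this way, every later reference needs the delimiters too, which is why I only quote when forced.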
Of course, my attitude is coloured by the fact that Informix (which is what I work with mostly) accepts this SQL verbatim, whereas most DBMS would choke on it:
CREATE TABLE TABLE
(
    DATE INTEGER NOT NULL,
    NULL FLOAT NOT NULL,
    FLOAT INTEGER NOT NULL,
    NOT DATE NOT NULL,
    INTEGER FLOAT NOT NULL
);
Of course, the person who produces such a ridiculous table for anything other than demonstration purposes should be hung, drawn, quartered and then the residue should be made to fix the mess they've created. But, within some limits which customers routinely manage to hit, keywords can be used as identifiers in many contexts. That is, of itself, a useful form of future-proofing. If a word becomes a keyword, there's a moderate chance that the existing code will continue to work unaffected by the change. However, the mechanism is not perfect; you can't create a table with a column called PRIMARY, but you can alter a table to add such a column. There is a reason for the idiosyncrasy, but it is hard to explain.
Trailing underscore
You said:
it would be helpful if anyone could tell me when to use quotes and when not
Years ago I surveyed several relational database products looking for commands, keywords, and reserved words. Shockingly, I found over a thousand distinct words.
Many of them were surprisingly counter-intuitive as a "database word". So I feared there was no simple way to avoid unintentional collisions with reserved words while naming my tables, columns, and such.
Then I found this tip somewhere on the internet:
Use a trailing underscore in all your SQL naming.
Turns out the SQL specification makes an explicit promise to never use a trailing underscore in any SQL-related names.
Because the SQL spec is copyright-protected, I cannot quote it directly. But section 5.2.11 <token> and <separator> from a supposed draft of ISO/IEC 9075:1992, Database Language SQL (SQL-92) says (in my own re-wording):
In the current and future versions of the SQL spec, no keyword will end with an underscore
➥ Though oddly dropped into the SQL spec without discussion, that simple statement to me screams out “Name your stuff with a trailing underscore to avoid all naming collisions”.
Instead of:
person
name
address
…use:
person_
name_
address_
Since adopting this practice, I have found a nice side-effect. In our apps we generally have classes and variables with the same names as the database objects (tables, columns, etc.). So an inherent ambiguity arises as to whether a name refers to the database object or to the app state (classes, vars). Now the context is clear: a trailing underscore on a name specifically indicates the database. No underscore means the app programming (Java, etc.).
Further tip on SQL naming: For maximum portability, use all-lowercase with underscores between words, as well as the trailing underscore. While the SQL spec requires (not suggests) an implementation to store identifiers in all uppercase while accepting other casing, most/all products ignore this requirement. So after much reading and experimenting, I learned that all-lowercase with underscores is the most portable.
If you use all-lowercase, underscores between words, plus a trailing underscore, you may never need to care about quoting with single quotes, double quotes, back-ticks, or brackets.
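As a rough sketch of how this convention could be applied mechanically, the helper below turns application-side names into database names. to_db_name is a hypothetical function of my own, and the check uses SQLite through Python's sqlite3 module as an assumed test bed.

```python
import re
import sqlite3

def to_db_name(app_name: str) -> str:
    """Hypothetical helper: CamelCase or camelCase -> lower_snake_case_ (trailing underscore)."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", app_name).lower()
    return snake + "_"

table = to_db_name("Order")        # "order_"
column = to_db_name("groupName")   # "group_name_"

conn = sqlite3.connect(":memory:")
# ORDER is a keyword, but order_ is not, so no delimiters are needed.
conn.execute(f"CREATE TABLE {table} ({column} TEXT)")
conn.execute(f"INSERT INTO {table} ({column}) VALUES ('admins')")
rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()

print(table, column, rows)
```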
I have noticed that using either Oracle or SQLite, queries like this are perfectly valid
SELECT*FROM(SELECT a,MAX(b)i FROM c GROUP BY a)WHERE(a=1)OR(i=2);
Is that a “feature” of SQL that keywords or words of a query need not be surrounded with whitespace? If so, why was it designed this way? SQL has been designed to be readable, this seems to be a form of obfuscation (particularly the MAX(b)i thing where i is a token which serves as an alias).
SQL-92 BNF Grammar here explicitly states that delimiters (bracket, whitespace, * etc) are valid to break up the tokens, which makes the white space optional in various cases where other delimiters already break up the tokens.
This is true not only for SQLite and Oracle, but also for MySQL and SQL Server at least (which I work with and have tested), since it is specified in the language definition.
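To confirm that the compact form and a conventionally spaced form mean the same thing, here is a minimal check. It assumes SQLite via Python's sqlite3 module and some made-up rows for table c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO c VALUES (?, ?)", [(1, 10), (1, 20), (2, 2), (3, 5)])

compact = "SELECT*FROM(SELECT a,MAX(b)i FROM c GROUP BY a)WHERE(a=1)OR(i=2);"
spaced = """
    SELECT *
    FROM (SELECT a, MAX(b) AS i FROM c GROUP BY a)
    WHERE (a = 1) OR (i = 2)
"""

# Sort in Python because neither query has an ORDER BY.
compact_rows = sorted(conn.execute(compact).fetchall())
spaced_rows = sorted(conn.execute(spaced).fetchall())

print(compact_rows == spaced_rows, compact_rows)
```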
Whitespace is optional in pretty much any language where it is not absolutely necessary to preserve boundaries between keywords and/or identifiers. You could write code in C# that looked similar to your SQL, and as long as the compiler can still parse the identifiers and keywords, it doesn't care.
Case in point: The subquery of your statement is the only place where whitespace is needed to separate keywords from other alpha characters. Everywhere else, some non-alphanumeric character (which aren't part of any keyword in SQL) separates keywords, so the SQL parser can still digest this statement. As long as that is true, whitespace is purely for human readability.
Most of this is valid simply because you've enclosed key sections in parentheses where white space would ordinarily be required.
I think this is a side effect of the parser.
Usually compilers skip whitespace via SKIP blocks: tokens that the compiler ignores, but that cause errors if they appear in the middle of a reserved word. For example, in C, 'while' is valid but 'whi le' is not, even though the whitespace is a SKIP token.
The reason is that it simplifies the parser. Otherwise the parser would have to track all the whitespace, which can be quite complex unless you set strict rules as Python does; that would be hard to impose on vendors like Oracle and would make SQL more complex than it needs to be.
And that simplification has the (unintended?) side effect of letting you remove MOST (not all) whitespace. Be aware that in some cases removing whitespace causes compilation errors (you can't remove the space in GROUP BY, because GROUPBY is no longer read as the two keywords).
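A small experiment shows where the space can and cannot go. This sketch assumes SQLite via Python's sqlite3 module; parses is a hypothetical helper that simply reports whether a statement is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c (a INTEGER, b INTEGER)")

def parses(sql: str) -> bool:
    """Return True if SQLite accepts the statement, False on an SQL error."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False

statements = [
    "SELECT a FROM c GROUP BY a",  # normal spacing
    "SELECT(a)FROM c",             # parentheses separate the tokens
    "SELECT a FROM c GROUPBY a",   # the two keywords have merged into one word
    "SELECTa FROM c",              # keyword and column name have merged
]

checks = {}
for sql in statements:
    checks[sql] = parses(sql)

print(checks)
```

The first two statements are accepted and the last two are rejected: whitespace is only optional where some other character already marks the boundary between words.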