Does SQL standard allows whitespace between function names and parenthesis - sql

Checking few RDBMS I find that things like
SELECT COUNT (a), SUM (b)
FROM TABLE
are allowed (notice space between aggregate functions and parenthesis).
Could anyone provide a pointer to SQL standard itself where this is defined (any version will do)?
EDIT:
The above works in postgres, mysql needs set sql_mode = "IGNORE_SPACE"; as defined here (for full list of functions that are influenced with this server mode see in this ref).
MS SQL is reported to accept the above.
Also, it seems that the answer is most likely in the standard. I can follow the BNF regarding the regular symbols and terms, but I get lost when it comes to the definition of whitespace and separators in that part of the select.

Yes; the white space between tokens is substantially ignored. The only exception is, officially, with adjacent string literal concatenation - but the standard is weirder than any implementation would be.
See: http://savage.net.au/SQL/

This works in SQL Server 2005:
SELECT COUNT (*)
FROM TABLE
...while one space between COUNT and (*) on MySQL causes a MySQL 1064 error (syntax error). I don't have Oracle or Postgres handy to test.
Whatever the standard may be, it's dependent on implementation in the vendor and version you are using.

I can't provide a pointer, but I believe that white space like that is ignored.
I know that it is in T-SQL, and about 80% certain about MySQL's implementation.

Related

Why is the use of 'from' behind the table name allowed? What does it do?

SQL for DB2 is pretty strict, that's why I was surprised this query succeeded:
select 1 from sysibm.sysdummy1 from
Is it exactly the same as?
select 1 from sysibm.sysdummy1
If the double from is allowed, why isn't a double where/select/order by/having allowed? Is there any difference in the output when running this query on a 'real' table?
Db2 (for Linux, Unix, Windows) provides a list of reserved schemas and words. As stated in the docs, the list is not enforced by Db2, but the recommendation is to not use them for portability reasons.
A from succeeds but not a where because an optional WHERE clause follows in the place you tried to use the reserved words. In that case you have an incomplete WHERE clause and it violates grammar rules. Thus, the recommendation is to respect the list of reserved words and not use them. You may (freedom of expression... ;-) ), but you should be considerate...

No space before ORDER BY - Why does this work?

I came across some SQL in an application which had no space before the "ORDER BY" clause. I was surprised that this even works.
Given a table of numbers, called [counter] where there is simply one column, counter_id that is an incrementing list of integers this SQL works fine in Microsoft SQL Server 2012
select
*
FROM [counter] c
where c.counter_id = 1000ORDER by counter_id
This also works with strings, e.g.:
WHERE some_string = 'test'ORDER BY something
My question is, are there any potential pitfalls or dangers with this query? And conversely, are there any benefits? Other than saving, what, 8 bits of network traffic for that whitespace (whcih may well be a consideration in some applications)
Let me explain the reason why this works with numbers and strings.
The reason is because numbers cannot start identifiers, unless the name is escaped. Basically, the first things that happens to a SQL query is tokenization. That is, the components of the query are broken into identifiers and keywords, which are then analyzed.
In SQL Server, keywords and identifiers and function names (and so on) cannot start with a digit (unless the name is escaped, of course). So, when the tokenizer encounters a digit, it knows that it has a number. The number ends when a non-digit character is encountered. So, a sequence of characters such as 1000ORDER BY is easily turned into three tokens, 1000, ORDER, and BY.
Similarly, the first time that a single quote is encountered, it always represents a string literal. The string literal ends when the final single quote is encountered. The next set of characters represents another token.
Let me add that there is exactly zero reason to ever use these nuances. First, these rules are properties of SQL Server's tokenization and do not necessarily apply to other databases. Second, the purpose of SQL is for humans to be able to express queries. It is way, way more important that we read them.
As jarlh mentioned there might be difference during scanning and parsing the tokens but it creates correctly during execution plan, hence it might not be huge difference in advantages or disadvantages
When parser examines characters ,it checks for keywords,identifiers,string constants and match overall semantic and syntactic structure of the language. Since 'Order by' is a keyword and sql parser knows its possible syntactic location in a query, it will interpret it accordingly without throwing any error. This is the reason why your order by will not throw any error.
Parsing sql query
Parsing SQL

Creating a table in NexusDB with german umlauts?

I'm trying to import a CREATE TABLE statement in NexusDB.
The table name contains some german umlauts and so do some field names but I receive an error that there were some invalid characters in my statement (obviously the umlauts...).
My question is now: can somebody give a solution or any ideas to solve my problem?
It's not so easy to just change the umlauts into equivalent terms like ä -> ae or ö -> oe since our application has fixed table names every customer uses currently.
It is not a good idea to use characters outside what is normally permitted in the SQL standard. This will bite you not only in NexusDB, but in many other databases as well. Take special note that there is a good chance you will also run into problems when you want to access data via ODBC etc, as other environments may also have similar standard restrictions. My strong recommendation would be to avoid use of characters outside the SQL naming standard for tables, no matter which database is used.
However... having said all that, given that NexusDB is one of the most flexible database systems for the programmer (it comes with full source), there is already a solution. If you add an "extendedliterals" define to your database server project, then a larger array of characters are considered valid. For the exact change this enables, see the nxcValidIdentChars constant in the nxllConst.pas unit. The constant may also be changed if required.

What does the SQL Standard say about usage of backtick(`)?

Once I had spent hours in debugging a simple SQL query using mysql_query() in PHP/MySQL only to realise that I had missed bactick around the table name. From then I had been always using it around table names.
But when I used the same in SQLite/C++, the symbol is not even recognized. It's confusing, whether to use this or not? What does standard say about usage of it?
Also, it would be helpful if anyone could tell me when to use quotes and when not. I mean around values and field names.
The SQL standard (current version is ISO/IEC 9075:2011, in multiple parts) says nothing about the 'back-tick' or 'back-quote' symbol (Unicode U+0060 or GRAVE ACCENT); it doesn't recognize it as a character with special meaning that can appear in SQL.
The Standard SQL mechanism for quoting identifiers is with delimited identifiers enclosed in double quotes:
SELECT "select" FROM "from" WHERE "where" = "group by";
In MySQL, that might be written:
SELECT `select` FROM `from` WHERE `where` = `group by`;
In MS SQL Server, that might be written:
SELECT [select] FROM [from] WHERE [where] = [group by];
The trouble with the SQL Standard notation is that C programmers are used to enclosing strings in double quotes, so most DBMS use double quotes as an alternative to the single quotes recognized by the standard. But that then leaves you with a problem when you want to enclose identifiers.
Microsoft took one approach; MySQL took another; Informix allows interchangeable use of single and double quotes, but if you want delimited identifiers, you set an environment variable and then you have to follow the standard (single quotes for strings, double quotes for identifiers); DB2 only follows the standard, AFAIK; SQLite appears to follow the standard; Oracle also appears to follow the standard; Sybase appears to allow either double quotes (standard) or square brackets (as with MS SQL Server — which means SQL Server might allow double quotes too). This page (link AWOL since 2013 — now available in The Wayback Machine) documents documented all these servers (and was helpful filling out the gaps in my knowledge) and notes whether the strings inside delimited identifiers are case-sensitive or not.
As to when to use a quoting mechanism around identifiers, my attitude is 'never'. Well, not quite never, but only when absolutely forced into doing so.
Note that delimited identifiers are case-sensitive; that is, "from" and "FROM" refer to different columns (in most DBMS — see URL above). Most of SQL is not case-sensitive; it is a nuisance to know which case to use. (The SQL Standard has a mainframe orientation — it expects names to be converted to upper-case; most DBMS convert names to lower-case, though.)
In general, you must delimit identifiers which are keywords to the version of SQL you are using. That means most of the keywords in Standard SQL, plus any extras that are part of the particular implementation(s) that you are using.
One continuing source of trouble is when you upgrade the server, where a column name that was not a keyword in release N becomes a keyword in release N+1. Existing SQL that worked before the upgrade stops working afterwards. Then, at least as a short-term measure, you may be forced into quoting the name. But in the ordinary course of events, you should aim to avoid needing to quote identifiers.
Of course, my attitude is coloured by the fact that Informix (which is what I work with mostly) accepts this SQL verbatim, whereas most DBMS would choke on it:
CREATE TABLE TABLE
(
DATE INTEGER NOT NULL,
NULL FLOAT NOT NULL,
FLOAT INTEGER NOT NULL,
NOT DATE NOT NULL,
INTEGER FLOAT NOT NULL
);
Of course, the person who produces such a ridiculous table for anything other than demonstration purposes should be hung, drawn, quartered and then the residue should be made to fix the mess they've created. But, within some limits which customers routinely manage to hit, keywords can be used as identifiers in many contexts. That is, of itself, a useful form of future-proofing. If a word becomes a keyword, there's a moderate chance that the existing code will continue to work unaffected by the change. However, the mechanism is not perfect; you can't create a table with a column called PRIMARY, but you can alter a table to add such a column. There is a reason for the idiosyncrasy, but it is hard to explain.
Trailing underscore
You said:
it would be helpful if anyone could tell me when to use quotes and when not
Years ago I surveyed several relational database products looking for commands, keywords, and reserved words. Shockingly, I found over a thousand distinct words.
Many of them were surprisingly counter-intuitive as a "database word". So I feared there was no simple way to avoid unintentional collisions with reserved words while naming my tables, columns, and such.
Then I found this tip some where on the internets:
Use a trailing underscore in all your SQL naming.
Turns out the SQL specification makes an explicit promise to never use a trailing underscore in any SQL-related names.
Being copyright-protected, I cannot quote the SQL spec directly. But section 5.2.11 <token> and <separator> from a supposed-draft of ISO/IEC 9075:1992, Database Language SQL (SQL-92) says (in my own re-wording):
In the current and future versions of the SQL spec, no keyword will end with an underscore
➥ Though oddly dropped into the SQL spec without discussion, that simple statement to me screams out “Name your stuff with a trailing underscore to avoid all naming collisions”.
Instead of:
person
name
address
…use:
person_
name_
address_
Since adopting this practice, I have found a nice side-effect. In our apps we generally have classes and variables with the same names as the database objects (tables, columns, etc.). So an inherent ambiguity arises as to when referring to the database object versus when referring to the app state (classes, vars). Now the context is clear: When seeing a trailing underscore on a name, the database is specifically indicated. No underscore means the app programming (Java, etc.).
Further tip on SQL naming: For maximum portability, use all-lowercase with underscore between words, as well as the trailing underscore. While the SQL spec requires (not suggests) an implementation to store identifiers in all uppercase while accepting other casing, most/all products ignore this requirement. So after much reading and experimenting, I learned the all-lowercase with underscores will be most portable.
If using all-lowercase, underscores between words, plus a trailing underscore, you may never need to care about enquoting with single-quotes, double-quotes, back-ticks, or brackets.

How to create portable inserts from SQL Server?

Now it generates inserts like
INSERT [Bla] ([id], [description], [name], [version])
VALUES (CAST(1 AS Numeric(19, 0)), convert(t...
It's very SQL Server specific. I would like to create a script that everybody can use, database agnostic. I have very simple data types - varchars, numbers, dates, bits(boolean).
I think
insert into bla values (1, 'die', '2001-01-01 11:11:11')
should work in all DBMSs, right?
Some basic rules:
Get rid of the square brackets. In your case they are not needed - not even in SQL Server. (At the same time make sure you never use reserved words or special characters in column or table names).
If you do need to use special characters or reserved words (which is not something I would recommend), then use the standard double quotes (e.g. "GROUP").
But remember that names are case sensitive then: my_table is the same as MY_TABLE but "my_table" is different to "MY_TABLE" according to the standard. Again this might vary between DBMS and their configuration.
The CAST operator is standard and works on most DBMS (although not all support casting in all possible combinations).
convert() is SQL Server specific and should be replaced with an approriate CAST expression.
Try to specify values in the correct data type, never rely on implicit data conversion (so do not use '1' for a number). Although I don't think casting a 1 to a numeric() should be needed.
Usually I also recommend to use ANSI literals (e.g. DATE '2011-03-14') for DATE/TIMESTAMP literals, but SQL Server does not support that. So it won't help you very much.
A quick glance at the Wikipedia article on SQL, will tell you a bit about standardisation of SQL across different implementations, such as MS SQL, PostgreSQL, Oracle etc.
In short, there is a number of ANSI standards but there is varying support for it throught each product.
The general way to support multiple database servers from your software product is to accept there are differences, code for them at the database level, and make your application able to call the same database access code irrespective of database server.
There are a number of problems with number formats which will not port between dbmses however this pales when you look at the problems with dates and date formats. For instance the default DATE format used in an ORACLE DB depends on the whims of whoever installed the software, you can use date conversion functions to get ORACLE to accept the common date formats - but these functions are ORACLE specific.
Besides how do you know the table and column names will be the same on the target DB?
If you are serious about this, really need to port data between hydrogenous DBMSes, and know a bit of perl thn try using SqlFairy which is available from CPAN. The sheer size of this download should be enough to convince you how complex this problem can be.