Are nested aggregate functions allowed or forbidden by the SQL standard?

My question is strictly related to the ISO SQL standard, and not some of its implementations like Oracle, MySQL, PostgreSQL etc.
The question is:
Are nested aggregate functions allowed/forbidden by the SQL standard? For example, the following query:
select MAX(COUNT(*))
from a_table
group by a_column
In particular, I'm reading the documentation available at this link, but I can't find any reference to this.
As far as I know, Oracle allows this, but MySQL and PostgreSQL do not, so I was wondering whether somebody who knows the standard better than I do can tell me if it's a grey area where different implementations can do what they want, or if the standard imposes some constraints.
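For anyone who wants to try the behaviour locally, here is a minimal sketch using Python's built-in sqlite3 module. It does not settle what the standard says; it only shows that SQLite (an engine not mentioned above, used here purely as a convenient stand-in) rejects the nested form, and that aggregating in a derived table first gives the intended result. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (a_column TEXT)")
# Made-up sample data: group 'x' has 3 rows, 'y' has 2, 'z' has 1.
conn.executemany(
    "INSERT INTO a_table (a_column) VALUES (?)",
    [("x",), ("x",), ("x",), ("y",), ("y",), ("z",)],
)

# The nested form from the question. SQLite refuses to prepare it.
nested_sql = "SELECT MAX(COUNT(*)) FROM a_table GROUP BY a_column"
try:
    conn.execute(nested_sql).fetchall()
    nested_accepted = True
except sqlite3.OperationalError:
    nested_accepted = False

# Rewrite: aggregate once per group in a derived table, then aggregate again.
derived_sql = """
    SELECT MAX(cnt)
    FROM (SELECT COUNT(*) AS cnt FROM a_table GROUP BY a_column) AS per_group
"""
max_count = conn.execute(derived_sql).fetchone()[0]

print(nested_accepted, max_count)
```

The derived-table form avoids the question of nested aggregates entirely, which is why it tends to be the safer spelling when the target engine is unknown.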

Related

How different are SQL dialects for basic queries?

New to SQL, so please excuse imprecision in the question.
For "normal" queries, is SQL syntax mutually intelligible between dialects? To take a concrete example, would SELECT * FROM [Pending Scans] be valid in all common dialects?
Not looking for an exhaustive list!
No. This would be:
select *
from pending_scans;
Square brackets are non-standard for escaping identifiers. The standard escape character for identifiers is the double quote, and most databases now support that.
I should note that for all but the simplest queries -- such as the one you have written -- slight differences between dialects make it a fool's errand to try to write completely portable code. For instance, function names are different, such as len() versus length(), and date/time operations are quite bespoke.
If you are writing an application that needs to support multiple different database types, a typical method would be to define the API as a set of views and use the views with simple SELECT queries (as in your example).
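A rough sketch of the view-as-API idea, run against SQLite through Python's sqlite3 module. The column names and sample rows are invented for illustration; only the table name comes from the question. The awkward table name is quoted once, with standard double quotes, inside the view definition, and application code then uses a plain SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Double quotes are the standard way to quote an identifier containing a space.
# The columns (id, file_name) are hypothetical.
conn.execute('CREATE TABLE "Pending Scans" (id INTEGER PRIMARY KEY, file_name TEXT)')
conn.executemany(
    'INSERT INTO "Pending Scans" (file_name) VALUES (?)',
    [("a.pdf",), ("b.pdf",)],
)

# The view is the "API": a simple name that needs no quoting at all.
conn.execute('CREATE VIEW pending_scans AS SELECT id, file_name FROM "Pending Scans"')

# Application code only ever issues the simple query.
rows = conn.execute("SELECT * FROM pending_scans ORDER BY id").fetchall()
print(rows)
```

Any dialect-specific quoting or expressions stay inside the view definition, so only the view needs to be rewritten per database.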

What is the MAP keyword in SQL?

I have been trying to improve my SQL chops a little bit by working through SQL Koans, since I enjoy the learn-by-doing-and-meditating approach, and my SQL knowledge is lacking. In one set of koans is the following:
-- Meditate on MANY-TO-MANY relationships
select a.first_name, a.last_name, b.title
from book b
join book_to_author_map map on _____.id = _____.book_id
join author a on _____.author_id = _____.id
where author_id in (1, 5, 6)
Having no previous experience with aliasing, and little experience with joins, I was stuck on this problem for a little while. I was stuck longer than necessary though, because in Emacs sql-mode the word map (which I understand now to be an alias for the table book_to_author_map) is highlighted as an SQL keyword. I spent a lot of time looking for documentation on this keyword, and found nothing (aside from lots of information on sqlmap...).
Peeking at the source code for the Emacs sql-mode I found that map is designated as a keyword as part of sql-mode-postgres-font-lock-keywords, so I started to search for map in relation to PostgreSQL, and found it in a list of SQL keywords in the PostgreSQL documentation. The keyword MAP is designated as a "non-reserved" keyword for SQL:2003 and a "reserved" keyword for SQL:1999. However, I have been unable so far to find any documentation for this keyword in association with SQL.
My question, more out of curiosity than anything else, is as in the title: what is the MAP keyword in SQL?
I don't have a copy of the standard, but judging by the grammar, MAP WITH <function> is a clause for the CREATE ORDERING statement.
CREATE ORDERING is used to specify a sort order for a user-defined type, though as far as I can tell, the only vendor to have implemented the MAP WITH clause is Teradata. It looks like this clause lets you define a sort order for a custom type by providing a function which maps it to an existing type with a known ordering.
There is no such statement in Postgres, which defines sort ordering via operator classes and collations.
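To illustrate that map in the koan is only a table alias, here is a small sketch that runs the completed join in SQLite via Python's sqlite3 module. The table contents are made up, and the filled-in blanks are one plausible solution to the koan, not something taken from its answer key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE book_to_author_map (book_id INTEGER, author_id INTEGER);
    -- Made-up sample rows.
    INSERT INTO book VALUES (10, 'First Book'), (11, 'Second Book');
    INSERT INTO author VALUES (1, 'Ann', 'Avery'), (2, 'Bob', 'Baker');
    INSERT INTO book_to_author_map VALUES (10, 1), (11, 2);
""")

# "map" is used purely as an alias for book_to_author_map here.
rows = conn.execute("""
    SELECT a.first_name, a.last_name, b.title
    FROM book b
    JOIN book_to_author_map map ON b.id = map.book_id
    JOIN author a ON map.author_id = a.id
    WHERE author_id IN (1, 5, 6)
""").fetchall()
print(rows)
```

Renaming the alias to anything else (for example m) changes nothing about the query, which is a quick way to convince yourself that no keyword semantics are involved.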

PostgreSQL force standard SQL syntax

Is it possible to have Postgres reject queries which use its proprietary extensions to the SQL language?
e.g. select a::int from b; should throw an error, forcing the use of proper casts as in select cast(a as int) from b;
Perhaps more to the point is the question of whether it is possible to write SQL that is supported by all RDBMS with the same resulting behaviour?
PostgreSQL has no such feature. Even if it did, it wouldn't help you tons because interpretations of the SQL standard vary, support for standard syntax and features varies, and some DBs are relaxed about restrictions that others enforce or have limitations others don't. Syntax is the least of your problems.
The only reliable way to write cross-DB portable SQL is to test that SQL on every target database as part of an automated test suite. And to swear a lot.
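A minimal sketch of what such a test could look like, assuming each target database is reachable through a connection object with an execute() method. Two in-memory SQLite databases stand in for the real targets here, so the names backend_one and backend_two are placeholders, and the helper run_everywhere is hypothetical.

```python
import sqlite3


def run_everywhere(sql, connections):
    """Run the same SQL text on every connection and return {name: rows}."""
    return {name: conn.execute(sql).fetchall() for name, conn in connections.items()}


def make_backend():
    # Stand-in for "connect to one target DBMS and load the fixture data".
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE b (a TEXT)")
    conn.executemany("INSERT INTO b (a) VALUES (?)", [("1",), ("2",)])
    return conn


# Hypothetical: in a real suite each entry would be a different database product.
backends = {"backend_one": make_backend(), "backend_two": make_backend()}

# The standard-spelled cast from the question.
results = run_everywhere("SELECT CAST(a AS INTEGER) FROM b ORDER BY a", backends)

# The query counts as portable (for these backends) only if all results match.
distinct_results = {tuple(rows) for rows in results.values()}
all_agree = len(distinct_results) == 1
print(results, all_agree)
```

A query that fails to parse on one backend raises an exception, which a test runner would report as a failure for that backend.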
In many places the query parser/rewriter transforms the standard "spelling" of a query into the PostgreSQL internal form, which will be emitted on dump/reload. In particular, PostgreSQL doesn't store the raw source code for things like views, check constraint expressions, index expressions, etc. It stores the internal parse tree, and reconstructs the source from that when it's asked to dump or display the object.
For example:
regress=> CREATE TABLE sometable ( x varchar(100) );
CREATE TABLE
regress=> CREATE VIEW someview AS SELECT CAST (x AS integer) FROM sometable;
CREATE VIEW
regress=> SELECT pg_get_viewdef('someview');
pg_get_viewdef
-------------------------------------
SELECT (sometable.x)::integer AS x
FROM sometable;
(1 row)
It'd be pretty useless anyway, since the standard fails to specify some pretty common and important pieces of functionality and often has rather ambiguous specifications of things it does define. Until recently it didn't define a way to limit the number of rows returned by a query, for example, so every database had its own different syntax (TOP, LIMIT / OFFSET, etc).
Other things the standard specifies are not implemented by most vendors, so using them is pretty pointless. Good luck using the SQL-standard generated and identity columns across all DB vendors.
It'd be quite nice to have a "prefer standard spelling" dump mode, that used CAST instead of ::, etc, but it's really not simple to do because some transformations aren't 1:1 reversible, e.g.:
regress=> CREATE VIEW v AS SELECT '1234' SIMILAR TO '%23%';
CREATE VIEW
regress=> SELECT pg_get_viewdef('v');
SELECT ('1234'::text ~ similar_escape('%23%'::text, NULL::text));
or:
regress=> CREATE VIEW v2 AS SELECT extract(dow FROM current_date);
CREATE VIEW
regress=> SELECT pg_get_viewdef('v2');
SELECT date_part('dow'::text, ('now'::text)::date) AS date_part;
so you see that significant changes would need to be made to how PostgreSQL internally represents and works with functions and expressions before what you want would be possible.
Lots of the SQL standard stuff uses funky one-off syntax that PostgreSQL converts into function calls and casts during parsing, so it doesn't have to add special case features every time the SQL committee has another brain-fart and pulls some new creative bit of syntax out of ... somewhere. Changing that would require adding tons of new expression node types and general mess, all for no real gain.
Perhaps more to the point is the question of whether it is possible to
write SQL that is supported by all RDBMS with the same resulting
behaviour?
No, not even for many simple statements.
select top 10 ... -- tsql
select ... limit 10 -- MySQL, PostgreSQL, SQLite
Many more examples exist. Use an ORM or something similar if you want to insulate yourself from database choice.
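As a rough illustration of what such an insulating layer does, here is a tiny sketch that picks the row-limiting syntax per dialect. The function name limited_select and the dialect labels are invented for this example, and it does no identifier escaping, so treat it as a toy rather than a real ORM.

```python
import sqlite3


def limited_select(dialect, columns, table, n):
    """Build 'first n rows' SQL for a given dialect label (labels are hypothetical)."""
    n = int(n)  # refuse anything that is not a number
    if dialect == "tsql":
        return f"SELECT TOP {n} {columns} FROM {table}"
    if dialect in ("mysql", "postgresql", "sqlite"):
        return f"SELECT {columns} FROM {table} LIMIT {n}"
    if dialect == "standard":
        # Row-limiting syntax added to the standard in SQL:2008.
        return f"SELECT {columns} FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")


tsql_sql = limited_select("tsql", "x", "t", 10)
standard_sql = limited_select("standard", "x", "t", 10)

# Only the SQLite variant can actually be executed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(i,) for i in range(25)])
sqlite_rows = conn.execute(limited_select("sqlite", "x", "t", 10)).fetchall()
print(tsql_sql, standard_sql, len(sqlite_rows), sep="\n")
```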
If you do write SQL by hand, then trying to follow the SQL standard is always a good choice :-)
You could use a tool like Mimer's SQL Validator to validate that queries follow the SQL spec before running them:
http://developer.mimer.com/validator/parser92/index.tml
You could force users to write queries in HQL or JPQL, which would then get translated into the correct SQL dialect for your database.

Are sub-SELECT's in SELECT and FROM clauses standard-compliant?

The title pretty much says it all. Under "standard-compliant" SQL I mean SQL constructs allowed in any of SQL standards.
I looked through the "Understanding SQL" book, but it mentions subqueries only inside WHERE, GROUP BY, HAVING etc. clauses, not SELECT and FROM (or maybe I'm missing something).
I know MS SQL allows sub-SELECT's in SELECT and FROM. I would like to know if it is a standard behavior. Or maybe it isn't standard, but is now implemented in major SQL databases (I have very little experience with DB's other than MS SQL)?
Yes. You can use a subquery as a derived table wherever you can use a table in a select statement.
SQL ANSI 92
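A small sketch of both forms, run in SQLite through Python's sqlite3 module. The customers and orders tables and their rows are made up for the example. The first query uses a scalar subquery in the SELECT list, the second a derived table in the FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    -- Made-up sample rows.
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# Sub-SELECT in the SELECT list: a correlated scalar subquery.
select_rows = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS order_count
    FROM customers c
    ORDER BY c.id
""").fetchall()

# Sub-SELECT in the FROM clause: a derived table with an alias.
from_rows = conn.execute("""
    SELECT c.name, t.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders
          GROUP BY customer_id) AS t
      ON t.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(select_rows, from_rows)
```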

Plain SQL vs Dialects

DBMS Vendors use SQL dialect features to differentiate their product, at the same time claiming to support SQL standards. 'Nuff said on this.
Is there any example of SQL you have coded that can't be translated to SQL:2008 standard SQL ?
To be specific, I'm talking about DML (a query statement), NOT DDL, stored procedure syntax or anything that is not a pure SQL statement.
I'm also talking about queries you would use in Production, not for ad-hoc stuff.
Edit Jan 13
Thanks for all of your answers : they have conveyed to me an impression that a lot of the DBMS-specific SQL is created to allow work-arounds for poor relational design. Which leads me to the conclusion you probably wouldn't want to port most existing applications.
Typical differences include subtly different semantics (for example Oracle handles NULLs differently from other SQL dialects in some cases), different exception handling mechanisms, different types and proprietary methods for doing things like string operations, date operations or hierarchical queries. Query hints also tend to have syntax that varies across platforms, and different optimisers may get confused on different types of constructs.
One can use ANSI SQL for the most part across database systems and expect to get reasonable results on a database with no significant tuning issues like missing indexes. However, on any non-trivial application there is likely to be some requirement for code that cannot easily be done portably.
Typically, this requirement will be fairly localised within an application code base - a handful of queries where this causes an issue. Reporting is much more likely to throw up this type of issue and doing generic reporting queries that will work across database managers is very unlikely to work well. Some applications are more likely to cause grief than others.
Therefore, it is unlikely that relying on 'portable' SQL constructs for an application will work in the general case. A better strategy is to use generic statements where they will work and break out to a database specific layer where this does not work.
A generic query mechanism could be to use ANSI SQL where possible; another possible approach would be to use an O/R mapper, which can take drivers for various database platforms. This type of mechanism should suffice for the majority of database operations but will require you to do some platform-specific work where it runs out of steam.
You may be able to use stored procedures as an abstraction layer for more complex operations and code a set of platform specific sprocs for each target platform. The sprocs could be accessed through something like ADO.net.
In practice, subtle differences in parameter passing and exception handling may cause problems with this approach. A better approach is to produce a module that wraps the platform-specific database operations with a common interface. Different 'driver' modules can be swapped in and out depending on what DBMS platform you are using.
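One way such a swappable 'driver' module could look, sketched in Python. All class and function names here (Driver, SqlServerDriver, OracleDriver, get_driver) are invented for illustration, and the generated SQL is limited to two operations that are known to differ between SQL Server and Oracle: taking the first rows of an ordered result and substituting a default for NULL.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Common interface; each DBMS gets its own implementation."""

    @abstractmethod
    def top_rows_sql(self, table, order_by, n):
        """Return SQL selecting the first n rows of table ordered by order_by."""

    @abstractmethod
    def null_default_sql(self, expr, default):
        """Return an expression yielding default when expr is NULL."""


class SqlServerDriver(Driver):
    def top_rows_sql(self, table, order_by, n):
        return f"SELECT TOP {int(n)} * FROM {table} ORDER BY {order_by}"

    def null_default_sql(self, expr, default):
        return f"ISNULL({expr}, {default})"


class OracleDriver(Driver):
    def top_rows_sql(self, table, order_by, n):
        # ROWNUM is applied after the inner ORDER BY has been evaluated.
        inner = f"SELECT * FROM {table} ORDER BY {order_by}"
        return f"SELECT * FROM ({inner}) WHERE ROWNUM <= {int(n)}"

    def null_default_sql(self, expr, default):
        return f"NVL({expr}, {default})"


# Hypothetical registry: the application picks a driver by configuration.
DRIVERS = {"sqlserver": SqlServerDriver, "oracle": OracleDriver}


def get_driver(name):
    return DRIVERS[name]()


driver = get_driver("oracle")
oracle_top = driver.top_rows_sql("CUSTOMERS", "SALES_TOTAL", 1)
sqlserver_top = get_driver("sqlserver").top_rows_sql("CUSTOMERS", "SALES_TOTAL", 1)
print(oracle_top)
print(sqlserver_top)
```

The rest of the application calls only the Driver methods, so supporting another DBMS means adding one class rather than touching every query.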
Oracle has some additions, such as model or hierarchical queries that are very difficult, if not impossible, to translate into pure SQL
Even when SQL:2008 can do something sometimes the syntax is not the same. Take the REGEXP matching syntax for example, SQL:2008 uses LIKE_REGEX vs MySQL's REGEXP.
And yes, I agree, it's very annoying.
Part of the problem with Oracle is that it's still based on the SQL 1992 ANSI standard. SQL Server is on SQL 1999 standard, so some of the things that look like "extensions" are in fact newer standards. (I believe that the "OVER" clause is one of these.)
Oracle is also far more restrictive about placing subqueries in SQL. SQL Server is far more flexible and permissive about allowing subqueries almost anywhere.
SQL Server has a rational way to select the "top" row of a result: "SELECT TOP 1 * FROM CUSTOMERS ORDER BY SALES_TOTAL". In Oracle, this becomes "SELECT * FROM (SELECT * FROM CUSTOMERS ORDER BY SALES_TOTAL) WHERE ROWNUM <= 1".
And of course there's always Oracle's infamous SELECT (expression) FROM DUAL.
Edit to add:
Now that I'm at work and can access some of my examples, here's a good one. This is generated by LINQ-to-SQL, but it's a clean query to select rows 41 through 50 from a table, after sorting. It uses the "OVER" clause:
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 40 + 1 AND 40 + 10
ORDER BY [t1].[ROW_NUMBER]
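The same paging pattern can be tried outside SQL Server. The sketch below runs a cut-down version in SQLite via Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The table has only two columns and the 60 rows are generated, so nothing here reflects the real Customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, contact_name TEXT)"
)
# Generated sample data: contact_001 .. contact_060.
conn.executemany(
    "INSERT INTO customers (contact_name) VALUES (?)",
    [(f"contact_{i:03d}",) for i in range(1, 61)],
)

# Number the rows in sort order, then keep rows skip+1 .. skip+take.
page_sql = """
    SELECT t1.customer_id, t1.contact_name
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY t0.contact_name) AS row_num,
               t0.customer_id, t0.contact_name
        FROM customers AS t0
    ) AS t1
    WHERE t1.row_num BETWEEN ? + 1 AND ? + ?
    ORDER BY t1.row_num
"""
skip, take = 40, 10
page = conn.execute(page_sql, (skip, skip, take)).fetchall()
print(page[0], page[-1], len(page))
```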
Common here on SO
ISNULL (SQL Server)
NVL (Oracle)
IFNULL (MySQL, DB2?)
COALESCE (ANSI)
To answer exactly:
ISNULL can easily give different results from COALESCE on SQL Server because of data type precedence, as per my answer/comments here
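For a quick feel for the portable spelling, here is a minimal sketch in SQLite through Python's sqlite3 module. SQLite happens to offer both IFNULL and COALESCE; this does not reproduce the SQL Server data type precedence difference mentioned above, it only shows that COALESCE accepts more than two arguments while IFNULL takes exactly two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# COALESCE returns the first non-NULL argument from a list of any length;
# IFNULL is the two-argument form.
coalesce_value, ifnull_value = conn.execute(
    "SELECT COALESCE(NULL, NULL, 'fallback'), IFNULL(NULL, 'fallback')"
).fetchone()
print(coalesce_value, ifnull_value)
```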