The title pretty much says it all. By "standard-compliant" SQL I mean SQL constructs allowed by any of the SQL standards.
I looked through the "Understanding SQL" book, but it mentions subqueries only inside WHERE, GROUP BY, HAVING etc. clauses, not SELECT and FROM (or maybe I'm missing something).
I know MS SQL allows sub-SELECTs in SELECT and FROM. I would like to know whether this is standard behavior. Or maybe it isn't standard, but is by now implemented in the major SQL databases (I have very little experience with DBs other than MS SQL)?
Yes. You can use a subquery as a derived table wherever you can use a table in a SELECT statement. This has been standard since ANSI SQL-92.
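As a quick illustration, both of the following are standard-conforming (the table and column names here are invented for the example):

-- Subquery in FROM: a derived table (it must be given an alias)
SELECT t.dept, t.avg_salary
FROM (SELECT dept, AVG(salary) AS avg_salary
      FROM employees
      GROUP BY dept) AS t
WHERE t.avg_salary > 50000;

-- Scalar subquery in the SELECT list
SELECT e.name,
       (SELECT d.dept_name
        FROM departments d
        WHERE d.dept_id = e.dept_id) AS dept_name
FROM employees e;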
My question is strictly related to the ISO SQL standard, and not some of its implementations like Oracle, MySQL, PostgreSQL etc.
The question is:
Are nested aggregate functions allowed/forbidden by the SQL standard? For example, the following query:
select MAX(COUNT(*))
from a_table
group by a_column
In particular, I'm reading the documentation available at this link, but I can't find any reference to this.
As far as I know, Oracle allows this, but MySQL and PostgreSQL do not, so I was wondering whether somebody who knows the standard better than I do can tell me if this is a grey area where implementations can do what they want, or if the standard actually constrains it.
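For reference, the usual portable rewrite pushes the inner aggregate into a derived table; a sketch that, as far as I can tell, all three engines accept:

select max(cnt)
from (select count(*) as cnt
      from a_table
      group by a_column) t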
I'm looking at a SQL tutorial, and a command the tutorial gives produces an "Ambiguous column name" error in my SQL Server Management Studio. Is this error only applicable when using SQL Server?
No, not at all. In reality, a column reference in SQL should always be qualified -- meaning it should say which table it comes from. You can think of unqualified names as a shorthand: the SQL engine does you the favor of figuring out the table when it can. When a column appears in multiple tables, it cannot figure it out.
Your queries should be readable and unambiguous. In your case, your code should look something like this:
select c.cname
from college c
join apply a on c.cname = a.cname
where c.enrollment > 2000 and a.major = 'CS';
Note: This is guessing where enrollment and major are coming from, because there is not enough information in your query to figure this out.
Also, this uses proper, explicit, standard, readable JOIN syntax. Never use commas in the FROM clause, even if your course/tutorial materials do so. In fact, that alone suggests that they are way out-of-date (decades old).
Also, use table aliases (the abbreviations) so queries are simpler to write and to read.
I have a table "Price" that should be filtered according to a value stored in another table, "CutPrice" (meaning only prices lower than the stored parameter should be displayed).
I experimented with various ways, and for a laugh I tried the following:
SELECT Location, Price, CutoffPrice
FROM LocPrice
INNER JOIN CutPrice ON CutPrice.CutOffPrice < LocPrice.Price
Using the less-than sign works perfectly; it's even faster than the CASE statement I used in another version of the query.
I tried googling for it, to see whether it's standard, recommended, not recommended, or perhaps even a bug.
I could not find anything. I know the question might be a bit broad for the site, but is this a standard use of JOINs, or is there anything to be careful about when using it? Particularly with T-SQL on SQL Server 2005.
Non-equi joins are a pretty standard part of SQL. See http://blog.mclaughlinsoftware.com/oracle-sql-programming/basic-sql-join-semantics/ for example. Most of the major databases support non-equi joins, usually with both SQL-92 JOIN ... ON syntax and pre-SQL-92 WHERE syntax. However, if you have a particular database in mind, searching for 'non-equi join <db name>' will usually turn up any mention of it not supporting them, e.g. Hive: work around for non equi left join
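A classic use case for a non-equi join is banding, i.e. joining a value to the range it falls into. The tables below are invented for the sketch:

-- orders(order_id, amount); price_bands(band_name, low, high)
SELECT o.order_id, b.band_name
FROM orders o
JOIN price_bands b
  ON o.amount >= b.low
 AND o.amount <  b.high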
DBMS vendors use SQL dialect features to differentiate their products while at the same time claiming to support SQL standards. 'Nuff said on this.
Is there any example of SQL you have coded that can't be translated to SQL:2008 standard SQL?
To be specific, I'm talking about DML (a query statement), NOT DDL, stored procedure syntax or anything that is not a pure SQL statement.
I'm also talking about queries you would use in Production, not for ad-hoc stuff.
Edit Jan 13
Thanks for all of your answers: they have conveyed to me the impression that a lot of DBMS-specific SQL is created to allow work-arounds for poor relational design. Which leads me to the conclusion that you probably wouldn't want to port most existing applications.
Typical differences include subtly different semantics (for example, Oracle handles NULLs differently from other SQL dialects in some cases), different exception-handling mechanisms, different types, and proprietary methods for doing things like string operations, date operations or hierarchical queries. Query hints also tend to have syntax that varies across platforms, and different optimisers may get confused by different types of constructs.
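The NULL point deserves a concrete example. Oracle treats the empty string as NULL, which most other dialects do not (table t and column col are placeholders):

-- In Oracle, '' is NULL, so this stores a NULL:
INSERT INTO t (col) VALUES ('');

-- ...and this matches no rows in Oracle (comparing with NULL is unknown),
-- while in SQL Server or PostgreSQL it matches genuine empty strings:
SELECT * FROM t WHERE col = '';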
One can use ANSI SQL for the most part across database systems and expect to get reasonable results on a database with no significant tuning issues like missing indexes. However, on any non-trivial application there is likely to be some requirement for code that cannot easily be done portably.
Typically, this requirement will be fairly localised within an application code base - a handful of queries where this causes an issue. Reporting is much more likely to throw up this type of issue, and generic reporting queries that work across database managers are very unlikely to perform well. Some applications are more likely to cause grief than others.
Therefore, it is unlikely that relying on 'portable' SQL constructs for an application will work in the general case. A better strategy is to use generic statements where they will work and break out to a database specific layer where this does not work.
A generic query mechanism could be to use ANSI SQL where possible; another possible approach would be to use an O/R mapper, which can take drivers for various database platforms. This type of mechanism should suffice for the majority of database operations but will require you to do some platform-specific work where it runs out of steam.
You may be able to use stored procedures as an abstraction layer for more complex operations and code a set of platform-specific sprocs for each target platform. The sprocs could be accessed through something like ADO.NET.
In practice, subtle differences in parameter passing and exception handling may cause problems with this approach. A better approach is to produce a module that wraps the platform-specific database operations with a common interface. Different 'driver' modules can be swapped in and out depending on what DBMS platform you are using.
Oracle has some additions, such as the MODEL clause or hierarchical queries (CONNECT BY), that are very difficult, if not impossible, to translate into pure SQL.
Even when SQL:2008 can do something, sometimes the syntax is not the same. Take regular-expression matching, for example: SQL:2008 uses LIKE_REGEX vs MySQL's REGEXP.
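Side by side, assuming a table t with a text column col (and noting that few engines actually implement the SQL:2008 form):

-- SQL:2008
SELECT * FROM t WHERE col LIKE_REGEX '^abc';

-- MySQL
SELECT * FROM t WHERE col REGEXP '^abc';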
And yes, I agree, it's very annoying.
Part of the problem with Oracle is that it's still based on the SQL-92 ANSI standard. SQL Server is on the SQL:1999 standard, so some of the things that look like "extensions" are in fact newer standard features. (I believe that the "OVER" clause is one of these.)
Oracle is also far more restrictive about placing subqueries in SQL. SQL Server is far more flexible and permissive about allowing subqueries almost anywhere.
SQL Server has a rational way to select the "top" row of a result: SELECT TOP 1 * FROM CUSTOMERS ORDER BY SALES_TOTAL. In Oracle, this becomes SELECT * FROM (SELECT * FROM CUSTOMERS ORDER BY SALES_TOTAL) WHERE ROWNUM <= 1.
And of course there's always Oracle's infamous SELECT (expression) FROM DUAL.
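For what it's worth, the standard itself later grew a portable spelling of the "top row" idiom; SQL:2008 added FETCH FIRST, though engine support varies by version:

SELECT *
FROM customers
ORDER BY sales_total
FETCH FIRST 1 ROW ONLY;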
Edit to add:
Now that I'm at work and can access some of my examples, here's a good one. This is generated by LINQ-to-SQL, but it's a clean query to select rows 41 through 50 from a table, after sorting. It uses the "OVER" clause:
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 40 + 1 AND 40 + 10
ORDER BY [t1].[ROW_NUMBER]
Common here on SO
ISNULL (SQL Server)
NVL (Oracle)
IFNULL (MySQL, DB2?)
COALESCE (ANSI)
To answer exactly:
ISNULL can easily give different results from COALESCE on SQL Server because of data type precedence, as per my answer/comments here
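A minimal sketch of the difference (the variable is made up; the point is that ISNULL takes its result type from the first argument, while COALESCE picks a type that covers all of them):

DECLARE @s varchar(2) = NULL;

SELECT ISNULL(@s, 'abcdef');   -- 'ab': result is typed varchar(2), so it truncates
SELECT COALESCE(@s, 'abcdef'); -- 'abcdef': result type covers both arguments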
Say I have a table:
Id int
Region int
Name nvarchar
select * from table1 where region = 1 and name = 'test'
select * from table1 where name = 'test' and region = 1
Will there be a difference in performance?
Assume no indexes.
Is it the same with LINQ?
Because your predicates are, in essence, the same (it doesn't matter what order the WHERE conditions are put in), no, there's no difference between those.
As for LINQ, you will need to know what query LINQ to SQL actually emits (you can use SQL Profiler to find out). Sometimes the query will be the simplest one you can think of; sometimes it will be a convoluted variation of it without you realizing, because of things like dependencies on FKs or other such constraints. LINQ also wouldn't use * in the SELECT.
The only real way to know is to find out the SQL Server Query Execution plan of both queries. To read more on the topic, go here:
SQL Server Query Execution Plan Analysis
Should it? No. SQL is based on relational algebra, and the DBMS should optimize irrespective of order within the statement.
Does it? Possibly. Some DBMSs may store data in a certain order (e.g., maintain a key of some sort) despite what they've been told. But, and here's the crux: you cannot rely on it.
You may need to switch DBMSs at some point in the future. Even a later version of the same DBMS may change its behavior. The only thing you should rely on is what's in the SQL standard.
Regarding the query given: with no indexes or primary key on the two fields in question, you should assume that you'll need a full table scan for both cases. Hence they should run at the same speed.
I don't recommend SELECT *, because the engine has to look up the table schema before executing the query. Instead, list the fields you want, to avoid unnecessary overhead.
And yes, the engine optimizes your queries, but help it along :)
Best Regards!
For simple queries, likely there is little or no difference, but yes indeed the way you write a query can have a huge impact on performance.
In SQL Server (performance issues are very database specific), a correlated subquery will usually have poor performance compared to doing the same thing in a join to a derived table.
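A sketch of the contrast, with invented customers/orders tables (whether the optimizer rescues the first form depends on the engine and version):

-- Correlated subquery: logically evaluated once per outer row
SELECT c.customer_id,
       (SELECT MAX(o.order_date)
        FROM orders o
        WHERE o.customer_id = c.customer_id) AS last_order
FROM customers c;

-- Join to a derived table: aggregate once, then join
-- (LEFT JOIN keeps customers with no orders, matching the first form)
SELECT c.customer_id, t.last_order
FROM customers c
LEFT JOIN (SELECT customer_id, MAX(order_date) AS last_order
           FROM orders
           GROUP BY customer_id) t
  ON t.customer_id = c.customer_id;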
Other things in a query that can affect performance include using SARGable¹ WHERE clauses instead of non-SARGable ones, selecting only the fields you need and never using SELECT * (especially not when doing a join, as at least one field is repeated), using a set-based query instead of a cursor, avoiding a wildcard as the first character in a LIKE clause, and on and on; there is a quick illustration of the SARGable point after the footnote below. There are very large books that devote chapters to more efficient ways to write queries.
1 "SARGable", for those that don't know, are stage 1 predicates in DB2 parlance (and possibly other DBMS'). Stage 1 predicates are more efficient since they're parts of indexes and DB2 uses those first.