How different are SQL dialects for basic queries?

New to SQL, so please excuse imprecision in the question.
For "normal" queries, is SQL syntax mutually intelligible between dialects? To take a concrete example, would SELECT * FROM [Pending Scans] be valid in all common dialects?
Not looking for an exhaustive list!

No. This would be:
select *
from pending_scans;
Square braces are non-standard for escaping identifiers. The standard escape character for identifiers is the double quote, and most databases now support that.
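To make the quoting point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in chosen because it runs anywhere; the column names and the sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard SQL delimits identifiers with double quotes, so a table name
# containing a space is written "Pending Scans" rather than [Pending Scans].
conn.execute('CREATE TABLE "Pending Scans" (id INTEGER, label TEXT)')
conn.execute('INSERT INTO "Pending Scans" VALUES (?, ?)', (1, "first scan"))

rows = conn.execute('SELECT * FROM "Pending Scans"').fetchall()
print(rows)
```

Whether a given database accepts the double-quoted form by default still has to be checked per product.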
I should note that for all but the simplest queries -- such as the one you have written -- slight differences between dialects make it a fool's errand to try to write completely portable code. For instance, function names are different, such as len() versus length(), and date/time operations are quite bespoke.
If you are writing an application that needs to support multiple different database types, a typical method would be to define the API as a set of views and use the views with simple SELECT queries (as in your example).
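A rough sketch of the view-as-API idea, again with SQLite via Python's sqlite3 as a stand-in. The scans table, its columns, and the pending_scans view are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (id INTEGER, label TEXT, done INTEGER)")
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?)",
    [(1, "alpha", 0), (2, "beta", 1), (3, "gamma", 0)],
)

# Dialect-specific part, written once per database product.
# length() is the SQLite spelling; a SQL Server version of this view
# would use len() instead.
conn.execute("""
    CREATE VIEW pending_scans AS
    SELECT id, label, length(label) AS label_length
    FROM scans
    WHERE done = 0
""")

# Portable part: the application only ever issues a simple SELECT.
rows = conn.execute("select * from pending_scans").fetchall()
print(rows)
```

The application code stays the same across databases; only the view definition is rewritten per target.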

Related

PostgreSQL force standard SQL syntax

Is it possible to have Postgres reject queries which use its proprietary extensions to the SQL language?
e.g. select a::int from b; should throw an error, forcing the use of proper casts as in select cast(a as int) from b;
Perhaps more to the point is the question of whether it is possible to write SQL that is supported by all RDBMS with the same resulting behaviour?
PostgreSQL has no such feature. Even if it did, it wouldn't help you tons because interpretations of the SQL standard vary, support for standard syntax and features varies, and some DBs are relaxed about restrictions that others enforce or have limitations others don't. Syntax is the least of your problems.
The only reliable way to write cross-DB portable SQL is to test that SQL on every target database as part of an automated test suite. And to swear a lot.
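One possible shape for such a test, as a sketch only. It assumes each target database is reachable through a DB-API connection; here two in-memory SQLite connections stand in for the real targets, and the helper names (run_everywhere, assert_portable, make_stand_in) are hypothetical.

```python
import sqlite3

def run_everywhere(query, connections):
    """Run one query on every backend and return {backend_name: rows}."""
    results = {}
    for name, conn in connections.items():
        cur = conn.cursor()
        cur.execute(query)
        results[name] = cur.fetchall()
    return results

def assert_portable(query, connections):
    """Fail if any backend rejects the query or returns different rows."""
    results = run_everywhere(query, connections)
    distinct = {repr(sorted(rows)) for rows in results.values()}
    assert len(distinct) == 1, f"results differ between backends: {results}"
    return results

def make_stand_in():
    # Placeholder for "connect to PostgreSQL", "connect to MySQL", etc.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (x varchar(100))")
    conn.execute("INSERT INTO sometable VALUES ('42')")
    return conn

connections = {"backend_a": make_stand_in(), "backend_b": make_stand_in()}
results = assert_portable("SELECT CAST(x AS integer) FROM sometable", connections)
print(results)
```

In a real suite each entry in connections would come from a different driver, and value types may need normalising before comparison.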
In many places the query parser/rewriter transforms the standard "spelling" of a query into the PostgreSQL internal form, which will be emitted on dump/reload. In particular, PostgreSQL doesn't store the raw source code for things like views, check constraint expressions, index expressions, etc. It stores the internal parse tree, and reconstructs the source from that when it's asked to dump or display the object.
For example:
regress=> CREATE TABLE sometable ( x varchar(100) );
CREATE TABLE
regress=> CREATE VIEW someview AS SELECT CAST (x AS integer) FROM sometable;
CREATE VIEW
regress=> SELECT pg_get_viewdef('someview');
pg_get_viewdef
-------------------------------------
SELECT (sometable.x)::integer AS x
FROM sometable;
(1 row)
It'd be pretty useless anyway, since the standard fails to specify some pretty common and important pieces of functionality and often has rather ambiguous specifications of things it does define. Until recently it didn't define a way to limit the number of rows returned by a query, for example, so every database had its own different syntax (TOP, LIMIT / OFFSET, etc).
Other things the standard specifies are not implemented by most vendors, so using them is pretty pointless. Good luck using the SQL-standard generated and identity columns across all DB vendors.
It'd be quite nice to have a "prefer standard spelling" dump mode, that used CAST instead of ::, etc, but it's really not simple to do because some transformations aren't 1:1 reversible, e.g.:
regress=> CREATE VIEW v AS SELECT '1234' SIMILAR TO '%23%';
CREATE VIEW
regress=> SELECT pg_get_viewdef('v');
SELECT ('1234'::text ~ similar_escape('%23%'::text, NULL::text));
or:
regress=> CREATE VIEW v2 AS SELECT extract(dow FROM current_date);
CREATE VIEW
regress=> SELECT pg_get_viewdef('v2');
SELECT date_part('dow'::text, ('now'::text)::date) AS date_part;
so you see that significant changes would need to be made to how PostgreSQL internally represents and works with functions and expressions before what you want would be possible.
Lots of the SQL standard stuff uses funky one-off syntax that PostgreSQL converts into function calls and casts during parsing, so it doesn't have to add special case features every time the SQL committee have another brain-fart and pull some new creative bit of syntax out of ... somewhere. Changing that would require adding tons of new expression node types and general mess, all for no real gain.
Perhaps more to the point is the question of whether it is possible to write SQL that is supported by all RDBMS with the same resulting behaviour?
No, not even for many simple statements:
select top 10 ... -- T-SQL (SQL Server)
select ... limit 10 -- MySQL, PostgreSQL, SQLite
Many more examples exist. Use an ORM or something similar if you want to insulate yourself from database choice.
If you do write SQL by hand, then trying to follow the SQL standard is always a good choice :-)
You could use a tool like Mimer's SQL Validator to validate that queries follow the SQL spec before running them:
http://developer.mimer.com/validator/parser92/index.tml
You could force users to write queries in HQL or JPQL, which would then get translated into the correct SQL dialect for your database.

Should you use quotes on system identifiers in your query?

It may depend on the database type, but is there a preference (by the database and not the coder) or is it better to use quotes? Is it:
Faster?
Less error prone?
Helps prevent injection (if using a PDO or not)?
Assume there is nothing requiring the use (spaces, reserved words, etc.).
MySQL:
SELECT `id` FROM `table` WHERE `name` = '$name';
ANSI:
SELECT "id" FROM "table" WHERE "name" = '$name';
vs:
SELECT id FROM table WHERE name = $name;
This answer talks about the requirement to use quotes in MySQL, but I'm interested in when it's not required by the db, but it might be preferred/better for the aspects (and perhaps more) that I listed above.
Quotes around identifiers are used only during the query parsing stage - a stage that goes through the SQL statement, and figures out its syntactic elements. Compared to other stages (query optimization, query execution, and passing the results back to the caller) the parsing stage is relatively short. Therefore, you should not expect any measurable speedup or slowdown from using quotes around your identifiers, regardless of your particular RDBMS.
As far as being more or less error prone goes, missing quotes around multipart identifiers become apparent very quickly during the development stage, so the practice of placing quotes everywhere is not worth the trouble, because readability for humans suffers significantly.
Finally, adding quotes around identifiers would not help you prevent injection attacks; the same goes for not placing quotes around identifiers.
The only situation where quoting all identifiers is a good idea is when you generate SQL programmatically, and the results are not intended for human readers. Many SQL script generators take this route to avoid if statements all over the script testing whether an identifier is multipart or not.
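The injection point can be checked with a small experiment. This is a sketch using Python's sqlite3 rather than PDO or MySQL; the users table and the hostile input are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "users" ("id" INTEGER, "name" TEXT)')
conn.executemany('INSERT INTO "users" VALUES (?, ?)', [(1, "alice"), (2, "bob")])

hostile = "nobody' OR '1'='1"

# Quoted identifiers, value spliced into the SQL text:
# the injected OR clause still matches every row.
unsafe = conn.execute(
    f"""SELECT "id" FROM "users" WHERE "name" = '{hostile}'"""
).fetchall()

# Unquoted identifiers, value bound as a parameter:
# the hostile string is compared as plain data and matches nothing.
safe = conn.execute("SELECT id FROM users WHERE name = ?", (hostile,)).fetchall()

print(unsafe, safe)
```

What protects the query is binding the value as a parameter, not how the identifiers are written.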
I am not a fan of using quoted identifiers. They take too long to type and they clutter the code, making it harder for humans to read.
Also, I prefer to discourage the use of reserved words as identifiers in SQL. To me, this is just good practice. I mean, who wants to read something like:
select `select`, `from` as `as`
from <some really messed up table>
I use the bottom version because I find it easier to read, but it depends on how you echo your query. There is no difference in performance, since the parser reads past the quotes unless they are needed, so you will not notice any unless you are writing a WHERE clause that contains more than one variable.
I'd say it's faster NOT to use quotes for numeric identifiers; why add the potential overhead of a data type conversion to a query you want (presumably) to be as performant as possible?

Using backquote/backticks for mysql queries

I have been building MySQL queries with backticks. For example,
SELECT `title` FROM `table` WHERE (`id` = 3)
as opposed to:
SELECT title FROM table WHERE (id = 3)
I think I got this practice from the phpMyAdmin exports, and from what I understood, even Rails generates its queries like this.
But nowadays I see fewer and fewer queries built like this, and also, the code looks messier and more complicated with backticks in queries. Even with SQL helper functions, things would be simpler without them. Hence, I'm considering leaving them behind.
I wanted to find out if there are other implications of this practice, such as SQL (MySQL in my case) interpretation speed, etc. What do you think?
Backticks also allow spaces and other special characters (except for backticks, obviously) in table/column names. They're not strictly necessary but a good idea for safety.
If you follow sensible rules for naming tables and columns backticks should be unnecessary.
Every time I see this discussed, I try to lobby for their inclusion, because, well, the answer is hidden in here already, although wryly winked away without further thought. When we mistakenly use a keyword as a field or table name, we can escape confusion by various methods, but only the keenly aware back-tick ` allows an even greater benefit!!!
Every word in a SQL statement is run through the entire keyword hash table to see if it conflicts; therefore, you've done your query a great favor by telling the compiler that, hey, I know what I'm doing, you don't need to check these words because they represent table and field names. Speed and elegance.
Cheers,
Brad
backticks are used to escape reserved keywords in your mysql query, e.g. you want to have a count column—not that uncommon.
you can use other special characters or spaces in your column/table/db names
they do not keep you safe from injection attacks (if you allow users to enter column names in some way—bad practice anyway)
they are not standardized sql and will only work in mysql; other dbms will use " instead
Well, if you ensure that you never accidentally use a keyword as an identifier, you don't need the backticks. :-)
You can read the documentation on identifiers at http://dev.mysql.com/doc/refman/5.6/en/identifiers.html
SQL generators will often include backticks, as it is simpler than including a list of all MySQL reserved words. To use any sequence of BMP Unicode characters except U+0000 as an identifier [1], they can simply:
Replace all backticks with double backticks
Surround that with single backticks
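The two steps above can be sketched as a small helper. The function name quote_mysql_identifier is hypothetical, and rejecting U+0000 is my reading of the restriction mentioned above rather than something taken from a particular generator.

```python
def quote_mysql_identifier(name: str) -> str:
    """Quote an identifier for MySQL by doubling and wrapping backticks."""
    if "\x00" in name:
        # U+0000 is the one character the rule above excludes.
        raise ValueError("identifier must not contain U+0000")
    # Step 1: replace every backtick with a double backtick.
    escaped = name.replace("`", "``")
    # Step 2: surround the result with single backticks.
    return "`" + escaped + "`"

print(quote_mysql_identifier("date"))
print(quote_mysql_identifier("My Field"))
print(quote_mysql_identifier("odd`name"))
```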
When writing handmade queries, I know (most of) MySQL's reserved words, and I prefer to not use backticks where possible as it is shorter and IMO easier to read.
Most of the time, it's just a style preference -- unless of course, you have a field like date or My Field, and then you must use backticks.
[1] Though see https://bugs.mysql.com/bug.php?id=68676
My belief was that the backticks were primarily used to prevent erroneous queries which utilized common SQL identifiers, i.e. LIMIT and COUNT.

Plain SQL vs Dialects

DBMS Vendors use SQL dialect features to differentiate their product, at the same time claiming to support SQL standards. 'Nuff said on this.
Is there any example of SQL you have coded that can't be translated to SQL:2008 standard SQL?
To be specific, I'm talking about DML (a query statement), NOT DDL, stored procedure syntax or anything that is not a pure SQL statement.
I'm also talking about queries you would use in Production, not for ad-hoc stuff.
Edit Jan 13
Thanks for all of your answers: they have conveyed to me an impression that a lot of the DBMS-specific SQL is created to allow work-arounds for poor relational design. Which leads me to the conclusion you probably wouldn't want to port most existing applications.
Typical differences include subtly different semantics (for example Oracle handles NULLs differently from other SQL dialects in some cases), different exception handling mechanisms, different types and proprietary methods for doing things like string operations, date operations or hierarchical queries. Query hints also tend to have syntax that varies across platforms, and different optimisers may get confused on different types of constructs.
One can use ANSI SQL for the most part across database systems and expect to get reasonable results on a database with no significant tuning issues like missing indexes. However, on any non-trivial application there is likely to be some requirement for code that cannot easily be done portably.
Typically, this requirement will be fairly localised within an application code base - a handful of queries where this causes an issue. Reporting is much more likely to throw up this type of issue and doing generic reporting queries that will work across database managers is very unlikely to work well. Some applications are more likely to cause grief than others.
Therefore, it is unlikely that relying on 'portable' SQL constructs for an application will work in the general case. A better strategy is to use generic statements where they will work and break out to a database specific layer where this does not work.
A generic query mechanism could be to use ANSI SQL where possible; another possible approach would be to use an O/R mapper, which can take drivers for various database platforms. This type of mechanism should suffice for the majority of database operations but will require you to do some platform-specific work where it runs out of steam.
You may be able to use stored procedures as an abstraction layer for more complex operations and code a set of platform specific sprocs for each target platform. The sprocs could be accessed through something like ADO.net.
In practice, subtle differences in parameter passing and exception handling may cause problems with this approach. A better approach is to produce a module that wraps the platform-specific database operations with a common interface. Different 'driver' modules can be swapped in and out depending on what DBMS platform you are using.
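A minimal sketch of what such swappable driver modules might look like, using the row-limiting difference as the platform-specific operation. All class and function names here are hypothetical, and the table and column names are treated as trusted constants rather than user input.

```python
from abc import ABC, abstractmethod

class Dialect(ABC):
    """Common interface; each subclass hides one platform's syntax."""

    @abstractmethod
    def first_rows(self, table: str, order_by: str, n: int) -> str:
        """Return SQL selecting the first n rows of table."""

class SqlServerDialect(Dialect):
    def first_rows(self, table, order_by, n):
        return f"SELECT TOP {int(n)} * FROM {table} ORDER BY {order_by}"

class LimitDialect(Dialect):
    """Covers products that accept LIMIT, e.g. MySQL, PostgreSQL, SQLite."""

    def first_rows(self, table, order_by, n):
        return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {int(n)}"

# The application picks a driver once, from configuration.
DRIVERS = {
    "sqlserver": SqlServerDialect(),
    "postgresql": LimitDialect(),
    "mysql": LimitDialect(),
}

def top_customers_sql(platform: str, n: int) -> str:
    return DRIVERS[platform].first_rows("customers", "sales_total", n)

print(top_customers_sql("sqlserver", 10))
print(top_customers_sql("postgresql", 10))
```

The rest of the application calls top_customers_sql and never sees TOP or LIMIT directly.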
Oracle has some additions, such as model or hierarchical queries that are very difficult, if not impossible, to translate into pure SQL
Even when SQL:2008 can do something sometimes the syntax is not the same. Take the REGEXP matching syntax for example, SQL:2008 uses LIKE_REGEX vs MySQL's REGEXP.
And yes, I agree, it's very annoying.
Part of the problem with Oracle is that it's still based on the SQL 1992 ANSI standard. SQL Server is on SQL 1999 standard, so some of the things that look like "extensions" are in fact newer standards. (I believe that the "OVER" clause is one of these.)
Oracle is also far more restrictive about placing subqueries in SQL. SQL Server is far more flexible and permissive about allowing subqueries almost anywhere.
SQL Server has a rational way to select the "top" row of a result: "SELECT TOP 1 * FROM CUSTOMERS ORDER BY SALES_TOTAL". In Oracle, this becomes "SELECT * FROM (SELECT * FROM CUSTOMERS ORDER BY SALES_TOTAL) WHERE ROWNUM <= 1".
And of course there's always Oracle's infamous SELECT (expression) FROM DUAL.
Edit to add:
Now that I'm at work and can access some of my examples, here's a good one. This is generated by LINQ-to-SQL, but it's a clean query to select rows 41 through 50 from a table, after sorting. It uses the "OVER" clause:
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 40 + 1 AND 40 + 10
ORDER BY [t1].[ROW_NUMBER]
Common here on SO
ISNULL (SQL Server)
NVL (Oracle)
IFNULL (MySQL, DB2?)
COALESCE (ANSI)
To answer exactly:
ISNULL can easily give different results from COALESCE on SQL Server because of data type precedence.
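As a quick illustration of the spelling differences listed above, the sketch below runs the ANSI form and one vendor form against SQLite through Python's sqlite3. SQLite is only a stand-in; whether the vendor spelling is accepted depends entirely on the engine, so the code records the outcome instead of assuming it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ANSI spelling: returns the first non-NULL argument.
portable = conn.execute("SELECT COALESCE(NULL, 'fallback')").fetchone()[0]

# Vendor spelling (Oracle's NVL): may or may not exist on a given engine.
try:
    conn.execute("SELECT NVL(NULL, 'fallback')").fetchone()
    nvl_supported = True
except sqlite3.OperationalError:
    nvl_supported = False

print(portable, nvl_supported)
```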

Why can you have a column named ORDER in DB2?

In DB2, you can name a column ORDER and write SQL like
SELECT ORDER FROM tblWHATEVER ORDER BY ORDER
without even needing to put any special characters around the column name. This is causing me pain that I won't get into, but my question is: why do databases allow the use of SQL keywords for object names? Surely it would make more sense to just not allow this?
I largely agree with the sentiment that keywords shouldn't be allowed as identifiers. Most modern computing languages have 20 or maybe 30 keywords, in which case imposing a moratorium on their use as identifiers is entirely reasonable. Unfortunately, SQL comes from the old COBOL school of languages ("computing languages should be as similar to English as possible"). Hence, SQL (like COBOL) has several hundred keywords.
I don't recall if the SQL standard says anything about whether reserved words must be permitted as identifiers, but given the extensive (excessive!) vocabulary it's unsurprising that several SQL implementations permit it.
Having said that, using keywords as identifiers isn't half as silly as the whole concept of quoted identifiers in SQL (and these aren't DB2 specific). Permitting case sensitive identifiers is one thing, but quoted identifiers permit all sorts of nonsense including spaces, diacriticals and in some implementations (yes, including DB2), control characters! Try the following for example:
CREATE TABLE "My
Tablé" ( A INTEGER NOT NULL );
Yes, that's a line break in the middle of an identifier along with an e-acute at the end... (which leads to interesting speculation on what encoding is used for database meta-data and hence whether a non-Unicode database would permit, say, a table definition containing Japanese column names).
Many SQL parsers (especially DB2/z, which I use) are smarter than some of the regular parsers which sometimes separate lexical and semantic analysis totally (this separation is mostly a good thing).
The SQL parsers can figure out based on context whether a keyword is valid or should be treated as an identifier.
Hence you can get columns called ORDER or GROUP or DATE (that's a particularly common one).
It does annoy me with some of the syntax coloring editors when they brand an identifier with the keyword color. Their parsers aren't as 'smart' as the ones in DB2.
Because object names are ... names. All database systems let you use quoted names to stop you from running into trouble.
If you are running into issues, the fault lies not with the practice of permitting object names to be names, but with faulty implementations, or with faulty code libraries which don't automatically quote everything or cannot be made to quote names as-needed.
Interestingly, you can use keywords as field names in SQL Server as well. The only difference is that you would need to put square brackets around the name of the field,
so you can do something like
create table [order](
id int,
[order] varchar(50) )
and then :)
select
[order]
from
[order]
order by [order]
That is of course a rather extreme example, but at least with the square brackets you can see that [order] is not a keyword.
The reason I would see people using names already reserved by keywords is when there is a direct mapping between column names, or names of the tables and the data presentation. You can call that being lazy or convenient.