How to create portable inserts from SQL Server?

Now it generates inserts like
INSERT [Bla] ([id], [description], [name], [version])
VALUES (CAST(1 AS Numeric(19, 0)), convert(t...
It's very SQL Server specific. I would like to create a script that everybody can use, database agnostic. I have very simple data types - varchars, numbers, dates, bits(boolean).
I think
insert into bla values (1, 'die', '2001-01-01 11:11:11')
should work in all DBMSs, right?

Some basic rules:
Get rid of the square brackets. In your case they are not needed - not even in SQL Server. (At the same time make sure you never use reserved words or special characters in column or table names).
If you do need to use special characters or reserved words (which is not something I would recommend), then use the standard double quotes (e.g. "GROUP").
But remember that quoted names are then case-sensitive: my_table is the same as MY_TABLE, but "my_table" is different from "MY_TABLE" according to the standard. Again, this might vary between DBMSs and their configuration.
The CAST operator is standard and works on most DBMS (although not all support casting in all possible combinations).
convert() is SQL Server specific and should be replaced with an appropriate CAST expression.
Try to specify values in the correct data type, never rely on implicit data conversion (so do not use '1' for a number). Although I don't think casting a 1 to a numeric() should be needed.
Usually I also recommend to use ANSI literals (e.g. DATE '2011-03-14') for DATE/TIMESTAMP literals, but SQL Server does not support that. So it won't help you very much.
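As a rough illustration of the rules above, here is a minimal Python sketch that renders rows as plain INSERT statements without brackets, CAST or convert(). The function names (sql_literal, portable_insert) and the column name "created" are made up for this example, and rendering booleans as 1/0 is an assumption that suits SQL Server bit columns but not every DBMS.

```python
import datetime
from decimal import Decimal


def sql_literal(value):
    """Render a Python value as a plain SQL literal (no CAST, no CONVERT)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # Assumption: 1/0 suits bit columns; some DBMSs want TRUE/FALSE instead.
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Decimal):
        # Fixed-point notation, so no exponent ever appears in the script.
        return format(value, "f")
    if isinstance(value, datetime.datetime):
        # Checked before date, because datetime is a subclass of date.
        return "'" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    if isinstance(value, datetime.date):
        return "'" + value.isoformat() + "'"
    if isinstance(value, str):
        # Standard SQL escapes a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def portable_insert(table, columns, row):
    """Build one INSERT statement with an explicit column list."""
    cols = ", ".join(columns)
    vals = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"


stmt = portable_insert(
    "bla",
    ["id", "description", "created"],
    [1, "die", datetime.datetime(2001, 1, 1, 11, 11, 11)],
)
print(stmt)
```

The explicit column list is a deliberate choice: it keeps the script working even if the target table declares its columns in a different order.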

A quick glance at the Wikipedia article on SQL will tell you a bit about the standardisation of SQL across different implementations, such as MS SQL, PostgreSQL, Oracle etc.
In short, there are a number of ANSI standards, but support for them varies from product to product.
The general way to support multiple database servers from your software product is to accept there are differences, code for them at the database level, and make your application able to call the same database access code irrespective of database server.
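A minimal sketch of that idea in Python, using identifier quoting as the difference to isolate: the application always calls select_all(), and the per-dialect function does the product-specific part. The dialect keys and function names are hypothetical, chosen only for this illustration.

```python
def quote_standard(name):
    # Standard SQL: double quotes, with an embedded quote doubled.
    return '"' + name.replace('"', '""') + '"'


def quote_mysql(name):
    # MySQL default mode: backticks, with an embedded backtick doubled.
    return "`" + name.replace("`", "``") + "`"


def quote_mssql(name):
    # SQL Server: square brackets, with a closing bracket doubled.
    return "[" + name.replace("]", "]]") + "]"


# One entry per supported server; the keys are made-up labels.
QUOTERS = {
    "standard": quote_standard,
    "mysql": quote_mysql,
    "mssql": quote_mssql,
}


def select_all(table, dialect):
    """Same call for every server; only the quoting differs underneath."""
    return f"SELECT * FROM {QUOTERS[dialect](table)}"


for dialect in QUOTERS:
    print(dialect, "->", select_all("group", dialect))
```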

Number formats cause several problems when porting between DBMSs, but these pale next to the problems with dates and date formats. For instance, the default DATE format used in an Oracle database depends on how whoever installed the software configured it. You can use date conversion functions to get Oracle to accept the common date formats, but these functions are Oracle-specific.
Besides how do you know the table and column names will be the same on the target DB?
If you are serious about this, really need to port data between heterogeneous DBMSs, and know a bit of Perl, then try SQLFairy (the SQL::Translator distribution), which is available from CPAN. The sheer size of this download should be enough to convince you how complex this problem can be.

Related

Problems with BETWEEN dates operator

I am practising and experimenting with different syntaxes of the SQL BETWEEN operator with dates, using the examples from https://www.w3schools.com/sql/sql_between.asp
This is the Order table in my database:
LINK: https://www.w3schools.com/sql/sql_between.asp
The query is fetching the orderdates between a given condition of 2 dates.
These are the two main syntax versions (according to w3schools):
SELECT *
FROM Orders
WHERE OrderDate BETWEEN #01/07/1996# AND #31/07/1996#;
and:
SELECT *
FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';
The output that we get on typing the above two queries from the Orders table
Number of Records: 22 (out of 196 records). Yes this is correct.
Now I am experimenting with these syntax versions.
CASE #1:
SELECT *
FROM Orders
WHERE OrderDate BETWEEN #1996/07/01# AND #1996/07/31#;
Result of case #1: 22 (same as the above syntax)
In the SQL try-it-out editor (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_between_date&ss=-1) they state that this SQL statement is not supported in the WebSQL database. The example still works, because it uses a modified version of SQL.
WHY SO?
If you're using the W3Schools Tryit editor in Chrome, you're using WebSQL, which is basically SQLite.
SQLite doesn't have a dedicated date/time data type, so it is probably storing the date values as strings in ISO-8601 format (see this answer for more information).
Other database systems (e.g. Oracle, Microsoft SQL Server, Postgres, MySQL) have built-in date formats, and you generally represent them as strings (enclosed in single quotes). For example: '1997-07-01' (depending on the specific RDBMS, there might be more specific considerations).
The format that uses pound signs (e.g. #7/1/1997#) is unique to Microsoft Access (see this answer for more information).
Bottom line: Dates are generally enclosed in single quotes. You're best off sticking to the ISO-8601 standard (e.g. 1997-07-01).
If you're learning SQL, there are other resources out there besides W3Schools. I would recommend downloading an open-source RDBMS like Postgres or MySQL, setting up a sample database, and working on some queries. Challenge sites like Codewars might also be helpful.
One more thing: Don't use BETWEEN for dates. Use >= and <, to make sure you're not excluding dates with a time portion. For more information, read this blog.
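To make that last point concrete, here is a small sketch using Python's built-in sqlite3 module. The table and sample rows are invented for the demonstration. Note that SQLite compares these values as text; in a DBMS with a real datetime type the upper bound '1996-07-31' would mean midnight, which drops the afternoon row in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "1996-07-01 09:00:00"),
        (2, "1996-07-31 15:30:00"),  # last day of July, with a time portion
        (3, "1996-08-01 00:00:00"),  # first instant of August
    ],
)

# BETWEEN is inclusive on both ends, but the upper bound has no time part,
# so anything later than midnight on 31 July falls outside it.
between_ids = [
    r[0]
    for r in conn.execute(
        "SELECT order_id FROM orders "
        "WHERE order_date BETWEEN '1996-07-01' AND '1996-07-31' "
        "ORDER BY order_id"
    )
]

# Half-open range: from the first day, up to but excluding the next month.
half_open_ids = [
    r[0]
    for r in conn.execute(
        "SELECT order_id FROM orders "
        "WHERE order_date >= '1996-07-01' AND order_date < '1996-08-01' "
        "ORDER BY order_id"
    )
]

print(between_ids)    # order 2 is missing
print(half_open_ids)  # all of July, nothing from August
```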

Are there any performance downsides to using ODBC date string-literals in SQL Server? Is it better than a regular string date literal?

I have a TSQL view that processes multiple gigabytes of data in a SQL Server 2016 environment. In this view, there are multiple times where I am comparing if a DateTime value is before/after a static date, traditionally represented as a string literal like '2018-07-11'.
An example comparison would be:
SELECT MyId, MyValue FROM MyTable WHERE MyDate = '2018-07-11'
While looking for a way to use a DateTime literal instead of a string, I came across examples using ODBC DateTime strings like so:
SELECT MyId, MyValue FROM MyTable WHERE MyDate = {d '2018-07-11'}
When I compare the query plan I get the same result, even when I make up more advanced queries.
I started using this format in an attempt to prevent the auto-conversion of string to DateTime in queries, but I haven't been able to find any good documentation explaining any side effects of using ODBC functions. I'm not sure if this acts the same way as a string literal or if it is interpreted as a date.
If this was a UDF or Stored Procedure, I'd have the ability to declare a DateTime variable for use in the query, but in a VIEW this is not possible, nor would it be feasible because there are a lot of DateTime literals in the actual version of the query.
So in conclusion, does someone have any concrete reasons for or against using this {d '2018-07-11'} format (besides it potentially not being valid in a non SQL Server environment)?
I want to ensure that I'm not shooting myself in the foot here on a code review.
PS: I apologize for the vague examples and semi-open-ended question, I am not allowed to disclose any actual source code.
Thanks!
EDIT: I forgot to mention that I could also use DATEFROMPARTS(2018, 07, 11), but I wasn't sure if this would be looked at weirdly by the query optimizer.
The ODBC literal has the slight advantage that it can never be interpreted as YYYY-DD-MM, which is possible with one internationalization setting.
You can avoid ambiguity by using 'YYYYMMDD' format. This format is not affected by settings.
I prefer not using the ODBC, just because it seems to involve more clutter in the query. I admit to also preferring the hyphenated form (consistent with the ISO standard and other databases). But you have three alternatives. Possibly the safest for general purpose, SQL-Server-only code is the unhyphenated form.
A literal is a literal. It is transformed into a value during parsing. The value is used later.
Here is the list of DateTime literals that SQL Server supports. ODBC is a supported format.
So, if only using SQL Server then there is no difference. Different SQL flavors may reject the ODBC syntax. I do not believe it is ANSI SQL, so "less standard"?
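For reference, the three spellings discussed above can be produced from a single date value. This is only a string-formatting sketch in Python; the function name and dictionary keys are made up, and nothing here talks to SQL Server.

```python
import datetime


def sql_server_date_literals(d):
    """Return the three literal spellings discussed above for one date."""
    iso = d.isoformat()
    return {
        # ISO-style form, consistent with other databases.
        "hyphenated": f"'{iso}'",
        # 'YYYYMMDD' form, described above as unaffected by settings.
        "unhyphenated": f"'{d:%Y%m%d}'",
        # ODBC escape sequence; other SQL flavours may reject it.
        "odbc": f"{{d '{iso}'}}",
    }


literals = sql_server_date_literals(datetime.date(2018, 7, 11))
for name, text in literals.items():
    print(f"{name}: WHERE MyDate = {text}")
```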

Why do SQLiteStudio (and others) not display a datetime in human-readable format by default?

Today I had to use a SQLite database for the first time and I really wondered about the display of a DATETIME column like 1411111200. Of course, internally it has to be stored as some integer value to be able to do math with it. But who wants to see that in a grid output, which is clearly for human eyes?
I even tried two programs, SQLiteStudio and SQLite Manager, and neither has an option to change this (at least I couldn't find one).
Of course with my knowledge about SQL it didn't take long to find out what the values mean - this query displays it like I expected:
select datetime(timestamp, 'unixepoch', 'localtime'), * from MyTable
But that's very uncomfortable when working with a GUI Tool. So why? Just because? Unix nerds? Or did I just get a wrong impression because I accidentally tried the only 2 Tools which are bad?
(I also appreciate comments on which tools to use or where I can find the hidden settings.)
Probably because sqlite doesn't have a first-class date type — how would a GUI tool know which columns are supposed to contain dates?
The question implies that a column of datatype DATETIME can only hold valid datetimes. But that's not true in SQLite: you can put any number or string value and it will be stored and displayed like it is.
To find out what the most "natural" way for a timestamp in SQLite would be, I created a table like this:
CREATE TABLE test ( timestamp DATETIME DEFAULT ( CURRENT_TIMESTAMP ) );
The result is a display in human readable format (2014-09-22 10:56:07)! But in fact it is saved as string, and I cannot imagine any serious software developer who would like that. Any comments?
That original database from the question, having datetimes as unixepoch, is not because of its table definition, but because the inserted data was like that. And that was probably the best possible option how to do it.
So, the answer is, those tools cannot display the datetime in human readable format, because they cannot know how it was encoded. It can be the number of seconds since 1970 or anything else, and it could even be different from row to row. What a mess.
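A short sketch with Python's sqlite3 module shows the problem a GUI tool faces. The table name and sample values are invented; the point is only that a column declared DATETIME happily stores an epoch number, an ISO string and arbitrary text side by side.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts DATETIME)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [(1411111200,), ("2014-09-22 10:56:07",), ("not a date",)],
)

# typeof() reports how each individual value is actually stored.
stored_types = [
    r[0] for r in conn.execute("SELECT typeof(ts) FROM events ORDER BY rowid")
]
print(stored_types)

# Decoding only works once you know the encoding; here we assume
# Unix epoch seconds and ask for UTC (no 'localtime' modifier).
decoded = conn.execute(
    "SELECT datetime(ts, 'unixepoch') FROM events WHERE typeof(ts) = 'integer'"
).fetchone()[0]
print(decoded)

# Cross-check against Python's own conversion of the same epoch value.
expected = datetime.datetime.fromtimestamp(
    1411111200, datetime.timezone.utc
).strftime("%Y-%m-%d %H:%M:%S")
print(decoded == expected)
```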
From Wikipedia:
A common criticism is that SQLite's type system lacks the data
integrity mechanism provided by statically typed columns in other
products. [...] However, it can be implemented with constraints
like CHECK(typeof(x)='integer').
From the authors:
[...] most other SQL database engines are statically typed and so some
people feel that the use of manifest typing is a bug in SQLite. But
the authors of SQLite feel very strongly that this is a feature. The
use of manifest typing in SQLite is a deliberate design decision which
has proven in practice to make SQLite more reliable and easier to use,
especially when used in combination with dynamically typed programming
languages such as Tcl and Python.
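The CHECK(typeof(x)='integer') idea from the Wikipedia quote can be tried out directly. This is a minimal sketch with Python's sqlite3 module; the table and column names are made up. One caveat: because the column is declared INTEGER, a string that looks like a whole number (such as '123') is converted before the check runs and would be accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (ts INTEGER CHECK (typeof(ts) = 'integer'))"
)

# An epoch value is stored as an integer and passes the check.
conn.execute("INSERT INTO readings VALUES (?)", (1411111200,))

# A formatted date string stays text, so the constraint rejects it.
try:
    conn.execute("INSERT INTO readings VALUES (?)", ("2014-09-22 10:56:07",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(rejected, row_count)
```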

What does the SQL Standard say about usage of backtick(`)?

Once I spent hours debugging a simple SQL query using mysql_query() in PHP/MySQL, only to realise that I had missed a backtick around the table name. Since then I have always used it around table names.
But when I used the same in SQLite/C++, the symbol is not even recognized. It's confusing, whether to use this or not? What does standard say about usage of it?
Also, it would be helpful if anyone could tell me when to use quotes and when not. I mean around values and field names.
The SQL standard (current version is ISO/IEC 9075:2011, in multiple parts) says nothing about the 'back-tick' or 'back-quote' symbol (Unicode U+0060 or GRAVE ACCENT); it doesn't recognize it as a character with special meaning that can appear in SQL.
The Standard SQL mechanism for quoting identifiers is with delimited identifiers enclosed in double quotes:
SELECT "select" FROM "from" WHERE "where" = "group by";
In MySQL, that might be written:
SELECT `select` FROM `from` WHERE `where` = `group by`;
In MS SQL Server, that might be written:
SELECT [select] FROM [from] WHERE [where] = [group by];
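The double-quoted form can be tried against SQLite through Python's sqlite3 module. This is only a sketch: the table deliberately uses keywords as names, mirroring the query above, and the two sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every identifier here is a keyword (or contains a space), so each one
# is written as a delimited identifier in double quotes.
conn.execute(
    'CREATE TABLE "from" ("select" INTEGER, "where" TEXT, "group by" TEXT)'
)
conn.executemany(
    'INSERT INTO "from" ("select", "where", "group by") VALUES (?, ?, ?)',
    [(1, "x", "x"), (2, "x", "y")],
)

# Compares the column "where" with the column "group by", row by row.
selected = [
    r[0]
    for r in conn.execute(
        'SELECT "select" FROM "from" WHERE "where" = "group by"'
    )
]
print(selected)
```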
The trouble with the SQL Standard notation is that C programmers are used to enclosing strings in double quotes, so most DBMS use double quotes as an alternative to the single quotes recognized by the standard. But that then leaves you with a problem when you want to enclose identifiers.
Microsoft took one approach; MySQL took another; Informix allows interchangeable use of single and double quotes, but if you want delimited identifiers, you set an environment variable and then you have to follow the standard (single quotes for strings, double quotes for identifiers); DB2 only follows the standard, AFAIK; SQLite appears to follow the standard; Oracle also appears to follow the standard; Sybase appears to allow either double quotes (standard) or square brackets (as with MS SQL Server, which means SQL Server might allow double quotes too). This page (link AWOL since 2013; now available in The Wayback Machine) documented all these servers (and was helpful in filling the gaps in my knowledge) and notes whether the strings inside delimited identifiers are case-sensitive or not.
As to when to use a quoting mechanism around identifiers, my attitude is 'never'. Well, not quite never, but only when absolutely forced into doing so.
Note that delimited identifiers are case-sensitive; that is, "from" and "FROM" refer to different columns (in most DBMS — see URL above). Most of SQL is not case-sensitive; it is a nuisance to know which case to use. (The SQL Standard has a mainframe orientation — it expects names to be converted to upper-case; most DBMS convert names to lower-case, though.)
In general, you must delimit identifiers which are keywords to the version of SQL you are using. That means most of the keywords in Standard SQL, plus any extras that are part of the particular implementation(s) that you are using.
One continuing source of trouble is when you upgrade the server, where a column name that was not a keyword in release N becomes a keyword in release N+1. Existing SQL that worked before the upgrade stops working afterwards. Then, at least as a short-term measure, you may be forced into quoting the name. But in the ordinary course of events, you should aim to avoid needing to quote identifiers.
Of course, my attitude is coloured by the fact that Informix (which is what I work with mostly) accepts this SQL verbatim, whereas most DBMS would choke on it:
CREATE TABLE TABLE
(
DATE INTEGER NOT NULL,
NULL FLOAT NOT NULL,
FLOAT INTEGER NOT NULL,
NOT DATE NOT NULL,
INTEGER FLOAT NOT NULL
);
Of course, the person who produces such a ridiculous table for anything other than demonstration purposes should be hung, drawn, quartered and then the residue should be made to fix the mess they've created. But, within some limits which customers routinely manage to hit, keywords can be used as identifiers in many contexts. That is, of itself, a useful form of future-proofing. If a word becomes a keyword, there's a moderate chance that the existing code will continue to work unaffected by the change. However, the mechanism is not perfect; you can't create a table with a column called PRIMARY, but you can alter a table to add such a column. There is a reason for the idiosyncrasy, but it is hard to explain.
Trailing underscore
You said:
it would be helpful if anyone could tell me when to use quotes and when not
Years ago I surveyed several relational database products looking for commands, keywords, and reserved words. Shockingly, I found over a thousand distinct words.
Many of them were surprisingly counter-intuitive as a "database word". So I feared there was no simple way to avoid unintentional collisions with reserved words while naming my tables, columns, and such.
Then I found this tip somewhere on the internets:
Use a trailing underscore in all your SQL naming.
Turns out the SQL specification makes an explicit promise to never use a trailing underscore in any SQL-related names.
Being copyright-protected, I cannot quote the SQL spec directly. But section 5.2.11 <token> and <separator> from a supposed-draft of ISO/IEC 9075:1992, Database Language SQL (SQL-92) says (in my own re-wording):
In the current and future versions of the SQL spec, no keyword will end with an underscore
➥ Though oddly dropped into the SQL spec without discussion, that simple statement to me screams out “Name your stuff with a trailing underscore to avoid all naming collisions”.
Instead of:
person
name
address
…use:
person_
name_
address_
Since adopting this practice, I have found a nice side-effect. In our apps we generally have classes and variables with the same names as the database objects (tables, columns, etc.). So an inherent ambiguity arises as to when referring to the database object versus when referring to the app state (classes, vars). Now the context is clear: When seeing a trailing underscore on a name, the database is specifically indicated. No underscore means the app programming (Java, etc.).
Further tip on SQL naming: For maximum portability, use all-lowercase with underscore between words, as well as the trailing underscore. While the SQL spec requires (not suggests) an implementation to store identifiers in all uppercase while accepting other casing, most/all products ignore this requirement. So after much reading and experimenting, I learned the all-lowercase with underscores will be most portable.
If using all-lowercase, underscores between words, plus a trailing underscore, you may never need to care about enquoting with single-quotes, double-quotes, back-ticks, or brackets.
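As a small illustration of that convention, here is a Python sketch that builds such names and then uses them unquoted in SQLite, even though "order" and "group" are keywords. The helper db_name is hypothetical, written only for this example.

```python
import re
import sqlite3


def db_name(label):
    """Build an all-lowercase, underscore-separated name with a trailing underscore."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable characters")
    if words[0][0].isdigit():
        raise ValueError("an unquoted identifier cannot start with a digit")
    return "_".join(w.lower() for w in words) + "_"


table = db_name("Order")                             # keyword + underscore
columns = [db_name("Group"), db_name("Order Date")]  # likewise for columns

conn = sqlite3.connect(":memory:")
# No quoting of any kind is needed around these names.
conn.execute(f"CREATE TABLE {table} ({columns[0]} TEXT, {columns[1]} TEXT)")
conn.execute(
    f"INSERT INTO {table} ({columns[0]}, {columns[1]}) "
    "VALUES ('a', '1996-07-01')"
)
row = conn.execute(
    f"SELECT {columns[0]}, {columns[1]} FROM {table}"
).fetchone()
print(table, columns, row)
```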

Does the SQL standard allow whitespace between function names and parentheses?

Checking a few RDBMSs, I find that things like
SELECT COUNT (a), SUM (b)
FROM TABLE
are allowed (notice the space between the aggregate functions and the parentheses).
Could anyone provide a pointer to SQL standard itself where this is defined (any version will do)?
EDIT:
The above works in Postgres; MySQL needs set sql_mode = "IGNORE_SPACE"; as defined here (for the full list of functions affected by this server mode, see this ref).
MS SQL is reported to accept the above.
Also, it seems that the answer is most likely in the standard. I can follow the BNF regarding the regular symbols and terms, but I get lost when it comes to the definition of whitespace and separators in that part of the select.
Yes; the white space between tokens is substantially ignored. The only exception is, officially, with adjacent string literal concatenation - but the standard is weirder than any implementation would be.
See: http://savage.net.au/SQL/
This works in SQL Server 2005:
SELECT COUNT (*)
FROM TABLE
...while one space between COUNT and (*) on MySQL causes a MySQL 1064 error (syntax error). I don't have Oracle or Postgres handy to test.
Whatever the standard may be, it's dependent on implementation in the vendor and version you are using.
I can't provide a pointer, but I believe that white space like that is ignored.
I know that it is in T-SQL, and I'm about 80% certain about MySQL's implementation.
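Since behaviour depends on the product, the quickest answer for a given engine is to try it. Here is a sketch for SQLite via Python's sqlite3 module, with an invented table t. It says nothing about MySQL, Oracle or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (2, 20), (None, 30)],
)

# Same query, with and without a space before the opening parenthesis.
spaced = conn.execute("SELECT COUNT (a), SUM (b) FROM t").fetchone()
tight = conn.execute("SELECT COUNT(a), SUM(b) FROM t").fetchone()

print(spaced, tight, spaced == tight)
```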