Creating a table in NexusDB with German umlauts?

I'm trying to import a CREATE TABLE statement in NexusDB.
The table name contains German umlauts, and so do some field names, but I receive an error saying there are invalid characters in my statement (obviously the umlauts...).
Can somebody suggest a solution or ideas for solving this problem?
It's not easy to simply replace the umlauts with equivalent spellings like ä -> ae or ö -> oe, since our application has fixed table names that every customer currently uses.

It is not a good idea to use characters outside what is normally permitted in the SQL standard. This will bite you not only in NexusDB, but in many other databases as well. Take special note that there is a good chance you will also run into problems when you want to access data via ODBC etc, as other environments may also have similar standard restrictions. My strong recommendation would be to avoid use of characters outside the SQL naming standard for tables, no matter which database is used.
However... having said all that, given that NexusDB is one of the most flexible database systems for the programmer (it comes with full source), there is already a solution. If you add an "extendedliterals" define to your database server project, a larger set of characters is considered valid. For the exact change this enables, see the nxcValidIdentChars constant in the nxllConst.pas unit. The constant may also be changed if required.

Related

Why do SQLiteStudio (and others) not display a datetime in human-readable format by default?

Today I had to use a SQLite database for the first time and I really wondered about the display of a DATETIME column like 1411111200. Of course, internally it has to be stored as some integer value to be able to do math with it. But who wants to see that in a grid output, which is clearly for human eyes?
I even tried two programs, SQLiteStudio and SQLite Manager, and neither has an option to change this (at least I couldn't find one).
Of course with my knowledge about SQL it didn't take long to find out what the values mean - this query displays it like I expected:
select datetime(timestamp, 'unixepoch', 'localtime'), * from MyTable
But that's very uncomfortable when working with a GUI tool. So why? Just because? Unix nerds? Or did I just get a wrong impression because I happened to try the only two tools that are bad?
(I also appreciate comments on which tools to use or where I can find the hidden settings.)
Probably because sqlite doesn't have a first-class date type — how would a GUI tool know which columns are supposed to contain dates?
The question implies that a column of datatype DATETIME can only hold valid datetimes. But that's not true in SQLite: you can store any number or string value, and it will be stored and displayed as-is.
To find out what the most "natural" way for a timestamp in SQLite would be, I created a table like this:
CREATE TABLE test ( timestamp DATETIME DEFAULT ( CURRENT_TIMESTAMP ) );
The result is a display in human-readable format (2014-09-22 10:56:07)! But in fact it is saved as a string, and I cannot imagine any serious software developer who would like that. Any comments?
The original database from the question shows datetimes as Unix epoch values not because of its table definition, but because the data was inserted that way. And that was probably the best possible way to do it.
So, the answer is, those tools cannot display the datetime in human readable format, because they cannot know how it was encoded. It can be the number of seconds since 1970 or anything else, and it could even be different from row to row. What a mess.
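The behaviour described above is easy to verify. This sketch uses Python's built-in sqlite3 module with the column name and epoch value from the question (the second inserted value is a made-up illustration of manifest typing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (timestamp DATETIME)")

# Manifest typing: a DATETIME column happily stores an integer, a string, anything.
conn.execute("INSERT INTO MyTable VALUES (1411111200)")
conn.execute("INSERT INTO MyTable VALUES ('not a date at all')")

# The raw values come back exactly as stored...
rows = [r[0] for r in conn.execute("SELECT timestamp FROM MyTable")]
print(rows)  # [1411111200, 'not a date at all']

# ...and only an explicit conversion renders the epoch value as a datetime.
human = conn.execute(
    "SELECT datetime(timestamp, 'unixepoch') FROM MyTable LIMIT 1"
).fetchone()[0]
print(human)  # 2014-09-19 07:20:00 (UTC)
```

A GUI tool browsing this table sees one integer and one string in the same column, which is exactly why it cannot guess a display format.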
From Wikipedia:
A common criticism is that SQLite's type system lacks the data
integrity mechanism provided by statically typed columns in other
products. [...] However, it can be implemented with constraints
like CHECK(typeof(x)='integer').
From the authors:
[...] most other SQL database engines are statically typed and so some
people feel that the use of manifest typing is a bug in SQLite. But
the authors of SQLite feel very strongly that this is a feature. The
use of manifest typing in SQLite is a deliberate design decision which
has proven in practice to make SQLite more reliable and easier to use,
especially when used in combination with dynamically typed programming
languages such as Tcl and Python.

Is there a database that accepts special characters by default (without converting them)?

I am currently starting from scratch choosing a database to store data collected from a suite of web forms. Humans will be filling out these forms, and as they're susceptible to using international characters, especially those humans named José and François and أسامة and 布鲁斯, I wanted to start with a modern database platform that accepts all types (so to speak), without conversion.
Q: Do databases exist that, from the start, accept a wide diversity of the characters found in modern typefaces? If so, what are the drawbacks for a database that doesn't need to convert as much data in order to store it?
// Anticipating two answers that I'm not looking for:
I found many answers describing how someone could CONVERT (or encode) a special character, like é or the copyright symbol ©, into a database-legal character sequence such as &copy; (for ©) so that a database can then accept it. This requires a conversion/translation layer to shuttle data into and out of the database. I know that has to happen at some level, just as the letter z is reducible to 1's and 0's, but I'm really talking about finding a human-readable database, one that doesn't need to translate.
I also see suggestions that people change the character encoding of their current database to one that accepts a wider range of characters. This is a good solution for someone who is carrying over a legacy system and wants to make it relevant to the wider range of characters that early computers, and the early web, didn't anticipate. I'm not starting with a legacy system. I'm looking for some modern database options.
Yes, there are databases that support large character sets. How to accomplish this is different from one database to another. For example:
In MS SQL Server you can use the nchar, nvarchar and ntext data types to store Unicode (UCS-2) text.
In MySQL you can choose UTF-8 as encoding for a table, so that it will be able to store Unicode text.
For any database that you consider using, you should look for Unicode support to see if it can handle large character sets.
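As an illustrative sketch: SQLite stores TEXT as Unicode out of the box, so names like those in the question round-trip without any conversion layer (Python's sqlite3 is used here purely as a convenient harness; the table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")

# Names from the question: accented Latin, Arabic, and Chinese characters.
names = ["José", "François", "أسامة", "布鲁斯"]
conn.executemany("INSERT INTO people VALUES (?)", [(n,) for n in names])

stored = [row[0] for row in conn.execute("SELECT name FROM people")]
print(stored == names)  # True: stored and retrieved verbatim, no encoding step
```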

What does the SQL Standard say about usage of backtick(`)?

Once I spent hours debugging a simple SQL query using mysql_query() in PHP/MySQL, only to realise that I had missed the backticks around the table name. Since then I have always used them around table names.
But when I used the same in SQLite/C++, the symbol was not even recognized. It's confusing whether to use it or not. What does the standard say about its usage?
Also, it would be helpful if anyone could tell me when to use quotes and when not. I mean around values and field names.
The SQL standard (current version is ISO/IEC 9075:2011, in multiple parts) says nothing about the 'back-tick' or 'back-quote' symbol (Unicode U+0060 or GRAVE ACCENT); it doesn't recognize it as a character with special meaning that can appear in SQL.
The Standard SQL mechanism for quoting identifiers is with delimited identifiers enclosed in double quotes:
SELECT "select" FROM "from" WHERE "where" = "group by";
In MySQL, that might be written:
SELECT `select` FROM `from` WHERE `where` = `group by`;
In MS SQL Server, that might be written:
SELECT [select] FROM [from] WHERE [where] = [group by];
The trouble with the SQL Standard notation is that C programmers are used to enclosing strings in double quotes, so most DBMS use double quotes as an alternative to the single quotes recognized by the standard. But that then leaves you with a problem when you want to enclose identifiers.
Microsoft took one approach; MySQL took another. Informix allows interchangeable use of single and double quotes, but if you want delimited identifiers, you set an environment variable and then you have to follow the standard (single quotes for strings, double quotes for identifiers). DB2 only follows the standard, AFAIK. SQLite appears to follow the standard. Oracle also appears to follow the standard. Sybase appears to allow either double quotes (standard) or square brackets (as with MS SQL Server, which means SQL Server might allow double quotes too). This page (link AWOL since 2013; now available in The Wayback Machine) documented all these servers (and was helpful in filling gaps in my knowledge), noting whether the strings inside delimited identifiers are case-sensitive or not.
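The standard double-quote form is easy to see in action with SQLite (driven here through Python's sqlite3; the table and column names are deliberately keywords, echoing the examples above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard delimited identifiers: keywords become legal names inside double quotes.
conn.execute('CREATE TABLE "from" ("select" TEXT, "where" TEXT)')
conn.execute('INSERT INTO "from" VALUES (?, ?)', ("hello", "match"))

row = conn.execute(
    'SELECT "select" FROM "from" WHERE "where" = ?', ("match",)
).fetchone()
print(row[0])  # hello
```

As an aside, SQLite also accepts MySQL-style backticks and SQL-Server-style square brackets around identifiers as compatibility extensions, which is all the more reason for portable code to stick to the standard double-quote form.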
As to when to use a quoting mechanism around identifiers, my attitude is 'never'. Well, not quite never, but only when absolutely forced into doing so.
Note that delimited identifiers are case-sensitive; that is, "from" and "FROM" refer to different columns (in most DBMS — see URL above). Most of SQL is not case-sensitive; it is a nuisance to know which case to use. (The SQL Standard has a mainframe orientation — it expects names to be converted to upper-case; most DBMS convert names to lower-case, though.)
In general, you must delimit identifiers which are keywords to the version of SQL you are using. That means most of the keywords in Standard SQL, plus any extras that are part of the particular implementation(s) that you are using.
One continuing source of trouble is when you upgrade the server, where a column name that was not a keyword in release N becomes a keyword in release N+1. Existing SQL that worked before the upgrade stops working afterwards. Then, at least as a short-term measure, you may be forced into quoting the name. But in the ordinary course of events, you should aim to avoid needing to quote identifiers.
Of course, my attitude is coloured by the fact that Informix (which is what I work with mostly) accepts this SQL verbatim, whereas most DBMS would choke on it:
CREATE TABLE TABLE
(
DATE INTEGER NOT NULL,
NULL FLOAT NOT NULL,
FLOAT INTEGER NOT NULL,
NOT DATE NOT NULL,
INTEGER FLOAT NOT NULL
);
Of course, the person who produces such a ridiculous table for anything other than demonstration purposes should be hung, drawn, quartered and then the residue should be made to fix the mess they've created. But, within some limits which customers routinely manage to hit, keywords can be used as identifiers in many contexts. That is, of itself, a useful form of future-proofing. If a word becomes a keyword, there's a moderate chance that the existing code will continue to work unaffected by the change. However, the mechanism is not perfect; you can't create a table with a column called PRIMARY, but you can alter a table to add such a column. There is a reason for the idiosyncrasy, but it is hard to explain.
Trailing underscore
You said:
it would be helpful if anyone could tell me when to use quotes and when not
Years ago I surveyed several relational database products looking for commands, keywords, and reserved words. Shockingly, I found over a thousand distinct words.
Many of them were surprisingly counter-intuitive as a "database word". So I feared there was no simple way to avoid unintentional collisions with reserved words while naming my tables, columns, and such.
Then I found this tip somewhere on the internets:
Use a trailing underscore in all your SQL naming.
Turns out the SQL specification makes an explicit promise to never use a trailing underscore in any SQL-related names.
Being copyright-protected, I cannot quote the SQL spec directly. But section 5.2.11 <token> and <separator> from a supposed draft of ISO/IEC 9075:1992, Database Language SQL (SQL-92) says (in my own re-wording):
In the current and future versions of the SQL spec, no keyword will end with an underscore
➥ Though oddly dropped into the SQL spec without discussion, that simple statement to me screams out “Name your stuff with a trailing underscore to avoid all naming collisions”.
Instead of:
person
name
address
…use:
person_
name_
address_
Since adopting this practice, I have found a nice side-effect. In our apps we generally have classes and variables with the same names as the database objects (tables, columns, etc.). So an inherent ambiguity arises as to when referring to the database object versus when referring to the app state (classes, vars). Now the context is clear: When seeing a trailing underscore on a name, the database is specifically indicated. No underscore means the app programming (Java, etc.).
Further tip on SQL naming: For maximum portability, use all-lowercase with underscores between words, as well as the trailing underscore. While the SQL spec requires (not merely suggests) an implementation to store identifiers in all uppercase while accepting other casing, most/all products ignore this requirement. So after much reading and experimenting, I learned that all-lowercase with underscores will be most portable.
If using all-lowercase, underscores between words, plus a trailing underscore, you may never need to care about enquoting with single-quotes, double-quotes, back-ticks, or brackets.
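To illustrate the convention, here is a minimal sketch in SQLite via Python's sqlite3 (the schema and values are hypothetical): every name carries a trailing underscore, so no delimiting is ever needed, in any dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No quoting needed: person_, name_, address_ can never collide with a keyword,
# because the SQL spec promises no keyword will ever end with an underscore.
conn.execute("CREATE TABLE person_ (name_ TEXT, address_ TEXT)")
conn.execute(
    "INSERT INTO person_ (name_, address_) VALUES (?, ?)",
    ("Ada Lovelace", "12 Example St"),
)

name = conn.execute("SELECT name_ FROM person_").fetchone()[0]
print(name)  # Ada Lovelace
```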

What are the characters that count as the same character under collation of UTF8 Unicode? And what VB.net function can be used to merge them?

Also, what's the VB.NET function that will map all those different characters to their most standard form?
For example, ToLower would map A and a to the same character, right?
I need the same function for these characters
German
ß === s
Ü === u
Χιοσ == Χίος
Otherwise, sometimes I insert Χιοσ, and later when I insert Χίος, MySQL complains that the ID already exists.
So I want to create a unique ID that maps all those strange characters into a more stable one.
For the encoding aspect of the thing, look at String.Normalize. Notice also its overload that specifies a particular normal form to which you want to convert the string, but the default normal form (C) will work just fine for nearly everyone who wants to "map all those different characters into their most standard form".
However, things get more complicated once you move into the database and deal with collations.
Unicode normalization never changes character case. It covers only cases where the characters are basically equivalent: they look the same[1] and mean the same thing. For example,
Χιοσ != Χίος.
The two sigma characters are considered non-equivalent, while the accented iota (\u1F30) is equivalent to the sequence of two characters: the plain iota (\u03B9) followed by the accent (\u0313).
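These equivalences can be checked with any normalization API. This sketch uses Python's unicodedata rather than VB.NET's String.Normalize, but the underlying Unicode rules are the same:

```python
import unicodedata

# Canonical equivalence: plain iota + combining accent composes to \u1F30 under NFC.
assert unicodedata.normalize("NFC", "\u03b9\u0313") == "\u1f30"

# The two sigmas (σ vs ς) are distinct letters; no normal form unifies them,
# so the two spellings from the question stay different even after normalizing.
a = unicodedata.normalize("NFC", "Χιοσ")
b = unicodedata.normalize("NFC", "Χίος")
print(a == b)  # False
```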
Your real problem seems to be that you are using Unicode strings as primary keys, which is not the most popular database design practice. Such primary keys take up more space than needed and are bound to change over time (even if the initial version of the application does not plan to support that). Oh, and I forgot their sensitivity to collations. Instead of identifying records by Unicode strings, have the database schema generate meaningless sequential integers for you as you insert the records, and demote the Unicode strings to mere attributes of the records. This way they can be the same or different as you please.
It may still be useful to normalize them before storing for the purpose of searching and safer subsequent processing; but the particular case insensitive collation that you use will no longer restrict you in any way.
[1] Almost the same, in the case of compatibility normalization as opposed to canonical normalization.

Unicode question under iOS

I have a SQLite database with a word list that includes the word "você". This word has the Unicode representation "voc\U00ea".
I've found out that the same word can have the following representation with the same visual output:
"voc\U00ea",
"voce\U0302"
When I query my db using the second representation, it returns nothing. Does anyone know a way to make the query work with both representations, without duplicating the records in the table?
Thanks,
Miguel
These two forms are known as NFC ("normal form composed") and NFD ("normal form decomposed"). The character \U0302 is a "combining circumflex accent", which modifies the preceding letter.
To cope with this situation, do the following:
Pick a normalization form. Usually choosing NFC is a good idea. (Although the iOS/OS X file system uses NFD.)
Before putting a string into the database, always normalize. In iOS, you can use precomposedStringWithCanonicalMapping or precomposedStringWithCompatibilityMapping. To understand the difference between canonical and compatibility mappings, see this description.
Before performing a query, always normalize the query to the same normal form.
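The three steps above can be sketched as follows, with Python's sqlite3 and unicodedata standing in for the SQLite C API and the iOS string methods (the table name is hypothetical):

```python
import sqlite3
import unicodedata

def nfc(s):
    # Steps 1 and 2: always normalize to the chosen form (NFC here) before storing.
    return unicodedata.normalize("NFC", s)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")

# Insert the decomposed spelling "voce\u0302"; it is stored in composed form.
conn.execute("INSERT INTO words VALUES (?)", (nfc("voce\u0302"),))

# Step 3: normalize each query too, so either spelling finds the row.
for spelling in ("voc\u00ea", "voce\u0302"):
    hit = conn.execute(
        "SELECT word FROM words WHERE word = ?", (nfc(spelling),)
    ).fetchone()
    print(hit is not None)  # True both times
```

Without the normalization in step 3, the decomposed query string would compare unequal to the composed stored value, which is exactly the blank result described in the question.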