Which Collation to use for Russian, German and Arabic - sql

Which Collation shall I use to save Arabic, Russian, English and German Characters to the Database?
My column setting is nvarchar(100)
I have set it currently to:
SQL_Latin1_General_Cp1256_CI_AS
It is saving Arabic, German and English but I need to save Russian too.

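I guess the problem is in how you insert the values.
You need to put N directly before the opening quote of the string literal. Without it, the literal is treated as non-Unicode text and converted through the database's code page, so characters that the code page lacks are lost before they ever reach the nvarchar column.
You're doing:
INSERT INTO your_table VALUES ('bla')
instead of
INSERT INTO your_table VALUES (N'bla')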
SQL Server does not have a Unicode collation.
However, there is a binary collation "SQL_Latin1_General_1251_BIN".
It sorts by code point in numerical order, which can be pretty arbitrary.
It's not culture-specific though (despite the name).
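The effect of a missing N can be imitated outside SQL Server. The following is a minimal Python sketch, not T-SQL. It assumes the database code page is Windows-1256 (as the collation name in the question suggests) and that Python's cp1256 codec is a fair stand-in for the server's conversion; the helper name through_code_page is made up for the illustration, and the real server may substitute characters differently.

```python
def through_code_page(text, code_page="cp1256"):
    """Round-trip text through an 8-bit code page, roughly what happens
    to a string literal written without the N prefix (assumption)."""
    # Characters missing from the code page are replaced with "?".
    return text.encode(code_page, errors="replace").decode(code_page)


arabic = "مرحبا"
russian = "Привет"

# Arabic letters exist in cp1256, so the text survives unchanged.
arabic_survives = through_code_page(arabic) == arabic

# Cyrillic letters do not exist in cp1256, so each one turns into "?".
russian_result = through_code_page(russian)

print(arabic_survives)   # True
print(russian_result)    # ??????
```

With the N prefix the literal never takes this detour, which is why the same nvarchar column can then hold Russian as well.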

Related

SQL NVARCHAR(MAX) returning ASCII and Weird Characters instead of Text

I have an SQL Table and I'm trying to return the values as a string.
The values should be city names like Sydney, Melbourne, Port Macquarie etc.
But when I run a select I get either blank results or, as detailed in the first picture, some strange backwards L character. The column is an NVARCHAR(MAX).
SELECT ctGlobalName FROM Crm.Cities
Then I tried using MSSQL's Edit top 200 rows feature and I could see the names of the cities, but also all these weird ASCII characters.
Now I didn't create the database, I'm just running queries on it. Some things I've read have suggested it is a problem with the Collation. But the table is SQL_Latin1_General_CP1_CI_AS which matches the server collation.
I'm sure there must be something I can add to my select query to return the values as an ordinary string. Is there something I can do to my select query to return the expected format without the weird characters?
An NVARCHAR datatype can store Unicode characters, which are needed for languages that the ASCII character set does not cover, e.g. Chinese or Arabic. If your client tool or Windows installation lacks support for that language (fonts, for example), you might see strange-looking representations of the data.
On the other hand, it could also be that the application that updates this table has just stored bad data in that column.
Either way you might need to do some string manipulation to strip out the characters you don't want.
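As a rough idea of what such clean-up could look like on the client side, here is a small Python sketch. It assumes the unwanted characters are control or other non-printable characters, which the question does not confirm; the sample value and the function name strip_unprintable are invented for the illustration.

```python
def strip_unprintable(value):
    """Drop control and other non-printable characters, keep ordinary text."""
    return "".join(ch for ch in value if ch.isprintable())


# Made-up example: a city name followed by stray control characters.
raw = "Sydney\x00\x1a"

print(strip_unprintable(raw))               # Sydney
print(strip_unprintable("Port Macquarie"))  # Port Macquarie (spaces are kept)
```

It is worth looking at the actual code points in the column first, since the right fix depends on what the bad characters really are.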

What does collation mean?

What does collation mean in SQL, and what does it do?
Collation can be simply thought of as sort order.
In English (and its strange cousin, American), collation may be a pretty simple matter consisting of ordering by the ASCII code.
Once you get into those strange European languages with all their accents and other features, collation changes. For example, though the different accented forms of a may exist at disparate code points, they may all need to be sorted as if they were the same letter.
Besides the "accented letters are sorted differently than unaccented ones" in some Western European languages, you must take into account the groups of letters, which sometimes are sorted differently, also.
Traditionally, in Spanish, "ch" was considered a letter in its own right, same with "ll" (both of which represent a single phoneme), so a list would get sorted like this:
caballo
cinco
coche
charco
chocolate
chueco
dado
(...)
lámpara
luego
llanta
lluvia
madera
Notice all the words starting with single c go together, except words starting with ch which go after them, same with ll-starting words which go after all the words starting with a single l. This is the ordering you'll see in old dictionaries and encyclopedias, sometimes even today by very conservative organizations.
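The traditional ordering can be sketched as a sort key that treats "ch" and "ll" as single letters. This is only an illustration in Python, not how any database implements its collations; the alphabet list and the function name traditional_key are assumptions made for the example.

```python
# Traditional Spanish alphabet with "ch" and "ll" as letters of their own.
TRADITIONAL_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL_ALPHABET)}

# Accented vowels sort like their plain forms; ñ stays a separate letter.
PLAIN_VOWELS = str.maketrans("áéíóúü", "aeiouu")


def traditional_key(word):
    """Turn a word into a list of letter ranks, reading ch/ll as one letter."""
    text = word.lower().translate(PLAIN_VOWELS)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Unknown characters sort after every letter.
            key.append(RANK.get(text[i], len(RANK)))
            i += 1
    return key


words = ["chocolate", "luego", "coche", "llanta", "dado", "caballo",
         "lluvia", "charco", "cinco", "madera", "lámpara", "chueco"]
traditional_order = sorted(words, key=traditional_key)
```

Sorting the shuffled list with this key gives back the order shown above, with the ch-words after coche and the ll-words after luego.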
The Royal Academy of the Language changed this to make it easier for Spanish to be accommodated in the computing world. Nevertheless, ñ is still considered a different letter than n and goes after it, and before o. So this is a correctly ordered list:
Namibia
número
ñandú
ñú
obra
ojo
By selecting the correct collation, you get all this done for you, automatically :-)
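To see what the collation is saving you from, compare a plain code point sort with a key that knows ñ sits between n and o. This is a hedged Python sketch of the idea only; the alphabet string and the name spanish_key are assumptions for the example.

```python
# Modern Spanish alphabet: ñ is its own letter, placed between n and o.
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
PLAIN_VOWELS = str.maketrans("áéíóúü", "aeiouu")


def spanish_key(word):
    """Rank each letter by its position in the Spanish alphabet."""
    text = word.lower().translate(PLAIN_VOWELS)
    return [RANK.get(ch, len(ALPHABET)) for ch in text]


words = ["ojo", "Namibia", "ñú", "número", "obra", "ñandú"]

# Plain sort compares code points, so ñ (U+00F1) lands after every o-word.
by_code_point = sorted(words)

# The language-aware key restores the expected order.
by_spanish_rules = sorted(words, key=spanish_key)
```

Here by_code_point ends with the two ñ-words after "ojo", while by_spanish_rules matches the list above.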
Rules that tell how to compare and sort strings: the order of letters, whether case matters, whether diacritics matter, etc.
For instance, if you want all letters to be different (say, if you store filenames in UNIX), you use the UTF8_BIN collation (these examples use MySQL):
SELECT 'A' COLLATE UTF8_BIN = 'a' COLLATE UTF8_BIN
---
0
If you want to ignore case and diacritics differences (say, for a search engine), you use UTF8_GENERAL_CI collation:
SELECT 'A' COLLATE UTF8_GENERAL_CI = 'ä' COLLATE UTF8_GENERAL_CI
---
1
As you can see, this collation (comparison rule) considers capital A and lowercase ä the same letter, ignoring case and diacritic differences.
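The two comparison rules can be mimicked in a few lines of Python. This is a sketch of the concept, not of MySQL's actual implementation; the helper name fold is an assumption for the example.

```python
import unicodedata


def fold(text):
    """Remove diacritics and case so that 'A', 'a' and 'ä' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


# "Binary" comparison: every code point is distinct.
binary_equal = "A" == "a"

# Case-insensitive, accent-insensitive comparison.
folded_equal = fold("A") == fold("ä")

print(binary_equal)  # False
print(folded_equal)  # True
```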
Collation defines how you sort and compare string values
For example, it defines how to deal with
accents (äàa etc)
case (Aa)
the language context:
In a French collation, cote < côte < coté < côté.
In the SQL Server Latin1 default, cote < coté < côte < côté
ASCII sorts (a binary collation)
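The two accent orderings listed above differ only in the direction in which accents are compared once the base letters tie. A rough Python sketch of that idea follows; it assumes a simple two-level key (base letters first, accents second) and is not the algorithm SQL Server actually uses. The function names are made up for the example.

```python
import unicodedata


def split_accents(word):
    """Return the unaccented word and a per-letter list of accent marks."""
    base, marks = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        base.append(decomposed[0])
        marks.append(decomposed[1:])  # empty string when there is no accent
    return "".join(base), marks


def forward_key(word):
    """Accents break ties from left to right."""
    base, marks = split_accents(word)
    return (base, marks)


def french_key(word):
    """Accents break ties from right to left (the French rule)."""
    base, marks = split_accents(word)
    return (base, marks[::-1])


words = ["côté", "cote", "coté", "côte"]
forward_order = sorted(words, key=forward_key)
french_order = sorted(words, key=french_key)
```

forward_order reproduces the Latin1 default ordering and french_order the French one.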
Collation means assigning some order to the characters in an Alphabet, say, ASCII or Unicode etc.
Suppose you have 3 characters in your alphabet - {A,B,C}. You can define some example collations for it by assigning integral values to the characters
Example 1 = {A=1,B=2,C=3}
Example 2 = {C=1,B=2,A=3}
Example 3 = {B=1,C=2,A=3}
As a matter of fact, you can define n! collations on an alphabet of size n. Given such an order, sorting routines like LSD/MSD string sorts make use of it for sorting strings.
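A short Python sketch of the same toy alphabet: it enumerates all n! orderings and sorts a few strings under two of the example collations. It uses the built-in sort with a key rather than an LSD/MSD string sort, and the sample words and the name sort_with are invented for the illustration.

```python
from itertools import permutations

alphabet = ["A", "B", "C"]

# Every permutation of the alphabet is one possible collation: 3! = 6.
collations = [
    dict(zip(order, range(1, len(order) + 1)))
    for order in permutations(alphabet)
]


def sort_with(collation, words):
    """Sort words by the integer weights the collation gives each character."""
    return sorted(words, key=lambda w: [collation[ch] for ch in w])


example_1 = {"A": 1, "B": 2, "C": 3}
example_2 = {"C": 1, "B": 2, "A": 3}
words = ["AB", "CA", "BC", "CB"]

print(len(collations))              # 6
print(sort_with(example_1, words))  # ['AB', 'BC', 'CA', 'CB']
print(sort_with(example_2, words))  # ['CB', 'CA', 'BC', 'AB']
```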
Collation determines how your data is sorted and compared. It's very often important with regard to internationalization, e.g. how do you sort Japanese kanji?
If you google collation and sql server you'll find plenty of articles discussing it!
Reference is taken from this Article:
A collation is a set of rules for comparing characters in a character set. It also has rules for sorting characters, and the proper order of two characters varies from language to language.
A collation compares two strings, deciding for example whether one word is greater than another, and sorts them accordingly.
If you are using the "latin1" character set, you can use the "latin1_swedish_ci" collation.
You have to choose the right collation because the wrong collation may affect your database performance.
http://en.wikipedia.org/wiki/Collation
Collation is the assembly of written information into a standard order. (...) A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character strings and deciding which should come before the other.
The collation is how SQL Server decides on how to sort and compare text.
See MSDN.

How to choose collation of SQL Server database

What if I want to use a database to store different groups of special characters, how do I choose which collation to use? For example, if I set the collation to Croatian and want to store Russian Cyrillic and Japanese characters in addition to the Croatian special characters, which collation should I use?
Thanks,
Ilija
You'd use nvarchar to store the data
COLLATION defines sorting and comparing
That means you can store Croatian, Russian and Japanese in the same column.
But when you want to compare (WHERE MyColumn = @foo) or sort (ORDER BY MyColumn) you might not get what you expect because of the collation.
However, you can use the COLLATE clause to change it if needed.
eg ORDER BY MyColumn COLLATE Japanese_something
I'd go for your most common option that covers most of your data. MSDN has this maybe useful article

SQL Collation & Datatype: Support Both Western and Arabic data in a field

I have a Delphi + SQL Server (2k or 2005 supported) app that is used by both western and Arabic users. For some fields (i.e. name) my app needs to be able to support both Arabic language and western language characters.
Is it possible to set a single collation & datatype for a field to handle either English or Arabic data? NB: I do not want 2 separate DB's, I want one DB that supports both languages.
ISO 8859-6 (or its Windows codepage lookalike cp1256) gives you Arabic and Western characters inasmuch as the lower 128 characters are the same as ASCII. You are out of luck if you want non-ASCII 'Western' characters like the accented letters.
Better, though, would be to support all of Unicode. I don't know about Delphi, but in SQL Server you get NVARCHAR, which is your datatype of choice for native Unicode strings. (Stored as UTF-16LE internally.)
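The difference between a single 8-bit code page and full Unicode can be checked with a small Python sketch. It uses Python's iso8859_6 and utf-16-le codecs as stand-ins for the encodings named above, and the sample names and the helper fits are made up for the illustration.

```python
def fits(text, encoding):
    """Return True when every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


mixed_ascii_arabic = "Ahmed أحمد"   # ASCII plus Arabic letters
accented_western = "José"           # non-ASCII Western letter

print(fits(mixed_ascii_arabic, "iso8859_6"))  # True
print(fits(accented_western, "iso8859_6"))    # False
print(fits(mixed_ascii_arabic, "utf-16-le"))  # True
print(fits(accented_western, "utf-16-le"))    # True
```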
Change the type of the columns from varchar() to nvarchar() in SQL Server.
I tried it and it works.

When should I use the SQL Server Unicode 'N' Constant?

I've been looking into the use of the Unicode 'N' constant within my code, for example:
select object_id(N'VW_TABLE_UPDATE_DATA', N'V');
insert into SOME_TABLE (Field1, Field2) values (N'A', N'B');
After doing some reading around when to use it, I'm still not entirely clear as to the circumstances under which it should and should not be used.
Is it as simple as using it when data types or parameters expect a unicode data type (as per the above examples), or is it more sophisticated than that?
The following Microsoft site gives an explanation, but I'm also a little unclear as to some of the terms it is using
http://msdn.microsoft.com/en-us/library/ms179899.aspx
Or to precis:
Unicode constants are interpreted as Unicode data, and are not evaluated by using a code page. Unicode constants do have a collation. This collation primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation.
What does it mean by:
'evaluated by using a code page'?
Collation?
I realise this is quite a broad question, but any links or help would be appreciated.
Thanks
Is it as simple as using it when data types or parameters expect a unicode data type?
Pretty much.
To answer your other points:
A code page is another name for an encoding of a character set. For example, Windows code page 1255 encodes Hebrew. The term is normally used for 8-bit character encodings. In terms of your question, strings may be evaluated using different code pages (so the same bit pattern may be interpreted as a Japanese character or an Arabic one, depending on what code page was used to evaluate it).
Collation is about how SQL Server is to order strings - this depends on code page, as you would order strings in different languages differently. See this article for an example.
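What "evaluated by using a code page" means can be shown with a single byte. The Python sketch below decodes the same byte value under three Windows code pages; the choice of byte 0xE0 and of these three code pages is just for illustration.

```python
import unicodedata

raw = bytes([0xE0])  # one byte, no meaning until a code page is chosen

# The same bit pattern becomes a different character per code page.
interpretations = {
    code_page: raw.decode(code_page)
    for code_page in ("cp1252", "cp1251", "cp1255")
}

for code_page, ch in interpretations.items():
    print(code_page, "U+%04X" % ord(ch), unicodedata.name(ch))
# cp1252 U+00E0 LATIN SMALL LETTER A WITH GRAVE
# cp1251 U+0430 CYRILLIC SMALL LETTER A
# cp1255 U+05D0 HEBREW LETTER ALEF
```

A Unicode constant skips this step: its characters are already unambiguous code points.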
The national character types nchar() and nvarchar() use two bytes per character and support international character sets -- think internet.
The N prefix makes a string constant Unicode, at two bytes per character. So if you have people from different countries and would like their names properly stored -- something like:
CREATE TABLE SomeTable (
id int
,FirstName nvarchar(50)
);
Then use:
INSERT INTO SomeTable
( Id, FirstName )
VALUES ( 1, N'Guðjón' );
and
SELECT *
FROM SomeTable
WHERE FirstName = N'Guðjón';
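The storage side of this can be illustrated in Python. The sketch assumes UTF-16LE as the two-byte encoding and, purely as a hypothetical, a database whose code page is Windows-1251; what SQL Server substitutes for an unsupported character may differ from the "?" shown here.

```python
name = "Guðjón"

# With the N prefix: stored as Unicode, two bytes per character here.
as_unicode = name.encode("utf-16-le")
print(len(name), len(as_unicode))   # 6 12

# Without it (assumed code page 1251): characters the code page lacks are lost.
lossy = name.encode("cp1251", errors="replace")
print(lossy)                        # b'Gu?j?n'

# The Unicode bytes decode back to exactly the original name.
print(as_unicode.decode("utf-16-le") == name)  # True
```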