SQL column collation change

I would like to change a column collation to some Polish collation and be able to view Polish characters properly. All three, original column, original table and original database, use SQL_Scandinavian_CP850_CS_AS.
For column collation change I tried:
SELECT CAST([ColumnName] AS nvarchar(50)) COLLATE Polish_CI_AS FROM t1
These three example letters appear in the Scandinavian table:
SELECT 'ØùÒ' COLLATE Polish_CI_AS
This should return łŚń in the results. Instead it shows 'OuO'.

Unfortunately, SQL Server does not support OEM code page 852, which is the code page you would need to convert the code page 850 data into if you want 'ØùÒ' to become 'łŚń'. You can change the collation of data without SQL Server doing any character mapping by CASTing through varbinary, but this only works with supported collations.
An alternative approach might be to create a user-defined function that takes a string and maps its characters one at a time, so Ø maps to ł and so on. It is fiddly, because there are (up to) 127 characters to map, but it is not difficult.
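A minimal sketch of the character-by-character mapping described above, for illustration only. The function name dbo.MapCp850ToPolish is hypothetical, and the mapping strings contain just the three pairs from the question; the remaining pairs would have to be filled in from the code page 850 and 852 tables. It assumes SQL Server 2008 or later (for DECLARE with an initial value).

```sql
-- Hypothetical helper: the name and the three-pair mapping are illustrative only.
CREATE FUNCTION dbo.MapCp850ToPolish (@input nvarchar(4000))
RETURNS nvarchar(4000)
AS
BEGIN
    IF @input IS NULL RETURN NULL;

    -- The character at position i of @from maps to position i of @to.
    DECLARE @from nvarchar(128) = N'ØùÒ';
    DECLARE @to   nvarchar(128) = N'łŚń';
    DECLARE @result nvarchar(4000) = N'';
    DECLARE @i int = 1;
    DECLARE @len int = DATALENGTH(@input) / 2;  -- counts trailing spaces, unlike LEN
    DECLARE @ch nchar(1);
    DECLARE @pos int;

    WHILE @i <= @len
    BEGIN
        SET @ch = SUBSTRING(@input, @i, 1);
        -- Binary collation so that Ø, O and o are treated as different characters.
        SET @pos = CHARINDEX(@ch COLLATE Latin1_General_BIN2,
                             @from COLLATE Latin1_General_BIN2);
        SET @result = @result
                    + CASE WHEN @pos > 0 THEN SUBSTRING(@to, @pos, 1) ELSE @ch END;
        SET @i = @i + 1;
    END;

    RETURN @result;
END;
GO

-- Usage against the table from the question: convert to nvarchar first, then map.
SELECT dbo.MapCp850ToPolish(CAST([ColumnName] AS nvarchar(50))) FROM t1;
```

Walking the string once, rather than chaining REPLACE calls, avoids a character being mapped twice when one pair's target is another pair's source.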

Related

SQL NVARCHAR(MAX) returning ASCII and Weird Characters instead of Text

I have an SQL Table and I'm trying to return the values as a string.
The values should be city names like Sydney, Melbourne, Port Macquarie etc.
But when I run a select I get either blank results or, as shown in the first picture, some strange backwards L character. The column is an NVARCHAR(MAX).
SELECT ctGlobalName FROM Crm.Cities
Then I tried using MSSQL's Edit Top 200 Rows feature and I could see the names of the cities, but also all these weird ASCII characters.
Now I didn't create the database, I'm just running queries on it. Some things I've read have suggested it is a problem with the Collation. But the table is SQL_Latin1_General_CP1_CI_AS which matches the server collation.
I'm sure there must be something I can add to my select query to return the values as an ordinary string. Is there something I can do to my select query to return the expected format without the weird characters?
An NVARCHAR datatype can store Unicode characters, which are used for languages that are not supported by the ASCII character set, i.e. non-English (or related) languages such as Chinese. If your SQL Server or Windows doesn't have that language installed then you might see strange-looking representations of the data.
On the other hand, it could also be that the application that updates this table has just stored bad data in that column.
Either way you might need to do some string manipulation to strip out the characters you don't want.
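Before stripping anything, it may help to see which code points are actually stored. A rough sketch against the table from the question; the value 172 in the second query is only a placeholder, to be replaced by whatever code point the first query reveals.

```sql
-- Look at the raw bytes and the code point of the first character.
SELECT TOP (20)
       ctGlobalName,
       CAST(LEFT(ctGlobalName, 50) AS varbinary(100)) AS RawBytes,
       UNICODE(ctGlobalName) AS FirstCodePoint
FROM Crm.Cities;

-- Once the unwanted code point is known, remove it in the select.
DECLARE @unwanted int = 172;  -- placeholder value, not taken from the question
SELECT REPLACE(ctGlobalName, NCHAR(@unwanted), N'') AS CleanName
FROM Crm.Cities;
```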

Using unicode in "like" operator

In Sybase ASE 15, how can I use unicode to search for records containing a certain character using the like operator? For example, U+001E
pseudo-code:
select * from table where field like "%{U+001E}%"
Make sure the collation on your table supports Unicode.
Use nvarchar/nchar. There used to be an ntext datatype as well, but it's deprecated now in favour of nvarchar.
These columns take up twice as much space as their non-Unicode counterparts (char and varchar).
Then, when manually inserting into them, use the N prefix to indicate that it's Unicode text:
INSERT INTO MyTable(SomeNvarcharColumn) VALUES (N'français')
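For the specific character in the question: U+001E is code point 30 in decimal, so one option is to build the LIKE pattern by concatenating that character into it. A rough sketch, assuming a hypothetical table mytable with a column named field; I have not tested it against ASE 15.

```sql
-- U+001E (record separator) is decimal 30.
-- mytable and field are placeholder names.
select *
from mytable
where field like '%' + char(30) + '%'

-- For unichar/univarchar columns, to_unichar(30) could take the place of
-- char(30); treat that as an assumption to check in the ASE reference manual.
```

In SQL Server, the equivalent would use NCHAR(30) with N'%' pattern pieces.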

MS SQL Server 2008 Encoding

I have a table in a database with the Lithuanian_100_CI_AS collation. Some rows have text fields whose text contains random symbols instead of Lithuanian letters. Is it possible to change the encoding so that I see the letters I need? Changing the collation does nothing at all.
If the data is already stored like this (mangled), you cannot really rescue it by changing the collation. Setting the right collation could, however, help new data get written to your database correctly (more relevant for the future).
No, the data is random.
You need to:
use nvarchar to store this data correctly
ensure the client is using nvarchar for parameters
ensure all string constants have N in front (example: N'foobar')
Collation is not encoding: it determines how strings are compared and sorted, and for non-Unicode columns (Unicode = nvarchar) it also determines the code page.
Note that the data types "text" and "ntext" are deprecated in SQL Server. Use the max types (varchar(max), nvarchar(max)) instead.
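To check which code page a collation implies, and which columns are still non-Unicode, something like the following could be used. dbo.YourTable is a placeholder name.

```sql
-- Code page that non-Unicode (char/varchar/text) data uses under this collation.
SELECT COLLATIONPROPERTY('Lithuanian_100_CI_AS', 'CodePage') AS CodePage;

-- Data type and collation of every column in a table (placeholder table name).
SELECT c.name AS column_name,
       t.name AS type_name,
       c.collation_name
FROM sys.columns AS c
JOIN sys.types AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.YourTable');
```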

Problem with SQL Collation

I'm making an Arabic website, and after I created the database and started writing Arabic text into it, it just shows ????, so I changed the collation of my database from SQL_Latien to Arabic_CI_AI,
but I'm still getting ???? in my fields, and when I check the properties of the field I find that it is still SQL_Latien; it didn't change.
So what should I do to fix this problem without rebuilding the database?
please reply as soon as you can
Thanks in Advance
Database collation is just the default setting for new columns.
To change the collation of an existing column, you'd have to alter table. For example:
alter table YourTable alter column col1 varchar(10) collate Arabic_CI_AI
The collation sequence is the order in which characters appear when you sort (ie. use the 'ORDER BY' clause). Different collations will result in different sort orders.
This is obviously NOT what you are looking for. Your problem is storing and retrieving Unicode characters outside the ASCII range (i.e. Arabic characters). To do that, the data types storing this data must support Unicode instead of ASCII. Simply put, when defining a column, use the data types nchar, nvarchar, and ntext instead of char, varchar, and text.
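As a sketch of that change on an existing column, reusing the YourTable and col1 names from the example above. The NULL keyword is an assumption about the column's nullability, and the inserted value is only sample text.

```sql
-- Switch the column to a Unicode type; state NULL / NOT NULL explicitly
-- so the column's nullability does not change by accident.
ALTER TABLE YourTable ALTER COLUMN col1 nvarchar(10) NULL;

-- New text must be passed with the N prefix to reach the column as Unicode.
INSERT INTO YourTable (col1) VALUES (N'مرحبا');

SELECT col1 FROM YourTable;
```

Rows that already show ???? were stored as literal question marks, so changing the type will not bring the original text back; those values have to be entered again.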

When should I use the SQL Server Unicode 'N' Constant?

I've been looking into the use of the Unicode 'N' constant within my code, for example:
select object_id(N'VW_TABLE_UPDATE_DATA', N'V');
insert into SOME_TABLE (Field1, Field2) values (N'A', N'B');
After doing some reading about when to use it, I'm still not entirely clear as to the circumstances under which it should and should not be used.
Is it as simple as using it when data types or parameters expect a Unicode data type (as per the above examples), or is it more sophisticated than that?
The following Microsoft site gives an explanation, but I'm also a little unclear as to some of the terms it is using
http://msdn.microsoft.com/en-us/library/ms179899.aspx
Or to precis:
Unicode constants are interpreted as Unicode data, and are not evaluated by using a code page. Unicode constants do have a collation. This collation primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation.
What does it mean by:
'evaluated by using a code page'?
Collation?
I realise this is quite a broad question, but any links or help would be appreciated.
Thanks
Is it as simple as using it when data types or parameters expect a unicode data type?
Pretty much.
To answer your other points:
A code page is another name for the encoding of a character set. For example, Windows code page 1255 encodes Hebrew. The term is normally used for 8-bit character encodings. In terms of your question, strings may be evaluated using different code pages (so the same bit pattern may be interpreted as a Japanese character or an Arabic one, depending on which code page was used to evaluate it).
Collation is about how SQL Server is to order strings - this depends on code page, as you would order strings in different languages differently. See this article for an example.
The national character types nchar() and nvarchar() use two bytes per character and support international character sets -- think internet.
The N prefix converts a string constant to two bytes per character. So if you have people from different countries and would like their names properly stored -- something like:
CREATE TABLE SomeTable (
id int
,FirstName nvarchar(50)
);
Then use:
INSERT INTO SomeTable
( Id, FirstName )
VALUES ( 1, N'Guðjón' );
and
SELECT *
FROM SomeTable
WHERE FirstName = N'Guðjón';
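A quick way to see the two-bytes-per-character point is to compare the storage size of the same constant with and without the prefix. This is just an illustrative query, not part of the answer above:

```sql
-- DATALENGTH returns the number of bytes used by the expression.
SELECT DATALENGTH('abc')  AS VarcharBytes,   -- 3
       DATALENGTH(N'abc') AS NvarcharBytes;  -- 6
```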