I created a database in SQL Server with the collation KR949_BIN2, which means that the code page of this database is 949.
Is it possible to get the encoded value of a character under this database's code page?
For example, the encoded value of the character '좩' in code page 949 is 0xA144. Is there a SQL statement with which I can get 0xA144 from the char '좩' in this database?
Also, is there a way to insert '좩' into a column by its encoded value 0xA144?
Based on the KB article "Character data is represented incorrectly when the code page of the client computer differs from the code page of the database in SQL Server 2005", I suspect that you're actually using the Korean_Wansung_CI_AS collation or something similar:
Method 2: Use an appropriate collation for the database
If you must use a non-Unicode data type, always make sure that the code page of the database and the code page of any non-Unicode columns can store the non-Unicode data correctly. For example, if you want to store code page 949 (Korean) character data, use a Korean collation for the database. For example, use the Korean_Wansung_CI_AS collation for the database.
That being the case, yes, you can see and insert 0xA144 as per the following example:
create table #Wansung (
    [Description] varchar(50),
    Codepoint varchar(50) collate Korean_Wansung_CI_AS
);

insert #Wansung ([Description], Codepoint)
select 'U+C8A9 Hangul Syllable Jwaeg', N'좩';

insert #Wansung ([Description], Codepoint)
select 'From Windows-949 encoding', 0xA144;

select
    [Description],
    Codepoint,
    cast(Codepoint as varbinary(max)) as Bytes,
    cast(unicode(Codepoint) as varbinary(max)) as UTF32
from #Wansung;
Which returns the results:

Description                   Codepoint  Bytes   UTF32
----------------------------  ---------  ------  ----------
U+C8A9 Hangul Syllable Jwaeg  좩          0xA144  0x0000C8A9
From Windows-949 encoding     좩          0xA144  0x0000C8A9
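As a cross-check outside SQL Server, the same code page 949 mapping can be reproduced with Python's cp949 codec (a quick sketch; cp949 is Python's name for UHC/Windows-949, the encoding SQL Server calls code page 949):

```python
# Cross-check the Windows-949 (code page 949) byte value of '좩'
# using Python's cp949 codec (UHC / Windows-949).
text = '좩'  # U+C8A9 Hangul Syllable Jwaeg

encoded = text.encode('cp949')   # the bytes a code page 949 varchar stores
print(encoded.hex().upper())     # expected A144, per the SQL output above
print(f'U+{ord(text):04X}')      # the Unicode code point, U+C8A9

# Round-tripping back through the code page recovers the character.
assert encoded.decode('cp949') == text
```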
I have seen the prefix N in some INSERT T-SQL queries. Many people use N before inserting a value into a table.
I searched, but I was not able to understand the purpose of including the N before inserting strings into a table.
INSERT INTO Personnel.Employees
VALUES (N'29730', N'Philippe', N'Horsford', 20.05, 1);
What purpose does this 'N' prefix serve, and when should it be used?
It's declaring the string as being of the nvarchar data type, rather than varchar.
You may have seen Transact-SQL code that passes strings around using
an N prefix. This denotes that the subsequent string is in Unicode
(the N actually stands for National language character set). Which
means that you are passing an NCHAR, NVARCHAR or NTEXT value, as
opposed to CHAR, VARCHAR or TEXT.
To quote from Microsoft:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
If you want to know the difference between these two data types, see this SO post:
What is the difference between varchar and nvarchar?
Let me tell you about an annoying thing that happened with the N'' prefix - I wasn't able to fix it for two days.
My database collation is SQL_Latin1_General_CP1_CI_AS.
It has a table with a column called MyCol1, which is an nvarchar.
This query fails to match an exact value that exists:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = 'ESKİ'
-- 0 results
Using the N'' prefix fixes it:
SELECT TOP 1 * FROM myTable1 WHERE MyCol1 = N'ESKİ'
-- 1 result - found!
Why? Because Latin1_General doesn't have the capital dotted İ - that's why it fails, I suppose.
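The underlying issue can be illustrated outside SQL Server: the Turkish capital dotted İ (U+0130) has no mapping in CP-1252, the code page behind the SQL_Latin1_General_CP1_* collations, so a non-Unicode literal cannot hold it. A quick Python sketch, with the cp1252 codec standing in for the collation's code page:

```python
# U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is not in CP-1252,
# the code page behind SQL_Latin1_General_CP1_* collations.
ch = '\u0130'  # 'İ'

try:
    ch.encode('cp1252')          # a plain varchar literal goes through this
except UnicodeEncodeError:
    print('İ cannot be represented in CP-1252')

# With replacement, the character is silently degraded, which is why
# the literal without N'' no longer matches the stored nvarchar value.
print(ch.encode('cp1252', errors='replace'))  # b'?'
```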
1. Performance:
Assume your where clause is like this:
WHERE NAME='JON'
If the NAME column is of any type other than nvarchar or nchar, then you should not specify the N prefix. However, if the NAME column is of type nvarchar or nchar and you do not specify the N prefix, then 'JON' is treated as non-Unicode. The data types of the NAME column and the string 'JON' then differ, so SQL Server implicitly converts one operand's type to the other. If SQL Server converts the literal's type to the column's type, there is no issue; but if it converts the other way, performance suffers because the column's index (if available) won't be used.
2. Character set:
If the column is of type nvarchar or nchar, then always use the prefix N when specifying the character string in a WHERE/UPDATE/INSERT clause. If you do not, and one of the characters in your string falls outside the database's code page (an international character such as ā, for example), the statement will fail or the data will be corrupted.
We use the N'' prefix only when the target value is of an nvarchar type.
In SQL Server I join two queries using UNION, and my "Address" column has the ntext data type, which causes an issue with DISTINCT. So I have to convert the Address column from ntext to varchar, but in the output I get garbled characters in Address. The actual data is in our local Gujarati language.
varchar: Variable-length, non-Unicode character data. The database collation determines which code page the data is stored using.
nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.
So change your type from varchar to nvarchar; it will sort out your issue.
I faced the same issue when storing Arabic characters.
Please use nvarchar instead of varchar.
The n in ntext basically means "Unicode." In order to maintain those characters, you need to cast to another Unicode type.
The Unicode equivalent to varchar is nvarchar, so your query might end up looking like:
SELECT DISTINCT CONVERT(nvarchar(max), [Address])
FROM YourTable
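The garbled output happens because casting to varchar pushes the Gujarati text through the database's single-byte code page, which contains no Gujarati characters. A rough Python sketch of the loss, with cp1252 standing in for a typical Latin default code page:

```python
# Gujarati text survives a Unicode (nvarchar-like) round trip,
# but is destroyed by a single-byte code page (varchar-like).
address = 'ગુજરાત'  # "Gujarat" in Gujarati script

# nvarchar stores UTF-16 code units, so the round trip is lossless:
assert address.encode('utf-16-le').decode('utf-16-le') == address

# Forcing it through a Latin code page replaces every character:
lossy = address.encode('cp1252', errors='replace')
print(lossy)  # each code point becomes b'?'
```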
Given SQL Server 2012, how can I insert control characters (the ones coded under ASCII 32, like TAB, CR, LF) in a nvarchar or varchar column from a SQL script?
Unless I'm missing something in the question, you can do this using the T-SQL CHAR() function:
INSERT INTO MyTable(ColName) VALUES(CHAR(13) + CHAR(10))
This will insert a CR/LF pair. The same works for other codes.
Edit: there is a T-SQL NCHAR() function as well, for Unicode characters.
Please note that which function to use depends on the type of your column; using the wrong function can result in the wrong encoding.
nchar/nvarchar
http://technet.microsoft.com/en-us/library/ms186939.aspx
char/varchar
http://technet.microsoft.com/en-us/library/ms176089.aspx
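For reference, T-SQL's CHAR(n) maps an integer code to a character much like chr() does for Unicode code points; the control characters most often needed look like this (a Python sketch of the same code-to-character mapping):

```python
# The control characters most often inserted with T-SQL CHAR():
# 9 = TAB, 10 = LF, 13 = CR (all below ASCII 32).
TAB, LF, CR = chr(9), chr(10), chr(13)

assert TAB == '\t' and LF == '\n' and CR == '\r'

# The CHAR(13) + CHAR(10) pair from the answer is a Windows CR/LF newline:
crlf = CR + LF
print(repr(crlf))  # '\r\n'
```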
Consider the following script - the second INSERT statement throws a primary key violation.
BEGIN TRAN
CREATE TABLE UnicodeQuestion
(
UnicodeCol NVARCHAR(100)
COLLATE Latin1_General_CI_AI
)
CREATE UNIQUE INDEX UX_UnicodeCol
ON UnicodeQuestion ( UnicodeCol )
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'ae')
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'æ')
ROLLBACK
As I understand it, if I want to have my index treat these values separately, I need to use a binary collation. But there are many binary collations, and they have individual cultures in their names! I don't want culture-sensitive treatment...
Which collation should I use when storing arbitrary Unicode data in nvarchar columns?
For Unicode data it is irrelevant what binary collation you choose.
For Unicode data types, data comparisons are based on the Unicode
code points. For binary collations on Unicode data types, the locale
is not considered in data sorts. For example, Latin_1_General_BIN and
Japanese_BIN yield identical sorting results when used on Unicode
data.
The reason for having locale-specific BIN collations is that the locale determines the code page used when dealing with non-Unicode data.
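In other words, a binary collation on nvarchar compares raw Unicode code points, under which 'ae' and 'æ' are simply different strings. A sketch of that comparison in Python, whose default string ordering also goes by code point:

```python
# Under a binary (code-point) comparison, 'ae' and 'æ' never collide:
a, b = 'ae', '\u00e6'  # 'æ' is the single code point U+00E6

assert a != b                        # distinct keys: no unique-index violation
assert [ord(c) for c in b] == [0xE6]
assert a < b                         # 0x61 ('a') sorts before 0xE6 ('æ')
```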
I am using SQL Server 2008 R2 Enterprise. I am coding an application capable of inserting, updating, deleting and selecting records from SQL tables. The application makes errors when it comes to records that contain special characters such as ć, č, š, đ and ž.
Here's what happens:
The command:
INSERT INTO Account (Name, Person)
VALUES ('Boris Borenović', 'True')
inserts a new record, but the Name field contains Boris Borenovic, so the character ć is changed to c.
The command:
SELECT * FROM Account
WHERE Name = 'Boris Borenović'
returns the record stored as Boris Borenovic, so again the character ć is matched as c.
Questions:
Is it possible to make SQL Server save the ć and the other special characters mentioned earlier?
Is it still possible, if the previous question is resolved, to make SQL Server return the Boris Borenović record even if the query asks for Boris Borenovic?
So, when saving records I want SQL Server to save exactly what is given, but when retrieving records, I want it to be able to ignore the special characters. Thanks for all the help.
1) Make sure the column is of type nvarchar rather than varchar (or nchar for char)
2) Use N' at the start of string literals containing such strings, e.g. N'Boris Borenović'
3) If you're using a client library (e.g. ADO.Net), it should handle Unicode text, so long as, again, the parameters are marked as being nvarchar/nchar instead of varchar/char
4) If you want to query and ignore accents, then you can add a COLLATE clause to your select. E.g.:
SELECT * FROM Account
WHERE Name = 'Boris Borenovic' COLLATE Latin1_General_CI_AI
Here _CI_AI means Case Insensitive, Accent Insensitive; this should return all rows with all variants of the "c" at the end.
5) If the column in the table is part of a UNIQUE/PK constraint, and you need it to contain both "Boris Borenović" and "Boris Borenovic", then add a COLLATE clause to the column definition, but this time use a collation with "_AS" at the end, which says that it's accent sensitive.
To allow SQL Server to store special characters, use nvarchar instead of varchar for the column type.
When retrieving, you can force an accent-insensitive collation so that it ignores the different C's:
WHERE Name = 'Boris Borenović' COLLATE Cyrillic_General_CI_AI
Here, CI stands for Case Insensitive, and AI for Accent Insensitive.
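Accent-insensitive matching is roughly equivalent to comparing the strings with their combining accents stripped. A sketch of that idea in Python using Unicode NFD normalization (this mimics what an _AI collation does; it is not SQL Server's actual algorithm):

```python
import unicodedata

def strip_accents(s: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'ć' -> 'c'."""
    nfd = unicodedata.normalize('NFD', s)
    return ''.join(c for c in nfd if not unicodedata.combining(c))

# An accent-insensitive comparison then matches both spellings:
assert strip_accents('Boris Borenović') == 'Boris Borenovic'
assert strip_accents('Boris Borenović') == strip_accents('Boris Borenovic')
```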
I've faced the same problem, and after some research:
https://dba.stackexchange.com/questions/139551/how-do-i-set-a-sql-server-unicode-nvarchar-string-to-an-emoji-or-supplementary
What is the difference between varchar and nvarchar?
I altered the type of the fields that needed it:
ALTER TABLE [table_name] ALTER COLUMN column_name [nvarchar](max)
GO
And it works!