Handling Chinese characters in SQL Server 2016

Our ETL team is sending us data that contains Chinese descriptions. When we load that data into our SQL Server database, those descriptions come up blank.
We tried changing the column type to nvarchar, but that doesn't help.
Can you please help.
Thanks

You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix a Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 preview, when a UTF-8
enabled collation is used, the default code page is capable of storing
UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
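The effect of a missing N prefix can be imitated outside the database. The sketch below is Python rather than T-SQL and only approximates what the server does. It assumes a database whose default code page is Windows-1252 (a hypothetical choice; yours may differ), pushes a Chinese string through that code page, and then through UTF-16 as an N'...' literal would be. The function names are made up for illustration.

```python
# Rough imitation of SQL Server literal handling; not the server's actual code.
# Assumption: the database's default code page is Windows-1252 (cp1252).

def as_varchar_literal(text, code_page="cp1252"):
    """Imitate a literal without N: characters missing from the code page become '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)


def as_nvarchar_literal(text):
    """Imitate an N'...' literal: the text is carried as UTF-16, so nothing is lost."""
    return text.encode("utf-16-le").decode("utf-16-le")


sample = "中文描述"
print(as_varchar_literal(sample))   # every character is replaced
print(as_nvarchar_literal(sample))  # the original text survives
```

The same kind of loss can occur anywhere the data passes through a non-Unicode type, such as a VARCHAR staging column or an ANSI-encoded flat file, so the whole load path is worth checking, not just the target column.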

Related

How to fix character encoding in sql query

I have a Db2 database where I store names containing special characters. When I retrieve them with our internal software, I get proper results. However, when I run my own queries or look into the database directly, the characters appear to be stored strangely.
The documentation says that the encoding is utf-8 latin1.
My query looks something like this:
SELECT firstn, lastn
FROM unams
WHERE unamid = 12345
The user with the given ID has some special characters in his/her name: é and ó, but the query returns it as Ă© and Ăł.
Is there a way to convert the characters back to their original form using a simple SQL function? I am new to databases and encoding, trying to understand the latter by reading this, but I'm quite lost.
EDIT: Currently sending queries via SPSS Modeler with a proper ODBC driver, the database lies on a Windows Server 2016
Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208, then restart, then drop and re-populate the tables.
If the application runs locally on the Db2 server (i.e. only one hostname is involved) then the same variable can be set. This will impact all local applications that use the UTF-8 encoded database.
If the application runs remotely from the Db2-server (i.e. two hostnames are involved) then set the variable on the workstation and on the Windows Db2-server.
Current versions of IBM supplied Db2-clients on Windows will derive their codepage from the regional settings which might not always render Unicode characters correctly, so using the DB2CODEPAGE=1208 forces the Db2-client CLI drivers to use a Unicode application code page to override this.
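The garbled output in the question is consistent with UTF-8 bytes being decoded with a single-byte Windows code page. Below is a minimal Python sketch of that effect, assuming the client's regional code page is Windows-1250 (an assumption; the thread does not state it). The helper names are hypothetical.

```python
# Assumption: the client decodes with Windows-1250; the thread does not confirm this.

def misread(text, client_code_page="cp1250"):
    """Encode as UTF-8 (as stored), then decode with the client's code page."""
    return text.encode("utf-8").decode(client_code_page)


def repair(garbled, client_code_page="cp1250"):
    """Reverse the misreading; only works if no byte was dropped or replaced on the way."""
    return garbled.encode(client_code_page).decode("utf-8")


print(misread("éó"))       # shows the two-character sequences seen in the question
print(repair("Ă©Ăł"))    # recovers the original letters
```

This kind of repair is only a diagnostic aid. Forcing the client to a Unicode code page, as described above, addresses the cause instead of the symptom.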
with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hex
from
t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);
C C_HEX
- -----
é C3A9
ó C3B3
The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check if you get the same output. It must be as described for a UTF-8 database.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.
If you see the same hex representation for these characters stored in the firstn column, then the string is stored correctly, but your client tool (SPSS Modeler) can't display these characters correctly for some reason (a wrong font, for example).
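To compare what a client program actually receives against the hex values above, the same per-character breakdown can be produced on the client side. This is a small Python sketch, not part of the original answer; `utf8_hex_per_char` is a hypothetical helper name.

```python
def utf8_hex_per_char(text):
    """Return (character, uppercase hex of its UTF-8 bytes) for each character."""
    return [(ch, ch.encode("utf-8").hex().upper()) for ch in text]


for ch, hex_bytes in utf8_hex_per_char("éó"):
    print(ch, hex_bytes)
```

If the client-side values differ from what the Db2 query reports for the same row, the conversion is happening between the server and the client, not in storage.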

DB2 to COBOL String Losing Line Feed and Carriage Returns

I'm trying to grab some data out of a table. The variable is VARCHAR 30000, and when I use COBOL EXEC SQL to retrieve it, the string is returned but without the necessary (hex) line feeds and carriage returns. I inspect the string one character at a time looking for the hex values 0A or 0D, but they never come up.
The LF and CR seem to be lost as soon as I fill the string into my COBOL variable.
Ideas?
If the data is stored as, or converted to, EBCDIC when retrieved on the mainframe, you should get the EBCDIC New-Line character x'15' (decimal 21) rather than 0A or 0D.
It is only if you are retrieving the data in ASCII / UTF-8 that you would get 0A or 0D.
Most Java editors can edit EBCDIC (with the EBCDIC New-Line character x'15') just as easily as ASCII (with \n); not sure about Eclipse though.
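As an illustration of the scan the question describes, here is a short sketch in Python rather than COBOL that looks for EBCDIC line-break bytes instead of the ASCII ones. It assumes code page 037, where New-Line is x'15', LF is x'25', and CR keeps the value x'0D'; other EBCDIC code pages may differ. The names are hypothetical.

```python
# Assumption: EBCDIC code page 037 control codes; other EBCDIC code pages may differ.
EBCDIC_LINE_BREAKS = {0x0D: "CR", 0x15: "NL", 0x25: "LF"}


def find_line_breaks(data):
    """Return (offset, name) for every EBCDIC line-break byte in data."""
    return [(i, EBCDIC_LINE_BREAKS[b]) for i, b in enumerate(data) if b in EBCDIC_LINE_BREAKS]


# Build a sample record: two EBCDIC text fragments separated by New-Line.
record = "LINE1".encode("cp037") + b"\x15" + "LINE2".encode("cp037")
print(find_line_breaks(record))
```

A scan that only tests for x'0A' would find nothing in such a record, which matches the symptom in the question.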
I have seen situations where CR and LF were in data in the database. These are valid characters so it's possible for them to be stored there.
Have you tried to confirm that there really are CR and LF characters in the database using some other tool or method? My Z-Series experience is quite limited, so I'm unable to suggest options. However, there must be some equivalent of SSMS and SQL Server on the Z-Series to query the DB2 database.
Check out this SO link on querying DB2 and cleaning up CR and LF characters.
DB2/iSeries SQL clean up CR/LF, tabs etc
Well, I believe this could be dialect dependent (both COBOL and DB2) but if it were me I would be using FOR BIT DATA on the VARCHAR in the table definition. Your issue could also relate to the code page defined for the database in which the table resides.
I routinely store all kinds of binary, EBCDIC and Unicode data mixed within the same VARCHAR FOR BIT DATA column with no problems, and all you are trying to do is include CR & LF. My approach works in both DB2 z/OS and DB2 LUW.
I hope this helps.

Oracle SQL loader: Convert foreign chars to English

I'm using SQL loader to load a data file into an Oracle table. The database has a character set of US7ASCII. Some of the records in the data file contain European special characters such as é or ü or î. When they are loaded into the table, the results are weird with 2 strange icons (like black diamond with an arrow inside).
Is there any way to get SQL loader to load the nearest English equivalent instead? So the é would be loaded as e, the ü would be loaded as u and the î would be loaded as i?
First establish what is actually being stored. Use the Oracle SQL DUMP function, e.g. SELECT dump ( colname, 16 ) from table.
This will return the HEX code for the data in colname.
If the hex code matches the hex code (od -h) of the input then the data is stored correctly, and the issue you have is translation on display. See my answer below for how to deal with this:
when insert persian character in oracle db i see the question mark
The more likely issue is that your database character set, US7ASCII, cannot cope with extended character sets.
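If the target really must stay US7ASCII, one option is to fold the accented letters to plain ASCII in the data file before it is loaded. The sketch below is Python preprocessing, not a SQL*Loader feature, and `to_ascii` is a hypothetical helper; it relies on Unicode decomposition, so letters without a decomposition are not mapped to a look-alike.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented letters, drop the combining marks, keep 7-bit ASCII only."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. ß or ø, which do not decompose) becomes '?'.
    return without_marks.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("é ü î"))
```

The input file has to be read with its real encoding for this to work, which is the same question the DUMP comparison above is meant to answer.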

SQL Server database with Latin1 codepage shows Japanese Chars as "?"

Three questions with the following scenario:
SQL Server 2005 production db with a Latin1 codepage and showing "?" for invalid chars in Management Studio.
SomeCompanyApp client as a service that populates the data from servers and workstations.
SomeCompanyApp management console that shows "?" for Asian characters.
Since this is a prod db I will not write to it.
I don't know if the client app that is storing the data in the database is actually storing it correctly as Unicode and it simply doesn't show because they are using Latin1 for the console.
Q1: As I understand it, SQL Server stores nvarchar text as Unicode regardless of the codepage or am I completely wrong and if the codepage is Latin1 then everything that is not in that codepage gets converted to "?".
Q2: Is it the same with a text column?
Q3: Is there a way using SQL Server Management Studio or Visual Studio and some code (don't care which language :)) to query the db and show me if the chars really do show up as Japanese, Chinese, Korean, etc.?
My final goal is to extract data from the db and store it in another db using UTF-8 to show Japanese and other Asian chars as what they are in my own client webapp. I will settle for an answer to Q3. I can code in several languages and at the very least understand some others but I'm just not knowledgeable enough about Unicode. In case you want to know my webapp will be using pyodbc and cassandra but for these questions that doesn't matter.
When inserting into an NVARCHAR column in SSMS, you need to make absolutely sure you're prefixing your string with a N:
This will NOT work:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES('Some Text with Special Char')
SQL Server will interpret the string literal in VALUES(..) as VARCHAR, so any character outside the database's code page is lost (typically replaced with ?) before it reaches the column.
You need this:
INSERT INTO dbo.MyTable(NVarcharColumn) VALUES(N'Some Text with Special Char')
Prefixing your text literal with an N'..' tells SQL Server to treat this as NVARCHAR all the way.
Does this help you solve your Q3 ??
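For Q3, once a value has been read from an NVARCHAR column with a Unicode-aware client (the asker mentions pyodbc), looking at the code points shows whether the text is real CJK data or literal question marks that were stored at insert time. This is a minimal Python sketch: the strings are hard-coded stand-ins for fetched values, and the helper names are hypothetical.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs so a real '?' is told apart from CJK text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def looks_lost(text):
    """True if the value is non-empty and consists only of literal question marks."""
    return bool(text) and set(text) == {"?"}


print(describe("日本"))   # stand-in for a value that was stored as Unicode
print(looks_lost("??"))  # stand-in for a value that was already converted to '?'
```

If the stored value is made of U+003F characters, the data was lost before it reached the column and no client setting will bring it back.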

Why are my accented characters breaking in SQL Server 2005?

When I update my database with this command:
UPDATE myTable SET Name = 'Hermann Dönnhoff' WHERE ID = 123;
SQL Server actually puts 'Hermann Do¨nnhoff' in the field instead. Instead of faithfully inserting the o-umlaut (char(246)), I'm getting two characters ( char(111) + char (168) ).
This happens for all characters that have accent marks, not just umlauts.
Has anybody seen this?
Thank you.
You need to use the nchar, nvarchar, or ntext datatypes for Unicode data.
The issue is that your code page does not directly support those characters.
Read up on collations for more information:
http://msdn.microsoft.com/en-us/library/aa214408%28SQL.80%29.aspx
http://msdn.microsoft.com/en-us/library/aa174903%28SQL.80%29.aspx
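One possible explanation for getting char(111) + char(168), not confirmed in the thread, is that the text arrived in decomposed form: a plain o followed by the combining diaeresis U+0308, which the code page conversion then turned into a separate ¨ character. Under that assumption, normalizing the string to composed form (NFC) before sending it produces the single character 246. A small Python sketch, with `compose` as a hypothetical helper:

```python
import unicodedata


def compose(text):
    """Normalize to NFC so base letter + combining mark become one precomposed character."""
    return unicodedata.normalize("NFC", text)


decomposed = "Hermann Do\u0308nnhoff"   # 'o' followed by U+0308 COMBINING DIAERESIS
composed = compose(decomposed)

print(len(decomposed), len(composed))   # the composed form is one character shorter
print(ord(composed[9]))                 # code of the o-umlaut after composition
```

Storing the value in an nvarchar column with an N'...' literal, as the answer recommends, avoids the code page conversion altogether.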