PL/SQL Chinese garbled in oracle - sql

My Oracle version is 11g, installed on Linux. The client is Windows XP.
When I query data with PL/SQL, the Chinese text is garbled, like this (field name):
In PL/SQL I executed the command "select userenv('language') from dual;" and it shows
SIMPLIFIED CHINESE_CHINA.AL32UTF8 (I think this is the server-side character set).
So I looked at the Windows XP registry: HKEY_LOCAL_MACHINE->SOFTWARE->Oracle->NLS_LANG.
It shows:
SIMPLIFIED CHINESE_CHINA.ZHS16GBK (I think this is the client-side character set).
I changed it to
SIMPLIFIED CHINESE_CHINA.AL32UTF8
But the Chinese text is still garbled.
The "NAME" field should actually show "北京市".
I executed the command:
select dump(name,1016) from MN_C11_SM_S31 where objectid=1;
and it shows:
Does that mean the stored data itself is incorrect?
What should I do?
Supplementary: I used C# code to parse this hex string as UTF-8: "e58c97e4baace5b882",
and it shows "北京市". I think this proves the data itself is not wrong.
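
The hex check described above can be reproduced outside the database. This is a minimal sketch in Python, assuming the bytes reported by DUMP are exactly e58c97e4baace5b882; the variable names are illustrative and not from the original post. Decoding the bytes as UTF-8 gives the expected text, while decoding the same bytes as GBK (roughly what a tool would show if it treated the data as ZHS16GBK without conversion) gives unrelated characters.

```python
# Bytes as quoted in the question (hex form of the stored NAME value).
raw = bytes.fromhex("e58c97e4baace5b882")

# Interpreted as UTF-8 (the AL32UTF8 database character set), the bytes are valid.
as_utf8 = raw.decode("utf-8")

# Interpreted as GBK, the same bytes turn into unrelated characters.
# errors="replace" keeps the demo from raising on sequences GBK rejects.
as_gbk = raw.decode("gbk", errors="replace")

print(as_utf8)
print(as_gbk)
```

If the first line prints the expected name, the stored bytes are fine and the problem lies in how the client interprets or renders them.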

You need to be careful.
SQL*Plus and Oracle SQL Developer might display Chinese characters incorrectly even when the stored data is fine.
You also need to use an nvarchar (utf8) field for a Chinese string.
Try using a C# Windows application to input, insert and display the data. It uses Unicode internally, so you can at least rule out that source of bugs. Don't use a console application for display; with default settings, console applications often can't display Unicode characters correctly.
Then store your string in an nvarchar field (not varchar), and use a parameter of type nvarchar when you insert your string.
INSERT INTO YOUR_TABLE (newname)
VALUES (:UnicodeString)
command.Parameters.Add (":UnicodeString", OracleType.NVarChar).Value = stringToSave;

Related

How to not lose unicode characters in Stored Procedure parameter

Problem Statement:
We have a legacy application with SQL Server as the backend. Until now we did not face any issues, because we only passed non-Unicode values. Now we are getting Unicode characters from the user interface.
The Unicode characters are passed from the UI as shown below, and the data is inserted into a table.
Currently, we pass Unicode characters like this, and we lose the non-English characters:
EXEC dbo.ProcedureName @IN_DELIM_VALS = '삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~'
What we tried:
If we pass the value with the N prefix, the non-English characters are inserted into the table properly.
EXEC dbo.ProcedureName @IN_DELIM_VALS = N'삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~'
But adding the N prefix requires a UI code change. As it is a legacy application, we want to avoid UI changes and handle this on the SQL Server side.
From what I have read, when a parameter is passed without the N prefix, the data is implicitly converted to the default code page and the Korean characters are lost. Reference:
Prefix a Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 (15.x), when a UTF-8 enabled
collation is used, the default code page is capable of storing UNICODE
UTF-8 character set.
Our Ask:
Is there a way to add the N prefix on the SQL Server side, before the value is assigned to the stored procedure parameter, so that we do not lose the Unicode characters?
As it is not possible to add the N prefix after the parameter has been passed to SQL Server, we are going with the application code change below. Ideally, the application should pass the parameter with the right nvarchar datatype, which has the same effect as the N prefix.
SqlCommand cmd = new SqlCommand("EXEC dbo.ProcedureName @IN_DELIM_VALS", myConnection);
cmd.Parameters.Add(new SqlParameter("@IN_DELIM_VALS", SqlDbType.NVarChar,400)).Value
= "삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~";
cmd.ExecuteNonQuery();
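
The loss described above can be illustrated without a database. This is a minimal sketch in Python, assuming the database's default code page is Windows-1252 (the real code page depends on the collation, and `lossy` / `lossless` are illustrative names). Encoding to a single-byte code page stands in for a varchar literal without the N prefix, and UTF-16 stands in for what an nvarchar parameter carries.

```python
value = "삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~"

# Assumption: the default code page is Windows-1252. Characters it cannot
# represent are replaced with "?", which is the kind of loss seen in the table.
lossy = value.encode("cp1252", errors="replace")

# UTF-16LE stands in for the nvarchar representation: nothing is lost.
lossless = value.encode("utf-16-le")

print(lossy.decode("cp1252"))
print(lossless.decode("utf-16-le") == value)
```

Once the characters have been replaced during the conversion, nothing on the receiving side can recover them, which is why the fix has to happen where the value is sent.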

Getting Unescaped JSON from SQL

I've created a stored procedure to pull data as a JSON object from my SQL Server database. All my data is relational and I'm trying to get it out as a JSON string.
Currently, I am able to get a JSON string out of SQL Server just fine; however, this object ALWAYS includes escape characters (e.g. "{\"field\":\"value\"}"). I'd like to pull the same JSON but without escaped characters. To test this I'm using some simple queries and getting them into .NET with a SqlDataAdapter using my stored procedure.
The thing that puzzles me is that when I run my query within SSMS, I never see any escape characters, but as soon as it's pulled into a .NET application, the escape characters appear. I'd like to prevent this from happening and have my applications get only the unescaped JSON string.
I've tried several suggestions I found during my research, but nothing has produced the desired results. The changes I've seen (documented in MSDN and in other SO posts) have dealt with getting unescaped results, but only within SSMS and not within other applications.
What I've tried:
Simple Json query set to param and then using JSON_QUERY to select the param:
DECLARE @JSON varchar(max)
SET @JSON = (SELECT '{"Field":"Value"}' AS myJson FOR JSON PATH)
SELECT JSON_QUERY(@JSON) AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":[{"myJson":"{\"Field\":\"Value\"}"}]}]
Simple Json query without param using JSON_QUERY:
SELECT JSON_QUERY('{"Field":"Value"}') AS 'JsonResponse' FOR JSON PATH
This produces the following in a .NET application
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS
[{"JsonResponse":{"Field":"Value"}}]
Simple Json query with temp tables using JSON_QUERY:
CREATE TABLE #temp(
jsoncol varchar(255)
)
INSERT INTO #temp VALUES ('{"Field":"Value"}')
SELECT JSON_QUERY(jsoncol) AS 'JsonResponse' FROM #temp FOR JSON PATH
DROP TABLE #temp
This produces the following in a .NET application:
"[{\"JsonResponse\":{\"Field\":\"Value\"}}]"
This produces the following in SSMS:
[{"JsonResponse":{"Field":"Value"}}]
I'm led to believe that there is no way to get a JSON string out of SQL Server without the escaped characters. In case the examples above weren't enough, I've included my stored procedure here. Hopefully someone can point me in the right direction.
This depends where you look at the string...
In SSMS a string is delimited by single quotes. A double quote can exist within a string without problems:
DECLARE @SomeString varchar(200) = 'This can include "double quotes" but you have to double ''single quote''';
In a C# application the double quote is the string delimiter. So the above example would look like this:
string SomeString = "This must escape \"double quotes\" but you can use 'single quote' without problems";
Within your IDE (is it VS?) you can look at the string as it is, or as it would have to be written in code. Your example shows " at the beginning and at the end of your string. That is a clear hint that you are seeing the in-code representation. You could copy this string and place it into your code. The real string, which is used and processed, does not contain escape characters.
Hint: Escape characters are only needed in human-readable formats, where there are characters with special meaning (a ; in a CSV, a < in HTML and so on)...
UPDATE Some more explanation
Escape characters are needed to place a string within a string. Somehow you have to mark the beginning and the end of the string, and there is nothing else you can use for that than some magic characters.
In order to use these characters within the embedded string you have to go one of the following ways:
escaping (e.g. XML will replace & with &amp; and JSON will replace a " with \" as JSON uses the " to mark its labels) or
magic borders (e.g. a CDATA section in XML, which allows unescaped characters to be placed as is: <![CDATA[forbidden characters &<> allowed here]]>)
Whatever you do, you must distinguish between the visible string in an editor or in a text-based container like XML or JSON and the value the application will pick out of it.
An example:
<root><a>this &amp; that</a></root>
visible string: "this &amp; that"
real value: "this & that"
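
The difference between the visible string and the real value can be checked directly. This is a minimal sketch in Python rather than .NET (the variable names are illustrative): `raw` plays the role of the value the application receives, and `as_literal` shows how a debugger or source file would write that same value as a double-quoted literal.

```python
import json
from xml.etree import ElementTree

# The value the application actually receives (it contains no backslashes).
raw = '[{"JsonResponse":{"Field":"Value"}}]'

# How the same value looks when written as a double-quoted string literal.
as_literal = json.dumps(raw)

print(raw)
print(as_literal)

# Parsing the raw value works directly; nothing has to be unescaped first.
field = json.loads(raw)[0]["JsonResponse"]["Field"]

# Same idea for XML: the markup contains &amp;, the value is a plain &.
text = ElementTree.fromstring("<root><a>this &amp; that</a></root>").find("a").text
print(field, "|", text)
```

The backslashes exist only in the second printed line, which is the literal form; the value itself parses as ordinary JSON.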

Update/insert/retrieve accented character in DB?

I am using Oracle 12c.
When I run @F:\update.sql from SQL*Plus, the accented character é is displayed as a junk character when I retrieve it from either SQL*Plus or SQL Developer.
But when I run the individual statement from SQL*Plus and then retrieve it from SQL*Plus, it displays the correct character; when I retrieve it from SQL Developer, it again displays the junk character.
The content of update.sql is this:
update employee set name ='é' where id= 1;
What I want is that when I run @F:\update.sql, the value is inserted/updated/retrieved correctly, whether from SQL*Plus or any other tool.
For information :- when i run
SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET%'
i get below information
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16
when I run @.[%NLS_LANG%] from the command prompt I see
SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8MSWIN1252]"
I am not familiar with SQL Developer, but I can give a solution for SQL*Plus.
I presume you want to work in Windows CP1252.
First of all, ensure that the file F:\update.sql is saved in CP1252 encoding. Many editors call this encoding ANSI, which is the same here (let's skip the details about the difference between the terms ANSI and Windows-1252).
Then, before you run the script, enter
chcp 1252
in order to switch the encoding of your cmd.exe to CP1252. By default the encoding of cmd.exe is most likely CP850 or CP437, which are different.
Then set the NLS_LANG environment variable to character set WE8MSWIN1252, e.g.
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
After that your script should work fine with SQL*Plus. SQL*Plus inherits the encoding (or "character set", if you prefer this term) from the parent cmd.exe. NLS_LANG tells the Oracle driver which character set you are using.
Example Summary:
chcp 1252
set NLS_LANG=.WE8MSWIN1252
sqlplus username/password@db @F:\update.sql
Some notes: In order to set the encoding of cmd.exe permanently, see this answer: Unicode characters in Windows command line - how?
NLS_LANG can be set either as an environment variable or in your Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for a 32-bit Oracle Client) or HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for a 64-bit Oracle Client).
For SQL Developer, check your options; somewhere it should be possible to define the encoding of SQL files.
You are not forced to use Windows-1252. The same also works for other encodings, for example WE8ISO8859P1 (i.e. ISO-8859-1, chcp 28591) or UTF-8. However, in the case of UTF-8 your SQL script may contain characters which are not supported by the database character set WE8MSWIN1252. Such characters would be replaced by a placeholder (e.g. ¿).
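
Why the file encoding, the console code page and NLS_LANG all have to agree can be seen from the bytes involved. This is a minimal sketch in Python, not part of the original answer, with illustrative variable names: the same character é is a different byte sequence in each encoding, so reading it with the wrong one produces junk.

```python
ch = "é"

cp1252_bytes = ch.encode("cp1252")  # what an "ANSI" editor writes on Windows
cp850_bytes = ch.encode("cp850")    # a common default cmd.exe code page
utf8_bytes = ch.encode("utf-8")     # what a UTF-8 editor writes

print(cp1252_bytes.hex(), cp850_bytes.hex(), utf8_bytes.hex())

# A UTF-8 file read as if it were Windows-1252 shows two junk characters.
misread_utf8 = utf8_bytes.decode("cp1252")

# A CP1252 byte shown by a console still on CP850 is a different letter.
misread_console = cp1252_bytes.decode("cp850")

print(misread_utf8, misread_console)
```

Setting chcp, the file encoding and NLS_LANG to the same character set removes every one of these mismatches at once.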

Garbage data while inserting data with special characters in SQL Server 2012 using Perl

I have an XML file with data in multiple languages (eg. - Russian, Japanese, Chinese, English). This XML is created on Linux platform and it has passed xmllint checks.
Now I am reading this data from the XML file and inserting it into SQL Server 2012 on a Windows 7 platform (the XML file is also present on Windows). But I am getting ???? as the value in the fields. This happens in some cases, such as when the whole sentence is in another language.
But if a sentence only contains some special characters, it works fine.
I am using the function
$row_value = decode("utf-8",$row_value);
use Encode;
require Encode::Detect;
my $utf8 = decode("Detect", $data);
Try this to decode the data...

Replace character in SQL results

This is from an Oracle SQL query. The database has these weird skinny rectangle shapes in places where apostrophes should be. (I wish we could paste screenshots in here.)
It looks like this when I copy and paste the results.
spouse�s
Is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes; I can't change the database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that it is the backquote character. I can't get the DB updated, so I'm trying this code
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I'm getting ORA-00904: "Regex_Replace": invalid identifier
This seems to be a problem with your charset configuration. Check your NLS_LANG and the other NLS_xxx environment/registry values. You have to check the Oracle server, your client, and the client that inserted the data.
Try to DUMP the value. You can do it with a select as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before trying to replace anything, you should look for the root of the problem. If this happens because of a charset problem, you can get into big trouble with bad data.
UPDATE 2: Answering the comments. The problem may not be on the database server side; it may be on the client side. The problem (if this is the problem) can be a conversion in the server-to/from-client communication, caused by a mismatched server-client configuration. For instance, if the server has the UTF8 charset defined and your client uses US7ASCII, then all accented characters will appear as ?.
Another possibility is that the server has the UTF8 charset defined and your client also uses UTF8, but the application is not able to show UTF8 characters; then the problem is on the application side.
UPDATE 3: On your examples:
select translate('What...: this works because the � is exactly the same character on both sides; you pasted it in both places.
select translate(Fieldname...: this does not work because the � is not what is stored in the database; it is the character the client receives, probably because some conversion occurs between the table and what is shown to you.
Next step: Look at the DUMP syntax and try to extract the codes for the mysterious character (from the table, not by pasting �!).
I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are bytes 0x91-0x94 in the Windows-1252 encoding, or Unicode U+2018, U+2019, U+201C, and U+201D.
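
The smart-quote guess can be checked against the DUMP output quoted in the question. This is a minimal sketch in Python, assuming the client that inserted the data used Windows-1252 (an assumption, since the column reports US7ASCII); the variable names are illustrative.

```python
# Hex bytes copied from the DUMP(FN, 1016) output in the question.
dump_hex = ("57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,"
            "73,20,66,69,72,73,74,20,6e,61,6d,65,3f")
raw = bytes(int(h, 16) for h in dump_hex.split(","))

# Assumption: the inserting client spoke Windows-1252.
text = raw.decode("cp1252")
print(text)

# The four "smart quote" bytes and their Unicode equivalents.
smart = bytes([0x91, 0x92, 0x93, 0x94]).decode("cp1252")

# The bytes DUMP reported for the pasted character (239,191,189) are the
# UTF-8 encoding of U+FFFD, the Unicode replacement character.
pasted = bytes([239, 191, 189]).decode("utf-8")

# Swapping the curly apostrophe for a plain one, as TRANSLATE would.
fixed = text.replace("\u2019", "'")
print(fixed)
```

This also suggests why translating the pasted character did nothing on the column: the pasted character is the replacement character the client displayed, not the 0x92 byte that is stored.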
I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are using to display the word spouse�s and the character �. That icon appears when you try to display a character in a Unicode font that doesn't have a glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its column. It's more up to you, and your application, to interpret what they will look like given the font you use to display your data, so I suggest investigating what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause so that it actually matches the offending rows.
Once you've found the code for the character, you could replace it using the built-in REGEXP_REPLACE() function, by determining the raw hex code of the character and then supplying the ASCII / C0 Controls and Basic Latin character 0x0027 ('), using code similar to this:
UPDATE <table>
SET <column with offending character>
= REGEXP_REPLACE(<column with offending character>,
'<character code of �>',
'''')
WHERE REGEXP_LIKE(<column with offending character>, '<character code of �>');
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If you're seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (the default Windows code page) is the right single quotation mark (U+2019), which looks much like an apostrophe. This byte isn't a valid ASCII character, and it isn't a printable character in ISO-8859-1 either (there it is a C1 control code). So probably either the database is in CP-1252 encoding (I don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your db client connection isn't expecting CP-1252. Hence the weird question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:
INSERT INTO <table> (<normal table columns>,...,<column with offending character>)
SELECT <all normal columns>, REGEXP_REPLACE(<column with offending character>,
CHR(146), -- byte 0x92, assuming a single-byte database character set
'''') -- ASCII/ISO-8859-1 apostrophe (0x27)
FROM <table>
WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
DELETE FROM <table> WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
Before you do this you need to understand what actually happened. It looks to me as if someone inserted non-ASCII strings into the database, for example Unicode or UTF-8 text. Before you fix this, be very sure that it is actually a bug. The apostrophe comes in many forms, not just the "'".
TRANSLATE() is a useful function for replacing or eliminating known single character codes.