Difference between N'String' vs U'String' literals in Oracle - sql

What is the meaning and difference between these queries?
SELECT U'String' FROM dual;
and
SELECT N'String' FROM dual;

In this answer i will try to provide informations from official resources
(1) The N'' text Literal
N'' is used to convert a string to NCHAR or NVARCHAR2 datatype
According to this Oracle documentation Oracle - Literals
The syntax of text literals is as follows:
where N or n specifies the literal using the national character set (NCHAR or NVARCHAR2 data).
Also in this second article Oracle - Datatypes
The N'String' is used to convert a string to NCHAR datatype
From the article listed above:
The following example compares the translated_description column of the pm.product_descriptions table with a national character set string:
SELECT translated_description FROM product_descriptions
WHERE translated_name = N'LCD Monitor 11/PM';
(2) The U'' Literal
U'' is used to handle the SQL NCHAR String Literals in Oracle Call Interface (OCI)
Based on this Oracle documentation Programming with Unicode
The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.
OCI is the lowest-level API for accessing a database, so it offers the best possible performance.
Handling SQL NCHAR String Literals in OCI
You can switch it on by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE. You can also achieve this behavior programmatically by using the OCI_NCHAR_LITERAL_REPLACE_ON and OCI_NCHAR_LITERAL_REPLACE_OFF modes in OCIEnvCreate() and OCIEnvNlsCreate(). So, for example, OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON) turns on NCHAR literal replacement, while OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF) turns it off.
[...] Note that, when the NCHAR literal replacement is turned on, OCIStmtPrepare and OCIStmtPrepare2 will transform N' literals with U' literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if the application uses OCI_ATTR_STATEMENT to retrieve the SQL text from the OCI statement handle, the SQL text will return U' instead of N' as specified in the original text.
(3) Answer for your question
From datatypes perspective, there is not difference between both queries provided

N'string' just returns the string as NCHAR type.
U'string' returns also NCHAR type, however it does additional processing to the string: it replaces \\ with \ and \xxxx with Unicode code point U+xxxx, where xxxx are 4 hexadecimal digits. This is similar to UNISTR('string'), the difference is that the latter returns NVARCHAR2.
U' literals are useful when you want to have a Unicode string independent from encoding and NLS settings.
Example:
select n'\€', u'\\\20ac', n'\\\20ac' from dual;
N'\€' U'\\\20AC' N'\\\20AC'
----- ---------- ----------
\€ \€ \\\20ac

when using N' we denote that given datatype is NCHAR or NVARCHAR.
U' is used to denote unicode

The documented N'' literals are the same as standard character literals ('') except that their data type is NVARCHAR2 and not VARCHAR2. It is important to note that the characters in these literals, together with the entire SQL statement, are converted from the client character set to the database character set when transmitted to the server. All characters from the literals that are not supported by the database character set are lost.
The data type of the undocumented U'' literals is also NVARCHAR2. The content of a U'' literal is interpreted like the input to the SQL UNISTR function. That is, each character sequence \xxxx, where each x is one hex digit, is interpreted as a UTF-16 code point U+xxxx. I am not sure why the U'' literals are undocumented. I can only guess. They are used internally by the NCHAR literal replacement feature, which, when enable on a client, automatically translates N'' literals to U'' literals. This prevents the mentioned data loss due to character set conversion and enables literal Unicode data to be provided for NVARCHAR2 columns even if the database character set is not Unicode.
The two queries in this thread's question are generally not equivalent because the literal text would be interpreted differently. However, if no backslash is present in the literals, no difference can be observed.

Related

How to change charset of table/column - Oracle

I have the following table:
Can i change charset of column SCHEMA_ERD to UTF-8?
If it isn't possible maybe i can change charset only for that table?
Can someone give me example how to do it?
I want to keep emojis inside that SCHEMA_ERD that is why i need change charset to UTF-8.
As far as I can tell, in Oracle, character set is database's characteristic. It means that you should change character set for the whole database (i.e. you can't do that for only one table (or column, or schema).
However, you could use NVARCHAR2 or NCLOB so I suggest you to try it. Because, changing character set is not that simple task.
Issue this query to see your available character sets in your database
select * from nls_database_parameters
where parameter in ( 'NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
PARAMETER VALUE
------------------------- ----------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET WE8ISO8859P1
The character set in NLS_CHARACTERSET is what is used for normal VARCHAR2.
The NCHAR character set is used for NVARCHAR2 or NCLOB.
You have no other chance wihtout support of your DBA
But note, that the NCHAR character set is typically set to support UTF8 or UTF16 so it should work for you.

Handling chinese characters in SQL Server 2016

Our ETL team is sending us some data with chinese description. When we are loading that data in our SQL Server database, those descriptions are coming up as blank.
We tried changing the column format to nvarchar, but that doesnt help.
Can you please help.
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix a Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 preview, when a UTF-8
enabled collation is used, the default code page is capable of storing
UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017

Losing special characters on insert

I am using oracle 11g and trying to insert a string containing special UTF8 characters eg '(ε- c'. The NLS character sets for the databse are...
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET WE8ISO8859P1
when I copy and paste the above string into a NVARCHAR field it works fine.
if I execute the below I get an upside down question mark in the field
insert into title_debug values ('(ε- c');
where title debug table consists of a single NVARCHAR2(100) field called title.
I have attempted to assign this string to a NVARCHAR2(100) variable then iserting this. And also attempted all the different CAST / CONVERT ect functions I can find and nothing is working.
Any assistance would be greatly appreciated.
UPDATE
I have executed
select dump(title, 1016), dump(title1, 1016)
into v_title, v_title1
from dual
where title is the string passed in as a varchar and title1 is the string passed in as a NVarchar.
Unsuprisingly the encodings come through as WE8ISO8859P1 and AL16UTF16. but on both the ε comes through as hex 'BF'. This is the upside down Question mark.
My only thought left is to try and pass this through as a raw and then do something with it. However I have not yet been able to figure out how to convert the string into a acceptable format with XQuery (OSB).
Continued thanks for assistance.
Our DBA found the solution to this issue. The answer lay in a setting on the dbc connection on the bus to tell it to convert utf8 to NChar.
On The connection pool page add the following lines to the Properties box.
oracle.jdbc.convertNcharLiterals=true
oracle.jdbc.defaultNchar=true
this will allow you to be able to insert into NVarchar2 fields while maintaining the utf8 characters.
Cheers
I did a test with '(ε- c'. And i have not encountered any problems.
If you can i advice you to change your character set for :
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
And Oracle recommendation for all new deployment is the Unicode character set AL32UTF8.
Because it's flexible globalization support and is a universal character set
reg,
First verify the data is being stored correctly, then use the correct NLS_LANG settings. See my answer to this question:
when insert persian character in oracle db i see the question mark

which data type can save bangla Language in sql server?

I want to save bangla Language in sql server. Using which data type I can Do it in sql server 2005 or sql server 2008.
I tried varchar and varbinary type but it cannot save bangla Language.
How is it possible?
You're using SQL_Latin1_General_CP1_CI_AS for your collation, which is suited for the Latin character set (ISO-8859-1). To store characters fromother character sets, you can use the NVARCHAR() which can store the full Unicode range, irrespective of collation - this does mean it will need to be treated as NVARCHAR() all the way, as quoted constants (e.g. N'বাংলা Bangla'), as the data types for parameters to stored procedures, etc.

UTF-8 should be using CHAR or NCHAR

If I'm using utf-8 encoding for a character set that would use the character set for non-Cyrillic european languages, can I use varchar/char, or should I use nvarchar/nchar?
is there a huge sql processing time penalty for using nvarchar?
internal data representation for nvarchar is UTF-16. AFAIK you cannot change it, so you better use nvarchar for parameters.
MSSQL doesn't support UTF-8 natively, so use nchar.
You must make sure that your code is consistent about the character sets it inserts as that means the RDBMS won't. MSSQL does support UTF-16, so you might consider using that charset for your application instead.