UTF-8: should I be using CHAR or NCHAR? - sql

If I'm using UTF-8 encoding for data in non-Cyrillic European languages, can I use varchar/char, or should I use nvarchar/nchar?
Is there a significant SQL processing time penalty for using nvarchar?

The internal data representation for nvarchar is UTF-16. AFAIK you cannot change it, so you'd better use nvarchar for parameters.

MSSQL doesn't support UTF-8 natively (UTF-8 collations were only added later, in SQL Server 2019), so use nchar.
You must make sure that your application code is consistent about the character sets it inserts, because the RDBMS won't enforce that for you. MSSQL does support UTF-16 natively, so you might consider using that encoding in your application instead.
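A minimal sketch of the nvarchar approach in T-SQL (the table and column names are invented for illustration). The N prefix on the literal matters: without it the string is first converted through the database's non-Unicode code page, and characters outside that code page can be lost:

```sql
-- Hypothetical table for illustration; nvarchar stores UTF-16 data.
CREATE TABLE Products (
    Id INT PRIMARY KEY,
    Name NVARCHAR(100)  -- use nvarchar, not varchar, for Unicode text
);

-- The N prefix marks the literal as Unicode (nvarchar);
-- without it, 'Škoda' would pass through the database code page first.
INSERT INTO Products (Id, Name) VALUES (1, N'Škoda Octavia');

SELECT Name FROM Products WHERE Name = N'Škoda Octavia';
```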

Related

How to query BLOB as CLOB in H2 database

To project BLOB into CLOB in Oracle I can do this query:
SELECT ent.ID, to_clob(ent.blob_string) from entity_1 ent;
However, I could not find a to_clob equivalent in H2 to see my data in the H2 Console. How can I do that?
It depends on the content of your BLOB. In the H2 Console you can actually see BLOB and other binary values as-is in hexadecimal representation, without any additional functions around them.
You can use CAST(ent.blob_string AS VARCHAR) (or CAST(ent.blob_string AS CLOB)) to convert a binary string to a character string explicitly, but such conversion uses different encodings in different versions of H2. Old versions use hexadecimal representation, new versions use UTF-8. You can use UTF8TOSTRING(ent.blob_string) function for UTF-8 conversion in old and new versions. There is also RAWTOHEX(ent.blob_string) function, but its behavior is also different between different versions of H2 and its compatibility modes.
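A short illustration of the two conversion paths described above (the table and column names come from the question):

```sql
-- Explicit cast: the result's encoding depends on the H2 version
-- (hexadecimal in old versions, UTF-8 in new ones).
SELECT ent.ID, CAST(ent.blob_string AS VARCHAR) FROM entity_1 ent;

-- UTF8TOSTRING decodes the BLOB as UTF-8 consistently across versions.
SELECT ent.ID, UTF8TOSTRING(ent.blob_string) FROM entity_1 ent;
```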

How to change charset of table/column - Oracle

I have the following table:
Can I change the charset of the column SCHEMA_ERD to UTF-8?
If that isn't possible, can I change the charset for just that table?
Can someone give me an example of how to do it?
I want to keep emojis inside SCHEMA_ERD, which is why I need to change the charset to UTF-8.
As far as I can tell, in Oracle the character set is a characteristic of the whole database. That means you would have to change the character set for the entire database; you can't do it for only one table, column, or schema.
However, you could use NVARCHAR2 or NCLOB for that column instead, and I suggest you try that first, because changing the database character set is not a simple task.
Issue this query to see your available character sets in your database
select * from nls_database_parameters
where parameter in ( 'NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
PARAMETER VALUE
------------------------- ----------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET WE8ISO8859P1
The character set in NLS_CHARACTERSET is what is used for normal VARCHAR2.
The NCHAR character set is used for NVARCHAR2 or NCLOB.
Changing the database character set is not possible without the support of your DBA.
But note that the NCHAR character set is typically set to UTF8 or AL16UTF16, so it should work for you.
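A hedged sketch of moving the column to the national character set (the table name `my_table` and the column size are made up, since the original table definition is not shown; some Oracle versions won't MODIFY a populated VARCHAR2 column directly to NVARCHAR2, so the safe route is a new column):

```sql
-- Add an NVARCHAR2 column, copy the data across, then swap the columns.
-- my_table and the size 2000 are placeholders; adjust to your schema.
ALTER TABLE my_table ADD (schema_erd_n NVARCHAR2(2000));

UPDATE my_table SET schema_erd_n = TO_NCHAR(schema_erd);

ALTER TABLE my_table DROP COLUMN schema_erd;
ALTER TABLE my_table RENAME COLUMN schema_erd_n TO schema_erd;
```

Test this on a copy first, and verify your emoji round-trip before dropping the original column.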

Difference between N'String' vs U'String' literals in Oracle

What is the meaning and difference between these queries?
SELECT U'String' FROM dual;
and
SELECT N'String' FROM dual;
In this answer I will try to provide information from official sources.
(1) The N'' text Literal
N'' is used to mark a string literal as the NCHAR or NVARCHAR2 datatype.
According to this Oracle documentation Oracle - Literals
In the syntax of text literals, the N or n prefix specifies that the literal uses the national character set (NCHAR or NVARCHAR2 data).
Also in this second article Oracle - Datatypes
The N'String' is used to convert a string to NCHAR datatype
From the article listed above:
The following example compares the translated_description column of the pm.product_descriptions table with a national character set string:
SELECT translated_description FROM product_descriptions
WHERE translated_name = N'LCD Monitor 11/PM';
(2) The U'' Literal
U'' is used for handling SQL NCHAR string literals in the Oracle Call Interface (OCI).
Based on this Oracle documentation Programming with Unicode
The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.
OCI is the lowest-level API for accessing a database, so it offers the best possible performance.
Handling SQL NCHAR String Literals in OCI
You can switch NCHAR literal replacement on by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE. You can also achieve this behavior programmatically by using the OCI_NCHAR_LITERAL_REPLACE_ON and OCI_NCHAR_LITERAL_REPLACE_OFF modes in OCIEnvCreate() and OCIEnvNlsCreate(). For example, OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON) turns NCHAR literal replacement on, while OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF) turns it off.
[...] Note that, when the NCHAR literal replacement is turned on, OCIStmtPrepare and OCIStmtPrepare2 will transform N' literals with U' literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if the application uses OCI_ATTR_STATEMENT to retrieve the SQL text from the OCI statement handle, the SQL text will return U' instead of N' as specified in the original text.
(3) Answer for your question
From a datatype perspective, there is no difference between the two queries provided.
N'string' just returns the string as NCHAR type.
U'string' returns also NCHAR type, however it does additional processing to the string: it replaces \\ with \ and \xxxx with Unicode code point U+xxxx, where xxxx are 4 hexadecimal digits. This is similar to UNISTR('string'), the difference is that the latter returns NVARCHAR2.
U' literals are useful when you want to have a Unicode string independent from encoding and NLS settings.
Example:
select n'\€', u'\\\20ac', n'\\\20ac' from dual;

N'\€'  U'\\\20AC'  N'\\\20AC'
-----  ----------  ----------
\€     \€          \\\20ac
The N' prefix denotes that the literal's datatype is NCHAR or NVARCHAR2.
The U' prefix denotes a Unicode literal with \xxxx escape sequences.
The documented N'' literals are the same as standard character literals ('') except that their data type is NVARCHAR2 and not VARCHAR2. It is important to note that the characters in these literals, together with the entire SQL statement, are converted from the client character set to the database character set when transmitted to the server. All characters from the literals that are not supported by the database character set are lost.
The data type of the undocumented U'' literals is also NVARCHAR2. The content of a U'' literal is interpreted like the input to the SQL UNISTR function. That is, each character sequence \xxxx, where each x is one hex digit, is interpreted as a UTF-16 code point U+xxxx. I am not sure why the U'' literals are undocumented; I can only guess. They are used internally by the NCHAR literal replacement feature, which, when enabled on a client, automatically translates N'' literals to U'' literals. This prevents the mentioned data loss due to character set conversion and enables literal Unicode data to be provided for NVARCHAR2 columns even if the database character set is not Unicode.
The two queries in this thread's question are generally not equivalent because the literal text would be interpreted differently. However, if no backslash is present in the literals, no difference can be observed.
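The relationship between U'' and UNISTR described above can be checked directly. A small sketch, using the euro sign U+20AC as the sample code point (both expressions decode the same \xxxx escape; they differ only in the returned national-character datatype):

```sql
-- Both select-list items yield the euro sign '€'; UNISTR interprets
-- the same \xxxx escapes as a U'' literal but returns NVARCHAR2.
SELECT U'\20ac'        AS u_literal,
       UNISTR('\20ac') AS unistr_value
FROM dual;
```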

which data type can save bangla Language in sql server?

I want to save the Bangla language in SQL Server. Which data type can I use to do this in SQL Server 2005 or SQL Server 2008?
I tried the varchar and varbinary types, but they cannot save Bangla text.
How is it possible?
You're using SQL_Latin1_General_CP1_CI_AS for your collation, which is suited to the Latin character set (ISO-8859-1). To store characters from other character sets, you can use NVARCHAR(), which can store the full Unicode range irrespective of collation. This does mean it will need to be treated as NVARCHAR() all the way through: as quoted constants (e.g. N'বাংলা Bangla'), as the data types of parameters to stored procedures, and so on.
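A minimal sketch in T-SQL (the table and column names are invented for illustration). The N prefix is the crucial part: without it, a Bangla literal is converted through the Latin code page and its characters are typically replaced with '?':

```sql
-- Hypothetical table; NVARCHAR holds the full Unicode range.
CREATE TABLE Articles (
    Id INT PRIMARY KEY,
    Title NVARCHAR(200)
);

-- N'...' keeps the literal in Unicode end to end.
INSERT INTO Articles (Id, Title) VALUES (1, N'বাংলা Bangla');

-- Compare: the unprefixed literal passes through the database code
-- page first, so the Bangla characters are generally lost there.
SELECT 'বাংলা' AS without_n, N'বাংলা' AS with_n;
```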

Converting a ansi blob data in utf8 db to utf8 characterset

I have a column in a UTF-8 database which contains ANSI data.
This occurred because I am migrating from a non-UTF-8 database to a UTF-8 database.
The column holds BLOB data. My question is: how do I convert this data from the ANSI character set to UTF-8?
Is there a specific query I can run to convert the ANSI data to UTF-8?
Or is there any other means I can use to do the same?
Can someone help me with this?
Since the data is currently US7ASCII, you shouldn't need to do anything to it to convert it to AL32UTF8. The UTF-8 encoding was designed so that characters in the 7-bit ASCII character set have identical binary representations in UTF-8.
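One way to sanity-check this on your own data (the table and column names below are placeholders): DUMP shows the raw bytes, which should be identical for pure 7-bit ASCII content whether it is labelled US7ASCII or AL32UTF8:

```sql
-- DUMP does not accept LOBs directly, so take a RAW prefix with
-- DBMS_LOB.SUBSTR; for 7-bit ASCII data every byte is <= 127 and
-- identical under US7ASCII and AL32UTF8.
SELECT DUMP(DBMS_LOB.SUBSTR(my_blob_column, 100, 1))
FROM my_table
WHERE ROWNUM <= 5;
```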