Converting ANSI BLOB data in a UTF-8 database to the UTF-8 character set - sql

I have a column in a UTF-8 database which contains ANSI data.
This occurred because I am migrating from a non-UTF-8 database to a UTF-8 database.
The column holds BLOB data. My question is: how do I convert this data from ANSI to the UTF-8 character set?
Is there a specific query I can run to convert the ANSI data to UTF-8?
Or any other means I can use to do the same?
Can someone help me with this?

Since the data is currently US7ASCII, you shouldn't need to do anything to it to convert it to AL32UTF8. The UTF-8 encoding was designed so that characters in the 7-bit ASCII character set have identical binary representations in UTF-8.
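A quick way to check this, assuming an Oracle database (US7ASCII and AL32UTF8 are Oracle character set names), is to dump the raw bytes before and after an explicit conversion; for pure 7-bit ASCII input they come out identical:
-- DUMP with format 1016 prints the bytes in hex along with the character set.
-- For 7-bit ASCII data, converting to AL32UTF8 changes no bytes.
SELECT DUMP('Hello', 1016) AS raw_bytes,
       DUMP(CONVERT('Hello', 'AL32UTF8', 'US7ASCII'), 1016) AS utf8_bytes
FROM dual;
If the column actually contains bytes above 0x7F, it is not really US7ASCII, and an explicit conversion (for BLOBs, something like DBMS_LOB.CONVERTTOCLOB) would be needed.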

Related

Handling chinese characters in SQL Server 2016

Our ETL team is sending us some data with Chinese descriptions. When we load that data into our SQL Server database, those descriptions come up as blank.
We tried changing the column type to nvarchar, but that doesn't help.
Can you please help?
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix Unicode character string constants with the letter N to signal UCS-2 or UTF-16 input, depending on whether an SC collation is used or not. Without the N prefix, the string is converted to the default code page of the database, which may not recognize certain characters. Starting with SQL Server 2019 preview, when a UTF-8 enabled collation is used, the default code page is capable of storing the Unicode UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
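As a minimal repro (the table and column names here are hypothetical), the same string inserted with and without the N prefix behaves differently unless a UTF-8 enabled collation is in play:
CREATE TABLE dbo.Products (Description NVARCHAR(200));
-- Without N, the literal is first pushed through the database's default
-- code page, so Chinese characters typically arrive as '?' placeholders.
INSERT INTO dbo.Products (Description) VALUES ('中文描述');
-- With N, the literal is Unicode from the start and is stored intact.
INSERT INTO dbo.Products (Description) VALUES (N'中文描述');
SELECT Description FROM dbo.Products;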

Difference between N'String' vs U'String' literals in Oracle

What is the meaning and difference between these queries?
SELECT U'String' FROM dual;
and
SELECT N'String' FROM dual;
In this answer I will try to provide information from official sources.
(1) The N'' text Literal
N'' is used to convert a string to NCHAR or NVARCHAR2 datatype
According to this Oracle documentation Oracle - Literals
In the syntax of text literals, N or n specifies the literal using the national character set (NCHAR or NVARCHAR2 data).
Also, in this second article, Oracle - Datatypes:
N'String' is used to convert a string to the NCHAR datatype.
From the article listed above:
The following example compares the translated_description column of the pm.product_descriptions table with a national character set string:
SELECT translated_description FROM product_descriptions
WHERE translated_name = N'LCD Monitor 11/PM';
(2) The U'' Literal
U'' is used to handle SQL NCHAR string literals in the Oracle Call Interface (OCI)
Based on this Oracle documentation Programming with Unicode
The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.
OCI is the lowest-level API for accessing a database, so it offers the best possible performance.
Handling SQL NCHAR String Literals in OCI
You can switch NCHAR literal replacement on by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE. You can also achieve this behavior programmatically by using the OCI_NCHAR_LITERAL_REPLACE_ON and OCI_NCHAR_LITERAL_REPLACE_OFF modes in OCIEnvCreate() and OCIEnvNlsCreate(). So, for example, OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON) turns on NCHAR literal replacement, while OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF) turns it off.
[...] Note that, when the NCHAR literal replacement is turned on, OCIStmtPrepare and OCIStmtPrepare2 will transform N' literals with U' literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if the application uses OCI_ATTR_STATEMENT to retrieve the SQL text from the OCI statement handle, the SQL text will return U' instead of N' as specified in the original text.
(3) Answer for your question
From a datatype perspective, there is no difference between the two queries provided:
N'string' simply returns the string as the NCHAR type.
U'string' also returns the NCHAR type, but it does additional processing to the string: it replaces \\ with \ and \xxxx with the Unicode code point U+xxxx, where xxxx are 4 hexadecimal digits. This is similar to UNISTR('string'); the difference is that the latter returns NVARCHAR2.
U'' literals are useful when you want a Unicode string that is independent of encoding and NLS settings.
Example:
select n'\€', u'\\\20ac', n'\\\20ac' from dual;
N'\€'  U'\\\20AC'  N'\\\20AC'
-----  ----------  ----------
\€     \€          \\\20ac
Using N'' denotes that the literal's datatype is NCHAR or NVARCHAR2.
U'' is used to denote a Unicode literal.
The documented N'' literals are the same as standard character literals ('') except that their data type is NVARCHAR2 and not VARCHAR2. It is important to note that the characters in these literals, together with the entire SQL statement, are converted from the client character set to the database character set when transmitted to the server. All characters from the literals that are not supported by the database character set are lost.
The data type of the undocumented U'' literals is also NVARCHAR2. The content of a U'' literal is interpreted like the input to the SQL UNISTR function. That is, each character sequence \xxxx, where each x is one hex digit, is interpreted as a UTF-16 code point U+xxxx. I am not sure why the U'' literals are undocumented; I can only guess. They are used internally by the NCHAR literal replacement feature, which, when enabled on a client, automatically translates N'' literals to U'' literals. This prevents the mentioned data loss due to character set conversion and enables literal Unicode data to be provided for NVARCHAR2 columns even if the database character set is not Unicode.
The two queries in this thread's question are generally not equivalent because the literal text would be interpreted differently. However, if no backslash is present in the literals, no difference can be observed.
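As a small illustration of the point above (a sketch; it assumes your client NLS settings can display the euro sign), U'' decodes \xxxx escapes exactly as UNISTR does, while a plain N'' literal keeps the backslash sequence verbatim:
-- U'\20ac' decodes the escape to the euro sign; UNISTR('\20ac') does the
-- same but returns NVARCHAR2 rather than NCHAR.
SELECT u'\20ac' AS u_literal,
       UNISTR('\20ac') AS unistr_value,
       n'\20ac' AS n_literal -- stays as the literal characters \20ac
FROM dual;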

DB2 to COBOL String Losing Line Feed and Carriage Returns

I'm trying to grab some data out of a table. The column is VARCHAR(30000), and when I use COBOL EXEC SQL to retrieve it, the string is returned but without the necessary (hex) line feeds and carriage returns. I inspect the string one character at a time looking for the hex values 0A or 0D, but they never come up.
The LF and CR seem to be lost as soon as I fill the string into my COBOL variable.
Ideas?
If the data is stored in / converted to EBCDIC when retrieved on the mainframe, you should get the EBCDIC new-line character x'15' (decimal 21) rather than 0A or 0D.
It is only if you are retrieving the data in ASCII / UTF-8 that you would get 0A or 0D.
Most Java editors can edit EBCDIC (with the EBCDIC new-line character x'15') just as easily as ASCII (with \n); not sure about Eclipse, though.
I have seen situations where CR and LF were in data in the database. These are valid characters so it's possible for them to be stored there.
Have you tried to confirm that there really are CR and LF characters in the database using some other tool or method? My Z-Series experience is quite limited, so I'm unable to suggest options. However, there must be some equivalent of SSMS and SQL Server on the Z-Series to query the DB2 database.
Check out this SO link on querying DB2 and cleaning up CR and LF characters.
DB2/iSeries SQL clean up CR/LF, tabs etc
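One way to run that check, as a sketch (assuming a DB2 table T with a VARCHAR column TXT; both names are hypothetical), is to look at the raw bytes with HEX():
-- HEX() exposes the stored bytes, so you can see whether 0D/0A (ASCII CR/LF)
-- or 15 (the EBCDIC new-line) are actually present in the column.
SELECT HEX(SUBSTR(TXT, 1, 50)) AS first_50_bytes
FROM T
FETCH FIRST 5 ROWS ONLY;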
Well, I believe this could be dialect-dependent (both COBOL and DB2), but if it were me I would use FOR BIT DATA on the VARCHAR in the table definition. Your issue could also relate to the code page defined for the database in which the table resides.
I routinely store all kinds of binary, EBCDIC and Unicode data mixed within the same VARCHAR FOR BIT DATA column with no problems, and all you are trying to do is include CR & LF. My approach works in both DB2 z/OS and DB2 LUW.
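For example, a definition along these lines (the table and column names are made up) keeps the bytes exactly as inserted, because FOR BIT DATA columns are exempt from code-page conversion:
-- No CCSID conversion is applied to a FOR BIT DATA column, so CR and LF
-- (or the EBCDIC new-line x'15') survive the client/server round trip.
CREATE TABLE MYLIB.NOTES (
    NOTE_ID   INTEGER NOT NULL,
    NOTE_TEXT VARCHAR(30000) FOR BIT DATA
);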
I hope this helps.

Which data type can save the Bangla language in SQL Server?

I want to save the Bangla language in SQL Server. Which data type can I use to do this in SQL Server 2005 or SQL Server 2008?
I tried the varchar and varbinary types, but they cannot save Bangla text.
How is it possible?
You're using SQL_Latin1_General_CP1_CI_AS for your collation, which is suited to the Latin character set (ISO-8859-1). To store characters from other character sets, you can use NVARCHAR(), which can store the full Unicode range irrespective of collation. This does mean it will need to be treated as NVARCHAR() all the way through: as quoted constants (e.g. N'বাংলা Bangla'), as the data types for parameters to stored procedures, etc.
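As a minimal sketch (the table and column names are hypothetical), NVARCHAR plus the N prefix is enough to round-trip Bangla text:
CREATE TABLE dbo.Books (Title NVARCHAR(100));
-- The N prefix keeps the literal in Unicode; without it the Bangla
-- characters would be mangled by the Latin1 code page.
INSERT INTO dbo.Books (Title) VALUES (N'বাংলা');
SELECT Title FROM dbo.Books;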

UTF-8: should I be using CHAR or NCHAR?

If I'm using UTF-8 encoding for a character set that covers non-Cyrillic European languages, can I use varchar/char, or should I use nvarchar/nchar?
Is there a huge SQL processing-time penalty for using nvarchar?
The internal data representation for nvarchar is UTF-16. AFAIK you cannot change it, so you had better use nvarchar for parameters.
MSSQL doesn't support UTF-8 natively (prior to the UTF-8 collations introduced in SQL Server 2019), so use nchar.
You must make sure that your code is consistent about the character sets it inserts, because the RDBMS won't enforce that for you. MSSQL does support UTF-16, so you might consider using that encoding for your application instead.
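To make the storage difference concrete, here is a small check (SQL Server, assuming a Latin1 default collation): DATALENGTH reports bytes, so it shows nvarchar's two-byte UTF-16 code units against varchar's single-byte code page:
-- 'é' fits in one byte in a Latin1 code page, but takes two bytes in UTF-16.
SELECT DATALENGTH('é')  AS varchar_bytes,   -- 1
       DATALENGTH(N'é') AS nvarchar_bytes;  -- 2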