How to store emoji characters in a DB2 database?

I have an application that stores tweets in a DB2 database, and I need to retrieve them later on. I'm having trouble displaying text strings with emojis inside (some emojis lose their formatting).
I've been reading various answers on the internet, but most of them are for MySQL (switch from utf8 to utf8mb4), and nothing covers DB2...
Is there any way to do something like the following in DB2 databases?
https://mathiasbynens.be/notes/mysql-utf8mb4
Thanks very much.

You can use Unicode constants like this in a Db2 Unicode database:
$ db2 "values U&'\+01F600'"
1
----
😀
1 record(s) selected.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000731.html
U& followed by a sequence of characters that starts and ends with a string delimiter and that is optionally followed by the UESCAPE clause. This form of a character string constant is also called a Unicode string constant.
A character can be expressed by either its typographical character (glyph) or its Unicode code point. The code point of a Unicode character ranges from X'000000' to X'10FFFF'.
To express a Unicode character through its code point, use the Unicode escape character followed by 4 hexadecimal digits, or the Unicode escape character followed by a plus sign (+) and 6 hexadecimal digits. The default Unicode escape character is the reverse solidus (\).
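For example, if the backslash is awkward to type in your environment (say, inside a shell command), the UESCAPE clause mentioned above lets you pick a different escape character. A small sketch, using # as the escape:
$ db2 "values U&'#+01F600' UESCAPE '#'"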
Or you can use UTF-8 hex values if you prefer:
db2 "values x'F09F9880'"
1
----
😀
1 record(s) selected.
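Either constant form also works for storing emoji, not just displaying them. A minimal sketch, assuming a Unicode (UTF-8) database; the tweets table is made up for illustration:
$ db2 "create table tweets (txt varchar(280))"
$ db2 "insert into tweets values U&'grinning \+01F600'"
$ db2 "select txt, hex(txt) txt_hex from tweets"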

Could you clarify what the issue is? With a UTF-8 database there are no issues with the linked example:
$ db2 "create table emoji(string_with_emjoi varchar(32))"
DB20000I The SQL command completed successfully.
$ db2 "insert into emoji values 'foo𝌆bar'"
DB20000I The SQL command completed successfully.
$ db2 "select string_with_emjoi, hex(string_with_emjoi) string_with_emjoi_hex from emoji"
STRING_WITH_EMJOI                STRING_WITH_EMJOI_HEX
-------------------------------- ----------------------------------------------------------------
foo𝌆bar                          666F6FF09D8C86626172
The code point for the emoji is stored as 4 bytes (0xF09D8C86). If you have an issue displaying the emoji after retrieval, you need to dig a bit deeper and see what value is actually returned by the database - the problem may very well be in the application itself.
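If you want to rule out the database configuration first, the code set and code page can be checked from the database configuration (MYDB stands in for your database name):
$ db2 get db cfg for MYDB | grep -i "code"
For the linked scenario to work, the database code set should be UTF-8 (code page 1208).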

Related

How to fix character encoding in sql query

I have a db2 database where I store names containing special characters. When I retrieve them with an internal software tool, I get proper results. However, when I try to do the same with queries or look into the db directly, the characters come out strangely.
The documentation says that the encoding is utf-8 latin1.
My query looks something like this:
SELECT firstn, lastn
FROM unams
WHERE unamid = 12345
The user with the given ID has some special characters in his/her name, é and ó, but the query returns them as Ă© and Ăł.
Is there a way to convert the characters back to their original form using some simple SQL function? I am new to databases and encoding; I'm trying to understand the latter by reading this, but I'm quite lost.
EDIT: I'm currently sending the queries via SPSS Modeler with a proper ODBC driver; the database is on a Windows Server 2016 machine.
Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208, then restart, then drop and re-populate the tables.
If the application runs locally on the Db2-server (i.e. only one hostname is involved), then the same variable can be set there. This will impact all local applications that use the UTF-8 encoded database.
If the application runs remotely from the Db2-server (i.e. two hostnames are involved), then set the variable on the workstation and on the Windows Db2-server.
Current versions of IBM-supplied Db2 clients on Windows derive their code page from the regional settings, which might not always render Unicode characters correctly, so setting DB2CODEPAGE=1208 forces the Db2 client CLI drivers to use a Unicode application code page, overriding this.
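As a sketch, on a Windows machine the variable can be set persistently from a command prompt (a restart is still needed afterwards, per the above):
setx DB2CODEPAGE 1208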
with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hex
from
t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);
C      C_HEX
------ -----
é      C3A9
ó      C3B3
The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check whether you get the same output; for a UTF-8 database it must look as shown above.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.
If you see the same hex representation of these characters stored in the firstn column, then this means that the string is stored correctly, but your client tool (SPSS Modeler) can't show these characters properly for some reason (a wrong font, for example).

Handling Chinese characters in SQL Server 2016

Our ETL team is sending us some data with Chinese descriptions. When we load that data into our SQL Server database, those descriptions come up blank.
We tried changing the column type to nvarchar, but that doesn't help.
Can you please help?
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix a Unicode character string constants with the letter N to signal UCS-2 or UTF-16 input, depending on whether an SC collation is used or not. Without the N prefix, the string is converted to the default code page of the database that may not recognize certain characters. Starting with SQL Server 2019 preview, when a UTF-8 enabled collation is used, the default code page is capable of storing UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
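As an illustration of the difference (the table and column names here are invented):
CREATE TABLE product_descr (descr NVARCHAR(200));
-- With the N prefix the literal is treated as Unicode and stored intact.
INSERT INTO product_descr VALUES (N'产品说明');
-- Without it, the literal is first converted to the database's default
-- code page, and unmappable characters are lost (often shown as '?').
INSERT INTO product_descr VALUES ('产品说明');
SELECT descr FROM product_descr;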

DB2 to COBOL String Losing Line Feeds and Carriage Returns

I'm trying to grab some data out of a table. The column is VARCHAR(30000), and when I use COBOL EXEC SQL to retrieve it, the string is returned but without the necessary (hex) line feeds and carriage returns. I inspect the string one character at a time looking for the hex values 0A or 0D, but they never come up.
The LF and CR seem to be lost as soon as I fill the string into my COBOL variable.
Ideas?
If the data is stored in / converted to EBCDIC when retrieved on the mainframe, you should get the EBCDIC new-line character x'15' (decimal 21) rather than 0A or 0D.
It is only if you are retrieving the data in ASCII / UTF-8 that you would get 0A or 0D.
Most Java editors can edit EBCDIC (with the EBCDIC new-line character x'15') just as easily as ASCII (with \n); not sure about Eclipse though.
I have seen situations where CR and LF were in data in the database. These are valid characters so it's possible for them to be stored there.
Have you tried to confirm that there really are CR and LF characters in the database using some other tool or method? My Z-Series experience is quite limited, so I'm unable to suggest options. However, there must be some equivalent of SSMS and SQL Server on the Z-Series to query the DB2 database.
Check out this SO link on querying DB2 and cleaning up CR and LF characters.
DB2/iSeries SQL clean up CR/LF, tabs etc
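For reference, the linked approach boils down to mapping the control characters away in SQL, along the lines of this sketch (the table and column names are invented; whether you target x'0D'/x'0A' or the EBCDIC x'15' depends on the code page, as discussed above):
SELECT TRANSLATE(msg_text, '  ', x'0D0A') AS cleaned
FROM my_table
TRANSLATE here maps each character listed in the from-string to the corresponding character of the to-string, so both CR and LF become plain spaces.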
Well, I believe this could be dialect-dependent (both COBOL and DB2), but if it were me, I would use FOR BIT DATA on the VARCHAR in the table definition. Your issue could also relate to the code page defined for the database in which the table resides.
I routinely store all kinds of binary, EBCDIC and Unicode data mixed within the same VARCHAR FOR BIT DATA column with no problems, and all you are trying to do is include CR & LF. My approach works in both DB2 z/OS and DB2 LUW.
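A minimal sketch of that approach (the table and column are invented):
CREATE TABLE msg_store (
    -- FOR BIT DATA disables code page conversion, so CR/LF bytes
    -- survive the round trip to the COBOL program unchanged
    msg_text VARCHAR(30000) FOR BIT DATA
);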
I hope this helps.

Oracle SQL loader: Convert foreign chars to English

I'm using SQL*Loader to load a data file into an Oracle table. The database has a character set of US7ASCII. Some of the records in the data file contain European special characters such as é or ü or î. When they are loaded into the table, the results are garbled, with two strange icons (like a black diamond with an arrow inside).
Is there any way to get SQL*Loader to load the nearest English equivalent instead? So é would be loaded as e, ü would be loaded as u, and î would be loaded as i?
First establish what is actually being stored. Use the Oracle SQL DUMP function, e.g. SELECT dump(colname, 16) FROM table.
This will return the HEX code for the data in colname.
If the hex code matches the hex code (od -h) of the input then the data is stored correctly, and the issue you have is translation on display. See my answer below for how to deal with this:
when insert persian character in oracle db i see the question mark
The more likely issue you have is that your database character set, US7ASCII, cannot cope with extended character sets.
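If the goal really is "nearest English equivalent", Oracle's CONVERT function is the usual (if lossy) tool: when the target character set lacks a character, a defined replacement character (often the base letter) is substituted. A hedged sketch, assuming the source data is WE8ISO8859P1:
SELECT CONVERT('crème brûlée', 'US7ASCII', 'WE8ISO8859P1') FROM dual;
-- typically yields 'creme brulee'; characters with no replacement come out as '?'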

Losing special characters on insert

I am using Oracle 11g and trying to insert a string containing special UTF-8 characters, e.g. '(ε- c'. The NLS character sets for the database are...
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET WE8ISO8859P1
When I copy and paste the above string into an NVARCHAR field, it works fine.
If I execute the statement below, I get an upside-down question mark in the field:
insert into title_debug values ('(ε- c');
where the title_debug table consists of a single NVARCHAR2(100) field called title.
I have attempted to assign this string to an NVARCHAR2(100) variable and then insert that, and I have also tried all the different CAST / CONVERT etc. functions I can find, but nothing works.
Any assistance would be greatly appreciated.
UPDATE
I have executed
select dump(title, 1016), dump(title1, 1016)
into v_title, v_title1
from dual
where title is the string passed in as a VARCHAR and title1 is the string passed in as an NVARCHAR.
Unsurprisingly, the encodings come through as WE8ISO8859P1 and AL16UTF16, but in both cases the ε comes through as hex 'BF', which is the upside-down question mark.
My only remaining thought is to try to pass this through as a RAW and then do something with it. However, I have not yet been able to figure out how to convert the string into an acceptable format with XQuery (OSB).
Continued thanks for the assistance.
Our DBA found the solution to this issue. The answer lay in a setting on the JDBC connection on the bus, telling it to convert UTF-8 to NCHAR.
On the connection pool page, add the following lines to the Properties box:
oracle.jdbc.convertNcharLiterals=true
oracle.jdbc.defaultNchar=true
This will allow you to insert into NVARCHAR2 fields while maintaining the UTF-8 characters.
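With those properties in place, an insert written with the N literal prefix keeps the Unicode characters intact; a sketch against the table from the question:
insert into title_debug values (N'(ε- c');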
Cheers
I did a test with '(ε- c' and did not encounter any problems.
If you can, I advise you to change your character sets to:
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
Oracle's recommendation for all new deployments is the Unicode character set AL32UTF8, because of its flexible globalization support and because it is a universal character set. You can verify your current settings with the query sketched below.
Regards,
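A sketch of that check (NLS_DATABASE_PARAMETERS is Oracle's standard data dictionary view):
SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');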
First verify the data is being stored correctly, then use the correct NLS_LANG settings. See my answer to this question:
when insert persian character in oracle db i see the question mark
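For example, on a Windows client, NLS_LANG can be set in the environment before launching the tool (the language and territory parts here are just an example):
set NLS_LANG=AMERICAN_AMERICA.AL32UTF8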