I'm using SQL loader to load a data file into an Oracle table. The database has a character set of US7ASCII. Some of the records in the data file contain European special characters such as é or ü or î. When they are loaded into the table, the results are weird with 2 strange icons (like black diamond with an arrow inside).
Is there anyway to get SQL loader to load the nearest English equivalent instead? So the é would be loaded as e, the ü would be loaded as u and the î would be loaded as i?
First establish what is actually being stored. Use the Oracle SQL DUMP function, e.g. SELECT dump ( colname, 16 ) from table.
This will return the HEX code for the data in colname.
If the hex code matches the hex code (od -h) of the input then the data is stored correctly, and the issue you have is translation on display. See my answer below for how to deal with this:
when insert persian character in oracle db i see the question mark
The more likely issue you have is your database character US7ASCII cannot cope with extended character sets.
Related
I have an application that it's storing tweets in a DB2 database, and need to retrieve them in some moments. I'm having troubles showing text string with emojis inside (some emojis loose the format).
I've been reading different answers in internet, but most are for MySQL (switch from utf8 to utf8mb4), but nothing for DB2...
Is there any way to do something like the following in DB2 databases?
https://mathiasbynens.be/notes/mysql-utf8mb4
Thanks too much
You can use Unicode constants like this in a Db2 Unicode database
$ db2 "values U&'\+01F600'"
1
----
😀
1 record(s) selected.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000731.html
U& followed by a sequence of characters that starts and ends with a string delimiter and that is optionally followed by the UESCAPE clause. This form of a character string constant is also called a Unicode string constant.
A character can be expressed by either its typographical character (glyph) or its Unicode code point. The code point of a Unicode character ranges from X'000000' to X'10FFFF'.
To express a Unicode character through its code point, use the Unicode escape character followed by 4 hexadecimal digits, or the Unicode escape character followed by a plus sign (+) and 6 hexadecimal digits. The default Unicode escape character is the reverse solidus ()
or you can use UTF-8 HEX values if you prefer
db2 "values x'F09F9880'"
1
----
😀
1 record(s) selected.
Could you clarify what is the issue? With UTF-8 database there is no issues with the linked example
$ db2 "create table emoji(string_with_emjoi varchar(32))"
DB20000I The SQL command completed successfully.
$ db2 "insert into emoji values 'foo𝌆bar'"
DB20000I The SQL command completed successfully.
$ db2 "select string_with_emjoi, hex(string_with_emjoi) string_with_emjoi_hex from emoji"
STRING_WITH_EMJOI STRING_WITH_EMJOI_HEX
-------------------------------- ----------------------------------------------------------------
foo𝌆bar 666F6FF09D8C86626172
Code point for emjoi is stored with 4 bytes (0xF09D8C86). If you have an issue displaying the emoji after retrieval you need to dig a bit deeper and see what is the actual value returned by the database - problem very well might be in the application itself.
I have a db2 database where I store names containing special characters. When I try to retrieve them with an internal software, I get proper results. However when I tried to do the same with queries or look into the db, the characters are stored strangely.
The documentation says that the encoding is utf-8 latin1.
My query looks something like this:
SELECT firstn, lastn
FROM unams
WHERE unamid = 12345
The user with the given ID has some special characters in his/her name: é and ó, but the query returns it as Ă© and Ăł.
Is there a way to convert the characters back to their original form with using some simple SQL function? I am new to databases and encoding, trying to understand the latter by reading this but I'm quite lost.
EDIT: Currently sending queries via SPSS Modeler with a proper ODBC driver, the database lies on a Windows Server 2016
Per the comments, the solution was to create a Windows environment variable DB2CODEPAGE=1208 , then restart, then drop and re-populate the tables.
If the applications runs locally on the Db2-server (i.e. only one hostname is involved) then the same variable can be set. This will impact all local applications that use the UTF-8 encoded database.
If the application runs remotely from the Db2-server (i.e. two hostnames are involved) then set the variable on the workstation and on the Windows Db2-server.
Current versions of IBM supplied Db2-clients on Windows will derive their codepage from the regional settings which might not always render Unicode characters correctly, so using the DB2CODEPAGE=1208 forces the Db2-client CLI drivers to use a Unicode application code page to override this.
with t (firstn) as (
values ('éó')
--SELECT firstn
--FROM unams
--WHERE unamid = 12345
)
select x.c, hex(x.c) c_hes
from
t
, xmltable('for $id in (1 to string-length($s)) return <i>{substring($s, $id, 1)}</i>'
passing t.firstn as "s" columns tok varchar(6) path '.') x(c);
C C_HEX
- -----
é C3A9
ó C3B3
The query above converts the string of characters to a table with each character (C) and its hex representation (C_HEX) in each row.
You can run it as is to check if you get the same output. It must be as described for a UTF-8 database.
Now try to comment out the line with values ('éó') and uncomment the select statement returning some row with these special characters.
If you see the same hex representation of these characters stored in the firstn column, then this means, that the string is stored appropriately, but your client tool (SPSS Modeller) can't show these characters correctly due to some reason (wrong font, for example).
Our ETL team is sending us some data with chinese description. When we are loading that data in our SQL Server database, those descriptions are coming up as blank.
We tried changing the column format to nvarchar, but that doesnt help.
Can you please help.
Thanks
You must use the N prefix when dealing with NVARCHAR.
INSERT INTO table (column) VALUES (N'chinese characters')
Prefix a Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 preview, when a UTF-8
enabled collation is used, the default code page is capable of storing
UNICODE UTF-8 character set.
Source: https://learn.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017
i'm trying to grab some data out of a table. The Variable is VARCHAR 30000 and when I use COBOL EXEC SQL to retrieve it, the string is returned but without the necessary (hex) Line Feeds and Carriage Returns. I inspect the string one character at a time looking for the Hex Values of 0A or 0D but they never come up.
The LF and CR seem to be lost as soon as fill the string into my cobol variable.
Ideas?
If the data is stored / converted to ebcdic when retrieved on the mainframe, you should get the EBCDIC New-Line characters x'15' decimal=21 rather than 0A or 0D.
It is only if you are retrieving the data in ASCII / UTF-8 that you would get 0A or 0D.
Most java editors can edit EBCDIC (with the EBCDIC New-Line Character x'15') just as easily as ASCII (with \n), not sure about Eclipse though.
I have seen situations where CR and LF were in data in the database. These are valid characters so it's possible for them to be stored there.
Have you tried to confirm that there really are CR and LF characters in the database using some other tool or method? My Z-Series experience is quite limited, so I'm unable to suggest options. However, there must be some equivalent of SSMS and SQL Server on the Z-Series to query the DB2 database.
Check out this SO link on querying DB2 and cleaning up CR and LF characters.
DB2/iSeries SQL clean up CR/LF, tabs etc
Well, I believe this could be dialect dependent (both COBOL and DB2) but if it were me I would be using FOR BIT DATA on the VARCHAR in the table definition. Your issue could also relate to the code page defined for the database in which the table resides.
I routinely store all kinds of binary, EBCDIC and Unicode data mixed within the same VARCHAR FOR BIT DATA column with no problems, and all you are trying to do is include CR & LF. My approach works in both DB2 z/OS and DB2 LUW.
I hope this helps.
How can I insert Arabic characters into a SQL Server database? I tried to insert Arabic data into a table and the Arabic characters in the insert script were inserted as '??????' in the table.
I tried to directly paste the data into the table through SQL Server Management Studio and the Arabic characters was successfully and accurately inserted.
I looked around for resolutions for this problems and some threads suggested changing the datatype to nvarchar instead of varchar. I tried this as well but without any luck.
How can we insert Arabic characters into SQL Server database?
For the field to be able to store unicode characters, you have to use the type nvarchar (or other similar like ntext, nchar).
To insert the unicode characters in the database you have to send the text as unicode by using a parameter type like nvarchar / SqlDbType.NVarChar.
(For completeness: if you are creating SQL dynamically (against common advice), you put an N before a string literal to make it unicode. For example: insert into table (name) values (N'Pavan').)
Guess the solation is first turn on the field to ntext then write N with the value. For example
insert into eng(Name) values(N'حسن')
If you are trying to load data directly into the database like me, I found a great way to do so by creating a table using Excel and then export as CSV. Then I used the database browser SQLite to import the data correctly into the SQL database. You can then adjust the table properties if needed. Hope this would help.