Update/insert/retrieve accented character in DB? - sql

I am using Oracle 12c.
When I run @F:\update.sql from SQL*Plus, the accented character é displays as a junk character whether I retrieve it from SQL*Plus or from SQL Developer.
But when I run the individual statement directly in SQL*Plus, retrieving it from SQL*Plus shows the correct character, while retrieving it from SQL Developer still shows the junk character.
update.sql content is this
update employee set name ='é' where id= 1;
What I want is that when I run @F:\update.sql, the value is inserted/updated/retrieved correctly, whether from SQL*Plus or any other tool.
For information: when I run
SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET%'
I get the following:
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16
And when I run @.[%NLS_LANG%] from SQL*Plus I see
SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8MSWIN1252]"

I am not familiar with SQL Developer, but I can give a solution for SQL*Plus.
I presume you would like to work with Windows CP1252.
First of all, ensure that the file F:\update.sql is saved in CP1252 encoding. Many editors call this encoding ANSI, which is the same thing (let's skip the details of the difference between the term ANSI and Windows-1252).
Then before you run the script enter
chcp 1252
in order to switch the encoding of your cmd.exe to CP1252. By default the encoding of cmd.exe is most likely CP850 or CP437, which are different.
Then set the NLS_LANG environment variable to character set WE8MSWIN1252, e.g.
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
After that, your script should work fine with SQL*Plus. SQL*Plus inherits the encoding (or "character set", if you prefer that term) from the parent cmd.exe. NLS_LANG tells the Oracle driver which character set you are using.
Example Summary:
chcp 1252
set NLS_LANG=.WE8MSWIN1252
sqlplus username/password@db @F:\update.sql
Some notes: in order to set the encoding of cmd.exe permanently, see this answer: Unicode characters in Windows command line - how?
NLS_LANG can be set either as an environment variable or in your registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for the 32-bit Oracle Client), resp. HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for the 64-bit Oracle Client).
For SQL Developer, check your options; somewhere it should be possible to define the encoding of SQL files.
You are not forced to use Windows-1252. The same works for other encodings as well, for example WE8ISO8859P1 (i.e. ISO-8859-1, chcp 28591) or UTF-8 (chcp 65001). However, in the case of UTF-8 your SQL script may contain characters which are not supported by the database character set WE8MSWIN1252; such characters would be replaced by a placeholder (e.g. ¿).
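Once the script has run, you can verify that the stored bytes are correct independently of any client display issue. A quick sanity check (table and column names follow the question):
select name, dump(name, 1016) as name_bytes
from employee
where id = 1;
In a WE8MSWIN1252 database, é should show up as the single byte e9; if you see the two bytes c3 a9 instead, the script was interpreted as UTF-8 somewhere along the way.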

Related

invalid byte sequence for encoding “UTF8”

I am trying to load a 3 GB (24 million rows) CSV file into a Greenplum database using the gpload functionality, but I keep getting the error below:
Error -
invalid byte sequence for encoding "UTF8": 0x8d
I have tried the solution provided by Mike, but in my case the client_encoding and the file encoding are already the same. Both are UNICODE.
Database -
show client_encoding;
"UNICODE"
File -
file my_file_name.csv
my_file_name.csv: UTF-8 Unicode (with BOM) text
I have browsed through Greenplum's documentation as well, which says that the encoding of the external file and the database should match. It does match in my case, yet somehow the load is still failing.
I have successfully uploaded similar, smaller files (same UTF-8 Unicode (with BOM) text).
Any help is appreciated!
As posted in another thread: use the iconv command to strip these characters out of your file. Greenplum is instantiated with a character set, UTF-8 by default, and requires that all characters belong to the designated character set. Alternatively, you can log these errors with the LOG ERRORS clause of the external table (see the sketch after the iconv notes below). This will trap the bad data and let the load continue up to the SEGMENT REJECT LIMIT that you specify when you create the table.
iconv -f utf-8 -t utf-8 -c file.txt
will clean up your UTF-8 file, skipping all the invalid sequences. Note that iconv writes to standard output, so redirect the result to a new file rather than back onto the input.
-f is the source format
-t the target format
-c skips any invalid sequence
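If you go the LOG ERRORS route instead, a minimal external table sketch could look like the following (the host, port, and column definitions are made up for illustration; in recent Greenplum versions the rejected rows can then be inspected with gp_read_error_log()):
CREATE EXTERNAL TABLE ext_my_data (
    id   int,
    name text
)
LOCATION ('gpfdist://etl_host:8081/my_file_name.csv')
FORMAT 'CSV' (HEADER)
ENCODING 'UTF8'
LOG ERRORS
SEGMENT REJECT LIMIT 100 ROWS;
-- rows with invalid byte sequences are logged instead of aborting the load
SELECT * FROM gp_read_error_log('ext_my_data');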

SQL Server adding extra special characters in query result

I am trying to extract some records into a file using the BCP command in SQL Server. However, when the file is generated, there are extra spaces between the values of each column.
To investigate, I wrote a basic SQL query as simple as this:
select 'ABC', 40, 'TEST','NOTWORKING'
When I copy the output of the above query and paste it into Notepad, it comes out as
ABC 40 TEST NOTWORKING
Notice the space between each value? The file generated by the BCP command has the same spaces in it, which is incorrect. What I want to see in the output file is
ABC40TESTNOTWORKING
What could be causing this issue? I am simply amazed to see such a weird issue and hope it can be fixed by some change or setting. Please help.
Sample BCP command
EXEC xp_cmdshell 'bcp "select ''ABC'', 40, ''TEST'',''NOTWORKING''" queryout "E:\Testfile.txt" -c -T -S""'
Output in the File - Testfile.txt
ABC 40 TEST NOTWORKING
There are probably tabs between the values. If you want a single value, use concat():
select CONCAT('ABC', 40, 'TEST', 'NOTWORKING')
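Applied to the original command, that might look like this (a sketch, reusing the question's switches; CONCAT requires SQL Server 2012 or later):
EXEC xp_cmdshell 'bcp "SELECT CONCAT(''ABC'', 40, ''TEST'', ''NOTWORKING'')" queryout "E:\Testfile.txt" -c -T -S""'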
There's no issue. The command line has no field terminator argument, so the default is used: a tab. That's described in the docs:
-t field_term
Specifies the field terminator. The default is \t (tab character). Use this parameter to override the default field terminator. For more information, see Specify Field and Row Terminators (SQL Server).
If you specify the field terminator in hexadecimal notation in a bcp.exe command, the value will be truncated at 0x00. For example, if you specify 0x410041, 0x41 will be used.
If field_term begins with a hyphen (-) or a forward slash (/), do not include a space between -t and the field_term value.
The link points to an entire article that explains how to use terminators, for each of the bulk operations.
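For example, to get comma-separated output instead of the default tabs, the question's command could set the terminator explicitly (a sketch):
EXEC xp_cmdshell 'bcp "SELECT ''ABC'', 40, ''TEST'', ''NOTWORKING''" queryout "E:\Testfile.txt" -c -t "," -T -S""'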
As for the copy/paste operation, it has nothing to do with SQL Server. SQL Server has no UI; it's a service. I suspect that what was pasted into Notepad was copied from an SSMS grid.
SSMS is a client tool just like any other. When you copy data from it into the clipboard, it decides what to put there and which format to use. That format can be plain text (using spaces and tabs for layout), RTF, HTML, etc.
Plain text with tabs as field separators is probably the best choice for any tool, as it preserves the visual layout up to a point and uses only a single character as a separator. A fixed-length layout using spaces could also be used, but that would add characters which may well be part of a field.
Encodings and codepages
-c exports the data using the user's default codepage. This means that text stored in varchar fields using a different codepage (collation) may get mangled. Unicode characters with no equivalent in that codepage will also get mangled and appear as something else, or as ?.
-c
Performs the operation using a character data type. This option does not prompt for each field; it uses char as the storage type, without prefixes and with \t (tab character) as the field separator and \r\n (newline character) as the row terminator. -c is not compatible with -w.
It's better to export the file as UTF-16 using -w.
-w
Performs the bulk copy operation using Unicode characters. This option does not prompt for each field; it uses nchar as the storage type, no prefixes, \t (tab character) as the field separator, and \n (newline character) as the row terminator. -w is not compatible with -c.
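Based on the question's command, a Unicode export would look like this (a sketch):
EXEC xp_cmdshell 'bcp "SELECT ''ABC'', 40, ''TEST'', ''NOTWORKING''" queryout "E:\Testfile.txt" -w -T -S""'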
The codepage can be specified using the -C parameter. -C 1252, for example, will export the data using Windows' Latin1 codepage; -C 1253 will export it using the Greek codepage.
-C { ACP | OEM | RAW | code_page }
Specifies the code page of the data in the data file. code_page is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.
SQL Server 2016 and later can also export text as UTF-8 with -C 65001. Earlier versions don't support UTF-8.
Versions prior to version 13 (SQL Server 2016 (13.x)) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.
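On SQL Server 2016 or later, a UTF-8 export would therefore combine -c with the codepage switch (a sketch):
EXEC xp_cmdshell 'bcp "SELECT ''ABC'', 40, ''TEST'', ''NOTWORKING''" queryout "E:\Testfile.txt" -c -C 65001 -T -S""'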
All of this is described in bcp's online documentation.
This subject is so important for any database that it has an entire section in the docs, which describes data formats and considerations, how to use format files to specify different settings per column, and guidelines for ensuring compatibility with other applications.

UTL_FILE operations produce special characters in the output file

We have some string values stored in an Oracle DB. We're writing these values to a .DAT file. The code snippet in our package looks like the below:
Opening file :
l_file := UTL_FILE.fopen(l_dir, l_file_name, 'W');
Writing to file :
UTL_FILE.putf(l_file, ('E' ||';'|| TO_CHAR(l_rownum) ||';'|| TO_CHAR(v_cdr_session_id_seq) ||';' || l_record_data));
String value in the DB : "Owner’s address"
String value in the .DAT file : "Owner’s address"
The question is: how can we avoid those special characters when writing to an output file?
I assume your database uses character set AL32UTF8 (which is the default nowadays). In that case, try this:
l_file := UTL_FILE.FOPEN_NCHAR(l_dir, l_file_name, 'W');
UTL_FILE.PUT_NCHAR(l_file, 'E;' || l_rownum || ';' || v_cdr_session_id_seq || ';' || l_record_data);
Note for function FOPEN_NCHAR: Even though the contents of an NVARCHAR2 buffer may be AL16UTF16 or UTF8 (depending on the national character set of the database), the contents of the file are always read and written in UTF8.
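A minimal end-to-end sketch of the NCHAR variant (the directory object and file name are made up; PUT_LINE_NCHAR is used so each record also gets a line terminator):
DECLARE
    l_file UTL_FILE.FILE_TYPE;
BEGIN
    -- 'MY_DIR' is a hypothetical directory object created beforehand
    l_file := UTL_FILE.FOPEN_NCHAR('MY_DIR', 'output.dat', 'W');
    -- the buffer is written to the file as UTF-8, regardless of the
    -- national character set (AL16UTF16 or UTF8) used inside the database
    UTL_FILE.PUT_LINE_NCHAR(l_file, N'E;1;42;Owner''s address');
    UTL_FILE.FCLOSE(l_file);
END;
/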
Summarising from the comments: your Linux session and Vim configuration are both using UTF-8, but your terminal emulator software is using the Windows-1252 codepage. That renders the 'curly' right quote mark you have, ’ (Unicode codepoint U+2019), as ’.
You need to change your emulator's configuration from Windows-1252 to UTF-8. Exactly how depends on which emulator you are using. For example, in PuTTY you can change the current session by right-clicking on the window title bar and choosing "Change settings...", then going to the "Window" category and its "Translation" subcategory, and changing the value from the default "Win 1252 (Western)" to "UTF-8".
If you have the file open in Vim, you can press Ctrl-L to redraw it. That will only affect the current session, though; to make the change permanent you'll need to make the same change to the stored session settings, from the "New session..." dialog - loading your current settings, and remembering to save the changes before actually opening the new session.
Other emulators have similar settings but in different places. For XShell, according to their web site:
You can easily switch to UTF8 encoding by selecting Unicode (UTF8) in the Encoding list on the Standard toolbar.
It looks like you can also set it for a session, or as the default.

Openfire: Offline UTF-8 encoded messages are saved wrong

We use Openfire 3.9.3. Its MySQL database uses the utf8_persian_ci collation, and in openfire.xml we have:
...<defaultProvider>
<driver>com.mysql.jdbc.Driver</driver>
<serverURL>jdbc:mysql://localhost:3306/openfire?useUnicode=true&amp;characterEncoding=UTF-8</serverURL>
<mysql>
<useUnicode>true</useUnicode>
</mysql> ....
The problem is that offline messages which contain Persian characters (UTF-8 encoded) are saved as strings of question marks. For example, سلام (which means hello in Persian) is stored and shown as ????.
MySQL does not have proper Unicode support, which makes supporting data in non-Western languages difficult. However, the MySQL JDBC driver has a workaround which can be enabled by adding
?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8
to the URL of the JDBC driver. You can edit the conf/openfire.xml file to add this value.
Note: if the mechanism you use to configure the JDBC URL is XML-based, you will need to use the XML character entity &amp; to separate the configuration parameters, as the ampersand is a reserved character in XML.
Also be sure that your DB and tables use utf8 encoding.
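To check and, if necessary, convert an affected table (ofOffline is the Openfire table that stores offline messages; verify the exact name in your schema before running this):
SHOW CREATE TABLE ofOffline;
ALTER TABLE ofOffline CONVERT TO CHARACTER SET utf8 COLLATE utf8_persian_ci;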

PL/SQL Chinese garbled in oracle

My Oracle version is 11g, installed on Linux. The client is Windows XP.
When I query the data via PL/SQL, the Chinese text in the NAME field comes back garbled.
In PL/SQL I execute the command select userenv('language') from dual; and it shows
SIMPLIFIED CHINESE_CHINA.AL32UTF8 (I think this is the server-side character set).
So I looked at the Windows XP registry: HKEY_LOCAL_MACHINE -> SOFTWARE -> Oracle -> NLS_LANG.
It showed:
SIMPLIFIED CHINESE_CHINA.ZHS16GBK (I think this is the client-side character set)
I changed it to
SIMPLIFIED CHINESE_CHINA.AL32UTF8
but the Chinese is still garbled.
The NAME field should actually show "北京市".
I executed the command:
select dump(name,1016) from MN_C11_SM_S31 where objectid=1;
and the output shows the hex bytes e58c97e4baace5b882.
Does that mean that the stored data itself is incorrect? What should I do?
Supplementary: I used C# code to decode this byte string "e58c97e4baace5b882" as UTF-8, and it shows "北京市". I think this proves the data itself is not wrong.
You need to be careful.
SQL*Plus and Oracle SQL Developer might actually display Chinese characters incorrectly.
And you need an NVARCHAR2 (i.e. Unicode) field for a Chinese string.
Try using a C# Windows application to input, insert and display the data; it uses Unicode internally, so you can at least rule out that bug source. Don't use a console application for display; console applications can't display Unicode characters correctly.
Then you need to store your string in an NVARCHAR2 field (not VARCHAR2), and use a parameter of type NVarChar when you insert the string.
INSERT INTO YOUR_TABLE (newname)
VALUES (:UnicodeString)

command.Parameters.Add(":UnicodeString", OracleType.NVarChar).Value = stringToSave;
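For completeness, the target column itself must be NVARCHAR2 for this to work; a minimal sketch (table and column names follow the answer's placeholders):
CREATE TABLE YOUR_TABLE (
    id      NUMBER PRIMARY KEY,
    newname NVARCHAR2(100)  -- stored in the national character set, independent of NLS_CHARACTERSET
);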