Openfire: Offline UTF-8 encoded messages are saved wrong - openfire

We use Openfire 3.9.3. Its MySql database uses utf8_persian_ci collation and in openfire.xml we have:
...<defaultProvider>
<driver>com.mysql.jdbc.Driver</driver>
<serverURL>jdbc:mysql://localhost:3306/openfire?useUnicode=true&amp;characterEncoding=UTF-8</serverURL>
<mysql>
<useUnicode>true</useUnicode>
</mysql> ....
The problem is that offline messages which contain Persian characters (UTF-8 encoded) are saved as strings of question marks. For example سلام (means hello in Persian) is stored and showed like ????.

MySQL does not have proper Unicode support, which makes supporting data in non-Western languages difficult. However, the MySQL JDBC driver has a workaround which can be enabled by adding
?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8
to the URL of the JDBC driver. You can edit the conf/openfire.xml file to add this value.
Note: If the mechanism you use to configure a JDBC URL is XML-based, you will need to use the XML character literal & to separate configuration parameters, as the ampersand is a reserved character for XML.
Also be sure that your DB and tables have utf8 encoding.

Related

How to import UTF8 table in another encoding(win1251, SQL_ASCII) with COPY()?

Prehistory: Hello, i saw many questions about encoding in postgres, but.
I have UFT8 table, and i'm using COPY function to import that table in CSV, and i need to make COPY with different encodings like WIN1251 and SQL_ASCII.
Problem: When in table i have characters that not supported in WIN1251/SQL_ASCII, i will got classic error
character with byte sequence 0xe7 0xb0 0xab in encoding "UTF8" has no equivalent in encoding "WIN1251"
I tried using "set client_encoding/ convert / convert_to" - no success.
Main question: Is there any way to do this without error using sql?
There is simply no way to convert 簫 into Windows-1252, so you can forget about that.
If you set the client encoding to SQL_ASCII, you will be able to load the data into an SQL_ASCII database, but that is of little use, since the database does not recognize it as a character, but three meaningless bytes above 127.

Encoding issue in Postgres ERROR "UTF8" is it best to set encoding to UTF8 or to make the data WIN1252 compatible?

I created a table importing a CSV file from an excel spreadsheet. When I try to run the select statement below I get the error.
test=# SELECT * FROM dt_master;
ERROR: character with byte sequence 0xc2 0x9d in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have read the solution posted in this stack overflow post and was able to overcome the issue by setting the encoding to UTF8, so up to that point I am still able to keep working with the data. My question, however, is whether setting the encoding to UTF8 actually is solving the problem or it is just a workaround that and will create other problems down the road and I would be better off removing the conflicting characters and making the data WIN1252 compliant.
Thank you
You have a weird character in your database (Unicode code point 9D, a control character) that probably got there by mistake.
You have to set the client encoding to the encoding that your application expects; no other value will produce correct results, even if you get rid of the error. The error has a reason.
You have two choices:
Fix the data in the database. The character is very likely not what was intended.
Change the application to use LATIN1 or (better) UTF-8 internally and set the client encoding appropriately.
Using UTF-8 everywhere would have the advantage that you are safe from this kind of problem.

Update/insert/retrieve accented character in DB?

I am using oracle 12G
when i run #F:\update.sql from sql plus it displays accented character é as junk character when I retrieve from either sqlplus or sql developer
By when run the individual statement from sql plus. Now if I retrieve it from sqlplus, it displays the correct character, but when I retrieve it from sqldeveloper, it again displays the junk character.
update.sql content is this
update employee set name ='é' where id= 1;
What i want is when i run #F:\update.sql , it should insert/update/retrieve it in correct format whether it is from sqlplus or any other tool ?
For information :- when i run
SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET%'
i get below information
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16
when i run #.[%NLS_LANG%] from command prompt i see
SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8MSWIN1252]"
I am not familiar with SQL Developer but I can give solution for SQL*Plus.
Presume you like to work in Windows CP1252
First of all ensure that the file F:\update.sql is saved in CP1252 encoding. Many editors call this encoding ANSI which is the same (let's skip the details about difference between term ANSI and Windows-1252)
Then before you run the script enter
chcp 1252
in order to switch encoding of your cmd.exe to CP1252. By default encoding of cmd.exe is most likely CP850 or CP437 which are different.
Then set NLS_LANG environment variable to character set WE8MSWIN1252, e.g.
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
After that your script should work fine with SQL*Plus. SQL*Plus inherits the encoding (or "character set", if you prefer this term) from parent cmd.exe. NLS_LANG tells the Oracle driver which character set you are using.
Example Summary:
chcp 1252
set NLS_LANG=.WE8MSWIN1252
sqlplus username/password#db #F:\update.sql
Some notes: In order to set encoding of cmd.exe permanently, see this answer: Unicode characters in Windows command line - how?
NLS_LANG can be set either as Environment Variable or in your Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 32-bit Oracle Client), resp. HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 64-bit Oracle Client).
For SQL Developer check you options, somewhere it should be possible to define encoding of SQL files.
You are not forced to use Windows-1252. The same works also for other encoding, for example WE8ISO8859P1 (i.e. ISO-8859-1, chcp 28591) or UTF-8. However, in case of UTF-8 your SQL-script may contain characters which are not supported by database character set WE8MSWIN1252. Such characters would be replaced by placeholder (e.g. ¿).

invalid escape character "\\" was specified in a LIKE predicate

i'm getting this error message with my query.. i'm using openjpa and sql server for my database...
this is my query:
public static List<ProcesosGeneralEntity> getALLbyProductor(String productor){
Query q = entityManager.createQuery("select a from ProcesosGeneralEntity a where a.productor like :productor ");
q.setParameter("productor", '%'+productor+'%');
List<ProcesosGeneralEntity>resultado=q.getResultList();
List<ProcesosGeneralEntity>result2=new ArrayList<ProcesosGeneralEntity>(resultado);
return result2;
}
Just in case my comment above is correct, I am submitting this as a tentative or possible answer.
Different database software behaves slightly differently, the ANSI SQL Standard does not cover all behavioral quirks of SQL, and so things like escape characters in strings differ between implementations. In SQL Server, escaping a quote mark is done with another quote mark, so to print the string "Alice's dog", one needs to use use, in SQL, the string 'Alice''s dog'. In Oracle Database, the escape character is a backslash, so to print that same string, you instead use 'Alice\'s dog'. Escape characters themselves need to be able to be printed, so in Oracle Database, to print the string "R2\D2", you need to enter the string 'R2\\D2'.
The problem you are having appears to be that it thinks it is talking to an Oracle Database, and thus defaulted to the latter behavior, and used \\ to quote a single \, instead of leaving it be. SQL Server then had a hiccup on this option or some-such. I'm not sure why it threw it back, to be honest.
Regardless, according to the OpenJPA manual's section 4. Database Support - Chapter 4. JDBC you need to specify the correct DBDictionary. The DBDictionary specifies settings like which escape characters to use in which cases, and other non-standard options that are not uniform across all database systems supported.
The solution appears to be that in the configuration file for your software, you must specify something like:
<property name="openjpa.jdbc.DBDictionary" value="sqlserver"/>

How to declare a SQL INSERT Statement with a Unicode letter [duplicate]

This question already has an answer here:
Can not insert German characters in Postgres
(1 answer)
Closed 9 years ago.
I have a sql statemwent, which contain a unicode specific sign. The unicode sign is ę in the polish word Przesunięcie. Please look at the following SQL INSERT Statement:
INSERT INTO res_bundle_props (res_bundle_id, value, name)
VALUES(2, 'Przesunięcie przystanku', 'category.test');
I work with the Postgres Database. In which way can i insert the polish word with the unicode letter?
Find what are the server and client encodings:
show server_encoding;
server_encoding
-----------------
UTF8
show client_encoding;
client_encoding
-----------------
UTF8
Then set the client to the same encoding as the server:
set client_encoding = 'UTF8';
SET
No special syntax is required so long as:
Your server_encoding includes those characters (if it's utf-8 it does);
Your client_encoding includes those characters;
Your client_encoding correctly matches the encoding of the bytes you're actually sending
The latter is the one that often trips people up. They think they can just change client_encoding with a SET client_encoding statement and it'll do some kind of magical conversion. That is not the case. client_encoding tells PostgreSQL "this is the encoding of the data you will receive from the client, and the encoding that the client expects to receive from you".
Setting client_encoding to utf-8 doesn't make the client actually send UTF-8. That depends on the client. Nor do you have to send utf-8; that string can also be represented in iso-8859-2, iso-8859-4 and iso-8859-10 among other encodings.
What's crucial is that you tell the server the encoding of the data you're sending. As it happens that string is the same in all three of the encodings mentioned, with the ę encoded as 0xae... but in utf-8 that'd be the two bytes 0xc4 0x99. If you send utf-8 to the server and tell it that it's iso-8859-2 the server can't tell you're wrong and will interpret it as Ä in iso-8859-2.
So... really, it depends on things like the system's default encoding, the encoding of any files/streams you're reading data from, etc. You have two options:
Set client_encoding appropriately for the data you're working with and the default display locale of the system. This is easiest for simple cases, but harder when dealing with multiple different encodings in input or output.
Set client_encoding to utf-8 (or the same as server_encoding) and make sure that you always convert all input data into the encoding you set client_encoding to before sending it. You must also convert all data you receive from Pg back.