How to declare a SQL INSERT statement with a Unicode letter [duplicate]

This question already has an answer here: Can not insert German characters in Postgres (1 answer). Closed 9 years ago.
I have an SQL statement that contains a Unicode character: the letter ę in the Polish word Przesunięcie. Please look at the following SQL INSERT statement:
INSERT INTO res_bundle_props (res_bundle_id, value, name)
VALUES(2, 'Przesunięcie przystanku', 'category.test');
I am working with a Postgres database. How can I insert the Polish word with the Unicode letter?

First, find out what the server and client encodings are:
show server_encoding;
server_encoding
-----------------
UTF8
show client_encoding;
client_encoding
-----------------
UTF8
Then set the client to the same encoding as the server:
set client_encoding = 'UTF8';
SET
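With both set to UTF8, the statement from the question should work as written, assuming the client program really sends UTF-8 bytes (a minimal check; res_bundle_props is the table from the question):
INSERT INTO res_bundle_props (res_bundle_id, value, name)
VALUES (2, 'Przesunięcie przystanku', 'category.test');
SELECT value FROM res_bundle_props WHERE name = 'category.test';
          value
-------------------------
 Przesunięcie przystanku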

No special syntax is required so long as:
Your server_encoding includes those characters (if it's utf-8 it does);
Your client_encoding includes those characters;
Your client_encoding correctly matches the encoding of the bytes you're actually sending.
The latter is the one that often trips people up. They think they can just change client_encoding with a SET client_encoding statement and it'll do some kind of magical conversion. That is not the case. client_encoding tells PostgreSQL "this is the encoding of the data you will receive from the client, and the encoding that the client expects to receive from you".
Setting client_encoding to utf-8 doesn't make the client actually send UTF-8. That depends on the client. Nor do you have to send utf-8; that string can also be represented in iso-8859-2, iso-8859-4 and iso-8859-10 among other encodings.
What's crucial is that you tell the server the encoding of the data you're sending. As it happens, that string is the same in all three of the encodings mentioned, with the ę encoded as the single byte 0xea... but in utf-8 it is the two bytes 0xc4 0x99. If you send utf-8 to the server and tell it that it's iso-8859-2, the server can't tell you're wrong and will interpret the 0xc4 as Ä in iso-8859-2.
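You can verify the byte sequences from SQL itself; convert_to returns the bytes of a string in a target encoding (LATIN2 is PostgreSQL's name for iso-8859-2):
SELECT convert_to('ę', 'LATIN2');  -- \xea    (a single byte)
SELECT convert_to('ę', 'UTF8');    -- \xc499  (two bytes)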
So... really, it depends on things like the system's default encoding, the encoding of any files/streams you're reading data from, etc. You have two options:
Set client_encoding appropriately for the data you're working with and the default display locale of the system. This is easiest for simple cases, but harder when dealing with multiple different encodings in input or output.
Set client_encoding to utf-8 (or the same as server_encoding) and make sure that you always convert all input data into the encoding you set client_encoding to before sending it. You must also convert all data you receive from Pg back into the encoding your client actually works in.
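As an illustration of the second option, convert_from performs the decoding step explicitly on the server; here the bytea literal stands in for raw iso-8859-2 bytes read from some external source:
SET client_encoding = 'UTF8';
SELECT convert_from('\xea'::bytea, 'LATIN2');  -- the text 'ę', delivered as UTF-8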

Related

Why does psql throw ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"?

The SQL shell (psql, PostgreSQL) often throws this error, e.g. when I forget to close brackets or when I try to drop a table with DROP TABLE book;. I do not know why.
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have already seen similar questions, but switching the client encoding with \encoding UTF8 does not help... After that, I get errors like:
DzD"D~D`DsD?: D_Ñ^D,D±DºD° Ñ?D,D½Ñ,D°DºÑ?D,Ñ?D° (D¿Ñ?D,D¼DµÑ?D½D_Dµ D¿D_D»D_D¶DµD½D,Dµ: ")") LINE 1: ); ^
The strings that can't be displayed properly are error messages translated into the language configured with lc_messages.
In your case it's a language with Cyrillic characters, for instance it could be Russian (see the result of show lc_messages; in SQL).
If the client side has an incompatible character set, such as win1252 which is meant for Western European characters, it can't display the Cyrillic characters.
To illustrate the problem, let's first look at the output when it's properly configured on both sides.
postgres=# \encoding ISO_8859_5
postgres=# set lc_messages to 'ru_RU.iso88595';
SET
postgres=# drop table does_not_exist;
ОШИБКА: таблица "does_not_exist" не существует
Now let's switch to a win1252 encoding:
postgres=# \encoding WIN1252
postgres=# drop table does_not_exist;
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
You get the same error as in the question, because the Russian error text for dropping a non-existent table contains U+041E "CYRILLIC CAPITAL LETTER O", a character that doesn't exist in WIN1252.
The solution is to use a client-side configuration that is compatible with Cyrillic, such as UTF-8 (recommended as it supports all languages) or the mono-byte encodings win1251 or iso-8859-5.
Alternatively, some installations prefer to configure the server to output untranslated error messages (set lc_messages to C for instance).
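For example, to switch off translation for the current session (as in the demonstrations here, this assumes sufficient privileges to change lc_messages):
postgres=# set lc_messages = 'C';
SET
postgres=# drop table does_not_exist;
ERROR:  table "does_not_exist" does not exist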
Be aware that both your terminal configuration and the psql client_encoding must be set together and be compatible. If you do \encoding UTF8 but your terminal is still configured for a mono-byte encoding like win1252, you'll get the kind of garbled output mentioned in the last part of the question.
Example of a wrong mix: a terminal set to iso-8859-1 (Western European), a psql client-side encoding forced to UTF8, and messages with Cyrillic characters:
postgres=# \encoding
UTF8
postgres=# show lc_messages ;
lc_messages
----------------
ru_RU.iso88595
postgres=# drop table does_not_exist;
ÐÐаблОÑа "does_not_exist" Ме ÑÑÑ ÐµÑÑвÑеÑ

How to import a UTF8 table in another encoding (WIN1251, SQL_ASCII) with COPY?

Background: Hello, I have seen many questions about encoding in Postgres, but none of them solved this.
I have a UTF8 table, and I am using COPY to dump that table to CSV, and I need to run COPY with different encodings like WIN1251 and SQL_ASCII.
Problem: when the table contains characters that are not supported in WIN1251/SQL_ASCII, I get the classic error
character with byte sequence 0xe7 0xb0 0xab in encoding "UTF8" has no equivalent in encoding "WIN1251"
I tried using set client_encoding, convert, and convert_to, with no success.
Main question: is there any way to do this without errors using SQL?
There is simply no way to convert 簫 into Windows-1251, so you can forget about that.
If you set the client encoding to SQL_ASCII, you will be able to load the data into an SQL_ASCII database, but that is of little use, since the database does not recognize it as a character, but as three meaningless bytes above 127.
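If losing the untranslatable characters is acceptable, you can replace them in the query that feeds COPY; COPY itself takes an ENCODING option, but it applies the same strict conversion (a sketch with placeholder table and column names):
COPY my_table TO '/tmp/out.csv' WITH (FORMAT csv, ENCODING 'WIN1251');
-- still fails on 簫
COPY (SELECT translate(txt, '簫', '?') AS txt FROM my_table)
TO '/tmp/out.csv' WITH (FORMAT csv, ENCODING 'WIN1251');
-- succeeds, because the offending character was replaced first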

Encoding issue in Postgres: is it best to set the encoding to UTF8 or to make the data WIN1252 compatible?

I created a table by importing a CSV file exported from an Excel spreadsheet. When I try to run the SELECT statement below, I get the error.
test=# SELECT * FROM dt_master;
ERROR: character with byte sequence 0xc2 0x9d in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have read the solution posted in this Stack Overflow post and was able to overcome the issue by setting the encoding to UTF8, so I am still able to keep working with the data. My question, however, is whether setting the encoding to UTF8 actually solves the problem, or whether it is just a workaround that will create other problems down the road, in which case I would be better off removing the conflicting characters and making the data WIN1252 compliant.
Thank you
You have a weird character in your database (the byte sequence 0xc2 0x9d is the UTF-8 encoding of U+009D, a control character) that probably got there by mistake.
You have to set the client encoding to the encoding that your application expects; no other value will produce correct results, even if you get rid of the error. The error has a reason.
You have two choices:
Fix the data in the database. The character is very likely not what was intended.
Change the application to use LATIN1 or (better) UTF-8 internally and set the client encoding appropriately.
Using UTF-8 everywhere would have the advantage that you are safe from this kind of problem.
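If you choose to fix the data, something along these lines will locate and strip the stray character; U+009D is code point 157, and the column name here is a placeholder:
SELECT * FROM dt_master WHERE some_column LIKE '%' || chr(157) || '%';
UPDATE dt_master SET some_column = replace(some_column, chr(157), '');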

Encoding type for Polish characters

I have a JSON string containing characters that exist in Polish. Example below:
"Reno Truck Lachowski & Łuczak - NAPRAWY CHŁODNI,IZOTERM,ZABUDÓW POJAZDÓW CIĘŻAROWYCH"
or
"RENO TRUCK Lachowski & Łuczak s.c. SERWIS POJAZDÓW UZYTKOWYCH"
I need to update this value in the database.
Can anyone let me know what encoding type I need to set?
I tried UTF-8 and ISO-8859-1, but neither works.
I observed that when I set ISO-8859-1, the value appears different, as below:
"RENO TRUCK Lachowski & ?uczak s.c. SERWIS POJAZDÓW UZYTKOWYCH"
The character Ł doesn't get updated.
Can anyone help, please?
JSON values are expected to be encoded in UTF-8. The string you quoted seems to be encoded in something else. You are expected to know the encoding of your data; note that it may not be valid JSON if it is not UTF-8. Once you know the encoding, you can use DataWeave to convert it to what your database expects. Based on the JDBC URL, it seems that the database connection expects ISO-8859-1.

Openfire: Offline UTF-8 encoded messages are saved wrong

We use Openfire 3.9.3. Its MySQL database uses the utf8_persian_ci collation, and in openfire.xml we have:
...
<defaultProvider>
    <driver>com.mysql.jdbc.Driver</driver>
    <serverURL>jdbc:mysql://localhost:3306/openfire?useUnicode=true&amp;characterEncoding=UTF-8</serverURL>
    <mysql>
        <useUnicode>true</useUnicode>
    </mysql>
...
The problem is that offline messages which contain Persian characters (UTF-8 encoded) are saved as strings of question marks. For example, سلام (which means hello in Persian) is stored and displayed as ????.
MySQL does not have proper Unicode support, which makes supporting data in non-Western languages difficult. However, the MySQL JDBC driver has a workaround which can be enabled by adding
?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8
to the URL of the JDBC driver. You can edit the conf/openfire.xml file to add this value.
Note: if the mechanism you use to configure the JDBC URL is XML-based, you will need to use the XML entity &amp; instead of a bare ampersand to separate configuration parameters, as the ampersand is a reserved character in XML.
Also make sure that your database and tables use the utf8 character set.
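For instance, on the MySQL side (a sketch; ofOffline is assumed to be the Openfire table holding offline messages, so adjust the names to your schema):
SHOW CREATE TABLE ofOffline;
ALTER DATABASE openfire CHARACTER SET utf8 COLLATE utf8_persian_ci;
ALTER TABLE ofOffline CONVERT TO CHARACTER SET utf8 COLLATE utf8_persian_ci;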