PostgreSQL: nonstandard use of escape in a string literal

I have a PostgreSQL 8.4 database that is being queried by an application that is outside of my control. Queries such as the following are throwing warnings but are working...
SELECT "tagname","tagindex","tagtype","tagdatatype" FROM "tagtable" WHERE "tagname" = 'Lift_Stations\07\ETMs\Generator_ETM'
However, the same query for stations 08 and 09 fails...
SELECT "tagname","tagindex","tagtype","tagdatatype" FROM "tagtable" WHERE "tagname" = 'Lift_Stations\08\ETMs\Generator_ETM'
WARNING: nonstandard use of escape in a string literal
LINE 2: ...,"tagdatatype" FROM "tagtable" WHERE "tagname" = 'Lift_Stat...
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid byte sequence for encoding "UTF8": 0x00
SQL state: 22021
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
I know the problem is incorrect escaping, but given the fact that 08 and 09 are the only ones not working, I'm hoping someone might have a bright idea on how to work around this.
Thanks!

It should work if you enable standard_conforming_strings (SET standard_conforming_strings = on;). With that setting off, PostgreSQL treats backslashes in ordinary string literals as escapes: \07 is an octal escape for byte 0x07 (BEL), which is valid UTF-8 and only triggers the warning, but in \08 the parser consumes \0 as the octal escape for NUL (0x00) followed by a literal 8, and a NUL byte is forbidden in UTF-8 text, hence the error. With the setting on, backslashes are ordinary characters and the queries match the intended tag names.
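A minimal sketch may help here: Python string literals happen to use the same octal-escape rule as PostgreSQL's non-standard-conforming strings, so the difference between \07 and \08 can be demonstrated directly (the tag paths are shortened for clarity):

```python
# "\07" is an octal escape: two octal digits, yielding byte 0x07 (BEL).
# Odd data, but valid UTF-8, so PostgreSQL only warns.
works = 'Lift_Stations\07'

# In "\08" the digit 8 is not octal, so the parser stops after "\0":
# a NUL byte (0x00) followed by a literal '8'. NUL is forbidden in
# UTF-8 text, hence the "invalid byte sequence ... 0x00" error.
fails = 'Lift_Stations\08'

assert works.endswith('\x07')
assert fails.endswith('\x00' + '8')
```

With standard_conforming_strings enabled, backslashes in ordinary literals are just characters, so neither escape is interpreted.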

Related

Import sql file invalid byte sequence for encoding "UTF8": 0x80

I am trying to import an SQL file into my Rails app with a PostgreSQL database, but when I run ActiveRecord::Base.connection.execute(IO.read("tmp/FILE.SQL"))
I get this error: PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0x80
I never found an answer here for the 0x80 error code.
When I check with the file command I get: Non-ISO extended-ASCII text, with very long lines (334), with CRLF line terminators
I can't change the SQL file because it comes from a client, so parsing the file without importing it could be another solution if the problem is in the file.
Any chance that your data has the Euro symbol within? Character 0x80 is € in the Win-1252 character set. If that's what's going on then try this method of converting to UTF-8:
ActiveRecord::Base.connection.execute(File.read('tmp/FILE.SQL', encoding: 'cp1252').encode('utf-8'))
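If the Euro diagnosis is right (an assumption, since the file isn't shown), the byte-level story is easy to verify: 0x80 is € in cp1252 but is not a valid start byte in UTF-8. A minimal sketch with made-up data:

```python
raw = b'price: \x80 42'          # hypothetical line from the dump

# Decoded as Windows-1252, byte 0x80 is the Euro sign:
assert raw.decode('cp1252') == 'price: € 42'

# Decoded as UTF-8 it fails, which is exactly the
# PG::CharacterNotInRepertoire failure mode reported above:
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    assert '0x80' in str(exc)
```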

Why does psql throw ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"

The SQL Shell (psql) often throws this error, e.g. when I forget to close brackets or when I try to drop a table with DROP TABLE book;. I do not know why.
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have already seen similar questions, but switching the client encoding with \encoding UTF8 does not help. Afterwards I get errors like:
DzD"D~D`DsD?: D_Ñ^D,D±DºD° Ñ?D,D½Ñ,D°DºÑ?D,Ñ?D° (D¿Ñ?D,D¼DµÑ?D½D_Dµ D¿D_D»D_D¶DµD½D,Dµ: ")") LINE 1: ); ^
The strings that can't be displayed properly are error messages translated into the language configured with lc_messages.
In your case it's a language with Cyrillic characters, for instance it could be Russian (see the result of show lc_messages; in SQL).
If the client side has an incompatible character set, such as win1252 which is meant for Western European characters, it can't display the Cyrillic characters.
To illustrate the problem, let's first look at the output when both sides are properly configured.
postgres=# \encoding ISO_8859_5
postgres=# set lc_messages to 'ru_RU.iso88595';
SET
postgres=# drop table does_not_exist;
ОШИБКА: таблица "does_not_exist" не существует
Now let's switch to a win1252 encoding:
postgres=# \encoding WIN1252
postgres=# drop table does_not_exist;
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
You get the same error as in the question, because the Russian error text for dropping a non-existent table contains U+041E "CYRILLIC CAPITAL LETTER O", a character that doesn't exist in WIN1252.
The solution is to use a client-side configuration that is compatible with Cyrillic, such as UTF-8 (recommended as it supports all languages) or the mono-byte encodings win1251 or iso-8859-5.
Alternatively, some installations prefer to configure the server to output untranslated error messages (set lc_messages to C for instance).
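The incompatibility is easy to reproduce outside psql. Cyrillic О (U+041E), the first letter of ОШИБКА ("ERROR"), encodes to exactly the two UTF-8 bytes named in the message, and cp1252 simply has no slot for it; a quick sketch:

```python
ch = '\u041e'                    # CYRILLIC CAPITAL LETTER O

# The byte sequence named in the error message:
assert ch.encode('utf-8') == b'\xd0\x9e'

# WIN1252 (cp1252) cannot represent it, so conversion fails:
try:
    ch.encode('cp1252')
    raise AssertionError('unexpectedly encoded')
except UnicodeEncodeError:
    pass
```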
Be aware that both your terminal configuration and the psql client_encoding must be set together and be compatible. If you do \encoding UTF-8 but your terminal is still configured for a mono-byte encoding like win1252, you'll get the kind of garbled output that you mention in the last part of the question.
Example of a wrong mix: terminal set to ISO-8859-1 (Western European), psql client-side encoding forced to UTF8, and messages with Cyrillic characters:
postgres=# \encoding
UTF8
postgres=# show lc_messages ;
lc_messages
----------------
ru_RU.iso88595
postgres=# drop table does_not_exist;
ÐÐаблОÑа "does_not_exist" Ме ÑÑÑ ÐµÑÑвÑеÑ
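The garbled line above is the classic symptom of UTF-8 bytes being drawn one-by-one by a single-byte terminal; the effect can be reproduced by decoding UTF-8 output as ISO-8859-1:

```python
msg = 'Таблица'                  # Russian for "table"

# Each Cyrillic letter is two UTF-8 bytes; a terminal that treats
# those bytes as ISO-8859-1 prints two characters per letter,
# which is why the garbage starts with Ð and looks doubled:
garbled = msg.encode('utf-8').decode('iso-8859-1')
assert garbled.startswith('Ð')
assert len(garbled) == 2 * len(msg)
```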

ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"

I've got a psql guidebook which says that the first thing you should do before starting the psql shell is to set an encoding, and it suggests typing chcp 1251 to allow typing ";". That doesn't work for me, but SET client_encoding TO 'UTF8' does. Could someone explain why, please?
I had such an error due to the presence of the Cyrillic letter "М" in the query text.

OCaml: error in list function

Hello, I am working on list functions in OCaml and I am getting this error. Why is that?
Error: Syntax error
# let headOf lst=
match lst with
|??[ ] -> failwith "harun"
Warning 3: deprecated: ISO-Latin1 characters in identifiers
If your input encoding is ISO-LATIN-1, then what you're doing is typing non-breaking spaces sometimes. These are non-ASCII characters that look like spaces, with a character code of 160. You should remove them all and replace with ordinary spaces (character code 32).
If you're using an input system that sometimes inputs non-breaking spaces without your specifically asking for them, you should use a different input system for working with OCaml :-)
Update
In fact, my input system inputs a non-breaking space if I type Option-Space (iTerm2 on macOS 10.12.4). It looks like this:
# let f x��= 14;;
Warning 3: deprecated: ISO-Latin1 characters in identifiers
Error: Illegal character (\160)
The solution in my case is never to type Option-Space. Just type space (no option key).
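If you suspect stray non-breaking spaces in a source file, a short sketch (the input string here is contrived) can locate and replace them:

```python
src = 'let headOf lst =\u00a0match lst with'  # NBSP snuck in after '='

# Locate any U+00A0 so you can see exactly where they are:
positions = [i for i, c in enumerate(src) if c == '\u00a0']
assert positions == [16]

# Replace them with ordinary spaces (character code 32):
fixed = src.replace('\u00a0', ' ')
assert '\u00a0' not in fixed
assert fixed == 'let headOf lst = match lst with'
```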

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEXP_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnosis is correct, so first of all you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
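For context, the � glyph itself is U+FFFD REPLACEMENT CHARACTER, which decoders substitute for bytes they cannot map; this sketch is roughly what a UTF8 session does visually with untranslatable data:

```python
# 0x80 is not a valid UTF-8 start byte, so errors='replace'
# substitutes U+FFFD, rendered as �:
shown = b'A\x80B'.decode('utf-8', errors='replace')
assert shown == 'A\ufffdB'
```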
And in case the character is indeed part of the data and not just an indication of encoding-translation issues:
The substitute character, a.k.a. SUB (DEC 26, HEX 1A), is quite unusual in Teradata;
you cannot use it directly:
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can now use it with REPLACE or OTRANSLATE
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are two versions of OTRANSLATE and OREPLACE:
The functions under SYSLIB work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
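If cleaning the data on the client side is an option, the same removal that OTRANSLATE performs can be sketched in Python; chr(26) is the SUB character discussed above, and the sample string is contrived:

```python
SUB = chr(26)                    # substitute character, 0x1A

raw = 'Hello ' + SUB * 4 + ' world ' + SUB * 4
# Strip the SUBs, then collapse the leftover whitespace:
cleaned = ' '.join(raw.replace(SUB, '').split())
assert cleaned == 'Hello world'
```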
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII, and now I could see the offending character, which looks like a tab:
IDENTIFY
Using the TRANSLATE_CHK command using this specific conversion succeeds and identifies the position of the offending character (Note that this does not work using the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
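The same keep-or-truncate logic can be mirrored client-side. As a rough stand-in for TRANSLATE_CHK, this sketch treats any character outside Latin-1 as untranslatable (an assumption; the real check depends on your session charset):

```python
def truncate_at_bad(s: str) -> str:
    """Keep s up to (not including) the first non-Latin-1 character,
    mirroring the TRANSLATE_CHK + SUBSTR pattern above."""
    for i, ch in enumerate(s):
        if ord(ch) > 0xFF:       # not representable in Latin-1
            return s[:i]
    return s

# The bad character sits at 1-based position 28, matching BADCHAR = 28:
assert truncate_at_bad('24FEB1747659193DC330A163DCL\ufffdORD') == '24FEB1747659193DC330A163DCL'
assert truncate_at_bad('plain ASCII') == 'plain ASCII'
```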
Hope this helps someone out.