What is the encoding used in encode function with escape in postgres? - sql

I have a bytea value stored in the database. I'm trying to convert it to a string via the encode function, and it works:
select encode(blobvalue,'escape') from myTable ;
However, the following fails:
select convert_from(blobvalue,'UTF8') from myTable ;
ERROR: invalid byte sequence for encoding "UTF8": 0xac
My server encoding is UTF8:
SHOW SERVER_ENCODING;
server_encoding
-----------------
UTF8
(1 row)
Is there any explanation why encode works but convert_from does not? Isn't the encoding used by encode with escape the same as the server encoding?

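encode() with the escape format does not interpret the bytes in any character encoding. It outputs ASCII bytes literally, doubles backslashes, and renders zero bytes and bytes with the high bit set as octal escapes: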
SELECT encode('\x4142AC43', 'escape');
encode
═════════
AB\254C
(1 row)
Here, the byte 0xAC is not an ASCII character and is rendered as \254.
convert_from(), on the other hand, treats the bytea as a string in the given encoding and fails if it finds bytes that are not valid in that encoding:
SELECT convert_from('\x4142AC43', 'UTF8');
ERROR: invalid byte sequence for encoding "UTF8": 0xac
So you have to figure out which encoding your bytea value is actually in.
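The difference between the two functions can be reproduced outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation: encode_escape is a hypothetical helper that only approximates the escape format, and LATIN1 is just one candidate encoding chosen for illustration.

```python
def encode_escape(data: bytes) -> str:
    """Approximate encode(..., 'escape'): no character encoding is involved."""
    out = []
    for b in data:
        if b == 0x5C:                # backslash is doubled
            out.append("\\\\")
        elif b == 0 or b >= 0x80:    # zero byte or high bit set: octal escape
            out.append("\\%03o" % b)
        else:                        # remaining bytes are emitted literally
            out.append(chr(b))
    return "".join(out)


raw = bytes.fromhex("4142AC43")
print(encode_escape(raw))            # AB\254C

# Decoding as UTF-8 fails for the same reason convert_from() does:
# 0xAC cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# If the data was really written as Latin-1, decoding with that encoding works.
latin1_text = raw.decode("latin-1")
print(latin1_text == "AB\u00acC")    # True
```

If the data turns out to be in a single-byte encoding such as Latin-1, the equivalent call in PostgreSQL would be convert_from(blobvalue, 'LATIN1').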

Related

Base64 decode in db2 sql

Is it possible to decode a Base64 encoded string in DB2 database?
I was able to decode the string in SQL Server by casting to XML.
I am running Db2 on a Linux server.
z/OS:
BASE64ENCODE and BASE64DECODE
Last Updated: 2022-08-23
The BASE64ENCODE and BASE64DECODE helper REST functions complete Base64 encoding or decoding of the provided text.
Tip: The sample HTTP user-defined functions are intended to be used within Db2 SQL applications to access remote non-Db2 REST-based services through SQL statements. Do not confuse them with Db2 native REST services, which support using a REST-based interface to interact with Db2 data from web, mobile, and cloud applications.
The schema is DB2XML.
text
Specifies the text to encode or decode. For BASE64ENCODE, this argument is provided as a VARCHAR(2732) value and the function returns a Base64-encoded string. For BASE64DECODE, this argument is provided as a Base64-encoded VARCHAR(4096) value and the function returns the data as binary.
IBMi:
BASE64DECODE scalar function
Last Updated: 2022-05-03
The BASE64DECODE function returns a character string that has been Base64 decoded. Base64 encoding is widely used to represent binary data as a string.
The schema is SYSTOOLS.
character-string
A character string in CCSID 1208 that is currently Base64 encoded. The length cannot exceed 4096 characters.
The result of the function is a varying length character for bit data string that contains character-string after being Base64 decoded.
Example
Decode a binary string that was originally X'1122334455'. The result is the original value.
VALUES SYSTOOLS.BASE64DECODE('ESIzRFU=');
-- encoding
values regexp_replace (xmlserialize (xmlelement (name "a", blob (X'1122334455')) as varchar (20)), '^<a>(.*)</a>$', '$1')
1
ESIzRFU=
-- decoding (hex function use is just to get a string representation of a binary value)
values hex (xmlcast (xmltext ('ESIzRFU=') as blob (20)))
1
1122334455
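As a cross-check of the values used in the examples above, the same round trip can be done with Python's standard base64 module. This is only a client-side sanity check, not a Db2 feature:

```python
import base64

# Encoding: the five bytes X'1122334455' become an 8-character Base64 string.
encoded = base64.b64encode(bytes.fromhex("1122334455")).decode("ascii")
print(encoded)                    # ESIzRFU=

# Decoding: hex() is used only to get a printable form of the binary result.
decoded = base64.b64decode("ESIzRFU=")
print(decoded.hex())              # 1122334455
```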

Why does psql throw ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding

SQL Shell (psql) often throws this error, e.g. when I forget to close a bracket or when I try to drop a table with DROP TABLE book;. I do not know why.
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
I have already seen similar questions, but switching the client encoding with \encoding UTF8 does not help. After that, I get errors like:
DzD"D~D`DsD?: D_Ñ^D,D±DºD° Ñ?D,D½Ñ,D°DºÑ?D,Ñ?D° (D¿Ñ?D,D¼DµÑ?D½D_Dµ D¿D_D»D_D¶DµD½D,Dµ: ")") LINE 1: ); ^
The strings that can't be displayed properly are error messages translated into the language configured with lc_messages.
In your case it's a language with Cyrillic characters, for instance it could be Russian (see the result of show lc_messages; in SQL).
If the client side has an incompatible character set, such as win1252 which is meant for Western European characters, it can't display the Cyrillic characters.
To illustrate the problem, first let's display when it's properly configured on both sides.
postgres=# \encoding ISO_8859_5
postgres=# set lc_messages to 'ru_RU.iso88595';
SET
postgres=# drop table does_not_exist;
ОШИБКА: таблица "does_not_exist" не существует
Now let's switch to a win1252 encoding:
postgres=# \encoding WIN1252
postgres=# drop table does_not_exist;
ERROR: character with byte sequence 0xd0 0x9e in encoding "UTF8" has no equivalent in encoding "WIN1252"
You get the same error as in the question, because the Russian error text for attempting to drop a non-existent table contains U+041E "CYRILLIC CAPITAL LETTER O", a character that doesn't exist in WIN1252.
The solution is to use a client-side configuration that is compatible with Cyrillic, such as UTF-8 (recommended as it supports all languages) or the mono-byte encodings win1251 or iso-8859-5.
Alternatively, some installations prefer to configure the server to output untranslated error messages (set lc_messages to C for instance).
Be aware that your terminal configuration and the psql client_encoding must be set together and be compatible. If you do \encoding UTF-8 but your terminal is still configured for a mono-byte encoding like win1252, you'll get the kind of garbled output that you mention in the last part of the question.
Example of a wrong mix: a terminal set to iso-8859-1 (Western European), a psql client-side encoding forced to UTF8, and messages with Cyrillic characters:
postgres=# \encoding
UTF8
postgres=# show lc_messages ;
lc_messages
----------------
ru_RU.iso88595
postgres=# drop table does_not_exist;
ÐÐаблОÑа "does_not_exist" Ме ÑÑÑ ÐµÑÑвÑеÑ
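Both failure modes described above can be reproduced with plain Python string encoding. This is a sketch of the character-set mechanics only; it does not involve psql, and the message text is simply copied from the example session:

```python
message = 'ОШИБКА: таблица "does_not_exist" не существует'

# U+041E (CYRILLIC CAPITAL LETTER O) is the byte pair from the error message.
print("\u041e".encode("utf-8").hex())        # d09e

# Case 1: client encoding WIN1252 has no slot for Cyrillic, so conversion fails.
try:
    message.encode("cp1252")
    convertible = True
except UnicodeEncodeError:
    convertible = False
print(convertible)                           # False

# Case 2: UTF-8 bytes shown by a terminal that assumes a Western European
# single-byte encoding come out as one wrong character per byte.
utf8_bytes = message.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(garbled == message)                    # False
print(len(garbled) == len(utf8_bytes))       # True

# A Cyrillic-capable encoding on both sides round-trips cleanly.
round_trip = message.encode("iso8859_5").decode("iso8859_5")
print(round_trip == message)                 # True
```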

Convert escaped unicode character to unicode notation

I have a DB2 LUW table with escaped unicode characters in it.
I want to convert this to a real unicode string.
$ db2 "select loc_longtext from fh01tq07 where loc_longtext like '%\\u%'"
LOC_LONGTEXT
------------
S\u00e4ule
After a long time of trial and error, I'm at this point:
$ db2 "select loc_longtext, xmlquery('fn:replace(\$LOC_LONGTEXT,''\\\u([0-9a-f]{1,4})'',''&#x\$1;'')') from fh01tq07 where loc_longtext like '%\\u%'"
SQL16002N An XQuery expression has an unexpected token "&#x" following "]{1,4})','". Expected tokens may include: "<". Error QName=err:XPST0003. SQLSTATE=10505
But fn:normalize-unicode requests this type of escaped unicode format.
Any suggestions?
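If doing the conversion on the client side is acceptable, the replacement itself is small. The following Python sketch is a workaround outside the database, not a DB2 XQuery solution; unescape_unicode is a hypothetical helper, and it assumes every escape has exactly four hex digits and that no surrogate pairs occur in the data.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace each \\uXXXX escape with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(unescape_unicode(r"S\u00e4ule"))   # Säule
```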

Inserting string as regular string in mongodb

The pymongo documentation says that BSON strings are UTF-8 encoded so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Unicode strings (<type ‘unicode’>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u’Mike’ instead of ‘Mike’ is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
So I understand that to get rid of the Unicode literal 'u', I will have to call json.dumps() on the document returned by the query.
The documentation also says that Regular strings (<type ‘str’>) are validated and stored unaltered. And I am assuming that the query result also throws it back as a regular string and not a Unicode string.
I created a dictionary with regular string types and inserted it in the DB, and when I retrieve it, I get the strings as Unicode. Any idea how I can get regular strings back? The purpose is to avoid calling json.dumps() on the query result. I need to fetch a large number of documents from the DB and json.dumps() is taking quite some time. The strings that I am storing contain only ASCII data, so I don't need Unicode strings.
The assumption that the regular string is returned back as regular string was not correct. It is stored unaltered and not encoded to UTF-8 because it is already UTF-8. While decoding during the query, everything is converted back to Unicode.
Source:
Automatic string to unicode object conversion
How can I get pymongo to always return str and not unicode?
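The "stored unaltered" part can be illustrated without a database. The sketch below runs under Python 3, where the str/unicode split from the question no longer exists, so it only shows the byte-level point: ASCII text is already valid UTF-8, and decoding always yields a text string.

```python
text = "Mike"

# For ASCII-only data the ASCII and UTF-8 byte sequences are identical,
# so nothing needs to change when the value is stored.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(ascii_bytes == utf8_bytes)      # True

# Reading back means decoding UTF-8, which always produces a text string.
decoded = utf8_bytes.decode("utf-8")
print(type(decoded).__name__)         # str
```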

How to generate a GUID string in the same format as Advantage Database Server?

I have an Advantage Database Server V11 database in which I currently store a unique identifier: a 22-character GUID string generated using NewIDString('F').
The snippet from the ADS help is as follows :-
"F or File – A GUID encoded as a 22-byte string using File and URL Safe base64 encoding with the format of xxxxxxxxxxxxxxxxxxxxxx. Base64 encoded strings are case sensitive and should not be stored in case insensitive string fields."
Is it possible to generate the same format GUID string in Microsoft SQL?
If not, the ADS Help file lists other formats as follows :-
NEWIDSTRING
Returns a GUID formatted as a string. If the format parameter is not specified, the GUID string is formatted as a hexadecimal string with the following pattern xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. The format parameter can be the following values:
M or MIME – A GUID encoded as a 24-byte string using MIME base64 encoding with the format of xxxxxxxxxxxxxxxxxxxxxxxx. Base64 encoded strings are case sensitive and should not be stored in case insensitive string fields.
F or File – A GUID encoded as a 22-byte string using File and URL Safe base64 encoding with the format of xxxxxxxxxxxxxxxxxxxxxx. Base64 encoded strings are case sensitive and should not be stored in case insensitive string fields.
N or Numbers – A GUID encoded as a 32-byte hexadecimal string value with a format of xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
D or Delimited – A GUID encoded as a 32-byte hexadecimal string with a format of xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
B or Bracketed – A GUID encoded as a 32-byte hexadecimal string with a format of [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx]
P or Parenthesis – A GUID encoded as a 32-byte hexadecimal string with a format of (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
C or Curlybraces – A GUID encoded as a 32-byte hexadecimal string with a format of {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
Can MS-SQL generate one of these?
Regards
Mike
In SQL Server the NEWID function creates new GUIDs (the SQL Server data type is called UNIQUEIDENTIFIER). A GUID is a binary value; the various formats you mention in your question are merely different string representations of that binary value. In the MSDN page for NEWID, example A shows the standard string representation of a GUID as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.
Creating Base64-encoded strings in SQL Server is not straightforward, but it is possible.
So everything you need to produce a GUID in the ADS File format is available in SQL Server.
NEWIDSTRING(M):
-- Generate a GUID in the format "1Jg/YZ7tNkG7PQr1wVe7KA=="
SELECT
CAST(N'' AS XML).value(
'xs:base64Binary(xs:hexBinary(sql:column("bin")))'
, 'VARCHAR(MAX)'
) AS "NEWIDSTRING_M"
FROM (
SELECT CAST(NEWID() AS VARBINARY) AS bin
) AS bin_sql_server_temp;
NEWIDSTRING(F):
-- Generate a GUID in the format "1Jg/YZ7tNkG7PQr1wVe7KA"
SELECT
LEFT(CAST(N'' AS XML).value(
'xs:base64Binary(xs:hexBinary(sql:column("bin")))'
, 'VARCHAR(MAX)'
), 22) AS "NEWIDSTRING_F"
FROM (
SELECT CAST(NEWID() AS VARBINARY) AS bin
) AS bin_sql_server_temp;
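Here F is just M with the last two characters (which are always "==", because 16 bytes encode to 22 characters plus two padding characters) removed. Note that a URL-safe Base64 alphabet normally uses "-" and "_" in place of "+" and "/"; if ADS follows that convention, the result also needs to be wrapped in REPLACE(REPLACE(..., '+', '-'), '/', '_').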
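For comparison, the relationship between the formats can be sketched in Python with the standard uuid and base64 modules. guid_formats is a hypothetical helper, and the byte order is an assumption: the source does not say which layout ADS encodes, so values produced here may not match ADS or SQL Server byte for byte.

```python
import base64
import uuid


def guid_formats(guid: uuid.UUID) -> dict:
    """Return several string representations of the same 16-byte GUID."""
    raw = guid.bytes  # assumption: big-endian layout; guid.bytes_le is the alternative
    mime = base64.b64encode(raw).decode("ascii")
    # URL-safe alphabet ("-" and "_") with the "==" padding stripped.
    file_safe = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return {
        "M": mime,            # 24 characters, always ends in "=="
        "F": file_safe,       # 22 characters
        "N": guid.hex,        # 32 hex digits
        "D": str(guid),       # hex digits with hyphens
    }


formats = guid_formats(uuid.uuid4())
for name, value in formats.items():
    print(name, value)
```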