Inserting string as regular string in mongodb - pymongo

The pymongo documentation says that BSON strings are UTF-8 encoded so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Unicode strings (<type ‘unicode’>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u’Mike’ instead of ‘Mike’ is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
So I understand that to get rid of the Unicode literal 'u', I will have to call json.dumps() on the document returned by the query.
The documentation also says that Regular strings (<type ‘str’>) are validated and stored unaltered. And I am assuming that the query result also throws it back as a regular string and not a Unicode string.
I created a dictionary with regular string types and inserted it in DB and when I retrieve it, I get the strings as Unicode. Any idea on how do I do it? The purpose is to avoid calling json.dumps() on the query result. I need to fetch large number of documents from the DB and json.dumps() is taking quite some time. The strings that I am storing contain ASCII data so I don't need Unicode strings.

The assumption that the regular string is returned back as regular string was not correct. It is stored unaltered and not encoded to UTF-8 because it is already UTF-8. While decoding during the query, everything is converted back to Unicode.
Source:
Automatic string to unicode object conversion
How can I get pymongo to always return str and not unicode?

Related

Base64 decode in db2 sql

Is it possible to decode a Base64 encoded string in DB2 database?
I was able to decode the string in SQL Server by casting to XML.
I am running db2 in Linux server
z/OS:
BASE64ENCODE and BASE64DECODE
Last Updated: 2022-08-23
The BASE64ENCODE and BASE64DECODE helper REST functions complete Base64 encoding or decoding of the provided text.
Tip: The sample HTTP user-defined functions are intended to be used within Db2 SQL applications to access remote non-Db2 REST-based services through SQL statements. Do not confuse them with Db2 native REST services, which supports using a REST-based interface to interact with Db2 data from web, mobile, and cloud applications.
The schema is DB2XML.
text
Specifies the text to encode or decode. For BASE64ENCODE, this argument is provided as a VARCHAR(2732) value and the function returns a Base64-encoded string. For BASE64DECODE, this argument is provided as a Base64-encoded VARCHAR(4096) value and the function returns the data as binary.
IBMi:
BASE64DECODE scalar function Last Updated: 2022-05-03
The BASE64DECODE function returns a character string that has been Base64 decoded. Base64 encoding is widely used to represent binary data as a string.
The schema is SYSTOOLS.
character-string
A character string in CCSID 1208 that is currently Base64 encoded. The length cannot exceed 4096 characters.
The result of the function is a varying length character for bit data string that contains character-string after being Base64 decoded.
Example
Decode a binary string that was originally X'1122334455'. The result is the original value.
VALUES SYSTOOLS.BASE64DECODE('ESIzRFU=');
-- encoding
values regexp_replace (xmlserialize (xmlelement (name "a", blob (X'1122334455')) as varchar (20)), '^<a>(.*)</a>$', '$1')
1
ESIzRFU=
-- decoding (hex function use is just to get a string representation of a binary value)
values hex (xmlcast (xmltext ('ESIzRFU=') as blob (20)))
1
1122334455

How to convert string with German characters to Blob in Firebird?

I want to convert a string into a blob with the f_strblob(CSTRING) function of FreeAdhocUDF. At this point I do not find a way to get my special characters like ß or ä shown in the blob.
The result of f_strblob('Gemäß') is Gem..
I tried to change the character set to UTF8 of my variables, but that does not help.
Is there a masking option which I did not find?
You don't need that function, and the FreeAdhocUDF documentation also marks it as obsolete for that reason.
In a lot of situations, Firebird will automatically convert string literals to blobs (eg in statements where a string literal is assigned to a blob value), and otherwise you can explicitly cast using cast('your string' as blob sub_type text).

Can ASCII arrays be manipulated as arrays without converting to String form?

This is a basic question, but I can't find anything on it, since I don't know what to search — each of my tries have come up with unrelated results.
If I use Text.Encoding.ASCII.GetBytes to convert a string into ASCII, does each byte represent exactly one character? Does the following code work as exactly intended in all circumstances (for all Strings other than the examples)?
Dim t1() As Byte = Text.Encoding.ASCII.GetBytes("Hello ")
Dim t2() As Byte = Text.Encoding.ASCII.GetBytes("World")
Dim msg As String = Text.Encoding.ASCII.GetString(t1.Concat(t2).ToArray)
Now msg should be "Hello World".
I would like this to work as I don't want to have to convert data I receive back to Strings in order to manipulate it before it is sent again.
What if I used something other than ASCII (like UTF-8, for example)?
If I use Text.Encoding.ASCII.GetBytes to convert a string into ASCII, does each byte represent exactly one character?
Yes. ASCII is a 7bit encoding, it does not support multi-byte characters. Any Unicode codepoint above U-007F will get converted to a ? character in ASCII.
If you were to use UTF-7 instead, for instance, it can encode individual Unicode codepoints into a sequence of multiple ASCII characters.
Does the following code work as exactly intended in all circumstances (for all Strings other than the examples)?
In your particular example, yes (provided you are using LINQ's Concat() method - there are other ways to concat arrays together). There is no data loss.
But for other examples, just know that you will have data loss if you convert non-ASCII characters to ASCII, or otherwise mismatch encodings between GetBytes() and GetString().
You can certainly manipulate byte arrays. Just make sure the arrays are in the same encoding if you merge them together.
.NET strings are counted sequences of UTF-16 code units (char), one or two of which encode a Unicode codepoint (int Char.ConvertToUtf32 ). Some codepoints are "combining characters", which when applied to a preceding "base character" form a grapheme (which is then rendered by a font into a glyph).
An encoder from Unicode to an encoding of another character set should attempt to preserve graphemes. In .NET, a grapheme is called a "text element."
So, yes, you can combine encoded byte sequences as long as you haven't defeated the encoder by converting parts of a grapheme into different byte sequences. If you are breaking a string into two before encoding, see TextElementEnumerator and StringInfo class.

Convert EBCDIC to UTF8 in vb.net

I want to convert an ebcdic string to a utf8 string. I use the below online tool to test this, which is very good, for conversion related stuffs.
http://kanjidict.stc.cx/recode.php
I want to convert £ˆ‰¢‰¢¤£†ø which is in EBCDIC to UTF8 string, which is thisisutf8 You can use the above link to test.
I refered the below article on how to read EBCDIC data in .net
http://www.codeproject.com/KB/vb/EasyEbcdicToAscii.aspx
Then I used the same method to read the ebcdic data
Dim encoding As System.Text.Encoding = _
System.Text.Encoding.GetEncoding(37)
However I am not getting the expected data
Here is my code, to get the result into a atring, a.
Dim a As String = System.Text.Encoding.UTF8.GetString(System.Text.Encoding.GetEncoding(37).GetBytes("£ˆ‰¢‰¢¤£†ø"))
I want to convert £ˆ‰¢‰¢¤£†ø
That's too late, looks like you already converted it to a string. You'll need to read bytes instead. Then use Encoding.GetString() to convert the ebcdic encoded bytes to a .NET string. Then use Encoding.UTF8.GetBytes() to convert that back to bytes.
Alternatively, and more practically, use the StreamReader(string, Encoding) overload to open the ebcdic file, passing the ebcdic Encoding. And StreamWriter(string) to write it back, it already defaults to utf-8.

unicode escapes in objective-c

I have a string "Artîsté". I use json_encode from PHP on it and I get "Art\u00eest\u00e9".
How do I convert that to an NSString? I have tried many things and none of them work I always end up getting Artîsté
For Example:
NSString stringWithUTF8String:"Art\u00c3\u00aest\u00c3\u00a9"];//Artîsté
#"Art\u00c3\u00aest\u00c3\u00a9"; //Artîsté
You can use CFStringCreateFromExternalRepresentation with the kCFStringEncodingNonLossyASCII encoding to parse the \uXXXX escape sequences. Check out my answer here:
Converting escaped UTF8 characters back to their original form
The problem is your input string:
"Art\u00c3\u00aest\u00c3\u00a9"
does in fact literally mean "Artîsté". \u00c3 is 'Ã', \u00ae is '®', and \u00a9 is '©'.
Whatever is producing your input string is receiving UTF-8 input but expecting something else (e.g., cp1252, ISO-8859-1, or ISO-8859-15)