I want to convert an EBCDIC string to a UTF-8 string. I use the online tool below to test this; it is very good for conversion-related tasks.
http://kanjidict.stc.cx/recode.php
I want to convert £ˆ‰¢‰¢¤£†ø, which is in EBCDIC, to a UTF-8 string, which should be thisisutf8. You can use the above link to test.
I referred to the article below on how to read EBCDIC data in .NET:
http://www.codeproject.com/KB/vb/EasyEbcdicToAscii.aspx
Then I used the same method to read the EBCDIC data:
Dim encoding As System.Text.Encoding = _
System.Text.Encoding.GetEncoding(37)
However, I am not getting the expected data.
Here is my code to get the result into a string, a:
Dim a As String = System.Text.Encoding.UTF8.GetString(System.Text.Encoding.GetEncoding(37).GetBytes("£ˆ‰¢‰¢¤£†ø"))
I want to convert £ˆ‰¢‰¢¤£†ø
That's too late; it looks like you already converted it to a string. You'll need to read the raw bytes instead. Then use GetString() on the EBCDIC Encoding (the one returned by Encoding.GetEncoding(37)) to convert the EBCDIC-encoded bytes to a .NET string. Then use Encoding.UTF8.GetBytes() to convert that string back to bytes.
Alternatively, and more practically, use the StreamReader(string, Encoding) overload to open the EBCDIC file, passing the EBCDIC Encoding, and StreamWriter(string) to write it back; StreamWriter already defaults to UTF-8.
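The decode-then-encode steps described above can be sketched in Python, used here only for illustration. The sketch assumes Python's built-in cp037 codec corresponds to .NET code page 37, and that the garbled text in the question is what the raw EBCDIC bytes look like when displayed as Windows-1252; both are assumptions, not something the question states.

```python
# Raw EBCDIC (code page 37) bytes for the expected text "thisisutf8".
ebcdic_bytes = bytes.fromhex("A38889A289A2A4A386F8")

# Step 1: decode the bytes with the encoding they were written in.
text = ebcdic_bytes.decode("cp037")

# Step 2: encode the resulting string as UTF-8.
utf8_bytes = text.encode("utf-8")

# If only the already-mangled string is available, and it really is the
# EBCDIC bytes viewed as Windows-1252 (an assumption), the original bytes
# can be recovered by reversing that mistaken decoding first.
mangled = "£ˆ‰¢‰¢¤£†ø"
recovered = mangled.encode("cp1252").decode("cp037")

print(text, utf8_bytes, recovered)
```

The key point is the same as in the answer: decoding must start from bytes, with the source encoding named explicitly.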
Is it possible to decode a Base64 encoded string in DB2 database?
I was able to decode the string in SQL Server by casting to XML.
I am running Db2 on a Linux server.
z/OS:
BASE64ENCODE and BASE64DECODE
Last Updated: 2022-08-23
The BASE64ENCODE and BASE64DECODE helper REST functions complete Base64 encoding or decoding of the provided text.
Tip: The sample HTTP user-defined functions are intended to be used within Db2 SQL applications to access remote non-Db2 REST-based services through SQL statements. Do not confuse them with Db2 native REST services, which support using a REST-based interface to interact with Db2 data from web, mobile, and cloud applications.
The schema is DB2XML.
text
Specifies the text to encode or decode. For BASE64ENCODE, this argument is provided as a VARCHAR(2732) value and the function returns a Base64-encoded string. For BASE64DECODE, this argument is provided as a Base64-encoded VARCHAR(4096) value and the function returns the data as binary.
IBMi:
BASE64DECODE scalar function
Last Updated: 2022-05-03
The BASE64DECODE function returns a character string that has been Base64 decoded. Base64 encoding is widely used to represent binary data as a string.
The schema is SYSTOOLS.
character-string
A character string in CCSID 1208 that is currently Base64 encoded. The length cannot exceed 4096 characters.
The result of the function is a varying length character for bit data string that contains character-string after being Base64 decoded.
Example
Decode a binary string that was originally X'1122334455'. The result is the original value.
VALUES SYSTOOLS.BASE64DECODE('ESIzRFU=');
-- encoding
values regexp_replace (xmlserialize (xmlelement (name "a", blob (X'1122334455')) as varchar (20)), '^<a>(.*)</a>$', '$1')
1
ESIzRFU=
-- decoding (hex function use is just to get a string representation of a binary value)
values hex (xmlcast (xmltext ('ESIzRFU=') as blob (20)))
1
1122334455
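As a client-side cross-check of the values used in the examples above, the same round trip can be reproduced outside the database. This is a minimal Python sketch using only the standard library; it does not call Db2 and only confirms that 'ESIzRFU=' and X'1122334455' correspond to each other.

```python
import base64

# The binary value from the examples, X'1122334455'.
raw = bytes.fromhex("1122334455")

# Encoding: binary -> Base64 text.
encoded = base64.b64encode(raw).decode("ascii")

# Decoding: Base64 text -> binary, shown as hex like the SQL hex() call.
decoded_hex = base64.b64decode("ESIzRFU=").hex()

print(encoded, decoded_hex)
```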
This is a basic question, but I can't find anything on it, since I don't know what to search for; each of my attempts has come up with unrelated results.
If I use Text.Encoding.ASCII.GetBytes to convert a string into ASCII, does each byte represent exactly one character? Does the following code work as exactly intended in all circumstances (for all Strings other than the examples)?
Dim t1() As Byte = Text.Encoding.ASCII.GetBytes("Hello ")
Dim t2() As Byte = Text.Encoding.ASCII.GetBytes("World")
Dim msg As String = Text.Encoding.ASCII.GetString(t1.Concat(t2).ToArray)
Now msg should be "Hello World".
I would like this to work as I don't want to have to convert data I receive back to Strings in order to manipulate it before it is sent again.
What if I used something other than ASCII (like UTF-8, for example)?
If I use Text.Encoding.ASCII.GetBytes to convert a string into ASCII, does each byte represent exactly one character?
Yes. ASCII is a 7-bit encoding; it does not support multi-byte characters. Any Unicode codepoint above U+007F will get converted to a ? character in ASCII.
If you were to use UTF-7 instead, for instance, it can encode individual Unicode codepoints into a sequence of multiple ASCII characters.
Does the following code work as exactly intended in all circumstances (for all Strings other than the examples)?
In your particular example, yes (provided you are using LINQ's Concat() method - there are other ways to concat arrays together). There is no data loss.
But for other examples, just know that you will have data loss if you convert non-ASCII characters to ASCII, or otherwise mismatch encodings between GetBytes() and GetString().
You can certainly manipulate byte arrays. Just make sure the arrays are in the same encoding if you merge them together.
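The points above can be illustrated with a short Python sketch (Python rather than VB.NET, purely for illustration). The helper name splits_cleanly is hypothetical. The sketch shows that ASCII byte arrays concatenate safely, that non-ASCII text is lost when forced into ASCII, and that a multi-byte encoding such as UTF-8 is only safe to split or merge on character boundaries.

```python
# ASCII: one byte per character, so concatenating encoded pieces is safe.
t1 = "Hello ".encode("ascii")
t2 = "World".encode("ascii")
msg = (t1 + t2).decode("ascii")

# Non-ASCII characters are replaced when forced into ASCII (data loss).
lossy = "café".encode("ascii", errors="replace")

# UTF-8: "é" occupies two bytes, so bytes and characters no longer line up.
e_acute = "é".encode("utf-8")

# Whole encoded strings can still be concatenated in UTF-8 ...
joined = ("caf".encode("utf-8") + e_acute).decode("utf-8")


def splits_cleanly(data: bytes, cut: int) -> bool:
    """Return True if both halves of data decode as UTF-8 on their own."""
    try:
        data[:cut].decode("utf-8")
        data[cut:].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ... but cutting inside a multi-byte sequence breaks decoding.
print(msg, lossy, len(e_acute), joined, splits_cleanly(e_acute, 1))
```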
.NET strings are counted sequences of UTF-16 code units (char), one or two of which encode a Unicode codepoint (an int, obtainable with Char.ConvertToUtf32). Some codepoints are "combining characters", which when applied to a preceding "base character" form a grapheme (which is then rendered by a font into a glyph).
An encoder from Unicode to an encoding of another character set should attempt to preserve graphemes. In .NET, a grapheme is called a "text element."
So, yes, you can combine encoded byte sequences as long as you haven't defeated the encoder by converting parts of a grapheme into different byte sequences. If you are breaking a string into two before encoding, see TextElementEnumerator and StringInfo class.
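A rough Python sketch of why splitting inside a grapheme matters. It assumes a target encoding (Latin-1) that only has the precomposed character, and it applies NFC normalization explicitly before encoding; whether a particular .NET encoder does anything comparable is not claimed here. Python's standard library has no direct equivalent of TextElementEnumerator, so unicodedata.combining is used only to detect the combining mark.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: two codepoints, one grapheme.
decomposed = "e\u0301"
is_combining = unicodedata.combining(decomposed[1]) != 0

# Encoded as a whole (after NFC composition) it fits in one Latin-1 byte.
whole = unicodedata.normalize("NFC", decomposed).encode("latin-1")

# Split between base and combining mark, then encode each part separately:
# the lone combining mark has no Latin-1 form and is replaced with "?".
parts = b"".join(
    unicodedata.normalize("NFC", piece).encode("latin-1", errors="replace")
    for piece in (decomposed[0], decomposed[1])
)

print(is_combining, whole, parts)
```

Splitting on text-element boundaries instead keeps each grapheme intact for the encoder.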
The PyMongo documentation says that BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
So I understand that to get rid of the Unicode literal 'u', I will have to call json.dumps() on the document returned by the query.
The documentation also says that regular strings (<type 'str'>) are validated and stored unaltered. I am assuming that the query result also returns them as regular strings and not Unicode strings.
I created a dictionary with regular string types and inserted it into the DB, but when I retrieve it, I get the strings as Unicode. Any idea how I can get regular strings back? The purpose is to avoid calling json.dumps() on the query result. I need to fetch a large number of documents from the DB, and json.dumps() is taking quite some time. The strings that I am storing contain only ASCII data, so I don't need Unicode strings.
The assumption that a regular string is returned as a regular string was not correct. It is stored unaltered, without being re-encoded to UTF-8, because it is already UTF-8. When results are decoded during a query, everything is converted back to Unicode.
Source:
Automatic string to unicode object conversion
How can I get pymongo to always return str and not unicode?
So I have a file that I need to have in either binary or hex format. Everything that I've been able to find basically says to store the text in a string and convert it to binary or hex from there, but I can't do it this way. The file was written using its own private character set that uses null and system hex codes, so Notepad doesn't know what to do with these characters and replaces them with wrong characters and spaces. This distorts the information, so it won't be correct if I try to convert it to binary/hex.
I really just need to have the binary/hex information stored in a string or text box so I can work with it. I don't really need it to be saved as a file.
Never mind, I finally figured it out. I used a file stream to read the data byte by byte. At first I didn't understand how to convert it, because the first byte in the array was showing as 80 when I knew the binary data should have been "1010000" (I didn't realize at the time that 80 was the decimal representation).
Anyway, I used BitConverter.ToString, which put everything together and converted it to hexadecimal format. So I'm all good now.
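The same idea, sketched in Python for illustration: read the file as raw bytes (never as text) and format them as hex. The sample bytes and the temporary file are made up for the demo, and the dash-separated output only imitates the style of BitConverter.ToString.

```python
import os
import tempfile

# Made-up sample data standing in for the private-format file.
sample = bytes([80, 0, 1, 255])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Open in binary mode so no character decoding is attempted.
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

# Dash-separated uppercase hex, one pair per byte.
hex_dashed = "-".join(f"{b:02X}" for b in data)

# The first byte is 80 in decimal, 50 in hex, and 1010000 in binary.
first_byte_binary = format(data[0], "b")

print(hex_dashed, first_byte_binary)
```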
Is there any way to check whether a string is Unicode using VB.NET?
Best Regards
inchikka
You need to read the file using the Encoding that the file is written in.
It appears to be a non Unicode file that you are trying to read as Unicode, or possibly a different Unicode encoding than the default UTF-8 (could be UTF-16 for example).
StreamWriter has several constructors that take an Encoding as a parameter.
You can do it by validating each character in the string against the 128 characters in the ASCII table. If the character is not found there then it might be a unicode character.
Is that what you mean?
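The per-character check suggested above can be sketched in Python for illustration. The function name is_ascii is hypothetical; it only reports whether every character falls inside the 128-character ASCII range, which is a heuristic and not a test of how a file was actually encoded.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character is in the 128-character ASCII table."""
    return all(ord(ch) < 128 for ch in text)


print(is_ascii("Hello World"), is_ascii("caf\u00e9"))
```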