Converting Greek characters to Unicode - vb.net

Is there any easy way of converting a windows-1252 string into a Unicode one?

All strings in .NET are Unicode in memory.
If you have a byte array that was generated from a string encoded in 1252, you can recover the string using
Dim s As String = System.Text.Encoding.GetEncoding(1252).GetString(array)
It is now a Unicode string in memory. If you then want to encode that string into a UTF-8 byte array for transmission or storage, you would do the converse:
Dim a As Byte() = System.Text.Encoding.UTF8.GetBytes(s)
(I think that is the right VB syntax!)
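For anyone who wants to try the two steps end to end, here is a minimal sketch. The helper name TranscodeToUtf8 is made up for illustration, and the RegisterProvider line assumes a modern .NET (Core / 5+) target, where code page 1252 is not available until the code pages provider is registered; on .NET Framework that line should not be needed.

```vbnet
Imports System
Imports System.Text

Module TranscodeSketch
    ' Hypothetical helper: decode bytes from the given code page into a
    ' (Unicode) String, then re-encode that String as UTF-8 bytes.
    Function TranscodeToUtf8(source As Byte(), codePage As Integer) As Byte()
        Dim decoded As String = Encoding.GetEncoding(codePage).GetString(source)
        Return Encoding.UTF8.GetBytes(decoded)
    End Function

    Sub Main()
        ' Assumption: running on .NET Core / .NET 5+, where legacy code pages
        ' such as 1252 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        Dim source1252 As Byte() = {&HC7}   ' the single byte for "Ç" in windows-1252
        Dim utf8Bytes As Byte() = TranscodeToUtf8(source1252, 1252)
        Console.WriteLine(BitConverter.ToString(utf8Bytes))   ' prints C3-87
    End Sub
End Module
```

The one-byte input becomes two bytes in UTF-8, which is the expected behaviour for any character above U+007F.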

Related

Convert Unicode byte array to a binary byte array

I have a byte() array full of Unicode data. I need to convert this byte array to binary data.
The original data was binary data, but data was saved as Unicode, and thus these data structures are now all 2 times as large as they need to be.
Can I convert from a byte array of one type to another byte array, or is looping required to skip every other byte?
Edit:
Comments asked for more info
The original byte array is Unicode; UTF-32 looks to be the format.
The output byte array needs to remove that extra encoding.
So, assuming this, re-encoding as BigEndianUnicode bytes to toss out the extra data works quite well.
This seems to work:
b2 = System.Text.Encoding.BigEndianUnicode.GetBytes(System.Text.Encoding.UTF32.GetString(b))
Of course, it is not clear why the resulting array is not EXACTLY 1/2 the size, but the above does seem to work.
Edit2:
Ok, as noted, the question was not ONLY how to convert, but how to go from byte array to byte array. Furthermore, the array was indeed Unicode, but the original binary byte array was based on the user's local code page (English).
So the CORRECT conversion I required was this:
b2 = System.Text.Encoding.Default.GetBytes(System.Text.Encoding.Unicode.GetString(b))
However, the above converts from a byte array to a string, and then back to a byte array. My question was STILL how to do this from byte array to byte array. Turns out you can do this, and this is how:
Dim b() As Byte
b = reader(0) ' the array is filled with Unicode (air code)
Dim b2() As Byte
' convert byte array to byte array directly - no need to go through a String
Dim cFrom As System.Text.Encoding = System.Text.Encoding.Unicode
Dim cto As System.Text.Encoding = System.Text.Encoding.Default
b2 = System.Text.Encoding.Convert(cFrom, cto, b)
As noted, above is byte() array to byte() array as per my original question.
Note that "Default" in the above is of course the default code page (in my case a computer running an English version of Windows).
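One caveat worth knowing: Encoding.Default returns the machine's ANSI code page on .NET Framework, but on .NET Core and later it returns UTF-8, so the snippet above behaves differently there. A minimal sketch that names the target encoding explicitly follows. The helper name Utf16ToSingleByte is hypothetical, and ISO-8859-1 is only a stand-in assumption for whatever single-byte code page the data really came from.

```vbnet
Imports System
Imports System.Text

Module ConvertSketch
    ' Hypothetical helper: convert UTF-16 (little-endian) bytes straight to
    ' bytes in the target encoding using Encoding.Convert.
    Function Utf16ToSingleByte(utf16Bytes As Byte(), target As Encoding) As Byte()
        Return Encoding.Convert(Encoding.Unicode, target, utf16Bytes)
    End Function

    Sub Main()
        ' Assumption: ISO-8859-1 stands in for the original code page.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

        Dim wide As Byte() = Encoding.Unicode.GetBytes("AB")     ' 41-00-42-00
        Dim narrow As Byte() = Utf16ToSingleByte(wide, latin1)   ' 41-42

        ' prints 41-00-42-00 -> 41-42
        Console.WriteLine(BitConverter.ToString(wide) & " -> " & BitConverter.ToString(narrow))
    End Sub
End Module
```

For plain English text the result is exactly half the size, because every UTF-16 code unit here maps to one byte.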

Conversion of multiple ASCII codes to char

I have a function that converts a string to ASCII codes (e.g. string = "system", the value of the string in ASCII = "115 121 115 116 101 109") and what I need is a way to convert the ASCII codes back into chars. Do I need to use a loop to filter the converted ASCII? I need your suggestion on the best way to convert it.
The best way is one that checks your assumptions. You say the string contains ASCII bytes. So if someone slips in a byte that cannot be ASCII, you should be told. It does appear that someone slipped in an unexpected space, but that can be ignored.
Imports System.Linq
Imports System.Text
'…
Dim asciiEncoding = Encoding.GetEncoding("US-ASCII",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback)
Dim ascii = "115 121 115 116 101 109"
' Split on spaces, drop empty entries, and parse each piece as a byte
Dim asciiBytes = ascii.Split( { " "c }, StringSplitOptions.RemoveEmptyEntries) _
    .Select(Function (token) Byte.Parse(token)) _
    .ToArray()
' Throws if any byte is outside the ASCII range
Dim s = asciiEncoding.GetString(asciiBytes)
Other ways might not catch invalid data.
Some ways automagically convert from the ASCII character set to the Unicode character set, which is valid when the data is, in fact, ASCII but at least deserves a comment about the conversion and that the data is trusted to be ASCII.
Speaking of whether the data is ASCII or not, there is no text but encoded text. When you read text, you have to use the encoding it was written with. The only way to know is for the writer to make it known.
My suggestion is:
First split the string using String.Split(' ') into an array (called splitWords here).
Then convert each piece to a char like this (C# syntax):
foreach (string word in splitWords)
{
    char c = Convert.ToChar(int.Parse(word));
    // use c here, e.g. append it to a StringBuilder
}
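Since the question is about VB.NET, here is a rough VB rendering of the same split-and-convert idea. The function name CodesToString is made up for illustration, and the sketch assumes the codes are separated by spaces and are all valid integers; Integer.Parse will throw otherwise.

```vbnet
Imports System
Imports System.Text

Module CodesSketch
    ' Hypothetical helper: turns "115 121 115 ..." into the matching characters.
    Function CodesToString(codes As String) As String
        Dim result As New StringBuilder()
        ' RemoveEmptyEntries tolerates doubled spaces between codes.
        For Each token As String In codes.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
            result.Append(Convert.ToChar(Integer.Parse(token)))
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(CodesToString("115 121 115 116 101 109"))   ' prints system
    End Sub
End Module
```

Note that this version does not verify that each code is actually in the ASCII range; any value up to 65535 is accepted as a UTF-16 code unit.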
This is built-in. You can treat a String as an array of Char.
Dim s As String = "system"
Console.WriteLine(s(4) & " = ASCII " & Asc(s(4))) 'output: e = ASCII 101
Console.ReadKey()
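You can use the Asc() function to get the character code, but be aware that Strings and Chars are actually Unicode, not ASCII: Asc() returns the code in the current ANSI code page, while AscW() returns the Unicode code point. The two agree for the ASCII range (0 to 127). Depending on encoding, you might find that one character is more than one byte.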
FOR THE NITPICKERS: When you index a String like this, the compiler does not create a Char array; it calls the String's read-only default property Chars. So you can read s(4), but you cannot assign to it as you could with a true Char array.

Error reading UInt16 from BinaryReader

Why does this work
Dim mem As New MemoryStream()
Dim bin As New BinaryWriter(mem)
bin.Write(CUShort(1000))
Dim read As New BinaryReader(New MemoryStream(mem.ToArray))
MsgBox(read.ReadInt16)
The message box gives me 1000, which is right. Then I try to use this
Dim mem As New MemoryStream()
Dim bin As New BinaryWriter(mem)
bin.Write(CUShort(1000))
Dim s As String = ASCII.GetString(mem.ToArray)
Dim read As New BinaryReader(New MemoryStream(ASCII.GetBytes(s)))
MsgBox(read.ReadInt16)
It gives me 831 which is incorrect. Now I try it with Unicode encoding. It works. But I want to use ASCII. Why is this, and what am I doing wrong?
What you are seeing happens because different encodings cover different sets of characters, so not every byte sequence survives a trip through a string.
A (U)Short is represented in memory by two bytes; 1000 is &H03E8, which BinaryWriter writes as the bytes E8 03. When you call ASCII.GetString() the byte array is interpreted as ASCII text and converted into a UTF-16 string, because UTF-16 is the encoding the .NET runtime uses for all strings in memory. ASCII only defines the values 0 to 127, so the byte E8 cannot be decoded and is replaced by "?" (3F). ASCII.GetBytes() then gives you back 3F 03, and ReadInt16 reads that as &H033F = 831.
Encoding.Unicode, however, is UTF-16, the same encoding the string uses in memory. Your two bytes form one valid UTF-16 code unit and are written back unchanged, thus you get the very same bytes and the same UShort. That happens to work for 1000, but arbitrary binary data can still contain sequences that are not valid UTF-16, so it is not a safe general-purpose round trip either.
This fiddle illustrates what I'm talking about: https://dotnetfiddle.net/p4EKn9
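If the fiddle is unavailable, the effect can be reproduced with a few lines. This is only a sketch: the helper names are hypothetical, the two input bytes are written out by hand in the little-endian order BinaryWriter uses, and the Base64 part is offered as one possible way to carry binary data in an ASCII-safe string, not as the only fix.

```vbnet
Imports System
Imports System.Text

Module AsciiRoundTripSketch
    ' Hypothetical helper: push raw bytes through an ASCII string and back.
    Function RoundTripThroughAscii(data As Byte()) As Byte()
        Return Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data))
    End Function

    ' Hypothetical helper: read a 16-bit little-endian value from two bytes.
    Function ReadUInt16LE(data As Byte()) As Integer
        Return CInt(data(0)) Or (CInt(data(1)) << 8)
    End Function

    Sub Main()
        Dim original As Byte() = {&HE8, &H3}   ' 1000 as BinaryWriter writes it
        Dim mangled As Byte() = RoundTripThroughAscii(original)

        Console.WriteLine(BitConverter.ToString(original))   ' prints E8-03
        Console.WriteLine(BitConverter.ToString(mangled))    ' prints 3F-03
        Console.WriteLine(ReadUInt16LE(mangled))             ' prints 831

        ' Base64 text contains only ASCII characters and is lossless.
        Dim asText As String = Convert.ToBase64String(original)
        Console.WriteLine(ReadUInt16LE(Convert.FromBase64String(asText)))   ' prints 1000
    End Sub
End Module
```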

Convert hex to ASCII, like in Excel

I am looking for a function in VB.NET which will convert a hex value to the corresponding ASCII, like in Excel.
For example, in Excel,
=CHAR(HEX2DEC("c7")) will return 'Ç'.
Is there any library function, which does the same, in .NET?
Dim hexValue = "FF"
Dim ascii = System.Convert.ToChar(System.Convert.ToUInt32(hexValue, 16))
You can convert a byte array to an ASCII string using System.Text.Encoding.ASCII.GetString. The byte array can be defined using hex literals.
You can use the ChrW method. You have to import the Microsoft.VisualBasic namespace:
ChrW(Convert.ToInt32("C7", 16))
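To compare the suggestions above, here is a small sketch. The function name HexToChar is hypothetical. It treats the hex value as a Unicode code point, which matches Excel's CHAR for C7; as far as I know Excel's CHAR on Windows uses the ANSI code page, so values in the 80 to 9F range may not line up. The sketch also shows why the ASCII route does not work for values above 7F.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module HexSketch
    ' Hypothetical helper: interpret a hex string as a Unicode code point.
    Function HexToChar(hexDigits As String) As Char
        Return ChrW(Convert.ToInt32(hexDigits, 16))
    End Function

    Sub Main()
        Dim c As Char = HexToChar("C7")
        Console.WriteLine(AscW(c))   ' prints 199, the code point of "Ç"

        ' ASCII only covers 00 to 7F, so the byte C7 decodes to "?".
        Dim viaAscii As String = Encoding.ASCII.GetString(New Byte() {&HC7})
        Console.WriteLine(viaAscii)  ' prints ?
    End Sub
End Module
```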

String Length Issues

I have a byte array that I convert into a string like so:
Dim byt As Byte() = New Byte(255) {}
s = New String(Encoding.ASCII.GetChars(byt))
My question is: when I look at the string in a debugger it's clearly a normal string, but when I compare it to what I know it's supposed to be, it isn't equal. So I did a quick check, and for some reason it returns a string that is 256 characters long. So I did an s.Trim and it is still 256 characters long. Any idea what's going on with this?
You created a string with 256 characters that are 0. The debugger cannot display them, and Trim() with no arguments only removes white space, which the NUL character is not. Use this to trim the string:
s = s.Trim(ChrW(0))
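A minimal sketch of the whole situation, assuming the buffer holds a short ASCII text followed by unused zero bytes. The helper name DecodeBuffer and the sample text "abc" are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module TrimSketch
    ' Hypothetical helper: decode a zero-padded buffer and strip the padding.
    Function DecodeBuffer(buffer As Byte()) As String
        Return Encoding.ASCII.GetString(buffer).TrimEnd(ChrW(0))
    End Function

    Sub Main()
        Dim buffer As Byte() = New Byte(255) {}           ' 256 zero bytes
        Encoding.ASCII.GetBytes("abc", 0, 3, buffer, 0)   ' only the first 3 bytes are used

        Dim raw As String = Encoding.ASCII.GetString(buffer)
        Console.WriteLine(raw.Length)                     ' prints 256
        Console.WriteLine(raw.Trim().Length)              ' prints 256, NUL is not white space
        Console.WriteLine(DecodeBuffer(buffer).Length)    ' prints 3
    End Sub
End Module
```

If the code that fills the buffer reports how many bytes it wrote, passing that count to GetString(buffer, 0, count) avoids the padding in the first place.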