VB.NET, I can't convert Unicode escape sequences to text


I watched many videos on YouTube, read many solutions on Google and Stack Overflow! Can anyone tell me how I can convert Unicode escape sequences to text?
I tried this:
Dim f = System.Net.WebUtility.HtmlDecode("sa3444444d4ds\u0040outllok.com")
MsgBox(f)
and this:
Dim f = System.Uri.UnescapeDataString("sa3444444d4ds\u0040outllok.com")
MsgBox(f)
and this:
Dim myBytes As Byte() = System.Text.Encoding.Unicode.GetBytes("sa3444444d4ds\u0040outllok.com")
Dim myChars As Char() = System.Text.Encoding.Unicode.GetChars(myBytes)
Dim myString As String = New String(myChars)
MsgBox(myString)
and this:
Dim f = UnicodeToAscii("sa3444444d4ds\u0040outllok.com")
MsgBox(f)
Public Function UnicodeToAscii(ByVal unicodeString As String) As String
Dim ascii As Encoding = Encoding.ASCII
Dim unicode As Encoding = Encoding.Unicode
' Convert the string into a byte array.
Dim unicodeBytes As Byte() = unicode.GetBytes(unicodeString)
' Perform the conversion from one encoding to the other.
Dim asciiBytes As Byte() = Encoding.Convert(unicode, ascii, unicodeBytes)
' Convert the new byte array into a char array and then into a string.
Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length) - 1) As Char
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
Dim asciiString As New String(asciiChars)
Return asciiString
End Function

You can use Regex.Unescape.
For example,
' Requires: Imports System.Text.RegularExpressions
Dim s = "sa3444444d4ds\u0040outllok.com"
Console.WriteLine(Regex.Unescape(s))
outputs:
sa3444444d4ds@outllok.com
Credit to Tim Patrick for showing this in the Visual Studio Magazine article Overcoming Escape Sequence Envy in Visual Basic and C#.
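Regex.Unescape converts every regex escape it finds (for example \n and \t as well as \uXXXX) and throws on a backslash sequence it does not recognise. If only the \uXXXX sequences should be touched, something like the following may be closer to what is needed. It is a minimal sketch, not a tested library routine; the module name UnicodeEscapeDemo and the helper DecodeUnicodeEscapes are made up for illustration, and the lambda syntax assumes Visual Basic 2010 or later.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UnicodeEscapeDemo
    ' Hypothetical helper: replaces only \uXXXX sequences and leaves
    ' every other backslash in the input untouched.
    Function DecodeUnicodeEscapes(ByVal input As String) As String
        ' In a VB string literal the backslash is an ordinary character,
        ' so "\\u" here is the regex for a literal backslash followed by u.
        Return Regex.Replace(input, "\\u([0-9A-Fa-f]{4})",
            Function(m As Match) Convert.ToChar(Convert.ToInt32(m.Groups(1).Value, 16)).ToString())
    End Function

    Sub Main()
        ' Prints sa3444444d4ds@outllok.com (U+0040 is the @ sign).
        Console.WriteLine(DecodeUnicodeEscapes("sa3444444d4ds\u0040outllok.com"))
    End Sub
End Module
```

Characters outside the Basic Multilingual Plane arrive as two \uXXXX surrogate escapes in a row; each is converted separately here and the pair still forms the correct character in the resulting string.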

Related

Dealing with null character in VB.NET

I am trying to open a binary file in VB.NET (Visual Studio 2010), that looks like this:
The file opens ok with this method:
Dim OpenFile1 As New OpenFileDialog
If (OpenFile1.ShowDialog = System.Windows.Forms.DialogResult.OK And (OpenFile1.FileName.Length > 0)) Then
'do something
End If
However, if "do something" is:
Dim readText As String = File.ReadAllText(OpenFile1.FileName)
MsgBox(readText)
Only the first byte is shown: the second byte is 00 (null), which marks the end of the string and truncates the rest of the file, so only the first byte F0 (≡ in code page 437) is displayed.
But if I do:
'convert file to hex string
Dim bytes As Byte() = IO.File.ReadAllBytes(OpenFile1.FileName)
Dim hex As String() = Array.ConvertAll(bytes, Function(b) b.ToString("X2"))
Dim newfile As String
newfile = (String.Join("", hex))
RichTextBox1.Text = newfile
Now the string is properly converted to hex values. So far so good.
However, when I try to convert the string back to ASCII using this method:
'convert hex string to text and put it into the richtextbox
Dim asciistring As String = ""
For x As Integer = 0 To (newfile.Length - 1) Step 2
Dim k As String = newfile.Substring(x, 2)
asciistring &= System.Convert.ToChar(System.Convert.ToUInt32(k, 16)).ToString()
Next
RichTextBox1.Text = asciistring
Again, only the first byte is converted. The rest is truncated as soon as it finds a 00 (null).
Is there a way to circumvent this situation?

I haven't tested this code yet, but you can give this method a try:
Public Shared Function ConvertHex(ByVal hexString As String) As String
Try
Dim ascii As String = String.Empty
For i As Integer = 0 To hexString.Length - 1 Step 2
Dim hs As String = String.Empty
hs = hexString.Substring(i, 2)
Dim decval As UInteger = System.Convert.ToUInt32(hs, 16)
Dim character As Char = System.Convert.ToChar(decval)
ascii += character
Next
Return ascii
Catch ex As Exception
Console.WriteLine(ex.Message)
End Try
Return String.Empty
End Function
When calling the function, just pass your hex string.
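It may also be worth checking whether the string itself is cut short or only its display. A .NET String can hold embedded null characters, while controls built on null-terminated Windows strings (MsgBox, and likely RichTextBox) tend to stop drawing at the first one. The following is a minimal sketch under that assumption; the module name NullCharDemo and the helper MakeDisplayable are hypothetical.

```vbnet
Imports System

Module NullCharDemo
    ' Hypothetical helper: swaps embedded null characters for a visible
    ' placeholder so that UI controls do not stop at the first null.
    Function MakeDisplayable(ByVal text As String) As String
        Return text.Replace(Convert.ToChar(0), "."c)
    End Function

    Sub Main()
        ' Build a string with a null character in the middle.
        Dim raw As String = "A" & Convert.ToChar(0) & "BC"

        ' The string keeps all four characters, including the null.
        Console.WriteLine(raw.Length)

        ' Prints A.BC
        Console.WriteLine(MakeDisplayable(raw))
    End Sub
End Module
```

If Length reports the full size, the data is intact and only the text shown in the control needs the placeholder treatment.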

Convert Arabic string to an array of bytes

I have a function which converts string to an array of bytes. If the string is written in English, the function works fine. But if the input string is Arabic, the function doesn't return, and I get this error:
Value was either too large or too small for an unsigned byte
Friend Function StringtoByteArray(ByRef value As String) As Byte()
Dim temp() As Byte
ReDim temp(Len(value) - 1)
Dim i As Integer
For i = 0 To Len(value) - 1 Step 1
temp(i) = Convert.ToByte(Convert.ToChar(Mid(value, i + 1, 1)))
Next
StringtoByteArray = temp
End Function
What should I change to convert Arabic characters to byte?
I am using VB.NET.

You don't need to write your own function for that; this should work:
Dim b As Byte() = System.Text.Encoding.Unicode.GetBytes(value)
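The original function fails because each Arabic character has a code point above 255, which does not fit in a single Byte. An encoding decides how many bytes each character becomes. The sketch below compares Encoding.Unicode (UTF-16) with Encoding.UTF8 and shows the round trip back to a string; the module name and the sample word are illustrative only, and which encoding is appropriate depends on what will consume the bytes.

```vbnet
Imports System
Imports System.Text

Module ArabicBytesDemo
    Sub Main()
        ' Four Arabic letters (U+0633 U+0644 U+0627 U+0645), built from
        ' code points so the source file encoding does not matter.
        Dim value As String = New String(New Char() {
            Convert.ToChar(&H633), Convert.ToChar(&H644),
            Convert.ToChar(&H627), Convert.ToChar(&H645)})

        ' UTF-16: two bytes per character here.
        Dim utf16Bytes As Byte() = Encoding.Unicode.GetBytes(value)
        ' UTF-8: Arabic letters also take two bytes each.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(value)

        Console.WriteLine(utf16Bytes.Length)
        Console.WriteLine(utf8Bytes.Length)

        ' Decoding with the same encoding restores the original text.
        Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) = value)
    End Sub
End Module
```

Whoever reads the bytes has to decode them with the same encoding that produced them.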

Converting UTF-8 to windows-1255 encoding in VB.NET

I am trying to convert a string encoded in UTF-8 to windows-1255 in VB.NET with no luck. Admittedly, I don't know VB but have tried using an example at MSDN and modifying it to my needs:
Public Function Utf82Hebrew(ByVal Str As String) As String
Dim ascii As Encoding = Encoding.GetEncoding("windows-1255")
Dim unicode As Encoding = Encoding.Unicode
' Convert the string into a byte array.
Dim unicodeBytes As Byte() = unicode.GetBytes(Str)
' Perform the conversion from one encoding to the other.
Dim asciiBytes As Byte() = Encoding.Convert(unicode, ascii, unicodeBytes)
' Convert the new byte array into a char array and then into a string.
Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)-1) As Char
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
Dim asciiString As New String(asciiChars)
Utf82Hebrew = asciiString
End Function
This function doesn't actually do anything—the string remains in UTF-8. However, if I change this line:
Dim ascii As Encoding = Encoding.GetEncoding("windows-1255")
To this:
Dim ascii As Encoding = Encoding.ASCII
Then the function returns question marks in the place of the string.
Does anyone know how to properly convert a UTF-8 string to a specific encoding (in this case, windows-1255), and/or what I'm doing wrong in the above code?
Thanks in advance.

I modified your code.
Converting text from one encoding to another is straightforward.
This is how you can do it in VB.NET.
Note that windows-1255 is the Hebrew code page (windows-1252 is the Western European one), so 1255 is the right target here.
Public Function Utf82Hebrew(ByVal Str As String) As String
Dim ascii As System.Text.Encoding = System.Text.Encoding.GetEncoding("windows-1255")
Dim unicode As System.Text.Encoding = System.Text.Encoding.Unicode
' Convert the string into a byte array.
Dim unicodeBytes As Byte() = unicode.GetBytes(Str)
' Perform the conversion from one encoding to the other.
Dim asciiBytes As Byte() = System.Text.Encoding.Convert(unicode, ascii, unicodeBytes)
' Convert the new byte array into a char array and then into a string.
Dim asciiString As String = ascii.GetString(asciiBytes)
Utf82Hebrew = asciiString
End Function
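A .NET String is always UTF-16 in memory, so a String-to-String function like the one above can only round-trip the text; the target encoding starts to matter once the text is turned into bytes (for a file, a socket, or a database field). The sketch below converts UTF-8 bytes to windows-1255 bytes. It assumes .NET Framework, where code page 1255 is available out of the box (on .NET Core and later the code pages encoding provider would have to be registered first); the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module Utf8ToHebrewDemo
    ' Hypothetical helper: re-encodes UTF-8 bytes as windows-1255 bytes.
    Function Utf8BytesToHebrewBytes(ByVal utf8Bytes As Byte()) As Byte()
        Dim hebrew As Encoding = Encoding.GetEncoding(1255)
        Return Encoding.Convert(Encoding.UTF8, hebrew, utf8Bytes)
    End Function

    Sub Main()
        ' Four Hebrew letters (U+05E9 U+05DC U+05D5 U+05DD), built from
        ' code points so the source file encoding does not matter.
        Dim text As String = New String(New Char() {
            Convert.ToChar(&H5E9), Convert.ToChar(&H5DC),
            Convert.ToChar(&H5D5), Convert.ToChar(&H5DD)})

        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Dim hebrewBytes As Byte() = Utf8BytesToHebrewBytes(utf8Bytes)

        ' UTF-8 needs two bytes per Hebrew letter, windows-1255 one.
        Console.WriteLine(utf8Bytes.Length)
        Console.WriteLine(hebrewBytes.Length)
    End Sub
End Module
```

Characters that windows-1255 cannot represent come out as question marks, which is also why switching the target to Encoding.ASCII turns the whole Hebrew string into question marks.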

Mixed Encoding to String

I have a string in VB.net that may contain something like the following:
This is a 0x000020AC symbol
This is the UTF-32 encoding for the Euro Symbol according to this article http://www.fileformat.info/info/unicode/char/20ac/index.htm
I'd like to convert this into
This is a € symbol
I've tried using the UnicodeEncoding() class in VB.net (Framework 2.0, as I'm modifying a legacy application).
When I use this class to encode and then decode, I still get back the original string.
I expected that UnicodeEncoding would recognise the already encoded part and not encode it again, but that appears not to be the case.
I'm a little lost now as to how I can convert a mixed encoded string into a normal string.
Background: When saving an Excel spreadsheet as CSV, anything outside of the ASCII range gets converted to ?. So my idea is to get my client to search/replace a few characters, such as the Euro symbol, with an encoded string such as 0x000020AC. I was then hoping to convert those encoded parts back into the real symbols before I insert them into a SQL database.
I've tried a function such as
Public Function Decode(ByVal s As String) As String
Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(s)
Dim output As String = ""
output = uni.GetString(encodedBytes)
Return output
End Function
Which was based on the examples on the MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
It could be that I have a complete misunderstanding of how this works in VB.net. In C# I can simply use escaped characters such as "\u20AC", but no such thing exists in VB.net.

Based on advice from Heinzi, I implemented a Regex.Replace method using the following code. This appears to work for my examples.
Public Function Decode(ByVal s As String) As String
Dim output As String = ""
Dim sRegex As String = "0x[0-9a-fA-F]{8}"
Dim r As Regex = New Regex(sRegex)
Dim myEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf HexToString)
output = r.Replace(s, myEvaluator)
Return output
End Function
Public Function HexToString(ByVal hexString As Match) As String
Dim uni As New UnicodeEncoding(True, True)
Dim input As String = hexString.ToString
input = input.Substring(2)
input = input.TrimStart("0"c)
Dim output As String
Dim length As Integer = input.Length
Dim upperBound As Integer = length \ 2
If length Mod 2 = 0 Then
upperBound -= 1
Else
input = "0" & input
End If
Dim bytes(upperBound) As Byte
For i As Integer = 0 To upperBound
bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
Next
output = uni.GetString(bytes)
Return output
End Function
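Since 0x000020AC is simply the code point written as eight hex digits, another option may be to parse the number and let Char.ConvertFromUtf32 build the character, which avoids the manual byte array. This is a sketch rather than a drop-in replacement; the module name HexCodePointDemo and both function names are hypothetical. It sticks to AddressOf instead of a lambda so that it should also compile against Framework 2.0.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HexCodePointDemo
    ' Hypothetical helper: replaces every 0xXXXXXXXX token (eight hex
    ' digits) with the character that has that code point.
    Function DecodeHexCodePoints(ByVal s As String) As String
        Return Regex.Replace(s, "0x([0-9A-Fa-f]{8})", AddressOf CodePointToString)
    End Function

    Private Function CodePointToString(ByVal m As Match) As String
        Dim codePoint As Integer = Convert.ToInt32(m.Groups(1).Value, 16)
        ' Throws ArgumentOutOfRangeException for values that are not
        ' valid Unicode code points (surrogates or above 0x10FFFF).
        Return Char.ConvertFromUtf32(codePoint)
    End Function

    Sub Main()
        Dim decoded As String = DecodeHexCodePoints("This is a 0x000020AC symbol")
        ' Compare against U+20AC rather than printing it, because the
        ' console encoding may not be able to show the Euro sign.
        Console.WriteLine(decoded = "This is a " & Convert.ToChar(&H20AC) & " symbol")
    End Sub
End Module
```

Code points above U+FFFF come back from Char.ConvertFromUtf32 as a surrogate pair, so the result is a String rather than a single Char.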

Have you tried:
Public Function Decode(ByVal Coded As String) As String
Return StrConv(Coded, vbUnicode)
End Function
Also, your function is invalid. It takes s as an argument, does a load of stuff and then outputs the s that was put into it instead of the stuff that was processed within it.

Converting non-Unicode to Unicode

I'm trying to convert a non-Unicode string like this, '¹ûº¤¡¾¢º¤ìñ©2', to Unicode like this, 'ໃຊ້ໃນຄົວເຮືອນ', which is in Lao. I tried the code below, and its return value looks like this: '??????'. Any idea how I can convert the string?
Public Shared Function ConvertAsciiToUnicode(asciiString As String) As String
' Create two different encodings.
Dim encAscii As Encoding = Encoding.ASCII
Dim encUnicode As Encoding = Encoding.Unicode
' Convert the string into a byte[].
Dim asciiBytes As Byte() = encAscii.GetBytes(asciiString)
' Perform the conversion from one encoding to the other.
Dim unicodeBytes As Byte() = Encoding.Convert(encAscii, encUnicode, asciiBytes)
' Convert the new byte[] into a char[] and then into a string.
' This is a slightly different approach to converting to illustrate
' the use of GetCharCount/GetChars.
Dim unicodeChars As Char() = New Char(encUnicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length) - 1) {}
encUnicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, unicodeChars, 0)
Dim unicodeString As New String(unicodeChars)
' Return the new unicode string
Return unicodeString
End Function

Your 8-bit encoded Lao text is not in ASCII, but in some codepage like IBM CP1133 or Microsoft LC0454, or most likely, the Thai codepage 874. You have to find out which one it is.
It matters how you have obtained (read, received, computed) the input string. By the time you make it a string it is already in Unicode and is easy to output in UTF-8, for example, like this:
Dim writer As New StreamWriter("myfile.txt", True, System.Text.Encoding.UTF8)
writer.Write(mystring)
writer.Close()
Here is the whole in-memory conversion:
Dim utf8_input As Byte()
...
Dim converted As Byte() = Encoding.Convert(Encoding.GetEncoding(874), Encoding.UTF8, utf8_input)
The number 874 identifies the codepage your input is in. Whether a particular operating system installation supports this codepage is another question, but your own system will almost certainly support it if you just used it to compose your Stack Overflow question.
