How to define a string literal containing non-ASCII characters? - vb.net

I'm programming in VB.NET using Visual Studio 2008.
I need to define a string literal containing the character "÷" equivalent to Chr(247).
I understand that internally VS uses UTF-16 encoding, but when the source file is written to disk it contains the single byte value F7 for this character.
This source file is processed by another program that uses UTF-8 encoding by default, so it fails to interpret this character correctly, attempting to combine it with the following single-byte character.
What encoding would correctly interpret the single byte F7 as the single character ÷?
Alternatively, is there a way of expressing a non-ASCII literal that uses only ASCII characters - like using some kind of escape sequence?

well, i always thought that by default VS uses UTF-8 to save files. But ÷ is F7 in encoding ISO 8859-1. If this is not enough for you go here: how to change source file encoding in csharp project (visual studio / msbuild machine)?

Related

LABVIEW ° symbol (potentially encoded in ANSI) to UTF-8 ° symbol

I am reading in data from a .xlsx file, apparently encoded in ANSI(?). Labview can take the data just fine and when creating a text file based on the data, when opened/viewed with encoding ANSI (notepad++ or just notepad) it looks fine. The problem being that Notepad++ is defaulted to UTF-8 so not many people know to change the encoding to "ANSI" and the ° symbol does not translate well.
I use the Report Generation Toolkit Excel Get Data VI to get the data from excel and return it as a 2D string array in LabVIEW.
I am making the assumption that it's encoded in ANSI because when I open the text file (the .xml that I insert the excel data into) in Notepad++
I get 2 characters for what was supposed to be my degree symbol °, and when I change the encoding from UTF-8 to ANSI then the data is as how I read it. Also when I open the .xml file in Notepad, the degree symbol shows normally.

wxWidgets string on windows

I have this code:
wxString tmp(wxT("Información del usuario"));
wxStaticBoxSizer* sbSizer1 = new wxStaticBoxSizer (wxVERTICAL, panel, tmp);
This shows rare symbols instead of ñ in Windows but in Linux it shows correctly the letter..any ideas?
The value of the string in your code depends on the encoding of your source file and also the charset used by your compiler. If your source file itself is in Unicode (whether it's UTF-8 or UTF-16), then you can use L"..." to create a wide string literal. If not, or you're not sure, you can always use wxString::FromUTF8() to explicitly encode the string as UTF-8, e.g. wxString::FromUTF8("Informaci\xc3\xb3n...") will always work.

Saving CSV file with degree symbol and ASCII encoded

I have string variable txt. It contains "°" degree symbol. I would like to save string into CSV file ASCII encoded. I use the procedure below But the "°" symbol is converted to "?". Do you have any idea how to save properly degree symbol?
Public Sub Write_File(ByVal txt As String, ByVal fName As String)
Try
Using OutFile As New StreamWriter(fName, False, Text.Encoding.ASCII)
OutFile.Write(txt)
End Using
Me.Write_Log("Succesfully Exported")
Catch ex As Exception
Me.Write_Log("Write Error during export")
End Try
End Sub
Encoding.ASCII is for the standard 7-bit ASCII encoding, which does not contain a degree symbol at all. In order to get a degree symbol in ASCII, you would have to use one of the many 8-bit ASCII encodings. For English, you'd probably be most interested in using the ISO 8859-1 code page, since that's the most standard-ish one there is of the bunch. For instance, instead of using Encoding.ASCII, you could do something like this:
Using OutFile As New StreamWriter(fName, False, Text.Encoding.GetEncoding("iso-8859-1"))
OutFile.Write(txt)
End Using
For a complete list of available encodings, use the Encoding.GetEncodings method, or look at the list of supported ones in the MSDN documentation.
Of course, none of the various 8-bit ASCII encodings are compatible with each other, so, if you do use that, the degree symbol will be a completely different symbol when viewed on a system that uses a different code page by default. That is precisely why UTF-8 has become the new standard. Usage of 8-bit ASCII is widely discouraged since it is practically unworkable in multi-cultural scenarios. If you can use UTF-8 instead, I would. If you must use ASCII, it's best to stick to the standard 7-bit encoding. If you must use an 8-bit ASCII encoding, please do so sparingly and with full awareness of its drawbacks.
One more thing. You mention the degree symbol as being character 167 (0xA7) in your desired target encoding. If that is the case, you may actually be wanting IBM437 encoding rather than ISO 8859-1. IBM437 is the old code page that was used by default in MS-DOS. If you really need to use that code page, you may have additional trouble for two reasons. As you'll see in the MSDN article, that code page is not well supported in the .NET framework. In my testing, outputting the Unicode string containing the degree symbol using that encoding did not work properly. Therefore, you may find yourself needing to use a byte array to represent the data rather than a String variable (which is Unicode). For instance:
File.WriteAllBytes("Test.txt", {167})
The second problem is that IBM437 is likely not the default code page for your windows OS, so even when it is written to the file as byte value 167, it won't actually look like a degree symbol when you view it in a windows application such as notepad.

how to check the string is UNICODE vb.net

Is there any way to check if the string is UNICODE using VB.net.
Best Regards
inchikka
You need to read the file using the Encoding that the file is written in.
It appears to be a non Unicode file that you are trying to read as Unicode, or possibly a different Unicode encoding than the default UTF-8 (could be UTF-16 for example).
StreamWriter has several constructors that the an Encoding as parameter.
You can do it by validating each character in the string against the 128 characters in the ASCII table. If the character is not found there then it might be a unicode character.
Is that what you mean?

VBA - Read file byte by byte on system with Asian locale

I am trying to convert a file from binary to text, by simply replacing each character with the hexadecimal code. For example, character 'c' will be replaced by '63'.
I have a code which is working fine in normal systems, but it breaks down in the PC where I need to use it as it has default locale set to Chinese.
I am using the following statements to read a byte -
ch$ = " "
Get #f%, , ch$
I suspect there is a problem when I am reading the file byte by byte, as it is skipping certain bytes because they form composite characters. It's probably reading 2 bytes which form an Asian character as one byte. It is thus forming a much smaller file than the expected size.
How can I read the file byte by byte?
Full code is pasted here: http://pastebin.com/kjpSnqzV
Your suspicion is correct. VB file reading automatically converts strings into Unicode from the default code page on the PC. On an Asian code page, some characters are represented as more than one byte.
I advise you to use a Byte variable rather than a string - that will stop VB being over helpful.
Dim ch As Byte
Get #f%, , ch
Another possible problem with the original code is that some byte sequences are illegal on Asian code pages (they don't represent valid characters). So your code could experience errors for some input files, but presumably you want it to work with any file.