Why does this work?
Dim mem As New MemoryStream()
Dim bin As New BinaryWriter(mem)
bin.Write(CUShort(1000))
Dim read As New BinaryReader(New MemoryStream(mem.ToArray))
MsgBox(read.ReadInt16)
The message box gives me 1000, which is right. Then I tried this:
Dim mem As New MemoryStream()
Dim bin As New BinaryWriter(mem)
bin.Write(CUShort(1000))
Dim s As String = ASCII.GetString(mem.ToArray)
Dim read As New BinaryReader(New MemoryStream(ASCII.GetBytes(s)))
MsgBox(read.ReadInt16)
It gives me 831, which is incorrect. When I try it with Unicode encoding, it works. But I want to use ASCII. Why is this, and what am I doing wrong?
What you are seeing is caused by the encodings themselves: different encodings cover different sets of characters, and ASCII only covers the byte values 0 to 127.
A (U)Short is represented in memory by two bytes; 1000 is &H03E8, which BinaryWriter stores as the bytes E8 03. When you call ASCII.GetString(), the byte array is interpreted as ASCII text and converted into a UTF-16 string, because UTF-16 is the encoding the .NET runtime uses for all strings in memory. The byte E8 is not a valid ASCII character, so it is replaced with "?" (3F). Encoding that string back to bytes gives 3F 03, which is &H033F, or 831.
Encoding.Unicode, however, is UTF-16, so each pair of bytes maps directly to one UTF-16 code unit and nothing has to be replaced. Here you get the very same bytes back, and therefore the same UShort. This is not guaranteed for arbitrary binary data, though: byte pairs that form an unpaired surrogate are replaced on decoding as well, so a text encoding is not a safe container for binary data in general.
This fiddle illustrates what I'm talking about: https://dotnetfiddle.net/p4EKn9
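The replacement can also be reproduced locally. The following is a minimal sketch and is not taken from the fiddle; the module name is made up for illustration, and the Base64 round trip at the end is only one possible way to carry binary data as text, shown here as an assumption about what the asker ultimately needs.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module AsciiRoundTripDemo
    Sub Main()
        ' Write 1000 (&H03E8) as in the question; BinaryWriter stores it little-endian.
        Dim mem As New MemoryStream()
        Dim bin As New BinaryWriter(mem)
        bin.Write(CUShort(1000))
        bin.Flush()
        Dim original As Byte() = mem.ToArray()

        ' Round trip through an ASCII string: &HE8 is outside 0-127 and becomes "?" (&H3F).
        Dim viaAscii As Byte() = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(original))

        ' Round trip through Base64 text, which is designed to carry arbitrary bytes.
        Dim viaBase64 As Byte() = Convert.FromBase64String(Convert.ToBase64String(original))

        Dim asciiValue As UShort = New BinaryReader(New MemoryStream(viaAscii)).ReadUInt16()
        Dim base64Value As UShort = New BinaryReader(New MemoryStream(viaBase64)).ReadUInt16()

        Console.WriteLine(BitConverter.ToString(original))   ' E8-03
        Console.WriteLine(BitConverter.ToString(viaAscii))   ' 3F-03
        Console.WriteLine(asciiValue.ToString())             ' 831
        Console.WriteLine(base64Value.ToString())            ' 1000
    End Sub
End Module
```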
For my current project I need a way to use ä, ö, etc. in a DataTable that is written to a .csv file.
It is the same project as in: VB Reading data from SQL Server to Array, writing into .CSV
I know that I need UTF-8, but how do I use it?
Unlike VB6/VBScript/VBA, VB.Net strings already use full Unicode internally. You can already put accented characters in your string variables (and string members for other objects), and you don't need to do anything special.
There are three things you do need to watch for, though.
First, you must be sure to use NVARCHAR rather than VARCHAR for your SQL Server columns, as well as your ADO.Net parameters. You may also need to be careful about what collation you have (but the default is almost certainly fine here).
Second, when you open your StreamWriter, you need to use a Unicode-capable Encoding. System.Text.UTF8Encoding is one option. You could also use System.Text.UnicodeEncoding (which is UTF-16) or System.Text.UTF32Encoding and get accurate output.
Finally, just because you successfully create a Unicode CSV file, this does not mean your downstream consumers will handle the file correctly. A lot of text editors and other tools like to assume CSV data is ASCII. But that's really outside of your scope. All you can do is give them valid data. If they don't process it, that's on them :)
So assuming the database is correct, and based on the other question, you have this code:
Sub WriteCsvFiles(destPath As String, headings As String(), dt As DataTable)
Dim separator As Char = ";"c
Dim header = String.Join(separator, headings)
For Each r As DataRow In dt.Rows
Dim destFile = Path.Combine(destPath, r(0).ToString().Trim() & ".csv")
Using sw As New StreamWriter(destFile)
sw.WriteLine(header)
sw.WriteLine(CsvLine(r.ItemArray, separator))
End Using
Next
End Sub
This is close. However, take a look at the remarks in the documentation for the StreamWriter constructor:
This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM), so its GetPreamble method returns an empty byte array. The default UTF-8 encoding for this constructor throws an exception on invalid bytes. This behavior is different from the behavior provided by the encoding object in the Encoding.UTF8 property.
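So we already have UTF-8 data, just without a byte-order mark (BOM). In UTF-8 the BOM has nothing to do with byte order; it is only a signature at the start of the file, but some consumers (certain spreadsheet tools, for example) rely on it to recognize the file as UTF-8 rather than a legacy code page. To write it, we need to change things just a little bit. Where you have this right now: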
Using sw As New StreamWriter(destFile)
should become:
Using sw As New StreamWriter(destFile, False, Encoding.UTF8)
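To see what the extra argument changes, here is a small sketch that writes the same character through both kinds of StreamWriter. It uses a MemoryStream instead of a file so the bytes can be printed; the module and variable names are only illustrative.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomDemo
    Sub Main()
        ' U+00E4 ("a" with umlaut), built from its code point to avoid source-file encoding issues.
        Dim sample As String = ChrW(&HE4).ToString()

        ' Default encoding: UTF-8 without a BOM.
        Dim plain As New MemoryStream()
        Using sw As New StreamWriter(plain)
            sw.Write(sample)
        End Using

        ' Encoding.UTF8: the same UTF-8 bytes, preceded by the three-byte BOM.
        Dim withBom As New MemoryStream()
        Using sw As New StreamWriter(withBom, Encoding.UTF8)
            sw.Write(sample)
        End Using

        Console.WriteLine(BitConverter.ToString(plain.ToArray()))     ' C3-A4
        Console.WriteLine(BitConverter.ToString(withBom.ToArray()))   ' EF-BB-BF-C3-A4
    End Sub
End Module
```

The character data is identical in both cases; only the leading EF BB BF signature differs.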
It also seems very odd to create a separate file for every row when they will all have the same structure. I know it's in your original question, but I'd really push back on that, or find out why, and then maybe rewrite the method like so:
Sub WriteCsvFile(destFile As String, headings As IEnumerable(Of String), dt As DataTable)
Dim separator As Char = ";"c
Dim header As String = String.Join(separator, headings)
Using sw As New StreamWriter(destFile, False, Encoding.UTF8)
sw.WriteLine(header)
For Each r As DataRow In dt.Rows
sw.WriteLine(CsvLine(r.ItemArray, separator))
Next
End Using
End Sub
I have a byte() array full of Unicode data. I need to convert this byte array to binary data.
The original data was binary data, but data was saved as Unicode, and thus these data structures are now all 2 times as large as they need to be.
Can I convert from a byte array of one type to another byte array, or is looping required to skip every other byte?
Edit:
Comments asked for more info
The original byte array is Unicode; UTF32 looks to be the format.
The output byte array needs to remove that extra encoding.
So, assuming this, using BigEndianUnicode to toss out the extra data works quite well.
This seems to work:
b2 = System.Text.Encoding.BigEndianUnicode.GetBytes(System.Text.Encoding.UTF32.GetString(b))
Of course, it is not clear why the resulting array is not EXACTLY 1/2 the size, but the above does seem to work.
Edit2:
OK, as noted, the question was not ONLY how to convert, but how to go from byte array to byte array. Furthermore, the array was indeed Unicode, but the original binary byte array was based on the user's local code page (English).
So the CORRECT conversion I required was this:
b2 = System.Text.Encoding.Default.GetBytes(System.Text.Encoding.Unicode.GetString(b))
However, the above converts from a byte array to a string, and then back to a byte array. My question was STILL how to do this from byte array to byte array. Turns out you can do this, and this is how:
Dim b() As Byte
b = reader(0) ' the array is filled with Unicode (air code)
Dim b2() As Byte
' convert byte array directly; no need to go through a String
Dim cFrom As System.Text.Encoding = System.Text.Encoding.Unicode
Dim cto As System.Text.Encoding = System.Text.Encoding.Default
b2 = System.Text.Encoding.Convert(cFrom, cto, b)
As noted, the above goes from byte() array to byte() array, as per my original question.
Note that "Default" above is of course the default code page (in my case, a computer running an English version of Windows).
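For readers who want to try this without a database, here is a minimal, self-contained sketch. The sample bytes are a stand-in for the reader(0) value and are an assumption, not the asker's data. Keep in mind that Encoding.Default is the ANSI code page on .NET Framework but UTF-8 on .NET Core and .NET 5+; for the plain ASCII sample used here, both give the same result.

```vbnet
Imports System
Imports System.Text

Module ConvertDemo
    Sub Main()
        ' Stand-in for the stored value: "ABC" as UTF-16LE, two bytes per character.
        Dim b As Byte() = Encoding.Unicode.GetBytes("ABC")
        Console.WriteLine(BitConverter.ToString(b))    ' 41-00-42-00-43-00

        ' Byte array to byte array, with no intermediate String in user code.
        Dim b2 As Byte() = Encoding.Convert(Encoding.Unicode, Encoding.Default, b)
        Console.WriteLine(BitConverter.ToString(b2))   ' 41-42-43
    End Sub
End Module
```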
I have an application that makes a call to a third party web API that returns a String that looks something like this:
"JVBERi0xLjMNCiXi48/TDQoxIDAgb2JqDQo8PA0KL1R5cGUgL091dGxpbmVzDQovQ291bnQgMA0KPj4NCmVuZG9iag0KMiAwIG9iag0KDQpbL1BERiAvVGV4dCAvSW1hZ2VDXQ0KZW"
(It's actually much longer than that but I'm hoping just the small snippet is enough to recognize it without me pasting a string a mile long)
The documentation says it returns a Byte array but when I try to accept it as a Byte array directly, I get errors. Part of my problem here is that the documentation isn't completely clear what the Byte array represents. Since it's a GetReport function I'm calling, I'm guessing it's a PDF but I'm not 100% sure as the documentation doesn't say at all.
So, anyway, I'm getting this String and I'm trying to convert it to a PDF. Here's what that looks like:
Dim reportString As String = GetValuationReport(12345, token.SecurityToken)
Dim report As Byte() = System.Text.Encoding.Unicode.GetBytes(reportString)
File.WriteAllBytes("C:\filepath\myreport.pdf", report)
I'm pretty sure that the middle line converts the String into a new Byte array rather than simply converting it into its Byte array equivalent but I don't know how to do that.
Any help would be fantastic. Thanks!
It looks like your string may be Base64 encoded, in which case you would use this to convert it to bytes:
Dim report As Byte() = Convert.FromBase64String(reportString)
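One way to confirm both guesses (Base64, and PDF) is to decode just the start of the string and look at the bytes. The sketch below uses the first 12 characters of the string from the question; a Base64 prefix whose length is a multiple of 4 can be decoded on its own. The module name is illustrative.

```vbnet
Imports System
Imports System.Text

Module Base64HeaderDemo
    Sub Main()
        ' First 12 characters of the API response shown in the question.
        Dim prefix As String = "JVBERi0xLjMN"
        Dim bytes As Byte() = Convert.FromBase64String(prefix)

        ' PDF files start with the ASCII signature "%PDF-" followed by a version number.
        Dim header As String = Encoding.ASCII.GetString(bytes, 0, 8)
        Console.WriteLine(header)   ' %PDF-1.3
    End Sub
End Module
```

Since the decoded data starts with a PDF signature, writing the full decoded byte array with File.WriteAllBytes, as in the question, should produce a PDF file.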
This is my code to convert a string to hex:
Function StringToHex(ByVal text As String) As String
Dim xhex As String
For i As Integer = 0 To text.Length - 1
xhex &= Asc(text.Substring(i, 1)).ToString("x").ToUpper
Next
Return xhex
End Function
I convert a file's contents to hex with this code, but if the file is larger than 1 MB my program stops responding.
How can I make this code more efficient for files larger than 1 MB? Sorry, my English is bad.
As I said in my initial comment, your current approach creates a new string each time you go through the For loop. Strings are immutable (can't be changed) in .NET, so for example if you have 3,000 characters in the string, xhex &= ... is going to create 3,000 strings, and that's just for the first part. Each of those concatenations also copies everything accumulated so far, so the total work grows much faster than the input size. Then you have a Substring, then a ToString and finally a ToUpper, so if my math is right, you're creating 4 strings for every character in the input string (so if you have 3,000 characters, that's 12,000 additional strings).
The call to Substring is unnecessary - you can treat the string as an array and access each character in the string as an array index, so now you would have:
xhex &= Asc(text(i)).ToString("x").ToUpper
You can also get rid of the call .ToUpper() by using an uppercase "X" in the call to .ToString() - so now you have:
xhex &= Asc(text(i)).ToString("X")
You could also make xhex a StringBuilder, and then you'd only be creating one additional string each time through the loop (the call to .ToString()). Putting that all together gives you this:
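' Requires Imports System.Text at the top of the file.
Dim xhex As StringBuilder = New StringBuilder()
For i As Integer = 0 To text.Length - 1
' Asc returns the character code; "X" formats it as uppercase hex.
xhex.Append(Asc(text(i)).ToString("X"))
Next
Return xhex.ToString()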
That may help with the process, but if the string is really large you may still run into memory issues. If the file is really large, I'd recommend reading it using a Stream and processing the Stream one byte at a time (or several bytes at a time, your choice).
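A rough sketch of the Stream approach might look like the following. CopyAsHex is a hypothetical helper name, and the 4096-byte chunk size is an arbitrary choice. Unlike the code above, it formats each byte with "X2", so values below 16 are padded to two digits (0A rather than A); that is a deliberate change so the output can be decoded unambiguously.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module HexStreamDemo
    ' Reads the source stream in chunks and writes each byte as two uppercase hex digits.
    Sub CopyAsHex(source As Stream, dest As TextWriter)
        Dim chunk(4095) As Byte
        Dim count As Integer = source.Read(chunk, 0, chunk.Length)
        While count > 0
            For i As Integer = 0 To count - 1
                dest.Write(chunk(i).ToString("X2"))
            Next
            count = source.Read(chunk, 0, chunk.Length)
        End While
    End Sub

    Sub Main()
        ' In-memory stand-in for a file: "Hi!" followed by a line feed.
        Dim source As New MemoryStream(Encoding.ASCII.GetBytes("Hi!" & ChrW(10)))
        Dim result As New StringWriter()
        CopyAsHex(source, result)
        Console.WriteLine(result.ToString())   ' 4869210A
    End Sub
End Module
```

For a real file, the source could be a FileStream and the destination a StreamWriter, so neither the input nor the hex output has to fit in memory at once.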
I would also suggest Googling for VB.NET convert string to hex, as there are many examples of other ways to do this.
Is there any easy way of converting a windows-1252 string into a Unicode one?
All strings in .NET are Unicode in memory.
If you have a byte array that was generated from a string encoded in 1252, you can recover the string using
Dim S As String = System.Text.Encoding.GetEncoding(1252).GetString(array)
It is now a Unicode string in memory. If you then want to encode that string into a UTF-8 byte array for transmission or storage, you would do the converse:
Dim A As Byte() = System.Text.Encoding.GetEncoding("UTF-8").GetBytes(S)
(I think that is the right VB syntax!)
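Here is a complete sketch of that round trip with two sample bytes chosen for illustration. One caveat, stated as an assumption about your runtime: on .NET Framework, code page 1252 is available out of the box, while on .NET Core and .NET 5+ it has to be registered first, which is what the commented-out line is for.

```vbnet
Imports System
Imports System.Text

Module Cp1252Demo
    Sub Main()
        ' On .NET Core / .NET 5+, uncomment this line before calling GetEncoding(1252):
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        ' In windows-1252, &H80 is the euro sign and &HE9 is "e" with an acute accent.
        Dim raw As Byte() = {&H80, &HE9}
        Dim s As String = Encoding.GetEncoding(1252).GetString(raw)

        ' Re-encode the in-memory Unicode string as UTF-8.
        Dim utf8 As Byte() = Encoding.UTF8.GetBytes(s)
        Console.WriteLine(BitConverter.ToString(utf8))   ' E2-82-AC-C3-A9
    End Sub
End Module
```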