VB Hex to DocX Object - vb.net

I have a selection of docx files stored as blob data in hexadecimal, and I need to retrieve them so I can access the text within.
So far, I have converted the hex to string format with the following:
Dim blob = BLOB DATA
Dim con As String = String.Empty
For x = 2 To st.Length - 2 Step 2
con &= ChrW(CInt("&H" & st.Substring(x, 2)))
Next
However, if I then save the output from this as a .docx, the file will not open because it is 'corrupt'. I presume that is why, when I load this string into a MemoryStream and then try to use Novacode.DocX.Load(memoryStream), it gives me a similar corruption error.
I have tried converting to a byte array in two ways, and they give me different results.
System.Text.Encoding.Default.GetBytes(hex)
I have also tried.
Public Function HexToByteArray(hex As String) As Byte()
Dim upperBound As Integer = hex.Length \ 2
If hex.Length Mod 2 = 0 Then
upperBound -= 1
Else
hex = "0" & hex
End If
Dim bytes(upperBound) As Byte
For i As Integer = 2 To upperBound
bytes(i) = Convert.ToByte(hex.Substring(i * 2, 2), 16)
Next
Return bytes
End Function
I then tried converting them both to a memory stream and using them to create a DocX object like so:
Dim doc As DocX = DocX.Load(New MemoryStream(bytes))

docx is not a text format, it's a binary format. Thus, converting it to a string is just plain wrong. Your end result needs to be a byte array.
Knowing that, your problem can be split into two simpler problems:
1. Split your hex string into strings of two characters each (or keep your existing loop, which is perfectly fine). See this SO question for details:
How to split a string by x amount of characters
2. Convert those "small" strings, each of which contains the hexadecimal representation of one byte, into bytes. See this SO question for details:
How do I convert a Hexidecimal string to a Byte Array?
Combining those two solutions is left as an exercise to the reader. We don't want to spoil all the fun or ruin the learning experience. ;-)
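For readers who want to check their own attempt, the two steps above can be combined roughly as follows. This is a minimal sketch, not the answerer's code: the name HexToBytes and the handling of a leading "0x" prefix are assumptions (the question's loop starts at index 2, which suggests such a prefix).

```vbnet
Imports System

Module HexBlobSketch
    ' Hypothetical helper: converts "0x504B..." or "504B..." into raw bytes.
    Function HexToBytes(hex As String) As Byte()
        ' Assumption: the blob text may carry a "0x" prefix that is not data.
        If hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase) Then
            hex = hex.Substring(2)
        End If
        If hex.Length Mod 2 <> 0 Then
            Throw New ArgumentException("Hex string must contain an even number of digits.")
        End If

        ' VB array bounds are inclusive, so the upper bound is (byte count - 1).
        Dim bytes(hex.Length \ 2 - 1) As Byte
        For i As Integer = 0 To bytes.Length - 1
            ' Each pair of hex digits is one byte.
            bytes(i) = Convert.ToByte(hex.Substring(i * 2, 2), 16)
        Next
        Return bytes
    End Function
End Module
```

The result can then be passed to DocX.Load(New MemoryStream(bytes)) as in the question. A quick sanity check: a .docx is a ZIP archive, so the first two bytes should be &H50 and &H4B ("PK").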

Related

how to parse string containing Unicode ID's as well as plain text for display in datagrid view [duplicate]

I am trying to parse a string (returned by a web server) that contains non-standard (as far as I can tell) Unicode IDs such as "\Ud83c" or "\U293c", as well as plain text. I need to display this string, emojis intact, to the user in a DataGridView.
btw, I am blind so please excuse any formatting errors :(
full example of what my code is parsing: "Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"
the code I wrote which is failing miserably:
Public Function ParseUnicodeId(LNKText As String) As String
Dim workingarray() As String
Dim CurString As String
Dim finalString As String
finalString = ""
' split at \ char
workingarray = Split(LNKText, chr(92))
For Each CurString In workingarray
If CurString <> "" Then
' remove leading U so number can be converted to hex
CurString = Right(CurString, Len(CurString) - 1)
' attempt to cut off the rightmost chars until the number can be converted, as there is nothing separating the end of the Unicode chars and the start of the plain text
Do While IsNumeric(CurString) = False
If CurString = "" Then
Exit Do
End If
CurString = Left(CurString, Len(CurString) - 1)
Loop
If CurString.StartsWith("U", StringComparison.InvariantCultureIgnoreCase) Then
CurString = CurString.Substring(1)
End If
' convert result from above to hex
Dim numeric = Int32.Parse(CurString, NumberStyles.HexNumber)
' convert to bytes
Dim bytes = BitConverter.GetBytes(numeric)
' convert resulting bytes to a real char for display
finalString = finalString & Encoding.Unicode.GetString(bytes)
End If
Next
ParseUnicodeId = finalString
End Function
I have tried to do this in all kinds of ways, but I can't seem to get it right. My code currently returns empty strings, although my guess is that this is because of some of the more recent changes I made to cut off the leading U or to chop off one char at a time. If I take those bits out and just pass it something like "Ud83c", it works perfectly; it's only when plain text is mixed in that it fails, and I can't come up with a way to separate the two and recombine them at the end.
You can use Regex.Unescape() to convert the unicode escaped char (\uXXXX) to a string.
If you receive \U instead of \u, you also need to perform that substitution, since \U is not recognized as a valid escape sequence.
Dim input As String = "Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"
Dim result As String = Regex.Unescape(input.Replace("\U", "\u"))
This prints (it may depend on the Font used):
Castle: 👀Jerusalém.Miles
As a note, you might also have used the wrong encoding when you decoded the input stream.
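One caveat with the approach above: Regex.Unescape also interprets other backslash sequences (such as \n or \t) and throws on sequences it does not recognise, so it can misbehave if the plain-text part contains backslashes. A narrower alternative is to replace only the \UXXXX sequences. The following is a sketch under that assumption; the name DecodeUnicodeEscapes is made up for illustration.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module UnicodeEscapeSketch
    ' Hypothetical helper: replaces each \uXXXX or \UXXXX (exactly four hex digits)
    ' with the UTF-16 code unit it names and leaves all other text untouched.
    Function DecodeUnicodeEscapes(text As String) As String
        Return Regex.Replace(text, "\\[uU]([0-9A-Fa-f]{4})",
            Function(m As Match) ChrW(Integer.Parse(m.Groups(1).Value, NumberStyles.HexNumber)).ToString())
    End Function

    Sub Main()
        ' Surrogate pairs such as \Ud83d\Udc40 are decoded one code unit at a time;
        ' adjacent halves recombine into a single character in the result string.
        Console.WriteLine(DecodeUnicodeEscapes("Castle: \Ud83d\Udc40Jerusal\U00e9m.Miles"))
    End Sub
End Module
```

Because the pattern requires exactly four hex digits, the text that follows an escape (the "m" after \U00e9, for example) is never swallowed, which is the separation problem the question's loop was trying to solve.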

Output/write SIMH DEC tape file to disk

--->>> In VB .Net
I am interested in writing a simulated disk file in SIMH file format. Essentially, ASCII records are encapsulated with tape (file) control characters and binary record lengths. I know this is easily handled in C/C++, but I would like to implement it in VB.Net.
First, can anyone give an example of writing tape control characters to a data stream for output to a flat file?
Second, an example of writing to the SIMH format. I have a PDF describing the SIMH format, so I'm reviewing this specification.
Thirdly, using BitConverter to export a stream with binary tape markers encapsulating ASCII records. I believe I could use BitConverter to create the binary file markers. However, it seems I can also use BinaryWriter on top of a FileStream. If I write the binary tape control characters with BinaryWriter, how do I use that one writer to write both the binary record lengths and the ASCII records? An example would be ideal.
SIMH is a derivative DEC tape format.
Dim fs As FileStream = Nothing
Dim infile As String = Path.Combine(<path_to_file>)
' FileStream (not FileSystem) is the type that opens a file for writing
fs = New FileStream(infile, FileMode.Create)
Using bw As BinaryWriter = New BinaryWriter(fs)
For i As Integer = 0 To m_headers.Count - 1
Dim lengthHeader As Integer = m_headers(i).Length
If lengthHeader = MAX_HEADER_LENGTH Then
Dim dataArray() As Byte = StringtoByteArray(m_headers(i))
bw.Write(lengthHeader)
bw.Write(dataArray)
bw.Write(lengthHeader)
End If
Next
For i As Integer = 0 To m_myRecs.Count - 1
Dim thisRec As String = m_myRecs(i)
Dim lengthRec As Integer = thisRec.Length
Dim dataSentinal = (row * 6) + 8 + 2
bw.Write(dataSentinal)
Dim dataArray() As byte = StringtoByteArray(m_myRecs(i))
bw.Write(dataArray)
bw.Write(dataSentinal)
Next
bw.Write(-1)
End Using
fs.Close()
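A single BinaryWriter can write both the binary lengths and the ASCII payload, since it has overloads for Integer and for Byte(). The sketch below assumes the usual description of the SIMH tape container and should be checked against the PDF specification: each record is framed by its byte length as a 32-bit little-endian integer before and after the data, odd-length data is padded with one byte, a tape mark is a zero length word, and end of medium is &HFFFFFFFF. The names WriteRecord and BuildTape are made up for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module SimhTapeSketch
    ' Writes one data record: length, payload, optional pad byte, length again.
    Sub WriteRecord(bw As BinaryWriter, record As String)
        Dim data As Byte() = Encoding.ASCII.GetBytes(record)
        bw.Write(data.Length)            ' Int32, written as 4 little-endian bytes
        bw.Write(data)                   ' raw ASCII bytes, no length prefix added
        If data.Length Mod 2 = 1 Then
            bw.Write(CByte(0))           ' assumed padding to an even byte count
        End If
        bw.Write(data.Length)            ' trailing copy of the record length
    End Sub

    ' Builds a one-file tape image in memory from the given ASCII records.
    Function BuildTape(records As IEnumerable(Of String)) As Byte()
        Using ms As New MemoryStream()
            Using bw As New BinaryWriter(ms)
                For Each rec As String In records
                    WriteRecord(bw, rec)
                Next
                bw.Write(0)              ' tape mark (end of file), assumed to be a zero word
                bw.Write(-1)             ' end of medium, &HFFFFFFFF
            End Using
            Return ms.ToArray()
        End Using
    End Function
End Module
```

To write straight to disk, construct the BinaryWriter over a FileStream instead of the MemoryStream, or pass the result of BuildTape to File.WriteAllBytes. BitConverter is not needed, because BinaryWriter already emits integers in little-endian order.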

How to replace bytes in VB.NET?

I have two strings:
Dim Original_Hex_Bytes as string = "616572646E61"
Dim Patched_Hex_Bytes as string = "616E64726561"
Then I have a binary file, and I need to search it for the Original_Hex_Bytes and replace them with the Patched_Hex_Bytes; I don't know the offset at which to begin writing the new bytes :(
How can I do this?
If needed, I also know how to convert hex strings to bytes. I use this:
Private Function Hex_To_Bytes(ByVal strinput As String) As Byte()
Dim i As Integer = 0
Dim x As Integer = 0
' VB array bounds are inclusive, so subtract 1 to avoid an extra trailing zero byte
Dim bytes(strinput.Length \ 2 - 1) As Byte
Do While (strinput.Length > i + 1)
Dim lngDecimal As Long = Convert.ToInt32(strinput.Substring(i, 2), 16)
bytes(x) = Convert.ToByte(lngDecimal)
i += 2
x += 1
Loop
Return bytes
End Function
You can use the BinaryReader and BinaryWriter classes to achieve this.
But since you do not know the file structure in this case, you need to read the entire file and scan it for the byte array. It may be easier to search for the ASCII strings that the hex values spell out, "aerdna" and "andrea".
When you do know the structure of a file, it is more appropriate to work with a data structure to manipulate its contents.
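The read-everything-and-scan approach can be sketched as follows. This is only an illustration, assuming the file fits in memory and that both patterns have the same length (as they do in the question); the names IndexOfBytes and PatchBytes are made up.

```vbnet
Imports System

Module BytePatchSketch
    ' Returns the index of the first occurrence of pattern in data at or after start, or -1.
    Function IndexOfBytes(data As Byte(), pattern As Byte(), start As Integer) As Integer
        For i As Integer = start To data.Length - pattern.Length
            Dim matched As Boolean = True
            For j As Integer = 0 To pattern.Length - 1
                If data(i + j) <> pattern(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return i
        Next
        Return -1
    End Function

    ' Overwrites every occurrence of original with patched, in place.
    ' Returns the number of replacements made.
    Function PatchBytes(data As Byte(), original As Byte(), patched As Byte()) As Integer
        If original.Length = 0 OrElse original.Length <> patched.Length Then
            Throw New ArgumentException("Patterns must be non-empty and of equal length.")
        End If
        Dim count As Integer = 0
        Dim pos As Integer = IndexOfBytes(data, original, 0)
        While pos >= 0
            Array.Copy(patched, 0, data, pos, patched.Length)
            count += 1
            ' Continue searching after the bytes that were just patched.
            pos = IndexOfBytes(data, original, pos + original.Length)
        End While
        Return count
    End Function
End Module
```

In use, load the file with File.ReadAllBytes, call PatchBytes with the two patterns converted to byte arrays, and save the result with File.WriteAllBytes. Since the buffer is patched in place, the file size and all other offsets stay unchanged.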

Convert string of byte array back to original string in vb.net

I have a plain text string that I'm converting to a byte array and then to a string and storing in a database.
Here is how I'm doing it:
Dim b As Byte() = System.Text.Encoding.UTF8.GetBytes("Hello")
Dim s As String = BitConverter.ToString(b).Replace("-", "")
Afterwards I store the value of s (which is "48656C6C6F") into a database.
Later on, I want to retrieve this value from the database and convert it back to "Hello". How would I do that?
You can call the following function with your hex string and get "Hello" returned to you. Note that the function doesn't validate its input; you would need to add validation unless you can be sure the input is valid.
Private Function HexToString(ByVal hex As String) As String
Dim result As String = ""
For i As integer = 0 To hex.Length - 1 Step 2
Dim num As Integer = Convert.ToInt32(hex.Substring(i, 2), 16)
result &= Chr(num)
Next
Return result
End Function
James Thorpe points out in his comment that it would be more appropriate to use Encoding.UTF8.GetString to convert back to a string as that is the reverse of the method used to create the hex string in the first place. I agree, but as my original answer was already accepted, I hesitate to change it, so I am adding an alternative version. The note about validation of input being skipped still applies.
Private Function HexToString(ByVal hex As String) As String
Dim bytes(hex.Length \ 2 - 1) As Byte
For i As Integer = 0 To hex.Length - 1 Step 2
bytes(i \ 2) = Byte.Parse(hex.Substring(i, 2), System.Globalization.NumberStyles.HexNumber)
Next
Return System.Text.Encoding.UTF8.GetString(bytes)
End Function

Mixed Encoding to String

I have a string in VB.net that may contain something like the following:
This is a 0x000020AC symbol
This is the UTF-32 encoding for the Euro Symbol according to this article http://www.fileformat.info/info/unicode/char/20ac/index.htm
I'd like to convert this into
This is a € symbol
I've tried using UnicodeEncoding() class in VB.net (Framework 2.0, as I'm modifying a legacy application)
When I use this class to encode, and then decode I still get back the original string.
I expected that UnicodeEncoding would recognise the already encoded part and not encode it again, but that appears not to be the case.
I'm a little lost now as to how I can convert a mixed encoded string into a normal string.
Background: When saving an Excel spreadsheet as CSV, anything outside of the ASCII range gets converted to ?. So my idea is to get my client to search/replace a few characters, such as the Euro symbol, with an encoded string such as 0x000020AC. I was then hoping to convert those encoded parts back into the real symbols before I insert them into a SQL database.
I've tried a function such as
Public Function Decode(ByVal s As String) As String
Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(s)
Dim output As String = ""
output = uni.GetString(encodedBytes)
Return output
End Function
Which was based on the examples on the MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
It could be that I have a complete mis-understanding of how this works in VB.net. In C# I can simply use escaped characters such as "\u20AC". But no such thing exists in VB.net.
Based on advice from Heinzi, I implemented a Regex.Replace method using the following code, which appears to work for my examples.
Public Function Decode(ByVal s As String) As String
Dim output As String = ""
' match only hex digits, so that non-hex text such as "0xZZZZZZZZ" is not passed to Convert.ToByte
Dim sRegex As String = "0x[0-9a-fA-F]{8}"
Dim r As Regex = New Regex(sRegex)
Dim myEvaluator As MatchEvaluator = New MatchEvaluator(AddressOf HexToString)
output = r.Replace(s, myEvaluator)
Return output
End Function
Public Function HexToString(ByVal hexString As Match) As String
Dim uni As New UnicodeEncoding(True, True)
Dim input As String = hexString.ToString
input = input.Substring(2)
input = input.TrimStart("0"c)
Dim output As String
Dim length As Integer = input.Length
Dim upperBound As Integer = length \ 2
If length Mod 2 = 0 Then
upperBound -= 1
Else
input = "0" & input
End If
Dim bytes(upperBound) As Byte
For i As Integer = 0 To upperBound
bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
Next
output = uni.GetString(bytes)
Return output
End Function
Have you tried:
Public Function Decode(Byval Coded as string) as string
Return StrConv(Coded, vbUnicode)
End Function
Also, your function is invalid. It takes s as an argument, does a load of stuff and then outputs the s that was put into it instead of the stuff that was processed within it.
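Since the 0x000020AC notation in the question is a Unicode code point written as eight hex digits, another option is to parse each match as a number and let Char.ConvertFromUtf32 build the string. This is a sketch and not one of the posted solutions; the names DecodeCodePoints and CodePointToString are made up, and it avoids lambdas so it should also compile against Framework 2.0.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module MixedEncodingSketch
    ' Hypothetical helper: replaces every 0xXXXXXXXX token with the character it names.
    Function DecodeCodePoints(s As String) As String
        Return Regex.Replace(s, "0x([0-9A-Fa-f]{8})", AddressOf CodePointToString)
    End Function

    Private Function CodePointToString(m As Match) As String
        Dim codePoint As Integer = Integer.Parse(m.Groups(1).Value, NumberStyles.HexNumber)
        ' ConvertFromUtf32 returns one or two UTF-16 code units, so characters
        ' outside the Basic Multilingual Plane are handled as well.
        ' It throws ArgumentOutOfRangeException for values that are not valid code points.
        Return Char.ConvertFromUtf32(codePoint)
    End Function

    Sub Main()
        Console.WriteLine(DecodeCodePoints("This is a 0x000020AC symbol"))
    End Sub
End Module
```

Unlike trimming leading zeros and decoding the remaining bytes as big-endian UTF-16, this does not depend on how many significant hex digits the code point has.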