VB.NET: Save XmlDocument as UTF-8 with BOM

I have written a test application to modify a few hundred XML files: it changes a few nodes in each file and then saves it again.
The input files are UTF-8 with a BOM, but the output is UTF-8 without one (as shown in Notepad++).
It is a VB.NET console application targeting .NET Framework 4.7.2, and this is my basic code:
Dim myXML As XmlDocument = New XmlDocument
Dim nodelist As XmlNodeList
Dim node As XmlNode
myXML.Load(file)
nodelist = myXML.SelectNodes("//root/row")
For Each node In nodelist
'All my code goes here
Next
myXML.Save(file)
I tried things like:
myXML.CreateXmlDeclaration("1.0", "UTF-16", "")
But that didn't work. I have been searching, and it seems everyone has the exact opposite issue. Checking the Microsoft docs, I can't see any way to specify whether a BOM is written:
https://learn.microsoft.com/en-us/dotnet/api/system.xml.xmldeclaration.encoding?view=netframework-4.7
The issue is that when the files are imported into the DB without a BOM, some characters are corrupted, so I really need the output to keep the same encoding as the input.
After so much reading, I could try rewriting my app to use StreamWriter instead of XmlDocument, but if there is a workaround I would much prefer it :). Thanks!

As suggested by @JosefZ, I used this strategy:
Using writer = New XmlTextWriter(file, New UTF8Encoding(True))
    ' UTF8Encoding(True) makes the writer emit the BOM (EF BB BF) before the XML.
    ' The XmlWriterSettings object from my first attempt was never passed to the
    ' writer, so it had no effect and is left out here.
    myXML.Save(writer)
End Using
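If you would rather go through XmlWriterSettings, a minimal sketch along the following lines should also work. It assumes the files can be written through XmlWriter.Create; SaveUtf8 and HasUtf8Bom are hypothetical helper names, not part of the original code. The Boolean passed to UTF8Encoding decides whether the BOM is written.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Xml

Module BomDemo
    ' Hypothetical helper: saves doc as UTF-8. emitBom decides whether the
    ' byte order mark EF BB BF is written at the start of the file.
    Public Sub SaveUtf8(doc As XmlDocument, filePath As String, emitBom As Boolean)
        Dim settings As New XmlWriterSettings()
        settings.Encoding = New UTF8Encoding(emitBom)
        settings.Indent = True
        Using writer As XmlWriter = XmlWriter.Create(filePath, settings)
            doc.Save(writer)
        End Using
    End Sub

    ' Hypothetical helper: True if the file starts with the UTF-8 BOM.
    Public Function HasUtf8Bom(filePath As String) As Boolean
        Dim bytes As Byte() = File.ReadAllBytes(filePath)
        Return bytes.Length >= 3 AndAlso bytes(0) = &HEF AndAlso
               bytes(1) = &HBB AndAlso bytes(2) = &HBF
    End Function
End Module
```

Calling BomDemo.SaveUtf8(myXML, file, True) in place of myXML.Save(file) would be the equivalent of the snippet above. HasUtf8Bom is only there to check the result without opening Notepad++.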

Related

VB.NET: Modifying non-text file as text without ruining it

I need my application to find and modify a text string in a .swp file (generated by VBA for SOLIDWORKS). If I open said file as text in Notepad++, most of the text looks like this (this is an excerpt):
Meaning there is readable text, and symbols that appear as NUL, BEL, ETX and so on, depending on the selected encoding. If I make my changes via Notepad++ (finding and changing "1.38" to "1.39"), there are no issues: the file can be opened via SOLIDWORKS and is still recognized as valid. After all, I don't need to modify these non-readable bits. However, if I do the same modification in my VB.NET application,
Dim filePath As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim fileContents As String = My.Computer.FileSystem.ReadAllText(filePath, Encoding.UTF8).Replace("1.38", "1.39")
My.Computer.FileSystem.WriteAllText(filePath, fileContents, Encoding.UTF8)
then the file gets corrupted, and is no longer recognized by SOLIDWORKS. I suspect this is because ReadAllText and WriteAllText cannot handle whatever data is in these non-readable bits.
I tried many different encodings, but it seems to make no difference. I am not sure how Notepad++ does it, but I can't seem to get the same result in my VB.NET application.
Can someone advise?
Thanks to @jmcilhinney, this is a solution that worked for me: reading the file as bytes, converting to a string, and then saving, using ANSI encoding:
Dim file_name As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim bytes() As Byte
' Using closes the reader and the stream even if reading fails
Using fs As New FileStream(file_name, FileMode.Open), binary_reader As New BinaryReader(fs)
    ' ReadBytes takes an Integer, so convert the Long length explicitly
    bytes = binary_reader.ReadBytes(CInt(fs.Length))
End Using
' Encoding.Default is the system ANSI code page on .NET Framework
Dim fileContents As String = System.Text.Encoding.Default.GetString(bytes)
fileContents = fileContents.Replace("1.38", "1.39")
System.IO.File.WriteAllText(file_name, fileContents, Encoding.Default)
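Encoding.Default depends on the machine's code page and on the runtime, so the round trip may not behave the same everywhere. A minimal sketch of the same idea, assuming the text being replaced is plain ASCII, uses ISO-8859-1 instead. That encoding maps every byte value 0 to 255 to the character with the same code and back, so the non-readable bytes survive unchanged. ReplaceInBytes is a hypothetical helper name.

```vbnet
Imports System.Text

Module BinaryPatch
    ' Hypothetical helper: replaces oldText with newText inside arbitrary
    ' binary data without disturbing any other byte.
    Public Function ReplaceInBytes(data As Byte(), oldText As String, newText As String) As Byte()
        ' ISO-8859-1 is a one-to-one mapping between bytes and chars 0..255
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim text As String = latin1.GetString(data)
        ' String.Replace compares ordinally, so control characters are left alone
        Return latin1.GetBytes(text.Replace(oldText, newText))
    End Function
End Module
```

Usage would be something like File.WriteAllBytes(file_name, BinaryPatch.ReplaceInBytes(File.ReadAllBytes(file_name), "1.38", "1.39")). This only finds the string if the file stores it as single-byte text, which is an assumption about the .swp format.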

Saving embedded resource contents to string

I am trying to copy the contents of an embedded file to a string in Visual Basic using Visual Studio 2013. I already have the resource (Settings.xml) imported and set as an embedded resource. Here is what I have:
Function GetFileContents(ByVal FileName As String) As String
Dim this As [Assembly]
Dim fileStream As IO.Stream
Dim streamReader As IO.StreamReader
Dim strContents As String
this = System.Reflection.Assembly.GetExecutingAssembly
fileStream = this.GetManifestResourceStream(FileName)
streamReader = New IO.StreamReader(fileStream)
strContents = streamReader.ReadToEnd
streamReader.Close()
Return strContents
End Function
When I try to save the contents to a string by using:
Dim contents As String = GetFileContents("Settings.xml")
I get the following error:
An unhandled exception of type 'System.ArgumentNullException' occurred in mscorlib.dll
Additional information: Value cannot be null.
Which occurs at line:
streamReader = New IO.StreamReader(fileStream)
Nothing else I've read has been very helpful, hoping someone here can tell me why I'm getting this. I'm not very good with embedded resources in vb.net.
First check that fileStream is not Nothing; it appears to contain nothing, which is why you are getting the null exception.
Instead of writing to a file, test it with a MsgBox to see whether it is null.
fileStream is Nothing because no resources were specified during compilation, or because the resource is not visible to GetFileContents.
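One common reason the resource is "not visible" is the name: GetManifestResourceStream expects the full manifest name, which in a VB.NET project is normally the root namespace followed by the file name, not just "Settings.xml". A minimal sketch, assuming the file really is compiled in as an embedded resource, looks the name up from GetManifestResourceNames first. ReadEmbeddedText is a hypothetical helper name.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Reflection

Module EmbeddedResources
    ' Hypothetical helper: returns the text of the first embedded resource whose
    ' manifest name ends with fileName, or Nothing if there is no such resource.
    Public Function ReadEmbeddedText(asm As Assembly, fileName As String) As String
        Dim names As String() = asm.GetManifestResourceNames()
        Dim fullName As String = names.FirstOrDefault(
            Function(n) n.EndsWith(fileName, StringComparison.OrdinalIgnoreCase))
        If fullName Is Nothing Then Return Nothing

        Using stream As Stream = asm.GetManifestResourceStream(fullName),
              reader As New StreamReader(stream)
            Return reader.ReadToEnd()
        End Using
    End Function
End Module
```

Calling ReadEmbeddedText(Assembly.GetExecutingAssembly(), "Settings.xml") and getting Nothing back would confirm that the file was never embedded. Printing the array from GetManifestResourceNames shows the names that are actually available.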
After fighting the thing for hours, I discovered I wasn't importing the resource correctly. I had to go to Project -> Properties -> Resources and add the resource from existing file there, rather than importing the file from the Solution Explorer. After adding the file correctly, I was able to write the contents to a string by simply using:
Dim myString As String = (My.Resources.Settings)
Ugh, it's always such a simple solution, not sure why I didn't try that first. Hopefully this helps someone else because I saw nothing about this anywhere else I looked.

How do you delete a file generated via webapi after returning the file as response?

I'm creating a file on the fly on a WebAPI call, and sending that file back to the client.
I think I'm misunderstanding flush/close on a FileStream:
Dim path As String = tempFolder & "\" & fileName
Dim result As New HttpResponseMessage(HttpStatusCode.OK)
Dim stream As New FileStream(path, FileMode.Open)
With result
.Content = New StreamContent(stream)
.Content.Headers.ContentDisposition = New Headers.ContentDispositionHeaderValue("attachment")
.Content.Headers.ContentDisposition.FileName = fileName
.Content.Headers.ContentType = New Headers.MediaTypeHeaderValue("application/octet-stream")
.Content.Headers.ContentLength = stream.Length
End With
'stream.Flush()
'stream.Close()
'Directory.Delete(tempFolder, True)
Return result
You can see where I've commented things out above.
Questions:
Does the stream flush/close itself?
How can I delete the tempFolder after returning the result?
On top of all this, it would be great to know how to generate the file and send it to the user without writing it to the file system first. I'm confident this is possible, but I'm not sure how. I'd love to be able to understand how to do this, and solve my current problem.
Update:
I went ahead with the accepted answer, and found it to be quite simple:
Dim byteArray As Byte()
Using ReturnStream As New MemoryStream()
    Using WriteStream As New StreamWriter(ReturnStream)
        WriteStream.WriteLine("...")
    End Using
    ' Disposing the writer flushes it; ToArray still works on the closed MemoryStream
    byteArray = ReturnStream.ToArray()
End Using
Then I was able to send the content as ByteArrayContent:
With result
.Content = New ByteArrayContent(byteArray)
...
End With
To do the same thing without writing a file to disk, you might look into the MemoryStream class. As you'd guess, it streams data from memory like the FileStream does from a file. The two main steps would be:
1. Take your object in memory and, instead of writing it to a file, serialize it into a MemoryStream using a BinaryFormatter or other method (see that topic on another StackOverflow Q here: How to convert an object to a byte array in C#).
2. Pass the MemoryStream to the StreamContent method, exactly the same way you're passing the FileStream now.
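If the file does have to exist on disk, one option the answer does not mention is FileOptions.DeleteOnClose, which removes the file as soon as its stream is closed. The sketch below assumes the StreamContent's stream is disposed once the response has been sent, and it removes only the file, not tempFolder. OpenSelfDeleting is a hypothetical helper name.

```vbnet
Imports System.IO

Module TempFileStreams
    ' Hypothetical helper: opens an existing file for reading and asks the
    ' runtime to delete it when the returned stream is closed or disposed.
    Public Function OpenSelfDeleting(filePath As String) As FileStream
        Return New FileStream(filePath, FileMode.Open, FileAccess.Read,
                              FileShare.None, 4096, FileOptions.DeleteOnClose)
    End Function
End Module
```

In the question's code this would replace New FileStream(path, FileMode.Open). The stream must not be closed manually before returning, because the response still needs to read from it.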

VB.Net XMLdocument memory management

Dim docu As New XmlDocument()
docu.load("C:\bigfile.xml")
Dim tempNode As XmlNode
tempNode = docu.SelectSingleNode("/header/type")
someArray(someIndex) = tempNode.innerText
...do something more...
I am using XmlDocument to load a huge XML document (100~300 MB).
When I open the document and read it as a string, my application uses about 900 MB of RAM.
Why does this happen, and how can I prevent it?
Note that XmlDocument does not even have a Dispose() method to release what it has allocated.
Although I need the whole string of the huge XML file later in the app, the InnerText of /header/type is only a single word.
More of source :
Private Sub setInfo(ByVal notePath As String)
Dim _NOTE As XDocument
_NOTE = XDocument.Load(notePath)
If (From node In _NOTE...<title> Select node).Value = "" Then
lvlist.Items.Add("No Title")
Else
lvlist.Items.Add((From node In _NOTE...<title> Select node).Value)
End If
lvlist.Items(lvlist.Items.Count - 1).SubItems.Add((From node In _NOTE...<group> Select node).Count)
End Sub
It reads XML document, counts tags and retrieves string value. That's all.
After having those values, _NOTE (XDocument) is of no use at that time.
XmlReader will probably solve your need. From MSDN:
Represents a reader that provides fast, noncached, forward-only access to XML data.
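A minimal sketch of that approach for the /header/type case could look like this. It stops reading as soon as the element is found, so the rest of the document is never loaded. ReadFirstElementText is a hypothetical helper name, and it matches on the element name only, not on the full path.

```vbnet
Imports System.Xml

Module StreamingRead
    ' Hypothetical helper: returns the text of the first element called
    ' elementName, or Nothing if the document contains no such element.
    Public Function ReadFirstElementText(reader As XmlReader, elementName As String) As String
        ' ReadToFollowing advances forward-only until the next matching element
        If reader.ReadToFollowing(elementName) Then
            Return reader.ReadElementContentAsString()
        End If
        Return Nothing
    End Function
End Module
```

Usage would be along the lines of Using reader = XmlReader.Create("C:\bigfile.xml") followed by someArray(someIndex) = StreamingRead.ReadFirstElementText(reader, "type").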
Right, XmlDocument does not have a Dispose method. It is loaded entirely into memory and follows the normal object life cycle. So if you create the object inside a Function and return only the string you want, it becomes eligible for garbage collection once it goes out of scope at the end of the Function.

saxon9EE - javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet

I am trying to transform our system data (in XML format) into a defined output XML file.
To create this output XML, I plan to use XSLT 2.0, so I'm using the Saxon EE evaluation version (SaxonEE9-5-1-2).
I want to demonstrate this software to my team so that we can buy it and use the features and functionality of XSLT 2.0, XQuery, etc.
Currently I am working on a prototype project, which is where I am having a problem.
VB.NET Transformation code is as follows:
Dim strXSLT As String = String.Empty
Dim strXML As String = String.Empty
'Retrieve data from database as string
strXML = GetData()
Dim processor As Saxon.Api.Processor = New Saxon.Api.Processor()
Dim builder As Saxon.Api.DocumentBuilder = processor.NewDocumentBuilder()
'Retrieve XSLT file content from database as string
strXSLT = GetXSLT()
'Convert XSLT string data as memorystream
Dim byteDataXSLT() As Byte
byteDataXSLT = System.Text.Encoding.UTF8.GetBytes(strXSLT)
Dim msXSLT As New System.IO.MemoryStream(byteDataXSLT,0, byteDataXSLT.Length)
'Convert XML string data as memorystream
Dim byteDataXML() As Byte
byteDataXML = System.Text.Encoding.UTF8.GetBytes(strXML)
Dim msXML As New System.IO.MemoryStream(byteDataXML,0, byteDataXML.Length)
'Save the xml file before processing
Dim fsXML As New System.IO.FileStream("C:\Temp\XML.xml", System.IO.FileMode.OpenOrCreate)
Dim byteFileXML() As Byte
byteFileXML = msXML.ToArray()
fsXML.Write(byteFileXML, 0, byteFileXML.Length)
'Save the xsl file before processing
Dim fsXSLT As New System.IO.FileStream("C:\Temp\XSL.xslt", System.IO.FileMode.OpenOrCreate)
Dim byteFileXSLT() As Byte
byteFileXSLT = msXSLT.ToArray()
fsXSLT.Write(byteFileXSLT, 0, byteFileXSLT.Length)
Dim sURI As New Uri("file:///C:/")
builder.BaseUri = sURI
Dim input As Saxon.Api.XdmNode = builder.Build(msXML)
Dim transformer As Saxon.Api.XsltTransformer = processor.NewXsltCompiler().Compile(msXSLT).Load()
transformer.InitialContextNode = input
Dim serializer As New Saxon.Api.Serializer()
serializer.SetOutputFile(strOutputFileName)
transformer.Run(serializer)
I get the error below on the following line, but only when I run it through the program. When I do the transformation manually, I do not get this error.
Dim transformer As Saxon.Api.XsltTransformer = processor.NewXsltCompiler().Compile(msXSLT).Load()
Error Message:
saxon9ee javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet.
But when I saved the xml and xslt memorystream to filestream and do the transformation manually, we get the desired output.
We are trying to figure out the root cause of this issue, but no luck.
Any details on this error message would be really helpful and very much appreciated.
Please let me know if you need more information.
Many Thanks in advance
Raghu
Looks like you are not seeing the error messages produced by the compiler. If you aren't running from the Console, then set the ErrorList property on the XsltCompiler, and when the exception occurs, inspect and display the errors somewhere, e.g. send them to a log file or pop up a window containing them.
Without seeing the error message, I can't begin to guess what the error might be.