WordProcessingDocument with MemoryStream gives corrupt results - vb.net

I'm attempting to use OpenXML for opening a word document, and after modifications, return a modified version to the user as a downloadable file. As for the downloadable portion of this, I have that covered, in that, I modify my response to simply return a word document. But, what I am having a problem with is successfully opening and modifying a word document. While I am able to provide a document to download, the file is consistently "corrupt" when I open it.
I have an extremely simple snippet that is still returning a corrupt word document. It's almost as though the file becomes corrupt the very moment I open it (since I've removed all of the modifications from this snippet).
Private Function GenerateExportWithTemplate() As MemoryStream
Dim template As Byte() = System.IO.File.ReadAllBytes(strTemplateFileName)
Using ms As New MemoryStream()
ms.Write(template, 0, template.Length)
Using doc As WordprocessingDocument = WordprocessingDocument.Open(ms, True)
'Code to modify word document.
End Using
Return ms
End Using
End Function
I don't believe I fully understand what is happening in the snippet above. Any explanation in why I am receiving a corrupt file would be greatly appreciated.
How do I successfully open and modify a WordProcessingDocument from a template file, and return as a memory stream? While there are a million examples of how to open a file and make modifications, I have yet to find one that is successful. Meaning, the file is corrupt every time. How do I do this without receiving a corrupt word document as the result?

Related

iText7 Merge of 2 PDF MemorStreams Not Working

I am upgrading some older iTextSharp code to the new iText 7 libraries. I am having a lot of trouble determining the proper way to merge 2 PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both source PDF MemoryStreams. It seems simple and I think the code below is set up properly but the resulting PDF memory stream only contains the first file. The second PDF file is never present and never concatenated to the first.
I have found multiple ways documented on the Internet as the "proper" way to do the merge. The actual sample code with iText 7 seems to be unusually complex (in that is mixes multiple concepts into one sample repeatedly - as in doesn't reduce the concept to the simplest possible code), and seems to fail to demonstrate simple concepts. For instance, their PDFMerge documentation has no sample code at all in the documentation (nor does anything else I looked at in the class documentation). The examples they have online actually always mix merging from files (not MemoryStreams) with other concepts like adding page numbers or adding Table of Contents. So they never just show one concept and they never start with anything other than files. My PDFs are coming out of a database and we just need to merge them into one PDF memory stream and save it back out. My concern is that maybe I am not creating the MemoryStream properly when I initialize the PDFWriter. As none of their samples ever do anything but initial with an actual file, I was unable to confirm this was done properly. I also fully qualified all objects in the code because I want to leave the old iTextSharp code in place while I am upgrading to the new iText 7. This was done to make sure an iTextSharp object of the same name wasn't inadvertently being unknowingly used.
Also, in the interest of making the source as easy as possible to read I removed some of the declarations and initialization of objects being used. Everything was traced through and all values are fully loaded with proper values as you trace through the code. The only problem is that the second PDFMerge doesn't seem to do anything. I am assuming the problem is that I didn't prepare the PDF objects properly or that I have to do something special with the PDFWriter on the Destination PDF Document (p_pdfDocument) before the second PDF is written out with the PDFMerge object.
Dim p_bResult As Boolean = False
Dim p_bArray As Byte() = Nothing
Dim p_memStream As New System.IO.MemoryStream
Dim p_pdfWriter As New iText.Kernel.Pdf.PdfWriter(p_memStream)
Dim p_pdfDocument As New iText.Kernel.Pdf.PdfDocument(p_pdfWriter)
Dim p_pdf1Stream As New System.IO.MemoryStream(CType(p_cImage1.ImageFile, Byte()))
Dim p_pdf2Stream As New System.IO.MemoryStream(CType(p_cImage2.ImageFile, Byte()))
Dim p_pdf1Reader As New iText.Kernel.Pdf.PdfReader(p_pdf1Stream)
Dim p_pdf2Reader As New iText.Kernel.Pdf.PdfReader(p_pdf2Stream)
Dim p_pdf1Document As New iText.Kernel.Pdf.PdfDocument(p_pdf1Reader)
Dim p_pdf2Document As New iText.Kernel.Pdf.PdfDocument(p_pdf2Reader)
Dim p_pdfMerger As New iText.Kernel.Utils.PdfMerger(p_pdfDocument)
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
'Problem is here... the array only has the first PDF in it
'The second p_pdfMerger.Merge didn't seem to do anything
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
p_pdfDocument.Close()
I expected the 2 source PDF MemoryStreams to be present in the destination MemoryStream but it only contains the first PDF in it.
Edit:
I changed the ending to...
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
p_cImage1.PageCount = p_pdfDocument.GetNumberOfPages()
p_pdfDocument.Close()
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
Thing is that the p_pdfDocument.GetNumberOfPages() is correct but bytes are still just first PDF document when saved to database and viewed.
I tested your use case, condensing your code a bit, reading the input memory streams from files, and writing the output memory stream to a file as I don't have your database environment:
Using MemoryStream As New MemoryStream,
Pdf1MemoryStream As New MemoryStream(File.ReadAllBytes(MY_FIRST_PDF_FILE)),
Pdf2MemoryStream As New MemoryStream(File.ReadAllBytes(MY_SECOND_PDF_FILE))
Using PdfDocument As New PdfDocument(New PdfWriter(MemoryStream)),
Pdf1 As New PdfDocument(New PdfReader(Pdf1MemoryStream)),
Pdf2 As New PdfDocument(New PdfReader(Pdf2MemoryStream))
Dim Merger As New PdfMerger(PdfDocument)
Merger.Merge(Pdf1, 1, Pdf1.GetNumberOfPages)
Merger.Merge(Pdf2, 1, Pdf2.GetNumberOfPages)
End Using
Dim PdfBytes As Byte() = MemoryStream.ToArray()
Using FileStream As Stream = File.Create("TwoPdfsMergedInMemoryStream.pdf")
FileStream.Write(PdfBytes, 0, PdfBytes.Length)
End Using
End Using
As result I got the contents of both source files in TwoPdfsMergedInMemoryStream.pdf as it should be. Concerning your observation
Thing is that the p_pdfDocument.GetNumberOfPages() is correct but bytes are still just first PDF document when saved to database and viewed.
therefore, I would assume that p_bArray does contain a PDF with the contents of both your source PDFs but that there is an issue in saving to database or viewing.
To test this you could save the contents of the byte array to a file somewhere like I do above; then you can check what really is in the array.

Using the same file error

On a button click I have the following code to write what Is in my textboxes.
Dim file As System.IO.StreamWriter
file = My.Computer.FileSystem.OpenTextFileWriter("C:/Users/Nick/Documents/Dra.txt", False)
file.WriteLine(NameBasic)
file.WriteLine(LastBasic)
file.WriteLine(PhoneBasic)
file.WriteLine(NameEmer1)
On my form load, I load what is in the notepad from what was written, It is saying It is already being used(the file) which is true, how can I have two different functions(write, and read) manipulating the same file with out this error?
The process cannot access the file 'C:\Users\Nick\Documents\Dra.txt' because it is being used by another process.
And here is the code for my onformload
Dim read As System.IO.StreamReader
read = My.Computer.FileSystem.OpenTextFileReader("C:/Users/Nick/Documents/Dra.txt")
lblNameBasic.Text = read.ReadLine
I am sort of stuck on this problem, thank you
You need to close the file when you are done writing to it.
Dim file As System.IO.StreamWriter
file = My.Computer.FileSystem.OpenTextFileWriter("C:/Users/Nick/Documents/Dra.txt", False)
file.WriteLine(NameBasic)
file.WriteLine(LastBasic)
file.WriteLine(PhoneBasic)
file.WriteLine(NameEmer1)
file.Close()
To answer your question:
how can I have two different functions(write, and read) manipulating the same file with out this error?
If you really want to simultaneously read and write to the same file from two processes, you need to open the file with the FileShare.ReadWrite option. The My.Computer.FileSystem methods don't do that.¹
However, I suspect that you don't really wan't to do that. I suspect that you want to write and, after you are finished writing, you want to read. In that case, just closing the file after using it will suffice. The easiest way to do that is to use the Using statement:
Using file = My.Computer.FileSystem.OpenTextFileWriter(...)
file.WriteLine(...)
file.WriteLine(...)
...
End Using
This ensures that the file will be closed properly as soon as the Using block is left (either by normal execution or by an exception). In fact, it is good practice to wrap use of every object whose class implements IDisposable (such as StreamWriter) in a Using statement.
¹ According to the reference source, OpenTextFileWriter creates a New IO.StreamWriter(file, append, encoding), which in turn creates a new FileStream(..., FileShare.Read, ...).

Merging Documents with Open XML

I am looking for mail merge alternatives in my vb.net app. I have used the mail merge feature of word, and find that it is quite buggy when dealing with a large volume of documents. I am looking at alternate methods of generating the merge, and have come across open xml. I think this will probably be the answer I am looking for. I have come to understand that the merge will be entirely code-driven in vb.net. I have started playing around with the following code:
Dim wordprocessingDocument As WordprocessingDocument = wordprocessingDocument.Open("C:\Users\JasonB\Documents\test.docx", True)
'for each simplefield (mergefield)
For Each field In wordprocessingDocument.MainDocumentPart.Document.Body.Descendants(Of SimpleField)()
'get the document instruction values
Dim instruction As String() = field.Instruction.Value.Split(splitChar, StringSplitOptions.RemoveEmptyEntries)
'if mergefield
If instruction(0).ToLower.Equals("mergefield") Then
Dim fieldname As String = instruction(1)
For Each fieldtext In field.Descendants(Of Text)()
fieldtext.Text = "I AM TESTING"
Next
End If
wordprocessingDocument.MainDocumentPart.Document.Save()
wordprocessingDocument.Dispose()
Now this works great and all, but I am realizing that I need to create as many documents as I will have datarows (assuming I use a datatable to handle the data).
One suggestion I found was to loop through each datarow, take my document template, save it to a folder and insert the datarow data. This could mean however that I end up with 12,000 documents in a single folder that need to be joined later and converted to pdf.
Is there another option? The other thing that stood out to me is to create a new word document, and duplicate over the xml from the template, and then replace the values. I dont know however if there is a "simpler" way of doing this, thanks.
If you don't want to save all 12,000 documents to file you should be able to process, convert and email them one at a time using temporary files.
Converting the DOCX to PDF in .NET might be an issue but looks like it's possible using Word Automation (Saving Word DOCX files as PDF).
The bottom line is you don't need to generate all documents before emailing them if you perform the process one document at a time. You can use SmtpClient in VB.NET to email the PDF after it is generated.
In terms of creating the document I have seen reports generated where a simple string replace is used to replace a string such as '%FIRSTNAME%' with the person's name and so on. This isn't necessarily the best approach but can work quite well. This way you can create your template in Word or OpenOffice and then edit it in .NET using OpenXML.

Let VB Form prepare working environment in chosen directory

I am working on a GUI for a simulation program. The simulation program is a single .exe which is driven by an input file (File.inp placed in the same directory).
The Original.inp file functions as a template from which the form reads all the values into an array. Then it changes these values reflecting the changes done by the user in the form. After that it writes all the new values to File.inp.
By pushing the "Run" button the Simulation.exe file is executed.
The folder structure looks like this:
root
|
|---input
| |
| |--Original.inp
|
|---GUI.exe
|---Simulation.exe
|---File.inp
Ideally I would supply only the GUI, the user would select the working directory and then the GUI.exe would create an input folder and extract the Original.inp and Simulation.exe in the appropriate locations. So far I have only managed to include Original.inp and Simulation.exe as "EmbeddedResources" in my VB project and I have let my code create an input folder in the working directory chosen by the user.
Can someone please explain to me how I can extract the .inp and .exe file into the correct directories? I've searched on google, tried File.WriteAllBytes and Filestream.WriteByte but did not get the desired results.
The problem with File.WriteAllBytes was that I could not point to the embedded resource ("Simulation.exe is not a member of Resources" and with Filestream.WriteByte I got a 0 kb file.
The question commenters are correct, this is probably a task best left for a setup program. However, that having been stated, in the interest of answering the question as asked I offer the following approach.
Contrary to your supposition in your question comment, you do need to "read" the embedded resource from the GUI's executable file, since it's an embedded resource and not an external resource. It won't magically extract itselt from the executable file. You need to do the manual read from the assembly and write to your specified locations. To do this, you need to read the resource using .Net Reflection, via the currently executing assembly's GetManifestResourceStream method.
The Simulation.exe file is a binary file so it must be handled as such. I assumed that the Orginal.inp file was a text file since it afforded the opportunity to demonstrate different types of file reads and writes. Any error handling (and there should be plenty) is omitted for brevity.
The code could look something like this:
Imports System.IO
Imports System.Reflection
Module Module1
Sub Main()
'Determine where the GUI executable is located and save for later use
Dim thisAssembly As Assembly = Assembly.GetExecutingAssembly()
Dim appFolder As String = Path.GetDirectoryName(thisAssembly.Location)
Dim fileContents As String = String.Empty
'Read the contents of the template file. It was assumed this is in text format so a
'StreamReader, adept at reading text files, was used to read the entire file into a string
'N.B. The namespace that prefixes the file name in the next line is CRITICAL. An embedded resource
'is placed in the executable with the namespace noted in the project file, so it must be
'dereferenced in the same manner.
Using fileStream As Stream = thisAssembly.GetManifestResourceStream("SOQuestion10613051.Original.inp")
If fileStream IsNot Nothing Then
Using textStreamReader As New StreamReader(fileStream)
fileContents = textStreamReader.ReadToEnd()
textStreamReader.Close()
End Using
fileStream.Close()
End If
End Using
'Create the "input" subfolder if it doesn't already exist
Dim inputFolder As String = Path.Combine(appFolder, "input")
If Not Directory.Exists(inputFolder) Then
Directory.CreateDirectory(inputFolder)
End If
'Write the contents of the resource read above to the input sub-folder
Using writer As New StreamWriter(Path.Combine(inputFolder, "Original.inp"))
writer.Write(fileContents)
writer.Close()
End Using
'Now read the simulation executable. The same namespace issues noted above still apply.
'Since this is a binary file we use a file stream to read into a byte buffer
Dim buffer() As Byte = Nothing
Using fileStream As Stream = thisAssembly.GetManifestResourceStream("SOQuestion10613051.Simulation.exe")
If fileStream IsNot Nothing Then
ReDim buffer(fileStream.Length)
fileStream.Read(buffer, 0, fileStream.Length)
fileStream.Close()
End If
End Using
'Now write the byte buffer with the contents of the executable file to the root folder
If buffer IsNot Nothing Then
Using exeStream As New FileStream(Path.Combine(appFolder, "Simulation.exe"), FileMode.Create, FileAccess.Write, FileShare.None)
exeStream.Write(buffer, 0, buffer.Length)
exeStream.Close()
End Using
End If
End Sub
End Module
You will also have to add logic to determine if the files have been extracted so it doesn't happen every time the GUI is invoked. That's a big reason why an installation program might be the correct answer.

Create pdf file from Binary data using itextSharp

I want to create a pdf file from Binary data. I looked around and found examples using iTextSharp by fetching data from the database.
But almost all of them show how to display in the browser. I want to create a file like
pdffromDB.pdf instead of displaying as shown below
doc.Close();
Response.BinaryWrite(MemStream.GetBuffer());
Response.End();
MemStream.Close();
I would really appreciate if you can direct me to an example that will allow me to create a real pdf file.
Thanks
Assuming your MemStream already contains all the bytes making up a valid PDF file, you should be able to convince the visitor's browser to prompt to save it as a file by adding the following statement before Response.BinaryWrite:
Response.AddHeader("Content-Disposition", "attachment; filename=Whatever.pdf")
As an aside, code after Response.End generally isn't executed: your MemStream will still be closed and disposed of just fine in this case due to going out of scope, but in general you should treat Response.End the same as Exit Sub, and code accordingly, e.g.
Using ms As New IO.MemoryStream
...
Response.End()
End Using