iText7 Merge of 2 PDF MemoryStreams Not Working - vb.net

I am upgrading some older iTextSharp code to the new iText 7 libraries. I am having a lot of trouble determining the proper way to merge 2 PDF MemoryStreams into one PDF MemoryStream that contains all the pages from both source PDF MemoryStreams. It seems simple and I think the code below is set up properly but the resulting PDF memory stream only contains the first file. The second PDF file is never present and never concatenated to the first.
I have found multiple ways documented on the Internet as the "proper" way to do the merge. The actual sample code for iText 7 seems unusually complex (it repeatedly mixes multiple concepts into one sample instead of reducing each concept to the simplest possible code) and fails to demonstrate simple concepts. For instance, their PdfMerger documentation has no sample code at all (nor does anything else I looked at in the class documentation). The examples they have online always mix merging from files (not MemoryStreams) with other concepts like adding page numbers or a table of contents. So they never show just one concept, and they never start with anything other than files. My PDFs are coming out of a database, and we just need to merge them into one PDF memory stream and save it back out. My concern is that maybe I am not creating the MemoryStream properly when I initialize the PdfWriter. As none of their samples ever initialize it with anything but an actual file, I was unable to confirm this was done properly. I also fully qualified all objects in the code because I want to leave the old iTextSharp code in place while I am upgrading to iText 7. This was done to make sure an iTextSharp object of the same name wasn't inadvertently being used.
Also, in the interest of making the source as easy as possible to read, I removed some of the declarations and initialization of objects being used. Everything was traced through, and all values are fully loaded with proper values as you trace through the code. The only problem is that the second Merge call doesn't seem to do anything. I am assuming the problem is that I didn't prepare the PDF objects properly, or that I have to do something special with the PdfWriter on the destination PDF document (p_pdfDocument) before the second PDF is written out with the PdfMerger object.
Dim p_bResult As Boolean = False
Dim p_bArray As Byte() = Nothing
Dim p_memStream As New System.IO.MemoryStream
Dim p_pdfWriter As New iText.Kernel.Pdf.PdfWriter(p_memStream)
Dim p_pdfDocument As New iText.Kernel.Pdf.PdfDocument(p_pdfWriter)
Dim p_pdf1Stream As New System.IO.MemoryStream(CType(p_cImage1.ImageFile, Byte()))
Dim p_pdf2Stream As New System.IO.MemoryStream(CType(p_cImage2.ImageFile, Byte()))
Dim p_pdf1Reader As New iText.Kernel.Pdf.PdfReader(p_pdf1Stream)
Dim p_pdf2Reader As New iText.Kernel.Pdf.PdfReader(p_pdf2Stream)
Dim p_pdf1Document As New iText.Kernel.Pdf.PdfDocument(p_pdf1Reader)
Dim p_pdf2Document As New iText.Kernel.Pdf.PdfDocument(p_pdf2Reader)
Dim p_pdfMerger As New iText.Kernel.Utils.PdfMerger(p_pdfDocument)
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
'Problem is here... the array only has the first PDF in it
'The second p_pdfMerger.Merge didn't seem to do anything
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
p_pdfDocument.Close()
I expected the 2 source PDF MemoryStreams to be present in the destination MemoryStream but it only contains the first PDF in it.
Edit:
I changed the ending to...
p_pdfMerger.Merge(p_pdf1Document, 1, p_pdf1Document.GetNumberOfPages())
p_pdfMerger.Merge(p_pdf2Document, 1, p_pdf2Document.GetNumberOfPages())
p_cImage1.PageCount = p_pdfDocument.GetNumberOfPages()
p_pdfDocument.Close()
p_bArray = p_memStream.ToArray
p_pdf1Document.Close()
p_pdf2Document.Close()
Thing is that the p_pdfDocument.GetNumberOfPages() is correct but bytes are still just first PDF document when saved to database and viewed.

I tested your use case, condensing your code a bit, reading the input memory streams from files, and writing the output memory stream to a file as I don't have your database environment:
Using MemoryStream As New MemoryStream,
      Pdf1MemoryStream As New MemoryStream(File.ReadAllBytes(MY_FIRST_PDF_FILE)),
      Pdf2MemoryStream As New MemoryStream(File.ReadAllBytes(MY_SECOND_PDF_FILE))
    Using PdfDocument As New PdfDocument(New PdfWriter(MemoryStream)),
          Pdf1 As New PdfDocument(New PdfReader(Pdf1MemoryStream)),
          Pdf2 As New PdfDocument(New PdfReader(Pdf2MemoryStream))
        Dim Merger As New PdfMerger(PdfDocument)
        Merger.Merge(Pdf1, 1, Pdf1.GetNumberOfPages)
        Merger.Merge(Pdf2, 1, Pdf2.GetNumberOfPages)
    End Using
    Dim PdfBytes As Byte() = MemoryStream.ToArray()
    Using FileStream As Stream = File.Create("TwoPdfsMergedInMemoryStream.pdf")
        FileStream.Write(PdfBytes, 0, PdfBytes.Length)
    End Using
End Using
As a result, I got the contents of both source files in TwoPdfsMergedInMemoryStream.pdf, as it should be. Concerning your observation
Thing is that the p_pdfDocument.GetNumberOfPages() is correct but bytes are still just first PDF document when saved to database and viewed.
I would therefore assume that p_bArray does contain a PDF with the contents of both your source PDFs, but that there is an issue in saving to the database or in viewing.
To test this, you could save the contents of the byte array to a file somewhere like I do above; then you can check what really is in the array.
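A minimal sketch of that check, assuming the merged bytes are available in an array such as p_bArray from the question. The module name, the helper name DumpPdfBytes and the output path are hypothetical; only System.IO and System.Text calls are used, nothing from iText.

```vbnet
Imports System.IO
Imports System.Text

Module PdfDumpSketch
    ' Hypothetical helper: writes the bytes to disk so they can be opened in a
    ' PDF viewer, and reports whether they begin with the "%PDF-" file header.
    Function DumpPdfBytes(pdfBytes As Byte(), outputPath As String) As Boolean
        File.WriteAllBytes(outputPath, pdfBytes)
        If pdfBytes.Length < 5 Then Return False
        Return Encoding.ASCII.GetString(pdfBytes, 0, 5) = "%PDF-"
    End Function
End Module
```

Calling something like DumpPdfBytes(p_bArray, "C:\temp\merged-check.pdf") right after p_pdfDocument.Close() and opening the file shows what the array really holds. If both documents are there, the problem is more likely in the database save or in the viewer.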

Related

Cloning a csv file to memory and using it in a datatable? (vb.net)

I have a question about the following code. In order to prevent problems caused by file locking, I came across the following code.
Dim OrignalBitmap As New Bitmap(Application.StartupPath & "\IMAGES\BACKGROUND_LARGE.jpg")
Dim CloneBitmap As New Bitmap(OrignalBitmap)
OrignalBitmap.Dispose()
Which works like a charm. Now I have all the images in place and I can still access them as files without anything locking. It works so well for what I need that I was wondering whether it's possible to do this for file formats other than images, such as CSV files which are then used in a DataGridView as a bound table?
Usually it is enough to open a file like this, so that it does not block other programs from accessing and opening it.
Dim path1 As String = "C:\temp\temp.csv"
Using fs As FileStream = File.Open(path1, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite)
' Do something with filestream
End Using
This lets you open even huge files without blocking access for other programs.
You should check https://learn.microsoft.com/de-de/dotnet/api/system.io.file.open?view=netframework-4.8
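For the CSV case in the question, the same idea can be taken one step further: read the file into a DataTable and close the handle before binding. The following is only a sketch under stated assumptions. The module and the helper name LoadCsv are hypothetical, and the parsing assumes a header row with unique column names and plain comma-separated values without quoted fields.

```vbnet
Imports System.Data
Imports System.IO

Module CsvCloneSketch
    ' Hypothetical helper: copies the CSV into an in-memory DataTable and
    ' releases the file handle before returning.
    Function LoadCsv(csvPath As String) As DataTable
        Dim table As New DataTable()
        ' FileShare.ReadWrite lets other programs keep reading and writing the file.
        Using fs As New FileStream(csvPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite),
              reader As New StreamReader(fs)
            Dim headerLine As String = reader.ReadLine()
            If headerLine Is Nothing Then Return table
            For Each columnName As String In headerLine.Split(","c)
                table.Columns.Add(columnName.Trim())
            Next
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                If line.Length > 0 Then
                    Dim fields As String() = line.Split(","c)
                    Dim row As DataRow = table.NewRow()
                    ' Extra fields beyond the header columns are ignored.
                    For i As Integer = 0 To Math.Min(fields.Length, table.Columns.Count) - 1
                        row(i) = fields(i)
                    Next
                    table.Rows.Add(row)
                End If
                line = reader.ReadLine()
            End While
        End Using
        Return table
    End Function
End Module
```

The result could then be bound with something like DataGridView1.DataSource = LoadCsv("C:\temp\temp.csv"). Because the table lives in memory, the file is no longer held open while the grid is displayed.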

WordProcessingDocument with MemoryStream gives corrupt results

I'm attempting to use OpenXML for opening a word document, and after modifications, return a modified version to the user as a downloadable file. As for the downloadable portion of this, I have that covered, in that, I modify my response to simply return a word document. But, what I am having a problem with is successfully opening and modifying a word document. While I am able to provide a document to download, the file is consistently "corrupt" when I open it.
I have an extremely simple snippet that is still returning a corrupt word document. It's almost as though the file becomes corrupt the very moment I open it (since I've removed all of the modifications from this snippet).
Private Function GenerateExportWithTemplate() As MemoryStream
Dim template As Byte() = System.IO.File.ReadAllBytes(strTemplateFileName)
Using ms As New MemoryStream()
ms.Write(template, 0, template.Length)
Using doc As WordprocessingDocument = WordprocessingDocument.Open(ms, True)
'Code to modify word document.
End Using
Return ms
End Using
End Function
I don't believe I fully understand what is happening in the snippet above. Any explanation of why I am receiving a corrupt file would be greatly appreciated.
How do I successfully open and modify a WordProcessingDocument from a template file, and return as a memory stream? While there are a million examples of how to open a file and make modifications, I have yet to find one that is successful. Meaning, the file is corrupt every time. How do I do this without receiving a corrupt word document as the result?

Visual Basic Reading Saving Images Into Access

Can someone please explain this code to me? For your information, I am trying to save a picture which is displayed in a picture box to a Microsoft Access database. I don't understand what any of it means, especially the 0.
If Not PictureBox1.Image Is Nothing Then
Dim fsreader As New FileStream(OpenFileDialog1.FileName, FileMode.Open, FileAccess.Read)
Dim breader As New BinaryReader(fsreader)
Dim imgbuffer(fsreader.Length) As Byte
breader.Read(imgbuffer, 0, fsreader.Length)
fsreader.Close()
Your snippet reads the binary contents of the picture file into the imgbuffer variable. It doesn't save the image into MS Access yet. But...
I will try to explain how this code works:
If Not PictureBox1.Image Is Nothing Then 'Checks whether the picture box contains an image; the following code runs only if it does.
Dim fsreader As New FileStream(OpenFileDialog1.FileName, FileMode.Open, FileAccess.Read) 'Opens the file chosen in OpenFileDialog1 as a read-only `FileStream`; `fsreader` gives access to the file contents and to its length.
Dim breader As New BinaryReader(fsreader) 'Wraps `fsreader` in a `BinaryReader`, which reads the raw bytes of the image from the stream.
Dim imgbuffer(fsreader.Length) As Byte 'Declares the byte array that will hold the image data, sized from the file length. The number in the brackets is the upper bound in VB, so this array has one element more than the file has bytes.
breader.Read(imgbuffer, 0, fsreader.Length) 'Reads `fsreader.Length` bytes from the file into imgbuffer. The `0` is the index in imgbuffer at which the first byte is stored. Why `0`? Can I use a value larger than `0`? Yes, you can, but the data would then be shifted inside the buffer (or no longer fit), so the image would be corrupt or differ from the original; you must fill the buffer from index 0 up to the image file size.
fsreader.Close() 'Closes `fsreader`, releasing the image file.
So, what's next?
From that snippet, you can go on to save the imgbuffer variable into MS Access using one of the data access libraries available in VB.NET, such as OleDb. Finally, I would prefer using a MemoryStream to hold the image buffer and load it into a Bitmap variable.
Hope this explains your snippet.
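To make the role of the arguments to Read concrete, here is a small sketch that reads a whole file into a byte array. The module and the helper name ReadImageBytes are hypothetical and not part of the original snippet. The second argument to Read is the position in the buffer where the next bytes are stored, which is why it starts at 0.

```vbnet
Imports System.IO

Module ImageBytesSketch
    ' Hypothetical helper: returns the full contents of a file as a byte array.
    Function ReadImageBytes(imagePath As String) As Byte()
        Using fs As New FileStream(imagePath, FileMode.Open, FileAccess.Read)
            ' VB array declarations take an upper bound, so Length - 1 yields
            ' exactly Length elements.
            Dim buffer(CInt(fs.Length) - 1) As Byte
            Dim offset As Integer = 0
            ' Read may return fewer bytes than requested, so keep reading
            ' until the buffer is full or the stream ends.
            While offset < buffer.Length
                Dim bytesRead As Integer = fs.Read(buffer, offset, buffer.Length - offset)
                If bytesRead = 0 Then Exit While
                offset += bytesRead
            End While
            Return buffer
        End Using
    End Function
End Module
```

For the common case, File.ReadAllBytes(path) returns the same array in a single call; the explicit loop is shown only to illustrate the buffer, offset and count parameters.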

Merging Documents with Open XML

I am looking for mail merge alternatives in my vb.net app. I have used the mail merge feature of Word, and find that it is quite buggy when dealing with a large volume of documents. I am looking at alternate methods of generating the merge, and have come across Open XML. I think this will probably be the answer I am looking for. I have come to understand that the merge will be entirely code-driven in vb.net. I have started playing around with the following code:
Dim wordprocessingDocument As WordprocessingDocument = wordprocessingDocument.Open("C:\Users\JasonB\Documents\test.docx", True)
'for each simplefield (mergefield)
For Each field In wordprocessingDocument.MainDocumentPart.Document.Body.Descendants(Of SimpleField)()
'get the document instruction values
Dim instruction As String() = field.Instruction.Value.Split(splitChar, StringSplitOptions.RemoveEmptyEntries)
'if mergefield
If instruction(0).ToLower.Equals("mergefield") Then
Dim fieldname As String = instruction(1)
For Each fieldtext In field.Descendants(Of Text)()
fieldtext.Text = "I AM TESTING"
Next
End If
Next
wordprocessingDocument.MainDocumentPart.Document.Save()
wordprocessingDocument.Dispose()
Now this works great and all, but I am realizing that I need to create as many documents as I will have datarows (assuming I use a datatable to handle the data).
One suggestion I found was to loop through each datarow, take my document template, save it to a folder and insert the datarow data. This could mean however that I end up with 12,000 documents in a single folder that need to be joined later and converted to pdf.
Is there another option? The other thing that stood out to me is to create a new Word document, duplicate the XML from the template into it, and then replace the values. I don't know, however, if there is a "simpler" way of doing this. Thanks.
If you don't want to save all 12,000 documents to file you should be able to process, convert and email them one at a time using temporary files.
Converting the DOCX to PDF in .NET might be an issue but looks like it's possible using Word Automation (Saving Word DOCX files as PDF).
The bottom line is you don't need to generate all documents before emailing them if you perform the process one document at a time. You can use SmtpClient in VB.NET to email the PDF after it is generated.
In terms of creating the document I have seen reports generated where a simple string replace is used to replace a string such as '%FIRSTNAME%' with the person's name and so on. This isn't necessarily the best approach but can work quite well. This way you can create your template in Word or OpenOffice and then edit it in .NET using OpenXML.

Create pdf file from Binary data using itextSharp

I want to create a pdf file from Binary data. I looked around and found examples using iTextSharp by fetching data from the database.
But almost all of them show how to display the PDF in the browser. I want to create a file like pdffromDB.pdf instead of displaying it as shown below:
doc.Close();
Response.BinaryWrite(MemStream.GetBuffer());
Response.End();
MemStream.Close();
I would really appreciate it if you could direct me to an example that will allow me to create a real PDF file.
Thanks
Assuming your MemStream already contains all the bytes making up a valid PDF file, you should be able to convince the visitor's browser to prompt to save it as a file by adding the following statement before Response.BinaryWrite:
Response.AddHeader("Content-Disposition", "attachment; filename=Whatever.pdf")
As an aside, code after Response.End generally isn't executed: your MemStream will still be closed and disposed of just fine in this case due to going out of scope, but in general you should treat Response.End the same as Exit Sub, and code accordingly, e.g.
Using ms As New IO.MemoryStream
...
Response.End()
End Using
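If the goal is a file on the server's disk rather than a browser download, the bytes can be written out directly. This is only a sketch, assuming the stream already holds a complete PDF as stated above; the module name, the helper name SavePdf and the output path are hypothetical.

```vbnet
Imports System.IO

Module SavePdfSketch
    ' Hypothetical helper: saves the PDF held in a MemoryStream to a file on disk.
    Sub SavePdf(pdfStream As MemoryStream, outputPath As String)
        ' ToArray copies only the bytes actually written. GetBuffer returns the
        ' whole internal buffer, which can include unused trailing bytes.
        File.WriteAllBytes(outputPath, pdfStream.ToArray())
    End Sub
End Module
```

A call such as SavePdf(MemStream, "pdffromDB.pdf") after doc.Close() would then produce the file. ToArray still works after the MemoryStream has been closed, so the order relative to MemStream.Close() does not matter here.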