PDF fill in not merging correctly - vb.net

We are using an ASP.NET website with iTextSharp.dll version 5.5.13.
We can merge multiple PDF files into one using a stream, and it works perfectly. However, when we use a PDF that was created by a "fill-in" function, the new PDF file does not merge correctly into the other documents: it merges without the filled-in values. Yet if I open the filled-in PDF that the function creates, the filled-in data displays and prints fine.
I have tried merging the new "filled-in" PDF at a later time, but the merged result still shows the template as though the filled-in data were missing.
The code below fills in the data:
Dim strFileName As String = Path.GetFileNameWithoutExtension(strSourceFile)
Dim strOutPath As String = HttpContext.Current.Server.MapPath("~/Apps/Lifetime/OfficeDocs/Export/")
newFile = strOutPath & strFileName & " " & strRONumber & ".pdf"
If Not File.Exists(newFile) Then
Dim pdfReader As PdfReader = New PdfReader(strSourceFile)
Using pdfStamper As PdfStamper = New PdfStamper(pdfReader, New FileStream(newFile, FileMode.Create))
Dim pdfFormFields As AcroFields = pdfStamper.AcroFields
pdfFormFields.SetField("CUSTOMER NAME", strCustomer)
pdfFormFields.SetField("YR MK MODEL", strVehicle)
pdfFormFields.SetField("RO#", strRONumber)
pdfStamper.FormFlattening = False
pdfStamper.Dispose()
End Using
End If
The code below then merges the PDF files whose paths are passed to it:
Public Shared Sub MergePDFs(ByVal files As List(Of String), ByVal filename As String)
'Gets a list of full path files and merges into one memory stream
'and outputs it to a browser response.
Dim MemStream As New System.IO.MemoryStream
Dim doc As New iTextSharp.text.Document
Dim reader As iTextSharp.text.pdf.PdfReader
Dim numberOfPages As Integer
Dim currentPageNumber As Integer
Dim writer As iTextSharp.text.pdf.PdfWriter = iTextSharp.text.pdf.PdfWriter.GetInstance(doc, MemStream)
doc.Open()
Dim cb As iTextSharp.text.pdf.PdfContentByte = writer.DirectContent
Dim page As iTextSharp.text.pdf.PdfImportedPage
Dim strError As String = ""
For Each strfile As String In files
reader = New iTextSharp.text.pdf.PdfReader(strfile)
numberOfPages = reader.NumberOfPages
currentPageNumber = 0
Do While (currentPageNumber < numberOfPages)
currentPageNumber += 1
doc.SetPageSize(reader.GetPageSizeWithRotation(currentPageNumber))
doc.NewPage()
page = writer.GetImportedPage(reader, currentPageNumber)
cb.AddTemplate(page, 0, 0)
Loop
Next
doc.Close()
doc.Dispose()
If MemStream Is Nothing Then
HttpContext.Current.Response.Write("No Data is available for output")
Else
HttpContext.Current.Response.Clear()
HttpContext.Current.Response.ContentType = "application/pdf"
HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
HttpContext.Current.Response.BinaryWrite(MemStream.ToArray)
HttpContext.Current.Response.OutputStream.Flush()
HttpContext.Current.Response.OutputStream.Close()
HttpContext.Current.Response.OutputStream.Dispose()
MemStream.Close()
MemStream.Dispose()
End If
End Sub
I expect the "filled in" PDF in the list of files to retain the filled in data but it does not. Even if I try to merge the filled in file later it still comes up missing the filled in data. If I print the filled in file it looks perfect.

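PdfWriter.GetImportedPage returns only a copy of the page contents. This does not include any annotations, in particular not the widget annotations of the form fields on that page. Because the form is not flattened, the filled-in values are shown by those widget annotations, which is why they are missing from the merged output.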
To also copy the annotations of the source pages, use the iText PdfCopy class instead. This class is designed to copy pages including all annotations. Furthermore, it includes methods to copy all pages of a source document in one step.
You have to tell the PdfCopy object to merge fields, otherwise the document-wide form structure won't be built.
As an aside, your code creates many PdfReader objects but does not close them. That may increase your memory requirements substantially.
Thus:
Public Shared Sub MergePDFsImproved(ByVal files As List(Of String), ByVal filename As String)
Using mem As New MemoryStream()
Dim readers As New List(Of PdfReader)
Using doc As New Document
Dim copy As New PdfCopy(doc, mem)
copy.SetMergeFields()
doc.Open()
For Each strfile As String In files
Dim reader As New PdfReader(strfile)
copy.AddDocument(reader)
readers.Add(reader)
Next
End Using
For Each reader As PdfReader In readers
reader.Close()
Next
HttpContext.Current.Response.Clear()
HttpContext.Current.Response.ContentType = "application/pdf"
HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
HttpContext.Current.Response.BinaryWrite(mem.ToArray)
HttpContext.Current.Response.OutputStream.Flush()
HttpContext.Current.Response.OutputStream.Close()
HttpContext.Current.Response.OutputStream.Dispose()
End Using
End Sub
Actually, I'm not sure whether it is a good idea to Close and Dispose the response output stream here; that shouldn't be the responsibility of a PDF merging method.
This is a related answer for the Java version of iText; you may want to read it for additional information. Unfortunately, many links in that answer are dead by now.
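Following up on the remark about the response stream, here is a minimal sketch of how the merging could be separated from the HTTP output. It uses the same PdfCopy calls as the method above (SetMergeFields and AddDocument). The module name PdfMergeSketch, the function name MergeToBytes and the empty-list guard are assumptions made for illustration; they are not part of iTextSharp or of the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module PdfMergeSketch
    ' Hypothetical helper: merges the given files, keeping form fields,
    ' and returns the merged PDF as a byte array instead of writing it
    ' to the HTTP response.
    Public Function MergeToBytes(files As List(Of String)) As Byte()
        ' Assumption: an empty input is treated as a caller error.
        If files Is Nothing OrElse files.Count = 0 Then
            Throw New ArgumentException("At least one source file is required.", "files")
        End If

        Dim readers As New List(Of PdfReader)
        Try
            Using mem As New MemoryStream()
                Using doc As New Document()
                    Dim copy As New PdfCopy(doc, mem)
                    copy.SetMergeFields()
                    doc.Open()
                    For Each sourcePath As String In files
                        Dim reader As New PdfReader(sourcePath)
                        readers.Add(reader)
                        copy.AddDocument(reader)
                    Next
                End Using ' Disposing the document finishes the merged PDF.
                Return mem.ToArray()
            End Using
        Finally
            ' Close the readers only after the document has been written.
            For Each reader As PdfReader In readers
                reader.Close()
            Next
        End Try
    End Function
End Module
```

The caller would then decide what to do with the bytes, for example pass them to HttpContext.Current.Response.BinaryWrite, and would own the lifetime of the response stream.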

Related

Unable To Read Text File Because Its Open By Another Process VB.Net

I am having problems trying to read a text file which is open by another process.
After searching SO I have found a few similar questions, albeit in C# and not VB.Net, which suggest that FileShare.ReadWrite is the key to getting this to work, yet I am still struggling with it.
This is what I have so far but nothing is appearing in TextBox1.
Dim logFileStream As FileStream = New FileStream("C:\test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
Dim logFileReader As StreamReader = New StreamReader(logFileStream)
While Not logFileReader.EndOfStream
Dim line As String = logFileReader.ReadLine()
TextBox1.Text = line
End While
logFileReader.Close()
logFileStream.Close()
My goal is to just use the last 2 lines of what's in the file c:\test.txt and display those contents into a Label but I guess I first need to read and show the content before I can start to look at just extracting the last 2 lines.
Update:
After revisiting the MS Docs, I have rearranged the code as below, and I now seem to be able to read the open file into a TextBox:
Dim strLogFilePath As String
Dim LogFileStream As FileStream
Dim LogFileReader As StreamReader
Dim strRowText As String
strLogFilePath = "C:\DSD\DSDPlus.srt"
LogFileStream = New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
LogFileReader = New StreamReader(LogFileStream)
strRowText = LogFileReader.ReadToEnd()
TextBox1.Text = strRowText
LogFileReader.Close()
LogFileStream.Close()
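For the stated goal of showing only the last 2 lines, one possible approach is sketched below. It opens the file with FileShare.ReadWrite as in the update above and keeps only the most recent lines while reading. The module name LogTail and the function name ReadLastLines are hypothetical, and the sketch reads the whole file, which is an assumption that the log is small enough for that to be acceptable.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Public Module LogTail
    ' Hypothetical helper: returns up to the last `count` lines of a file
    ' that another process may still have open for writing.
    Public Function ReadLastLines(filePath As String, count As Integer) As List(Of String)
        Dim tail As New Queue(Of String)()
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            Using reader As New StreamReader(fs)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    tail.Enqueue(line)
                    ' Drop the oldest line once more than `count` are buffered.
                    If tail.Count > count Then tail.Dequeue()
                    line = reader.ReadLine()
                End While
            End Using
        End Using
        Return New List(Of String)(tail)
    End Function
End Module
```

A call such as Label1.Text = String.Join(Environment.NewLine, ReadLastLines("C:\DSD\DSDPlus.srt", 2)) would then show the two most recent lines, where Label1 stands for whatever Label control is on the form.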

Winnovative Hides Text of TextElements on PDFs

There is a group somewhere in our organization that scans documents and converts them to PDFs. They then associate those PDFs with an "event" record and store them in a database. On demand, my application -- which uses Winnovative HTML to PDF v9.0.0.0 -- has to retrieve the PDFs associated with an event, place a header on the first page of each, and store them on the file system. This header is a TextElement.
On some PDFs, the header displays beautifully. On others, the header does not appear. However, when viewing the PDF, the header can be "highlighted" with the cursor and its text successfully copied, so the header is indeed present and properly positioned. (See the green arrow in the inserted image.)
I have identified two PDFs that were scanned by the same person thirty minutes apart and associated with the same event in the database. On one, the header is displayed; on the other, it is not. To investigate, I have set the BackColor of the TextElement to Crimson. The Text appears and doesn't appear as before, but the TextElement always appears bright red.
The properties of the two Document and PDFPage objects are identical, including the TransparencyEnabled property. This phenomenon is present in PDFs of all sorts of documents scanned by various people over time. And it's not just this header TextElement, but TextElements everywhere on the PDF (e.g. Page X of Y, watermarks). On a given PDF, if the Text of one is visible, the Text of all is visible, and vice versa.
I can find no pattern or explanation. What could be causing some PDFs to "hide" the Text (and only the Text) of all TextElements that I put on them while others don't?
Private Sub AddTitleToFirstPage(ByRef pdf As Document)
Dim headerSystemFont As New Font("Arial", 10)
Dim headerFont As PdfFont = pdf.Fonts.Add(headerSystemFont)
Dim headerTextElement As New TextElement(65, 20, "My Page Title", headerFont)
headerTextElement.TextAlign = HorizontalTextAlign.Center
headerTextElement.ForeColor = Color.DarkBlue
headerTextElement.BackColor = Color.Crimson
pdf.Pages(0).AddElement(headerTextElement)
End Sub
Friend Function UpdatePdfDoc(pdfBytes As Byte()) As Byte()
Dim bytes As Byte()
Using docStream As New MemoryStream(pdfBytes, 0, pdfBytes.Length)
Dim returnDoc As Document = New Document(docStream)
returnDoc.LicenseKey = WinnovativeLicenceKey
AddTitleToFirstPage(returnDoc)
bytes = returnDoc.Save()
docStream.Close()
End Using
Return bytes
End Function
Friend Function GetEventObjectPdfSource(scannedDocIds As List(Of String)) As Object
Dim scannedDocObjectPdfSourceList As New List(Of Byte())()
For Each scannedDocId As String In scannedDocIds
Dim scannedDocObjectPdfSource As Byte() = GetScannedDocBlobById(scannedDocId)
scannedDocObjectPdfSource = UpdatePdfDoc(scannedDocObjectPdfSource)
scannedDocObjectPdfSourceList.Add(scannedDocObjectPdfSource)
Next
Return scannedDocObjectPdfSourceList
End Function
Friend Function GetEventObjectPdf(eventId As String) As String
Dim pdfFileName As String = GetPDFFileName(eventId)
Dim scannedDocIds As List(Of String) = GetScannedDocumentsForEvent(eventId)
Dim objectPdfSourceList As List(Of Byte()) = CType(GetEventObjectPdfSource(scannedDocIds), List(Of Byte()))
For Each objectPdfSource As Byte() In objectPdfSourceList
Using docStream As New MemoryStream(objectPdfSource, 0, objectPdfSource.Length)
Dim masterDoc As New Document(docStream)
masterDoc.LicenseKey = WinnovativeLicenceKey
Do While masterDoc.Bookmarks.Count > 0
masterDoc.Bookmarks.Remove(0)
Loop
Try
masterDoc.AutoCloseAppendedDocs = True
masterDoc.Save(pdfFileName)
Catch ex As Threading.ThreadAbortException
Threading.Thread.ResetAbort()
Finally
masterDoc.DetachStream()
masterDoc.Close()
End Try
docStream.Close()
End Using
Next
Return pdfFileName
End Function
Please forgive the clunky code. I didn't write it. I just inherited it.

Increase page size in PDF documents to fit barcode (itextsharp)

I'm using vb.net to build a workflow where I'm processing a number of PDF files. One of the things I need to do is to place a barcode in the bottom left corner of the first page on each PDF document.
I already worked out the code to place the barcode but the problem is that it may cover existing content on the first page.
I want to increase the page size and add about 40 pixels of white space at the bottom of the first page where I can place the barcode. But I don't know how to do this!
Here is the existing code:
Public Sub addBarcodeToPdf(byval openPDFpath as string, byval savePDFpath as string, ByVal barcode As String)
Dim myPdf As PdfReader
Try
myPdf = New PdfReader(openPDFpath)
Catch ex As Exception
logEvent("LOAD PDF EXCEPTION " & ex.Message)
End Try
Dim stamper As PdfStamper = New PdfStamper(myPDF, New FileStream(savePDFpath, FileMode.Create))
Dim over As PdfContentByte = stamper.GetOverContent(1)
Dim textFont As BaseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED)
Dim BarcodeFont As BaseFont = BaseFont.CreateFont("c:\windows\fonts\FRE3OF9X.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED)
over.SetColorFill(BaseColor.BLACK)
over.BeginText()
over.SetFontAndSize(textFont, 15)
over.SetTextMatrix(30, 3)
over.ShowText(barcode)
over.EndText()
over.BeginText()
over.SetFontAndSize(BarcodeFont, 28)
over.SetTextMatrix(5, 16)
over.ShowText("*" & barcode & "*")
over.EndText()
stamper.Close()
myPdf.Close()
End Sub
Thank you in advance!
/M
Thank you, Bruno, for pointing me in the right direction. I haven't done volume testing yet, but I managed to get it to work on one PDF sample. Just changing the mediabox was not enough (I could only make the page size smaller), but when I changed the cropbox at the same time I got the results I was looking for.
Code in VB below for reference:
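Dim myPdf As PdfReader
Try
myPdf = New PdfReader(openPDFpath)
Catch ex As Exception
logEvent("LOAD PDF EXCEPTION " & ex.Message)
' Without a reader there is nothing to stamp, so stop here.
Return
End Try
Dim stamper As PdfStamper = New PdfStamper(myPdf, New FileStream(savePDFpath, FileMode.Create))
Dim pageDict As PdfDictionary = myPdf.GetPageN(1)
Dim mediabox As PdfArray = pageDict.GetAsArray(PdfName.MEDIABOX)
Dim cropbox As PdfArray = pageDict.GetAsArray(PdfName.CROPBOX)
' Page boxes are [llx, lly, urx, ury] in PDF user units (points by default, not pixels).
' Index 1 is the lower edge; setting it to -40 assumes the box originally started at y = 0.
Dim pixelsToAdd As Integer = -40
mediabox.Set(1, New PdfNumber(pixelsToAdd))
' The CropBox is optional, so only adjust it when the page defines one.
If cropbox IsNot Nothing Then
cropbox.Set(1, New PdfNumber(pixelsToAdd))
End If
stamper.Close()
myPdf.Close()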
Thanks
/M

RDLC local report trying to render before loading datasource

I have created a local report that contains a few sub reports. I am trying to load the report straight to pdf based off the click of a button.
I have tried showing the report with the report viewer and it shows up fine, but when I try to render directly to PDF I get an error about the data source. When I debug the code, I notice that my subreport processing function does not run until after the button click handler finishes and throws the error.
Dim reportParam As New ReportParameter("appId", appid)
Dim reportParam2 As New ReportParameter("appDate", appDate)
Dim reportParam3 As New ReportParameter("auditDate", Now)
Dim reportArray As New ReportParameterCollection
reportArray.Add(reportParam)
reportArray.Add(reportParam2)
reportArray.Add(reportParam3)
AddHandler ReportViewer1.LocalReport.SubreportProcessing, AddressOf SetSubDataSource
ReportViewer1.LocalReport.SetParameters(reportArray)
ObjectDataSource1.SelectParameters("appID").DefaultValue = appid
ObjectDataSource1.SelectParameters("appDate").DefaultValue = appDate
ObjectDataSource1.DataBind()
Dim warnings As Warning() = Nothing
Dim streamids As String() = Nothing
Dim mimeType As String = Nothing
Dim encoding As String = Nothing
Dim extension As String = Nothing
Try
ReportViewer1.DataBind()
ReportViewer1.LocalReport.Refresh()
Dim byteViewer As Byte()
byteViewer = ReportViewer1.LocalReport.Render("PDF", Nothing, mimeType, encoding, extension, streamids, warnings)
Response.Buffer = True
'Response.Clear()
Response.ContentType = mimeType
Response.AddHeader("content-disposition", "attachment; filename=test.pdf")
Response.BinaryWrite(byteViewer)
Response.OutputStream.Write(byteViewer, 0, byteViewer.Length)
Response.Flush()
Response.Close()
Thanks in advance for your advice. I am just trying to figure out how to get the report to load its data sources before it renders, not after this function is complete.

Render Html using RDL Local Report

I need to render HTML from RDL reports using the LocalReport class; I don't want to use ReportViewer for this. Is there any way I can enable HTML generation?
As far as I know, LocalReport cannot be exported to HTML (only Excel, Word and PDF are available). But if you are still interested in exporting, you can use the following:
Dim Report = New LocalReport
' Prepare the report the same way as for viewing (Datasource for RDL reports with ReportViewer)
Dim warnings As Warning() = Nothing
Dim streamids As String() = Nothing
Dim mimeType As String = Nothing
Dim encoding As String = Nothing
Dim extension As String = Nothing
Dim bytes As Byte() = Nothing
bytes = Report.Render(RenderFormat, Nothing, mimeType, encoding, extension, streamids, warnings)
Using fs As New IO.FileStream(RepPath, IO.FileMode.Create)
fs.Write(bytes, 0, bytes.Length)
fs.Close()
ReDim bytes(0)
End Using
You can get the list of available rendering extensions with Report.ListRenderingExtensions.
The ServerReport solution is similar, but more export formats are available.
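To check which formats a given installation actually offers, the extensions can be enumerated. The sketch below assumes the WebForms flavour of the ReportViewer assemblies (the WinForms namespace is Microsoft.Reporting.WinForms); the module name RenderFormatSketch and the function name GetRenderFormatNames are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports Microsoft.Reporting.WebForms

Public Module RenderFormatSketch
    ' Hypothetical helper: collects the names of the rendering extensions
    ' that the given LocalReport reports as available.
    Public Function GetRenderFormatNames(report As LocalReport) As List(Of String)
        Dim names As New List(Of String)
        For Each ext As RenderingExtension In report.ListRenderingExtensions()
            ' Name is the value to pass as the format argument of Render.
            names.Add(ext.Name)
        Next
        Return names
    End Function
End Module
```

Any name returned here should be usable as the RenderFormat argument in the Render call above.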