Winnovative Hides Text of TextElements on PDFs

There is a group somewhere in our organization that scans documents and converts them to PDFs. They then associate those PDFs with an "event" record and store them in a database. On demand, my application -- which uses Winnovative HTML to PDF v9.0.0.0 -- has to retrieve the PDFs associated with an event, place a header on the first page of each, and store them on the file system. This header is a TextElement.
On some PDFs, the header displays beautifully. On others, the header does not appear. However, when viewing the PDF, the header can be "highlighted" with the cursor and its text successfully copied, so the header is indeed present and properly positioned. (See the green arrow in the inserted image.)
I have identified two PDFs that were scanned by the same person thirty minutes apart and associated with the same event in the database. On one, the header is displayed; on the other, it is not. To investigate, I have set the BackColor of the TextElement to Crimson. The Text appears and doesn't appear as before, but the TextElement always appears bright red.
The properties of the two Document and PDFPage objects are identical, including the TransparencyEnabled property. This phenomenon is present in PDFs of all sorts of documents scanned by various people over time. And it's not just this header TextElement, but TextElements everywhere on the PDF (e.g. Page X of Y, watermarks). On a given PDF, if the Text of one is visible, the Text of all is visible, and vice versa.
I can find no pattern or explanation. What could be causing some PDFs to "hide" the Text (and only the Text) of all TextElements that I put on them while others don't?
Private Sub AddTitleToFirstPage(ByRef pdf As Document)
Dim headerSystemFont As New Font("Arial", 10)
Dim headerFont As PdfFont = pdf.Fonts.Add(headerSystemFont)
Dim headerTextElement As New TextElement(65, 20, "My Page Title", headerFont)
headerTextElement.TextAlign = HorizontalTextAlign.Center
headerTextElement.ForeColor = Color.DarkBlue
headerTextElement.BackColor = Color.Crimson
pdf.Pages(0).AddElement(headerTextElement)
End Sub
Friend Function UpdatePdfDoc(pdfBytes As Byte()) As Byte()
Dim bytes As Byte()
Using docStream As New MemoryStream(pdfBytes, 0, pdfBytes.Length)
Dim returnDoc As Document = New Document(docStream)
returnDoc.LicenseKey = WinnovativeLicenceKey
AddTitleToFirstPage(returnDoc)
bytes = returnDoc.Save()
docStream.Close()
End Using
Return bytes
End Function
Friend Function GetEventObjectPdfSource(scannedDocIds As List(Of String)) As Object
Dim scannedDocObjectPdfSourceList As New List(Of Byte())()
For Each scannedDocId As String In scannedDocIds
Dim scannedDocObjectPdfSource As Byte() = GetScannedDocBlobById(scannedDocId)
scannedDocObjectPdfSource = UpdatePdfDoc(scannedDocObjectPdfSource)
scannedDocObjectPdfSourceList.Add(scannedDocObjectPdfSource)
Next
Return scannedDocObjectPdfSourceList
End Function
Friend Function GetEventObjectPdf(eventId As String) As String
Dim pdfFileName As String = GetPDFFileName(eventId)
Dim scannedDocIds As List(Of String) = GetScannedDocumentsForEvent(eventId)
Dim objectPdfSourceList As List(Of Byte()) = CType(GetEventObjectPdfSource(scannedDocIds), List(Of Byte()))
For Each objectPdfSource As Byte() In objectPdfSourceList
Using docStream As New MemoryStream(objectPdfSource, 0, objectPdfSource.Length)
Dim masterDoc As New Document(docStream)
masterDoc.LicenseKey = WinnovativeLicenceKey
Do While masterDoc.Bookmarks.Count > 0
masterDoc.Bookmarks.Remove(0)
Loop
Try
masterDoc.AutoCloseAppendedDocs = True
masterDoc.Save(pdfFileName)
Catch ex As Threading.ThreadAbortException
Threading.Thread.ResetAbort()
Finally
masterDoc.DetachStream()
masterDoc.Close()
End Try
docStream.Close()
End Using
Next
Return pdfFileName
End Function
Please forgive the clunky code. I didn't write it. I just inherited it.

Related

PDF fill in not merging correctly

We are using an asp.net website with iTextSharp.dll version 5.5.13
We can merge multiple PDF files into one using a stream and it works perfectly. However, when we use a PDF that was created in a "fill-in" function the new PDF file does not correctly merge into the other documents. It merges without the filled in values. However, if I open the filled in PDF that it creates the filled in data displays and prints fine.
I have tried merging the new "filled in" PDF at a later time but it still displays the template file as though the filled in data was missing.
The code below fills in the data:
Dim strFileName As String = Path.GetFileNameWithoutExtension(strSourceFile)
Dim strOutPath As String = HttpContext.Current.Server.MapPath("~/Apps/Lifetime/OfficeDocs/Export/")
newFile = strOutPath & strFileName & " " & strRONumber & ".pdf"
If Not File.Exists(newFile) Then
Dim pdfReader As PdfReader = New PdfReader(strSourceFile)
Using pdfStamper As PdfStamper = New PdfStamper(pdfReader, New FileStream(newFile, FileMode.Create))
Dim pdfFormFields As AcroFields = pdfStamper.AcroFields
pdfFormFields.SetField("CUSTOMER NAME", strCustomer)
pdfFormFields.SetField("YR MK MODEL", strVehicle)
pdfFormFields.SetField("RO#", strRONumber)
pdfStamper.FormFlattening = False
pdfStamper.Dispose()
End Using
End If
The code below then merges the PDF files whose paths are passed to it:
Public Shared Sub MergePDFs(ByVal files As List(Of String), ByVal filename As String)
'Gets a list of full path files and merges into one memory stream
'and outputs it to a browser response.
Dim MemStream As New System.IO.MemoryStream
Dim doc As New iTextSharp.text.Document
Dim reader As iTextSharp.text.pdf.PdfReader
Dim numberOfPages As Integer
Dim currentPageNumber As Integer
Dim writer As iTextSharp.text.pdf.PdfWriter = iTextSharp.text.pdf.PdfWriter.GetInstance(doc, MemStream)
doc.Open()
Dim cb As iTextSharp.text.pdf.PdfContentByte = writer.DirectContent
Dim page As iTextSharp.text.pdf.PdfImportedPage
Dim strError As String = ""
For Each strfile As String In files
reader = New iTextSharp.text.pdf.PdfReader(strfile)
numberOfPages = reader.NumberOfPages
currentPageNumber = 0
Do While (currentPageNumber < numberOfPages)
currentPageNumber += 1
doc.SetPageSize(reader.GetPageSizeWithRotation(currentPageNumber))
doc.NewPage()
page = writer.GetImportedPage(reader, currentPageNumber)
cb.AddTemplate(page, 0, 0)
Loop
Next
doc.Close()
doc.Dispose()
If MemStream Is Nothing Then
HttpContext.Current.Response.Write("No Data is available for output")
Else
HttpContext.Current.Response.Clear()
HttpContext.Current.Response.ContentType = "application/pdf"
HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
HttpContext.Current.Response.BinaryWrite(MemStream.ToArray)
HttpContext.Current.Response.OutputStream.Flush()
HttpContext.Current.Response.OutputStream.Close()
HttpContext.Current.Response.OutputStream.Dispose()
MemStream.Close()
MemStream.Dispose()
End If
End Sub
I expect the "filled in" PDF in the list of files to retain the filled in data but it does not. Even if I try to merge the filled in file later it still comes up missing the filled in data. If I print the filled in file it looks perfect.
PdfWriter.GetImportedPage only returns you a copy of the page contents. This does not include any annotations, in particular not the widget annotations of form fields on the page at hand.
To also copy the annotations of the source pages, use the iText PdfCopy class instead. This class is designed to copy pages including all annotations. Furthermore, it includes methods to copy all pages of a source document in one step.
You have to tell the PdfCopy object to merge fields, otherwise the document-wide form structure won't be built.
As an aside, your code creates many PdfReader objects but does not close them. That may increase your memory requirements substantially.
Thus:
Public Shared Sub MergePDFsImproved(ByVal files As List(Of String), ByVal filename As String)
    Using mem As New MemoryStream()
        Dim readers As New List(Of PdfReader)
        Using doc As New Document
            Dim copy As New PdfCopy(doc, mem)
            copy.SetMergeFields()
            doc.Open()
            For Each strfile As String In files
                Dim reader As New PdfReader(strfile)
                copy.AddDocument(reader)
                readers.Add(reader)
            Next
        End Using
        For Each reader As PdfReader In readers
            reader.Close()
        Next
        HttpContext.Current.Response.Clear()
        HttpContext.Current.Response.ContentType = "application/pdf"
        HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
        HttpContext.Current.Response.BinaryWrite(mem.ToArray)
        HttpContext.Current.Response.OutputStream.Flush()
        HttpContext.Current.Response.OutputStream.Close()
        HttpContext.Current.Response.OutputStream.Dispose()
    End Using
End Sub
Actually, I'm not sure whether it is a good idea to Close and Dispose the response output stream here; that shouldn't be the responsibility of a PDF merging method.
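One way to keep the response handling out of the merging method is to have the method return the merged bytes and let the caller decide what to do with them. The following is only a sketch under that assumption; the module name PdfMergeSketch and the function name MergeToBytes are made up for illustration, while PdfCopy, SetMergeFields and AddDocument are the same iTextSharp 5.5 calls used above.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module PdfMergeSketch
    ' Hypothetical helper: merges the given files (including form fields)
    ' and returns the merged PDF as a byte array. It never touches the HTTP response.
    Public Function MergeToBytes(ByVal files As List(Of String)) As Byte()
        Dim readers As New List(Of PdfReader)
        Try
            Using mem As New MemoryStream()
                Using doc As New Document()
                    Dim copy As New PdfCopy(doc, mem)
                    ' Needed so that the document-wide form structure is built
                    copy.SetMergeFields()
                    doc.Open()
                    For Each sourcePath As String In files
                        Dim reader As New PdfReader(sourcePath)
                        ' Track the reader first so it is closed even if AddDocument fails
                        readers.Add(reader)
                        copy.AddDocument(reader)
                    Next
                End Using
                ' ToArray still works after the document has closed the stream
                Return mem.ToArray()
            End Using
        Finally
            ' Readers must stay open until the document is closed, so close them last
            For Each reader As PdfReader In readers
                reader.Close()
            Next
        End Try
    End Function
End Module
```

The calling page would then set the content type and headers and pass the returned array to Response.BinaryWrite itself, so the decision about flushing or closing the response stays with the code that owns the response.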
A related answer covers the Java version of iText and is worth reading for additional information. Unfortunately, many of the links in that answer are dead by now.
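If the merged output does not need editable fields, a different route (not the PdfCopy approach described above, just an alternative sketch) is to flatten the form while filling it. Flattening draws the field appearances into the page content, which is the part that GetImportedPage does copy. The module and method names below are hypothetical; the field names are the ones from the question.

```vbnet
Imports System.IO
Imports iTextSharp.text.pdf

Public Module FillAndFlattenSketch
    ' Hypothetical helper: fills the three fields from the question and
    ' flattens the form so the values become ordinary page content.
    Public Sub FillAndFlatten(sourceFile As String, targetFile As String,
                              customer As String, vehicle As String, roNumber As String)
        Dim reader As New PdfReader(sourceFile)
        Try
            Using output As New FileStream(targetFile, FileMode.Create)
                Using stamper As New PdfStamper(reader, output)
                    Dim fields As AcroFields = stamper.AcroFields
                    fields.SetField("CUSTOMER NAME", customer)
                    fields.SetField("YR MK MODEL", vehicle)
                    fields.SetField("RO#", roNumber)
                    ' True instead of False: the fields are removed and their
                    ' appearances are merged into the page when the stamper closes
                    stamper.FormFlattening = True
                End Using
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

The trade-off is that the resulting file no longer contains form fields, so this only fits cases where the filled-in values are final.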

Adding images into next available picturebox

I'm doing a deck builder project through a card database and so far when I click a row (using datagridview), the value contained in the "image_url" column is printed into an invisible textbox, which is then used to download that image and show it in a picturebox.
Now that all works fine, but decks go up to 60 cards, so I'm going to use 60 pictureboxes to show the user's selected cards. What I've been trying to do is keep a picture count that is increased by one each time they select a row, like this:
picturebox(picturecount) = textbox4.text
but I've run into too many errors. Would you know a way to display the user's selected card in the next available picturebox? For example, if they select "Dark Magician" three times, then the image of the "Dark Magician" is shown in the first three available pictureboxes.
VB.NET:
Private Async Sub PictureLoader()
Dim imageURL As String
If TextBox4.Text = "" Then
imageURL = dataSet.Tables("YGO cards").Rows(row_count).Item(7)
Else
imageURL = TextBox4.Text
End If
Dim client As Net.WebClient = New Net.WebClient()
Dim ms As MemoryStream = New MemoryStream(Await client.DownloadDataTaskAsync(New Uri(imageURL)))
Using image As Image = Image.FromStream(ms)
PictureBox1.Image?.Dispose()
PictureBox1.Image = DirectCast(image.Clone(), Image)
End Using
ms.Dispose()
client.Dispose()
End Sub
And this is the event handler that runs when a row is selected in the DataGridView:
Dim index As Integer
index = e.RowIndex
Dim selectedrow As DataGridViewRow
selectedrow = DataGridView1.Rows(index)
' selectedrow.Cells(1) is the image_Url column
TextBox4.Text = selectedrow.Cells(1).Value.ToString()
If TextBox4.Text = "" Then
PictureBox1.Image = Nothing
' imageURL = dataSet.Tables("YGO cards").Rows(row_count).Item(7)
Else
PictureLoader()
End If
Ignore for a moment the specifics of your particular problem and break this down into a generic statement. What you're saying is that you have a collection and that the size of the collection can grow or shrink based on user input. This is an ideal case for a List(Of T) where you declare the List by specifying the data type of the items it will hold and then add items as needed. Because you are storing the URL of the card, you would create a new List(Of String):
Dim cards As List(Of String) = New List(Of String)
Now, whenever you need to add URLs to your list, you call the Add method for a single URL or AddRange for multiple URLs:
cards.Add(TextBox1.Text)
'Or
cards.AddRange({TextBox1.Text, TextBox2.Text, TextBox3.Text})
As for displaying the image in the PictureBox, there is no need to create a MemoryStream and clone an Image, because the PictureBox class has Load and LoadAsync methods (it looks like you want the asynchronous behavior). If you want a PictureBox for every item in your collection, iterate through the collection, create a new PictureBox for each URL, call Load or LoadAsync with that URL, and add the PictureBox to the Form (or to a container in general). This can be done with a traditional For Each loop:
'Create a placeholder variable
Dim cardPictureBox As PictureBox
'Loop through every selected card URL
For Each url As String In cards
    'Create a new PictureBox (initializer members must be separated by commas)
    cardPictureBox = New PictureBox() With {
        .Size = New Size(100, 100),
        .SizeMode = PictureBoxSizeMode.CenterImage,
        .WaitOnLoad = False
    }
    'Add the PictureBox to the Form
    Me.Controls.Add(cardPictureBox)
    'Load the image asynchronously
    cardPictureBox.LoadAsync(url)
Next
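Note that the loop above creates every PictureBox at the default location, so on a plain Form they would sit on top of each other. A possible way to get the "next available slot" behavior from the question is to add the boxes to a FlowLayoutPanel, which positions its children one after another. This is a minimal sketch; the names DeckForm, cardPanel and AddCard are assumptions made for the example and do not come from the question.

```vbnet
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Windows.Forms

Public Class DeckForm
    Inherits Form

    ' Hypothetical container: a FlowLayoutPanel lays out its child controls automatically
    Private ReadOnly cardPanel As New FlowLayoutPanel() With {
        .Dock = DockStyle.Fill,
        .AutoScroll = True
    }

    ' URLs of the cards selected so far
    Private ReadOnly cards As New List(Of String)

    Public Sub New()
        Controls.Add(cardPanel)
    End Sub

    ' Remembers the URL and shows its image in the next free position of the panel
    Public Sub AddCard(url As String)
        cards.Add(url)
        Dim cardPictureBox As New PictureBox() With {
            .Size = New Size(100, 100),
            .SizeMode = PictureBoxSizeMode.Zoom
        }
        cardPanel.Controls.Add(cardPictureBox)
        ' Download and display the image without blocking the UI thread
        cardPictureBox.LoadAsync(url)
    End Sub
End Class
```

With this shape, the row-selection handler would only need to call AddCard with the selected image URL; selecting the same card three times simply adds three boxes in a row.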

How to make invisible text visible with iText 4 in an existing pdf

I have a PDF which is created by scanning software. One image per page and hidden OCR'ed text.
I want to remove the images and make the text visible.
I found info how to remove images (replace by another image) but found no way for making the invisible text visible.
Sample PDF with image and hidden text
I tried below method, but it does not work:
Public Shared Sub UnhideText(ByVal strFileName As String)
Dim pdf As iTextSharp.text.pdf.PdfReader = New iTextSharp.text.pdf.PdfReader(strFileName)
Dim stp As iTextSharp.text.pdf.PdfStamper = New iTextSharp.text.pdf.PdfStamper(pdf, New IO.FileStream("e:\out.pdf", IO.FileMode.Create))
'This does not work; the text remains invisible. I guess SetTextRenderingMode applies only to newly added text.
For pageNumber As Integer = 1 To pdf.NumberOfPages
Dim cb As iTextSharp.text.pdf.PdfContentByte = stp.GetOverContent(pageNumber)
cb.SetTextRenderingMode(iTextSharp.text.pdf.PdfContentByte.TEXT_RENDER_MODE_FILL)
Next
stp.Close()
End Sub

Missing deselected display items at the beginning of the list box (iTextSharp)

I’m getting unwanted results in Adobe Reader DC when generating or regenerating a multi-selection list box with iTextSharp in an Acroform PDF document.
Problem: When the modified PDF is viewed in Adobe Reader DC, deselected display items at the beginning of the list box are missing. For example, if the list items are “One”, “Two”, “Three”, “Four”, “Five”, and “Two” and “Four” are selected, then the preceding items such as “One” are missing from the top of the list box, and the first item displayed is the first selection, in this case “Two”. (See Adobe Reader DC Screenshot)
FYI: Using Adobe Reader DC, when I select different items in the list box and then click outside the list box, the field reverts to its normal appearance with all the items shown. I can’t reproduce this behavior when opening the modified PDF in Adobe Acrobat Professional 8; there, all the field items are visible and correctly selected. The missing list items can also be reproduced in GhostScript when converting the PDF to BMP or PNG.
Please answer my question: If this is an iTextSharp problem or my syntax is incorrect, can you provide a resolution to this issue? Would you also let me know whether this behavior can be reproduced in your Adobe Reader DC?
Thank you for your support!
Modified Acroform PDF Document with issue:
http://www.nk-inc.com/listbox-error.pdf
Adobe Reader DC Screenshot:
(source: nk-inc.com)
ADDITIONAL INFORMATION:
iTextSharp.dll Version: 5.5.6.0
Adobe Reader DC Version: 2015.008.20082
Adobe Acrobat Pro Version: 8.x
Form Type: Acroform PDF
VB.NET CODE (v3.5 – Windows Application):
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports System.IO
Public Class listboxTest
Private Sub RunTest()
Dim cList As New listboxTest()
Dim fn As String = Application.StartupPath.ToString.TrimEnd("\") & "\listbox-error.pdf"
Dim b() As Byte = cList.addListBox(System.IO.File.ReadAllBytes(fn), New iTextSharp.text.Rectangle(231.67, 108.0, 395.67, 197.0), "ListBox1", "ListBox1", 1)
File.WriteAllBytes(fn, b)
Process.Start(fn)
End Sub
Public Function addListBox(ByVal pdfBytes() As Byte, ByVal newRect As Rectangle, ByVal newFldName As String, ByVal oldfldname As String, ByVal pg As Integer) As Byte()
Dim pdfReaderDoc As New PdfReader(pdfBytes)
Dim m As New System.IO.MemoryStream
Dim b() As Byte = Nothing
Try
With New PdfStamper(pdfReaderDoc, m)
Dim txtField As iTextSharp.text.pdf.TextField
txtField = New iTextSharp.text.pdf.TextField(.Writer, newRect, newFldName)
txtField.TextColor = BaseColor.BLACK
txtField.BackgroundColor = BaseColor.WHITE
txtField.BorderColor = BaseColor.BLACK
txtField.FieldName = newFldName 'ListBox1
txtField.Alignment = 0 'LEFT
txtField.BorderStyle = 0 'SOLID
txtField.BorderWidth = 1.0F 'THIN
txtField.Visibility = TextField.VISIBLE
txtField.Rotation = 0 'None
txtField.Box = newRect '231.67, 108.0, 395.67, 197.0
Dim opt As New PdfArray
Dim ListBox_ItemDisplay As New List(Of String)
ListBox_ItemDisplay.Add("One")
ListBox_ItemDisplay.Add("Two")
ListBox_ItemDisplay.Add("Three")
ListBox_ItemDisplay.Add("Four")
ListBox_ItemDisplay.Add("Five")
Dim ListBox_ItemValue As New List(Of String)
ListBox_ItemValue.Add("1X")
ListBox_ItemValue.Add("2X")
ListBox_ItemValue.Add("3X")
ListBox_ItemValue.Add("4X")
ListBox_ItemValue.Add("5X")
txtField.Options += iTextSharp.text.pdf.TextField.MULTISELECT
Dim selIndex As New List(Of Integer)
Dim selValues As New List(Of String)
selIndex.Add(CInt(1)) ' SELECT #1 (index)
selIndex.Add(CInt(3)) ' SELECT #3 (index)
txtField.Choices = ListBox_ItemDisplay.ToArray
txtField.ChoiceExports = ListBox_ItemValue.ToArray
txtField.ChoiceSelections = selIndex
Dim listField As PdfFormField = txtField.GetListField
If Not String.IsNullOrEmpty(oldfldname & "") Then
.AcroFields.RemoveField(oldfldname, pg)
End If
.AddAnnotation(listField, pg)
.Writer.CloseStream = False
.Close()
If m.CanSeek Then
m.Position = 0
End If
b = m.ToArray
m.Close()
m.Dispose()
pdfReaderDoc.Close()
End With
Return b.ToArray
Catch ex As Exception
Err.Clear()
Finally
b = Nothing
End Try
Return Nothing
End Function
End Class
The reason why the visible list starts with the second entry is that iTextSharp starts drawing the list at the first selected entry.
This is an optimization for lists which have more (probably many more) entries than can be displayed in the fixed text box area, so that the displayed entries contain at least one interesting, i.e. selected, one.
Unfortunately, this optimization does not consider whether it leaves some lines empty at the bottom, and for lists that fit completely into the text box there aren't even scroll bars or anything.
But iTextSharp also offers a way to disable this optimization: you can explicitly set the first visible item:
txtField.ChoiceSelections = selIndex
txtField.VisibleTopChoice = 0 ' Top visible choice is start of list!
Dim listField As PdfFormField = txtField.GetListField
Adding this middle line makes the generated appearance start at the first list value.

PDFSharp Export JPG - ASP.NET

I am using an ajaxfileupload control to upload a pdf file to the server. On the server side, I'd like to convert the pdf to jpg. Using the PDFsharp Sample: Export Images as a guide, I've got the following:
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports PdfSharp.Pdf
Imports System.IO
Imports PdfSharp.Pdf.IO
Imports PdfSharp.Pdf.Advanced
Namespace Tools
Public Module ConvertImage
Public Sub pdf2JPG(pdfFile As String, jpgFile As String)
pdfFile = System.Web.HttpContext.Current.Request.PhysicalApplicationPath & "upload\" & pdfFile
Dim document As PdfSharp.Pdf.PdfDocument = PdfReader.Open(pdfFile)
Dim imageCount As Integer = 0
' Iterate pages
For Each page As PdfPage In document.Pages
' Get resources dictionary
Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
If resources IsNot Nothing Then
' Get external objects dictionary
Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
If xObjects IsNot Nothing Then
Dim items As ICollection(Of PdfItem) = xObjects.Elements.Values
' Iterate references to external objects
For Each item As PdfItem In items
Dim reference As PdfReference = TryCast(item, PdfReference)
If reference IsNot Nothing Then
Dim xObject As PdfDictionary = TryCast(reference.Value, PdfDictionary)
' Is external object an image?
If xObject IsNot Nothing AndAlso xObject.Elements.GetString("/Subtype") = "/Image" Then
ExportImage(xObject, imageCount)
End If
End If
Next
End If
End If
Next
End Sub
Private Sub ExportImage(image As PdfDictionary, ByRef count As Integer)
Dim filter As String = image.Elements.GetName("/Filter")
Select Case filter
Case "/DCTDecode"
ExportJpegImage(image, count)
Exit Select
Case "/FlateDecode"
ExportAsPngImage(image, count)
Exit Select
End Select
End Sub
Private Sub ExportJpegImage(image As PdfDictionary, ByRef count As Integer)
' Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
Dim stream As Byte() = image.Stream.Value
Dim fs As New FileStream([String].Format("Image{0}.jpeg", System.Math.Max(System.Threading.Interlocked.Increment(count), count - 1)), FileMode.Create, FileAccess.Write)
Dim bw As New BinaryWriter(fs)
bw.Write(stream)
bw.Close()
End Sub
Private Sub ExportAsPngImage(image As PdfDictionary, ByRef count As Integer)
Dim width As Integer = image.Elements.GetInteger(PdfImage.Keys.Width)
Dim height As Integer = image.Elements.GetInteger(PdfImage.Keys.Height)
Dim bitsPerComponent As Integer = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent)
' TODO: You can put the code here that converts from PDF internal image format to a Windows bitmap
' and use GDI+ to save it in PNG format.
' It is the work of a day or two for the most important formats. Take a look at the file
' PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
' We don't need that feature at the moment and therefore will not implement it.
' If you write the code for exporting images I would be pleased to publish it in a future release
' of PDFsharp.
End Sub
End Module
End Namespace
As I debug, it blows up on Dim filter As String = image.Elements.GetName("/Filter") in ExportImage. The message is:
Unhandled exception at line 336, column 21 in ~:46138/ScriptResource.axd?d=LGq0ri4wlMGBKd-1vxLjtxNH_pd26HaruaEG_1eWx-epwPmhNKVpO8IpfHoIHzVj2Arxn5804quRprX3HtHb0OmkZFRocFIG-7a-SJYT_EwYUd--x9AHktpraSBgoZk4VJ1RMtFNwl1mULDLid5o5U9iBcuDi4EQpbpswgBn_oI1&t=ffffffffda74082d
0x800a139e - JavaScript runtime error: error raising upload complete event and start new upload
Any thoughts on what the issue might be? It seems an issue with the ajaxfileupload control, but I don't understand why it would be barking here. It's neither here nor there, but I know I'm not using jpgFile yet.
PDFsharp cannot create JPEG Images from PDF pages:
http://pdfsharp.net/wiki/PDFsharpFAQ.ashx#Can_PDFsharp_show_PDF_files_Print_PDF_files_Create_images_from_PDF_files_3
The sample you refer to can extract JPEG Images that are included in PDF files. That's all. The sample does not cover all possible cases.
Long story short: the code you are showing seems unrelated to the error message. And it seems unrelated to your intended goal.