How to make invisible text visible with iText 4 in an existing PDF

I have a PDF which is created by scanning software. One image per page and hidden OCR'ed text.
I want to remove the images and make the text visible.
I found information on how to remove images (by replacing them with another image), but found no way to make the invisible text visible.
Sample PDF with image and hidden text
I tried below method, but it does not work:
Public Shared Sub UnhideText(ByVal strFileName As String)
    Dim pdf As iTextSharp.text.pdf.PdfReader = New iTextSharp.text.pdf.PdfReader(strFileName)
    Dim stp As iTextSharp.text.pdf.PdfStamper = New iTextSharp.text.pdf.PdfStamper(pdf, New IO.FileStream("e:\out.pdf", IO.FileMode.Create))
    'This does not work, the text remains invisible. I guess SetTextRenderingMode applies only to newly added text.
    For pageNumber As Integer = 1 To pdf.NumberOfPages
        Dim cb As iTextSharp.text.pdf.PdfContentByte = stp.GetOverContent(pageNumber)
        cb.SetTextRenderingMode(iTextSharp.text.pdf.PdfContentByte.TEXT_RENDER_MODE_FILL)
    Next
    stp.Close()
End Sub
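The guess in the code comment matches how the stamper works: GetOverContent returns a new content stream layered on top of the page, so a rendering mode set there never reaches the OCR text, which is drawn by the page's existing content stream (typically after a "3 Tr" operator, where rendering mode 3 means invisible). A rough sketch of one possible workaround is to rewrite that operator to "0 Tr" in the existing page content before stamping. It relies on PdfReader.GetPageContent and PdfReader.SetPageContent; the names MakeTextVisible and UnhideTextInContent and the regex-based replacement are illustrative assumptions, not a real content-stream parser. It assumes the operator sits in the page content itself (not in a form XObject), and it would also alter a matching byte sequence inside a string or inline image.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions
Imports iTextSharp.text.pdf

Public Module HiddenTextSketch
    ' Latin-1 maps every byte to exactly one character, so bytes survive the round trip.
    Private ReadOnly Latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

    ' Hypothetical helper: rewrites "3 Tr" (invisible) to "0 Tr" (fill) in one content stream.
    ' The lookbehind keeps operands such as "13" or "0.3" untouched.
    Public Function MakeTextVisible(content As String) As String
        Return Regex.Replace(content, "(?<![\d.])3(\s+)Tr\b", "0$1Tr")
    End Function

    ' Hypothetical helper: patches every page of inputPath and writes the result to outputPath.
    Public Sub UnhideTextInContent(inputPath As String, outputPath As String)
        Dim reader As New PdfReader(inputPath)
        Try
            For pageNumber As Integer = 1 To reader.NumberOfPages
                Dim raw As Byte() = reader.GetPageContent(pageNumber)
                Dim patched As String = MakeTextVisible(Latin1.GetString(raw))
                reader.SetPageContent(pageNumber, Latin1.GetBytes(patched))
            Next
            Using output As New FileStream(outputPath, FileMode.Create)
                ' The stamper only serializes the already modified reader.
                Dim stamper As New PdfStamper(reader, output)
                stamper.Close()
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

Even with the operator rewritten, the text stays covered wherever the page image is painted after it, so this step goes together with removing the images.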

Related

Stamp rotated text using itext7 in vb.net

I'm trying to convert some iTextSharp code to iText 7. The code stamps text, rotated by 90 degrees, on each page of a PDF. Unfortunately all the examples I can find are in C#, and although I can use an online translator, I'm having difficulties with this one.
The code below stamps my text at the specified coordinates on each page of a given PDF:
Shared Sub itext7_stamp_text_on_pdf(mypdfname As String, myfoldername As String)
    Dim src As String = myfoldername & "\" & mypdfname
    Dim dest As String = myfoldername & "\Stamped " & mypdfname
    Dim pdfDoc As PdfDocument = New PdfDocument(New PdfReader(src), New PdfWriter(dest))
    Dim document As Document = New Document(pdfDoc)
    Dim canvas As PdfCanvas
    Dim n As Integer = pdfDoc.GetNumberOfPages()
    For i = 1 To n
        Dim page As PdfPage = pdfDoc.GetPage(i)
        canvas = New PdfCanvas(page)
        With canvas
            .SetFontAndSize(PdfFontFactory.CreateFont(StandardFonts.HELVETICA), 12)
            .BeginText()
            .MoveText(100, 100)
            .ShowText("SAMPLE TEXT 100,100")
            .EndText()
        End With
    Next
    pdfDoc.Close()
End Sub
... but I can't see a way of rotating it to 90 degrees.
There's an example here if you use a paragraph:
https://kb.itextpdf.com/home/it7kb/examples/itext-7-building-blocks-chapter-2-rootelement-examples#iText7BuildingBlocksChapter2:RootElementexamples-c02e14_showtextaligned
... but I can't seem to translate this to vb.net. I can specify where the errors I get are, but I thought I'd be better asking this general question first in case there's a way to do this without using a paragraph.
Can anyone help please?
Thanks!
Well, after some more digging this code seems to work OK on the rotation part:
Dim pdf As New PdfDocument(New PdfReader(inpdf), New PdfWriter(outpdf))
Dim document As New Document(pdf)
document.ShowTextAligned("This is some test text", 400, 750, TextAlignment.CENTER, VerticalAlignment.MIDDLE, 0.5F * CSng(Math.PI))
document.Close()
End Sub
.... but it gets hidden behind existing content, so I need a way to make sure it's set to over content.
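For the original question of rotating without a paragraph, one option is to stay on PdfCanvas and set a rotated text matrix. The untested sketch below also draws into a fresh content stream appended after the existing ones and resets the fill color and text rendering mode, which may help when text ends up hidden. The names RotatedStampSketch and StampRotatedText are hypothetical, ColorConstants assumes iText 7.1 or later, and whether this cures the hidden text depends on why it is hidden in the particular PDF.

```vbnet
Imports System
Imports iText.IO.Font.Constants
Imports iText.Kernel.Colors
Imports iText.Kernel.Font
Imports iText.Kernel.Pdf
Imports iText.Kernel.Pdf.Canvas

Public Module RotatedStampSketch
    ' Hypothetical helper: stamps text rotated by 90 degrees at (100, 100) on every page.
    Public Sub StampRotatedText(src As String, dest As String)
        Dim pdfDoc As New PdfDocument(New PdfReader(src), New PdfWriter(dest))
        Dim font As PdfFont = PdfFontFactory.CreateFont(StandardFonts.HELVETICA)

        ' Rotation by 90 degrees counter-clockwise, expressed as a text matrix.
        Dim angle As Double = Math.PI / 2
        Dim cosA As Single = CSng(Math.Cos(angle))
        Dim sinA As Single = CSng(Math.Sin(angle))

        For i As Integer = 1 To pdfDoc.GetNumberOfPages()
            Dim page As PdfPage = pdfDoc.GetPage(i)
            ' New content stream placed after the page's existing content.
            Dim canvas As New PdfCanvas(page.NewContentStreamAfter(), page.GetResources(), pdfDoc)
            With canvas
                .SaveState()
                .SetFillColor(ColorConstants.BLACK)
                .BeginText()
                .SetFontAndSize(font, 12)
                .SetTextRenderingMode(PdfCanvasConstants.TextRenderingMode.FILL)
                ' The matrix both rotates the text and positions it at (100, 100).
                .SetTextMatrix(cosA, sinA, -sinA, cosA, 100, 100)
                .ShowText("SAMPLE TEXT 100,100")
                .EndText()
                .RestoreState()
            End With
        Next
        pdfDoc.Close()
    End Sub
End Module
```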

How to find xy position of text in pdf or image

I want my code to find the x/y position of text in a PDF or image so that I can crop out part of the page. The goal is to include any diagram that belongs to a question together with the question; each diagram is an image with text placed on top of it. I am currently using EJ2.PdfViewer from Syncfusion, but I am happy to use other packages that suit this purpose better.
My test code for reference if it will help:
Imports System
Imports System.Collections.Generic
Imports Syncfusion.EJ2.PdfViewer
Module Program
    Sub Main(args As String())
        Dim extraction As PdfRenderer = New PdfRenderer()
        extraction.Load("C:\math.pdf")
        Dim textCollection As List(Of TextData) = New List(Of TextData)
        Dim text As String = extraction.ExtractText(44, textCollection)
        Console.WriteLine(text)
    End Sub
End Module
To get the position of text in a PDF, you can use one of these libraries:
iText7: https://itextpdf.com/resources/api-documentation
Spire PDF: https://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html
To get the position of text in an image:
Google Vision API: https://cloud.google.com/vision/docs/ocr

Winnovative Hides Text of TextElements on PDFs

There is a group somewhere in our organization that scans documents and converts them to PDFs. They then associate those PDFs with an "event" record and store them in a database. On demand, my application -- which uses Winnovative HTML to PDF v9.0.0.0 -- has to retrieve the PDFs associated with an event, place a header on the first page of each, and store them on the file system. This header is a TextElement.
On some PDFs, the header displays beautifully. On others, the header does not appear. However, when viewing the PDF, the header can be "highlighted" with the cursor and its text successfully copied, so the header is indeed present and properly positioned. (See the green arrow in the inserted image.)
I have identified two PDFs that were scanned by the same person thirty minutes apart and associated with the same event in the database. On one, the header is displayed; on the other, it is not. To investigate, I have set the BackColor of the TextElement to Crimson. The Text appears and doesn't appear as before, but the TextElement always appears bright red.
The properties of the two Document and PDFPage objects are identical, including the TransparencyEnabled property. This phenomenon is present in PDFs of all sorts of documents scanned by various people over time. And it's not just this header TextElement, but TextElements everywhere on the PDF (e.g. Page X of Y, watermarks). On a given PDF, if the Text of one is visible, the Text of all is visible, and vice versa.
I can find no pattern or explanation. What could be causing some PDFs to "hide" the Text (and only the Text) of all TextElements that I put on them while others don't?
Private Sub AddTitleToFirstPage(ByRef pdf As Document)
    Dim headerSystemFont As New Font("Arial", 10)
    Dim headerFont As PdfFont = pdf.Fonts.Add(headerSystemFont)
    Dim headerTextElement As New TextElement(65, 20, "My Page Title", headerFont)
    headerTextElement.TextAlign = HorizontalTextAlign.Center
    headerTextElement.ForeColor = Color.DarkBlue
    headerTextElement.BackColor = Color.Crimson
    pdf.Pages(0).AddElement(headerTextElement)
End Sub

Friend Function UpdatePdfDoc(pdfBytes As Byte()) As Byte()
    Dim bytes As Byte()
    Using docStream As New MemoryStream(pdfBytes, 0, pdfBytes.Length)
        Dim returnDoc As Document = New Document(docStream)
        returnDoc.LicenseKey = WinnovativeLicenceKey
        AddTitleToFirstPage(returnDoc)
        bytes = returnDoc.Save()
        docStream.Close()
    End Using
    Return bytes
End Function

Friend Function GetEventObjectPdfSource(scannedDocIds As List(Of String)) As Object
    Dim scannedDocObjectPdfSourceList As New List(Of Byte())()
    For Each scannedDocId As String In scannedDocIds
        Dim scannedDocObjectPdfSource As Byte() = GetScannedDocBlobById(scannedDocId)
        scannedDocObjectPdfSource = UpdatePdfDoc(scannedDocObjectPdfSource)
        scannedDocObjectPdfSourceList.Add(scannedDocObjectPdfSource)
    Next
    Return scannedDocObjectPdfSourceList
End Function

Friend Function GetEventObjectPdf(eventId As String) As String
    Dim pdfFileName As String = GetPDFFileName(eventId)
    Dim scannedDocIds As List(Of String) = GetScannedDocumentsForEvent(eventId)
    Dim objectPdfSourceList As List(Of Byte()) = CType(GetEventObjectPdfSource1(scannedDocIds), List(Of Byte()))
    For Each objectPdfSource As Byte() In objectPdfSourceList
        Using docStream As New MemoryStream(objectPdfSource, 0, objectPdfSource.Length)
            Dim masterDoc As New Document(docStream)
            masterDoc.LicenseKey = WinnovativeLicenceKey
            Do While masterDoc.Bookmarks.Count > 0
                masterDoc.Bookmarks.Remove(0)
            Loop
            Try
                masterDoc.AutoCloseAppendedDocs = True
                masterDoc.Save(pdfFileName)
            Catch ex As Threading.ThreadAbortException
                Threading.Thread.ResetAbort()
            Finally
                masterDoc.DetachStream()
                masterDoc.Close()
            End Try
            docStream.Close()
        End Using
    Next
    Return pdfFileName
End Function
Please forgive the clunky code. I didn't write it. I just inherited it.

Missing deselected display items at the beginning of the list box (iTextSharp)

I’m getting unwanted results in Adobe Reader DC when generating or regenerating a multi-selection list box with iTextSharp in an Acroform PDF document.
Problem: The PDF form is missing deselected display items at the beginning of the list box when the modified PDF is viewed in Adobe Reader DC. For example, "One", "Two", "Three", "Four", "Five" are the list items, and "Two" and "Four" are selected; the preceding items, such as "One", are then missing from the top of the list box, and the first item displayed is the first selected one, in this case "Two". (See Adobe Reader DC Screenshot)
FYI: Using Adobe Reader DC, when I select different field selections in the list box, and then click outside the list box, the list box field reverts back to normal appearance with all the items shown. I can’t reproduce this behavior when opening the modified PDF in Adobe Acrobat Professional 8 and all the field items are visible and correctly selected. This missing list items behavior can also be reproduced in GhostScript when converting PDF to BMP or PNG.
Please answer my question: Can you please provide me with a resolution to this issue, whether it is an iTextSharp problem or my syntax is incorrect? Would you also please let me know if this behavior can be reproduced using your Adobe Reader DC?
Thank you for your support!
Modified Acroform PDF Document with issue:
http://www.nk-inc.com/listbox-error.pdf
Adobe Reader DC Screenshot:
(source: nk-inc.com)
ADDITIONAL INFORMATION:
iTextSharp.dll Version: 5.5.6.0
Adobe Reader DC Version: 2015.008.20082
Adobe Acrobat Pro Version: 8.x
Form Type: Acroform PDF
VB.NET CODE (v3.5 – Windows Application):
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports System.IO
Public Class listboxTest
    Private Sub RunTest()
        Dim cList As New listboxTest()
        Dim fn As String = Application.StartupPath.ToString.TrimEnd("\") & "\listbox-error.pdf"
        Dim b() As Byte = cList.addListBox(System.IO.File.ReadAllBytes(fn), New iTextSharp.text.Rectangle(231.67, 108.0, 395.67, 197.0), "ListBox1", "ListBox1", 1)
        File.WriteAllBytes(fn, b)
        Process.Start(fn)
    End Sub
    Public Function addListBox(ByVal pdfBytes() As Byte, ByVal newRect As Rectangle, ByVal newFldName As String, ByVal oldfldname As String, ByVal pg As Integer) As Byte()
        Dim pdfReaderDoc As New PdfReader(pdfBytes)
        Dim m As New System.IO.MemoryStream
        Dim b() As Byte = Nothing
        Try
            With New PdfStamper(pdfReaderDoc, m)
                Dim txtField As iTextSharp.text.pdf.TextField
                txtField = New iTextSharp.text.pdf.TextField(.Writer, newRect, newFldName)
                txtField.TextColor = BaseColor.BLACK
                txtField.BackgroundColor = BaseColor.WHITE
                txtField.BorderColor = BaseColor.BLACK
                txtField.FieldName = newFldName 'ListBox1
                txtField.Alignment = 0 'LEFT
                txtField.BorderStyle = 0 'SOLID
                txtField.BorderWidth = 1.0F 'THIN
                txtField.Visibility = TextField.VISIBLE
                txtField.Rotation = 0 'None
                txtField.Box = newRect '231.67, 108.0, 395.67, 197.0
                Dim opt As New PdfArray
                Dim ListBox_ItemDisplay As New List(Of String)
                ListBox_ItemDisplay.Add("One")
                ListBox_ItemDisplay.Add("Two")
                ListBox_ItemDisplay.Add("Three")
                ListBox_ItemDisplay.Add("Four")
                ListBox_ItemDisplay.Add("Five")
                Dim ListBox_ItemValue As New List(Of String)
                ListBox_ItemValue.Add("1X")
                ListBox_ItemValue.Add("2X")
                ListBox_ItemValue.Add("3X")
                ListBox_ItemValue.Add("4X")
                ListBox_ItemValue.Add("5X")
                txtField.Options += iTextSharp.text.pdf.TextField.MULTISELECT
                Dim selIndex As New List(Of Integer)
                Dim selValues As New List(Of String)
                selIndex.Add(CInt(1)) ' SELECT #1 (index)
                selIndex.Add(CInt(3)) ' SELECT #3 (index)
                txtField.Choices = ListBox_ItemDisplay.ToArray
                txtField.ChoiceExports = ListBox_ItemValue.ToArray
                txtField.ChoiceSelections = selIndex
                Dim listField As PdfFormField = txtField.GetListField
                If Not String.IsNullOrEmpty(oldfldname & "") Then
                    .AcroFields.RemoveField(oldfldname, pg)
                End If
                .AddAnnotation(listField, pg)
                .Writer.CloseStream = False
                .Close()
                If m.CanSeek Then
                    m.Position = 0
                End If
                b = m.ToArray
                m.Close()
                m.Dispose()
                pdfReaderDoc.Close()
            End With
            Return b.ToArray
        Catch ex As Exception
            Err.Clear()
        Finally
            b = Nothing
        End Try
        Return Nothing
    End Function
End Class
The reason why the visible list starts with the second entry is that iTextSharp starts drawing the list at the first selected entry.
This is an optimization for lists which have more (probably many more) entries than can be displayed in the fixed text box area, so that the displayed entries contain at least one interesting, i.e. selected, one.
Unfortunately, this optimization does not consider whether it leaves lines empty at the bottom, and for lists that fit completely into the text box there aren't even scroll bars to reveal the skipped entries.
But iTextSharp also offers a way to disable this optimization: you can explicitly set the first visible item:
txtField.ChoiceSelections = selIndex
txtField.VisibleTopChoice = 0 ' Top visible choice is start of list!
Dim listField As PdfFormField = txtField.GetListField
Adding this middle line makes the generated appearance start at the first list value.

How to save data to text file and retrieve

I'm using VB.NET. I am able to load the pictures from a folder into a FlowLayoutPanel, then load the clicked picture into a separate PictureBox and display the picture's file path in a label.
Now I want to be able to add a rating and a description to each image in the FlowLayoutPanel and save them to a text file in the folder from which the pictures were loaded. The app should be able to load the rating and description on the next launch or when the selected image changes. How do I accomplish this?
You should probably look at accessing the metadata of the picture, so that the info you want is carried with the file. The metadata is exposed through the PropertyItems property of the Image class, which returns an array of PropertyItem objects.
Here's a link to an answered question about adding a comment to a jpg. Hope this helps.
Here's an untested conversion of that code to VB.NET. You'll probably have to add a reference or two and import a couple of namespaces, but syntactically this is correct as near as I can tell.
' Needs references to PresentationCore and WindowsBase, plus
' Imports System.Drawing and Imports System.Windows.Media.Imaging.
Public Function SetImageComment(input As Image, comment As String) As Image
    Using memStream As New IO.MemoryStream()
        ' Re-encode the GDI+ image as JPEG so the WPF decoder can read it.
        input.Save(memStream, System.Drawing.Imaging.ImageFormat.Jpeg)
        memStream.Position = 0
        Dim decoder As New JpegBitmapDecoder(memStream, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad)
        Dim metadata As BitmapMetadata
        If decoder.Metadata Is Nothing Then
            metadata = New BitmapMetadata("jpg")
        Else
            metadata = decoder.Metadata
        End If
        metadata.Comment = comment
        Dim frame As BitmapFrame = decoder.Frames(0)
        Dim encoder As BitmapEncoder = New JpegBitmapEncoder()
        ' Create is a shared method, so call it on the type rather than on the instance.
        encoder.Frames.Add(BitmapFrame.Create(frame, frame.Thumbnail, metadata, frame.ColorContexts))
        ' This stream stays open on purpose: Image.FromStream needs it for the image's lifetime.
        Dim imageStream As New IO.MemoryStream
        encoder.Save(imageStream)
        imageStream.Position = 0
        input.Dispose()
        Return Image.FromStream(imageStream)
    End Using
End Function
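If a plain text file in the picture folder is preferred, as the question asks, a minimal sketch could keep one tab-separated line per image (file name, rating, description). The file name ratings.txt, the ImageNote class and the two helper names are hypothetical. The sketch assumes descriptions contain no tabs or line breaks, and it keys entries by file name rather than by full path.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical container for the data attached to one picture.
Public Class ImageNote
    Public Property Rating As Integer
    Public Property Description As String
End Class

Public Module ImageNotesStore
    ' Assumed name of the sidecar file stored next to the pictures.
    Private Const NotesFileName As String = "ratings.txt"

    ' Reads all notes from the folder; returns an empty dictionary if no file exists yet.
    Public Function LoadNotes(folder As String) As Dictionary(Of String, ImageNote)
        Dim notes As New Dictionary(Of String, ImageNote)(StringComparer.OrdinalIgnoreCase)
        Dim notesPath As String = Path.Combine(folder, NotesFileName)
        If Not File.Exists(notesPath) Then Return notes
        For Each line As String In File.ReadAllLines(notesPath)
            Dim parts As String() = line.Split(ControlChars.Tab)
            Dim parsedRating As Integer
            ' Skip malformed lines instead of failing the whole load.
            If parts.Length = 3 AndAlso Integer.TryParse(parts(1), parsedRating) Then
                notes(parts(0)) = New ImageNote With {.Rating = parsedRating, .Description = parts(2)}
            End If
        Next
        Return notes
    End Function

    ' Rewrites the whole file with one line per picture.
    Public Sub SaveNotes(folder As String, notes As Dictionary(Of String, ImageNote))
        Dim lines As New List(Of String)
        For Each pair As KeyValuePair(Of String, ImageNote) In notes
            lines.Add(String.Join(vbTab, pair.Key, pair.Value.Rating.ToString(), pair.Value.Description))
        Next
        File.WriteAllLines(Path.Combine(folder, NotesFileName), lines)
    End Sub
End Module
```

In the app, LoadNotes would run once after the folder is chosen, the dictionary would be looked up by Path.GetFileName of the selected picture whenever the selection changes, and SaveNotes would run after the user edits a rating or description.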