Create Table of Contents using iTextSharp

Create Table of Contents using iTextSharp - vb.net

I'm working on some code that I can't make it work.
I have a program that takes multiple pdf's and merges them into one file. Now I need to create a table of contents on the first page. You can see examples of the documents below.
I would like to outsource this to someone who is an expert with iTextSharp. I don't think this will take more than an hour or two the most.
The requirements are:
The toc will be based of the bookmarks.
The toc text will be linked to the proper page so the user can click on the text to go to the page.
The existing bookmarks in sampe1.pdf must remain.
The page numbers are already calculated, so do don't have to worry about that.
The working code must be part of the VB.Net project files I give you. I've tried several snippets without luck, I would like it to just work without me having to adapt the code.
The file I generate looks like this: http://gamepacks.org/sample1.pdf
The file with toc should look like this (layout, not the font style): http://gamepacks.org/sample2.pdf
I would appreciate anyone who can help me out.
The code I used to generate sample1.pdf looks like this to give you an idea what you need to work with.
Public Sub MergePdfFiles(ByVal docList As List(Of Portal.DocumentRow), ByVal outputPath As String)
'
' http://www.vbforums.com/showthread.php?475920-Merge-Pdf-Files-and-Add-Bookmarks-to-It-(Using-iTextSharp)
'
If docList.Count = 0 Then Exit Sub
Dim tmpFile As String = "c:\STEP_1_Working.pdf"
Dim OutlineList As List(Of PdfOutline) = New List(Of PdfOutline)
Dim FirstPageIndex As Integer = 1 ' Tracks which page to link the bookmark
Dim result As Boolean = False
Dim pdfCount As Integer = 0 'total input pdf file count
Dim fileName As String = String.Empty 'current input pdf filename
Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
Dim pageCount As Integer = 0 'current input pdf page count
Dim doc As iTextSharp.text.Document = Nothing 'the output pdf document
Dim writer As PdfWriter = Nothing
Dim cb As PdfContentByte = Nothing
'Declare a variable to hold the imported pages
Dim page As PdfImportedPage = Nothing
Dim rotation As Integer = 0
'Now loop thru the input pdfs
For Each row As Portal.DocumentRow In docList
reader = New iTextSharp.text.pdf.PdfReader(row.FilePath)
' Is this the first pdf file
If (row.Name = docList(0).Name) Then
doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)
writer = PdfWriter.GetInstance(doc, New IO.FileStream(tmpFile, IO.FileMode.Create))
' Always show the bookmarks
writer.ViewerPreferences = PdfWriter.PageModeUseOutlines
'Set metadata and open the document
With doc
.AddAuthor("Sample Title")
.AddCreationDate()
.Open()
End With
'Instantiate a PdfContentByte object
cb = writer.DirectContentUnder
End If
For i As Integer = 1 To reader.NumberOfPages
'Get the input page size
doc.SetPageSize(reader.GetPageSizeWithRotation(i))
'Create a new page on the output document
doc.NewPage()
'If it is the 1st page, we add bookmarks to the page
If i = 1 Then
If row.Parent = "" Then
Dim oline As PdfOutline = New PdfOutline(cb.RootOutline, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Name)
Else
Dim parent As PdfOutline = Nothing
For Each tmp As PdfOutline In cb.RootOutline.Kids
If tmp.Title = row.Parent Then
parent = tmp
End If
Next
' Create new group outline
If parent Is Nothing Then
parent = New PdfOutline(cb.RootOutline, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Parent)
End If
' Add to new parent
Dim oline As PdfOutline = New PdfOutline(parent, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Name)
OutlineList.Add(oline)
End If
FirstPageIndex += reader.NumberOfPages
End If
'Now we get the imported page
page = writer.GetImportedPage(reader, i)
'Read the imported page's rotation
rotation = reader.GetPageRotation(i)
'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
If rotation = 90 Then
cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
ElseIf rotation = 270 Then
cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
Else
cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
End If
Next
Next
doc.Close()
End Sub

Related

Footer and Watermark added by iTextSharp not appearing in Edge but OK in Chrome

I create pdfs using iTextSharp and add a footnote and watermark to these using a PdfStamper. This has been working fine. Recently the footers and watermarks have not been appearing when the pdfs are viewed in MS Edge. However, if I view the same pdf in Chrome the footers and watermarks appear correctly.
I store the pdfs in blob storage in Azure.
I have recently changed iTextSharp version from V4.1.2.0 to V5.5.13.
The code for adding watermarks and footers is as follows:
Dim byteArray As Byte()
Using stream As MemoryStream = New MemoryStream
reportBlockBlob.DownloadToStream(stream)
reader = New PdfReader(CType(stream.ToArray(), Byte()))
If reader IsNot Nothing Then
Using stamper As PdfStamper = New PdfStamper(reader, stream)
Dim PageCount As Integer = reader.NumberOfPages
If bReportInvalid Then
For i As Integer = 1 To PageCount
StampWaterMark(stamper, i, "INVALID", fontReport60, 35, New text.BaseColor(70, 70, 255), reader.GetPageSizeWithRotation(i))
Next
ElseIf Not UserRoles.Contains(WaspWAVB.con.csUserRoleCertificateSignOff) Then
For i As Integer = 1 To PageCount
StampWaterMark(stamper, i, "DRAFT", fontReport60, 35, New text.BaseColor(70, 70, 255), reader.GetPageSizeWithRotation(i))
Next
End If
If bAddFooter Then
Dim sRepUCN As String = "UCN"
Dim sCopyright As String = "Copyright"
Dim yPos As Integer = 12
Dim xLeftPos As Integer = 36
Dim xMidPos As Single = 297.5
Dim xRightPos As Integer = 559
For i As Integer = 3 To PageCount - iAppendixCount
ColumnText.ShowTextAligned(stamper.GetOverContent(i), text.Element.ALIGN_LEFT, New text.Phrase(sRepUCN, fontMedium), xLeftPos, yPos, 0)
ColumnText.ShowTextAligned(stamper.GetOverContent(i), text.Element.ALIGN_CENTER, New text.Phrase(sCopyright, fontMedium), xMidPos, yPos, 0)
Dim rttnPg As Integer = reader.GetPageRotation(i)
If rttnPg <> 0 Then
xRightPos = 806
End If
ColumnText.ShowTextAligned(stamper.GetOverContent(i), text.Element.ALIGN_RIGHT, New text.Phrase(String.Format(WaspWAVB.con.csPageXofY, i - 2, PageCount - 2 - iAppendixCount), fontMedium), xRightPos, yPos, 0)
Next
End If
End Using
byteArray = stream.ToArray()
End If
End Using
reportBlockBlob.Properties.ContentType = "application/pdf"
reportBlockBlob.UploadFromByteArray(byteArray, 0, byteArray.Length)
Public Shared Sub StampWaterMark(ByRef stamper As PdfStamper,
ByVal i As Integer,
ByVal watermark As String,
ByVal font As text.Font,
ByVal angle As Single,
ByVal color As text.BaseColor,
ByVal realPageSize As text.Rectangle,
Optional ByVal rect As text.Rectangle = Nothing)
Dim gstate = New PdfGState()
gstate.FillOpacity = 0.1F
gstate.StrokeOpacity = 0.3F
stamper.GetOverContent(i).SaveState()
stamper.GetOverContent(i).SetGState(gstate)
stamper.GetOverContent(i).SetColorFill(color)
stamper.GetOverContent(i).BeginText()
Dim ps = If(rect, realPageSize)
Dim x = (ps.Right + ps.Left) / 2
Dim y = (ps.Bottom + ps.Top) / 2
ColumnText.ShowTextAligned(stamper.GetOverContent(i), text.Element.ALIGN_CENTER, New text.Phrase(watermark, font), x, y, angle)
End Sub
I have tried re-arranging the order in which the footer and watermark are applied and commenting out either the addition of watermark or footer. None of this helps.
I use the code to add a footer elsewhere in the code and this works. Where it works, the footer is applied to an individual page. Where it doesn't work I have just collected together pages stored in separate blobs and amalgamated them into one memorystream. The footer and watermark are applied to this.
What is mystifying is that the pdf works fine in Chrome but not in Edge. This works either way round - i.e. if I create it in Chrome and view it in Edge, the footers disappear and if I create it in Edge and view it in Chrome, the footers appear.
Has anyone else seen this problem and knows how to solve it?

Out of memory exception when merging multiple pdf

I have to merge multiple pdf's into one pdf. I'm using iTextSHarp 5.5.10.0 to accomplish this, and it work fine when I merge small files( 2 or more, with 20 pages each), and even work when I try to merge 2 files, one with 3000 pages and the second with 1300 pages, but when I try to merge 3 pdf's with 3000 pages each, I get an out of memory exception.
I don't know how to solve this.
I'm using fileStream and not memoryStream.
I took the code from the answer to this question:
VB.Net Merge multiple pdfs into one and export
Public Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String) As Boolean
Dim result As Boolean = False
Dim pdfCount As Integer = 0 'total input pdf file count
Dim f As Integer = 0 'pointer to current input pdf file
Dim fileName As String
Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
Dim pageCount As Integer = 0
Dim pdfDoc As iTextSharp.text.Document = Nothing 'the output pdf document
Dim writer As PdfWriter = Nothing
Dim cb As PdfContentByte = Nothing
Dim page As PdfImportedPage = Nothing
Dim rotation As Integer = 0
Try
pdfCount = pdfFiles.Length
If pdfCount > 1 Then
'Open the 1st item in the array PDFFiles
fileName = pdfFiles(f)
reader = New iTextSharp.text.pdf.PdfReader(fileName)
'Get page count
pageCount = reader.NumberOfPages
pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)
writer = PdfWriter.GetInstance(pdfDoc, New FileStream(outputPath, FileMode.OpenOrCreate))
With pdfDoc
.Open()
End With
'Instantiate a PdfContentByte object
cb = writer.DirectContent
'Now loop thru the input pdfs
While f < pdfCount
'Declare a page counter variable
Dim i As Integer = 0
'Loop thru the current input pdf's pages starting at page 1
While i < pageCount
i += 1
'Get the input page size
pdfDoc.SetPageSize(reader.GetPageSizeWithRotation(i))
'Create a new page on the output document
pdfDoc.NewPage()
'If it is the 1st page, we add bookmarks to the page
'Now we get the imported page
page = writer.GetImportedPage(reader, i)
'Read the imported page's rotation
rotation = reader.GetPageRotation(i)
'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
If rotation = 90 Then
cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
ElseIf rotation = 270 Then
cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
Else
cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
End If
End While
'Increment f and read the next input pdf file
f += 1
If f < pdfCount Then
fileName = pdfFiles(f)
reader = New iTextSharp.text.pdf.PdfReader(fileName)
pageCount = reader.NumberOfPages
End If
End While
'When all done, we close the document so that the pdfwriter object can write it to the output file
pdfDoc.Close()
result = True
End If
Catch ex As Exception
Return False
End Try
Return result
End Function
I get the Exception from this line, in the end of the while, when trying to read the second file:
reader = New iTextSharp.text.pdf.PdfReader(fileName)
How can I solve this?

ITextSharp PdfReader not reading new text in different PDFs

I have a windows services application that reads the text of PDFs using ITextSharp. I'm using a textbox to show the text of PDFs.
It works fine when it reads the first PDF, but when it reads the second PDF, the text does not change and the text is still of the first PDF. Here is my code :
dim vFileName as string
dim vFileEntries as string()
dim vPath as string = "C:\PDF"
if directory.exists(vPath) then
vFileEntries = directory.getfiles(vPath)
for each vFileName in vFileEntries
dim PR as PdfReader = new PdfReader(vFileName)
for CurrentPage as integer = 1 to PR.NumberOfPages
RichTextBox1.text = ""
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
PR.close()
next vFileName
end if
Any help is appreciated

The way your code is set up now, it look like RichTextBox1.text will contain the text of the last page of the last pdf that got processed. The following change will bring in the text from all pages of all pdfs processed from your folder.
To make this happen you will want to change the following:
for CurrentPage as integer = 1 to PR.NumberOfPages
RichTextBox1.text = ""
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
to:
for CurrentPage as integer = 1 to PR.NumberOfPages
currenttext = ""
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
where you are re-initializing currentext instead of RichTextBox1.text. This will give you the text of all of the pdfs, with all of their pages, to the text box.

ReportViewer.LocalReport.Render("PDF")

I am using Report Viewer for WinForms. The problem i am having is this: I have a form that contains a form which is used to view a local report which work fine, but when I try to render the same report as a PDF, it is cut-off, but in report viewer the same report renders a report on one page. When I render to PDF it cuts it off and the part of the report that was cut-off renders on a 2nd page. So in other words, part of the same report is on page 1, and 2nd half is on 2nd page in the PDF?
Code:
Private Function GetPDfReport() As String
Dim parameters = Me.GetReportParms()
Dim query = Me.GetReportQuery()
Dim rView As Microsoft.Reporting.WinForms.ReportViewer = New Microsoft.Reporting.WinForms.ReportViewer
rView.Dock = DockStyle.Fill
rView.SetDisplayMode(DisplayMode.PrintLayout)
Dim pnl As New Panel()
pnl.Name = "pnlMain"
pnl.Location = New System.Drawing.Point(0, 25)
pnl.Size = New System.Drawing.Size(734, 478)
pnl.Controls.Add(rView)
Dim dbReader As New dbReader()
Dim ds As DataSet = dbReader.DataSet(query)
Dim rds As Microsoft.Reporting.WinForms.ReportDataSource = New Microsoft.Reporting.WinForms.ReportDataSource("DataSet1", ds.Tables(0))
rView.ProcessingMode = Microsoft.Reporting.WinForms.ProcessingMode.Local
rView.LocalReport.DataSources.Add(rds)
rView.LocalReport.ReportEmbeddedResource = "EasyDose.rptIncident.rdlc"
If Not IsNothing(parameters) Then
Dim Bound0 As Integer = parameters.GetUpperBound(0)
Dim Bound1 As Integer = parameters.GetUpperBound(1)
For index = 0 To Bound0
Dim rParameter As New ReportParameter(parameters(index, 0), parameters(index, 1))
rView.LocalReport.SetParameters(rParameter)
Next
End If
Dim ps As PageSettings = rView.GetPageSettings
ps.Margins.Top = 0 ' 10mm approx
ps.Margins.Right = 0
ps.Margins.Bottom = 0
ps.Margins.Left = 0
ps.Landscape = False
'ps.PaperSize = New PaperSize("LetterExtra", (9.275 * 100), (12 * 100)) ' Letter paper (8.5 in. by 11 in.) ' Letter extra paper (9.275 in. by 12 in.)
ps.PaperSize = New PaperSize("A4", (8.27 * 100), (11.69 * 100))
rView.RefreshReport()
Dim exePath As String = System.IO.Path.GetDirectoryName(Application.ExecutablePath)
Dim dir As New DirectoryInfo(System.IO.Path.Combine(exePath, "tmpDir"))
Dim file As New FileInfo(System.IO.Path.Combine( _
dir.FullName, String.Format("Patient_Details_{0:yyyyMMdd_hhmmss}.pdf", DateTime.Now)))
If Not dir.Exists Then
dir.Create()
End If
Dim bytes As Byte() = rView.LocalReport.Render("PDF")
Using fs As New System.IO.FileStream(file.FullName, System.IO.FileMode.Create)
fs.Write(bytes, 0, bytes.Length)
fs.Close()
End Using
Return file.FullName
End Function

are you seeing the local report in the embedded ReportViewer using the "Print Layout" option activated? That should show exactly the same output as your printed result.
If you have problems in the PDF is probably caused by the design of the report itself. Check the font, the page size and orientation, the margins, the page breaks.

uisng System.IO;
byte[] rep = reportViewer1.LocalReport.Render("pdf", deviceInfo: "");
// if a certificate warning appears just ignore and re-run
File.WriteAllBytes(filepath+filename+".pdf",rep);

how to highlight a text or word in a pdf file using iTextsharp?

I need to search a word in a existing pdf file and i want to highlight the text or word
and save the pdf file
I have an idea using PdfAnnotation.CreateMarkup we could find the position of the text and we can add bgcolor to it...but i dont know how to implement it :(
Please help me out

This is one of those "sounds easy but is actually really complicated" things. See Mark's posts here and here. Ultimately you'll probably be pointed to LocationTextExtractionStrategy. Good luck! If you actually find out how to do it post it here, there several people wondering exactly what you are wondering!

I've found how to do this, just in case someone needs to get words or sentences with locations (coordinates) from a PDF document you'll find this example Project
HERE
, I used VB.NET 2010 for this. Remember to add a reference to your iTextSharp DLL in this Project.
I added my own TextExtraction Strategy Class, based on Class LocationTextExtractionStrategy. I focused on TextChunks, because they already have these coordinates.
There are some known limitations like:
No multiple line searches (phrases), just char/s or word's or a one line sentence are allowed.
It Won't work with rotated text.
I didn't test on PDFs with landscape page orientation but i assume some modifications may be required for this.
In case you need to draw this HighLight/rectangles over a watermark you'll need to add/modify some code, but just code in the Form, this is not related to the text/locations extraction proccess.

#Jcis, I actually managed a workaround for handling multiple searches using your example as a starting point. I use your project as a reference in a c# project, and altered what it does. Instead of just highlighting I actually have it drawing a white rectangle around the search term, and then using the rectangle coordinates, place a form field. I also had to swap the contentbyte writing mode to getovercontent so that I block out the searched text entirely. What I actually did was to create a string array of search terms, and then using a for loop, I create as many different text fields as I need.
Test.Form1 formBuilder = new Test.Form1();
string[] fields = new string[] { "%AccountNumber%", "%MeterNumber%", "%EmailFieldHolder%", "%AddressFieldHolder%", "%EmptyFieldHolder%", "%CityStateZipFieldHolder%", "%emptyFieldHolder1%", "%emptyFieldHolder2%", "%emptyFieldHolder3%", "%emptyFieldHolder4%", "%emptyFieldHolder5%", "%emptyFieldHolder6%", "%emptyFieldHolder7%", "%emptyFieldHolder8%", "%SiteNameFieldHolder%", "%SiteNameFieldHolderWithExtraSpace%" };
//int a = 0;
for (int a = 0; a < fields.Length; )
{
string[] fieldNames = fields[a].Split('%');
string[] fieldName = Regex.Split(fieldNames[1], "Field");
formBuilder.PDFTextGetter(fields[a], StringComparison.CurrentCultureIgnoreCase, htmlToPdf, finalhtmlToPdf, fieldName[0]);
File.Delete(htmlToPdf);
System.Array.Clear(fieldNames, 0, 2);
System.Array.Clear(fieldName, 0, 1);
a++;
if (a == fields.Length)
{
break;
}
string[] fieldNames1 = fields[a].Split('%');
string[] fieldName1 = Regex.Split(fieldNames1[1], "Field");
formBuilder.PDFTextGetter(fields[a], StringComparison.CurrentCultureIgnoreCase, finalhtmlToPdf, htmlToPdf, fieldName1[0]);
File.Delete(finalhtmlToPdf);
System.Array.Clear(fieldNames1, 0, 2);
System.Array.Clear(fieldName1, 0, 1);
a++;
}
It bounces the PDFTextGetter function in your example back and forth between two files until I achieve the finished product. It works really well, and it would not have been possible without your initial project, so thank you for that. I also altered your VB to do the text field mapping like so;
For Each rect As iTextSharp.text.Rectangle In MatchesFound
cb.Rectangle(rect.Left, rect.Bottom + 1, rect.Width, rect.Height + 4)
Dim field As New TextField(stamper.Writer, rect, FieldName & Fields)
Dim form = stamper.AcroFields
Dim fieldKeys = form.Fields.Keys
stamper.AddAnnotation(field.GetTextField(), page)
Fields += 1
Next
Just figured I would share what I managed to do with your project as a backbone. It even increments the field names as I need them to. I also had to add a new parameter to your function, but that's not worth listing here. Thank you again for this great head start.

Thanks Jcis!
After a couple of hours of research and thinking, i found your solution, which helped me to solve my Problems.
there were 2 little bugs.
first: the stamper needs to be closed before the reader, otherwise it throws an exception.
Public Sub PDFTextGetter(ByVal pSearch As String, ByVal SC As StringComparison, ByVal SourceFile As String, ByVal DestinationFile As String)
Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
Dim cb As iTextSharp.text.pdf.PdfContentByte = Nothing
Me.Cursor = Cursors.WaitCursor
If File.Exists(SourceFile) Then
Dim pReader As New PdfReader(SourceFile)
stamper = New iTextSharp.text.pdf.PdfStamper(pReader, New System.IO.FileStream(DestinationFile, FileMode.Create))
PB.Value = 0 : PB.Maximum = pReader.NumberOfPages
For page As Integer = 1 To pReader.NumberOfPages
Dim strategy As myLocationTextExtractionStrategy = New myLocationTextExtractionStrategy
'cb = stamper.GetUnderContent(page)
cb = stamper.GetOverContent(page)
Dim state As New PdfGState()
state.FillOpacity = 0.3F
cb.SetGState(state)
'Send some data contained in PdfContentByte, looks like the first is always cero for me and the second 100, but i'm not sure if this could change in some cases
strategy.UndercontentCharacterSpacing = cb.CharacterSpacing
strategy.UndercontentHorizontalScaling = cb.HorizontalScaling
'It's not really needed to get the text back, but we have to call this line ALWAYS,
'because it triggers the process that will get all chunks from PDF into our strategy Object
Dim currentText As String = PdfTextExtractor.GetTextFromPage(pReader, page, strategy)
'The real getter process starts in the following line
Dim MatchesFound As List(Of iTextSharp.text.Rectangle) = strategy.GetTextLocations(pSearch, SC)
'Set the fill color of the shapes, I don't use a border because it would make the rect bigger
'but maybe using a thin border could be a solution if you see the currect rect is not big enough to cover all the text it should cover
cb.SetColorFill(BaseColor.PINK)
'MatchesFound contains all text with locations, so do whatever you want with it, this highlights them using PINK color:
For Each rect As iTextSharp.text.Rectangle In MatchesFound
' cb.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height)
cb.SaveState()
cb.SetColorFill(BaseColor.YELLOW)
cb.Rectangle(rect.Left, rect.Bottom, rect.Width, rect.Height)
cb.Fill()
cb.RestoreState()
Next
'cb.Fill()
PB.Value = PB.Value + 1
Next
stamper.Close()
pReader.Close()
End If
Me.Cursor = Cursors.Default
End Sub
second: your solution dont work, when the searched text is in the last line of the extraced text.
Public Function GetTextLocations(ByVal pSearchString As String, ByVal pStrComp As System.StringComparison) As List(Of iTextSharp.text.Rectangle)
Dim FoundMatches As New List(Of iTextSharp.text.Rectangle)
Dim sb As New StringBuilder()
Dim ThisLineChunks As List(Of TextChunk) = New List(Of TextChunk)
Dim bStart As Boolean, bEnd As Boolean
Dim FirstChunk As TextChunk = Nothing, LastChunk As TextChunk = Nothing
Dim sTextInUsedChunks As String = vbNullString
' For Each chunk As TextChunk In locationalResult
For j As Integer = 0 To locationalResult.Count - 1
Dim chunk As TextChunk = locationalResult(j)
If chunk.text.Contains(pSearchString) Then
Thread.Sleep(1)
End If
If ThisLineChunks.Count > 0 AndAlso (Not chunk.SameLine(ThisLineChunks.Last) Or j = locationalResult.Count - 1) Then
If sb.ToString.IndexOf(pSearchString, pStrComp) > -1 Then
Dim sLine As String = sb.ToString
'Check how many times the Search String is present in this line:
Dim iCount As Integer = 0
Dim lPos As Integer
lPos = sLine.IndexOf(pSearchString, 0, pStrComp)
Do While lPos > -1
iCount += 1
If lPos + pSearchString.Length > sLine.Length Then Exit Do Else lPos = lPos + pSearchString.Length
lPos = sLine.IndexOf(pSearchString, lPos, pStrComp)
Loop
'Process each match found in this Text line:
Dim curPos As Integer = 0
For i As Integer = 1 To iCount
Dim sCurrentText As String, iFromChar As Integer, iToChar As Integer
iFromChar = sLine.IndexOf(pSearchString, curPos, pStrComp)
curPos = iFromChar
iToChar = iFromChar + pSearchString.Length - 1
sCurrentText = vbNullString
sTextInUsedChunks = vbNullString
FirstChunk = Nothing
LastChunk = Nothing
'Get first and last Chunks corresponding to this match found, from all Chunks in this line
For Each chk As TextChunk In ThisLineChunks
sCurrentText = sCurrentText & chk.text
'Check if we entered the part where we had found a matching String then get this Chunk (First Chunk)
If Not bStart AndAlso sCurrentText.Length - 1 >= iFromChar Then
FirstChunk = chk
bStart = True
End If
'Keep getting Text from Chunks while we are in the part where the matching String had been found
If bStart And Not bEnd Then
sTextInUsedChunks = sTextInUsedChunks & chk.text
End If
'If we get out the matching String part then get this Chunk (last Chunk)
If Not bEnd AndAlso sCurrentText.Length - 1 >= iToChar Then
LastChunk = chk
bEnd = True
End If
'If we already have first and last Chunks enclosing the Text where our String pSearchString has been found
'then it's time to get the rectangle, GetRectangleFromText Function below this Function, there we extract the pSearchString locations
If bStart And bEnd Then
FoundMatches.Add(GetRectangleFromText(FirstChunk, LastChunk, pSearchString, sTextInUsedChunks, iFromChar, iToChar, pStrComp))
curPos = curPos + pSearchString.Length
bStart = False : bEnd = False
Exit For
End If
Next
Next
End If
sb.Clear()
ThisLineChunks.Clear()
End If
ThisLineChunks.Add(chunk)
sb.Append(chunk.text)
Next
Return FoundMatches
End Function

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Create Table of Contents using iTextSharp - vb.net

Related

Footer and Watermark added by iTextSharp not appearing in Edge but OK in Chrome

Out of memory exception when merging multiple pdf

ITextSharp PdfReader not reading new text in different PDFs

ReportViewer.LocalReport.Render("PDF")

how to highlight a text or word in a pdf file using iTextsharp?

Categories

Resources