ITextSharp PdfReader not reading new text in different PDFs - vb.net

I have a Windows service application that reads the text of PDFs using iTextSharp. I'm using a textbox to show the text of the PDFs.
It works fine when it reads the first PDF, but when it reads the second PDF the text does not change: the textbox still shows the text of the first PDF. Here is my code:
dim vFileName as string
dim vFileEntries as string()
dim vPath as string = "C:\PDF"
if directory.exists(vPath) then
vFileEntries = directory.getfiles(vPath)
for each vFileName in vFileEntries
dim PR as PdfReader = new PdfReader(vFileName)
for CurrentPage as integer = 1 to PR.NumberOfPages
RichTextBox1.text = ""
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
PR.close()
next vFileName
end if
Any help is appreciated

The way your code is set up now, it looks like RichTextBox1.Text will contain only the text of the last page of the last PDF that got processed, because the box is cleared at the start of every page. The following change will bring in the text from all pages of all PDFs processed from your folder.
To make this happen you will want to change the following:
for CurrentPage as integer = 1 to PR.NumberOfPages
RichTextBox1.text = ""
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
to:
for CurrentPage as integer = 1 to PR.NumberOfPages
dim ltestrategy as LocationTextExtractionStrategy = New LocationTextExtractionStrategy
dim currentext as string = PdfTextExtractor.GetTextFromPage(PR, CurrentPage, ltestrategy)
RichTextBox1.Text = RichTextBox1.Text + currentext
next
where the line that cleared RichTextBox1.Text on every page is gone. currentext is declared fresh on each pass of the loop, so it needs no reset of its own. If the box should start out empty, clear it once before the outer loop over the files. This will give you the text of all of the PDFs, with all of their pages, in the text box.
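
A variation on the same idea, as a minimal sketch: collect the text in a StringBuilder and assign it to the text box once at the end, instead of re-reading and re-writing RichTextBox1.Text for every page. To keep the sketch runnable without iTextSharp, the page count and page extraction are passed in as delegates. CollectText, pageCount and extractPage are hypothetical names; in the real loop they would wrap PR.NumberOfPages and the PdfTextExtractor.GetTextFromPage call from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module PdfTextCollector
    ' Hypothetical helper: gathers the text of every page of every file into one string.
    ' pageCount stands in for PdfReader.NumberOfPages and extractPage stands in for
    ' PdfTextExtractor.GetTextFromPage(reader, page, strategy).
    Function CollectText(files As IEnumerable(Of String),
                         pageCount As Func(Of String, Integer),
                         extractPage As Func(Of String, Integer, String)) As String
        Dim sb As New StringBuilder()
        For Each fileName As String In files
            For page As Integer = 1 To pageCount(fileName)
                ' Append instead of overwrite, so earlier pages and files are kept.
                sb.AppendLine(extractPage(fileName, page))
            Next
        Next
        Return sb.ToString()
    End Function
End Module
```

The result would then be assigned in one step, for example RichTextBox1.Text = CollectText(vFileEntries, ...), after all files have been read.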

Related

How to match last line of text file in treeview and display text boxes in vb.net?

I have a problem with a sorted TreeView. I read the last line of a text file, and from it I extract the text of the last child node that was added to the TreeView. The trouble is that I can't get the matching node selected. I have tried using the number of lines in the file, but with no result. I have tried a bit of everything to make the selected node in the TreeView coincide with what is displayed in the text boxes. Below is my code. I don't know if I have made myself understood; my English is translated. Thank you. Claude.
Dim NbLine As Integer = 0
Dim SR As System.IO.StreamReader = New System.IO.StreamReader(OuvrirFichier)
While Not SR.EndOfStream
SR.ReadLine()
NbLine += 1
End While
SR.Close()
Dim lastLine As String = File.ReadLines(OuvrirFichier, Encoding.UTF8) _
.Where(Function(f As String) (Not String.IsNullOrEmpty(f))).Last.ToString
Dim mytext As String = lastLine.Substring(17, 90)
If NbLine > 0 Then
Dim lignesDuFichier As String() = File.ReadAllLines(OuvrirFichier, Encoding.UTF8)
Dim derniereLigne As String = lignesDuFichier(lignesDuFichier.Length - 1)
TreeView1.Focus()
TreeView1.SelectedNode = TreeView1.Nodes(0).Nodes(lignesDuFichier.Length - 1)
End If
See the comments inline.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim OuvrirFichier = "C:\Users\maryo\Desktop\Code\Test Empty Line.txt" '"path to file"
'At least you will only be reading the file once
Dim AllLines = File.ReadAllLines(OuvrirFichier)
Dim LinesWithContent = AllLines.Where(Function(s) s.Trim() <> String.Empty)
Dim lastLine = LinesWithContent.Last
Dim mytext As String = lastLine.Substring(17, 90)
Debug.Print(mytext) 'Just checking that you get what was expected
Dim NbLine = AllLines.Length
Dim derniereLigne As String = AllLines(NbLine - 1) 'Another variable to hold last line???
'But this time it could be a blank line.
TreeView1.Focus()
'This makes no sense. An index of a sub-node based on the number of lines in the text file
'is supposed to be the SelectedNode
'Why would this be the last node added?
TreeView1.SelectedNode = TreeView1.Nodes(0).Nodes(NbLine - 1)
'You never test the equality of the SelectedNode with mytext
End Sub
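
The last comment above suggests looking the node up by its text rather than by line count, which matters in a sorted TreeView because node order no longer follows file order. A minimal sketch of that lookup follows, with the node texts modeled as a plain list of strings so it runs without a form; in the real code they would come from the Text property of the child nodes. SafeSubstring and FindNodeIndex are hypothetical helpers. SafeSubstring is included because Substring(17, 90) throws when the line is shorter than 107 characters.

```vbnet
Imports System
Imports System.Collections.Generic

Module LastLineMatcher
    ' Returns up to length characters from start, or "" when the line is too short.
    Function SafeSubstring(line As String, start As Integer, length As Integer) As String
        If line Is Nothing OrElse start < 0 OrElse length < 0 OrElse start >= line.Length Then
            Return String.Empty
        End If
        Return line.Substring(start, Math.Min(length, line.Length - start))
    End Function

    ' Returns the index of the first node whose trimmed text equals wanted, or -1 if none does.
    Function FindNodeIndex(nodeTexts As IList(Of String), wanted As String) As Integer
        For i As Integer = 0 To nodeTexts.Count - 1
            If String.Equals(nodeTexts(i).Trim(), wanted.Trim(), StringComparison.Ordinal) Then
                Return i
            End If
        Next
        Return -1
    End Function
End Module
```

With the index in hand, the selection would become TreeView1.SelectedNode = TreeView1.Nodes(0).Nodes(index), guarded by a check that index is not -1.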

Showing the difference between two RichTextBox controls

I'm trying to compare the text of two RichTextBox controls and show the differences in a third RichTextBox. After making some changes to code that I got from this forum, it still has a problem: words that are not different show up in the third RichTextBox. The right-hand RichTextBox is filled from a text file that has been run through a Regex function before being displayed.
This is the source code used for the comparison:
Dim txt1(DispBox.Text.Split(" ").Length) As String
Dim txt2(DispBox2.Text.Split(" ").Length) As String
txt1 = DispBox.Text.Split(" ")
txt2 = DispBox2.Text.Split(" ")
Dim diff1 As String = "" 'Differences between 1 and 2
Dim diff2 As String = "" 'Differences between 2 and 1
Dim diffPosition As Integer ' Set where begin to find and select in RichTextBox
diffPosition = 1 ' Initialize
For Each diff As String In txt1
If Array.IndexOf(txt2, diff.ToString) = -1 Then
diff1 += diff.ToString & " "
With DispBox
.Find(diff, diffPosition, RichTextBoxFinds.None) ' Find and select diff in RichTextBox1 starting from position diffPosition in RichtextBox1
.SelectionFont = New Font(.Font, FontStyle.Bold) ' Set diff in Bold
.SelectionColor = Color.Blue ' Set diff in blue instead of black
.SelectionBackColor = Color.Yellow ' highlight in yellow
End With
End If
diffPosition = diffPosition + Len(diff) ' re-Initialize diffPostion to avoid to find and select the same text present more than once
Next
DispBox3.Visible = True
DispBox3.Text = diff1
this is my upload button code to check the regex function
Dim result As DialogResult = OpenFileDialog1.ShowDialog()
' Test result.
If result = Windows.Forms.DialogResult.OK Then
' Get the file name.
Dim path As String = OpenFileDialog1.FileName
Try
' Read in text.
Dim text As String = File.ReadAllText(path)
Dim postupload As String = Regex.Replace(text, "!", "")
DispBox2.Text = postupload
' For debugging.
Me.Text = text.Length.ToString
Catch ex As Exception
' Report an error.
Me.Text = "Error"
End Try
End If
Because the text file contains "!" between the lines, I would like to replace each "!" with a line break.
My problems are:
Why are the words "Building" and "hostname" counted as different words?
Why is a new word shown in the third RichTextBox not on a new line when it is found in the middle of a line?
Why are the other differing words not colored, bold, and highlighted?
Your code is splitting all the words based on a space, but it's ignoring the line breaks, so it makes "running-confing building" look like one word.
Try it this way:
Dim txt1 As String() = String.Join(" ", DispBox.Lines).Split(" ")
Dim txt2 As String() = String.Join(" ", DispBox2.Lines).Split(" ")
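
The same idea can be sketched without the controls: split on spaces, tabs, and line breaks in one step, then keep the words of the first text that do not occur in the second. This is only an illustration under stated assumptions; WordDiff, SplitWords and WordsOnlyInFirst are hypothetical names, and the comparison is case-sensitive.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module WordDiff
    ' Treat spaces, tabs, and both line-break characters as word separators.
    Private ReadOnly Separators As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}

    Function SplitWords(text As String) As String()
        ' RemoveEmptyEntries drops the empty strings produced by consecutive separators.
        Return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    ' Returns the words of first, in order, that do not appear anywhere in second.
    Function WordsOnlyInFirst(first As String, second As String) As List(Of String)
        Dim known As New HashSet(Of String)(SplitWords(second))
        Dim result As New List(Of String)()
        For Each word As String In SplitWords(first)
            If Not known.Contains(word) Then result.Add(word)
        Next
        Return result
    End Function
End Module
```

In the form, the call would be WordsOnlyInFirst(DispBox.Text, DispBox2.Text), and String.Join(Environment.NewLine, ...) on the result would put each differing word on its own line in the third box.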

How to replace the html tagged text in a word Document in VB.NET

I have VB.NET code that finds and replaces text in a Word document file (.docx). I am using OpenXml for this process.
But I want to replace only the HTML-tagged text, and to remove the tags once the new text has been put into the document.
My code is:
Public Sub SearchAndReplace(ByVal document As String)
Dim wordDoc As WordprocessingDocument = WordprocessingDocument.Open(document, True)
Using (wordDoc)
Dim docText As String = Nothing
Dim sr As StreamReader = New StreamReader(wordDoc.MainDocumentPart.GetStream)
Using (sr)
docText = sr.ReadToEnd
End Using
Dim regexText As Regex = New Regex("<ReplaceText>")
docText = regexText.Replace(docText, "Hi Everyone!")
Dim sw As StreamWriter = New StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create))
Using (sw)
sw.Write(docText)
End Using
End Using
End Sub
Here's an example to help you resolve your problem.
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim Text As String = "Blah<foo>Blah"
'Prints Text
Console.WriteLine(Text)
Dim regex As New Regex("(<)[]\w\/]+(>)")
'Prints Text after replace the in-between the capturing group 1 and 2.
'Capturing group are marked between parenthesis in the regex pattern
Console.WriteLine(regex.Replace(Text, "$1foo has been replaced.$2"))
'Update Text
Text = regex.Replace(Text, "$1foo has been replaced.$2")
'Remove starting tag
Dim p As Integer = InStr(Text, "<")
Text = Text.Remove(p - 1, 1)
'Remove trailing tag
Dim pp As Integer = InStr(Text, ">")
Text = Text.Remove(pp - 1, 1)
'Print Text
Console.WriteLine(Text)
Console.ReadLine()
End Sub
End Module
Output:
Blah<foo>Blah
Blah<foo has been replaced.>Blah
Blahfoo has been replaced.Blah
The above code will not function if you have multiple tags per line.
I would advise not to use regex to parse HTML.
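
For simple placeholders, the replace and the tag removal can be done in one pass, which also copes with several tags on one line. The following is a minimal sketch, not a general HTML parser: ReplaceTags and the values dictionary are hypothetical, and the pattern only matches a bare word between angle brackets. It works on a plain string. Text read from a .docx part is XML, so angle brackets typed in Word are likely stored escaped (&lt; and &gt;), in which case the pattern would need adjusting.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagReplacer
    ' Hypothetical helper: replaces each <Name> placeholder with its value and drops the brackets.
    ' Placeholders without an entry in values are left unchanged.
    Function ReplaceTags(text As String, values As IDictionary(Of String, String)) As String
        Return Regex.Replace(text, "<(\w+)>",
            Function(m As Match) As String
                Dim replacement As String = Nothing
                ' Group 1 is the name between the angle brackets.
                If values.TryGetValue(m.Groups(1).Value, replacement) Then Return replacement
                Return m.Value
            End Function)
    End Function
End Module
```

A call such as ReplaceTags(docText, values), with an entry mapping "ReplaceText" to "Hi Everyone!", would take the place of the regexText.Replace line in the question.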

ReportViewer.LocalReport.Render("PDF")

I am using Report Viewer for WinForms. I have a form that is used to view a local report, which works fine: in the report viewer the report renders on one page. But when I render the same report as a PDF, it is cut off, and the part that was cut off renders on a second page. In other words, the first part of the report is on page 1 and the rest is on page 2 of the PDF.
Code:
Private Function GetPDfReport() As String
Dim parameters = Me.GetReportParms()
Dim query = Me.GetReportQuery()
Dim rView As Microsoft.Reporting.WinForms.ReportViewer = New Microsoft.Reporting.WinForms.ReportViewer
rView.Dock = DockStyle.Fill
rView.SetDisplayMode(DisplayMode.PrintLayout)
Dim pnl As New Panel()
pnl.Name = "pnlMain"
pnl.Location = New System.Drawing.Point(0, 25)
pnl.Size = New System.Drawing.Size(734, 478)
pnl.Controls.Add(rView)
Dim dbReader As New dbReader()
Dim ds As DataSet = dbReader.DataSet(query)
Dim rds As Microsoft.Reporting.WinForms.ReportDataSource = New Microsoft.Reporting.WinForms.ReportDataSource("DataSet1", ds.Tables(0))
rView.ProcessingMode = Microsoft.Reporting.WinForms.ProcessingMode.Local
rView.LocalReport.DataSources.Add(rds)
rView.LocalReport.ReportEmbeddedResource = "EasyDose.rptIncident.rdlc"
If Not IsNothing(parameters) Then
Dim Bound0 As Integer = parameters.GetUpperBound(0)
Dim Bound1 As Integer = parameters.GetUpperBound(1)
For index = 0 To Bound0
Dim rParameter As New ReportParameter(parameters(index, 0), parameters(index, 1))
rView.LocalReport.SetParameters(rParameter)
Next
End If
Dim ps As PageSettings = rView.GetPageSettings
ps.Margins.Top = 0 ' 10mm approx
ps.Margins.Right = 0
ps.Margins.Bottom = 0
ps.Margins.Left = 0
ps.Landscape = False
'ps.PaperSize = New PaperSize("LetterExtra", (9.275 * 100), (12 * 100)) ' Letter paper (8.5 in. by 11 in.) ' Letter extra paper (9.275 in. by 12 in.)
ps.PaperSize = New PaperSize("A4", (8.27 * 100), (11.69 * 100))
rView.RefreshReport()
Dim exePath As String = System.IO.Path.GetDirectoryName(Application.ExecutablePath)
Dim dir As New DirectoryInfo(System.IO.Path.Combine(exePath, "tmpDir"))
Dim file As New FileInfo(System.IO.Path.Combine( _
dir.FullName, String.Format("Patient_Details_{0:yyyyMMdd_hhmmss}.pdf", DateTime.Now)))
If Not dir.Exists Then
dir.Create()
End If
Dim bytes As Byte() = rView.LocalReport.Render("PDF")
Using fs As New System.IO.FileStream(file.FullName, System.IO.FileMode.Create)
fs.Write(bytes, 0, bytes.Length)
fs.Close()
End Using
Return file.FullName
End Function
Are you viewing the local report in the embedded ReportViewer with the "Print Layout" option activated? That should show exactly the same output as your printed result.
If the PDF has problems, they are probably caused by the design of the report itself. Check the font, the page size and orientation, the margins, and the page breaks.
using System.IO;
byte[] rep = reportViewer1.LocalReport.Render("pdf", deviceInfo: "");
// if a certificate warning appears just ignore and re-run
File.WriteAllBytes(filepath+filename+".pdf",rep);
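
On the page size and margin advice above: the deviceInfo argument of Render can carry the page dimensions for the PDF renderer, and a body that is wider than the page minus its margins is a common reason for content spilling onto an extra page. Here is a minimal sketch, assuming the PDF device information element names PageWidth, PageHeight, and MarginTop/Left/Right/Bottom; BuildDeviceInfo and BodyFits are hypothetical helpers, and the A4 numbers in the usage note are the ones from the question.

```vbnet
Imports System
Imports System.Globalization

Module PdfDeviceInfo
    ' Builds a deviceInfo XML string for the PDF renderer; all sizes are in inches.
    Function BuildDeviceInfo(widthIn As Double, heightIn As Double, marginIn As Double) As String
        ' Invariant culture keeps "." as the decimal separator regardless of locale.
        Return String.Format(CultureInfo.InvariantCulture,
            "<DeviceInfo><OutputFormat>PDF</OutputFormat>" &
            "<PageWidth>{0}in</PageWidth><PageHeight>{1}in</PageHeight>" &
            "<MarginTop>{2}in</MarginTop><MarginLeft>{2}in</MarginLeft>" &
            "<MarginRight>{2}in</MarginRight><MarginBottom>{2}in</MarginBottom>" &
            "</DeviceInfo>", widthIn, heightIn, marginIn)
    End Function

    ' True when the report body plus the left and right margins fits the page width.
    Function BodyFits(bodyWidthIn As Double, pageWidthIn As Double, marginIn As Double) As Boolean
        Return bodyWidthIn + 2 * marginIn <= pageWidthIn
    End Function
End Module
```

The string would be passed as the second argument, for example rView.LocalReport.Render("PDF", BuildDeviceInfo(8.27, 11.69, 0)). The body width to check with BodyFits is the one set in the report designer.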

Create Table of Contents using iTextSharp

I'm working on some code that I can't get to work.
I have a program that takes multiple PDFs and merges them into one file. Now I need to create a table of contents on the first page. You can see examples of the documents below.
I would like to outsource this to someone who is an expert with iTextSharp. I don't think this will take more than an hour or two at the most.
The requirements are:
The TOC will be based on the bookmarks.
The TOC text will be linked to the proper page, so the user can click on the text to go to the page.
The existing bookmarks in sample1.pdf must remain.
The page numbers are already calculated, so you don't have to worry about that.
The working code must be part of the VB.Net project files I give you. I've tried several snippets without luck; I would like it to just work without me having to adapt the code.
The file I generate looks like this: http://gamepacks.org/sample1.pdf
The file with toc should look like this (layout, not the font style): http://gamepacks.org/sample2.pdf
I would appreciate anyone who can help me out.
The code I used to generate sample1.pdf looks like this to give you an idea what you need to work with.
Public Sub MergePdfFiles(ByVal docList As List(Of Portal.DocumentRow), ByVal outputPath As String)
'
' http://www.vbforums.com/showthread.php?475920-Merge-Pdf-Files-and-Add-Bookmarks-to-It-(Using-iTextSharp)
'
If docList.Count = 0 Then Exit Sub
Dim tmpFile As String = "c:\STEP_1_Working.pdf"
Dim OutlineList As List(Of PdfOutline) = New List(Of PdfOutline)
Dim FirstPageIndex As Integer = 1 ' Tracks which page to link the bookmark
Dim result As Boolean = False
Dim pdfCount As Integer = 0 'total input pdf file count
Dim fileName As String = String.Empty 'current input pdf filename
Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
Dim pageCount As Integer = 0 'current input pdf page count
Dim doc As iTextSharp.text.Document = Nothing 'the output pdf document
Dim writer As PdfWriter = Nothing
Dim cb As PdfContentByte = Nothing
'Declare a variable to hold the imported pages
Dim page As PdfImportedPage = Nothing
Dim rotation As Integer = 0
'Now loop thru the input pdfs
For Each row As Portal.DocumentRow In docList
reader = New iTextSharp.text.pdf.PdfReader(row.FilePath)
' Is this the first pdf file
If (row.Name = docList(0).Name) Then
doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)
writer = PdfWriter.GetInstance(doc, New IO.FileStream(tmpFile, IO.FileMode.Create))
' Always show the bookmarks
writer.ViewerPreferences = PdfWriter.PageModeUseOutlines
'Set metadata and open the document
With doc
.AddAuthor("Sample Title")
.AddCreationDate()
.Open()
End With
'Instantiate a PdfContentByte object
cb = writer.DirectContentUnder
End If
For i As Integer = 1 To reader.NumberOfPages
'Get the input page size
doc.SetPageSize(reader.GetPageSizeWithRotation(i))
'Create a new page on the output document
doc.NewPage()
'If it is the 1st page, we add bookmarks to the page
If i = 1 Then
If row.Parent = "" Then
Dim oline As PdfOutline = New PdfOutline(cb.RootOutline, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Name)
Else
Dim parent As PdfOutline = Nothing
For Each tmp As PdfOutline In cb.RootOutline.Kids
If tmp.Title = row.Parent Then
parent = tmp
End If
Next
' Create new group outline
If parent Is Nothing Then
parent = New PdfOutline(cb.RootOutline, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Parent)
End If
' Add to new parent
Dim oline As PdfOutline = New PdfOutline(parent, PdfAction.GotoLocalPage(FirstPageIndex, New PdfDestination(FirstPageIndex), writer), row.Name)
OutlineList.Add(oline)
End If
FirstPageIndex += reader.NumberOfPages
End If
'Now we get the imported page
page = writer.GetImportedPage(reader, i)
'Read the imported page's rotation
rotation = reader.GetPageRotation(i)
'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
If rotation = 90 Then
cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
ElseIf rotation = 270 Then
cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
Else
cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
End If
Next
Next
doc.Close()
End Sub
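
One part of the TOC requirement can be sketched independently of iTextSharp: the page bookkeeping. The merge code above tracks FirstPageIndex per document. Once TOC pages are inserted at the front, every target page shifts by the number of TOC pages. The sketch below only computes the (title, page) pairs; TocPlanner and PlanToc are hypothetical names, and drawing the TOC text and its clickable links with iTextSharp is not shown.

```vbnet
Imports System
Imports System.Collections.Generic

Module TocPlanner
    ' docs: one entry per merged document, as (title, number of pages), in merge order.
    ' tocPages: how many pages the table of contents itself occupies at the front.
    ' Returns one (title, first page in the merged file) entry per document.
    Function PlanToc(docs As IList(Of KeyValuePair(Of String, Integer)),
                     tocPages As Integer) As List(Of KeyValuePair(Of String, Integer))
        Dim entries As New List(Of KeyValuePair(Of String, Integer))()
        Dim nextPage As Integer = tocPages + 1
        For Each d As KeyValuePair(Of String, Integer) In docs
            entries.Add(New KeyValuePair(Of String, Integer)(d.Key, nextPage))
            ' Same bookkeeping as FirstPageIndex += reader.NumberOfPages in the merge loop.
            nextPage += d.Value
        Next
        Return entries
    End Function
End Module
```

With a one-page TOC and documents of 3 and 2 pages, the entries point at pages 2 and 5. The bookmarks created in the merge loop would need the same offset so that they keep pointing at the right pages.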