Itext 7 IPdfTextLocation.GetPageNumber turning up 0 all the time - vb.net

I'm writing a (VB/Net) procedure that searches a PDF document for a Regex pattern and writes a text file with the matched locations:
Public Sub ReadAndMatch(ByVal InputFileName As String, OutputFileName As String, RegexPattern As String)
Dim pdfIn As New iText.Kernel.Pdf.PdfReader(InputFileName) ' A Pdfreader object associated with the input file name
Dim pdfDoc As New iText.Kernel.Pdf.PdfDocument(pdfIn) 'This object holds the actual document being analyzed
Dim strategy As New iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy(RegexPattern) 'extraction strategy
Dim Parser As iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor = New Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor(strategy)
Dim Loclist As System.Collections.ICollection 'all the matches
Dim Location As iText.Kernel.Pdf.Canvas.Parser.Listener.IPdfTextLocation 'one match
Dim CoordFile As New System.IO.StreamWriter(OutputFileName) 'initiate output stream
Dim TextString As String
Dim L, B, W, H As Single ' Left, bottom, width & height of the rectangle containing the extracted text
Dim pg As Integer = 0 'number of current page, number of matches in page, total number of matches
Do While pg < pdfDoc.GetNumberOfPages 'loop thru document pages
pg += 1
Parser.ProcessPageContent(pdfDoc.GetPage(pg)) 'parse page
Loop
Loclist = strategy.GetResultantLocations
If Loclist.Count = 0 Then Exit Sub
For Each Location In Loclist
L = Location.GetRectangle.GetLeft
B = Location.GetRectangle.GetBottom
W = Location.GetRectangle.GetWidth
H = Location.GetRectangle.GetHeight
TextString = Location.GetText
pg = Location.GetPageNumber
CoordFile.WriteLine(TextString & Chr(9) & L & Chr(9) & B & Chr(9) & W & Chr(9) & H & Chr(9) & pg & Chr(9) & InputFileName)
Next Location
'Finished processing
pdfDoc.Close() 'close pdf
CoordFile.Close() 'close output file
End Sub
I am getting the rectangle coordinates OK, and the matched text string, but location.GetPageNumber is always 0
What am I doing wrong?

Here is a workaround I used.
I re-initiate the strategy object and the parser object every page, and also output the locations matched by the Regex pattern per page.
That way I can use the pg counter of my own While loop instead of the one [not] given by the .PageNumber method:
Private Sub Main() 'the main processing routine
'Sub assumes that the PDF document (pdfDoc) and the output file (CoordFile) --both module level object-- are open and available
Dim Location As iText.Kernel.Pdf.Canvas.Parser.Listener.IPdfTextLocation
Dim TextString As String
Dim L, B, W, H As Single ' Left, bottom, width & height of the rectangle containing the extracted text
Dim pg As Integer = 0 'number of current page, number of matches
Dim N As Integer = 0
Do While pg < pdfDoc.GetNumberOfPages 'loop thru document pages
pg += 1
Dim strategy As New iText.Kernel.Pdf.Canvas.Parser.Listener.RegexBasedLocationExtractionStrategy(RegexPattern) 'extraction strategy (RegexPattern is a module variable)
Dim Parser As iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor = New Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor(strategy)
Parser.ProcessPageContent(pdfDoc.GetPage(pg)) 'parse page
If strategy.GetResultantLocations.Count > 0 Then
For Each Location In strategy.GetResultantLocations
TextString = Location.GetText
L = Location.GetRectangle.GetLeft
B = Location.GetRectangle.GetBottom
W = Location.GetRectangle.GetWidth
H = Location.GetRectangle.GetHeight
CoordFile.WriteLine(TextString & Chr(9) & L & Chr(9) & B & Chr(9) & W & Chr(9) & H & Chr(9) & pg)
N += 1
Next Location
End If
strategy.GetResultantLocations.Clear() 'dispose loclist
Loop
'Finished processing
pdfDoc.Close()
CoordFile.Close() 'close output file
End Sub
... not elegant but it works...

Related

Insert custom text on datagridview row export to TXT

For the code below, I want to put at the beginning of each line exported in TXT the following text : "S,1,___,,;" + the row exported from database.
How can I add for each line that text on export? For the moment I have the text only on the firs line ( check pictures attached)
If DataGridView1.RowCount = 0 Then
MessageBox.Show("Lista este goala")
Else
If Directory.Exists("C:\test") = False Then
Directory.CreateDirectory("C:\test")
End If
Dim sFile As String = "C:\test\test.txt"
If File.Exists(sFile) = True Then
My.Computer.FileSystem.DeleteFile(sFile,
FileIO.UIOption.OnlyErrorDialogs,
FileIO.RecycleOption.DeletePermanently, FileIO.UICancelOption.DoNothing)
End If
Using f As New IO.StreamWriter(sFile, True)
Dim col As String = ""
Dim a As String = "S,1,______,_,__;"
Dim row As String = ""
Dim i As Integer = 0
For Each r As DataGridViewRow In DataGridView1.Rows
For Each c As DataGridViewColumn In DataGridView1.Columns
row = row & Convert.ToString(r.Cells(c.HeaderText).Value) & ";"
Next
If i < DataGridView1.Rows.Count - 1 Then row &= Environment.NewLine
Next
f.WriteLine(a & row)
End Using
Using f2 As New IO.StreamWriter(sFile, True)
Dim col As String = ""
Dim row As String = ""
Dim i As Integer = 0
For Each r As DataGridViewRow In DataGridView3.Rows
For Each c As DataGridViewColumn In DataGridView3.Columns
row = row & Convert.ToString(r.Cells(c.HeaderText).Value) & ";"
Next
If i < DataGridView3.Rows.Count - 1 Then Row &= Environment.NewLine
Next
f2.WriteLine("T,1,______,_,__;" & row)
f2.Close()
MessageBox.Show("Bon printat")
End Using
End If
And also a picture with exported file:
the red circled line is the way that I need for export
the red circle string from code is my input which is mandatory for each line followed by DGV lines export.
Actual TXT export
Actual Code for input my mandatory text before DGV export
Thanks!
Provide some debug detail like what is Rows.Count etc.
If it is not going inside loop and even there are rows, better you try with accessing object in DataSource.
Something like ForEach loop on ((DataTable)DataGridView1.DataSource).Rows

MS- Access VBA Converting multiple characters to Asc

For a homework project I am trying to enter characters in a single textbox as (eg:"AbC" no spaces) and have the output in a captioned label as the corresponding ASCII value written out with commas and spaces. (eg: 65, 98, 67)
Private Sub cmdCode_Click()
Dim codeInt As Integer
strInput = txtInput.value
codeInt = Asc(strInput)
lblAnswer.Caption = codeInt & ", "
End Sub
I would like the result to look like: 65, 98, 67
I'm getting no errors but only receiving "65," as my output.
Here is my solution. It assumes that the input is always going to be three (3) characters long:
Private Sub cmdCode_Click()
Dim x As String
Dim y As String
Dim z As String
strInput = txtInput.value
x = Asc(Left(strInput, 1))
y = Asc(Mid(strInput, 2, 1))
z = Asc(Right(strInput, 1))
lblAnswer.Caption = x & ", " & y & ", " & z
End Sub
This can be done for generic usage - and a little smarter:
Public Function StrToAscList( _
ByVal Text As String) _
As String
Dim Chars() As Byte
Dim Item As Integer
Dim List As String
Chars() = StrConv(Text, vbFromUnicode)
For Item = LBound(Chars) To UBound(Chars)
If Item > 0 Then List = List & ", "
List = List & CStr(Chars(Item))
Next
StrToAscList = List
End Function
Then:
Me!lblAnswer.Caption = StrToAscList(strInput)

All SubDirectories and a file count for each in a lable

How can i have the list of all SubDirectories and for each SubDirectory a file count?
And most important it must be in a form of continuos text, not as a listbox.
My example just prints the last Directory. How can i get it to print them all like this:
Adobe: 45 / Adobe Media Player:5 / Java: 22 / etc....
Private Sub GetDir()
For Each x As String In System.IO.Directory.GetDirectories("C:\ProgramFiles")
y = x & " : " & CStr(x.Count)
Next
Label1.Text = y
End Sub
I need it in continuos text so i can mail it actually. That's why i can't work with listbox.
You can use a StringBuilder object to build the output and a DirectoryInfo object to easily get name and files count for each directory.
Dim list As New StringBuilder
For Each directory As String In IO.Directory.GetDirectories("C:\ProgramFiles")
Dim subDirectory As New IO.DirectoryInfo(directory)
list.Append(subDirectory.Name & ": " & subDirectory.GetFiles.Length & " / ")
Next
Dim text As String = list.ToString.Remove(list.Length - 3)
I had to add a space to "Program Files" or it was blowing up, but the += should add to your string in the loop fine.
Private Sub GetDir()
Dim y As String = ""
For Each dir As String In System.IO.Directory.GetDirectories("C:\Program Files")
y += dir & " : " & CStr(dir.Count)
Next
Label1.Text = y
End Sub

Compare contents of a text file based on line number VB2008

Let's say I have two text file, I will compare (based on the line numbering of text ->see below) because this is where the unique key is generated.
sample1.txt:
5th line -> _000_000F_01CE2577.B840E640
sample2.txt
5th line -> _000_000F_01CE2577.B840E640
Now here is my code:
Dim FILE_NAME As String = "C:\myfiles"
'This is to determine the number of lines in the text file
Dim count As Integer
count = 0
Dim obj As StreamReader
obj = New StreamReader(FILE_NAME)
Do Until obj.ReadLine Is Nothing
count = count + 1
Loop
'------------------------------
'this is my computation to get the number of line -->disregard this
Dim temp3 As Integer
temp3 = count - 3
'------------------------------
obj.Close()
'This is to read all the text in the text file
Dim fileReader(fs) As String
fileReader(fs) = My.Computer.FileSystem.ReadAllText(FILE_NAME, _
System.Text.Encoding.ASCII)
I have stored each file in an array
Example:
file[0]
file[1]
Then I have to read each file and its contents, now how will i compare the line of text to each other. i believe i have to use regex.
Please give me some pointers on how to compare the line of text...
e.g. 5th line in sample1.txt == 5th line of sample2.txt
I have to know if they are the same.
this should do the job for you
it will read each line in txt file , save it to an array then compare
note: set paths do your 2 txt files
it will go out of bounds if there is less lines in file 2 than file 1. You can add a little bit of code to handle that case though.
Option Explicit
Sub Read_text_File()
Dim firstFile() As String, secondFile() As String
Dim path1 As String, path2 As String
Dim i As Long
path1 = "C:\ ... .txt"
path2 = "C:\ ... .txt"
Call fill_array(firstFile, path1)
Call fill_array(secondFile, path2)
For i = LBound(firstFile) To UBound(firstFile) - 1
Debug.Print (firstFile(i) & vbTab & vbTab & vbTab & vbTab & secondFile(i))
If StrComp(firstFile(i), secondFile(i), vbTextCompare) = 0 Then
MsgBox "Line: " & i + 1 & " matches "
End If
Next i
End Sub
Sub fill_array(ByRef arr() As String, pathToFile As String)
Dim oFSO As New FileSystemObject
Dim oFS As TextStream
Dim cnt As Long
cnt = 0
Set oFS = oFSO.OpenTextFile(pathToFile)
Do Until oFS.AtEndOfStream
oFS.ReadLine
cnt = cnt + 1
Loop
ReDim arr(cnt)
Set oFS = oFSO.OpenTextFile(pathToFile)
cnt = 0
Do Until oFS.AtEndOfStream
arr(cnt) = oFS.ReadLine
cnt = cnt + 1
Loop
oFS.Close
Set oFS = Nothing
End Sub

VBA - Check if ContentControl text contains formatting?

So this is what I want to do, if it's possible.
I've got a lot of rich textboxes in a Word template. And I want to create a macro that basically checks if any characters in the text entered into the placeholder is formatted with superscript, subscript, bold or underline etc.
So, What I've got so far is this
Dim i As Long
Dim txtboxString as String
For i = 1 To ActiveDocument.ContentControls.Count
If ActiveDocument.ContentControls(i).Title = "Repporttitle" Or ActiveDocument.ContentControls(i).Title = "Subtitle" Then
If ActiveDocument.ContentControls(i).LockContentControl = True Then
ActiveDocument.ContentControls(i).LockContentControl = False
End If
txtboxString = ActiveDocument.ContentControls(i).Range.Text
End If
Next i
So, now, txtboxString contains the text that was typed into the placeholder. But I want to check each letter for it's formatting. The method above only gives me the text as a simple text string. I've seen that I can check each letter of the string this way:
Dim counter as integer
Dim contentText as string '(this is passed on via the above txtboxString)
Dim letter as string
For counter = 1 To Len(contentText)
letter = Mid(contentText, counter, 1)
Next
But, this won't give me the formatting of each letter. How can I do that?
Use Characters and Font instead of Text. Like this:
Sub GetCharacterFormatting()
Dim i As Long
Dim txtboxString As Characters ''# <- this was changed from "String" to "Characters"
Dim Bold As String
Dim Italic As String
Dim Subscript As String
Dim CharacterFont As Font
Dim ap As Document: Set ap = ActiveDocument
For i = 1 To ap.ContentControls.Count
If ap.ContentControls(i).Title = "Repporttitle" Or ap.ContentControls(i).Title = "Subtitle" Then
If ap.ContentControls(i).LockContentControl = True Then
ap.ContentControls(i).LockContentControl = False
End If
txtboxString = ap.ContentControls(i).Range.Characters ''# <- this was changed from "Text" to "Characters"
Dim counter As Integer
For counter = 1 To txtboxString.Count
Index = counter
CharacterText = txtboxString(i).Text
CharacterFont = txtboxString(i).Font
''# You can just grab all the formatting for the character or use If/Then statements
Bold = "Bold: " & CharacterFont.Bold & ", "
Italic = "Italic: " & CharacterFont.Italic & ", "
Subscript = "Subscript: " & CharacterFont.Subscript & " "
''#
Next
Debug.Print Index & " (" & CharacterText & ") : " & Bold; Italic; Subscript
End If
Next i
End Sub