I'm not sure what is wrong with my code. It reads the PDF file, and grabs all the text, but every item is combined together into one string with no separator of any kind.
Sample:
"Houses: 2
Bedrooms: 3
Bathsroom 4"
will get read as "Houses: 2Bedrooms: 3Bathsroom 4"
I've searched through all of the examples and tried both LocationTextExtractionStrategy and the .Split method, all to no avail.
Public Shared Function ParseAllPdfText(ByVal filepath As String) As String
    Dim sbtxt, currenttext As String
    sbtxt = ""
    Try
        Using reader As New PdfReader(filepath)
            For intPages As Integer = 1 To reader.NumberOfPages
                currenttext = PdfTextExtractor.GetTextFromPage(reader, intPages, New LocationTextExtractionStrategy())
                currenttext = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(currenttext)))
                sbtxt = sbtxt & currenttext & vbCrLf
            Next
        End Using
    Catch ex As Exception
        MsgBox(" There was an error extracting text from the file: " & ex.Message, vbInformation, "Error Extracting Text")
    End Try
    Return sbtxt
End Function
Never mind, this was an oversight on my part. I realized the lines are separated by Chr(10) (a line feed, vbLf). Chr(10) does not create a new line in a TextBox, which is where I was outputting my string, but it DOES create a new line in MsgBox. So if anyone else runs into this problem, Chr(10) is the separator. :-)
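For anyone who wants the extracted text to show its line breaks in a TextBox as well, one option is to convert the bare line feeds to vbCrLf before assigning the string. This is only a sketch: the module and function names (LineEndingDemo, NormalizeForTextBox) are made up for illustration and are not part of iTextSharp.

```vbnet
Imports Microsoft.VisualBasic

Module LineEndingDemo
    ' Hypothetical helper: converts every line break to CR+LF.
    Function NormalizeForTextBox(text As String) As String
        ' Collapse CR+LF and bare CR to LF first, so that existing
        ' CR+LF pairs are not doubled by the final replacement.
        Return text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf).Replace(vbLf, vbCrLf)
    End Function
End Module
```

Usage would be something like TextBox1.Text = NormalizeForTextBox(ParseAllPdfText(filepath)), with the TextBox set to Multiline.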
Following this, I use a TextFieldParser to read a CSV file:
Sub imp1(path As String)
    With New TextFieldParser("C:\matrix1.csv")
        .TextFieldType = FileIO.FieldType.Delimited
        .Delimiters = New String() {";"}
        .CommentTokens = New String() {"'"}
        Debug.Print(.ReadToEnd)
        ' some more code to read the contents into a 2d-array
    End With
End Sub
After setting .CommentTokens = New String() {"'"} I expected lines with a leading single quote to be skipped.
However, from what I gather there is no difference at all when reading a csv like the following:
'comment1
1;0.5;0.9;0.3
0.5;1;0.6;0.2
0.9;0.6;1;0.1
0.3;0.2;0.1;1
I tried replacing the single quote ' with several common comment characters (#, \, *), both with and without a following blank, but still did not get the desired result.
In your code you are using TextFieldParser.ReadToEnd which simply returns the complete remaining text and does not ignore comments. This is documented:
The ReadToEnd method does not ignore blank lines and comments.
If you use ReadFields instead, the comments are ignored (MS example):
While Not MyReader.EndOfData
    Try
        currentRow = MyReader.ReadFields()
        ' Include code here to handle the row.
    Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
        MsgBox("Line " & ex.Message &
               " is invalid. Skipping")
    End Try
End While
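Applied to the semicolon-delimited file from the question, the same loop might look like the sketch below. The module and function names (CommentTokenDemo, ReadRows) are invented for illustration. The parser settings are the ones from the question, and the rows come back as string arrays so they can be copied into a 2D array afterwards.

```vbnet
Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Module CommentTokenDemo
    ' Hypothetical helper: returns every data row of a ";"-delimited file,
    ' skipping lines that start with a single quote.
    Function ReadRows(path As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(";")
            parser.CommentTokens = New String() {"'"}
            ' EndOfData and ReadFields both skip comment and blank lines.
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```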
I'm grabbing the HTML source code of a webpage and trying to convert it to a single-line string, but I can't.
This is my code:
Dim source As String = client.DownloadString("http://www.whocallsme.gr/el/master/lookup/19588")
source = source.Replace(vbCrLf, "")
I also tried using Environment.NewLine and vbNewLine instead of vbCrLf but the result remains the same.
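The page probably uses bare line feeds (vbLf) rather than vbCrLf, so there is nothing for your Replace to match. Try removing each character separately: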
source = source.Replace(vbLf, "").Replace(vbCr, "")
I have a program that needs to look through a text file line by line. The lines look like this:
10-19-2015 Brett Reinhard All Bike Yoga Run the Studio Design Your Own Strength
These are separated by tabs in the text file.
What I want to do is look at the second value, in this case "Brett Reinhard", and move the full line to another text file called "Brett Reinhard".
I was thinking of using an array to check whether the second 'column' in the line matches any value within a given array. If it does, I want to perform a specific action.
The way I am thinking of doing this is with a For/Next statement. While it will work, it will be a laborious process for the computer that I will be using it on.
The code I am thinking of using looks like this:
For intCounter As Integer = 0 To array.Length - 1
    If currentfield.Contains(array(intCounter)) Then
        ' fileDirectory is the folder that holds the per-name files
        Using writer As New StreamWriter(fileDirectory & array(intCounter) & ".txt", True)
            writer.WriteLine(currentfield)
        End Using
    End If
Next
Is there a better way of doing this, such as referencing the second 'column' in the line directly, similar to this syntax from VBA for Excel?
Name=Cells(1,2).Value
If you can guarantee that a line will only use tab characters as field separators, you can do something along these lines:
Open the stream for reading text
Open a stream for writing text
Read a line of text
Use the Split method to break the incoming line into an array of fields
If the second element in the array is your sentinel value, write the original line to the writer
Repeat until you have reached the end of the file (ReadLine will return Nothing, or null for those C# folk).
Close and dispose of your stream objects.
If you aren't sure of the format, you will want to take the hit and use the TextFieldParser as mentioned in an earlier comment.
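The steps above can be sketched as follows. This is a minimal illustration rather than a drop-in solution: the module, function, and parameter names (TabSplitDemo, CopyMatchingLines, inputPath, outputPath, targetName) are all assumptions, and it appends to the output file instead of overwriting it.

```vbnet
Imports System.IO
Imports Microsoft.VisualBasic

Module TabSplitDemo
    ' Hypothetical helper: appends every line whose second tab-separated
    ' field equals targetName to outputPath and returns how many were copied.
    Function CopyMatchingLines(inputPath As String, outputPath As String, targetName As String) As Integer
        Dim copied As Integer = 0
        Using reader As New StreamReader(inputPath)
            Using writer As New StreamWriter(outputPath, True)
                Dim line As String = reader.ReadLine()
                ' ReadLine returns Nothing at the end of the file.
                While line IsNot Nothing
                    Dim fields As String() = line.Split(ControlChars.Tab)
                    If fields.Length > 1 AndAlso fields(1) = targetName Then
                        writer.WriteLine(line)
                        copied += 1
                    End If
                    line = reader.ReadLine()
                End While
            End Using
        End Using
        Return copied
    End Function
End Module
```

Because fields(1) is read directly, no inner loop over an array of names is needed for a single target. For several names, a Dictionary keyed by name would avoid rescanning the file.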
So while it's not using an array to search a file, what I ended up doing works just as well. I ended up using the Split method thanks to @Martin Soles.
Here is what I came up with:
Sub Main()
    Dim intCount As Integer = 1
    Dim words As String
    Dim split As String()
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(
            "I:\Games, Events, & Promotions\FRP\Back End\Approved.txt")
        MyReader.TextFieldType = FileIO.FieldType.Delimited
        MyReader.SetDelimiters(",")
        Dim currentRow As String()
        While Not MyReader.EndOfData
            Try
                currentRow = MyReader.ReadFields()
                Dim currentField As String
                For Each currentField In currentRow
                    words = currentField
                    split = words.Split(New [Char]() {CChar(vbTab)})
                    For Each s As String In split
                        ' The second tab-separated value is the name used for the output file.
                        If intCount = 2 Then
                            Dim file As System.IO.StreamWriter
                            file = My.Computer.FileSystem.OpenTextFileWriter("I:\Games, Events, & Promotions\FRP\Back End\" & s & ".txt", True)
                            file.WriteLine(currentField)
                            file.Close()
                        End If
                        intCount = intCount + 1
                    Next s
                    intCount = 1
                Next
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                MsgBox("Line " & ex.Message &
                       " is not valid and will be skipped.")
            End Try
        End While
    End Using
End Sub 'Main
Thank you guys for the suggestions.
For right now the split method will work for what is needed.
Using VB.Net and a text file
For Example #1: 10 LINES (Below are the text/data inside a text file)
Filename: Test1.txt
Note #1: I want to search for the string "F1" and then display "I play Farmville" in TextBox1.Text
FaceF1book 'line#1
I play Farmville 'line#2
'line#3
'line#4
TwitF2ter 'line#5
Occassionally use this site 'line#6
'line#7
'line#8
FriendsF3ter 'line#9
I don't want to use this site 'line#10
For Example #2: 12 LINES (Below are the text/data inside a text file)
Filename: Test2.txt
Note #2.1: I want to search for the string "F2" and then display "Occassionally use this site" in TextBox1.Text
Note #2.2: Notice that the line positions of the data are not the same as in Example #1
FaceF1book 'line#1
I play Farmville 'line#2
I love to chat with my friends 'line#3
I want to be famous 'line#4
'line#5
'line#6
TwitF2ter 'line#7
Occassionally use this site 'line#8
'line#9
'line#10
FriendsF3ter 'line#11
I don't want to use this site 'line#12
Here's another approach:
' dataFile holds the path; the contents are loaded by ReadAllLines below.
Dim dataFile As String = "C:\Users\WindowsUser\Desktop\Test Files\test1.txt"
If System.IO.File.Exists(dataFile) Then
    Try
        Dim lines As New List(Of String)
        lines.AddRange(System.IO.File.ReadAllLines(dataFile))
        Dim searchFor As String = "F1"
        For i As Integer = 0 To lines.Count - 1
            If lines(i).Contains(searchFor) Then
                ' ... do something with lines(i + 1) ... ?
                Exit For
            End If
        Next
    Catch ex As Exception
        MessageBox.Show(ex.ToString, "Error Reading File")
    End Try
Else
    MessageBox.Show(dataFile, "File Not Found")
End If
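The "do something with lines(i + 1)" step could be pulled out into a small function, sketched below. The names (SearchDemo, FindLineAfter) are invented for illustration, and the function assumes the wanted text always sits on the line directly after the one containing the marker, as in both sample files.

```vbnet
Imports System
Imports System.Collections.Generic

Module SearchDemo
    ' Hypothetical helper: returns the line that follows the first line
    ' containing marker, or Nothing when there is no such line.
    Function FindLineAfter(lines As IList(Of String), marker As String) As String
        ' Stop one short of the end so that lines(i + 1) always exists.
        For i As Integer = 0 To lines.Count - 2
            If lines(i).Contains(marker) Then Return lines(i + 1)
        Next
        Return Nothing
    End Function
End Module
```

With that in place, TextBox1.Text = FindLineAfter(System.IO.File.ReadAllLines(dataFile), "F1") would cover both Test1.txt and Test2.txt, since the line position of the marker no longer matters.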
I would be grateful for some help with reading a text file into a RichTextBox. The code I have at present changes the first line of text as I want it, but the remaining lines are not altered. I need a loop that reads to the end of the file and displays the result in the RichTextBox. The code I have at present is this:
Dim FILE_NAME As String = "C:\Test.txt"
Dim sr As New System.IO.StreamReader(FILE_NAME)
RichTextBox1.Text = sr.ReadToEnd
Dim sb As New System.Text.StringBuilder(RichTextBox1.Text)
sb.Insert(5, " ")
sb.Insert(12, " ")
sb.Insert(18, " ")
sb.Insert(25, " ")
sb.Insert(29, " ")
sb.Insert(32, " ")
sb.Insert(37, " ")
sb.Insert(44, " ")
sb.Insert(45, " ")
RichTextBox2.Text = sb.ToString
sr.Close()
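I think you just want RichTextBox1.LoadFile("C:/test.txt", RichTextBoxStreamType.PlainText). The second argument matters for a plain text file, because LoadFile expects RTF by default.
That should be a backslash in the file name, but my keyboard doesn't have one on this PC.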
The reason for the spaces is that every line of text has the same fixed-width layout, and the fields need to be separated to make them more readable. The original text looks like this:
17915WHITE BLUE  001.900116A  T123456111
72451BLACK ORANGE000.500208 B A123456123 'worst case
72455BLACK WHITE 002.703501   C123456124
It needs to look like this:
17915:WHITE BLUE  :001.9:001:16:A  :T:123456:111
72451:BLACK ORANGE:000.5:002:08: B :A:123456:123
72455:BLACK WHITE :002.7:035:01:   :C:123456:124
I can produce the first line in a text file, but I cannot reproduce the rest of the lines. I think I need a loop that keeps reading until the end of the file is reached.
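One way to get the per-line behaviour is to format each line on its own instead of inserting into the whole file's text at fixed offsets. The sketch below is only an illustration: the names (FixedWidthDemo, InsertSeparators) are invented, and the field widths 5, 12, 5, 3, 2, 3, 1, 6, 3 are an assumption inferred from the "worst case" sample line and its desired output, so they should be checked against the real file layout.

```vbnet
Imports System
Imports System.Collections.Generic

Module FixedWidthDemo
    ' Assumed field widths, inferred from the sample output above.
    Private ReadOnly FieldWidths As Integer() = {5, 12, 5, 3, 2, 3, 1, 6, 3}

    ' Hypothetical helper: cuts one fixed-width line into fields and
    ' joins them again with the given separator.
    Function InsertSeparators(line As String, separator As String) As String
        Dim parts As New List(Of String)
        Dim position As Integer = 0
        For Each width As Integer In FieldWidths
            If position >= line.Length Then Exit For
            ' Guard against lines that are shorter than expected.
            Dim length As Integer = Math.Min(width, line.Length - position)
            parts.Add(line.Substring(position, length))
            position += length
        Next
        ' Keep any trailing text beyond the expected fields as a final part.
        If position < line.Length Then parts.Add(line.Substring(position))
        Return String.Join(separator, parts)
    End Function
End Module
```

Looping over System.IO.File.ReadAllLines("C:\Test.txt") and appending InsertSeparators(line, ":") & vbCrLf to RichTextBox2 for each line would then cover the whole file rather than only its first line.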