parsing a large csv file using vb.net without newline ending

I was given a file that was created by a Java program but has no LF or other end-of-line characters, so I am working with one gigantic string. I tried splitting it and then using the TextFieldParser, but it seems the file is just too big to deal with. The contents are vital, so I need to get this data somehow and then clean it up. Here is what I have tried:
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\Users\Desktop\META3.txt")
    MyReader.TextFieldType = FileIO.FieldType.Delimited
    MyReader.SetDelimiters(",")
    Dim currentRow As String()
    While Not MyReader.EndOfData
        Try
            currentRow = MyReader.ReadFields()
            Dim currentField As String
            For Each currentField In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
        End Try
    End While
End Using
I think the best way is to take substrings of the text: I wanted to take the values up to every seventh occurrence of a comma, since that is how many commas the file should have per line. I'm not sure how to do this, and it seems like regex may be the only option. Any ideas are appreciated.
line = freader.Readline()
Dim ms As Match = Regex.Match(line, "(\w+),(\w+),(\w+),(\w+),(\w+),(\w+),")
line = ms.Value
Will this work? It does not give the expected results.

If you can be guaranteed that the number of columns is always consistent, why not add a counter that reads each column and then moves on to the next set? You can then create a new spreadsheet or file with the correct format. If you do a search on here, there is a package for .NET which allows you to build valid .xls and .xlsx files on the fly. The package is called "Simple OOXML". I have used this to create all sorts of spreadsheets where I work. I built a command line app that is passed an XML file with parameters and builds it into a fully fledged spreadsheet. Hope the above helps. Any questions, let me know.
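Building on that idea, here is a hedged sketch of the counter approach: stream the original file character by character, count commas, and start a new line after every Nth comma so the result can be fed to TextFieldParser or Excel. The commasPerRow value of 7 and the output path are assumptions for illustration; if the row boundary does not fall on a comma at all, it cannot be recovered this way.

Imports System.IO

Module RebuildRows
    Sub Main()
        Const commasPerRow As Integer = 7   ' assumed number of commas in each logical row

        Using reader As New StreamReader("C:\Users\Desktop\META3.txt"),
              writer As New StreamWriter("C:\Users\Desktop\META3_fixed.txt")

            Dim commaCount As Integer = 0

            While Not reader.EndOfStream
                Dim ch As Char = ChrW(reader.Read())
                writer.Write(ch)

                If ch = ","c Then
                    commaCount += 1
                    ' Assumed row boundary: start a new line after every Nth comma.
                    If commaCount Mod commasPerRow = 0 Then
                        writer.WriteLine()
                    End If
                End If
            End While
        End Using
    End Sub
End Module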

Related

For each loop keeps stopping on second cycle VB

The software I'm writing is being run in a service installed on a computer.
I want to read a text file, process it, and copy it to a different path.
The software is doing exactly what it's supposed to do, but it only processes two files and then stops. I believe it has something to do with the For Each loop. I found some information online saying it's related to the amount of memory being allocated to each cycle of the For Each loop.
Any help is appreciated.
my code goes like this.
For Each foundFile As String In My.Computer.FileSystem.GetFiles("C:\Commsin\", FileIO.SearchOption.SearchTopLevelOnly, "ORDER-*.TXT")
    Dim filenum As Integer
    filenum = FreeFile()
    FileOpen(filenum, foundFile, OpenMode.Input)
    While Not EOF(filenum)
        <do a bunch of stuff>
    End While
    <more code>
    Dim arrayFileName() As String = GetFileName.Split("\")
    Dim FileName As String = arrayFileName(2)
    My.Computer.FileSystem.CopyFile(foundFile, "C:\Commsin\Done\" & FileName)
    If IO.File.Exists("C:\Commsin\Done\" & FileName) Then
        My.Computer.FileSystem.DeleteFile(foundFile, Microsoft.VisualBasic.FileIO.UIOption.AllDialogs, Microsoft.VisualBasic.FileIO.RecycleOption.SendToRecycleBin)
        NoOfOrders -= NoOfOrders
    End If
Next
Fundamental mistake: Don't modify the collection you are iterating over, i.e. avoid this pattern (pseudocode):
For Each thing In BunchOfThings
    SomeOperation()
    BunchOfThings.Delete(thing)
Next thing
It's better to follow this pattern here (pseudocode again):
While Not BunchOfThings.IsEmpty()
    thing = BunchOfThings.nextThing()
    SomeOperation()
    BunchOfThings.Delete(thing)
End While
I'll leave it as an exercise for you to convert your code from the first approach to the second.
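For anyone who wants a concrete starting point, here is a hedged sketch of what the second pattern could look like in VB.NET; the Queue and all names are illustrative and not the asker's actual code.

Imports System.Collections.Generic

Module ProcessPendingFiles
    Sub Main()
        ' Take a snapshot of the matching file names up front, then work through the
        ' queue so the collection being iterated is never the one being modified.
        Dim pending As New Queue(Of String)(
            My.Computer.FileSystem.GetFiles("C:\Commsin\",
                                            FileIO.SearchOption.SearchTopLevelOnly,
                                            "ORDER-*.TXT"))

        While pending.Count > 0
            Dim foundFile As String = pending.Dequeue()
            ' ...read and process foundFile here...
            ' ...then copy it to C:\Commsin\Done\ and delete the original...
        End While
    End Sub
End Module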
It looks like you're trying to extract the filename from the full path using Split().
Why not just use:
Dim fileName As String = IO.Path.GetFileName(foundFile)
Instead of:
Dim arrayFileName() As String = GetFileName.Split("\")
Dim FileName As String = arrayFileName(2)
Thank you, everyone, for your suggestions; I have successfully implemented the recommended changes. It turned out that the issue wasn't with the code itself.
It was with one of the files I was using: it had a text row that, once split into an array, wasn't of the required length, giving the error "Index was outside the bounds of the array."
It was a mistake in the file, and I also added some checks to prevent this error in the future.
Thank you.
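For reference, a minimal sketch of the kind of length check described above; the comma delimiter and the field count of 3 are illustrative assumptions, not taken from the original code.

Private Sub ProcessRow(textRow As String)
    Dim parts() As String = textRow.Split(","c)
    If parts.Length >= 3 Then
        ' Safe to index parts(0) .. parts(2) here.
    Else
        ' Skip or log the malformed row instead of hitting
        ' "Index was outside the bounds of the array."
    End If
End Sub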

Searching text in a .txt file using visual basic

I am making a project that searches a document based on some words the user provides in a text box.
I am using Visual Studio 2013 with a basic Windows Forms application, and I want to open the file at the position of the word given in the textbox.
The text must open from the required point.
If I've understood your question, you want to find a keyword inside a txt file based on the user input in order to do something with it. This is a little snippet to give you an example and a good starting point.
Snippet
'Requires Imports System.IO and Imports System.Text.RegularExpressions
Dim reader As StreamReader
Dim txtInput As String = "YOUR_TXT_FILE_PATH"
'The reader opens the txt file specified by the file path
reader = My.Computer.FileSystem.OpenTextFileReader(txtInput)
'Create a string from the txt file
Dim txtString As String = reader.ReadToEnd().TrimStart()
Dim wordFound As String
'Use Regex inside a Try/Catch statement in order to catch any possible exception,
'for example if the text typed into TextBox1 is not a valid regular expression pattern
Try
    'Use Regex to find your word or your expression inside the string created before
    For Each data As Match In Regex.Matches(txtString, TextBox1.Text)
        wordFound = txtString.Substring(data.Index, TextBox1.Text.Length)
        'Do your stuff with the word found in the txt...
    Next
Catch ex As Exception
    Err.Clear()
Finally
    'Close the reader in the Finally block so it is always released
    reader.Close()
End Try
I suggest you first read the documentation about Regex.Match, and if you have any doubts feel free to ask.
Welcome to StackOverflow!
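If the search words should be treated as literal text rather than a regular expression, a simpler alternative is String.IndexOf. This is only a sketch, not part of the answer above; TextBox2 and the file path are assumptions.

'Requires Imports System.IO
Dim content As String = File.ReadAllText("YOUR_TXT_FILE_PATH")
Dim position As Integer = content.IndexOf(TextBox1.Text, StringComparison.OrdinalIgnoreCase)
If position >= 0 Then
    'Show the file from the matched point onward, as the question asks
    TextBox2.Text = content.Substring(position)
Else
    MessageBox.Show("Word not found.")
End If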

Strange Issue with Text Field Parser

I have a strange issue when using the Microsoft.VisualBasic.FileIO.TextFieldParser in VB.NET. I am creating a CSV file that uses ",*" as the delimiter and then trying to export that file into an Excel spreadsheet. The program creates the file perfectly, but when I try to pull each entry from the file and put it into the Excel table, some of the entries are getting split up. For example, one of the first entries in one of my rows is "H WIRE*30(SHDR-30V&SHDR-30V) 160mm" (without the quotes). When I grab that entry using the TextFieldParser and export it into Excel, the 160mm gets dropped from the entry and added to the row below. See image: Excel Image
Here is a link to the CSV that I'm using: Test CSV Text File
Here is the code. (I removed the Excel portion of the code for this post.) I put a message box in to show each entry the parser is pulling. When it gets to the "H WIRE*30(SHDR-30V&SHDR-30V) 160mm" entry, it reads "H WIRE*30(SHDR-30V&SHDR-30V)" as its own row with no entries after it, and then reads "160mm" as a new row with the remaining info that should all be together. (Sorry if that is confusing.)
I just can't figure out why it's doing this. The CSV looks fine, and all the other rows read and export into Excel perfectly. But this same row, every time, is messing up.
Thanks in advance for any help.
THANK YOU SO MUCH Idle_Mind.
Dim MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\TEST CSV.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",*")
Dim currentRow As String()
While Not MyReader.EndOfData
    Try
        currentRow = MyReader.ReadFields()
        Dim currentField As String
        For Each currentField In currentRow
            MessageBox.Show(currentField)
        Next
    Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
        MessageBox.Show("Line " & ex.Message & " is not valid and will be skipped.")
    End Try
End While
There is an unexpected Line Feed in that line. I simply downloaded your text file, opened it in Notepad++, and turned on the View --> Show Symbol --> Show End of Line feature:
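If fixing the source file by hand is not practical, one possible workaround is to pre-clean the text before handing it to TextFieldParser. This sketch assumes the real rows end with CRLF and the stray character is a lone LF, and it writes the cleaned copy to an assumed path:

'Requires Imports System.IO
Dim raw As String = File.ReadAllText("C:\TEST CSV.txt")
'Protect the real CRLF row endings, strip lone line feeds, then restore the CRLFs.
Dim marker As String = ChrW(1)
raw = raw.Replace(vbCrLf, marker).Replace(vbLf, "").Replace(marker, vbCrLf)
File.WriteAllText("C:\TEST CSV_clean.txt", raw)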

reading a simple text file, splitting and sorting the contents using vb

I'm quite new to VB and doing the simple basics. I have managed to access and read a specific file line by line. If I wanted to split the information by a comma or space and then sort it alphabetically or numerically, how would I go about this? Would I create a loop within the reading loop to parse the information? A simple-to-follow example would really help. Thanks!
Dim file As String = "C:\Users\test.txt"
Dim Line As String
If System.IO.File.Exists(file) = True Then
    Dim objReader As New System.IO.StreamReader(file)
    Do While objReader.Peek() <> -1
        Line = Line & objReader.ReadLine() & vbNewLine
    Loop
    Label1.Text = Line
    objReader.Close()
Else
    MsgBox("File Does Not Exist")
End If
It depends what you want to do with the text you split, really.
The Split() function will return an array of strings with the results of your split; from there it really depends on the data.
Here is an example of using Split: http://www.dotnetperls.com/split-vbnet
Since you mentioned you want to sort the data alphabetically, you may wish to look at http://www.codepedia.com/1/VBNET_ArraySort or look into using LINQ.
It is quite acceptable to nest a loop within your main loop if you want to do something more complex with the data.
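Putting those pieces together, here is a hedged, self-contained example of the split-then-sort idea as a small console program; the file path and the comma delimiter are assumptions, not taken from the question.

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module SplitAndSort
    Sub Main()
        Dim items As New List(Of String)

        ' Read the file line by line and collect every comma-separated value.
        For Each fileLine As String In File.ReadLines("C:\Users\test.txt")
            items.AddRange(fileLine.Split(","c).Select(Function(s) s.Trim()))
        Next

        ' Sort alphabetically, ignoring case, then show the result.
        items.Sort(StringComparer.OrdinalIgnoreCase)
        Console.WriteLine(String.Join(Environment.NewLine, items))
    End Sub
End Module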

Avoiding 'End Of File' errors

I'm trying to import a tab-delimited file into a table.
The issue is that, sometimes, the file will include an awkward record that has two "null values" and causes my program to throw an "unexpected end of file" error.
For example, each record will have 20 fields, but the last record will have only two fields (two null values), hence the unexpected EOF.
Currently I'm using a StreamReader.
I've tried counting the lines and telling bcp to stop reading before the "phantom nulls", but StreamReader gets an incorrect count of lines due to the "phantom nulls".
I've tried the following code (borrowed off the net) to get rid of all the bogus lines, but it just replaces the fields with empty spaces (I'd like no such line to be left behind).
Public Sub RemoveBlankRowsFromCVSFile2(ByVal filepath As String)
    If filepath = DBNull.Value.ToString() Or filepath.Length = 0 Then Throw New ArgumentNullException("filepath")
    If (File.Exists(filepath) = False) Then Throw New FileNotFoundException("Could not find CSV file.", filepath)

    Dim tempFile As String = Path.GetTempFileName()

    Using reader As New StreamReader(filepath)
        Using writer As New StreamWriter(tempFile)
            Dim line As String = Nothing
            line = reader.ReadLine()
            While Not line Is Nothing
                If Not line.Equals(" ") Then writer.WriteLine(line)
                line = reader.ReadLine()
            End While
        End Using
    End Using

    File.Delete(filepath)
    File.Move(tempFile, filepath)
End Sub
I've tried using SSIS, but it encounters the same unexpected EOF error.
What am I doing wrong?
If you read the entire file into a string variable (using reader.ReadToEnd()), do you get the whole thing, or just the data up to those phantom nulls?
Have you tried using the Reader.ReadBlock() function to try to read past the file length?
At our company we do hundreds of imports every week. If a file is not sent in the correct, agreed format for our automated process, we return it to the sender. If the last line is wrong, the file should not be processed, because it might be missing information or be corrupt in some other way.
One way to avoid the error is to use ReadAllLines and then process the array of file lines instead of stepping through the file with the reader. This can also be simpler than managing a StreamReader loop yourself.
Dim fileLines() As String
fileLines = File.ReadAllLines("c:\tmp.csv")
...
For Each line In fileLines
    If Trim(line) <> "" Then writer.WriteLine(line)
Next line
You can also save the output lines in the same or a different string array and use File.WriteAllLines to write the file all at once.
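As a hedged illustration of that ReadAllLines / WriteAllLines round trip, here is a sketch that drops blank or too-short records; the path, the tab delimiter, and the 20-field minimum are assumptions based on the question.

Imports System.Collections.Generic
Imports System.IO

Module CleanTabFile
    Sub Main()
        Dim filepath As String = "c:\tmp.csv"
        Dim keep As New List(Of String)

        ' Keep only lines that are non-blank and have the expected number of fields.
        For Each fileLine As String In File.ReadAllLines(filepath)
            If fileLine.Trim() <> "" AndAlso fileLine.Split(ControlChars.Tab).Length >= 20 Then
                keep.Add(fileLine)
            End If
        Next

        ' Write the cleaned file back in one shot.
        File.WriteAllLines(filepath, keep)
    End Sub
End Module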
You could try the built-in .NET object for reading tab-delimited files: Microsoft.VisualBasic.FileIO.TextFieldParser.
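For completeness, a short sketch of that parser configured for tab-delimited input; the path and the field count are assumptions, and short rows are simply skipped rather than allowed to raise an error.

Using parser As New Microsoft.VisualBasic.FileIO.TextFieldParser("c:\tmp.csv")
    parser.TextFieldType = FileIO.FieldType.Delimited
    parser.SetDelimiters(vbTab)
    While Not parser.EndOfData
        Dim fields() As String = parser.ReadFields()
        ' Skip records with too few fields instead of letting them break the import.
        If fields.Length >= 20 Then
            ' ...import the record...
        End If
    End While
End Using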
This was solved using a bit array, checking one bit at a time for the suspect bit.