Avoiding 'End Of File' errors - vb.net

I'm trying to import a tab delimited file into a table.
The issue is that SOMETIMES the file includes an awkward record that has two "null values", which causes my program to throw an "unexpected end of file" error.
For example, each record will have 20 fields. But the last record will have only two fields (two null values), and hence, unexpected EOF.
Currently I'm using a StreamReader.
I've tried counting the lines and telling bcp to stop reading before the "phantom nulls", but StreamReader gets an incorrect count of lines due to the "phantom nulls".
I've tried the following code (borrowed off the net) to get rid of all the bogus lines, but it just replaces the fields with empty spaces. I'd like the result to have no such line left behind.
Public Sub RemoveBlankRowsFromCVSFile2(ByVal filepath As String)
    If filepath = DBNull.Value.ToString() Or filepath.Length = 0 Then Throw New ArgumentNullException("filepath")
    If (File.Exists(filepath) = False) Then Throw New FileNotFoundException("Could not find CSV file.", filepath)
    Dim tempFile As String = Path.GetTempFileName()
    Using reader As New StreamReader(filepath)
        Using writer As New StreamWriter(tempFile)
            Dim line As String = Nothing
            line = reader.ReadLine()
            While Not line Is Nothing
                If Not line.Equals(" ") Then writer.WriteLine(line)
                line = reader.ReadLine()
            End While
        End Using
    End Using
    File.Delete(filepath)
    File.Move(tempFile, filepath)
End Sub
I've tried using SSIS, but it encounters the EOF unexpected error.
What am I doing wrong?

If you read the entire file into a string variable (using reader.ReadToEnd()) do you get the whole thing? or are you just getting the data up to those phantom nulls?
Have you tried using the Reader.ReadBlock() function to try and read past the file length?

At our company we do hundreds of imports every week. If a file is not sent in the correct, agreed to format for our automated process, we return it to the sender. If the last line is wrong, the file should not be processed because it might be missing information or in some other way corrupt.

One way to avoid the error is to use ReadAllLines, then process the array of file lines instead of progressing through the file. For files that fit comfortably in memory, this is often simpler than a StreamReader loop, although it loads the whole file at once.
Dim fileLines() As String
fileLines = File.ReadAllLines("c:\tmp.csv")
...
For Each line As String In fileLines
    If Trim(line) <> "" Then writer.WriteLine(line)
Next line
You can also save the output lines in the same or a different string array and use File.WriteAllLines to write the file all at once.
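The ReadAllLines / WriteAllLines combination could look roughly like the sketch below. It assumes the file fits in memory and that a "bogus" line is one containing only white space (tabs included); the module name, the RemoveBlankLines helper, and the sample data are made up for illustration.

```vb
Imports System
Imports System.IO
Imports System.Linq
Imports Microsoft.VisualBasic

Module RemoveBlankLinesSketch
    ' Hypothetical helper: rewrites the file in place, keeping only lines
    ' that contain something other than white space.
    Sub RemoveBlankLines(filePath As String)
        Dim kept As String() = File.ReadAllLines(filePath).
            Where(Function(line) Not String.IsNullOrWhiteSpace(line)).
            ToArray()
        File.WriteAllLines(filePath, kept)
    End Sub

    Sub Main()
        ' Sample data: two real records, one empty line, one line holding only a tab.
        Dim sample As String() = {"a" & vbTab & "b", "", vbTab, "c" & vbTab & "d"}
        Dim tempPath As String = Path.GetTempFileName()
        File.WriteAllLines(tempPath, sample)

        RemoveBlankLines(tempPath)

        ' Show how many lines survived the clean-up.
        Console.WriteLine(File.ReadAllLines(tempPath).Length)
        File.Delete(tempPath)
    End Sub
End Module
```

Because the lines are dropped rather than replaced, nothing is written for them, so no empty rows are left behind in the output file.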

You could try the built-in .NET class for reading tab-delimited files. It is Microsoft.VisualBasic.FileIO.TextFieldParser.
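A minimal sketch of that idea, assuming the expected field count is known up front (20 in the question; 3 in this made-up sample) and that short records should simply be skipped. The sample text and the ExpectedFields constant are illustrative only.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module TabParserSketch
    Sub Main()
        ' Hypothetical sample: two complete records followed by a short trailing record.
        Dim sample As String = "a" & vbTab & "b" & vbTab & "c" & vbCrLf &
                               "d" & vbTab & "e" & vbTab & "f" & vbCrLf &
                               "x" & vbTab & "y" & vbCrLf
        Const ExpectedFields As Integer = 3

        ' TextFieldParser also accepts a file path; a StringReader keeps the sketch self-contained.
        Using parser As New TextFieldParser(New StringReader(sample))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(vbTab)

            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                If fields.Length <> ExpectedFields Then
                    ' Skip records that do not have the agreed number of fields.
                    Continue While
                End If
                Console.WriteLine(String.Join("|", fields))
            End While
        End Using
    End Sub
End Module
```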

This was solved using a bit array, checking one bit at a time for the suspect bit.

Related

For each loop keeps stopping on second cycle VB

The software I'm writing is being run in a service installed on a computer.
I want to read a text file, process it, and code it to a different path.
The software is doing exactly what it's supposed to do, but it only processes 2 files and then stops. I believe it has something to do with the For Each loop. I found some information online saying that it has to do with the amount of memory being allocated to each cycle of the For Each loop.
Any help is appreciated.
My code goes like this:
For Each foundFile As String In My.Computer.FileSystem.GetFiles("C:\Commsin\", FileIO.SearchOption.SearchTopLevelOnly, "ORDER-*.TXT")
    Dim filenum As Integer
    filenum = FreeFile()
    FileOpen(filenum, foundFile, OpenMode.Input)
    While Not EOF(filenum)
        <do a bunch of stuff>
    End While
    <more code>
    Dim arrayFileName() As String = GetFileName.Split("\")
    Dim FileName As String = arrayFileName(2)
    My.Computer.FileSystem.CopyFile(foundFile, "C:\Commsin\Done\" & FileName)
    If IO.File.Exists("C:\Commsin\Done\" & FileName) Then
        My.Computer.FileSystem.DeleteFile(foundFile, Microsoft.VisualBasic.FileIO.UIOption.AllDialogs, Microsoft.VisualBasic.FileIO.RecycleOption.SendToRecycleBin)
        NoOfOrders -= NoOfOrders
    End If
Next
Fundamental mistake: Don't modify the collection you are iterating over, i.e. avoid this pattern (pseudocode):
For Each thing In BunchOfThings:
SomeOperation()
BunchOfThings.Delete(thing)
Next thing
It's better to follow this pattern here (pseudocode again):
While Not BunchOfThings.IsEmpty()
thing = BunchOfThings.nextThing()
SomeOperation()
BunchOfThings.Delete(thing)
End While
I'll leave it as an exercise for you to convert your code from the first approach to the second.
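For reference, one way the second pattern might look in VB.NET is sketched below. It uses a Queue(Of String) as the work list; the file names and the module name are placeholders, not taken from the question.

```vb
Imports System
Imports System.Collections.Generic

Module DrainQueueSketch
    Sub Main()
        ' Hypothetical work list standing in for the files to process.
        Dim names As String() = {"ORDER-1.TXT", "ORDER-2.TXT", "ORDER-3.TXT"}
        Dim pending As New Queue(Of String)(names)

        ' Take items off the front until none remain; the collection is never
        ' enumerated while it is being changed.
        While pending.Count > 0
            Dim item As String = pending.Dequeue()
            Console.WriteLine("Processing " & item)
        End While
    End Sub
End Module
```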
It looks like you're trying to extract the filename from the full path using Split().
Why not just use:
Dim fileName As String = IO.Path.GetFileName(foundFile)
Instead of:
Dim arrayFileName() As String = GetFileName.Split("\")
Dim FileName As String = arrayFileName(2)
Thank you, everyone, for your suggestions. I have successfully implemented the recommended changes. It turned out that the issue wasn't with the code itself.
It was with one of the files I was using: it had a text row that, once split into an array, wasn't the required length, giving the error "Index was outside the bounds of the array."
It was a mistake in the file. I also added a check to prevent this error in the future.
Thank You.
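A check of that kind might look like the sketch below. The actual check used is not shown in the thread, so the TryGetField helper, the "|" separator, and the sample rows are all assumptions for illustration.

```vb
Imports System

Module FieldCountGuardSketch
    ' Hypothetical helper: returns the field at the given index,
    ' or Nothing when the row does not have that many fields.
    Function TryGetField(row As String, separator As Char, index As Integer) As String
        Dim parts As String() = row.Split(separator)
        If index < 0 OrElse index >= parts.Length Then Return Nothing
        Return parts(index)
    End Function

    Sub Main()
        ' A complete row yields the requested field.
        Console.WriteLine(TryGetField("A|B|C", "|"c, 2))
        ' A short row yields Nothing instead of throwing an index error.
        Console.WriteLine(TryGetField("A|B", "|"c, 2) Is Nothing)
    End Sub
End Module
```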

System.ArgumentException: 'Illegal characters in path.' Error

Please help. I have a piece of code that already works in other parts of my program but fails when accessed by a certain form, so I can't see how there can be an error in it. It's an information storage project using text files. A screenshot of the exact code and the error:
I expected it to change the label text to the contents of the text file it's trying to read.
Thanks everyone :)
Well, there must be one or more illegal characters coming from your "zoot1s.txt" file!
Build the filename and see what it looks like:
Dim zoot1s As String
zoot1s = My.Computer.FileSystem.ReadAllText("zoot1s.txt")
Dim fileName As String
fileName = zoot1s + "c.txt"
MessageBox.Show(fileName)
Dim ClassStrain As String
ClassStrain = My.Computer.FileSystem.ReadAllText(fileName)
TempLabel3.Text = ClassStrain
Timer1.Start()
--- EDIT --
My bad. I found the issue: the text skips to a new line, as if I had added vbNewLine to it. Is there any way to edit the text file and remove the last character so there isn't a new line?
Use the Trim() function to get rid of white space. Also use Path.Combine() to make sure the path is correctly separated from the filename with the correct number of backslashes:
zoot1s = My.Computer.FileSystem.ReadAllText("zoot1s.txt").Trim()
Dim fileName As String = System.IO.Path.Combine(zoot1s, "c.txt")
I think you missed the most important thing, which is the full path of the file you are reading from, passed to the ReadAllText() method. So instead of this line:
zoot1s = My.Computer.FileSystem.ReadAllText("zoot1s.txt")
You should edit it like this:
zoot1s = My.Computer.FileSystem.ReadAllText(My.Computer.FileSystem.CurrentDirectory & "\zoot1s.txt")
I hope this can solve your problem.
^_^

Having issues with illegal characters in file path

I am writing a program in VB.NET which loops through a file containing file paths to perform an action on each one. Each file path is on its own line, and I'm looping through the file like this:
Dim FileContents As String
FileContents = System.IO.File.ReadAllText("C:\File.txt")
Dim FileSplit As String()
FileSplit = FileContents.Split(vbCrLf)
For Each ThisLine In FileSplit
Dim FileModified As Date
FileModified = System.IO.File.GetLastWriteTime(ThisLine)
'Do something here
Next
Contents of File.txt is:
Y:\Users\localadmin\Desktop\MakeShadowCopy\FileInfo.vb
Y:\Users\localadmin\Desktop\MakeShadowCopy\FindFiles.vb
Y:\Users\localadmin\Desktop\MakeShadowCopy\MakeShadowCopy.sln
Y:\Users\localadmin\Desktop\MakeShadowCopy\MakeShadowCopy.v12.suo
The loop works fine, but it is throwing an exception on the line with GetLastWriteTime() on it, saying that the path contains illegal characters, but it is just a normal string with a file path in it.
If anyone has any ideas, or know how to escape the string going into GetLastWriteTime() that would be much appreciated :)
Thanks!
Probably the lines in your file are not correctly terminated with vbCrLf.
If this is the case, Split cannot divide your input into lines correctly, and you end up passing the whole text to GetLastWriteTime.
Instead of using ReadAllText, you could use ReadAllLines and leave the line splitting to the Framework, which knows how to handle carriage return and line feed characters.
For Each ThisLine In System.IO.File.ReadAllLines("C:\file.txt")
    Dim FileModified As Date
    FileModified = System.IO.File.GetLastWriteTime(ThisLine.Trim())
Next
Also add a Trim to the ThisLine variable to remove any unseen characters added erroneously to the line.
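If the text has already been read with ReadAllText, an alternative is to split on an explicit list of line-ending strings, so that no stray CR or LF is left attached to a path. This is only a sketch: the sample paths and the module name are invented.

```vb
Imports System
Imports Microsoft.VisualBasic

Module SplitLinesSketch
    Sub Main()
        ' Hypothetical content mixing CR+LF and bare LF line endings.
        Dim contents As String = "C:\a.txt" & vbCrLf & "C:\b.txt" & vbLf & "C:\c.txt" & vbCrLf

        ' Split on any of the common terminators and drop the empty entries
        ' that would otherwise appear between or after terminators.
        Dim separators As String() = {vbCrLf, vbLf, vbCr}
        Dim lines As String() = contents.Split(separators, StringSplitOptions.RemoveEmptyEntries)

        For Each line As String In lines
            ' Brackets make any leftover invisible character easy to spot.
            Console.WriteLine("[" & line & "]")
        Next
    End Sub
End Module
```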
Two ideas:
Use For instead of For Each and check whether you get the exception on the very first iteration. If not, you may have an issue with one specific file path. Check the iteration variable's value if that is the case.
Open the file in a hex editor and ensure that each line is terminated properly. You might have either a CR (13) or an LF (10) character at the end, but not both as is normal on Windows.

parsing a large csv file using vb.net without newline ending

I was given a file that was created with a Java program but does not have LF or end-of-line endings, so I am working with one gigantic string. I tried splitting and then using the TextFieldParser, but it seems the file is just too big to deal with. The contents are vital, so I need to get this data somehow and then clean it up. Here is what I have tried:
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\Users\Desktop\META3.txt")
    MyReader.TextFieldType = FileIO.FieldType.Delimited
    MyReader.SetDelimiters(",")
    Dim currentRow As String()
    While Not MyReader.EndOfData
        Try
            currentRow = MyReader.ReadFields()
            Dim currentField As String
            For Each currentField In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
        End Try
    End While
End Using
I think the best way is to take substrings of the text, and I wanted to take all values after the 7 occurrences of a comma, which is what the file should have per line. I'm not sure how to do this, and it seems like regex may be the only option. Any ideas appreciated.
line = freader.Readline()
Dim ms As Match = Regex.Match(line, "(\w+),(\w+),(\w+),(\w+),(\w+),(\w+),")
line = ms.Value
Will this work? It does not give the expected results.
If you can be guaranteed that the number of columns is always consistent, why not add a counter that reads each column and then moves on to the next set? You can then create a new spreadsheet or file with the correct format. If you do a search on here, there is a package for .NET which allows you to build valid .xls and .xlsx files on the fly. The package is called "simple ooxml". I have used this to create all sorts of spreadsheets where I work. I built a command line app that is passed an XML file with parameters and builds this as a fully fledged spreadsheet. Hope the above helps. Any questions, let me know.
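The counter idea could be sketched as follows. It streams the input one character at a time, so the whole file never has to sit in memory, and starts a new line after every N-th comma. It assumes every record ends with a comma and that no field contains a quoted comma; the SplitRecords helper and the sample data are hypothetical.

```vb
Imports System
Imports System.IO

Module RecordSplitterSketch
    ' Hypothetical helper: copies input to output, inserting a line break
    ' after every fieldsPerRecord-th comma.
    Sub SplitRecords(input As TextReader, output As TextWriter, fieldsPerRecord As Integer)
        Dim commasSeen As Integer = 0
        Dim code As Integer = input.Read()   ' -1 signals the end of the input
        While code <> -1
            Dim ch As Char = Convert.ToChar(code)
            output.Write(ch)
            If ch = ","c Then
                commasSeen += 1
                If commasSeen = fieldsPerRecord Then
                    output.WriteLine()
                    commasSeen = 0
                End If
            End If
            code = input.Read()
        End While
    End Sub

    Sub Main()
        ' Small stand-in for the real file: two records of three fields each.
        Using input As New StringReader("a,b,c,d,e,f,")
            SplitRecords(input, Console.Out, 3)
        End Using
    End Sub
End Module
```

For the file in the question, the reader would be a StreamReader over the file, the writer a StreamWriter for the cleaned copy, and the field count 7.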

How to : streamreader in csv file splits to next if lowercase followed by uppercase in line

I am using an ASP.NET MVC application to upload Excel data from its CSV form to a database. While reading the CSV file using StreamReader, if a line contains a lowercase letter followed by an uppercase letter, it splits into two lines. Example:
Line :"1,This is nothing but the Example to explanationIt results wrong, testing example"
This line splits to :
Line 1: 1,This is nothing but the Example to explanation"
Line 2:""
Line 3:It results wrong, testing example
where as CSV file generates right as ""1,This is nothing but the Example to explanationIt results wrong, testing example"
code :
Dim csvFileReader As New StreamReader("my csv file Path")
While Not csvFileReader.EndOfStream()
Dim _line = csvFileReader.ReadLine()
End While
Why is this happening? How can I resolve it?
When a cell in an Excel spreadsheet contains multiple lines and it is saved to a CSV file, Excel separates the lines in the cell with a line-feed character (ASCII value 0x0A). Each row in the spreadsheet is separated with the typical carriage-return/line-feed pair (0x0D 0x0A). When you open the CSV file in Notepad, it does not show the lone LF character at all, so it looks like it all runs together on one line. So, even though Notepad doesn't show it, the CSV file actually looks like this:
' 1,"This is nothing but the Example to explanation{LF}It results wrong",testing example{CR}{LF}
According to the MSDN documentation on the StreamReader.Readline method:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
Therefore, when you call ReadLine, it will stop reading at the end of the first line in a multi-line cell. To avoid this, you would need to use a different "read" method and then split on CR/LF pairs rather than on either individually.
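As a rough sketch of that approach, the whole text can be read with ReadToEnd and then split only on the CR/LF pair. The sample string is made up, and a StringReader stands in for the StreamReader over the real file.

```vb
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module SplitOnCrLfSketch
    Sub Main()
        ' Hypothetical sample: the first row has a lone LF inside a quoted cell;
        ' every row ends with CR+LF.
        Dim sample As String = "1,""first line" & vbLf & "second line"",x" & vbCrLf &
                               "2,plain,y" & vbCrLf

        Using reader As New StringReader(sample)
            Dim text As String = reader.ReadToEnd()
            ' Split only on the CR+LF pair so that a lone LF stays inside its row.
            Dim rows As String() = text.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
            Console.WriteLine(rows.Length)
        End Using
    End Sub
End Module
```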
However, this isn't the only issue you will run into with reading CSV files. For instance, you also need to properly handle the way quotation characters in a cell are escaped in CSV. In such cases, unless it's really necessary to implement it in your own way, it's better to use an existing library to read the file. In this case, Microsoft provides a class in the .NET framework that properly handles reading CSV files (including ones with multi-line cells). The name of the class is TextFieldParser and it's in the Microsoft.VisualBasic.FileIO namespace. Here's the link to a page in the MSDN that explains how to use it to read a CSV file:
http://msdn.microsoft.com/en-us/library/cakac7e6
Here's an example:
' Requires: Imports Microsoft.VisualBasic.FileIO
Using reader As New TextFieldParser("my csv file Path")
    reader.TextFieldType = FieldType.Delimited
    reader.SetDelimiters(",")
    While Not reader.EndOfData
        Try
            Dim fields() As String = reader.ReadFields()
            ' Process fields in this row ...
        Catch ex As MalformedLineException
            ' Handle exception ...
        End Try
    End While
End Using