Strange Issue with Text Field Parser - vb.net

I have a strange issue when using the "Microsoft.VisualBasic.FileIO.TextFieldParser" in VB.NET. I am creating a CSV file that uses ",*" as the delimiter and then trying to export that file into an EXCEL spreadsheet. The program creates the file perfectly, but when I try to pull each entry from the file and put them into the Excel table, some of the entries are getting split up. For example, one of the first entries in one of my rows is "H WIRE*30(SHDR-30V&SHDR-30V) 160mm" without the quotes. When I grab that entry using the TextFieldParser and export it into EXCEL, the 160mm gets dropped from the entry and its added to the row below. See image. Excel Image
Here is a link to the CSV that im using. Test CSV Text File
Here is the code. (I removed the excel portion of the code for this post.) I put a message box in to show each entry the Parser is pulling. When it gets to the "H WIRE*30(SHDR-30V&SHDR-30V) 160mm" entry, it reads "H WIRE*30(SHDR-30V&SHDR-30V)" as its own row with no entries after and then reads "160mm" as a new row with the remaining info that should be all together. (Sorry if that is confusing xD)
I just cant figure out why its doing this? The CSV looks fine, and all the other rows read and export into excel perfectly. But this same row, everytime, is messing up.
Thanks in advance for any help.
THANK YOU SO MUCH Idle_Mind.
Dim MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\TEST CSV.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",*")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MessageBox.Show(currentField)
Next

There is an unexpected Line Feed in that line. I simply downloaded your text file, opened it in Notepad++, and turned on the View --> Show Symbol --> Show End of Line feature:

Related

VB.NET - Working with tab delimited text file

I need help with how to work on text file (like database).
I create excel GUI (with macro's), that search imputed string in sheets with lots of data and display entire row with matching string (for people with installed MS office)
Now I must create alternative VB.Net application working only on tab delimited text files (without ADO.Net) for people who haven't installed MS office, and I don't know how start to work with it.
import them? if yes, then how.
working directly on them? if yes, then how.
My text files is exported excels files/sheets to tab delimited .txt, with loots of columns (100+) with headers, and lots of rows 500+
need help :)
thx
If you want to get the headers from the first line of the file then do this ...
Sub Main()
Dim dt = New DataTable
Dim lines = File.ReadAllLines("TextFile1.txt")
Dim headers = lines(0).Split(vbTab)
For Each header In headers
dt.Columns.Add(header)
Next
For Each line In lines.Skip(1)
Dim parts = line.Split(vbTab)
dt.Rows.Add(parts)
Next
End Sub

parsing a large csv file using vb.net without newline ending

I was given a file that was created with a java program but does not have LF or endofline ending so I am working with a gigantic string. I tried splitting and then using the TextFieldParser but it seems the file is just too big to deal with. The contents are vital to I need to get this data somehow and then clean it up. Here is what I have tried:
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("C:\Users\Desktop\META3.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
End Try
End While
End Using
I think the best way is to take substrings of the text and I wanted to take all values after the 7 occurrences of a comma which is what the file should have per line. Not sure how to do this and it seems like regex maybe the only option. Any ideas appreciated.
line = freader.Readline()
Dim ms As Match = Regex.Match(line, "(\w+),(\w+),(\w+),(\w+),(\w+),(\w+),")
line = ms.Value
will this work; does not give expected results.
If you can be guaranteed that the number of columns is always consistent why not add a counter that reads each column and then goes onto the next set. You can then create a new spreadsheet or file with the correct format. If you do an search on here there is a package for .net which allows you to build valid .xls and .xlsx files on the fly. The package is called "simple ooxml". i have used this to create all sorts of spreadsheets where I work. I built a command line app that passes and xml file with parameters and builds this as a fully fledged spreadsheet. Hope the above helps. Any questions let me know.

How to search data in a txt file through Visual Basic

I have this txt file with the following information:
National_Insurence_Number;Name;Surname;Hours_Worked;Price_Per_Hour so:
eg.: aa-12-34-56-a;Peter;Smith;36;12
This data has been inputed to the txt file through a VB form which works totally fine, the problem comes when, on another form. This is what I expect it to do:
The user will input into a text box the employees NI Number.
The program will then search through the file that NI Number and, if found;
It will fill in the appropriate text boxes with its data.
(Then the program calculates tax and national insurance which i got working fine)
So basically the problem comes telling the program to search that NI number and introduce each ";" delimited field into its corresponding text box.
Thanks for all.
You just need to parse the file like a csv, you can use Microsoft.VisualBasic.FileIO.TextFieldParser to do this or you can use CSVHelper - https://github.com/JoshClose/CsvHelper
I've used csv helper in the past and it works great, it allows you to create a class with the structure of the records in your data file then imports the data into a list of these for searching.
You can look here for more info on TextFieldParser if you want to go that way -
Parse Delimited CSV in .NET
Dim afile As FileIO.TextFieldParser = New FileIO.TextFieldParser(FileName)
Dim CurrentRecord As String() ' this array will hold each line of data
afile.TextFieldType = FileIO.FieldType.Delimited
afile.Delimiters = New String() {";"}
afile.HasFieldsEnclosedInQuotes = True
' parse the actual file
Do While Not afile.EndOfData
Try
CurrentRecord = afile.ReadFields
Catch ex As FileIO.MalformedLineException
Stop
End Try
Loop
I'd recommend using CsvHelper though, the documentation is pretty good and working with objects is much easier opposed to the raw string data.
Once you have found the record you can then manually set the text of each text box on your form or use a bindingsource.

How to : streamreader in csv file splits to next if lowercase followed by uppercase in line

I am using asp.Net MVC application to upload the excel data from its CSV form to database. While reading the csv file using the Stream Reader, if line contains lower case letter followed by Upper case, it splits in two line . EX.
Line :"1,This is nothing but the Example to explanationIt results wrong, testing example"
This line splits to :
Line 1: 1,This is nothing but the Example to explanation"
Line 2:""
Line 3:It results wrong, testing example
where as CSV file generates right as ""1,This is nothing but the Example to explanationIt results wrong, testing example"
code :
Dim csvFileReader As New StreamReader("my csv file Path")
While Not csvFileReader.EndOfStream()
Dim _line = csvFileReader.ReadLine()
End While
Why should this is happening ? how to resolve this.
When a cell in an excel spreadsheet contains multiple lines, and it is saved to a CSV file, excel separates the lines in the cell with a line-feed character (ASCII value 0x0A). Each row in the spreadsheet is separated with the typical carriage-return/line-feed pair (0x0D 0x0A). When you open the CSV file in notepad, it does not show the lone LF character at all, so it looks like it all runs together on one line. So, in the CSV file, even though notepad doesn't show it, it actually looks like this:
' 1,"This is nothing but the Example to explanation{LF}It results wrong",testing example{CR}{LF}
According to the MSDN documentation on the StreamReader.Readline method:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
Therefore, when you call ReadLine, it will stop reading at the end of the first line in a multi-line cell. To avoid this, you would need to use a different "read" method and then split on CR/LF pairs rather than on either individually.
However, this isn't the only issue you will run into with reading CSV files. For instance, you also need to properly handle the way quotation characters in a cell are escaped in CSV. In such cases, unless it's really necessary to implement it in your own way, it's better to use an existing library to read the file. In this case, Microsoft provides a class in the .NET framework that properly handles reading CSV files (including ones with multi-line cells). The name of the class is TextFieldParser and it's in the Microsoft.VisualBasic.FileIO namespace. Here's the link to a page in the MSDN that explains how to use it to read a CSV file:
http://msdn.microsoft.com/en-us/library/cakac7e6
Here's an example:
Using reader As New TextFieldParser("my csv file Path")
reader.TextFieldType = FieldType.Delimited
reader.SetDelimiters(",")
While Not reader.EndOfData
Try
Dim fields() as String = reader.ReadFields()
' Process fields in this row ...
Catch ex As MalformedLineException
' Handle exception ...
End Try
End While
End Using

Avoiding 'End Of File' errors

I'm trying to import a tab delimited file into a table.
The issue is, SOMETIMES, the file will include an awkward record that has two "null values" and causes my program to throw a "unexpected end of file".
For example, each record will have 20 fields. But the last record will have only two fields (two null values), and hence, unexpected EOF.
Currently I'm using a StreamReader.
I've tried counting the lines and telling bcp to stop reading before the "phantom nulls", but StreamReader gets an incorrect count of lines due to the "phantom nulls".
I've tried the following code to get rid of all bogus code (code borrowed off the net). But it just replaces the fields with empty spaces (I'd like the result of no line left behind).
Public Sub RemoveBlankRowsFromCVSFile2(ByVal filepath As String)
If filepath = DBNull.Value.ToString() Or filepath.Length = 0 Then Throw New ArgumentNullException("filepath")
If (File.Exists(filepath) = False) Then Throw New FileNotFoundException("Could not find CSV file.", filepath)
Dim tempFile As String = Path.GetTempFileName()
Using reader As New StreamReader(filepath)
Using writer As New StreamWriter(tempFile)
Dim line As String = Nothing
line = reader.ReadLine()
While Not line Is Nothing
If Not line.Equals(" ") Then writer.WriteLine(line)
line = reader.ReadLine()
End While
End Using
End Using
File.Delete(filepath)
File.Move(tempFile, filepath)
End Sub
I've tried using SSIS, but it encounters the EOF unexpected error.
What am I doing wrong?
If you read the entire file into a string variable (using reader.ReadToEnd()) do you get the whole thing? or are you just getting the data up to those phantom nulls?
Have you tried using the Reader.ReadBlock() function to try and read past the file length?
At our company we do hundreds of imports every week. If a file is not sent in the correct, agreed to format for our automated process, we return it to the sender. If the last line is wrong, the file should not be processed because it might be missing information or in some other way corrupt.
One way to avoid the error is to use ReadAllLines, then process the array of file lines instead of progressing through the file. This is also a lot more efficient than streamreader.
Dim fileLines() As String
fileLines = File.ReadAllLines("c:\tmp.csv")
...
for each line in filelines
If trim(line) <> "" Then writer.WriteLine(line)
next line
You can also use save the output lines in the same or a different string array and use File.WriteAllLines to write the file all at once.
You could try the built-in .Net object for reading tab-delimited files. It is Microsoft.VisualBasic.FileIO.TextFileParser.
This was solved using a bit array, checking one bit at a time for the suspect bit.