Append Data to each row in a CSV File VB.NET - vb.net

I have as csv file with a header. I need to loop through all of the lines in the csv and on each one append an additional column at the very end. I would like to not have to read the line in each because when it gets re-written it seems to change data. I have a weird field with symbols so it doesn't read it right. I know this has to be easy but I have spent two days researching options, and have not found the answer. I am on a huge time crunch to get this finished up today. Any suggestions?
Thanks in Advance

As suggested, if your data is getting mangled somehow then it's most likely an encoding issue. The answer to your problem is extremely simple, e.g.
Dim filePath As String
Dim encoding As Encoding
Dim newColumnHeader As String
'...
Dim lines = File.ReadAllLines(filePath, encoding)
lines(0) &= "," & newColumnHeader
For lineNumber = 1 To lines.GetUpperBound(0)
lines(lineNumber) &= "," & lineNumber
Next
File.WriteAllLines(filePath, lines, encoding)

Related

For each loop keeps stopping on second cycle VB

The software I'm writing is being run in a service installed on a computer.
I want to read a text file, process it, and code it to a different path.
the software is doing exactly what it's supposed to do but it only processes 2 files and it stops. I believe that its something to do with the for each loop. I found some information online saying that its to do with the amount of memory being allocated to each cycle of the for each loop.
Any help is appreciated.
my code goes like this.
For Each foundFile As String In My.Computer.FileSystem.GetFiles("C:\Commsin\", FileIO.SearchOption.SearchTopLevelOnly, "ORDER-*.TXT")
Dim filenum As Integer
filenum = FreeFile()
FileOpen(filenum, foundFile, OpenMode.Input)
While Not EOF(filenum)
<do a bunch of stuff>
End While
<more code>
Dim arrayFileName() As String = GetFileName.Split("\")
Dim FileName As String = arrayFileName(2)
My.Computer.FileSystem.CopyFile(foundFile, "C:\Commsin\Done\" & FileName)
If IO.File.Exists("C:\Commsin\Done\" & FileName) Then
My.Computer.FileSystem.DeleteFile(foundFile, Microsoft.VisualBasic.FileIO.UIOption.AllDialogs, Microsoft.VisualBasic.FileIO.RecycleOption.SendToRecycleBin)
NoOfOrders -= NoOfOrders
End If
Next
Fundamental mistake: Don't modify the collection you are iterating over, i.e. avoid this pattern (pseudocode):
For Each thing In BunchOfThings:
SomeOperation()
BunchOfThings.Delete(thing)
Next thing
It's better to follow this pattern here (pseudocode again):
While Not BunchOfThings.IsEmpty()
thing = BunchOfThings.nextThing()
SomeOperation()
BunchOfThings.Delete(thing)
End While
I'll leave it as an exercise for you to convert your code from the first approach to the second.
It looks like you're trying to extract the filename from the full path using Split().
Why not just use:
Dim fileName As String = IO.Path.GetFileName(foundFile)
Instead of:
Dim arrayFileName() As String = GetFileName.Split("\")
Dim FileName As String = arrayFileName(2)
Thank you, everyone, for your suggestions, I have successfully implemented the recommended changes. It turned out that the issue wasn't with the code itself.
It was with one of the files I was using it had a text row that once split into an array it wasn't at a required length giving an error "Index was outside the bounds of the array."
It was a mistake on the file, I also added some check to prevent this error in the future.
Thank You.

Validate a csv file

This is my sample file
#%cty_id1,#%ccy_id2,#%cty_src,#%cty_cd3,#%cty_nm4,#%cty_reg5,#%cty_natnl6,#%cty_bus7,#%cty_data8
690,ALL2,,AL,ALBALODMNIA,,,,
90,ALL2,,,AQ,AKNTARLDKCTICA,,,
161,IDR2,,AZ,AZLKFMERBALFKIJAN,,,,
252,LTL2,,BJ,BENLFMIN,,,,
206,CVE2,,BL,SAILFKNT BAFSDRTHLEMY,,,,
360,,,BW2,BOPSLFTSWLSOANA,,,,
The problem is for #%cty_cd3 is a standard column(NOT NULL) with length 2 letters only, but in sql server the record shifts to the other column,(due to a extra comma in btw)how do i validate a csv file,to make sure that
when there's a 2 character word need to be only in 4 column?
there are around 10000 records ?
Set of rules Defined !
Should have a standard set of delimiters for eachrow
if not
Check for NOT NULL values having Null values
If found Null
remove delimiter at the pointer
The 3 ,,, are not replaced with 2 ,,
#UPDATED : Can i know if this can be done using a script ?
Updated i need only a function That operates on records like
90,ALL2,,,AQ,AKNTARLDKCTICA,,, correct them using a Regex or any other method and put back into the source file !
Your best bet here may be to use the tSchemaComplianceCheck component in Talend.
If you read the file in with a tFileInputDelimited component and then check it with the tSchemaComplianceCheck where you set cty_cd to not nullable then it will reject your Antarctica row simply for the null where you expect no nulls.
From here you can use a tMap and simply map the fields to the one above.
You should be able to easily tweak this as necessary, potentially with further tSchemaComplianceChecks down the reject lines and mapping to suit. This method is a lot more self explanatory and you don't have to deal with complicated regex's that need complicated management when you want to accommodate different variations of your file structure with the benefit that you will always capture all of the well formatted rows.
You could try to delete the empty field in column 4, if column no. 4 is not a two-character field, as follows:
awk 'BEGIN {FS=OFS=","}
{
for (i=1; i<=NF; i++) {
if (!(i==4 && length($4)!=4))
printf "%s%s",$i,(i<NF)?OFS:ORS
}
}' file.csv
Output:
"id","cty_ccy_id","cty_src","cty_nm","cty_region","cty_natnl","cty_bus_load","cty_data_load"
6,"ALL",,"AL","ALBANIA",,,,
9,"ALL",,"AQ","ANTARCTICA",,,
16,"IDR",,"AZ","AZERBAIJAN",,,,
25,"LTL",,"BJ","BENIN",,,,
26,"CVE",,"BL","SAINT BARTH�LEMY",,,,
36,,,"BW","BOTSWANA",,,,
41,"BNS",,"CF","CENTRAL AFRICAN REPUBLIC",,,,
47,"CVE",,"CL","CHILE",,,,
50,"IDR",,"CO","COLOMBIA",,,,
61,"BNS",,"DK","DENMARK",,,,
Note:
We use length($4)!=4 since we assume two characters in column 4, but we also have to add two extra characters for the double quotes..
The solution is to use a look-ahead regex, as suggested before. To reproduce your issue I used this:
"\\,\\,\\,(?=\\\"[A-Z]{2}\\\")"
which matches three commas followed by two quoted uppercase letters, but not including these in the match. Ofc you could need to adjust it a bit for your needs (ie. an arbitrary numbers of commas rather than exactly three).
But you cannot use it in Talend directly without tons of errors. Here's how to design your job:
In other words, you need to read the file line by line, no fields yet. Then, inside the tMap, do the match&replace, like:
row1.line.replaceAll("\\,\\,\\,(?=\\\"[A-Z]{2}\\\")", ",,")
and finally tokenize the line using "," as separator to get your final schema. You probably need to manually trim out the quotes here and there, since tExtractDelimitedFields won't.
Here's an output example (needs some cleaning, ofc):
You don't need to entry the schema for tExtractDelimitedFields by hand. Use the wizard to record a DelimitedFile Schema into the metadata repository, as you probably already did. You can use this schema as a Generic Schema, too, fitting it to the outgoing connection of tExtractDelimitedField. Not something the purists hang around, but it works and saves time.
About your UI problems, they are often related to file encodings and locale settings. Don't worry too much, they (usually) won't affect the job execution.
EDIT: here's a sample TOS job which shows the solution, just import in your project: TOS job archive
EDIT2: added some screenshots
Coming to the party late with a VBA based approach. An alternative way to regex is to to parse the file and remove a comma when the 4th field is empty. Using microsoft scripting runtime this can be acheived the code opens a the file then reads each line, copying it to a new temporary file. If the 4 element is empty, if it is it writes a line with the extra comma removed. The cleaned data is then copied to the origonal file and the temporary file is deleted. It seems a bit of a long way round, but it when I tested it on a file of 14000 rows based on your sample it took under 2 seconds to complete.
Sub Remove4thFieldIfEmpty()
Const iNUMBER_OF_FIELDS As Integer = 9
Dim str As String
Dim fileHandleInput As Scripting.TextStream
Dim fileHandleCleaned As Scripting.TextStream
Dim fsoObject As Scripting.FileSystemObject
Dim sPath As String
Dim sFilenameCleaned As String
Dim sFilenameInput As String
Dim vFields As Variant
Dim iCounter As Integer
Dim sNewString As String
sFilenameInput = "Regex.CSV"
sFilenameCleaned = "Cleaned.CSV"
Set fsoObject = New FileSystemObject
sPath = ThisWorkbook.Path & "\"
Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput)
If fsoObject.FileExists(sPath & sFilenameCleaned) Then
Set fileHandleCleaned = fsoObject.OpenTextFile(sPath & sFilenameCleaned, ForWriting)
Else
Set fileHandleCleaned = fsoObject.CreateTextFile((sPath & sFilenameCleaned), True)
End If
Do While Not fileHandleInput.AtEndOfStream
str = fileHandleInput.ReadLine
vFields = Split(str, ",")
If vFields(3) = "" Then
sNewString = vFields(0)
For iCounter = 1 To UBound(vFields)
If iCounter <> 3 Then sNewString = sNewString & "," & vFields(iCounter)
Next iCounter
str = sNewString
End If
fileHandleCleaned.WriteLine (str)
Loop
fileHandleInput.Close
fileHandleCleaned.Close
Set fileHandleInput = fsoObject.OpenTextFile(sPath & sFilenameInput, ForWriting)
Set fileHandleCleaned = fsoObject.OpenTextFile(sPath & sFilenameCleaned)
Do While Not fileHandleCleaned.AtEndOfStream
fileHandleInput.WriteLine (fileHandleCleaned.ReadLine)
Loop
fileHandleInput.Close
fileHandleCleaned.Close
Set fileHandleCleaned = Nothing
Set fileHandleInput = Nothing
KillFile (sPath & sFilenameCleaned)
Set fsoObject = Nothing
End Sub
If that's the only problem (and if you never have a comma in the field bt_cty_ccy_id), then you could remove such an extra comma by loading your file into an editor that supports regexes and have it replace
^([^,]*,[^,]*,[^,]*,),(?="[A-Z]{2}")
with \1.
i would question the source system which is sending you this file as to why this extra comma in between for some rows? I guess you would be using comma as a delimeter for importing this .csv file into talend.
(or another suggestion would be to ask for semi colon as column separator in the input file)
9,"ALL",,,"AQ","ANTARCTICA",,,,
will be
9;"ALL";,;"AQ";"ANTARCTICA";;;;

VB.net will not read text file correctly

I've been trying to use StreamReader to read a log file. I cannot verify what it is encoded in, as when I open it in notepad++ and select ANSI encoding, I get this result:
I'm getting the characters needed when using ANSI but they are followed by things like [NULL][EOT][SOH][NUL][SI]
When I try and read the file in VB (using StreamReader or ReadAll) with ANSI encoding selected the resulting string I get back is completely wrong.
How could I read a file like this in VB.net?
You could use the IO.File.ReadAllText("File Location", encoding as System.Text.Encoding) method,
Dim textFromFile as string = IO.File.ReadAllText("C:\Users\Jason\Desktop\login20130417.rdb", System.Text.Encoding.ASCII) 'Or Unicode, UFT32, UFT8, UFT7, BigEndianUnicode or default. Default is ANSI.
If you still don't get the text you need by using the default encoding (ANSI), then you can always try the other 6 different encoding methods.
Update...
It appears that your file is corrupt, using the code below I was able to get a binary representation of whatever is in the file, I got this,
1111111111111101000001110000010000000000000001010000000000010011000000000000100000000000000111100000000000100110000000000011100000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111110100000111000001000000000000000101000000000001001100000000000010000000000000011110000000000010100000000000111111111111110100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000110011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The massive amount of null data would suggest that the file is corrupt, which would also explain why we are not getting a lot of data whenever we try to read the file.
The code,
Dim fileData As String = IO.File.ReadAllText("C:\Users\Jason\Desktop\login20130417.rdb")
Dim i As Integer = 0
Dim binaryData As String = ""
Dim ch As String = ""
Do Until i = fileData.Length
ch = fileData.Chars(i)
bin = bin & System.Convert.ToString(AscW(ch), 2).PadLeft(8, "0")
i = i + 1
Loop
As #Daniel A. White suggested in his comment, that file does not appear to be encoded like a "normal" text file. A StreamReader will not work in this situation. I would attempt to use a BinaryReader.
Rdb file? Never heard of it. Quick google makes it less clear - n64 database file, Darkbot, etc...
However considering the name you have, and the general look of the opened file, i would say its a binary file.
If you want to read the file in vb.net you'll need a library of sorts, and i can't help you with one until you are able to shed some light on what the file may be, or what it was created with.

reading a simple text file, splitting and sorting the contents using vb

I'm quite new to vb and doing simple basics, I have managed to access and read a specific file line by line. If I wanted to split information by a comma or space and then sort alphabetically or numerically, how would I go about this procedure? Would I create a loop within the reading loop to parse the information? A simple to follow example would really help...Thanks!
Dim file As String = "C:\Users\test.txt"
Dim Line As String
If System.IO.File.Exists(file) = True Then
Dim objReader As New System.IO.StreamReader(file)
Do While objReader.Peek() <> -1
Line = Line & objReader.ReadLine() & vbNewLine
Loop
Next
Label1.Text = Line
objReader.Close()
Else
MsgBox("File Does Not Exist")
End If
It depends what you want do do with the text you split really.
The Split() function will return you an array of string with the results of your split, from there it really depends on the data.
Here is an example of using split http://www.dotnetperls.com/split-vbnet
Since you mentioned you want to sort the data alphabetically you may wish to look at http://www.codepedia.com/1/VBNET_ArraySort or look into using LINQ.
It is quite acceptable to nest a loop within your main loop if you want to do something more complex with the data.

Avoiding 'End Of File' errors

I'm trying to import a tab delimited file into a table.
The issue is, SOMETIMES, the file will include an awkward record that has two "null values" and causes my program to throw a "unexpected end of file".
For example, each record will have 20 fields. But the last record will have only two fields (two null values), and hence, unexpected EOF.
Currently I'm using a StreamReader.
I've tried counting the lines and telling bcp to stop reading before the "phantom nulls", but StreamReader gets an incorrect count of lines due to the "phantom nulls".
I've tried the following code to get rid of all bogus code (code borrowed off the net). But it just replaces the fields with empty spaces (I'd like the result of no line left behind).
Public Sub RemoveBlankRowsFromCVSFile2(ByVal filepath As String)
If filepath = DBNull.Value.ToString() Or filepath.Length = 0 Then Throw New ArgumentNullException("filepath")
If (File.Exists(filepath) = False) Then Throw New FileNotFoundException("Could not find CSV file.", filepath)
Dim tempFile As String = Path.GetTempFileName()
Using reader As New StreamReader(filepath)
Using writer As New StreamWriter(tempFile)
Dim line As String = Nothing
line = reader.ReadLine()
While Not line Is Nothing
If Not line.Equals(" ") Then writer.WriteLine(line)
line = reader.ReadLine()
End While
End Using
End Using
File.Delete(filepath)
File.Move(tempFile, filepath)
End Sub
I've tried using SSIS, but it encounters the EOF unexpected error.
What am I doing wrong?
If you read the entire file into a string variable (using reader.ReadToEnd()) do you get the whole thing? or are you just getting the data up to those phantom nulls?
Have you tried using the Reader.ReadBlock() function to try and read past the file length?
At our company we do hundreds of imports every week. If a file is not sent in the correct, agreed to format for our automated process, we return it to the sender. If the last line is wrong, the file should not be processed because it might be missing information or in some other way corrupt.
One way to avoid the error is to use ReadAllLines, then process the array of file lines instead of progressing through the file. This is also a lot more efficient than streamreader.
Dim fileLines() As String
fileLines = File.ReadAllLines("c:\tmp.csv")
...
for each line in filelines
If trim(line) <> "" Then writer.WriteLine(line)
next line
You can also use save the output lines in the same or a different string array and use File.WriteAllLines to write the file all at once.
You could try the built-in .Net object for reading tab-delimited files. It is Microsoft.VisualBasic.FileIO.TextFileParser.
This was solved using a bit array, checking one bit at a time for the suspect bit.