How can I improve the efficiency of my simple file-splitting program - vb.net

I have a simple program that reads a .txt file and splits it into many files of "pMaxRows" rows each. These .txt files are huge; some are nearly 25 GB. Right now it is not running fast enough for my liking. I feel there should be a way to improve the efficiency, maybe by reading/writing multiple lines at once, but I am not very experienced with the VB.NET StreamReader/StreamWriter.
Code is below:
Public Sub Execute(ByVal pFileLocation As String, _
ByVal pMaxRows As Int32)
Dim Row As String
Dim SourceRowCount As Int64
Dim TargetRowCount As Int64
Dim TargetFileNumber As Int32
''Does the file exist in that location?
If IO.File.Exists(pFileLocation) = False Then
Throw New Exception("File does not exist at " & pFileLocation)
End If
''Split FileLocation into FileName and Folder Location
Dim arrFileLoc() As String = pFileLocation.Split("\"c)
Dim i As Integer = arrFileLoc.Length - 1
Dim FileName As String = arrFileLoc(i)
Dim FileLocationLength As Integer = pFileLocation.Length
Dim FileNameLength As Integer = FileName.Length
Dim Folder As String = pFileLocation.Remove(FileLocationLength - FileNameLength, FileNameLength)
''Read the file; Using closes the reader even if an exception occurs
Using sr As New IO.StreamReader(pFileLocation)
SourceRowCount = 0
TargetRowCount = 0
TargetFileNumber = 1
''Create First Target File Name
Dim TargetFileName As String
TargetFileName = TargetFileNumber & "_" & FileName
''Start reading lines
Do While Not sr.EndOfStream
''if it hits the target number of rows:
If (TargetRowCount = pMaxRows) Then
''Advance target file number
TargetFileNumber += 1
''Create New file with target file number
TargetFileName = TargetFileNumber & "_" & FileName
''Set target row count back to 0
TargetRowCount = 0
End If
''Read line
Row = sr.ReadLine()
''Write line (this opens and closes the target file once per line)
Using sw As New IO.StreamWriter(Folder & TargetFileName, True)
sw.WriteLine(Row)
End Using
SourceRowCount += 1
TargetRowCount += 1
Loop
End Using
End Sub
Does anyone have any suggestions? Even directing me to the right place, if this has been answered before, would be much appreciated.
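The thread does not include an answer to this question, so what follows is only a hedged sketch, not a reply from the original thread. The most likely cost in the code above is that it creates a new StreamWriter, and so opens, seeks to the end of, and closes the target file, for every single line. The sketch keeps one writer open per chunk and gives both streams larger buffers. Assumptions: the input is UTF-8, the 1 MiB buffer size is an arbitrary starting value to tune, existing chunk files are overwritten (the original appends), and `FileSplitter`/`SplitFile` are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module FileSplitter
    ' Splits sourcePath into chunks of at most maxRows lines each, written next to
    ' the source as "1_<name>", "2_<name>", ... Returns the number of chunk files.
    Public Function SplitFile(sourcePath As String, maxRows As Integer) As Integer
        If maxRows <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(maxRows))
        If Not File.Exists(sourcePath) Then
            Throw New FileNotFoundException("File does not exist", sourcePath)
        End If

        Dim folder As String = Path.GetDirectoryName(Path.GetFullPath(sourcePath))
        Dim fileName As String = Path.GetFileName(sourcePath)
        Const BufferSize As Integer = 1048576 ' 1 MiB; assumed value, tune as needed

        Dim fileNumber As Integer = 0
        Dim rowsInChunk As Integer = 0
        Dim writer As StreamWriter = Nothing
        Try
            Using reader As New StreamReader(sourcePath, Encoding.UTF8, True, BufferSize)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' Open the next chunk only when the current one is full, then keep it open
                    If writer Is Nothing OrElse rowsInChunk = maxRows Then
                        If writer IsNot Nothing Then writer.Dispose()
                        fileNumber += 1
                        Dim target As String = Path.Combine(folder, fileNumber.ToString() & "_" & fileName)
                        writer = New StreamWriter(target, False, New UTF8Encoding(False), BufferSize)
                        rowsInChunk = 0
                    End If
                    writer.WriteLine(line)
                    rowsInChunk += 1
                    line = reader.ReadLine()
                End While
            End Using
        Finally
            ' Disposing the last writer flushes its remaining buffered text
            If writer IsNot Nothing Then writer.Dispose()
        End Try
        Return fileNumber
    End Function
End Module
```

Checking `ReadLine()` for `Nothing` instead of `EndOfStream` avoids an extra check per line; the bigger win is opening each output file exactly once.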

Related

VB.net Check items in a text doc and see if it's in a folder

I have a text document with a list of file names and their extensions. I need to go through this list and check a directory for the existence of each file. I then need to output the result to either foundFilesList.txt or OrphanedFiles.txt. I have two approaches to this function, and neither is working. The first example uses a loop to cycle through the text document. The second one doesn't work because it never sees a match for a file from fileNamesList.
Thank you for taking the time to look at this.
First Code:
Dim FILE_NAME As String
FILE_NAME = txtFileName.Text
Dim fileNames = System.IO.File.ReadAllLines(FILE_NAME)
fCount = 0
For i = 0 To fileNames.Count() - 1
Dim fileName = fileNames(i)
'sFileToFind = location & "\" & fileName & "*.*"
Dim paths = IO.Directory.GetFiles(location, fileName, IO.SearchOption.AllDirectories)
If Not paths.Any() Then
System.IO.File.AppendAllText(orphanedFiles, fileName & vbNewLine)
Else
For Each pathAndFileName As String In paths
If System.IO.File.Exists(pathAndFileName) = True Then
Dim sRegLast = pathAndFileName.Substring(pathAndFileName.LastIndexOf("\") + 1)
Dim toFileLoc = System.IO.Path.Combine(createXMLFldr, sRegLast)
Dim moveToFolder = System.IO.Path.Combine(MoveLocation, "XML files", sRegLast)
'if toFileLoc = XML file exists move it into the XML files folder
If System.IO.File.Exists(toFileLoc) = False Then
System.IO.File.Copy(pathAndFileName, moveToFolder, True)
System.IO.File.AppendAllText(ListofFiles, sRegLast & vbNewLine)
fileFilename = (fileName) + vbCrLf
fCount = fCount + 1
BackgroundWorker1.ReportProgress(fCount)
'fileCount.Text = fCount
End If
End If
Next
End If
BackgroundWorker1.ReportProgress(100 * i / fileNames.Count())
'statusText = i & " of " & fileName.Count() & " copied"
fCount = i
Next
Second Code:
FILE_NAME = txtFileName.Text 'text field holding the path of the file that lists the file names
Dim fileNamesList = System.IO.File.ReadAllLines(FILE_NAME)
location = txtFolderPath.Text
fCount = 0
' Two lists to collect missing and found files
Dim foundFiles As List(Of String) = New List(Of String)()
Dim notfoundFiles As List(Of String) = New List(Of String)()
Dim fileNames As String() = System.IO.Directory.GetFiles(createXMLFldr)
For Each file As String In fileNamesList
Debug.Write("single file : " & file & vbCr)
' Check whether the file is contained in the request list
Dim paths = IO.Directory.GetFiles(location, file, IO.SearchOption.AllDirectories)
If fileNamesList.Contains(Path.GetFileNameWithoutExtension(file)) Then
Dim FileNameOnly = Path.GetFileName(file)
Debug.Write("FileNameOnly " & FileNameOnly & vbCr)
If System.IO.File.Exists(FileNameOnly) = True Then
'if toFileLoc = XML file exists move it into the XML files folder
Dim moveToFolder = System.IO.Path.Combine(MoveLocation, "XML files", file)
foundFiles.Add(file) 'add to foundFiles list
fileFilename = (file) + vbCrLf 'add file name to listbox
fCount = fCount + 1
Else
notfoundFiles.Add(file)
End If
End If
Next
File.WriteAllLines(ListofFiles, foundFiles)
File.WriteAllLines(orphanedFiles, notfoundFiles)
This is just a starting point for you, but give it a try:
Friend Module Main
Public Sub Main()
Dim oFiles As List(Of String)
Dim _
sOrphanedFiles,
sSearchFolder,
sFoundFiles,
sTargetFile As String
sOrphanedFiles = "D:\Results\OrphanedFiles.txt"
sSearchFolder = "D:\Files"
sFoundFiles = "D:\Results\FoundFiles.txt"
oFiles = IO.File.ReadAllLines("D:\List.txt").ToList
oFiles.ForEach(Sub(File)
If IO.Directory.GetFiles(sSearchFolder, File, IO.SearchOption.AllDirectories).Any Then
sTargetFile = sFoundFiles
Else
sTargetFile = sOrphanedFiles
End If
IO.File.AppendAllText(sTargetFile, $"{File}{Environment.NewLine}")
End Sub)
End Sub
End Module
If I've misjudged the requirements, let me know and I'll update accordingly.
Explanations and comments in-line.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
'I presume txtFileName.Text contains the full path including the file name
'I also presume that this text file contains only file names with extensions
Dim FilesInTextFile = System.IO.File.ReadAllLines(txtFileName.Text)
'Instead of accessing the Directory over and over, just get an array of all the files into memory
'This should be faster than searching the Directory structure one by one
'Replace <DirectoryPathToSearch> with the actual path of the Directory you want to search
Dim FilesInDirectory = IO.Directory.GetFiles("<DirectoryPathToSearch>", "*.*", IO.SearchOption.AllDirectories)
'We now have an array of full path and file names but we just need the file name for comparison
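'A HashSet makes each Contains check a fast lookup instead of re-running a query over
'the whole array; OrdinalIgnoreCase matches the case-insensitive way Windows compares file names
Dim FileNamesInDirectory As New HashSet(Of String)(
FilesInDirectory.Select(Function(p) IO.Path.GetFileName(p)),
StringComparer.OrdinalIgnoreCase)
'A StringBuilder is more efficient than reassigning a string with &= because a
'StringBuilder is mutable
Dim sbFound As New System.Text.StringBuilder
Dim sbOrphan As New System.Text.StringBuilder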
'Instead of opening a file, writing to the file and closing the file
'in the loop, just append to the string builder
For Each f In FilesInTextFile
If FileNamesInDirectory.Contains(f) Then
sbFound.AppendLine(f)
Else
sbOrphan.AppendLine(f)
End If
Next
'After the loop write to the files just once.
'Replace the file path with the actual path you want to use
IO.File.AppendAllText("C:\FoundFiles.txt", sbFound.ToString)
IO.File.AppendAllText("C:\OrphanFiles.txt", sbOrphan.ToString)
End Sub

recursive reading for files in folder using vb.net

I wrote a program that checks the file update time, but it doesn't check files in subfolders recursively. Kindly help me make it check the files in subfolders as well.
My code is here :
Sub getfilestat1()
Dim fileName As String
Dim CurrCyleTime As Date
Dim PrevCycleTime As Date
Dim DBCycleTime As Date
Dim connectionString As String, sql As String
Dim _SQLConnection As AseConnection
Dim _SQLCommand As AseCommand
Dim _SQLAdapter As AseDataAdapter
Dim _DataSet As DataSet
Dim _SQLReader As AseDataReader
_SQLConnection = New AseConnection
_SQLCommand = New AseCommand
_SQLConnection.ConnectionString = "Data Source='10.49.196.97';Port=9713;Database=db_print;Uid=kuat199;Pwd=testing1; "
_SQLCommand.Connection = _SQLConnection
_SQLCommand.CommandText = ""
_SQLCommand.CommandType = CommandType.Text
_SQLCommand.CommandTimeout = 900000000
_SQLConnection.Open()
Dim command As New AseCommand("select * from Kampachi_Cycle", _SQLConnection)
Dim reader As AseDataReader = command.ExecuteReader()
While reader.Read()
' Console.WriteLine(reader("pol_no").ToString() & " " & Convert.ToString(reader("image_return")) & " " & Convert.ToString(reader("no_of_images")))
DBCycleTime = reader("CYCLE").ToString()
End While
' Dim asSettings As AppSettingsSection = cAppConfig.AppSettings
'Dim fi As New System.IO.DirectoryInfo("D:\Vimal\test")
Dim fi As New System.IO.DirectoryInfo("\\kaip3r7ciwf01\BicorData\report\kam\")
Dim files = fi.GetFiles("*", SearchOption.AllDirectories).ToList()
'For Each filename As String In IO.Directory.GetFiles(Directory, "*", IO.SearchOption.AllDirectories)
'For Each file In files Select file Order By file.CreationTime Descending
''Dim first = (From file In files Select file Order By file.CreationTime Ascending).FirstOrDefault
'Count the number files in network path
Dim fcount = files.Count()
'Fetching the previous cycle run time from config file
PrevCycleTime = ConfigurationManager.AppSettings("PrevCycleTime")
CurrCyleTime = Now()
ConfigurationManager.AppSettings("PrevCycleTime") = CurrCyleTime
''''My.Settings.Save()
For i As Integer = 0 To fcount - 1
If files(i).LastWriteTime > DBCycleTime.AddMinutes(-20) Then
fileName = files(i).Name.ToString()
Dim insertCmd As New AseCommand("INSERT INTO Kampachi_FilesProcess " + " ( FILENAME, FileReadStatus) " + " VALUES( #file_name, #read_stat )", _SQLConnection)
Dim parm As New AseParameter("#file_name", AseDbType.VarChar, 1000)
insertCmd.Parameters.Add(parm)
parm = New AseParameter("#read_stat", AseDbType.VarChar, 12)
insertCmd.Parameters.Add(parm)
Dim recordsAffected As Integer
insertCmd.Parameters(0).Value = fileName
insertCmd.Parameters(1).Value = "Y"
recordsAffected = insertCmd.ExecuteNonQuery()
If i = 0 Then
fileName = files(i).Name.ToString()
Dim updCmd As New AseCommand("update Kampachi_Cycle set CYCLE = Getdate()", _SQLConnection)
Dim updparm As New AseParameter("#file_name", AseDbType.VarChar, 1000)
recordsAffected = updCmd.ExecuteNonQuery()
End If
End If
Next
End Sub
After these changes it looks fine and gives the correct output.
It reads the subfolders recursively as well.
Change this line:
Dim files = fi.GetFileSystemInfos.ToList()
To:
Dim files = fi.GetFiles("*", SearchOption.AllDirectories).ToList()
To answer the question below about the If not checking all of the files: you are correct, but your code explicitly used the FirstOrDefault method, so it would only ever examine the first file. I don't know what the rest of your program does, and your question didn't specify, but the change above answers your question about recursive file searching.
To get a list of all the files that are older than 25 minutes (last written more than 25 minutes ago), use this code:
Dim files As List(Of FileInfo) = fi.GetFiles("*", SearchOption.AllDirectories).ToList
Dim oldFileTimeStamp As DateTime = DateTime.Now.AddMinutes(-25)
'Older files were last written before the cut-off; use > instead to get the files changed in the last 25 minutes
Dim olderFiles As List(Of FileInfo) = files.Where(Function(fi2) fi2.LastWriteTime < oldFileTimeStamp).ToList()
If this answered your specific question, please click the accept button. If you have additional questions unrelated to the original one, please open a new Stack Overflow question rather than adding them to this one. This makes it easier for future readers to find answers to your follow-up questions (search only finds the original question, not questions added inside it).

vb.net Splitting string lines into new strings

I want to split a string - which includes multiple lines - into new strings.
Since people don't seem to understand my problem, here is some further information:
I read values from an XML file into strings. Some of those strings contain multiple lines. Now I need every single value of such a string in its own string variable, so that I can tell Homer to drink a beer and tell Lenny to go to bed, instead of telling the whole team to go to bed. (Hopefully this story helps you :D)
To keep this simple I'll define a "static" string for this sample.
I'll put three of my attempts below. I'd love to hear what's wrong with them. I also tried lists and enums, where I could split the string but not define a new one.
But I assume that there is a much easier solution for my problem...
Dim team As String = "Simpson, Homer" & vbCrLf & "Leonard, Lenny" & vbCrLf & "Carlson, Carl"
1.
Dim objReader As New StringReader(team)
Dim tm() As String
Dim i As Integer = 1
Do While objReader.Peek() <> -1
tm(i) = objReader.ReadLine() & vbNewLine
i = i + 1
Loop
2.
Dim i As Integer = 0
For Each Line As String In team.Split(New [Char]() {CChar(vbTab)})
Dim tm(i) As String = ReadLine(team, i)
i = i + 1
Next
3.
Dim tm() As String
Dim i As Integer = 0
Dim objReader As New StringReader(team)
Do While objReader.Peek() <> -1
tm(i) = ReadLine(team, i)
i = i + 1
Loop
And the function used in 2. and 3.
Public Function ReadLine(ByVal sFile As String, Optional ByVal nLine As Long = 1) As String
Dim sLines() As String
Dim oFSO As Object
Dim oFile As Object
On Error GoTo ErrHandler
oFSO = CreateObject("Scripting.FileSystemObject")
If oFSO.FileExists(sFile) Then
oFile = oFSO.OpenTextFile(sFile)
sLines = Split(oFile.ReadAll, vbCrLf)
oFile.Close()
Select Case Math.Sign(nLine)
Case 1
ReadLine = sLines(nLine - 1)
Case -1
ReadLine = sLines(UBound(sLines) + nLine + 1)
End Select
End If
ErrHandler:
oFile = Nothing
oFSO = Nothing
End Function
Thanks in advance for any shared thoughts.
There is in fact an easy solution for my problem. Sorry if I caused confusion.
Module Module1
Dim team As String = "Simpson, Homer" & vbCrLf & "Leonard, Lenny" & vbCrLf & "Carlson, Carl"
Sub Main()
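'Split on the full vbCrLf; splitting on vbLf alone leaves a trailing vbCr on each item
Dim tm As String() = team.Split(New String() {vbCrLf}, StringSplitOptions.None)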
'Test
Console.WriteLine(tm(0)) 'Homer
Console.WriteLine(tm(1)) 'Lenny
Console.WriteLine(tm(2)) 'Carl
End Sub
End Module
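Attempt 1 above fails because `Dim tm() As String` declares an array variable without creating an array, so `tm(i) = ...` throws a NullReferenceException (and the index also starts at 1). A minimal sketch that keeps the StringReader idea collects the lines in a `List(Of String)`, which grows as needed; `LineSplitter`/`SplitLines` are hypothetical names. `StringReader.ReadLine` treats CRLF, LF or CR as a line ending and does not include it in the returned string.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module LineSplitter
    ' Reads every line of a multi-line string into a list, one item per line.
    Public Function SplitLines(text As String) As List(Of String)
        Dim result As New List(Of String)()
        Using reader As New StringReader(text)
            Dim line As String = reader.ReadLine()
            ' ReadLine returns Nothing once the end of the string is reached
            While line IsNot Nothing
                result.Add(line)
                line = reader.ReadLine()
            End While
        End Using
        Return result
    End Function
End Module
```

With the `team` string from the question, `SplitLines(team)(1)` returns "Leonard, Lenny".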

Combine multiple txt files alternately (vb.net)

Let me explain it with an Excel sheet. I have a few txt files in a directory (f.txt, d.txt, s.txt, a.txt and q.txt). Each file has a few lines of text. I want to combine those files in a specific way, as shown in the screenshot.
And the output should be:
I've already written some code, but it doesn't work and I don't know why.
Dim fileEntries As String() = Directory.GetFiles("D:\dir\", "*.txt")
' Process the list of .txt files found in the directory. '
Dim i As Integer = 0
Dim filesCount As Integer = Directory.GetFiles("D:\dir\", "*.txt").Count
Do Until i = filesCount
'do it for every file in folder'
i = i + 1
Dim reader As New System.IO.StreamReader(fileEntries(i))
Dim files() As String = reader.ReadToEnd.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
Dim lineCount = File.ReadAllLines(fileEntries(i)).Length
Dim w As Integer = 0
Dim dt As DataTable
dt.Columns.Add(i)
'add column "0" for file 1, "1" for file 2 etc.'
Do Until w = lineCount
dt.Rows.Add(files(w))
'write each line in file 1 to column 0, etc.'
w = w + 1
Loop
Loop
Can somebody help me?
Read/write
If your goal is to write the result shown in the last image back to a file named output.txt, this can be done in a single line of code.
My.Computer.FileSystem.WriteAllText("D:\dir\output.txt", String.Join(Environment.NewLine, (From path As String In Directory.GetFiles("D:\dir", "*.txt") Where IO.Path.GetFileNameWithoutExtension(path) <> "output" Select My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8))), False, Encoding.UTF8)
You can of course make this a bit more readable if you don't like one-liners.
My.Computer.FileSystem.WriteAllText(
"D:\dir\output.txt",
String.Join(
Environment.NewLine,
(
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Where
IO.Path.GetFileNameWithoutExtension(path) <> "output"
Select
My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8)
)
),
False,
Encoding.UTF8
)
Iterate
If you need to iterate each line and/or each file, store the result in a local variable.
Dim files As IEnumerable(Of String()) = (
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Select
My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8).Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
)
For Each file As String() In files
For Each line As String In file
'Process each line here
Next
Next
DataSet
If you need to create a DataSet from the result, then take advantage of anonymous types. This way you can store both the name of the file and its lines.
Dim files = (
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Select
New With {
Key .Name = IO.Path.GetFileNameWithoutExtension(path),
.Lines = My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8).Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
}
)
Dim data As New DataSet()
With data
.BeginInit()
For Each item In files
With data.Tables.Add(item.Name)
.BeginInit()
.Columns.Add("Column1", GetType(String))
.EndInit()
.BeginLoadData()
For Each line As String In item.Lines
.Rows.Add(line)
Next
.EndLoadData()
End With
Next
.EndInit()
End With
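The question title asks for combining the files alternately, while the code above keeps each file's lines together. Because the screenshots are not available here, this is a hedged sketch assuming "alternately" means round-robin: line 1 of every file, then line 2 of every file, and so on, with files that run out of lines simply dropping out. `FileInterleaver`/`InterleaveLines` are hypothetical names, and the caller passes the paths in the desired order, since `Directory.GetFiles` does not guarantee one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Public Module FileInterleaver
    ' Merges the given files line by line in round-robin order.
    Public Function InterleaveLines(paths As IEnumerable(Of String)) As List(Of String)
        ' Load every file up front; acceptable for a handful of small text files
        Dim contents As List(Of String()) = paths.Select(Function(p) File.ReadAllLines(p)).ToList()
        Dim longest As Integer = If(contents.Count = 0, 0, contents.Max(Function(lines) lines.Length))

        Dim result As New List(Of String)()
        For row As Integer = 0 To longest - 1
            ' Take line number "row" from each file that still has one
            For Each lines As String() In contents
                If row < lines.Length Then result.Add(lines(row))
            Next
        Next
        Return result
    End Function
End Module
```

The result can be written with `File.WriteAllLines("D:\dir\output.txt", InterleaveLines(paths))`, as long as output.txt itself is not one of the input paths.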
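There are a few problems in your code:
Your DataTable was not initialized (Dim dt As DataTable declares the variable but never creates the object).
The value of w can exceed the size of the files array, because lineCount also counts the empty lines that Split removes.
The index i is incremented before it is used, so the first file is skipped and the last iteration reads past the end of fileEntries.
Note: I use a DataSet to hold each DataTable; you can remove it if it's not required.
Try the following code: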
Dim fileEntries As String() = Directory.GetFiles("C:\dir\", "*.txt")
Dim ds As New DataSet()
' Process every .txt file found in the directory
For i As Integer = 0 To fileEntries.Length - 1
' Read the non-empty lines; Using closes the reader when done
Dim lines As String()
Using reader As New System.IO.StreamReader(fileEntries(i))
lines = reader.ReadToEnd().Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
End Using
Dim dt As New DataTable()
' Add column "0" for file 1, "1" for file 2, etc.
dt.Columns.Add(i.ToString())
' Write each line of the file into that column; looping over the array itself
' cannot run past its end
For Each line As String In lines
dt.Rows.Add(line)
Next
ds.Tables.Add(dt)
Next

Can't write to files properly in VB.NET

I am trying to make a program that writes the binary code of a file in a separate text file.
When I use this program on a text file, it doesn't write anything in the new file. I then tested this for .jpg and .mp3 files and the program seems to write most of the binary code but leaves out the last couple of bytes. Here is my code:
Sub Main()
Console.Write("Filename: ")
Dim Filename As String = Console.ReadLine()
Console.Write("Extension: ")
Dim Extension As String = Console.ReadLine()
Console.WriteLine()
Dim Stream_1 As FileStream = New FileStream(Filename & "." & Extension, FileMode.Open)
Dim Stream_2 As FileStream = New FileStream(Filename & "_b.txt", FileMode.Create)
Dim Reader_1 As BinaryReader = New BinaryReader(Stream_1)
Dim Writer_2 As StreamWriter = New StreamWriter(Stream_2)
Dim File_Bytes() As Byte = Reader_1.ReadBytes(Convert.ToInt32(Stream_1.Length))
Dim Binary_String As String = ""
'These are used to a add line break after every 8 bytes
Dim Binary_String_Collection As String = ""
Dim Counter As Integer
For Each File_Byte In File_Bytes
Counter += 1
Binary_String = Convert.ToString(File_Byte, 2)
For I = 1 To 8 - Binary_String.Length
Binary_String = "0" & Binary_String
Next
Binary_String_Collection = Binary_String_Collection & Binary_String & " "
If Counter = 8 Then
Writer_2.WriteLine(Binary_String_Collection)
Counter = 0
Binary_String_Collection = ""
End If
Next
If Binary_String_Collection <> "" Then
Writer_2.WriteLine(Binary_String_Collection)
End If
Console.ReadLine()
End Sub
At first I thought that my program wasn't reading the binary code properly, so I added console output at the places where it writes to the file. The program displayed the correct output, so I'm confused why it isn't writing properly.
Make sure you close the file and Dispose of the streams correctly.
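StreamWriter buffers its output, and the question's code never calls Flush, Close or Dispose, so whatever is still in the buffer when the program ends is lost. That fits both symptoms: a small text file's output never leaves the buffer, and larger files lose only their last part. A minimal sketch of the same conversion with a Using block, which disposes (and therefore flushes) the writer automatically; `BinaryDumper`/`WriteBinaryDump` are hypothetical names, and the eight-bytes-per-line layout mirrors the question.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Public Module BinaryDumper
    ' Writes every byte of inputPath as an 8-digit binary number followed by a space,
    ' eight bytes per line, to outputPath.
    Public Sub WriteBinaryDump(inputPath As String, outputPath As String)
        Dim bytes As Byte() = File.ReadAllBytes(inputPath)
        ' Using guarantees Dispose, which flushes the last buffered text to disk
        Using writer As New StreamWriter(outputPath, False)
            Dim line As New StringBuilder()
            For index As Integer = 0 To bytes.Length - 1
                ' PadLeft supplies the leading zeros the manual loop added
                line.Append(Convert.ToString(bytes(index), 2).PadLeft(8, "0"c)).Append(" "c)
                If (index + 1) Mod 8 = 0 Then
                    writer.WriteLine(line.ToString())
                    line.Clear()
                End If
            Next
            ' Write the final partial line, if any
            If line.Length > 0 Then writer.WriteLine(line.ToString())
        End Using
    End Sub
End Module
```

The same applies to the FileStream and BinaryReader in the question: wrapping each of them in its own Using block closes the files even if an exception occurs.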