How to use Directory.EnumerateFiles - vb.net

MSDN (https://msdn.microsoft.com/en-us/library/dd383458(v=vs.110).aspx) says:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
How can I start using the collection before the whole collection is returned?
The following code gives an elapsed time of more than 3 minutes for a directory with around 45,000 files:
Dim TIme1, TIme2 As String
TIme1 = TimeString
Dim DirFiles As Generic.List(Of String) = New Generic.List(Of String)(Directory.EnumerateFiles(SourceDirectory))
Dim NumberOfFiles As Integer
NumberOfFiles = DirFiles.Count()
TIme2 = TimeString
MsgBox("Begin time " & TIme1 & "There are " & NumberOfFiles & " Photos in the Directory ." & SourceDirectory & "End Time " & TIme2)
Can I already use entries in DirFiles before the collection is entirely read? How?
I used to be a professional programmer before Microsoft launched Windows. My experience with Windows programming is minimal.

While you can't know the count of files returned by EnumerateFiles until the enumeration has finished, you can start working with individual files right away, for example in a For Each loop, which doesn't need the element count in order to run.
So for example you can do:
Dim FileCount As Integer
Dim files = Directory.EnumerateFiles(srcDir)
For Each file in files
'Do something with this file
' e.g.
TextBox1.AppendText(file & vbCrLf)
FileCount += 1
Next
MsgBox ( FileCount.ToString & " files processed.")
So you see how it can be used?
[NB: freehand typed code..might contain typos. It is only meant to explain the concept.]

EnumerateFiles allows you to start processing files before all the files have been found. It appears that you want to know the number of files. You can't know that until all the files have been found, so EnumerateFiles doesn't help you in this case.

The signature for GetFiles is Directory.GetFiles(path As String) As String(). For it to return results to you it must hit the hard drive and build the entire array first. If there are 45,000 files then it must build an array of 45,000 elements before it can give you a result.
The signature for EnumerateFiles is Directory.EnumerateFiles(path As String) As IEnumerable(Of String). In this case it doesn't need to read the entire directory before giving you a response; the file names are fetched as you iterate. So you should be able to get a result almost instantly regardless of the number of files.
Take this test code:
Dim sw = Stopwatch.StartNew()
Dim files = Directory.GetFiles("C:\Windows\System32")
sw.Stop()
Console.WriteLine(sw.Elapsed.TotalMilliseconds)
I get a result of about 6.5 milliseconds to return the files.
But if I change GetFiles to EnumerateFiles I get a result back in 0.07 milliseconds. It's nearly 100 times slower to call GetFiles for this folder!
This is because EnumerateFiles returns an IEnumerable(Of String). The interface for IEnumerable(Of T) is:
Public Interface IEnumerable(Of Out T)
Inherits IEnumerable
Function GetEnumerator() As IEnumerator(Of T)
End Interface
Whenever we use For Each, or call .Count() or .ToArray(), on an enumerable, under the hood we are calling GetEnumerator(), which in turn returns another object of type IEnumerator(Of T) with this signature:
Public Interface IEnumerator(Of Out T)
Inherits IDisposable
Inherits IEnumerator
ReadOnly Property Current As T
' MoveNext and Reset are inherited from the non-generic IEnumerator; shown here for clarity.
Function MoveNext() As Boolean
Sub Reset()
End Interface
It's this enumerator that actually does the hard work of returning all of the files. As soon as the first call to MoveNext is made, the first file name is immediately available in Current. MoveNext is then called in a loop until it returns False, at which point you know the enumeration is over. Meanwhile you can collect each file from the Current property.
So, in your code, if you were performing some action over each and every file returned then EnumerateFiles would be the way to go.
But since you are doing New Generic.List(Of String)(Directory.EnumerateFiles(SourceDirectory)) you are forcing the iteration of the entire enumerable immediately. Any advantage of using EnumerateFiles is immediately lost.
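To make the MoveNext/Current mechanism described above concrete, here is a minimal sketch that drives the enumerator by hand and stops early. The module name EnumeratorSketch, the function FirstFiles and its parameters are hypothetical names chosen for illustration; only Directory.EnumerateFiles, GetEnumerator, MoveNext and Current come from the framework.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module EnumeratorSketch
    ' Hypothetical helper: returns up to maxNames file paths from folderPath,
    ' pulling them one at a time instead of waiting for a complete list.
    Public Function FirstFiles(folderPath As String, maxNames As Integer) As List(Of String)
        Dim result As New List(Of String)()
        ' This is roughly what a For Each loop does behind the scenes.
        Using e As IEnumerator(Of String) = Directory.EnumerateFiles(folderPath).GetEnumerator()
            ' The count is checked first, so MoveNext is not called once the limit is reached.
            While result.Count < maxNames AndAlso e.MoveNext()
                result.Add(e.Current) ' Available as soon as MoveNext returns True.
            End While
        End Using
        Return result
    End Function
End Module
```

Because the loop stops calling MoveNext once it has enough names, the rest of the directory is never read, which is the advantage that wrapping the call in New List(Of String)(...) gives up.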

The GetFiles method will materialize the entire list of files that are in a directory. The preferred method to call now is Directory.EnumerateFiles, as it will stream the files back (through a yield-like mechanism) as the underlying call to the OS yields the results back.
Solutions using GetFiles/GetDirectories are slower to produce their first result because the complete array has to be built first. Enumeration doesn't build that array; it hands back each name as it is found.
Either way, counting the files still means iterating over all of them.
Example file count:
Directory.EnumerateFiles(folderPath, searchPattern, SearchOption.AllDirectories).Count()

I now use the following before EnumerateFiles is started:
Public Function FileCount(PathName As String) As Long
Dim fso As Scripting.FileSystemObject
Dim fld As Scripting.Folder
fso = CreateObject("Scripting.FileSystemObject")
If fso.FolderExists(PathName) Then
fld = fso.GetFolder(PathName)
FileCount = fld.Files.Count
End If
End Function
This needs Microsoft Scripting Runtime (set a reference to the VB script run-time library in your Project)

Related

Limiting the amount of files grabbed from system.io.directory.getfiles

I've got a folder browser dialog populating the directory location (path) passed to System.IO.Directory.GetFiles. The issue is that if you accidentally select a folder with hundreds or thousands of files (which you would never need for this app), it will lock up the app while it grabs all the files. All I'm grabbing are the file locations as strings, and I want to put a limit on the number of files that can be grabbed. Here's my current code that isn't working.
If JigFolderBrowse.ShowDialog = DialogResult.OK Then
Dim dirs(50) As String
dirs = System.IO.Directory.GetFiles(JigFolderBrowse.SelectedPath.ToString, "*", System.IO.SearchOption.AllDirectories)
If dirs.Length> 50 Then
MsgBox("Too Many Files Selected" + vbNewLine + "Select A Smaller Folder To Be Organized")
Exit Sub
End If
'Separate Each File By Type
For i = 0 To dirs.Length - 1
If Not dirs(i).Contains("~$") Then
If dirs(i).Contains(".SLDPRT") Or dirs(i).Contains(".sldprt") Then
PartsListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".SLDASM") Or dirs(i).Contains(".sldasm") Then
AssemListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".SLDDRW") Or dirs(i).Contains(".slddrw") Then
DrawingListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".pdf") Or dirs(i).Contains(".PDF") Then
PDFsListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".DXF") Or dirs(i).Contains(".dxf") Then
DXFsListBx.Items.Add(dirs(i))
ElseIf Not dirs(i).Contains(".db") Then
OtherFilesListBx.Items.Add(dirs(i))
End If
End If
Next
End If
The Directory.GetFiles method always retrieves the full list of matching files before returning. There is no way to limit it (outside of specifying a more narrow search pattern, that is). There is, however, the Directory.EnumerateFiles method which does what you need. From the MSDN article:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
So, for instance, you could do something like this:
dirs = Directory.
EnumerateFiles(
JigFolderBrowse.SelectedPath.ToString(),
"*",
SearchOption.AllDirectories).
Take(50).
ToArray()
Take is a LINQ extension method which returns only the first x-number of items from any IEnumerable(Of T) list. So, in order for that line to work, you'll need to import the System.Linq namespace. If you can't, or don't want to, use LINQ, you can just implement your own method that does the same sort of thing (iterates an IEnumerable list in a for loop and returns after reading only the first 50 items).
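If LINQ is not an option, the hand-written alternative mentioned above could look something like the following sketch. TakeSketch and TakeFirst are hypothetical names, not framework members; the function simply stops reading the sequence once the limit is reached.

```vbnet
Imports System.Collections.Generic

Module TakeSketch
    ' Hypothetical helper: copies at most maxCount items from source and
    ' stops reading as soon as the limit is reached.
    Public Function TakeFirst(source As IEnumerable(Of String), maxCount As Integer) As String()
        Dim result As New List(Of String)()
        If maxCount <= 0 Then Return result.ToArray()
        For Each item As String In source
            result.Add(item)
            If result.Count >= maxCount Then Exit For
        Next
        Return result.ToArray()
    End Function
End Module
```

It would be called as TakeFirst(Directory.EnumerateFiles(JigFolderBrowse.SelectedPath, "*", SearchOption.AllDirectories), 51). Asking for 51 items rather than 50 is deliberate: if exactly 50 are taken, a check like dirs.Length > 50 can never be true, whereas a 51st item proves the folder holds too many files.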
Side Note 1: Unused Array
Also, it's worth mentioning, in your code, you initialize your dirs variable to point to a 50-element string array. You then, in the very next line, set it to point to a whole new array (the one returned by the Directory.GetFiles method). While it's not breaking functionality, it is unnecessarily inefficient. You're creating that extra array, just giving the garbage collector extra work to do, for no reason. You never use that first array. It just gets dereferenced and discarded in the very next line. It would be better to create the array variable as null:
Dim dirs() As String
Or
Dim dirs() As String = Nothing
Or, better yet:
Dim dirs() As String = Directory.
EnumerateFiles(
JigFolderBrowse.SelectedPath.ToString(),
"*",
SearchOption.AllDirectories).
Take(50).
ToArray()
Side Note 2: File Extension Comparisons
Also, it looks like you are trying to compare the file extensions in a case-insensitive way. There are two problems with the way you are doing it. First, you are only comparing against two values: all lowercase (e.g. ".pdf") and all uppercase (e.g. ".PDF"). That won't work with mixed case (e.g. ".Pdf").
It is admittedly annoying that the String.Contains method does not have a case-sensitivity option. So, while it's a little hokey, the best option would be to make use of the String.IndexOf method, which does have a case-insensitive option:
If dirs(i).IndexOf(".pdf", StringComparison.CurrentCultureIgnoreCase) <> -1 Then
However, the second problem, which invalidates my last point of advice, is that you are checking to see if the string contains the particular file extension rather than checking to see if it ends with it. So, for instance, a file name like "My.pdf.zip" will still match, even though its extension is ".zip" rather than ".pdf". Perhaps this was your intent, but, if not, I would recommend using the Path.GetExtension method to get the actual extension of the file name and then compare that. For instance:
Dim ext As String = Path.GetExtension(dirs(i))
' GetExtension returns the extension including the leading dot, e.g. ".pdf".
If ext.Equals(".pdf", StringComparison.CurrentCultureIgnoreCase) Then
' ...
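Putting the Path.GetExtension advice together, the whole classification step could be sketched as one function. The module name, the function name Categorize and the returned labels are hypothetical stand-ins for the list boxes in the question; the use of OrdinalIgnoreCase is also an assumption, chosen because extensions are not culture-sensitive text.

```vbnet
Imports System
Imports System.IO

Module ExtensionSketch
    ' Hypothetical helper: returns a category label for a file path based on
    ' its real extension. The labels stand in for the question's list boxes.
    Public Function Categorize(filePath As String) As String
        ' Skip temporary files whose names start with "~$".
        If Path.GetFileName(filePath).StartsWith("~$", StringComparison.Ordinal) Then Return "Skip"
        ' GetExtension includes the leading dot, e.g. ".pdf".
        Dim ext As String = Path.GetExtension(filePath)
        If ext.Equals(".sldprt", StringComparison.OrdinalIgnoreCase) Then Return "Part"
        If ext.Equals(".sldasm", StringComparison.OrdinalIgnoreCase) Then Return "Assembly"
        If ext.Equals(".slddrw", StringComparison.OrdinalIgnoreCase) Then Return "Drawing"
        If ext.Equals(".pdf", StringComparison.OrdinalIgnoreCase) Then Return "PDF"
        If ext.Equals(".dxf", StringComparison.OrdinalIgnoreCase) Then Return "DXF"
        If ext.Equals(".db", StringComparison.OrdinalIgnoreCase) Then Return "Skip"
        Return "Other"
    End Function
End Module
```

With this approach "My.pdf.zip" falls into "Other", because only the final extension is compared.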

Best Way to Sort GetFiles

I am trying to sort the following files in this order:
TMP_SDF_1180741.PDF
TMP_SDF_1179715.PDF
TMP_SDF_1162371.PDF
TMP_SDF_1141511.PDF
TMP_SDF_1131750.PDF
TMP_SDF_1117362.PDF
TMP_SDF_1104199.PDF
TMP_SDF_1082698.PDF
TMP_SDF_1062921.PDF
TMP_SDF_1043875.PDF
TMP_SDF_991514.PDF
TMP_SDF_970621.PDF
TMP_SDF_963154.PDF
TMP_SDF_952954.PDF
TMP_SDF_948067.PDF
TMP_SDF_917669.PDF
TMP_SDF_904315.PDF
TMP_SDF_899902.PDF
TMP_SDF_892398.PDF
TMP_SDF_882024.PDF
But the actual output is this:
TMP_SDF_991514.PDF
TMP_SDF_970621.PDF
TMP_SDF_963154.PDF
TMP_SDF_952954.PDF
TMP_SDF_948067.PDF
TMP_SDF_917669.PDF
TMP_SDF_904315.PDF
TMP_SDF_899902.PDF
TMP_SDF_892398.PDF
TMP_SDF_882024.PDF
TMP_SDF_1180741.PDF
TMP_SDF_1179715.PDF
TMP_SDF_1162371.PDF
TMP_SDF_1141511.PDF
TMP_SDF_1131750.PDF
TMP_SDF_1117362.PDF
TMP_SDF_1104199.PDF
TMP_SDF_1082698.PDF
TMP_SDF_1062921.PDF
TMP_SDF_1043875.PDF
I have tried researching sort methods for GetFiles, but when I apply them, I get errors about system collections not being able to bind to a 1-dimensional array, and it is frustrating. Here is my code:
Dim di As New IO.DirectoryInfo("C:\temp")
Dim aryFi As IO.FileInfo() = di.GetFiles("*.PDF")
Dim fi As IO.FileInfo
For Each fi In aryFi
My.Computer.FileSystem.RenameFile("C:\TEMP\" & fi.Name, listBox1.SelectedItem.ToString & ".pdf")
listBox1.SelectedIndex = listBox1.SelectedIndex - 1
Next
I am renaming files to be a1, a2, a3, etc. so that when I combine them into one PDF, they are in chronological order. The sorting I want will place them in chronological order. I am sure there is an easier way. As you can tell, the higher the number in the PDF file name (1180741), the more recent the content of the file, while 882024 would be the oldest file content.
As has been stated in the comments, you need to sort them numerically rather than alphabetically. I don't know the specific sorting algorithm that is used by Windows Explorer, or if it's possible to use the same library, but it's certainly possible to write your own algorithm that sorts however you want.
The first step in doing that is to extract just the numeric part that you want to use as the sort key. Without knowing more details, it's hard to say what the best option for that would be. If you know that the number always starts at a particular character position in the string, you could simply use String.Substring. If it's always delimited by "_" and "." you could use String.Split. If you need something more complex, or if you need the parsing rules to be configurable, you may want to consider using Regex. As an example, here's a simple method that uses String.Split:
Public Function GetSortKey(fileName As String) As Integer
Return Integer.Parse(fileName.Split({"_"c, "."c})(2))
End Function
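Once you have a method that extracts the sort key for a given file name, you can use it to sort the files like this (OrderBy sorts ascending; use OrderByDescending for the highest-number-first order shown in the question):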
di.GetFiles("*.PDF").OrderBy(Function(x) GetSortKey(x.Name))
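As a self-contained illustration of the numeric sort, the sketch below works on plain file names rather than FileInfo objects so it can be tried without touching the disk. GetSortKey is the method from this answer; SortSketch and NewestFirst are hypothetical names, and the code assumes every name follows the TMP_SDF_number.PDF pattern from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module SortSketch
    ' Takes the third token when splitting on "_" and ".",
    ' e.g. 1180741 from "TMP_SDF_1180741.PDF".
    Public Function GetSortKey(fileName As String) As Integer
        Return Integer.Parse(fileName.Split({"_"c, "."c})(2))
    End Function

    ' Hypothetical helper: orders names by their numeric key, largest first.
    Public Function NewestFirst(fileNames As IEnumerable(Of String)) As List(Of String)
        Return fileNames.OrderByDescending(Function(n) GetSortKey(n)).ToList()
    End Function
End Module
```

A name that does not match the assumed pattern makes Integer.Parse throw, so real code may want to validate names first.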
Perhaps you could take advantage of some tools that you have at your hands
' Requires: Imports System.IO, System.Linq, System.Text.RegularExpressions
Sub Main()
Dim reg As Regex = New Regex("\d+")
Dim ordered = New List(Of OrderedFiles)()
For Each s In Directory.GetFiles("C:\temp", "*.PDF")
Dim aFile = New OrderedFiles()
aFile.FileName = s
' Match against the file name only, so digits in the folder path are ignored.
aFile.Sequence = Convert.ToInt32(reg.Match(Path.GetFileName(s)).Value)
ordered.Add(aFile)
Next
For Each aFile In ordered.OrderByDescending(Function(x) x.Sequence)
Console.WriteLine(Path.GetFileName(aFile.FileName))
Next
End Sub
Class OrderedFiles
Public FileName As String
Public Sequence As Integer
End Class
In this example you have a custom class holding the file name and the numeric part that you want to sort by. A Regex that matches the first run of digits is applied to each file name to build an instance of the class with the name and the numeric part. At the end of the loop, just call the LINQ method that orders your list in descending order.

VB6 map string to integer for headers

I'm trying to parse a CSV file into a VB6 application in order to update multiple records in a SQL table, using the existing single-record update code already in the form. The CSV files will have a header row which can be used to validate that the information is going into the correct place in the ADODB recordset. In C++ you can use a map to say something like
map<String s, int x> column
column<"First Name", -1>
column<"Last Name",-1>
Then create a counter across the comma delimited values where if the third value is Last Name then the code could be written to change
column<"Last Name",-1> to column<"Last Name",3> and if x != -1 in any of the maps the file is valid for use, I would then loop through the remaining records and parse into a container using something similar to
strLastName = Array<column[3]>
to assign the record values to the correct variables. I am still very new to VB6, how can I accomplish something similar in VB6 and what containers should be used? So far I have
Public Sub GetImportValues()
On Error GoTo GetImportValues_Error:
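Dim intFileNum As Integer
Dim strLine As String
intFileNum = FreeFile 'Get an unused file number before opening.
Open Path For Input As #intFileNum
Do Until EOF(intFileNum)
Line Input #intFileNum, strLine 'Read one line of the CSV file.
FunctionThatSavesInformationToSQL
Loop
Close #intFileNum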
GetImportValues_Exit:
Exit Sub
GetImportValues_Error:
Err.Source = "frmMemberAdd.GetImportValues" & " | " & Err.Source
Err.Raise Err.Number, Err.Source, Err.Description
End Sub
with a dialog box returning the path as a string using App.path in a separate Function
Slight change to the answer:
The Collection approach was on track for what I had asked, but I had to change it to a Dictionary because a Collection does not let you read its keys back, which kept me from comparing the items and changing the keys; a Dictionary does. If you use a Dictionary, make sure you swap the item and key arguments (Dictionary.Add takes the key first).
If I understand your question correctly, you're trying to create a map (Dictionary<string, int> in C#). In VB6, you can use Collection for this purpose - it's roughly equivalent to C#'s Dictionary<string, object>. It uses String keys and stores all values as Variant. For example:
Dim oColl As Collection
Set oColl = New Collection
oColl.Add -1, "ColumnName"
Dim nColumnIndex As Long
'Get column index for column name.
nColumnIndex = oColl.Item("ColumnName")
If nColumnIndex = -1 Then
nColumnIndex = ...
'When you want to update a column index in the collection, you
'first have to remove the item and then add it back with the right
'index.
oColl.Remove "ColumnName"
oColl.Add nColumnIndex, "ColumnName"
End If
Edit 1:
One word of warning regarding VB6: you'll see many samples doing this:
Dim oObj As New SomeClass
It's ok to do this in VB.Net but don't ever do this in VB6. Declare and instantiate the object on separate statements because the single-statement form generates code where oObj is checked for Nothing and set to an instance before each use. This slows down your code (unnecessary checks) and creates hard-to-find bugs if you're using an instance that's supposed to be gone.
Always do this instead:
Dim oObj As SomeClass
Set oObj = New SomeClass
...
'Clean up the object when you're done with it. Remember, there's
'no garbage collection in COM / VB6, you have to manage object
'lifetimes.
Set oObj = Nothing
Also, use Long instead of Integer as much as you can - Long is a 32-bit integer, while Integer is only 16 bits. VB6 type names can frequently be misleading. Here's an old answer of mine with a bit more detail (not strictly related to your question but useful).
Alternatively, you can create a simplified wrapper around the .NET Dictionary class and expose it as a COM object: this would allow you to call it from VB6. This would likely be (somewhat) slower than Collection and it'd require the .NET Framework for your VB6 project to run.
Edit 2:
As @CMaster commented, Dictionary is available from the Microsoft Scripting Runtime library - you need to add a reference to it to use it (this is why I prefer Collection - it has no dependency). This answer has details about how to use it.

Visual Basic.NET - Add two numbers (I/O from file)

The following code should sum two numbers from the file "input.txt" and write the sum to "output.txt". Compilation is successful, but "output.txt" is still empty after running the program. What am I doing wrong?
Imports System.IO
Public Class test
Public Shared Sub Main()
Dim scan as StreamReader = new StreamReader("input.txt")
Dim writer as StreamWriter = new StreamWriter("output.txt", True)
Dim input as String
input = scan.ReadLine()
Dim ab() as String = Split(input)
Dim res as Integer = Val(ab(0))+Val(ab(1))
writer.writeLine(res)
writer.close()
End sub
End class
Your code works properly for me, so as long as your input file is formatted properly (i.e. a single line with two numbers separated by spaces, like "1 2") and you have the necessary OS permissions to read and write to those files, then it should work for you too. However, it's worth mentioning that there are several issues with your code that would be good to correct, since they fly in the face of typical best practices.
First, you should, as much as possible, turn Option Strict On. I know that you have it Off because your code won't compile with it On. The following line is technically misleading, and therefore fails with Option Strict On:
Dim res As Integer = Val(ab(0)) + Val(ab(1))
The reason it fails is that the Val function returns a Double, not an Integer, so, technically, depending on the contents of the file, the result could be fractional or could be too large to fit in an Integer. With Option Strict Off, the compiler is essentially automatically fixing your code for you, like this:
Dim res As Integer = CInt(Val(ab(0)) + Val(ab(1)))
In order to set the res variable equal to the result of the calculation, the more capable Double value must be converted down to an Integer. When you are forced to put the CInt in the code yourself, you are fully aware that the conversion is taking place and what the consequences of it might be. When you have Option Strict Off and it inserts the conversion behind-the-scenes, then you may very well miss a potential bug.
Secondly, the Val function is old-school VB6 syntax. While it technically works fine, it's provided mainly for backwards compatibility. The new .NET equivalent would be to use Integer.Parse, Integer.TryParse or Convert.ToInt32.
Thirdly, you never close the scan stream reader. You could just add scan.Close() to the end of your method, but it is better, when possible, to create Using blocks for any disposable object, like this:
Using scan As StreamReader = New StreamReader("input.txt")
Using writer As StreamWriter = New StreamWriter("output.txt", True)
Dim input As String
input = scan.ReadLine()
Dim ab() As String = Split(input)
Dim res As Integer = Integer.Parse(ab(0)) + Integer.Parse(ab(1))
writer.WriteLine(res)
End Using
End Using
Lastly, as Hans pointed out, it's not good to rely on the current directory. It's always best to specify full paths for your files. There are different methods in the framework for getting various folder paths, such as the user's desktop folder, the temp folder, the application's folder, or the folder of the currently running assembly. You can use any such method to get your desired folder path, and then use Path.Combine to add the file name to get the full file path. For instance:
Dim desktopFolderPath As String = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory)
Dim inputFilePath As String = Path.Combine(desktopFolderPath, "input.txt")
Dim outputFilePath As String = Path.Combine(desktopFolderPath, "output.txt")
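Since Integer.TryParse was suggested above as a replacement for Val, here is a minimal sketch of how the parsing step could report bad input instead of throwing. SumSketch and TrySumLine are hypothetical names; the sketch assumes the two numbers are separated by spaces and returns the sum as a Long so that adding two large Integer values cannot overflow.

```vbnet
Imports System

Module SumSketch
    ' Hypothetical helper: tries to add the first two space-separated
    ' integers in line. Returns False instead of throwing when the line
    ' is not in that format.
    Public Function TrySumLine(line As String, ByRef total As Long) As Boolean
        total = 0
        If line Is Nothing Then Return False
        Dim parts() As String = line.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        If parts.Length < 2 Then Return False
        Dim a, b As Integer
        If Not Integer.TryParse(parts(0), a) OrElse Not Integer.TryParse(parts(1), b) Then Return False
        ' Widen to Long before adding so the sum cannot overflow.
        total = CLng(a) + b
        Return True
    End Function
End Module
```

Inside the Using blocks, the caller would write the total only when TrySumLine returns True and report a format problem otherwise.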

In vb.net, I can't get my DirectoryInfo function to work properly

I'm trying to perform file deletion but only on files that don't exist in a list.
Example:
Dim FilesToKeep As List(Of String) = MyFunctionThatPopulatesTheList
The FilesToKeep list consists of the filenames.
Here is where I'm having trouble, as these clause functions throw me off big-time.
Dim filesToDelete
filesToDelete = New DirectoryInfo(FilePath) _
.GetFiles("*", SearchOption.AllDirectories) _
.Where(Function(f) Not f.Attributes.HasFlag(FileAttributes.Hidden)) _
.Where(Function(f) Not FilesToKeep.ToString.Contains(f.Name)) _
.[Select](Function(f) New FileCollectionForDelete(f)).ToArray()
Two things I'm trying to do if you look at the bottom two lines of the DirectoryInfo function. I only want the files that do not exist in the FilesToKeep list. The second, is just a helper where I'm storing the information about the file.
But as it stands, filesToDelete returns every single file.
Thank you for your help.
=========== EDIT =============
After comments, I gave it another shot, but curious if anyone can offer opinion on stability of this function.
First, I created another variable called FilesToKeep2
Dim FilesToKeep2 As String = String.Join(",", FilesToKeep.ToArray())
And my function I left how it was, as it isn't comparing the entire path; note the (f.Name).
So right now this seems to be working properly, but I'm worried about gotchas later on.
Would this function be as solid as iterating through each one individually?
The problem is this expression:
Function(f) Not FilesToKeep.ToString.Contains(f.Name)
The type of the FilesToKeep object is a List(Of String). Calling ToString on a List(Of String) returns the name of the type. Just remove that part of the expression and you'll be fine:
Function(f) Not FilesToKeep.Contains(f.Name)
Also, I think you're overthinking things with the final .Select(). Skip that (and the .ToArray() call) entirely.
Final code:
Dim FilesToKeep As List(Of String) = MyFunctionThatPopulatesTheList()
Dim filesToDelete = (New DirectoryInfo(FilePath)).GetFiles("*", SearchOption.AllDirectories).
Where(Function(f) Not f.Attributes.HasFlag(FileAttributes.Hidden)).
Where(Function(f) Not FilesToKeep.Contains(f.Name))
For Each fileToDelete As FileInfo In filesToDelete
fileToDelete.Delete()
Next
Regarding your edit: you're probably fine with that edit. However, you should know that commas are legal in file names, and therefore it's possible to create files that should be deleted, but will still match your string. For best results here, at least use a delimiter character that's not legal in file names.
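One way to sidestep the delimiter question entirely is to compare whole names against a set rather than searching inside a joined string. The sketch below is an illustration under assumptions: KeepListSketch and NamesToDelete are hypothetical names, and the case-insensitive comparer is an assumption that suits Windows file names but may not suit other file systems.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module KeepListSketch
    ' Hypothetical helper: returns the names from candidateNames that are
    ' not in namesToKeep, comparing whole names and ignoring case.
    Public Function NamesToDelete(candidateNames As IEnumerable(Of String),
                                  namesToKeep As IEnumerable(Of String)) As List(Of String)
        Dim keep As New HashSet(Of String)(namesToKeep, StringComparer.OrdinalIgnoreCase)
        Return candidateNames.Where(Function(n) Not keep.Contains(n)).ToList()
    End Function
End Module
```

Because each name is matched in full, a file called "a,b.txt" cannot accidentally match a keep list containing "a" and "b.txt", which is the kind of gotcha the joined-string version is exposed to.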