Comparing Two Files After File Copy - Performance Improvements? - vb.net

I've built a file copying routine into a common library for a variety of different (WinForms) applications I'm currently working on. What I've built implements the commonly-used CopyFileEx method to actually perform the file copy while displaying the progress, which seems to be working great.
The only real issue I'm encountering is that, because most of the file copying I'm doing is for archival purposes, once the file is copied, I would like to "verify" the new copy of the file. I have the following methods in place to do the comparison/verification. I'm sure many of you will quickly see where the "problem" is:
Public Shared Function CompareFiles(ByVal File1 As IO.FileInfo, ByVal File2 As IO.FileInfo) As Boolean
    Dim Match As Boolean = False
    If File1.FullName = File2.FullName Then
        Match = True
    Else
        If File.Exists(File1.FullName) AndAlso File.Exists(File2.FullName) Then
            If File1.Length = File2.Length Then
                If File1.LastWriteTime = File2.LastWriteTime Then
                    Try
                        Dim File1Hash As String = HashFileForComparison(File1)
                        Dim File2Hash As String = HashFileForComparison(File2)
                        If File1Hash = File2Hash Then
                            Match = True
                        End If
                    Catch ex As Exception
                        Dim CompareError As New ErrorHandler(ex)
                        CompareError.LogException()
                    End Try
                End If
            End If
        End If
    End If
    Return Match
End Function
Private Shared Function HashFileForComparison(ByVal OriginalFile As IO.FileInfo) As String
    Using BufferedFileReader As New IO.BufferedStream(File.OpenRead(OriginalFile.FullName), 1200000)
        Using MD5 As New System.Security.Cryptography.MD5CryptoServiceProvider
            Dim FileHash As Byte() = MD5.ComputeHash(BufferedFileReader)
            ' Convert the raw hash bytes to hex text; decoding them with Encoding.Unicode.GetString
            ' replaces invalid UTF-16 sequences, so different hashes could produce the same string.
            Return BitConverter.ToString(FileHash)
        End Using
    End Using
End Function
This CompareFiles() method checks a few of the "simple" elements first:
Is it trying to compare a file to itself? (if so, always return True)
Do both files actually exist?
Are the two files the same size?
Do they both have the same modification date?
But, you guessed it, here's where the performance takes the hit. Especially for large files, the MD5.ComputeHash call in HashFileForComparison() can take a while: about 1.25 minutes for a 500 MB file, for a total of about 2.5 minutes to compute both hashes for the comparison. Does anyone have a suggestion for how to verify the new copy of the file more efficiently?
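One commonly suggested alternative, not taken from the original thread, is to drop the hashing and compare the two files directly, chunk by chunk. This avoids computing MD5 twice and stops at the first differing chunk, but when the files do match it still has to read every byte of both, so if the 2.5 minutes are mostly disk or network I/O the gain may be modest. A minimal sketch; the module and function names and the 1 MB default buffer size are illustrative assumptions:

```vbnet
Imports System
Imports System.IO

Module FileCompareSketch
    ' Compares two files byte by byte without hashing.
    ' bufferSize (1 MB by default) is an illustrative value, not a tuned one.
    Public Function FilesAreEqual(path1 As String, path2 As String, Optional bufferSize As Integer = 1 << 20) As Boolean
        Dim info1 As New FileInfo(path1)
        Dim info2 As New FileInfo(path2)
        ' Cheap check first: files of different lengths can never match.
        If info1.Length <> info2.Length Then Return False

        Using s1 As FileStream = File.OpenRead(path1), s2 As FileStream = File.OpenRead(path2)
            Dim buf1(bufferSize - 1) As Byte
            Dim buf2(bufferSize - 1) As Byte
            Do
                Dim read1 As Integer = ReadFully(s1, buf1)
                Dim read2 As Integer = ReadFully(s2, buf2)
                If read1 <> read2 Then Return False
                If read1 = 0 Then Exit Do ' both streams ended together
                For i As Integer = 0 To read1 - 1
                    If buf1(i) <> buf2(i) Then Return False ' stop at the first difference
                Next
            Loop
        End Using
        Return True
    End Function

    ' Stream.Read may return fewer bytes than requested, so keep reading
    ' until the buffer is full or the end of the stream is reached.
    Private Function ReadFully(stream As Stream, buffer As Byte()) As Integer
        Dim total As Integer = 0
        While total < buffer.Length
            Dim n As Integer = stream.Read(buffer, total, buffer.Length - total)
            If n = 0 Then Exit While
            total += n
        End While
        Return total
    End Function
End Module
```

It could stand in for the two HashFileForComparison calls inside CompareFiles, keeping the cheaper existence, size and timestamp checks in front of it.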

Related

VB.NET 2010 - Extracting an application resource to the Desktop

I am trying to extract an application resource from My.Resources.FILE.
I have discovered how to do this with DLL & EXE files, but I still need help with the code for extracting PNG & ICO files, and other file types too, if possible.
Here is my current code that works with DLL & EXE files.
Dim File01 As System.IO.FileStream = New System.IO.FileStream("C:\Users\" + Environment.UserName + "\Desktop\" + "SAMPLE.EXE", IO.FileMode.Create)
File01.Write(My.Resources.SAMPLE, 0, My.Resources.SAMPLE.Length)
File01.Close()
First things first, the code you have is bad. When using My.Resources, every time you use a property, you extract a new copy of the data. That means that your second line is getting the data to write twice, with the second time being only to get its length. At the very least, you should be getting the data only once and assigning it to a variable, then using that variable twice. You should also be using a Using statement to create and destroy the FileStream. Even better though, just call File.WriteAllBytes, which means that you don't have to create your own FileStream or know the length of the data to write. You should also not be constructing the file path that way.
Dim filePath = Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "SAMPLE.EXE")
File.WriteAllBytes(filePath, My.Resources.SAMPLE)
As for your question, the important thing to understand here is that it really has nothing to do with resources. The question is really how to save data of a particular type, and that is something that you can look up for yourself. When you get the value of a property from My.Resources, the type of the data you get will depend on the type of the file you embedded in the first place. In the case of a binary file, e.g. DLL or EXE, you will get back a Byte array, so you save that data to a file in the same way as you would any other Byte array. In the case of an image file, e.g. PNG, you will get back an Image object, so you save that like you would any other Image object, e.g.
Dim filePath = Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "PICTURE.PNG")
Using picture = My.Resources.PICTURE
picture.Save(filePath, picture.RawFormat)
End Using
For an ICO file you will get back an Icon object. I'll leave it to you to research how to save an Icon object to a file.
EDIT:
It's important to identify what the actual problem is that you're trying to solve. You can obviously get an object from My.Resources so that is not the problem. You need to determine what type that object is and determine how to save an object of that type. How to do that will be the same no matter where that object comes from, so the resources part is irrelevant. Think about what it is that you have to do and write a method to do it, then call that method.
In your original case, you could start like this:
Dim data = My.Resources.SAMPLE
Once you have written that - even as you write it - Intellisense will tell you that the data is a Byte array. Your actual problem is now how to save a Byte array to a file, so write a method that does that:
Private Sub SaveToFile(data As Byte(), filePath As String)
'...
End Sub
You can now decide which you want to do first: write code to call that method as appropriate for your current scenario, or write the implementation of the method. There are various specific ways to save binary data, i.e. a Byte array, to a file but, as I said, the simplest is File.WriteAllBytes:
Private Sub SaveToFile(data As Byte(), filePath As String)
File.WriteAllBytes(filePath, data)
End Sub
As for calling the method, you need the data, which you already have, and the file path:
Dim data = My.Resources.SAMPLE
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "SAMPLE.EXE"
Dim filePath = Path.Combine(folderPath, fileName)
SaveToFile(data, filePath)
Simple enough. You need to follow the same steps for any other resource. If you embedded a PNG file then you would find that the data is an Image object or, more specifically, a Bitmap. Your task is then to learn how to save such an object to a file. It shouldn't take you long to find out that the Image class has its own Save method, so you would use that in your method:
Private Sub SaveToFile(data As Image, filePath As String)
data.Save(filePath, data.RawFormat)
End Sub
The code to call the method is basically as before, with the exception that an image object needs to be disposed when you're done with it:
Dim data = My.Resources.PICTURE
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "PICTURE.PNG"
Dim filePath = Path.Combine(folderPath, fileName)
SaveToFile(data, filePath)
data.Dispose()
The proper way to create and dispose an object in a narrow scope like this is with a Using block:
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "PICTURE.PNG"
Dim filePath = Path.Combine(folderPath, fileName)
Using data = My.Resources.PICTURE
SaveToFile(data, filePath)
End Using
Now it is up to you to carry out the same steps for an ICO file. If you are a hands-on learner, then get your hands dirty.
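For an ICO resource, following the same steps leads to a System.Drawing.Icon object. Unlike Image, Icon.Save writes to a Stream rather than to a file path, so one way to implement it, sketched here as a SaveToFile overload placed in a standalone module for self-containment, is:

```vbnet
Imports System.Drawing
Imports System.IO

Module IconSaving
    ' SaveToFile overload for Icon objects: Icon.Save needs a Stream,
    ' so create (or overwrite) the target file and write the icon into it.
    Sub SaveToFile(data As Icon, filePath As String)
        Using stream As FileStream = File.Create(filePath)
            data.Save(stream)
        End Using
    End Sub
End Module
```

Calling it works like the PICTURE example: wrap the resource (for instance a hypothetical My.Resources.ICON) in a Using block, since each access to the property creates a new Icon, and pass a file path ending in .ICO.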

How do I speed up this code for building a list of all files in a network directory?

I have a WPF program that grabs all of the directories within a certain network directory and lists them in a listview.
The problem is that there are so many directories that it can take up to 5 seconds for the list to load.
I am wondering if there is a way to speed this up with more efficient code. Or maybe even a way to store the list in an array and just look for changes every time after the first?
For Each i As String In Directory.GetDirectories(dir)
    If File.Exists(Path.GetFullPath(i) & "\cover.jpg") Then
        If (File.GetAttributes(Path.GetFullPath(i)) And FileAttributes.Hidden) <> FileAttributes.Hidden Then
            Dim foldername As String = Path.GetFileName(i)
            Dim moviename As String = foldername
            If moviename.Length > 4 Then
                If moviename.Substring(0, 4) = "The " Then
                    moviename = moviename.Remove(0, 4)
                End If
            End If
            Dim display As Boolean = True
            If IO.Directory.GetDirectories(Path.GetFullPath(i)).Length < 1 Then
                If IO.Directory.GetFiles(Path.GetFullPath(i), "*.avi").Length < 1 Then
                    If IO.Directory.GetFiles(Path.GetFullPath(i), "*.mp4").Length < 1 Then
                        If IO.Directory.GetFiles(Path.GetFullPath(i), "*.mkv").Length < 1 Then
                            display = False
                        End If
                    End If
                End If
            End If
            If display = True Then
                Showslist.Items.Add(New With {Key .img = Path.GetFullPath(i) & "\cover.jpg", .name = foldername, .path = Path.GetFullPath(i), .created = Directory.GetCreationTime(Path.GetFullPath(i)), .moviename = moviename})
            End If
        End If
    End If
Next
1. Do not read the contents of a directory more than once. Instead, read all required content once and cache it in an in-memory variable. This will help performance because accessing memory is a lot faster than doing I/O (here: accessing the file system). For example:
If IO.Directory.GetFiles(…, "*.avi").Length < 1 Then
If IO.Directory.GetFiles(…, "*.mp4").Length < 1 Then
If IO.Directory.GetFiles(…, "*.mkv").Length < 1 Then
…
You just queried the same directory three times. You could change this to read the contents only once (potentially making this part of your code up to three times faster), and then filter them in memory:
'Imports System.IO
'Imports System.Linq
' access the file system only once and store the results in-memory…
Dim filePaths As String() = Directory.GetFiles(…, "*")
' … and perform everything else on that in-memory cache:
If Not filePaths.Any(Function(fp) String.Equals(Path.GetExtension(fp), ".avi", StringComparison.OrdinalIgnoreCase)) Then
If Not filePaths.Any(…) Then
If Not filePaths.Any(…) Then
…
P.S.: Perhaps it's worth pointing out that unlike Directory.GetFiles, the System.IO.Path methods should not cause expensive file system hits: They simply operate on strings that are known to contain file system paths.
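Applying point 1 to the question's display check, a minimal sketch (the module and function names are hypothetical) reads each folder once and decides in memory whether it contains a sub-folder or a video file. It uses DirectoryInfo.EnumerateFileSystemInfos (.NET 4 or later), which is lazy, so Any can stop at the first match, and it compares extensions case-insensitively:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module VideoFolderCheck
    ' The extensions the question treats as video files.
    Private ReadOnly VideoExtensions As New HashSet(Of String)(
        New String() {".avi", ".mp4", ".mkv"}, StringComparer.OrdinalIgnoreCase)

    ' Equivalent to the question's nested checks: display the folder if it has
    ' at least one sub-directory or at least one .avi/.mp4/.mkv file.
    ' The folder is enumerated once instead of being queried four times.
    Function ShouldDisplay(folderPath As String) As Boolean
        Dim folder As New DirectoryInfo(folderPath)
        Return folder.EnumerateFileSystemInfos().Any(
            Function(entry) TypeOf entry Is DirectoryInfo OrElse
                            VideoExtensions.Contains(entry.Extension))
    End Function
End Module
```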
2. Consider reading the root directory's complete contents recursively. The Directory.GetFiles method has an overload Directory.GetFiles(String, String, SearchOption) whose SearchOption parameter can be set to SearchOption.AllDirectories. In that case, you will not only get the contents from the specified directory itself, but also from all of its sub-directories.
That means you would need a single call to Directory.GetFiles (for the root directory) instead of many calls, which again reduces the number of expensive I/O calls. Store the results in an array and proceed to build your WPF list from there.
' read the contents of the network root directory, including all of its sub-directories…
Dim filePaths As String() = Directory.GetFiles(…, "*", SearchOption.AllDirectories)
' …and do everything else using the in-memory cache built above:
For Each filePath As String in filePaths
…
Showslist.Items.Add(…)
Next
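To proceed from that single recursive listing to per-folder checks, one option (an illustrative sketch, not the answer's own code; names are hypothetical) is to group the in-memory paths by the first-level folder they belong to. The cover.jpg and video-extension checks can then run per folder without further file system calls. Note that Directory.GetFiles with SearchOption.AllDirectories throws an UnauthorizedAccessException if any subdirectory is inaccessible, and folders that contain no files at all will not appear in the result:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module RecursiveListing
    ' Reads every file under rootDir with one recursive call, then groups the
    ' in-memory paths by the immediate child folder of rootDir they live in.
    Function GroupByTopFolder(rootDir As String) As Dictionary(Of String, List(Of String))
        Dim root As String = Path.GetFullPath(rootDir).TrimEnd(Path.DirectorySeparatorChar)
        Dim filePaths As String() = Directory.GetFiles(root, "*", SearchOption.AllDirectories)
        Dim groups As New Dictionary(Of String, List(Of String))(StringComparer.OrdinalIgnoreCase)
        For Each filePath As String In filePaths
            ' Path relative to root, e.g. "Some Movie\cd1\movie.avi".
            Dim relative As String = filePath.Substring(root.Length + 1)
            Dim sepIndex As Integer = relative.IndexOf(Path.DirectorySeparatorChar)
            If sepIndex < 0 Then Continue For ' file directly in rootDir, not in a movie folder
            Dim topFolder As String = relative.Substring(0, sepIndex)
            If Not groups.ContainsKey(topFolder) Then groups(topFolder) = New List(Of String)()
            groups(topFolder).Add(filePath)
        Next
        Return groups
    End Function
End Module
```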

Writeline overwriting the last line

I am trying to write lines to a text file. This works, except that it appears to overwrite the last line each time. I would like it to write to the next line instead of overwriting. Here is the code I'm using:
Dim FileNumber As Integer = FreeFile()
FileOpen(FileNumber, "c:\Converted.txt", OpenMode.Output)
PrintLine(FileNumber, convertedDir)
FileClose(FileNumber)
You are using old (VB6/VBA-style) code; it is better to use the .NET StreamWriter:
Dim append As Boolean = True
Using writer As System.IO.StreamWriter = New System.IO.StreamWriter("c:\Converted.txt", append)
writer.WriteLine(convertedDir)
End Using
append indicates whether text should be appended to the file. As suggested by Boris B., though, you can always set this variable to True, because StreamWriter deals with both situations (existing file or not) automatically: if the file does not exist, it is created.
In any case, below is the "theoretically right" way to use StreamWriter (setting append depending on whether the given file exists):
Dim append As Boolean = False
Dim fileName As String = "c:\Converted.txt"
If (System.IO.File.Exists(fileName)) Then
append = True
End If
Using writer As System.IO.StreamWriter = New System.IO.StreamWriter(fileName, append)
writer.WriteLine(convertedDir) 'Writes to a new line
End Using
For a quick solution based on existing code change the line
FileOpen(FileNumber, "c:\Converted.txt", OpenMode.Output)
to
FileOpen(FileNumber, "c:\Converted.txt", OpenMode.Append)
However, you should really update your method of writing files, since FileOpen and similar are there just for compatibility with older VB & VBA programs (and programmers :). For a more modern solution check out varocarbas' answer.
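If you only ever write one line at a time, File.AppendAllText (not mentioned in either answer) is another option: it opens the file, appends the text and closes it in one call, creating the file if it does not exist. A minimal sketch with a hypothetical helper name:

```vbnet
Imports System
Imports System.IO

Module AppendLineSketch
    ' Appends one line of text to filePath, creating the file if needed.
    ' Earlier lines are kept, unlike FileOpen with OpenMode.Output.
    Sub AppendLine(filePath As String, line As String)
        File.AppendAllText(filePath, line & Environment.NewLine)
    End Sub
End Module
```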

Deleting file if not just created (or being used)

I am creating a console app that will delete pictures from a directory every 30 minutes. The problem is that the directory is being populated with files every minute or so, so if I delete files in it, I may get an error trying to delete a file that is being created or is open at that moment.
I currently have this code to copy the files to another directory and then delete them from the source directory.
Dim f() As String = Directory.GetFiles(sourceDir)
For i As Integer = 0 To UBound(f)
    'Check file date here in IF statement FIRST...
    File.Copy(f(i), destDir & f(i).Replace(sourceDir, ""))
    If File.Exists(f(i)) = True Then
        File.Delete(f(i))
    End If
    Debug.Print(f(i) & " to >>> " & destDir & f(i).Replace(sourceDir, ""))
Next
How can I use:
File.GetCreationTime(f(i))
in an IF statement to check whether the current file was created less than 30 seconds ago?
OR
Is there a way of only populating:
Dim f() As String = Directory.GetFiles(sourceDir)
with only those files that are more than 30 seconds old?
There isn't a reliable way to detect whether a file is locked. Even if you did find out (it is technically possible), it could become locked before you tried to delete it. There are also other reasons a delete may fail; in your case, I don't think it matters what the reason was.
The only way is to put the call to delete in a try/catch and trap IOException, and then retry if you want.
You need to use a FileInfo object to get the CreationTime and compare it to Now. You could also use LastAccessTime or LastWriteTime, but since these are all newly written files, you don't need to.
'Imports System.IO
'Imports System.Linq
'Imports System.Runtime.InteropServices
Private Sub DeleteFiles()
    ' Select the FileInfo itself, so that each item in the loop below has a Delete method.
    Dim files = From f In Directory.GetFiles("c:\temp")
                Let fi = New FileInfo(f)
                Where fi.Exists AndAlso fi.CreationTime <= DateTime.Now.AddSeconds(-30)
                Select fi
    For Each f In files
        Try
            f.Delete()
        Catch ex As Exception
            If TypeOf ex Is IOException AndAlso IsFileLocked(ex) Then
                ' do something?
            End If
            'otherwise we just ignore it. we will pick it up on the next pass
        End Try
    Next
End Sub

Private Shared Function IsFileLocked(exception As Exception) As Boolean
    ' The low 16 bits of the HRESULT hold the Win32 error code:
    ' 32 = ERROR_SHARING_VIOLATION, 33 = ERROR_LOCK_VIOLATION.
    Dim errorCode As Integer = Marshal.GetHRForException(exception) And ((1 << 16) - 1)
    Return errorCode = 32 OrElse errorCode = 33
End Function
IsFileLocked function lifted from this other thread on SO
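The retry mentioned above could look like the following minimal sketch; the function name, attempt count and delay are illustrative assumptions, and it simply treats any IOException as a reason to wait and try again:

```vbnet
Imports System
Imports System.IO
Imports System.Threading

Module DeleteRetrySketch
    ' Tries to delete filePath up to maxAttempts times, waiting delayMs between
    ' attempts when an IOException (e.g. the file is still open) occurs.
    ' Returns True once the delete succeeds, False if every attempt failed.
    Function TryDeleteWithRetry(filePath As String, maxAttempts As Integer, delayMs As Integer) As Boolean
        For attempt As Integer = 1 To maxAttempts
            Try
                File.Delete(filePath) ' does not throw if the file is already gone
                Return True
            Catch ex As IOException
                ' Wait before the next attempt, but not after the last one.
                If attempt < maxAttempts Then Thread.Sleep(delayMs)
            End Try
        Next
        Return False
    End Function
End Module
```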
Dim NewFileDate As DateTime = DateTime.Now.AddSeconds(-30)
' get the list of all .txt files in C:\
Dim PicFiles As List(Of String) = System.IO.Directory.GetFiles("C:\", "*.txt").ToList()
' filter the list to only include files older than NewFileDate
Dim OutputList As List(Of String) = PicFiles.Where(Function(x) System.IO.File.GetCreationTime(x) < NewFileDate).ToList()
' delete files in the list
For Each PicFile As String In OutputList
'wrap this next line in a Try-Catch if you find there is file locking.
System.IO.File.Delete(PicFile)
Next
This obviously targets .NET 3.5 or 4.0 (it uses LINQ).

how do I put contents of C: into an array?

I am learning arrays at the moment, and I have the piece of code below that goes through drive C: and displays the files in a list box.
I want to try and expand it to use Array.Sort, so that it gets the files, puts them into an array, and then lets me sort by file name or file size. I have been racking my brain over how to put the files into an array.
I would like an explanation if possible, as I am more interested in learning how to do it than in just getting the answer.
Thanks!
Private Sub btnclick_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnclick.Click
    Call Clearlist()
    Dim strFilesinfo As System.IO.FileInfo
    Dim strlength As Double = 0
    Dim strname As String = ""
    For Each strFiles As String In My.Computer.FileSystem.GetFiles("c:\")
        strFilesinfo = My.Computer.FileSystem.GetFileInfo(strFiles)
        strlength = strFilesinfo.Length
        strname = strFilesinfo.Name
        lstData.Items.Add(strname & " " & strlength.ToString("N0"))
    Next
End Sub
End Class
To allow the data to be sortable, you'd need to be displaying something that could treat that information separately (i.e. a class or structure). You might also find that a different type of control, such as a DataGridView might be easier to get to grips with.
The .NET Framework does define an interface, IBindingList, which collections can implement to indicate that they support, amongst other things, sorting.
I'm providing this as a sample for learning purposes, but it should not be used as-is. Getting every file from the entire C:\ drive should not be done like this. Aside from the performance issues, there are Windows security limitations that won't actually let you do this.
The fileList populated here contains only the files in the top directory (SearchOption.TopDirectoryOnly). If you change that argument to SearchOption.AllDirectories, it will try to get the files in all the subdirectories too, but it will fail, as I stated before.
Dim path As String = "C:\"
Dim dir As New System.IO.DirectoryInfo(path)
Dim fileList = dir.GetFiles("*.*", IO.SearchOption.TopDirectoryOnly)
Dim fileSort = (From file In fileList _
                Order By file.Name _
                Select file.Name, file.Length).ToList
For Each file In fileSort
    With file
        lstData.Items.Add(String.Format("{0} {1}", .Name, .Length.ToString("N0")))
    End With
Next file
Just change the Order By in the LINQ query to change how the sorting is done. There are many other ways to do the sorting but LINQ will handle it for you with very little code.
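Since the question asked specifically about Array.Sort: DirectoryInfo.GetFiles already returns a FileInfo array, and Array.Sort accepts a comparison lambda, so you can sort by size or by name without LINQ. A minimal sketch with hypothetical function names:

```vbnet
Imports System
Imports System.IO

Module FileArraySorting
    ' Returns the files directly inside folderPath as an array sorted by size,
    ' smallest first. DirectoryInfo.GetFiles already returns a FileInfo() array.
    Function GetFilesSortedBySize(folderPath As String) As FileInfo()
        Dim folder As New DirectoryInfo(folderPath)
        Dim files As FileInfo() = folder.GetFiles()
        Array.Sort(files, Function(a, b) a.Length.CompareTo(b.Length))
        Return files
    End Function

    ' Same idea, sorted by file name using a case-insensitive comparison.
    Function GetFilesSortedByName(folderPath As String) As FileInfo()
        Dim folder As New DirectoryInfo(folderPath)
        Dim files As FileInfo() = folder.GetFiles()
        Array.Sort(files, Function(a, b) String.Compare(a.Name, b.Name, StringComparison.OrdinalIgnoreCase))
        Return files
    End Function
End Module
```

In the button handler you could then loop over the returned array and add each file's Name and Length.ToString("N0") to lstData, as the original code does.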