File comparison in VB.Net - vb.net

I need to know if two files are identical. At first I compared file sizes and creation timestamps, but that's not reliable enough. I have come up with the following code, that seems to work, but I'm hoping that someone has a better, easier or faster way of doing it.
Basically, what I am doing is streaming the file contents to byte arrays and comparing their MD5 hashes via System.Security.Cryptography.
Before that I do some simple checks though, since there is no reason to read through the files, if both file paths are identical, or one of the files does not exist.
Public Function CompareFiles(ByVal file1FullPath As String, ByVal file2FullPath As String) As Boolean
If Not File.Exists(file1FullPath) Or Not File.Exists(file2FullPath) Then
'One or both of the files does not exist.
Return False
End If
If String.Compare(file1FullPath, file2FullPath, True) = 0 Then
' fileFullPath1 and fileFullPath2 points to the same file...
Return True
End If
Dim MD5Crypto As New MD5CryptoServiceProvider()
Dim textEncoding As New System.Text.ASCIIEncoding()
Dim fileBytes1() As Byte, fileBytes2() As Byte
Dim fileContents1, fileContents2 As String
Dim streamReader As StreamReader = Nothing
Dim fileStream As FileStream = Nothing
Dim isIdentical As Boolean = False
Try
' Read file 1 to byte array.
fileStream = New FileStream(file1FullPath, FileMode.Open)
streamReader = New StreamReader(fileStream)
fileBytes1 = textEncoding.GetBytes(streamReader.ReadToEnd)
fileContents1 = textEncoding.GetString(MD5Crypto.ComputeHash(fileBytes1))
streamReader.Close()
fileStream.Close()
' Read file 2 to byte array.
fileStream = New FileStream(file2FullPath, FileMode.Open)
streamReader = New StreamReader(fileStream)
fileBytes2 = textEncoding.GetBytes(streamReader.ReadToEnd)
fileContents2 = textEncoding.GetString(MD5Crypto.ComputeHash(fileBytes2))
streamReader.Close()
fileStream.Close()
' Compare byte array and return result.
isIdentical = fileContents1 = fileContents2
Catch ex As Exception
isIdentical = False
Finally
If Not streamReader Is Nothing Then streamReader.Close()
If Not fileStream Is Nothing Then fileStream.Close()
fileBytes1 = Nothing
fileBytes2 = Nothing
End Try
Return isIdentical
End Function

I would say hashing the file is the way to go. It's how I have done it in the past.
Use Using statements when working with Streams and such, as they clean themselves up.
Here is an example.
Public Function CompareFiles(ByVal file1FullPath As String, ByVal file2FullPath As String) As Boolean
    ' OrElse short-circuits, so the second file is not checked when the first is already missing.
    If Not File.Exists(file1FullPath) OrElse Not File.Exists(file2FullPath) Then
        ' One or both of the files do not exist.
        Return False
    End If
    If file1FullPath = file2FullPath Then
        ' file1FullPath and file2FullPath point to the same file...
        Return True
    End If
    Try
        Dim file1Hash As String = hashFile(file1FullPath)
        Dim file2Hash As String = hashFile(file2FullPath)
        Return file1Hash = file2Hash
    Catch ex As Exception
        Return False
    End Try
End Function
Private Function hashFile(ByVal filepath As String) As String
    Using reader As New System.IO.FileStream(filepath, IO.FileMode.Open, IO.FileAccess.Read)
        Using md5 As New System.Security.Cryptography.MD5CryptoServiceProvider
            ' ComputeHash reads from the stream, so the file is never loaded into memory as a whole.
            Dim hash() As Byte = md5.ComputeHash(reader)
            ' Base64 keeps every byte of the hash. Decoding raw hash bytes as Unicode text
            ' can replace invalid sequences and make different hashes look equal.
            Return Convert.ToBase64String(hash)
        End Using
    End Using
End Function

This is what md5 is made for. You're doing it the right way. However, if you really want to improve it further, I can recommend some things to explore. The emphasis is on explore, because none of these are slam dunks. They may help, but they may also hurt, or they may be overkill. You'll need to evaluate them for your situation and determine (through testing) what will be the best solution.
The first recommendation is to compute the md5 hash without loading the entire file into RAM. The example is C#, but the VB.Net translation is fairly straightforward. If you're working with small files, then what you already have may be fine. However, for anything large enough to end up on .Net's Large Object Heap (85,000 bytes), you probably want to consider using the stream technique instead.
Additionally, if you're using a recent version of .Net, you might want to explore doing this asynchronously for each file. As a practical matter, I suspect you'll get best performance from what you have, as the disk I/O is likely to be the slowest part of this, and I'd expect traditional disks to perform best if you allow them to read from the files in sequence, rather than making your disk seek back and forth between the files. However, you may still be able to do better with asynchronous methods, especially if you follow the previous suggestion, because you can also await at the Read() call level, in addition to awaiting for the entire file. Also, if you're running this on an SSD, that would minimize the problems with seeks and could make an asynchronous solution a clear winner. One warning, though: this is a deep rabbit hole to chase... one that can be worthwhile, but you can also end up spending a lot of time on a YAGNI situation. This is the kind of thing, though, you might choose to explore once for a situation where you probably won't use it, so that you understand it well enough to know how it can help in the future for those situations when you do need it.
One more point: for the async recommendation to work, you need to isolate the hashing code into its own method... but you should probably do this anyway.
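To make the async idea a bit more concrete, here is a minimal sketch, assuming a reasonably recent .NET where IncrementalHash is available. HashFileAsync and FilesMatchAsync are hypothetical names, and the 80 KB buffer size is just a guess that stays under the Large Object Heap threshold. It awaits at the read level, as described above, rather than only once per file.

```vbnet
Imports System.IO
Imports System.Security.Cryptography
Imports System.Threading.Tasks

Module AsyncHashSketch
    ' Hypothetical helper: hashes a file chunk by chunk, awaiting each read.
    Public Async Function HashFileAsync(path As String) As Task(Of String)
        ' 81,920 bytes: below the 85,000-byte Large Object Heap threshold.
        Dim buffer(81919) As Byte
        Using stream As New FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, buffer.Length, useAsync:=True)
            Using hasher As IncrementalHash = IncrementalHash.CreateHash(HashAlgorithmName.MD5)
                Dim bytesRead As Integer = Await stream.ReadAsync(buffer, 0, buffer.Length)
                While bytesRead > 0
                    hasher.AppendData(buffer, 0, bytesRead)
                    bytesRead = Await stream.ReadAsync(buffer, 0, buffer.Length)
                End While
                Return Convert.ToBase64String(hasher.GetHashAndReset())
            End Using
        End Using
    End Function

    ' Starts both hashes at once. On a spinning disk it may be better to
    ' await one file and then the other, so the reads stay sequential.
    Public Async Function FilesMatchAsync(path1 As String, path2 As String) As Task(Of Boolean)
        Dim hashes As String() = Await Task.WhenAll(HashFileAsync(path1), HashFileAsync(path2))
        Return hashes(0) = hashes(1)
    End Function
End Module
```

Whether this beats the synchronous version depends on the disk and the file sizes, so measure before committing to it.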
My final recommendation is to remove the File.Exists() checks. This is a tempting test, I know, but it's almost always wrong. Especially if you adopt the first recommendation, just open the streams near the top of the method using an option that fails if the file does not exist, and make your check on whether the stream opened or not.
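As a rough illustration of that last recommendation, here is a minimal sketch that drops the File.Exists() calls and lets the open fail instead. FilesMatch and HashFile are hypothetical names, and treating a missing file as "not identical" is an assumption carried over from the question's code.

```vbnet
Imports System.IO
Imports System.Security.Cryptography

Module FileCompareSketch
    ' Hypothetical helper: streams the file through MD5.
    ' FileMode.Open throws FileNotFoundException when the file does not exist.
    Private Function HashFile(path As String) As String
        Using stream As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read)
            Using md5Hasher As MD5 = MD5.Create()
                Return Convert.ToBase64String(md5Hasher.ComputeHash(stream))
            End Using
        End Using
    End Function

    ' Returns False when either file is missing, without a separate existence check.
    Public Function FilesMatch(path1 As String, path2 As String) As Boolean
        Try
            Return HashFile(path1) = HashFile(path2)
        Catch ex As FileNotFoundException
            Return False
        Catch ex As DirectoryNotFoundException
            Return False
        End Try
    End Function
End Module
```

Other failures, such as a permissions error, are left to propagate here so the caller can tell "missing" apart from "could not read".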

Related

Null Exception using StreamReader

I am trying to write a program that reads a textfile, which is neatly organized. I am trying to store the information on each line in two arrays. When I try to run the program, I get a NullReferenceException, and the program does not run. I am unsure as to what I am doing wrong.
Private Sub RentalCostCalculator_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim objReader As IO.StreamReader
Dim strFileLocation As String = "location"
Dim strErrorMessage As String = "The file is not available. Restart when ready."
Dim strErrorHeading As String = "Error"
Dim intCount As Integer = 0
Dim intFill As Integer
' Verifies the file exists
If IO.File.Exists(strFileLocation) = True Then
objReader = IO.File.OpenText(strFileLocation)
' Assigns the cities and city rental costs to the arrays
Do While objReader.Peek() <> -1
_strCityName(intCount) = objReader.ReadLine()
'This is where the error occurs
_decCityRentalCost(intCount) = Convert.ToDecimal(objReader.ReadLine())
intCount += 1
Loop
objReader.Close()
' Displays city in listbox
For intFill = 0 To (_strCityName.Length - 1)
lstTopTenCities.Items.Add(_strCityName(intFill))
Next
Else
MsgBox(strErrorHeading, MsgBoxStyle.Exclamation, strErrorHeading)
Close()
End If
End Sub
Paired arrays that match up by index are an anti-pattern: something to avoid. Better to make a single collection with a custom Class type or tuple.
This is less intuitive, but it's also poor practice to check if the file exists ahead of time. The better pattern handles the exception in the case where the file access fails. The file system is volatile, meaning you still have to be able to handle exceptions on file access, even if an Exists() check passes. We have to write this code anyway, so we may as well rely on it as the main file exists check, too.
But isn't handling exceptions slow? I'm glad you asked. Yes, yes it is. In fact, unwinding the stack to handle an exception is up there among the slowest things you can do in a single computer, which is why exceptions are normally avoided for program flow control like this. But you know what's even worse? Disk I/O. Disk I/O is so much worse even than exception handling. In checking if the file exists up front, we pay for an extra trip out to disk every time, whereas with exceptions we only pay the performance penalty if the file access fails.
In summary: write more code and pay a worse cost every time, or pay a still-bad-but-less-cost some of the time, with less code. Skipping the File.Exists() check should be a no-brainer.
Finally, I don't know who taught you to use prefixes like str and obj with your variables, but that's no longer good advice. It made sense back in the vb6 era, but since then we have a better type system and better tooling, and with the release of VB.Net way back in 2002 Microsoft (who invented the practice) updated their own style guidelines to say not to use those prefixes. It is still common practice to use prefixes for the WinForms control types, but otherwise it's best to avoid them.
Here's a solution that incorporates each of those points, and very likely solves the NullReferenceException as well.
Private Iterator Function ReadRentalFile(filePath As String) As IEnumerable(Of (String, Decimal))
    Using rdr As New IO.StreamReader(filePath)
        ' "=" inside an expression is a comparison in VB, so read and assign in separate statements.
        Dim City As String = rdr.ReadLine()
        While City IsNot Nothing
            Yield (City, Decimal.Parse(rdr.ReadLine()))
            City = rdr.ReadLine()
        End While
    End Using
End Function
Private _cityCosts As IEnumerable(Of (String, Decimal))
Private Sub RentalCostCalculator_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim FileLocation As String = "location"
    Try
        ' ToList() runs the iterator once, inside the Try, so the file is read a single time.
        _cityCosts = ReadRentalFile(FileLocation).ToList()
        lstTopTenCities.Items.AddRange(_cityCosts.Select(Function(c) c.Item1).ToArray())
    Catch
        ' MsgBox takes the prompt first and the title last.
        MsgBox("The file is not available. Restart when ready.", MsgBoxStyle.Exclamation, "Error")
    End Try
End Sub
But looking at the original code, if the error occurs on this line:
_decCityRentalCost(intCount) = Convert.ToDecimal(objReader.ReadLine())
It's very likely that either the file is not quite as neatly organized as you expect, and there's no result from objReader.ReadLine(), or (even more likely) that _decCityRentalCost doesn't refer to an actual array at this point, where it was never actually instantiated or the variable was changed to point somewhere else. Given this is in a Form_Load method, it's probably the latter.

Will Putting This Onto A background Worker Stop This Issue

I have been trying to fix this for a number of days now without any success. I know I have created another post related to this issue but not sure if I should have continued with the other post rather than creating a new one as I am still quite new to how SO works so apologies if I have gone about this the wrong way.
Objective
Read a text file from disk, output the contents to a Textbox so I can then extract the last 3 lines from it. This is the only way I can think of doing this.
The text file is continuously being updated by another running program. I can still read it while it is in use, but I cannot write to it.
I am probing this file through a Timer which ticks every 1 second in order to get the latest information.
Now to the issue...
I have noticed that after some time my app becomes sluggish which is noticeable when I try to move it across the screen or resize it and the CPU usage starts to creep up to over 33%
My Thought Process
As this reading the file is a continuous one, I was thinking that I could move it onto a BackgroundWorker which from my understanding would put it on a different thread and take some load off the main GUI.
Am I barking up the wrong tree on this one?
I am reaching out to more advanced users before I start to get all the text books out on learning how to use the BackgroundWorker.
Here is the code I am using to Read the txt file and output it to a text Box. I have not included the code for extracting the last 3 lines because I don't think that part is causing the issue.
I think the issue is because I am constantly probing the source files every second with a timer but not 100% sure to be honest.
Dim strLogFilePath As String
strLogFilePath = "C:\DSD\data.txt"
Dim LogFileStream As FileStream
Dim LogFileReader As StreamReader
'Open file for reading
LogFileStream = New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
LogFileReader = New StreamReader(LogFileStream)
'populate text box with the contents of the txt file
Dim strRowText As String
strRowText = LogFileReader.ReadToEnd()
TextBox1.text = strRowText
'Clean Up
LogFileReader.Close()
LogFileStream.Close()
LogFileReader.Dispose()
LogFileStream.Dispose()
Firstly, you should use the Using keyword instead of manually disposing objects, because that way you are guaranteed that the object will get disposed, even if an unexpected exception occurs, for example:
' You can initialize variables in one line
Dim strLogFilePath As String = "C:\DSD\data.txt"
Using LogFileStream As New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
' Everything goes in here
End Using
You don't need the reader for my solution. The reading will be done manually.
Next, you need to read the last n lines (in your case, 3) of the stream. Reading the entire file when you're only interested in a few lines at the end is inefficient. Instead, you can start reading from the end until you've reached three (or any number of) line separators (based on this answer):
Function ReadLastLines(
        NumberOfLines As Integer, Encoding As System.Text.Encoding, FS As FileStream,
        Optional LineSeparator As String = vbCrLf
    ) As String
    Dim NewLineSize As Integer = Encoding.GetByteCount(LineSeparator)
    Dim NewLineCount As Integer = 0
    Dim Buffer As Byte() = Encoding.GetBytes(LineSeparator)
    ' Walk backwards one byte at a time so a separator is found at any offset.
    ' This assumes an encoding such as ASCII or UTF-8, where that is safe.
    For Position As Long = NewLineSize To FS.Length
        FS.Seek(-Position, SeekOrigin.End)
        FS.Read(Buffer, 0, Buffer.Length)
        ' A separator at the very end of the file only terminates the last line, so it is not counted.
        If Position > NewLineSize AndAlso Encoding.GetString(Buffer) = LineSeparator Then
            NewLineCount += 1
            If NewLineCount = NumberOfLines Then
                ' VB array bounds are inclusive, so subtract 1 to get the exact size.
                Dim ReturnBuffer(CInt(FS.Length - FS.Position) - 1) As Byte
                FS.Read(ReturnBuffer, 0, ReturnBuffer.Length)
                Return Encoding.GetString(ReturnBuffer)
            End If
        End If
    Next
    ' Handle case where number of lines in file is less than NumberOfLines
    FS.Seek(0, SeekOrigin.Begin)
    Buffer = New Byte(CInt(FS.Length) - 1) {}
    FS.Read(Buffer, 0, Buffer.Length)
    Return Encoding.GetString(Buffer)
End Function
Usage:
Using LogFileStream As New FileStream(strLogFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
' Depending on system, you may need to supply an argument for the LineSeparator param
Dim LastThreeLines As String = ReadLastLines(3, System.Text.Encoding.UTF8, LogFileStream)
' Do something with the last three lines
MsgBox(LastThreeLines)
End Using
Note that I haven't tested this code, and I'm sure it can be improved. It may also not work for all encodings, but it sounds like it should be better than your current solution, and that it will work in your situation.
Edit: Also, to answer your question, IO operations should usually be performed asynchronously to avoid blocking the UI. You can do this using tasks or a BackgroundWorker. It probably won't make it faster, but it will make your application more responsive. It's best to indicate that something is loading before the task begins.
If you know when your file is being written to, you can set a flag to start reading, and then unset it when the last lines have been read. If it hasn't changed, there's no reason to keep reading it over and over.
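If you can't get a signal from the writer, one way to avoid re-reading unchanged data is to remember how far you got last time. This is only a sketch under a few assumptions: the file is append-only UTF-8 text, LogTail is a hypothetical class name, and a write that ends in the middle of a multi-byte character is not handled.

```vbnet
Imports System.IO
Imports System.Text

' Hypothetical helper that returns only the text appended since the previous call.
Public Class LogTail
    Private ReadOnly _path As String
    Private _lastPosition As Long = 0

    Public Sub New(path As String)
        _path = path
    End Sub

    Public Function ReadNewText() As String
        Using fs As New FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            ' Nothing was appended, so there is nothing to read.
            If fs.Length = _lastPosition Then Return String.Empty
            ' The file shrank (truncated or replaced), so start again from the top.
            If fs.Length < _lastPosition Then _lastPosition = 0
            fs.Seek(_lastPosition, SeekOrigin.Begin)
            ' VB array bounds are inclusive, hence the -1.
            Dim buffer(CInt(fs.Length - _lastPosition) - 1) As Byte
            Dim bytesRead As Integer = fs.Read(buffer, 0, buffer.Length)
            _lastPosition += bytesRead
            Return Encoding.UTF8.GetString(buffer, 0, bytesRead)
        End Using
    End Function
End Class
```

With something like this, a timer tick on an unchanged file costs one open and one length check instead of a full read.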

VB.NET (2013) - Check string against huge file

I have a text file that is 125Mb in size, it contains 2.2 million records. I have another text file which doesn't match the original but I need to find out where it differs. Normally, with a smaller file I would read each line and process it in some way, or read the whole file into a string and do likewise, however the two files are too big for that and so I would like to create something to achieve my goal. Here's what I currently have.. excuse the mess of it.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
Dim refOrig As String = refOriginalText.Text 'Original Reference File
Dim refLatest As String = refLatestText.Text 'Latest Reference
Dim srOriginal As StreamReader = New StreamReader(refOrig) 'start stream of original file
Dim srLatest As StreamReader = New StreamReader(refLatest) 'start stream of latest file
Dim recOrig, recLatest, baseDIR, parentDIR, recOutFile As String
baseDIR = vb.Left(refOrig, InStrRev(refOrig, ".ref") - 1) 'find parent folder
parentDIR = Path.GetDirectoryName(baseDIR) & "\"
recOutFile = parentDIR & "Updated.ref"
Me.Text = "Processing Reference File..." 'update the application
Update()
If Not File.Exists(recOutFile) Then
FileOpen(55, recOutFile, OpenMode.Append)
FileClose(55)
End If
Dim x As Integer = 0
Do While srLatest.Peek() > -1
Application.DoEvents()
recLatest = srLatest.ReadLine
recOrig = srOriginal.ReadLine ' check the original reference file
Do
If Not recLatest.Equals(recOrig) Then
recOrig = srOriginal.ReadLine
Else
FileOpen(55, recOutFile, OpenMode.Append)
Print(55, recLatest & Environment.NewLine)
FileClose(55)
x += 1
count.Text = "Record No: " & x
count.Refresh()
srOriginal.BaseStream.Seek(0, SeekOrigin.Begin)
GoTo 1
End If
Loop
1:
Loop
srLatest.Close()
srOriginal.Close()
FileClose(55)
End Sub
It's got poor programming and scary loops, but that's because I'm not a professional coder, just a guy trying to make his life easier.
Currently, this uses a form to insert the original file and the latest file and outputs each line that matches into a new file. This is less than perfect, but I don't know how to cope with the large file sizes as streamreader.readtoend crashes the program. I also don't need the output to be a copy of the latest input, but I don't know how to only output the records it doesn't find. Here's a sample of the records each file has:
doc:ARCHIVE.346CCBD3B06711E0B40E00163505A2EF
doc:ARCHIVE.346CE683B29811E0A06200163505A2EF
doc:ARCHIVE.346CEB15A91711E09E8900163505A2EF
doc:ARCHIVE.346CEC6AAA6411E0BEBB00163505A2EF
The program I have currently works... to a fashion, however I know there are better ways of doing it and I'm sure much better ways of using the CPU and memory, but I don't know this level of programming. All I would like is for you to take a look and offer your best answers to all or some of the code. Tell me what you think will make it better, what will help with one line, or all of it. I have no time limit on this because the code works, albeit slowly, I would just like someone to tell me where my code could be better and what I could do to get round the huge file sizes.
Your code is slow because it is doing a lot of file IO. You're on the right track by reading one line at a time, but this can be improved.
Firstly, I've created some test files based off the data that you provided. Those files contain three million lines and are about 130 MB in size (2.2 million records was less than 100 MB so I've increased the number of lines to get to the file size that you state).
Reading the entire file into a single string uses up about 600 MB of memory. Do this with two files (which I assume you were doing) and you have over 1GB of memory used, which may have been causing the crash (you don't say what error was shown, if any, when the crash occurred, so I can only assume that it was an OutOfMemoryException).
Here's a few tips before I go through your code:
Use Using Blocks
This won't help with performance, but it does make your code cleaner and easier to read.
Whenever you're dealing with a file (or anything that implements the IDisposable interface), it's always a good idea to use a Using statement. This will automatically dispose of the file (which closes the file), even if an error happens.
Don't use FileOpen
The FileOpen method is outdated (and even stated as being slow in its documentation). There are better alternatives that you are already (almost) using: StreamWriter (the cousin of StreamReader).
Opening and closing a file two million times (like you are doing inside your loop) won't be fast. This can be improved by opening the file once outside the loop.
DoEvents() is evil!
DoEvents is a legacy method from back in the VB6 days, and it's something that you really want to avoid, especially when you're calling it two million times in a loop!
The alternative is to perform all of your file processing on a separate thread so that your UI is still responsive.
Using a separate thread here is probably overkill, and there are a number of intricacies that you need to be aware of, so I have not used a separate thread in the code below.
So let's look at each part of your code and see what we can improve.
Creating the output file
You're almost right here, but you're doing some things that you don't need to do. GetDirectoryName works with file names, so there's no need to remove the extension from the original file name first. You can also use the Path.Combine method to combine a directory and file name.
recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
Reading the files
Since you're looping through each line in the "latest" file and finding a match in the "original" file, you can continue to read one line at a time from the "latest" file.
But instead of reading a line at a time from the "original" file, then seeking back to the start when you find a match, you will be better off reading all of those lines into memory.
Now, instead of reading the entire file into memory (which took up 600 MB as I mentioned earlier), you can read each line of the file into an array. This will use up less memory, and is quite easy to do thanks to the File class.
originalLines = File.ReadAllLines(refOrig)
This reads all of the lines from the file and returns a String array. Searching through this array for matches will be slow, so instead of reading into an array, we can read into a HashSet(Of String). This will use up a bit more memory, but it will be much faster to search through.
originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
Searching for matches
Since we now have all of the lines from the "original" file in an array or HashSet, searching for a line is very easy.
originalLines.Contains(recLatest)
Putting it all together
So let's put all of this together:
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
    Dim refOrig As String
    Dim refLatest As String
    Dim recOutFile As String
    Dim originalLines As HashSet(Of String)
    refOrig = refOriginalText.Text 'Original Reference File
    refLatest = refLatestText.Text 'Latest Reference
    recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
    Me.Text = "Processing Reference File..." 'update the application
    Update()
    originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
    ' The second StreamWriter argument (True) appends to an existing output file.
    Using latest As New StreamReader(refLatest),
          updated As New StreamWriter(recOutFile, True)
        Do
            Dim line As String
            line = latest.ReadLine()
            ' ReadLine returns Nothing when it reaches the end of the file.
            If line Is Nothing Then
                Exit Do
            End If
            If originalLines.Contains(line) Then
                updated.WriteLine(line)
            End If
        Loop
    End Using
End Sub
This uses around 400 MB of memory and takes about 4 seconds to run.
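The question also asks how to output only the records that are not found. A minimal sketch of that variation, assuming the same one-record-per-line files, is to flip the Contains test. WriteMissingRecords is a hypothetical name, and File.ReadLines is swapped in for File.ReadAllLines so the intermediate array is never built.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module MissingRecordsSketch
    ' Writes every line of latestPath that does not appear in originalPath to outputPath.
    ' Returns the number of lines written.
    Public Function WriteMissingRecords(originalPath As String, latestPath As String,
                                        outputPath As String) As Integer
        ' File.ReadLines streams the file lazily; the HashSet still holds every distinct line.
        Dim originalLines As New HashSet(Of String)(File.ReadLines(originalPath))
        Dim written As Integer = 0
        ' False overwrites any previous output instead of appending to it.
        Using updated As New StreamWriter(outputPath, False)
            For Each line As String In File.ReadLines(latestPath)
                If Not originalLines.Contains(line) Then
                    updated.WriteLine(line)
                    written += 1
                End If
            Next
        End Using
        Return written
    End Function
End Module
```

Swapping the two input paths would give the records that were removed rather than added.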

Visual Basic.NET - Add two numbers (I/O from file)

The following code should sum two numbers from the file "input.txt" and write the sum to "output.txt". Compilation is successful, but "output.txt" is still empty after running the program. What am I doing wrong?
Imports System.IO
Public Class test
Public Shared Sub Main()
Dim scan as StreamReader = new StreamReader("input.txt")
Dim writer as StreamWriter = new StreamWriter("output.txt", True)
Dim input as String
input = scan.ReadLine()
Dim ab() as String = Split(input)
Dim res as Integer = Val(ab(0))+Val(ab(1))
writer.writeLine(res)
writer.close()
End sub
End class
Your code works properly for me, so as long as your input file is formatted properly (i.e. a single line with two numbers separated by spaces, like "1 2") and you have the necessary OS permissions to read and write to those files, then it should work for you too. However, it's worth mentioning that there are several issues with your code that would be good to correct, since they fly in the face of typical best practices.
First, you should, as much as possible, turn Option Strict On. I know that you have it Off because your code won't compile with it On. The following line is technically misleading, and therefore fails with Option Strict On:
Dim res As Integer = Val(ab(0)) + Val(ab(1))
The reason it fails is that the Val function returns a Double, not an Integer, so, technically, depending on the contents of the file, the result could be fractional or could be too large to fit in an Integer. With Option Strict Off, the compiler is essentially automatically fixing your code for you, like this:
Dim res As Integer = CInt(Val(ab(0)) + Val(ab(1)))
In order to set the res variable equal to the result of the calculation, the more capable Double value must be converted down to an Integer. When you are forced to put the CInt in the code yourself, you are fully aware that the conversion is taking place and what the consequences of it might be. When you have Option Strict Off and it inserts the conversion behind-the-scenes, then you may very well miss a potential bug.
Secondly, the Val function is old-school VB6 syntax. While it technically works fine, it's provided mainly for backwards compatibility. The new .NET equivalent would be to use Integer.Parse, Integer.TryParse or Convert.ToInt32.
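Of those three, Integer.TryParse is the one that doesn't throw on bad input. Here is a minimal sketch of how it might look for this task. TrySumLine is a hypothetical helper, and splitting on spaces only is an assumption about the input format.

```vbnet
Module ParseSketch
    ' Returns True and sets sum when the line holds exactly two integers separated by spaces.
    Public Function TrySumLine(line As String, ByRef sum As Integer) As Boolean
        sum = 0
        If line Is Nothing Then Return False
        Dim parts As String() = line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Dim a As Integer
        Dim b As Integer
        ' TryParse reports failure through its return value instead of throwing.
        If parts.Length <> 2 OrElse Not Integer.TryParse(parts(0), a) OrElse
           Not Integer.TryParse(parts(1), b) Then
            Return False
        End If
        sum = a + b
        Return True
    End Function
End Module
```

The caller can then decide what to write to the output file when the input line is malformed, rather than crashing.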
Thirdly, you never close the scan stream reader. You could just add scan.Close() to the end of your method, but it is better, when possible, to create Using blocks for any disposable object, like this:
Using scan As StreamReader = New StreamReader("input.txt")
    Using writer As StreamWriter = New StreamWriter("output.txt", True)
        Dim input As String
        input = scan.ReadLine()
        Dim ab() As String = Split(input)
        Dim res As Integer = Integer.Parse(ab(0)) + Integer.Parse(ab(1))
        writer.WriteLine(res)
    End Using
End Using
Lastly, as Hans pointed out, it's not good to rely on the current directory. It's always best to specify full paths for your files. There are different methods in the framework for getting various folder paths, such as the user's desktop folder, or the download folder, or the temp folder, or the application folder, or the current application's folder, or the folder of the current running assembly. You can use any such method to get your desired folder path, and then use Path.Combine to add the file name to get the full file path. For instance:
Dim desktopFolderPath As String = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory)
Dim inputFilePath As String = Path.Combine(desktopFolderPath, "input.txt")
Dim outputFilePath As String = Path.Combine(desktopFolderPath, "output.txt")

Better socket communication system

Currently, the client is sending messages like this:
Public Function checkMD5(ByVal userID As Integer, ByVal gameID As Integer, ByVal file As String, ByVal fileFull As String) As String
Dim make As New CMakeMSG
Dim md5 As New CMD5
make.append("checkfileMD5")
make.append(userID)
make.append(containerID)
make.append(file)
make.append(md5.GenerateFileHash(fileFull))
Return SocketSendAndReceiveMSG(make.makestring)
End Function
The server may receive something like this:
checkfileMD5-MSGDelimit0-12-MSGDelimit1-54-MSGDelimit2-filename.txt-MSGDelimit3-*md5hash*
Which it then reads out:
Private _message As String
Public Function handleMessage() As String
Dim brokenMessage As New ArrayList
brokenMessage = breakDown() 'Split to ArrayList
If brokenMessage(0) = "checkfileMD5" Then
Try
If brokenMessage.Count > 5 Then
Return "0-structureMessedUp"
End If
Return CompareFileMD5(brokenMessage(1), brokenMessage(2), brokenMessage(3), brokenMessage(4))
Catch ex As Exception
Return "0-structureMessedUp"
End Try
End If
End Function
So what it does is take the received message, and split it to an array using the -MSGDelimit- as a delimiter. So in this case the CompareFileMD5() function would receive 12,54,filename.txt,*md5hash*. And based on that it can return to the client whether or not the MD5 matched.
Sure, it works, but it feels sloppy and code on the server gets really messy.
Here's the less relevant functions from the above code (doubt it matters, but you never know):
Private Function breakDown() As ArrayList
Try
Dim theArray As New ArrayList
Dim copymsg As String = _message
Dim counter As Integer = 0
Do Until Not copymsg.Contains("-MSGDelimit")
Dim found As String
found = copymsg.Substring(0, copymsg.IndexOf("-MSGDelimit" & counter & "-"))
theArray.Add(found)
copymsg = copymsg.Replace(found & "-MSGDelimit" & counter & "-", "")
counter += 1
Loop
theArray.Add(copymsg)
Return theArray
Catch ex As Exception
Module1.msg(ex.Message)
End Try
End Function
Private Function CompareFileMD5(ByVal userID As Integer, ByVal gameID As Integer, ByVal filename As String, ByVal source As String) As String
Try
Dim tryFindFile As String = Module1.filedatabase.findfile(userID, gameID, filename)
If Not tryFindFile = "notFound" Then
Dim fileFull As String = tryFindFile & "\" & filename
Dim md5 As New CMD5
If md5.GenerateFileHash(fileFull) = source Then
Return "Match"
Else
Return "NoMatch"
End If
Else
Return "notFound"
End If
Catch ex As Exception
Module1.msg("0")
Return "0"
End Try
End Function
So, any advice on how to handle it better/cleaner/more professional?
Depending on the application, your current solution may be perfectly fine. There are a couple of things that do stand out a little bit:
The "protocol" is a bit heavy in terms of the amount of data sent. The delimiters between the data pieces add quite a bit of overhead. In the example, they make up maybe 50% of the payload. In addition, sending all data as text potentially makes the payload larger than absolutely necessary. All of this, though, is not necessarily a problem. If the traffic between the client and server is relatively light, then the extra data on the wire may not be a problem at all. For a request of this size (with or without the relatively high delimiter overhead), the main cost will be round trip costs and would likely change very little by reducing the size of this packet by half. If, though, there are requests with thousands of pieces of data, then reducing the payload size would help.
The use of the delimiters as shown is potentially ambiguous depending on the data sent. It is unlikely given the length and format of the delimiters, but it's something to keep in mind if there ever exists the possibility of having actual data that "looks" like a delimiter.
Assuming that the example shown is one of many similar protocols, I would be inclined to go a different route. One possibility would be to bundle up the request as a JSON object. There are existing packages available to create and read JSON. One example is Json.NET. JSON has a well-defined structure, it is easy for a human to read and verify, and it can be expanded easily. And depending on the data that you send, it would probably be a little more lightweight than the current format. And (maybe the part you are interested in), it would maybe seem more "professional".
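To give a feel for the JSON route, here is a minimal sketch. It uses the built-in System.Text.Json serializer rather than Json.NET, which is a swap on my part and assumes .NET Core 3.0 or later. CheckFileRequest and its property names are hypothetical, chosen to mirror the fields in the question.

```vbnet
Imports System.Text.Json

' Hypothetical request shape mirroring the fields sent by checkMD5 in the question.
Public Class CheckFileRequest
    Public Property Command As String
    Public Property UserId As Integer
    Public Property GameId As Integer
    Public Property FileName As String
    Public Property Md5Hash As String
End Class

Module JsonMessageSketch
    ' Client side: turn the request object into a JSON string to send over the socket.
    Public Function Encode(request As CheckFileRequest) As String
        Return JsonSerializer.Serialize(request)
    End Function

    ' Server side: turn the received JSON string back into a typed object.
    Public Function Decode(json As String) As CheckFileRequest
        Return JsonSerializer.Deserialize(Of CheckFileRequest)(json)
    End Function
End Module
```

On the server, handleMessage could then read named properties instead of indexing into an ArrayList, and a file name that happens to contain "-MSGDelimit" no longer breaks the parsing.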
A couple of additional things that I would do (personal opinion):
Possibly add a client version to the data being sent so the server will know if it "recognizes" the request. Start the client version at some value (e.g., 1). If there are updates to the protocol format (e.g., different data, different structure), change the version to 2 in that release of the software. Then the server can look at the version number to see if it recognizes it. If it is the first version of the server and sees version 2, then it can return an error indicating the server needs to be updated. This is not necessary if you can guarantee that the client and server releases are always matched (sometimes this is hard in practice).
Use an integer value for the request type instead of a string ('checkFileMD5'). If there are going to be a large number of request types, the server can dispatch the request a little more efficiently (maybe) based on an integer value.
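Those two suggestions could be sketched together like this. Everything here is hypothetical: the RequestType enum, the SupportedVersion constant, and the reply strings are placeholders rather than part of the original protocol.

```vbnet
' Hypothetical request codes; new request types get new numbers.
Public Enum RequestType
    CheckFileMd5 = 1
End Enum

Module DispatchSketch
    ' Highest protocol version this (hypothetical) server build understands.
    Public Const SupportedVersion As Integer = 1

    ' Picks a handler from the version and integer type code sent by the client.
    Public Function Dispatch(version As Integer, typeCode As Integer) As String
        If version > SupportedVersion Then
            Return "0-serverNeedsUpdate"
        End If
        Select Case CType(typeCode, RequestType)
            Case RequestType.CheckFileMd5
                ' Placeholder: the real server would call CompareFileMD5 here.
                Return "checkFileMd5"
            Case Else
                Return "0-unknownRequest"
        End Select
    End Function
End Module
```

For example, Dispatch(2, 1) on this version-1 server reports that the server needs updating instead of trying to parse a format it does not know.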