Best method to compare strings? - vb.net

Scenario: I created software that calculates the hash of a file and compares it against a hash list I maintain (about 1 million entries and growing), currently stored in txt format.
What is the best way to make the comparison as fast as possible?
I'm using this function:
Public HashList As New List(Of String)
Private Sub LoadHash()
For Each hash As String In IO.File.ReadAllLines("C:\test\hash.txt")
HashList.Add(hash)
Next
End Sub
Private Function CheckFile(ByVal filename As String) As Boolean
If HashList.Contains(MD5(filename)) Then
Return True
End If
Return False
End Function
Any suggestions for improving this code?
Are there better methods?

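Try using a collection type built for lookups, like a HashSet. List(Of String).Contains scans the list element by element, whereas HashSet(Of String).Contains is a hash lookup whose cost stays roughly constant as the list grows. There are a lot of collection types in .NET, and each has its use.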
Public HashList As New HashSet(Of String)
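As a minimal sketch of how the loading and lookup code from the question might look with a HashSet: the module name, the LoadHashes and IsKnownHash names, and the case-insensitive comparer are assumptions for illustration, not part of the original code. File.ReadLines requires .NET 4 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module HashLookupSketch
    ' Assumption: hashes are hex strings whose casing may differ between
    ' the list file and the computed value, so compare case-insensitively.
    Private ReadOnly KnownHashes As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

    ' Reads the hash list once; File.ReadLines streams the file line by line
    ' instead of building an intermediate array first.
    Sub LoadHashes(path As String)
        For Each line As String In File.ReadLines(path)
            Dim trimmed = line.Trim()
            If trimmed.Length > 0 Then
                KnownHashes.Add(trimmed)
            End If
        Next
    End Sub

    ' Hash lookup; does not scan the whole collection.
    Function IsKnownHash(hash As String) As Boolean
        Return KnownHashes.Contains(hash)
    End Function
End Module
```

CheckFile from the question would then reduce to Return IsKnownHash(MD5(filename)).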

Related

How to limit the scope of a shared variable so that it can only be accessed in one function?

Private Shared _twolettercountryCodeDict As Generic.Dictionary(Of String, String)
Private Function twolettercountrycode() As String
If _twolettercountryCodeDict Is Nothing Then
_twolettercountryCodeDict = New Generic.Dictionary(Of String, String) From {{"ty", "turkey"}, {"py", "pakinmay"}, {"ra", "romania"}, {"vm", "vietnam"}, {"bl", "brazil"}, {"et", "egypt"}, {"ka", "korea"}}
Dim listOfCountries = fileToCol(COUNTRYCODESFileName)
For Each var In listOfCountries
Dim ar = var.Split({"*"}, System.StringSplitOptions.None).ToList()
_twolettercountryCodeDict.Add(LCase(ar(1)), UCase(ar(0)))
Next
End If
Return _twolettercountryCodeDict(Me.twoletter.ToLower)
End Function
Here, I am using Private Shared _twolettercountryCodeDict As Generic.Dictionary(Of String, String)
That's because I want to share _twolettercountryCodeDict across the whole program. I am basically implementing lazy loading: I do not want the code that reads the text file and populates the country codes to run again and again.
The thing is, if I declare it as Private Shared, other methods in the same class can access that variable too. That is not much of a problem, but say I want to avoid it.
If I declare the variable as Static inside the function, then _twolettercountryCodeDict won't be shared.
So I am in a dilemma. What's the solution?
Let's just say that twolettercountrycode requires a private member, so it can't be a shared function. But I want _twolettercountryCodeDict to be shared and accessible only from twolettercountrycode. Can I do so?
This doesn't do precisely what you asked for, but it solves the requirement of only allowing the resource loading to be done once. You could achieve the same thing by using a Shared Constructor on a class that's solely for loading your resource.
You may also want to use a ReadOnlyDictionary (implementation) so that your dictionary can't be modified by callers.
Friend Shared ReadOnly Property twolettercountrycode As Generic.Dictionary(Of String, String)
Get
' A Static local inside a Shared member is created once and shared by all callers.
Static _twolettercountryCodeDict As Generic.Dictionary(Of String, String) = Nothing
If _twolettercountryCodeDict Is Nothing Then
_twolettercountryCodeDict = New Generic.Dictionary(Of String, String) From {{"ty", "turkey"}, {"py", "pakinmay"}, {"ra", "romania"}, {"vm", "vietnam"}, {"bl", "brazil"}, {"et", "egypt"}, {"ka", "korea"}}
Dim listOfCountries = fileToCol(COUNTRYCODESFileName)
For Each var In listOfCountries
Dim ar = var.Split({"*"}, System.StringSplitOptions.None).ToList()
_twolettercountryCodeDict.Add(LCase(ar(1)), UCase(ar(0)))
Next
End If
Return _twolettercountryCodeDict
End Get
End Property
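The Shared Constructor alternative mentioned in this answer could be sketched roughly as follows. The class name CountryCodes and the Lookup function are hypothetical, only two of the seed entries are shown, and the file-loading step is left as a comment because fileToCol and COUNTRYCODESFileName are defined elsewhere in the asker's code. ReadOnlyDictionary here is the one in System.Collections.ObjectModel (.NET 4.5 or later).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Collections.ObjectModel

' Hypothetical holder class whose only job is to load the codes once.
Friend NotInheritable Class CountryCodes
    Private Shared ReadOnly _codes As ReadOnlyDictionary(Of String, String)

    ' The runtime runs a Shared constructor at most once, before first use.
    Shared Sub New()
        Dim codes As New Dictionary(Of String, String) From {
            {"ty", "turkey"}, {"ra", "romania"}}
        ' The entries read from the country code file would be added here.
        _codes = New ReadOnlyDictionary(Of String, String)(codes)
    End Sub

    ' Prevent instantiation; the class only exposes Shared members.
    Private Sub New()
    End Sub

    ' The dictionary itself stays private, so no other code can touch it.
    Public Shared Function Lookup(twoLetter As String) As String
        Return _codes(twoLetter.ToLowerInvariant())
    End Function
End Class
```

With this layout the dictionary is reachable only through Lookup, which comes close to the "accessible from one function only" requirement.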

VB.Net: optimize execution by keeping data in memory instead of fetching it from the database every time

I am a newbie in VB.NET and I am trying to optimize the execution of the program I am working on, which has more than 45 forms. Every form calls the function IsPowerUser to check whether the user is a power user.
If we could store all the details about the user at login, we could reuse those values whenever we need them instead of collecting the data from the database each time. Maybe this question belongs to VB basics.
You can use lazy initialization to postpone the database call until you actually need the information (if ever), and then store the value for re-use.
Private _isPowerUser As New Lazy(Of Boolean)(AddressOf GetIsPowerUser)
Public Function IsPowerUser() As Boolean
Return _isPowerUser.Value
End Function
Private Function GetIsPowerUser() As Boolean
' Retrieve the information from the database and return
Return True Or False
End Function
In this sample, _isPowerUser is a backing field that uses lazy initialization. It is first initialized when someone calls IsPowerUser. Under the covers, Lazy(Of T) then calls the delegate GetIsPowerUser() that does all the heavy lifting to retrieve the value from the database.
Lazy(Of T) only needs to call GetIsPowerUser() once.
In other words: any subsequent calls to IsPowerUser() do not trigger GetIsPowerUser().
EDIT (1/2)
Lazy(Of T) was not introduced until .NET 4, so here is a code sample that works for .NET 2 and above:
Private _isPowerUser As Boolean?
Private lazyLock As New Object
Public Function IsPowerUser() As Boolean
If Not _isPowerUser.HasValue Then
SyncLock lazyLock
If Not _isPowerUser.HasValue Then
_isPowerUser = GetIsPowerUser()
End If
End SyncLock
End If
Return _isPowerUser.Value
End Function
Private Function GetIsPowerUser() As Boolean
' Retrieve the information from the database and return
Return True Or False
End Function
EDIT (2/2)
I changed my code sample to work in a multi-user environment:
Private _powerUsers As New Dictionary(Of Long, Boolean)
Private ReadOnly LazyLock As New Object
Public Function IsPowerUser(userId As Long) As Boolean
If Not _powerUsers.ContainsKey(userId) Then
SyncLock LazyLock
If Not _powerUsers.ContainsKey(userId) Then
_powerUsers.Add(userId, GetIsPowerUser(userId))
End If
End SyncLock
End If
Return _powerUsers.Item(userId)
End Function
Private Function GetIsPowerUser(UId As Long) As Boolean
' Retrieve the information from the database and return
Return True Or False
End Function
Usage example:
Dim Steven As Long = 454151
If IsPowerUser(Steven) Then
Console.WriteLine("Steven is a power user.")
Else
Console.WriteLine("Steven is a normal user.")
End If
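One caveat with the multi-user sample: it reads the Dictionary outside the SyncLock while another thread may be adding to it, and Dictionary(Of TKey, TValue) is not documented as safe for that. A hedged alternative, assuming .NET 4 or later, is ConcurrentDictionary with GetOrAdd. The class name PowerUserCache is hypothetical and the database call is still a placeholder.

```vbnet
Imports System
Imports System.Collections.Concurrent

Public Class PowerUserCache
    ' Thread-safe cache of userId -> power user flag.
    Private ReadOnly _powerUsers As New ConcurrentDictionary(Of Long, Boolean)

    Public Function IsPowerUser(userId As Long) As Boolean
        ' Returns the cached value, or computes and stores it on first request.
        Return _powerUsers.GetOrAdd(userId, Function(id) GetIsPowerUser(id))
    End Function

    Private Function GetIsPowerUser(userId As Long) As Boolean
        ' Placeholder: retrieve the information from the database here.
        Return False
    End Function
End Class
```

Under contention GetOrAdd may run the factory more than once for the same key, but only one result is stored, so this fits best when the database call is safe to repeat.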

How to fill object variables defined in the dictionary based on JSON?

OK, that question sounds maybe a little confusing so I'll try to explain it with an example.
Pretend you have an object like this:
Class Something
Private varX As Integer
Private varY As String
'[..with the associated property definitions..]
Public Sub New()
End Sub
End Class
And another with:
Class JsonObject
Inherits Dictionary(Of String, String)
Public Function MakeObject() As Object 'or maybe even somethingObject
Dim somethingObject As New Something()
For Each kvp As KeyValuePair(Of String, String) In Me
'Here should happen something to use the Key as varX or varY and the Value as value for the varX or varY
somethingObject.CallByName(Me, kvp.Key, vbGet) = kpv.Value
Next
return somethingObject
End Function
End Class
I got the 'CallByName()' function from a previous question of mine.
CallByName works differently from the way you are trying to use it. Look at the documentation; it will tell you that in this particular case the correct usage, which sets the property named by kvp.Key on somethingObject, would be
CallByName(somethingObject, kvp.Key, vbSet, kvp.Value)
However, the function CallByName is part of a VB library that isn’t supported on all devices (notably it isn’t included in the .NET Mobile framework) and consequently it’s better not to use it.
Using proper reflection is slightly more complicated but guaranteed to work on all platforms.
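' Requires Imports System.Reflection
Dim t = GetType(Something)
Dim field = t.GetField(kvp.Key, BindingFlags.NonPublic Or BindingFlags.Instance)
' Set the field on the new object, converting the string to the field's type (e.g. Integer)
field.SetValue(somethingObject, Convert.ChangeType(kvp.Value, field.FieldType))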
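Put together, the reflection approach might look like the self-contained sketch below. It assumes the dictionary keys match the private field names exactly (GetField is case-sensitive by default), skips keys without a matching field, and adds a ToString override purely so the demo can print the result. The module name and sample data are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

Class Something
    Private varX As Integer
    Private varY As String

    ' Added only so the demo can show the field values.
    Public Overrides Function ToString() As String
        Return String.Format("varX={0}, varY={1}", varX, varY)
    End Function
End Class

Module ReflectionFillSketch
    Function MakeObject(values As Dictionary(Of String, String)) As Something
        Dim result As New Something()
        Dim t = GetType(Something)
        For Each kvp In values
            ' Look up a private instance field whose name equals the key.
            Dim field = t.GetField(kvp.Key, BindingFlags.NonPublic Or BindingFlags.Instance)
            If field Is Nothing Then Continue For ' unknown key: ignore it
            ' Convert the string to the field's declared type before assigning.
            field.SetValue(result, Convert.ChangeType(kvp.Value, field.FieldType))
        Next
        Return result
    End Function

    Sub Main()
        Dim data As New Dictionary(Of String, String) From {{"varX", "42"}, {"varY", "hello"}}
        Console.WriteLine(MakeObject(data))
    End Sub
End Module
```

Convert.ChangeType throws if a value cannot be converted (for example a non-numeric string for varX), so real input may need validation first.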

DataReader ordinal-based lookups vs named lookups

Microsoft (and many developers) claim that the SqlDataReader.GetOrdinal method improves the performance of retrieving values from a DataReader versus using named lookups, i.e. reader["ColumnName"]. The question is: what is the true performance difference when dealing with small, paged record sets? Is it worth the extra overhead of finding and referencing ordinal indexes throughout the code?
Microsoft recommends not calling GetOrdinal within a loop.
That would include indirect calls with the string indexer.
You have a few options: call GetOrdinal once before your loop and put the ordinals in an array whose indexes are constants; define an enum for the ordinals (no GetOrdinal at all); or call GetOrdinal into individual variables with descriptive names.
Only if your sets are small would I really consider this to be premature optimization.
It's apparently a 3% penalty.
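The "individual variables with descriptive names" variant could be sketched like this. The column names Id and Name are hypothetical, and the method takes IDataReader (which SqlDataReader implements) only to keep the example independent of a particular provider.

```vbnet
Imports System
Imports System.Data

Module OrdinalSketch
    Sub PrintUsers(reader As IDataReader)
        ' Resolve each column name once, before the loop.
        Dim idOrdinal As Integer = reader.GetOrdinal("Id")
        Dim nameOrdinal As Integer = reader.GetOrdinal("Name")

        While reader.Read()
            ' Inside the loop only ordinal-based getters are used.
            Console.WriteLine("{0}: {1}", reader.GetInt32(idOrdinal), reader.GetString(nameOrdinal))
        End While
    End Sub
End Module
```

Because the ordinals are looked up by name at run time, this still survives a change in column order, unlike hard-coded indexes.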
Any difference will be more than outweighed by maintenance overhead.
If you have so much data that it makes a noticeable difference, I'd suggest you have too much data in your client code. Failing that, this is the point at which you consider using ordinals rather than names.
Yes and no.
If you're dealing with a massive amount of data then you'd certainly benefit from using the ordinals rather than the column names.
Otherwise, keep it simple, readable, and somewhat safer - and stick with the column names.
Optimize only when you need to.
I created a wrapper for SqlDataReader that stores ordinals in a dictionary with the column name as the key.
It gives me ordinal performance gains while keeping the code more readable and less likely to break if someone changes the column order returned from stored procedures.
Friend Class DataReader
Implements IDisposable
Private _reader As SqlDataReader
Private _oridinals As Dictionary(Of String, Integer)
Private Shared _stringComparer As StringComparer = StringComparer.OrdinalIgnoreCase ' Case-insensitive column names
Public Sub New(reader As SqlDataReader)
Me._reader = reader
Me.SetOrdinals()
End Sub
Private Sub SetOrdinals()
Me._oridinals = New Dictionary(Of String, Integer)(_stringComparer)
For i As Integer = 0 To Me._reader.FieldCount - 1
Me._oridinals.Add(Me._reader.GetName(i), i)
Next
End Sub
Public Function Read() As Boolean
Return Me._reader.Read()
End Function
Public Function NextResult() As Boolean
Dim value = Me._reader.NextResult()
If value Then
Me.SetOrdinals()
End If
Return value
End Function
Default Public ReadOnly Property Item(name As String) As Object
Get
Return Me._reader(Me.GetOrdinal(name))
End Get
End Property
Public Function GetOrdinal(name As String) As Integer
Return Me._oridinals.Item(name)
End Function
Public Function GetInteger(name As String) As Integer
Return Me._reader.GetInt32(Me.GetOrdinal(name))
End Function
Public Function GetString(ordinal As Integer) As String
Return Me._reader.GetString(ordinal)
End Function
Public Function GetString(name As String) As String
Return Me._reader.GetString(Me.GetOrdinal(name))
End Function
Public Function GetDate(name As String) As Date
Return Me._reader.GetDateTime(Me.GetOrdinal(name))
End Function
Public Function GetDateNullable(name As String) As Nullable(Of Date)
Dim o = Me._reader.GetValue(Me.GetOrdinal(name))
If o Is System.DBNull.Value Then
Return Nothing
Else
Return CDate(o)
End If
End Function
Public Function GetDecimal(name As String) As Decimal
Return Me._reader.GetDecimal(Me.GetOrdinal(name))
End Function
Public Function GetBoolean(name As String) As Boolean
Return Me._reader.GetBoolean(Me.GetOrdinal(name))
End Function
Public Function GetByteArray(name As String) As Byte()
Return CType(Me._reader.GetValue(Me.GetOrdinal(name)), Byte())
End Function
Public Function GetBooleanFromYesNo(name As String) As Boolean
Return Me._reader.GetString(Me.GetOrdinal(name)) = "Y"
End Function
'Disposable Code
End Class

VB.NET - Adding more than 1 string to .contains

I have an HTMLElementCollection that I'm going through using a For Each Loop to see if the InnerHTML contains certain words. If they do contain any of those keywords it gets saved into a file.
Everything works fine but I was wondering if there is a way to simplify. Here's a sample
For Each Helement As HtmlElement In elements
If Helement.InnerHtml.Contains("keyword1") Or Helement.InnerHtml.Contains("keyword2") Or Helement.InnerHtml.Contains("keyword3") Or Helement.InnerHtml.Contains("keyword4") Or Helement.InnerHtml.Contains("keyword5") = True Then
' THE CODE TO COPY TO FILE
End If
Next Helement
Does anything exist that would work like:
If Helement.InnerHtml.Contains("keyword1", "keyword2", "keyword3", "keyword4", "keyword5")
The way I'm doing it now just seems wasteful, and I'm pretty OCD about it.
1) One approach would be to match the InnerHtml string against a regular expression containing the keywords as a list of alternatives:
Imports System.Text.RegularExpressions
Dim keywords As New Regex("keyword1|keyword2|keyword3")
...
If keywords.IsMatch(HElement.InnerHtml) Then ...
This should work well if you know all your keywords beforehand.
2) An alternative approach would be to build a list of your keywords and then compare the InnerHtml string against each of the list's elements:
Dim keywords = {"keyword1", "keyword2", "keyword3"}
...
For Each keyword As String In keywords
If HElement.InnerHtml.Contains(keyword) Then ...
Next
Edit: The extension method suggested by Rob would result in more elegant code than the above approach #2, IMO.
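For approach #1, keywords that contain regex metacharacters (such as "c++") need escaping before they are joined into a pattern. A small sketch, assuming the keywords arrive as a list; BuildKeywordRegex is a hypothetical helper, and RegexOptions.IgnoreCase is an added assumption that the original snippet does not make.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module KeywordRegexSketch
    ' Builds one alternation pattern from the keywords, escaping each one
    ' so that characters like "+" or "." are matched literally.
    Function BuildKeywordRegex(keywords As IEnumerable(Of String)) As Regex
        Dim pattern = String.Join("|", keywords.Select(Function(k) Regex.Escape(k)))
        Return New Regex(pattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim matcher = BuildKeywordRegex({"keyword1", "c++", "keyword3"})
        Console.WriteLine(matcher.IsMatch("<p>About C++ templates</p>"))
        Console.WriteLine(matcher.IsMatch("<p>nothing relevant</p>"))
    End Sub
End Module
```

An empty keyword list would produce an empty pattern, which matches every input, so callers may want to guard against that case.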
You could write an Extension Method to string that provides a multi-input option, such as:
Public Module StringExtensionMethods
' A Module cannot be instantiated, so no private constructor is needed.
<System.Runtime.CompilerServices.Extension> _
Public Function Contains(ByVal str As String, ByVal ParamArray values As String()) As Boolean
For Each value In values
If str.Contains(value) Then
Return True
End If
Next
Return False
End Function
End Module
You could then call that instead, as in your second example :)
Here's another extension method that cleans up the logic a little with LINQ:
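' Must be declared inside a Module; requires Imports System.Linq and System.Runtime.CompilerServices
<Extension()>
Public Function MultiContains(str As String, ParamArray values() As String) As Boolean
Return values.Any(Function(val) str.Contains(val))
End Function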
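For completeness, here is a self-contained sketch of how such an extension method could be declared and called. ContainsAny and the Demo module are hypothetical names, and the StringComparison parameter is an added variation (it allows case-insensitive matching) rather than something the answers above include.

```vbnet
Imports System
Imports System.Linq
Imports System.Runtime.CompilerServices

Public Module StringExtensions
    ' True if source contains at least one of the given values,
    ' compared using the supplied StringComparison.
    <Extension()>
    Public Function ContainsAny(source As String, comparison As StringComparison,
                                ParamArray values() As String) As Boolean
        Return values.Any(Function(v) source.IndexOf(v, comparison) >= 0)
    End Function
End Module

Module Demo
    Sub Main()
        Dim html = "<div>Some KEYWORD2 here</div>"
        ' Called like an instance method on the string.
        If html.ContainsAny(StringComparison.OrdinalIgnoreCase, "keyword1", "keyword2") Then
            Console.WriteLine("match found")
        End If
    End Sub
End Module
```

In the original loop the condition would then read Helement.InnerHtml.ContainsAny(StringComparison.OrdinalIgnoreCase, "keyword1", "keyword2", ...).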