Finding distinct lines in large datatables - vb.net

Currently we have a large DataTable (~152K rows) and are doing a For Each over it to find a subset of distinct entries (~124K rows). This currently takes about 14 minutes to run, which is far too long.
As we are stuck on .NET 2.0 (our reporting won't work with VS 2008+), I can't use LINQ, though in fairness I don't know whether that would be any faster.
Is there a better way to find the distinct lines (invoice numbers in this case) other than this for each loop?
This is the code:
Public Shared Function SelectDistinctList(ByVal SourceTable As DataTable, _
ByVal FieldName As String) As List(Of String)
Dim list As New List(Of String)
For Each row As DataRow In SourceTable.Rows
Dim value As String = CStr(row(FieldName))
If Not list.Contains(value) Then
list.Add(value)
End If
Next
Return list
End Function

Using a Dictionary rather than a List will be quicker:
Dim seen As New Dictionary(Of String, String)
...
If Not seen.ContainsKey(value) Then
seen.Add(value, "")
End If
When you search a List, you're comparing each entry with value, so by the end of the process you're doing ~124K comparisons for each record. A Dictionary, on the other hand, uses hashing to make the lookups much quicker.
When you want to return the list of unique values, use seen.Keys.
(Note that you'd ideally use a Set type for this, but .NET 2.0 doesn't have one.)
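Putting the pieces together, the original function might be rewritten along these lines. This is only a sketch that stays within .NET 2.0: the module name DistinctHelper and the Boolean value type are arbitrary choices made here, not part of the original code. A List is kept alongside the Dictionary so that the result preserves first-seen order and keeps the original return type.

```vbnet
Imports System.Collections.Generic
Imports System.Data

Public Module DistinctHelper
    ' Returns the distinct values of one column, in first-seen order.
    Public Function SelectDistinctList(ByVal SourceTable As DataTable, _
                                       ByVal FieldName As String) As List(Of String)
        ' The Dictionary is used only for its hashed key lookup;
        ' the Boolean value is a placeholder and is never read.
        Dim seen As New Dictionary(Of String, Boolean)
        Dim list As New List(Of String)
        For Each row As DataRow In SourceTable.Rows
            ' As in the original, CStr throws if the cell is DBNull.
            Dim value As String = CStr(row(FieldName))
            If Not seen.ContainsKey(value) Then
                seen.Add(value, True)
                list.Add(value)
            End If
        Next
        Return list
    End Function
End Module
```

Each row now costs one hash lookup instead of a scan of everything found so far, so the total work grows roughly linearly with the number of rows.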

Related

How to randomly select strings vb.net

Is there a simple solution to select random strings in vb.net? I have a list of about twenty paragraphs where three need to go after each other and I want it to be random. Do I have to create a variable? Or is there a command that can be run on a button click?
One (fairly easy way) to accomplish this would be to have a collection of the paragraphs you want to use, and then use PeanutButter.RandomValueGen from the Nuget package PeanutButter.RandomGenerators (it's open-source too)
RandomValueGen.GetRandomFrom takes a collection of anything and returns a random item from the collection. As a bonus, you can specify a params list of values not to pick, so you can ensure that your paragraphs aren't repeated.
Whilst the library is written in C#, it can obviously be used from any .NET project. There are a lot of other generator methods on RandomValueGen too, if you're interested.
Full disclosure: I'm the author.
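If you have a normal list, this should work (if not, write what kind of list you have):
Dim rn As New Random
Dim selct As String = lst(rn.Next(0, lst.Count))
selct is the output.
Replace lst with your list name. Note that the upper bound of Random.Next is exclusive, so it must be lst.Count rather than lst.Count - 1, otherwise the last item can never be picked.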
If you don't want a dependency, or need to stay on 4.0 for whatever reason, you can always try this instead (it needs Imports System.Linq for Count and ToList):
Private rnd As New Random
Public Function GetRandom(input As IEnumerable(Of String), itemToGet As Integer) As List(Of String)
If input.Count < itemToGet Then
Throw New Exception("List is too small")
End If
Dim copy = input.ToList
Dim result As New List(Of String)
Dim item As Integer
While itemToGet > 0
item = rnd.Next(0, copy.Count)
result.Add(copy(item))
copy.RemoveAt(item)
itemToGet -= 1
End While
Return result
End Function

Vb Net check if arrayList contains a substring

I am using myArrayList.Contains(myString) and myArrayList.IndexOf(myString) to check whether the ArrayList contains a given string and to get its index, respectively.
But how could I check whether it contains a substring?
Dim myArrayList as New ArrayList()
myArrayList.add("sub1;sub2")
myArrayList.add("sub3;sub4")
so, something like, myArrayList.Contains("sub3") should return True
Well you could use the ArrayList to search for substrings with
Dim result = myArrayList.ToArray().Any(Function(x) x.ToString().Contains("sub3"))
Of course the advice to use a strongly typed List(Of String) is absolutely correct.
As far as your question goes, and leaving aside why you need an ArrayList at all (it exists only for backwards compatibility): to select the indexes of items that contain a specific string, a plain loop will give you the best performance here:
Dim indexes As New List(Of Integer)(100)
For i As Integer = 0 to myArrayList.Count - 1
If DirectCast(myArrayList(i), String).Contains("sub3") Then
indexes.Add(i)
End If
Next
Again, this is if you need to get the indexes. In your case, ArrayList.Contains tests the whole object (a string in your case), whereas you need to get each string and test part of it using String.Contains.
If you want a case-insensitive test, you can use String.IndexOf with a StringComparison argument such as StringComparison.OrdinalIgnoreCase.
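A case-insensitive variant of the loop could look roughly like this. It is a sketch only: the module and function names are made up for illustration, and TryCast is used so that non-string entries are skipped rather than causing an exception.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic

Public Module SubstringSearch
    ' Returns the indexes of all string items that contain fragment, ignoring case.
    Public Function IndexesContaining(ByVal items As ArrayList, _
                                      ByVal fragment As String) As List(Of Integer)
        Dim indexes As New List(Of Integer)
        For i As Integer = 0 To items.Count - 1
            ' TryCast yields Nothing for entries that are not strings.
            Dim item As String = TryCast(items(i), String)
            If item IsNot Nothing AndAlso _
               item.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0 Then
                indexes.Add(i)
            End If
        Next
        Return indexes
    End Function
End Module
```

With the list from the question, IndexesContaining(myArrayList, "SUB3") would return a list holding the single index 1.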

Dictionary returning same value for every key only when value is also an enumerable or custom

I am a noob, so please take this question with that in mind. Here is a very simple piece of code:
Sub Main()
    Dim m_Dictionary As New Dictionary(Of Integer, List(Of String))
    Dim workingList As New List(Of String)
    Dim workingKey As Integer
    Dim keyStash As New List(Of Integer)
    Dim workingDict As New Dictionary(Of Integer, List(Of String))
    For i = 0 To 9
        Do
            workingKey = RandomInteger()
        Loop While workingDict.ContainsKey(workingKey)
        For n = 0 To 4
            workingList.Add(RandomString())
        Next
        keyStash.Add(workingKey)
        workingDict.Add(workingKey, workingList)
    Next
    ' now I just want to play back the generators of random data
    For Each Key As Integer In keyStash
        For Each Entry As String In workingDict(Key)
            Line(Entry)
        Next
    Next
End Sub
Instead of everything playing back nicely as one might expect, I am left with a fully accurate stash of keys for the dictionary. However, the strings inside each list instance are ALL THE SAME FOR EVERY KEY. Those values are equal to the values from the last loop of random data generation. So instead of playing back 50 unique entries, it writes out the last loop 9 times. I looked inside and everything looks good. Get this: all lists, collections, hash tables, all iterated types and also custom types demonstrate this behavior. I found the solution, but it does not explain anything. Can anyone help explain this, please?
The variable that holds the strings generated by RandomString is created outside the loop. Inside the loop you keep adding new strings to that one instance, and you add the same list instance for every new integer key. At the end of the loop, the value of every integer key is a reference to the same list, so of course they are identical.
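The effect can be seen in isolation with a few lines. This is an illustrative sketch with made-up names (SharedListDemo, BuildWithSharedList), not the asker's code:

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SharedListDemo
    ' Builds a dictionary in which every key refers to one and the same list.
    Public Function BuildWithSharedList() As Dictionary(Of Integer, List(Of String))
        Dim result As New Dictionary(Of Integer, List(Of String))
        Dim sameList As New List(Of String)    ' created once, outside the loop
        For k As Integer = 1 To 3
            sameList.Add("item" & k.ToString())
            result.Add(k, sameList)            ' stores a reference, not a copy
        Next
        Return result
    End Function
End Module
```

After the call, Object.ReferenceEquals(d(1), d(3)) is True and d(1).Count is 3, where d is the returned dictionary: adding a list to a dictionary stores a reference to it, so later changes to that list are visible through every key.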
A first fix to your code could be
Dim m_Dictionary as new Dictionary(Of Integer, List(Of String))
Dim workingKey as Integer
For i=0 to 9
' Internal to the loop. so at each loop you get a new list
' to use for the specific key generated in the current loop
Dim workingList as new List(Of String)
Do
workingKey = RandomInteger()
Loop While m_Dictionary.ContainsKey(workingKey)
For n=0 to 4
workingList.Add(RandomString())
Next
m_Dictionary.Add(workingKey, workingList)
Next
For each k in m_Dictionary
For each Entry in k.Value
' Line(Entry)
Console.WriteLine("Key=" & k.Key & " Values = " & Entry)
Next
Next
Please remember to use Option Strict On. The current code silently treats strings as if they were numbers, which is not good practice. Option Strict On forces you to think twice when you work with different types of data.

vb.NET Select distinct... how to use it?

Coming from a C# background I am a bit miffed by my inability to get this simple linq query working:
Dim data As List(Of Dictionary(Of String, Object))
Dim dbm As AccessDBManager = GlobalObjectManager.DBManagers("SecondaryAccessDBManager")
data = dbm.Select("*", "T町丁目位置_各務原")
Dim towns As IEnumerable(Of String())
towns = data.Select(Function(d) New String() {d("町名_Trim").ToString(), d("ふりがな").ToString()})
towns = towns.Where(Function(s) s(0).StartsWith(searchTerms) Or s(1).StartsWith(searchTerms)).Distinct()
Call UpdateTownsListView(towns.ToList())
I pasted together the relevant bits, so hopefully there is no error here...
data is loaded from an access database and is a list with the data from each row stored as a dictionary.
In this case each element of data has a field containing the name of a Japanese town, its reading, and some other things such as the row ID.
I have a form with a textbox. When the user types something in, I would like to retrieve from data the town names matching the search terms, without duplicates.
Right now the results contain loads of duplicates. How can I get this sorted so that I only get distinct results?
I read from some other posts that a key might be needed, but how can I declare this with extension methods?
Distinct uses the default equality comparer to compare values.
Your collection contains arrays of strings, so Distinct won't work the way you expect: two different array instances never equal each other, even with identical contents, because the comparison ends up being a reference comparison.
A solution is to use the Distinct overload which takes an IEqualityComparer.
Class TwoStringArrayEqualityComparer
    Implements IEqualityComparer(Of String())
    ' Overloads avoids shadowing warnings against Object.Equals and Object.GetHashCode
    Public Overloads Function Equals(s1 As String(), s2 As String()) As Boolean Implements IEqualityComparer(Of String()).Equals
        ' Note that checking for Nothing is missing
        Return s1(0).Equals(s2(0)) AndAlso s1(1).Equals(s2(1))
    End Function
    Public Overloads Function GetHashCode(s As String()) As Integer Implements IEqualityComparer(Of String()).GetHashCode
        Return (s(0) + s(1)).GetHashCode() ' probably not perfect :-)
    End Function
End Class
...
towns = towns.Where(...).Distinct(New TwoStringArrayEqualityComparer())

Building a dynamic LINQ query

I have a listbox from which users can select from a list of towns. I want to be able to build a LINQ query based on the selected items in the list, e.g.
Dim ddlTowns As ListBox = CType(Filter_Accommodation1.FindControl("ddlTowns"), ListBox)
If Not ddlTowns Is Nothing Then
For Each Item In ddlTowns.Items
If Item.Selected Then
'// Build query
End If
Next
End If
I have researched LinqKit, as it appears to be able to do what I need, but after hours of trying I cannot make any headway. I cannot find anything in VB that translates into anything meaningful or usable.
Just had a Eureka moment, and rather than using a predicate I came up with this...
Private Function Filter_Accommomdation_QueryBuilder() As IEnumerable
Dim ddlTowns As ListBox = CType(Filter_Accommodation1.FindControl("ddlTowns"), ListBox)
Dim myList As New List(Of String)
If Not ddlTowns Is Nothing Then
For Each Item In ddlTowns.Items
If Item.Selected Then
myList.Add(Item.value)
End If
Next
End If
Dim Filter_Query = _
From c In InitialQuery _
Where myList.ToArray.Contains(c.MyData.element("townvillage").value) _
Select c
Return Filter_Query
End Function
As a note, I'm using c.MyData because the nature of InitialQuery demands a number of structured fields (the query is reused across various tables which, by poor design, aren't very consistent).
Check out this question - contains some useful VB examples for you: Using PredicateBuilder with VB.NET