Array intersection function speed - vb.net

I've written a short function for array intersection and wanted to know why one function is faster than the other.
1)
Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length
Function newintersect(ByRef list1() As String) As String()
Dim intersection As New ArrayList
If (list1.Length < list2length) Then
'use list2'
For Each thing As String In list2
If (Array.IndexOf(list1, thing) <> -1) Then
intersection.Add(thing)
End If
Next
Else
'use list1'
For Each thing As String In list1
If (Array.IndexOf(list2, thing) <> -1) Then
intersection.Add(thing)
End If
Next
End If
Return intersection
End Function
2)
Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length
Function newintersect(ByRef list1() As String) As String()
Dim intersection As New ArrayList
If (list1.Length > list2length) Then 'changed >'
'use list2'
For Each thing As String In list2
If (Array.IndexOf(list1, thing) <> -1) Then
intersection.Add(thing)
End If
Next
Else
'use list1'
For Each thing As String In list1
If (Array.IndexOf(list2, thing) <> -1) Then
intersection.Add(thing)
End If
Next
End If
Return intersection
End Function
3)
Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length
Function newintersect(ByRef list1() As String) As String()
For Each thing As String In list1
If (Array.IndexOf(list2, thing) <> -1) Then
intersection.Add(thing)
End If
Next
Return intersection
End Function
So for my testcase, 1 take 65 seconds, 3 takes 63 seconds, while 2 actually takes 75 seconds. Anyone know why 3 is the fastest? And why is 1 faster than 2?
(Sorry about the poor formatting...can't seem to paste it right)

That's not much of a difference. Also, it doesn't seem like the methods would produce the same result, so it would be pointless to compare the performance, right?
Anyhow, the Array.IndexOf is not very efficient way to find items, and doesn't scale well. You should get a dramatic improvement if you use a hash key based collection as lookup instead, something like this:
Function newintersect(ByRef list1 As String(), ByRef list2 As String()) As String()
Dim smaller As HashSet(Of String)
Dim larger As String()
If list1.Length < list2.Length Then
smaller = New HashSet(Of String)(list1)
larger = list2
Else
smaller = New HashSet(Of String)(list2)
larger = list1
End If
Dim intersection As New List(Of String)
For Each item As String In larger
If smaller.Contains(item) Then
intersection.Add(item)
End If
Next
Return intersection.ToArray()
End Function

I'd expect you'd find that with different test cases you can reverse the results you had above and reach a situation where 2 is fastest and 1 & 3 are slower.
It's difficult to comment without knowing the make-up of the test case, it'll depend on the location of 'intersecting' items within the two arrays - if these tend to be close the to front on one array and closer to the end of another then the nesting order of the array iteration / IndexOf will have markedly different performance.
BTW - there are better ways of performing an intersection - sorting one or other array and performing a BinarySearch is one means - using a Dictionary(Of String, ...) or similar is another - and either would result in far better performance.

This is from the MSDN documentation
Dim id1() As Integer = {44, 26, 92, 30, 71, 38}
Dim id2() As Integer = {39, 59, 83, 47, 26, 4, 30}
' Find the set intersection of the two arrays.
Dim intersection As IEnumerable(Of Integer) = id1.Intersect(id2)
For Each id As Integer In intersection
Debug.WriteLine(id.ToString)
Next

Related

2-D array from txt in VB.NET

I am needing to create a code that is versatile enough where I can add more columns in the future with minimum reconstruction of my code. My current code does not allow me to travel through my file with my 2-D array. If I was to change MsgBox("map = "+ map(0,1) I can retrieve the value easily. Currently all I get in the code listed is 'System.IndexOutOfRangeException' and that Index was outside the bounds of the array. My current text file is 15 rows (down) and 2 columns (across) which puts it at a 14x1. they are also comma separated values.
Dim map(14,1) as string
Dim reader As IO.StreamReader
reader = IO.File.OpenText("C:\LocationOfTextFile")
Dim Linie As String, x,y As Integer
For x = 0 To 14
Linie = reader.ReadLine.Trim
For y = 0 To 1
map(x,y) = Split(Linie, ",")(y)
Next 'y
Next 'x
reader.Close()
MsgBox("map = " + map(y,x))``
Here's a generic way to look at reading the file:
Dim data As New List(Of List(Of String))
For Each line As String In IO.File.ReadAllLines("C:\LocationOfTextFile")
data.Add(New List(Of String)(line.Split(",")))
Next
Dim row As Integer = 1
Dim col As Integer = 10
Dim value As String = data(row)(col)
This is the method suggested by Microsoft. It is generic and will work on any properly formatted comma delimited file. It will also catch and display any errors found in the file.
Using MyReader As New Microsoft.VisualBasic.
FileIO.TextFieldParser(
"C:\LocationOfTextFile")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.
FileIO.MalformedLineException
MsgBox("Line " & ex.Message &
"is not valid and will be skipped.")
End Try
End While
End Using
Essentially what you are asking is how can I take the contents of comma-separated values and convert this to a 2D array.
The easiest way, which is not necessarily the best way, is to return an IEnuemrable(Of IEnumerable(Of String)). The number of items will grow both vertically based on the number of lines and the number of items will grow horizontally based on the values split on a respective line by a comma.
Something along these lines:
Private Function GetMap(path As String) As IEnumerable(Of IEnumerable(Of String)
Dim map = New List(Of IEnumerable(Of String))()
Dim lines = IO.File.ReadAllLines(path)
For Each line In lines
Dim row = New List(Of String)()
Dim values = line.Split(","c)
row.AddRange(values)
map.Add(row)
Next
Return map
End Function
Now when you want to grab a specific cell using the (row, column) syntax, you could use:
Private _map As IEnumerable(Of IEnumerable(Of String))
Private Sub LoadMap()
_map = GetMap("C:/path-to-map")
End Sub
Private Function GetCell(row As Integer, column As Integer) As String
If (_map Is Nothing) Then
LoadMap()
End If
Return _map.ElementAt(row).ElementAt(column)
End Function
Here is an example: https://dotnetfiddle.net/ZmY5Ki
Keep in mind that there are some issues with this, for example:
What if you have commas in your cells?
What if you try to access a cell that doesn't exist?
These are considerations you need to make when implementing this in more detail.
You can consider the DataTable class for this. It uses much more memory than an array, but gives you a lot of versatility in adding columns, filtering, etc. You can also access columns by name rather than index.
You can bind to a DataGridView for visualizing the data.
It is something like an in-memory database.
This is much like #Idle_Mind's suggestion, but saves an array copy operation and at least one allocation per row by using an array, rather than a list, for the individual rows:
Dim data = File.ReadLines("C:\LocationOfTextFile").
Select(Function(ln) ln.Split(","c)).
ToList()
' Show last row and column:
Dim lastRow As Integer = data.Count - 1
Dim lastCol As Integer = data(row).Length - 1
MsgBox($"map = {data(lastRow)(lastCol)}")
Here, assuming Option Infer, the data variable will be a List(Of String())
As a step up from this, you could also define a class with fields corresponding to the expected CSV columns, and map the array elements to the class properties as another call to .Select() before calling .ToList().
But what I really recommend is getting a dedicated CSV parser from NuGet. While a given CSV source is usually consistent, more broadly the format is known for having a number of edge cases that can easily confound the Split() function. Therefore you tend to get better performance and consistency from a dedicated parser, and NuGet has several good options.

New to programming language, need help to read a .dat file into array Vs VB, console .net framework

I am trying to read from a data file called TXT.dat and store the circled values into a separate array's using the type of commands I have used. I cannot use streamreader,streamwriter this is what I am learning at university but we were taught to read an array not certain values as I will be tested using a file which has
2 factorial values from a single file and store int two arrays.
I have spent hours trying before going to paid sites who always answer with i/o streamreader which doesnt help
.dat file i created to practice
MY CODE
Sub Main()
Const i_val As Integer = 6
Dim j As Integer = 6 'loop readers
Dim Arayn_Fact(i_val - 1) As Double 'array for 2nd value per line
Dim Aray_Fact2n(j - 1) As Double 'array for 4th value per line
Read_Values(i_val - 1, Arayn_Fact)
End Sub
Sub Read_Values(ByVal i As Integer, ByRef _A() As Double)
Dim fid1 As Integer = FreeFile()
Dim fid2 As Integer = FreeFile() + 1
Dim tmp As Double
FileOpen(fid1, "TXT.dat", OpenMode.Input, OpenAccess.Read)
For i = 0 To 5 Step 1
Input(fid1, tmp)
Input(fid1, _A(i))
Next i
FileClose(fid1)
Console.ReadKey()
End Sub
Without getting too fancy, here's some very basic code to do what you've outlined:
Sub Main()
Dim fileName As String = "txt.dat"
Dim Aray_Fact2() As Integer 'array for 2nd value per line
Dim Aray_Fact4() As Integer 'array for 4th value per line
Dim list2 As New List(Of Integer)
Dim list4 As New List(Of Integer)
Dim lines() As String = System.IO.File.ReadAllLines(fileName)
For Each line As String In lines
Dim values() As String = line.Split(",")
If values.Length >= 3 Then
Dim f2, f4 As Integer
If Integer.TryParse(values(1), f2) AndAlso Integer.TryParse(values(3), f4) Then
list2.Add(f2)
list4.Add(f4)
Else
Console.WriteLine("Invalid value in line: " & line)
End If
Else
Console.WriteLine("Invalid number of entries in line: " & line)
End If
Next
Aray_Fact2 = list2.ToArray
Aray_Fact4 = list4.ToArray
Console.WriteLine("Aray_Fact2: " & String.Join(",", Aray_Fact2))
Console.WriteLine("Aray_Fact4: " & String.Join(",", Aray_Fact4))
Console.Write("Press Enter to Quit...")
Console.ReadLine()
End Sub
If you are completely against the use of Lists and ToArray, then you'd have to read the file TWICE. Once to determine how many lines are in the file so you can dimension the arrays to the correct size. Then another time to actually read the values and populate the arrays.
First, some style things. Passing the array ByRef is incorrect. Arrays are already reference types, and so passing ByVal still passes the reference to the array object. But don't even pass the array in the first place. It's better practice to let the Read_Values() method create and return the array to the calling function.
And no one uses that ancient/ganky FileOpen() API anymore. I mean that. NO. ONE. It's not even appropriate for teaching, as the Stream types are closer to what the low level operating system/file system are doing. If you can't use StreamReader and friends, there are still better options out there.
Sub Main()
Dim Array_Fact() As Double = Read_Values("TXT.dat", 1)
Dim Array_Fact2n() As Double = Read_Values("TXT.dat", 3)
Console.ReadKey(True)
End Sub
Function Read_Values(filePath As String, position As Integer) As Double()
Dim lines() As String = File.ReadAllLines(filePath) 'Look Ma, no StreamReader
Dim result(lines.Length - 1) As Double
For i As Integer = 0 To Lines.Length - 1
result(i) = Double.Parse(lines(i).Split(",")(position).Trim())
Next line
Return result
End Function
But what I'd really do is keep the 2nd and 4th values in the same collection and read everything in one pass through the file. This uses features like Generics, Tuples, Lambdas+Linq, and more that you won't get to for some time, but I feel like it's useful to show you where you're going:
Sub Main()
Dim Facts = Read_Values("TXT.dat")
Console.WriteLine(String.Join(",",Facts.Select(Function(f) $"({f.Item1}:{f.Item2})")))
Console.ReadKey(True)
End Sub
Function Read_Values(filePath As String) As IEnumerable(Of (Double, Double))
Return File.ReadLines(filePath).Select(
Function(line) 'Poor man's CSV. In real code, I'd pull in a CSV parser from NuGet
Dim fields = line.Split(",")
Return (Double.Parse(fields(1).Trim()), Double.Parse(fields(3).Trim()))
End Function
)
End Function

Arraylist.Contains Doesn't Return True VB.NET

comps.Contains doesn't return TRUE even though comps contains it.
I debugged it step by step and I can't see where the problem is.
By the way the purpose of the code is to show the pairs that sum up to SUM value. (If the sum is 5 and theres 1 and 4 then the code should return 1 and 4)
Public Function getPairs(ByVal array As ArrayList, ByVal sum As Integer)
Dim comps, pairs As New ArrayList
For index = 0 To array.Count - 1
If (comps.Contains(array(index)) = True) Then
pairs.Add(array(index))
Else
comps.Add(sum - array(index))
End If
Next
Return pairs
End Function
Sub Main()
' 1,3,2,5,46,6,7,4
' k = 5
'Dim arraylist As New ArrayList()
Console.Write("Enter your array :")
Dim arraylist As New ArrayList
arraylist.AddRange(Console.ReadLine().Split(","))
Console.Write("Enter the sum:")
Dim sum As Integer = Console.ReadLine()
getPairs(arraylist, sum)
Console.ReadKey()
End Sub
The ArrayList you populate from user input contains strings (results from splitting the user input string). The comps ArrayList contains integers (results from subtraction). When it tries to find the string "2" in the ArrayList that contains a 2, it fails.
You should convert your user input to integers so that you are comparing the same data types.
First, turn on Option Strict. Tools Menu -> Options -> Projects and Solutions -> VB Defaults. This will point out problems with your code and help you to avoid runtime errors.
ArrayList is not used much in new code but is around for backward compatibility. List(Of T) is a better choice for new code.
Module Module1
Sub Main()
' 1,3,2,5,46,6,7,4
' k = 5
'Dim arraylist As New ArrayList()
Console.Write("Enter your array :")
Dim arraylist As New ArrayList
'Option Strict would not allow this line to compile
'.Split takes a Char, the same c tells the compiler that "," is a Char
arraylist.AddRange(Console.ReadLine().Split(","c))
Console.Write("Enter the sum:")
'Option Strict would not allow a string to be dumped into an integer
Dim sum As Integer
Dim Pairs As New ArrayList
If Integer.TryParse(Console.ReadLine, sum) Then
'Your Function returns an array list but you
'throw it away by not setting up a variable to receive it
Pairs = getPairs(arraylist, sum)
Else
Console.WriteLine("Program aborted. Sum was not a number.")
End If
For Each item In Pairs
Console.WriteLine(item)
Next
Console.ReadKey()
End Sub
'Functions need a return data type in the declaration
Public Function getPairs(ByVal array As ArrayList, ByVal sum As Integer) As ArrayList
Dim comps, pairs As New ArrayList
For index = 0 To array.Count - 1
'I don't see how this can ever be true since comps is empty
If comps.Contains(array(index)) Then 'Since .Contains returns a Boolean, no = True is necessary
pairs.Add(array(index))
Else
'Ideally each item in array should be tested to see if it is a number
'You will get an exception if CInt fails
comps.Add(sum - CInt(array(index)))
'You never use the comps ArrayList
End If
Next
'The pairs ArrayList is empty
Return pairs
End Function
End Module
I don't see how this code could accomplish what you describe as your goal. I think you should start again. Talk through how you would accomplish your task. Then write it out on paper, not in code. Then you will see more clearly how to code your project.
The big problem is the original code is this line:
Dim comps, pairs As New ArrayList
That code creates two ArrayList reference variables, but only one ArrayList object. comps remains null/Nothing.
But beyond that, the ArrayList type has been dead since .Net 2.0 came out back in 2005... more than 10 years now. It only exists today for backwards compatibility with old code. Don't use it!
This is better practice, especially in conjunction with Option Strict and Option Infer:
Public Function getPairs(ByVal items As IEnumerable(Of Integer), ByVal sum As Integer) As IEnumerable(Of Integer)
Dim comps As New HashSet(Of Integer)()
Dim result As New List(Of Integer)()
For Each item As Integer In items
If Not comps.Add(item) Then
result.Add(sum - item)
End If
Next
Return result
End Function
Sub Main()
Console.Write("Enter your array: ")
Dim input As String = Console.ReadLine()
Dim list As List(Of Integer) = input.Split(",").Select(Function(item) CInt(item)).ToList()
Console.Write("Enter the sum: ")
Dim sum As Integer = CInt(Console.ReadLine())
Dim pairs = getPairs(list, sum).Select(Function(s) s.ToString())
Console.WriteLine("Pairs are: {0}", String.Join(", " pairs))
Console.ReadKey()
End Sub

Vb.Net 2D Dictionary - very slow

I need to create a 2D dictionary/keyvalue pair.
I tried something like this.
Dim TwoDimData As New Dictionary(Of String, Dictionary(Of String, String))
'Create an empty table
For Each aid In AIDList '(contains 15000 elements)
TwoDimData.Add(aid, New Dictionary(Of String, String))
For Each bid In BIDList 'contains 30 elements
TwoDimData.Item(aid).Add(bid, "")
Next
Next
'Later populate values.
[some code here to populate the table]
'Now access the value
'The idea is to access the info as given below (access by row name & col name)
Msgbox TwoDimData.Item("A004").Item("B005") ' should give the value of 2
Msgbox TwoDimData.Item("A008").Item("B002") ' should return empty string. No error
Issue:
The issue is in Creating the empty table. It takes 70 seconds to create the TwoDimData table with empty values. Everything else seems to be fine. Is there any way to improve the performance - may be instead of using Dictionary?
I suggest you try Dictionary(Of Tuple(Of String, String), String) instead. That is, the keys are pairs of strings (Tuple(Of String, String)) and the values are strings. That would appear to correspond nicely to the diagram in your question.
Dim matrix As New Dictionary(Of Tuple(Of String, String), String)
' Add a value to the matrix:
matrix.Add(Tuple.Create("A003", "B004"), "3")
' Retrieve a value from the matrix:
Dim valueAtA003B004 = matrix(Tuple.Create("A003", "B004"))
Of course you can define your own key type (representing a combination of two strings) if Tuple(Of String, String) seems too generic for your taste.
Alternatively, you could also just use (possibly jagged) 2D arrays, but that would potentially waste a lot of space if your data is sparse (i.e. if there are many empty cells in that 2D matrix); and you'd be forced to use numeric indices instead of strings.
P.S.: Actually, consider changing the dictionary value type from String to Integer; your example matrix suggests that it contains only integer numbers, so it might not make sense to store them as strings.
P.P.S.: Do not add values for the "empty" cells to the dictionary. That would be very wasteful. Instead, instead of simply retrieving a value from the dictionary, you check whether the dictionary contains the key:
Dim valueA As String = "" ' the default value
If matrix.TryGetValue(Tuple.Create("A007", "B002"), valueA) Then
' the given key was present, and the associated value has been retrieved
…
End If
I would think that a simple structure would suffice for this?
Public Structure My2DItem
Public Row As Integer
Public Col As Integer
Public Value As String
End Structure
Public My2DArray As Generic.List(Of My2DItem) = Nothing
Public Size As Integer
Public MaxRows As Integer
Public MaxCols As Integer
'
Sub Initialise2DArray()
'
Dim CountX As Integer
Dim CountY As Integer
Dim Item As My2DItem
'
'initialise
MaxRows = 15000
MaxCols = 30
Size = MaxRows * MaxCols
My2DArray = New Generic.List(Of My2DItem)
'
'Create an empty table
For CountY = 1 To 15000
For CountX = 1 To 30
Item = New My2DItem
Item.Row = CountY
Item.Col = CountX
Item.Value = "0"
My2DArray.Add(Item)
Item = Nothing
Next
Next
'
End Sub
And to read the data out of the array,
Function GetValue(Y As Integer, X As Integer) As String
'
Dim counter As Integer
'
GetValue = "Error!"
If My2DArray.Count > 0 Then
For counter = 0 To My2DArray.Count - 1
If My2DArray(counter).Row = Y Then
If My2DArray(counter).Col = X Then
GetValue = My2DArray(counter).Value
Exit Function
End If
End If
Next
End If
'
End Function
And to read your sample cell A004 B005
MyStringValue = GetValue(4,5)
I would suggest creating a class that has properties for the AID and BID and use this as the basis for the values you want to store
Public Class AIdBId
Public Property AId As Integer
Public Property BId As Integer
Public Sub New(aId As Integer, bId As Integer)
Me.AId = aid
Me.BId = bid
End Sub
End Class
Note that I have used integers for everything because it seems that is all you need and it is more efficient that using a string
Then you can add values where they are non-zero:
'define your dictionary
Dim valueMatrix As New Dictionary(Of AIdBId, Integer)
'add your values
valueMatrix.Add(New AIdBId(1, 1), 1)
valueMatrix.Add(New AIdBId(2, 3), 1)
valueMatrix.Add(New AIdBId(4, 3), 3)
valueMatrix.Add(New AIdBId(5, 8), 8)
'check if a value exixts
Dim valueExixsts As Boolean = valueMatrix.ContainsKey(New AIdBId(9, 9))
'get a value
Dim i As Integer = valueMatrix(New AIdBId(4, 3))
So you can now combine these two to return the value if there is one or zero if not:
Private Function GetValue(valuematrix As Dictionary(Of AIdBId, Integer), aId As Integer, bId As Integer) As Integer
Dim xRef As New AIdBId(aId, bId)
If valuematrix.ContainsKey(xRef) Then
Return valuematrix(xRef)
Else
Return 0
End If
End Function

Getting the index of the largest integer in an array

I have an array of integers and I need to know the index of the largest number (not the actual value, just the index of whichever is highest).
However, if one or more indexes "tie" for the highest value, I need to have all of the indexes that share that high value.
I assume this function would need to return an array (since it could be one or more indexes), but I am not totally sure how to go about getting the more efficient solution.
If this is going to be a common thing you could write your own Extension. You should add some additional sanity/null checking but this will get you started:
Module Extensions
<System.Runtime.CompilerServices.Extension()> Function FindAllIndexes(Of T)(ByVal array() As T, ByVal match As Predicate(Of T)) As Integer()
''//Our return object
Dim Ret As New List(Of Integer)
''//Current array index
Dim I As Integer = -1
''//Infinite loop, break out when we no more matches are found
Do While True
''//Loop for a match based on the last index found, add 1 so we dont keep returning the same value
I = System.Array.FindIndex(array, I + 1, match)
''//If we found something
If I >= 0 Then
''//Append to return object
Ret.Add(I)
Else
''//Otherwise break out of loop
Exit Do
End If
Loop
''//Return our array
Return Ret.ToArray()
End Function
End Module
Then to call it:
Dim ints As Integer() = New Integer() {1, 2, 8, 6, 8, 1, 4}
Dim S = ints.FindAllIndexes(Function(c) c = ints.Max())
''//S now holds 2 and 4
If you are using .NET 3.5, you can use the Max() Extension function to easily find the highest value, and use Where to locate the matching records in your source array.
IList has an IndexOf member, which helps. This code is completely untested, and probably has at least one off-by-one error.
Public Function GetPostionsOfMaxValue(ByVal input() As Integer) As Integer()
Dim ints = New List(Of Integer)(input)
Dim maxval = ints.Max
Dim indexes As New List(Of Integer)
Dim searchStart As Integer = 0
Do Until searchStart >= ints.Count
Dim index = ints.IndexOf(maxval, searchStart)
If index = -1 Then Exit Do
indexes.Add(index)
searchStart = index + 1
Loop
Return indexes.ToArray
End Function