vb.net Sorting dictionary after Parallel loop - vb.net

I have issues with sorting an array of values. It does not matter what type of array mechanism I use, I always get errors like "Object reference not set to an instance of an object" while using Tuple, or "Value cannot be null" while sorting Dictionary.
The interesting thing is that the errors I mentioned above occur only while I use the parallel loop to process data faster. When I don't use a parallel loop, the data is sorted without any errors.
Here is a line that sorts Dictionary:
DictionaryValues = DictionaryValues.OrderBy(Function(x) x.Value).ToDictionary(Function(x) x.Key, Function(x) x.Value)
I also tried Tuple for this, and here is a Tuple sorting line:
lstToSort = lstToSort.OrderBy(Function(i) i.Item2).ToList
Finally, here is my "Parallel.For" block of code:
Dim ProcessedCTDataValues As New Dictionary(Of String, Double)
Dim t As Task = Task.Run(Sub()
Parallel.For(0, dv.Count - 1,
Sub(iii As Integer)
Dim cmprName As String = dv(iii)(0).ToString & "; " & dv(iii)(1).ToString & "; " & dv(iii)(2).ToString & "; " & dv(iii)(3).ToString
Dim cmprAddress As String = dv(iii)(2).ToString
If Not ProcessedCTDataValues.ContainsKey(cmprName) Then
Dim similarityName As Double = 0
Dim similarityAddress As Double = 0
similarityName += GetSimilarity.Invoke(InputKeyword.ToLower, cmprName.ToLower)
similarityAddress += GetSimilarity.Invoke(StreetNumber.Invoke.ToLower, cmprAddress.ToLower)
If cmprName.ToLower.Contains(InputKeyword.ToLower) Then similarityName += 1
If InputKeyword.ToLower.Contains(cmprName.ToLower) Then similarityName += 1
For Each word As String In InputKeyword.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
If cmprName.ToLower.Contains(word.ToLower) Then similarityName += 0.3
Next
ProcessedCTDataValues.Add(cmprName, similarityName + similarityAddress)
End If
End Sub)
End Sub)
t.Wait()
Again, if I don't use a parallel loop, I can sort data without any issues. But if I use this loop, I always get an error. Any help would be greatly appreciated! Thanks!

To answer my own question - this is a very specific problem, which can be solved by using a Concurrent dictionary.
From Microsoft Docs: "Represents a thread-safe collection of key/value pairs that can be accessed by multiple threads concurrently."
Since I am using Parallel, this was a perfect solution to look forward to.
Instead of using Dictionary to add keys and values while using "Parallel.For", you can create a Concurrent dictionary, add items to it in the same way you would add to a regular dictionary, and use it in the same way as well.
To declare a Concurrent dictionary, you can use:
Dim concDict As ConcurrentDictionary(Of String, Double) = New ConcurrentDictionary(Of String, Double)()
To add items to it, use:
concDict.GetOrAdd(cmprName, similarityName + similarityAddress)
The concurrent dictionary is part of the System.Collections.Concurrent Namespace
Cheers!

Related

2-D array from txt in VB.NET

I am needing to create a code that is versatile enough where I can add more columns in the future with minimum reconstruction of my code. My current code does not allow me to travel through my file with my 2-D array. If I was to change MsgBox("map = "+ map(0,1) I can retrieve the value easily. Currently all I get in the code listed is 'System.IndexOutOfRangeException' and that Index was outside the bounds of the array. My current text file is 15 rows (down) and 2 columns (across) which puts it at a 14x1. they are also comma separated values.
Dim map(14,1) as string
Dim reader As IO.StreamReader
reader = IO.File.OpenText("C:\LocationOfTextFile")
Dim Linie As String, x,y As Integer
For x = 0 To 14
Linie = reader.ReadLine.Trim
For y = 0 To 1
map(x,y) = Split(Linie, ",")(y)
Next 'y
Next 'x
reader.Close()
MsgBox("map = " + map(y,x))``
Here's a generic way to look at reading the file:
Dim data As New List(Of List(Of String))
For Each line As String In IO.File.ReadAllLines("C:\LocationOfTextFile")
data.Add(New List(Of String)(line.Split(",")))
Next
Dim row As Integer = 1
Dim col As Integer = 10
Dim value As String = data(row)(col)
This is the method suggested by Microsoft. It is generic and will work on any properly formatted comma delimited file. It will also catch and display any errors found in the file.
Using MyReader As New Microsoft.VisualBasic.
FileIO.TextFieldParser(
"C:\LocationOfTextFile")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.
FileIO.MalformedLineException
MsgBox("Line " & ex.Message &
"is not valid and will be skipped.")
End Try
End While
End Using
Essentially what you are asking is how can I take the contents of comma-separated values and convert this to a 2D array.
The easiest way, which is not necessarily the best way, is to return an IEnuemrable(Of IEnumerable(Of String)). The number of items will grow both vertically based on the number of lines and the number of items will grow horizontally based on the values split on a respective line by a comma.
Something along these lines:
Private Function GetMap(path As String) As IEnumerable(Of IEnumerable(Of String)
Dim map = New List(Of IEnumerable(Of String))()
Dim lines = IO.File.ReadAllLines(path)
For Each line In lines
Dim row = New List(Of String)()
Dim values = line.Split(","c)
row.AddRange(values)
map.Add(row)
Next
Return map
End Function
Now when you want to grab a specific cell using the (row, column) syntax, you could use:
Private _map As IEnumerable(Of IEnumerable(Of String))
Private Sub LoadMap()
_map = GetMap("C:/path-to-map")
End Sub
Private Function GetCell(row As Integer, column As Integer) As String
If (_map Is Nothing) Then
LoadMap()
End If
Return _map.ElementAt(row).ElementAt(column)
End Function
Here is an example: https://dotnetfiddle.net/ZmY5Ki
Keep in mind that there are some issues with this, for example:
What if you have commas in your cells?
What if you try to access a cell that doesn't exist?
These are considerations you need to make when implementing this in more detail.
You can consider the DataTable class for this. It uses much more memory than an array, but gives you a lot of versatility in adding columns, filtering, etc. You can also access columns by name rather than index.
You can bind to a DataGridView for visualizing the data.
It is something like an in-memory database.
This is much like #Idle_Mind's suggestion, but saves an array copy operation and at least one allocation per row by using an array, rather than a list, for the individual rows:
Dim data = File.ReadLines("C:\LocationOfTextFile").
Select(Function(ln) ln.Split(","c)).
ToList()
' Show last row and column:
Dim lastRow As Integer = data.Count - 1
Dim lastCol As Integer = data(row).Length - 1
MsgBox($"map = {data(lastRow)(lastCol)}")
Here, assuming Option Infer, the data variable will be a List(Of String())
As a step up from this, you could also define a class with fields corresponding to the expected CSV columns, and map the array elements to the class properties as another call to .Select() before calling .ToList().
But what I really recommend is getting a dedicated CSV parser from NuGet. While a given CSV source is usually consistent, more broadly the format is known for having a number of edge cases that can easily confound the Split() function. Therefore you tend to get better performance and consistency from a dedicated parser, and NuGet has several good options.

VB optimize loop avoid with an Evaluate()

I am running a loop that evaluates input given by the user over more than 150k objects. The user sets the info to read like "obj.Name", "obj.Path", "obj.Date"... which may contain also logic evaluations such as "IIf(obj.Params>5,1,0)". It is then provided into the programm as a string.
I used the Evaluate() functions and it does work well, just that it is slow. It takes almost 6h to go over all elements. I was thinking if there is a way in which I can take the requested info and turn it into a straight-forward executable somehow, and avoid using the Evaluate expression in the whole loop (it runs for the number of requested data by the user * 150k).
This is a schematic of the loop I am running:
For Each Object in ObjectsList
For Each UserRequest in Requests
ResultsMatrix(i,j) = Evaluate(Requests(i))
j += 1
Next
i += 1
Next
I store then the results in a Matrix which is pasted in an Excel file at the end. Is there a way in which I can do sort of working the string to be evaluated into a function's return? I'd like to avoid usig the Eval function and parse directly the string in an executable and dont evaluate it for each object. Any tips on speeding up the loop?
It might be worth considering writing the requests into a set of functions and using the .NET CodeDom compilers to build it into a DLL. You can then load the assembly, find the right functions using reflection and put them into an array, then call them using reflection - that way you'll be calling .NET code and it should be far faster. Some (incomplete) code to get you started from a project where I have done this...
Private Function CombineCode() As String
Dim ret As New System.Text.StringBuilder
ret.AppendLine("Imports System")
ret.AppendLine("Imports Microsoft.VisualBasic")
ret.AppendLine()
ret.AppendLine("Namespace " & MainNamespace)
ret.AppendLine("Public Class " & MainClassName)
For Each e In _Entries
ret.AppendLine(e.Value.Code)
Next
ret.AppendLine("End Class")
ret.AppendLine("End Namespace")
Return ret.ToString
End Function
Private Function Compile(Code As String) As Assembly
'Dim d As New Dictionary(Of String, String)
'd.Add("langversion", "14")
Dim VBP As New Microsoft.CodeDom.Providers.DotNetCompilerPlatform.VBCodeProvider()
Dim PM As New System.CodeDom.Compiler.CompilerParameters
'PM.GenerateInMemory = True
PM.GenerateExecutable = False
PM.OutputAssembly = IO.Path.Combine(_Path, GenerateFileName() & ".dll") ' "Generated.dll"
PM.MainClass = MainClassName
PM.IncludeDebugInformation = True
Dim ASM As System.Reflection.Assembly
For Each ASM In AppDomain.CurrentDomain.GetAssemblies()
Try
If ASM.Location <> "" Then PM.ReferencedAssemblies.Add(ASM.Location)
Catch
End Try
Next
PM.ReferencedAssemblies.Add("System.Web.dll")
'Get compilation results
Dim Results As System.CodeDom.Compiler.CompilerResults
Results = VBP.CompileAssemblyFromSource(PM, Code)
'Show possible compilation errors
Dim Err As System.CodeDom.Compiler.CompilerError
For Each Err In Results.Errors
Throw New SyntaxErrorException("Error N. " & Err.ErrorNumber &
" Message: " & Err.ErrorText & " Line " & Err.Line & " in code " & vbCrLf & Code)
Next
Return Results.CompiledAssembly
End Function
Private Sub FindMethods()
Dim dt = (From t In _LatestAssembly.GetTypes() Where t.Name = MainClassName).Single
For Each e In _Entries.Values
e.Method = dt.GetMethod(e.MethodName)
Next
End Sub
Assembly = Assembly.LoadFrom(System.IO.Path.Combine(Path, sd.LatestAssemblyFile))
The Evaluate function is just resources on the computer itself. It's a great candidate for using Parallel.For.
In this case, j is the implied index.
For Each Object in ObjectsList
Parallel.For(0, Requests.Length, New ParallelOptions(), Sub(j, loopState)
ResultsMatrix(i,j) = Evaluate(Requests(j))
End Sub
)
i += 1
Next
Note, that Requests(i) is getting called repeatedly and produces the same result, so I assume you mean Requests(j).

How to harness auto-completion of strings?

I'm writing an application in which I have to pass strings as parameters. Like these:
GetValue("InternetGatewayDevice.DeviceInfo.Description")
GetValue("InternetGatewayDevice.DeviceInfo.HardwareVersion")
CheckValue("InternetGatewayDevice.DeviceInfo.Manufacturer")
ScrambleValue("InternetGatewayDevice.DeviceInfo.ModelName")
DeleteValue("InternetGatewayDevice.DeviceInfo.ProcessStatus.Process.1")
The full list is about 10500 entries, and i tought that i'd be really lost in searching if i misspell something.
So I am trying to declare a namespace for every string segment (separated by ".") and declare the last as a simple class that widens to a String of its FullName (except the base app namespace):
Class xconv
Public Shared Widening Operator CType(ByVal d As xconv) As String
Dim a As String = d.GetType.FullName
Dim b As New List(Of String)(Strings.Split(a, "."))
Dim c As String = Strings.Join(b.Skip(1).ToArray, ".")
Return c
End Operator
End Class
So I'd have these declarations:
Namespace InternetGatewayDevice
Namespace DeviceInfo
Class Description
Inherits xconv
End Class
End Namespace
End Namespace
This way IntelliSense is more than happy to autocomplete that string for me.
Now I'd have to do this for every possible string, so I opted (in order to retain my sanity) to make a method that does that:
Sub Create_Autocomlete_List()
Dim pathlist As New List(Of String)(IO.File.ReadAllLines("D:\list.txt"))
Dim def_list As New List(Of String)
Dim thedoc As String = ""
For Each kl As String In pathlist
Dim locdoc As String = ""
Dim el() As String = Strings.Split(kl, ".")
Dim elc As Integer = el.Length - 1
Dim elz As Integer = -1
Dim cdoc As String
For Each ol As String In el
elz += 1
If elz = elc Then
locdoc += "Class " + ol + vbCrLf + _
"Inherits xconv" + vbCrLf + _
"End Class"
Else
locdoc += "Namespace " + ol + vbCrLf
cdoc += vbCrLf + "End Namespace"
End If
Next
locdoc += cdoc
thedoc += locdoc + vbCrLf + vbCrLf
Next
IO.File.WriteAllText("D:\start_list_dot_net.txt", thedoc)
End Sub
The real problem is that this is HORRIBLY SLOW and memory-intense (now i dot a OutOfMemory Exception), and I have no idea on how Intellisense would perform with the (not available in the near future) output of the Create_Autocomlete_List() sub.
I believe that it would be very slow.
So the real questions are: Am I doing this right? Is there any better way to map a list of strings to auto-completable strings? Is there any "standard" way to do this?
What would you do in this case?
I don't know how Visual Studio is going to perform with thousands of classes, but your Create_Autocomlete_List method can be optimized to minimize memory usage by not storing everything in memory as you build the source code. This should also speed things up considerably.
It can also be simplified, since nested namespaces can be declared on one line, e.g. Namespace First.Second.Third.
Sub Create_Autocomlete_List()
Using output As StreamWriter = IO.File.CreateText("D:\start_list_dot_net.txt")
For Each line As String In IO.File.ReadLines("D:\list.txt")
Dim lastDotPos As Integer = line.LastIndexOf("."c)
Dim nsName As String = line.Substring(0, lastDotPos)
Dim clsName As String = line.Substring(lastDotPos + 1)
output.Write("Namespace ")
output.WriteLine(nsName)
output.Write(" Class ")
output.WriteLine(clsName)
output.WriteLine(" Inherits xconv")
output.WriteLine(" End Class")
output.WriteLine("End Namespace")
output.WriteLine()
Next
End Using
End Sub
Note the use of File.ReadLines instead of File.ReadAllLines, which returns an IEnumerable instead of an array. Also note that the output is written directly to the file, instead of being built in memory.
Note Based on your sample data, you may run into issues where the last node is not a valid class name. e.g. InternetGatewayDevice.DeviceInfo.ProcessStatus.Process.1 - 1 is not a valid class name in VB.NET. You will need to devise some mechanism to deal with this - maybe some unique prefix that you could strip in your widening operator.
I'm also not sure how usable the resulting classes will be, since presumably you would need to pass an instance to the methods:
GetValue(New InternetGatewayDevice.DeviceInfo.Description())
It seems like it would be nicer to have Shared strings on a class:
Namespace InternetGatewayDevice
Class DeviceInfo
Public Shared Description As String = "Description"
Public Shared HardwareVersion As String = "HardwareVersion"
' etc.
End Class
End Namespace
So you could just reference those strings:
GetValue(InternetGatewayDevice.DeviceInfo.Description)
However, I think that would be a lot harder to generate without creating name clashes due to the various levels of nesting.

Find a variable with concatenation

I would like to find a variable with concatenation.
Exemple :
Dim oExcelRangeArray1(0, 0) As Object
Dim oExcelRangeArray2(0, 0) As Object
Dim oExcelRangeArray3(0, 0) As Object
For i As Integer = 1 To 3
oExcelRangeArray & i = xl.Range("A1:Z400").Value
Next
but oExcelRangeArray & i doesn't work.
Thank you
For the extent of my knowledge, there is no way to achieve what you are trying to do directly, because oExcelRangeArray & i will not be evaluated as a separate step before the variable assignment happens.
In my mind you have two choices:
Assign each variable individually,
oExcelRangeArray1 = x1.Range("A1:Z400").Value
oExcelRangeArray2 = x1.Range("A1:Z400").Value
oExcelRangeArray3 = x1.Range("A1:Z400").Value
oExcelRangeArray4 = x1.Range("A1:Z400").Value
Or, add each array to a list, and iterate through it,
Dim oExcelRangeArrayList As New List(Of Object)
oExcelRangeArrayList.Add(oExcelRangeArray1)
oExcelRangeArrayList.Add(oExcelRangeArray2)
oExcelRangeArrayList.Add(oExcelRangeArray3)
oExcelRangeArrayList.Add(oExcelRangeArray4)
For i As Integer = 0 To 3
oExcelRangeArrayList(i) = x1.Range("A1:Z400").Value
Next
[Note: Writing this freehand without checking it, code may not be verbatim; hopefully you get the concept. Corrections welcome.]

linq submitchanges runs out of memory

I have a database with about 180,000 records. I'm trying to attach a pdf file to each of those records. Each pdf is about 250 kb in size. However, after about a minute my program starts taking about about a GB of memory and I have to stop it. I tried doing it so the reference to each linq object is removed once it's updated but that doesn't seem to help. How can I make it clear the reference?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
Dim indexFiles = (From indexFile In dataContext.IndexFiles
Where indexFile.PDFContent = Nothing
Order By indexFile.PDFFolder).ToList
Dim currentDirectory As IO.DirectoryInfo
Dim currentFile As IO.FileInfo
Dim tempIndexFile As IndexFile
While indexFiles.Count > 0
tempIndexFile = indexFiles(0)
indexFiles = indexFiles.Skip(1).ToList
currentDirectory = 'I set the directory that I need
currentFile = 'I get the file that I need
writePDF(currentDirectory, currentFile, tempIndexFile)
End While
End Sub
Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
Dim bytes() As Byte
bytes = getFileStream(file)
indexFile.PDFContent = bytes
dataContext.SubmitChanges()
counter += 1
If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub
Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
Dim fileStream = fileInfo.OpenRead()
Dim bytesLength As Long = fileStream.Length
Dim bytes(bytesLength) As Byte
fileStream.Read(bytes, 0, bytesLength)
fileStream.Close()
Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
OK. To use the smallest amount of memory we have to update the datacontext in blocks. I've put a sample code below. Might have sytax errors since I'm using notepad to type it in.
Dim DB as YourDataContext = new YourDataContext
Dim BlockSize as integer = 25
Dim AllItems = DB.Items.Where(function(i) i.PDFfile.HasValue=False)
Dim count = 0
Dim tmpDB as YourDataContext = new YourDataContext
While (count < AllITems.Count)
Dim _item = tmpDB.Items.Single(function(i) i.recordID=AllItems.Item(count).recordID)
_item.PDF = GetPDF()
Count +=1
if count mod BlockSize = 0 or count = AllItems.Count then
tmpDB.SubmitChanges()
tmpDB = new YourDataContext
GC.Collect()
end if
End While
To Further optimise the speed you can get the recordID's into an array from allitems as an anonymous type, and set DelayLoading on for that PDF field.