I have the need to process millions of files. Currently, I use a custom thread manager to do the work by using a DataGridView to keep track of the threads and a timer to check if more threads can start; kind of like (sudo):
Private Sub ThreadManager()
If AVailableThreads > 0 then
Dim t as Threading.Thread = New Thread(AddressOf MyThread)
t.Start()
AvailableThreads = AvailableThreads - 1
ThreadManager()
End If
End Sub
This has many drawbacks, the main ones being that the CPU and memory usage is high as each of the above threads process a full directory instead of each file independently.
So I have rewritten the process. Now I have a class that will perform the process at the file level and report back to the main thread the results; like so:
Imports System.IO
Public Class ImportFile
Public Class ImportFile_state
Public ID as Long = Nothing
Public FilePath as String = Nothing
Public Result as Boolean = False
End Class
Public Event ReportState(ByVal state as ImportFile_state)
Dim _state as ImportFile_state = New ImportFile_State
Public Sub New(ByVal ID as Long, ByVal FilePath as String)
MyBase.New()
_state.ID = ID
_state.FilePath = FilePath
End Sub
Public Sub GetInfo()
'Do the work here, but just return the result for this demonstration
Try
_state.Result = True
Catch ex As Exception
_state.Result = False
Finally
RaiseEvent ReportState(_state)
End Try
End Sub
End Class
The above class works like a charm and is very fast, uses almost no memory and next to nothing of the CPU. Albeit that I have only been able to test this with a few hundred threads using the Threading.Thread process.
Now I would like to use the ThreadPool.QueueUserWorkItem to execute the above class for each file allowing the system to control the number of threads to have running at any given time. However, I know I cannot just dump several million threads into the ThreadPool without locking up my server. I have done a lot of research on this and I have only been able to find examples/discussions on using the ThreadPool.QueueUserWorkItem for a few threads. What I need is to fire off millions of these threads.
So, I have two questions: 1) Should I even be trying to use the ThreadPool.QueueUserWorkItem to run this many threads, and 2) is the code below sufficient to perform this process without locking up my server?
Here is my code so far:
For Each subdir As String In Directory.GetDirectories(DirPath)
For Each fl In Directory.GetFiles(subdir)
'MsgBox(fl)
Dim f As ImportFile = New ImportFile(0, fl)
AddHandler f.ReportState, AddressOf GetResult
ThreadPool.QueueUserWorkItem(New Threading.WaitCallback(AddressOf f.GetInfo))
ThreadPool.GetAvailableThreads(worker, io)
Do While (worker) <= 0
Thread.Sleep(5000)
ThreadPool.GetAvailableThreads(worker, io)
Loop
Next
Next
Related
I am trying to start using tasks, but I wanted to compare the speed difference when using a standard HttpWebRequest.BeginGetResponse.
From what I have found, it is taking ~600ms to send and complete 100 requests to example.com using BeginGetResponse
However, using Await GetResponseAsync is taking 5x that. Around 3000ms. In production, that really matters alot to me when scaled up. Am I doing something wrong, or is Await GetResponseAsync inherently slower than BeginGetResponse?
Imports System.Net
Public Class Form1
Private sw As New Stopwatch
Private respCounter As Integer
Private iterations As Integer = 100
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
sw.Start()
For i = 1 To iterations
Dim req As HttpWebRequest = HttpWebRequest.Create("http://example.com")
Dim state As New RequestState
state.req = req
req.BeginGetResponse(AddressOf respCallback, state)
Next
End Sub
Private Sub respCallback(ar As IAsyncResult)
Dim state As RequestState = ar.AsyncState
state.resp = state.req.EndGetResponse(ar)
state.respStream = state.resp.GetResponseStream
state.respStream.BeginRead(state.buffer, 0, 1024, AddressOf readCallback, state)
End Sub
Private Sub readCallback(ar As IAsyncResult)
Dim state As RequestState = ar.AsyncState
Dim read As Integer = state.respStream.EndRead(ar)
If read > 0 Then
state.respBody += System.Text.ASCIIEncoding.ASCII.GetString(state.buffer, 0, read)
state.respStream.BeginRead(state.buffer, 0, 1024, AddressOf readCallback, state)
Else
state.Dispose()
respCounter += 1
If respCounter = iterations Then
respCounter = 0
sw.Stop()
Debug.WriteLine(sw.ElapsedMilliseconds)
sw.Reset()
End If
End If
End Sub
Private Async Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
sw.Start()
For i = 1 To iterations
Dim req As HttpWebRequest = HttpWebRequest.Create("http://example.com")
Using resp As WebResponse = Await req.GetResponseAsync
Using sr As New IO.StreamReader(resp.GetResponseStream)
Dim respBody As String = Await sr.ReadToEndAsync
End Using
End Using
respCounter += 1
If respCounter = iterations Then
respCounter = 0
sw.Stop()
Debug.WriteLine(sw.ElapsedMilliseconds)
sw.Reset()
End If
Next
MsgBox("Execution!")
End Sub
End Class
Public Class RequestState
Implements IDisposable
Public req As HttpWebRequest
Public resp As HttpWebResponse
Public respStream As IO.Stream
Public buffer(1024) As Byte
Public respBody As String
#Region "IDisposable Support"
Private disposedValue As Boolean ' To detect redundant calls
' IDisposable
Protected Overridable Sub Dispose(disposing As Boolean)
If Not disposedValue Then
If disposing Then
' TODO: dispose managed state (managed objects).
respStream.Close()
respStream.Dispose()
resp.Close()
End If
' TODO: free unmanaged resources (unmanaged objects) and override Finalize() below.
' TODO: set large fields to null.
End If
disposedValue = True
End Sub
' TODO: override Finalize() only if Dispose(disposing As Boolean) above has code to free unmanaged resources.
'Protected Overrides Sub Finalize()
' ' Do not change this code. Put cleanup code in Dispose(disposing As Boolean) above.
' Dispose(False)
' MyBase.Finalize()
'End Sub
' This code added by Visual Basic to correctly implement the disposable pattern.
Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(disposing As Boolean) above.
Dispose(True)
' TODO: uncomment the following line if Finalize() is overridden above.
' GC.SuppressFinalize(Me)
End Sub
#End Region
End Class
is Await GetResponseAsync inherently slower than BeginGetResponse?
It's difficult to address your specific performance concern without a good Minimal, Complete, and Verifiable code example. That said…
It seems to me that you're comparing apples and oranges here. First and foremost, there's a major difference in the implementations. In your BeginGetResponse() version, you initiate all of the requests concurrently, so assuming the web server will tolerate it, they complete in parallel. In your GetResponseAsync() version, you only initiate a new request after the previous one completes.
This serialization will necessarily slow everything down.
Beyond that, the BeginGetResponse() version performs all of its work in the IOCP thread pool, while the GetResponseAsync() version uses the single UI thread to handle completion of I/O events. The UI thread is a bottleneck, both because it can only do one thing at a time, and because you have to wait for it to be available from performing other tasks before it can move on to dealing with the I/O completions (a variation on the "can only do one thing at a time" issue).
In addition to that, you also have to deal with the latency involved in the message loop that dequeues the asynchronous completions for execution in the UI thread.
It wouldn't surprise me at all to find that the GetResponseAsync() approach is slower, when used in the way you're using it.
If you want better performance from it, you should probably use ConfigureAwait(false) in your async calls. Of course, this assumes you can otherwise minimize the interaction with the UI thread (e.g. the processing of the results does not actually need direct posting back to the UI thread). But doing so will tell the framework to not bother marshaling completions back to the UI thread. At least in the code you posted, this would be safe, as you don't actually interact with UI objects in the async event handler method.
All that said, when I changed your code so that it would run the GetResponseAsync() version concurrently, I found that at least with the web server I tested with, it worked just as fast as the BeginGetResponse() version. It was able to complete 100 iterations in just over 10 seconds in both cases.
Private Async Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
sw.Start()
Dim tasks As List(Of Task(Of String)) = New List(Of Task(Of String))
For i = 1 To iterations
Dim req As HttpWebRequest = HttpWebRequest.Create("http://example.com/")
tasks.Add(ReadResponse(req))
Next
Await Task.WhenAll(tasks)
sw.Stop()
Debug.WriteLine(sw.ElapsedMilliseconds)
sw.Reset()
MsgBox("Execution!")
End Sub
Private Async Function ReadResponse(req As HttpWebRequest) As Task(Of String)
Using resp As WebResponse = Await req.GetResponseAsync
Using sr As New IO.StreamReader(resp.GetResponseStream)
Dim respBody As String = Await sr.ReadToEndAsync
Return respBody
End Using
End Using
End Function
It's possible with a faster web server, you might start to run into the UI-thread-as-a-bottleneck issue, but I would say the primary difference is likely just that the two implementations really aren't even logically the same.
Background:
I have a program that is processing lots of database records, and generating tasks to do. (In this case creating user accounts in AD).
Part of this is to create the user directories, for profiles and home directories, and setting the permissions on them.
This needs to wait until the ad account has replicated across all of our DC's.
So, my program will have a separate thread responsible for creating the directories, that will process a queue populated from the main thread.
I've done some research on Threading and come up with the following code pattern:
Imports System.Threading
Public Class Form1
Dim worker As Object
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
worker = New workerObj(AddressOf resultcallback)
Dim t As New Thread(AddressOf worker.mainloop)
End Sub
Public Sub resultcallback(ByVal item As String)
Outbox.AppendText(item)
End Sub
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
worker.addItem(inbox.Text)
End Sub
End Class
Public Delegate Sub resultcallback(ByVal item As String)
Public Class workerObj
Private myQueue As New Queue(Of String)
Private myCallback As resultcallback
Dim item As String = "nout"
Public Sub New(ByVal callbackdelegate As resultcallback)
myCallback = callbackdelegate
End Sub
Public Sub mainloop()
While True
If myQueue.Count > 0 Then
item = myQueue.Dequeue()
myCallBack(item)
End If
Thread.Sleep(5000)
End While
End Sub
Public Sub addItem(ByVal item As String)
myQueue.Enqueue(item)
End Sub
End Class
Problem:
On the line Dim t as new Thread.....
Error 1 Overload resolution failed because no accessible 'New' is most specific for these arguments:
'Public Sub New(start As System.Threading.ParameterizedThreadStart)': Not most specific.
'Public Sub New(start As System.Threading.ThreadStart)': Not most specific. n:\visual studio 2013\Projects\ThreadTest\ThreadTest\Form1.vb 7 13 ThreadTest
Can anyone help tell me where I have gone wrong?
Cheers.
Threads do not have a public constructor, you need to call Thread.Start. I'd suggest you don't do that though. Writing thread-safe code is tricky enough when you do know about multithreaded programming.
Eg in your code you modify a Queue from two different threads without locking. Queue isn't thread safe and you can corrupt the queue. You should lock access to it or use ConcurrentQueue which is thread-safe. Another error is trying to modify a TextBox from another thread - this will lead to an Exception because only the UI thread is allowed to modify UI controls.
A better option though is to use the ActionBlock class from the DataFlow library which already does what you want: queue requests and process them in one or more separate threads.
Your code can be as simple as this:
Dim myFileWorker=New ActionBlock(Of string)(Function(path) =>DoSomething(path))
For Each somePath As String in ListWithManyPaths
myFileWorker.Post(somePath)
Next somePath
myFileWorker.Complete()
myFileWorker.Completion.Wait()
By default only one path will be processed at a time. To process multiple paths you pass an ExecutionDataflowBlockOptions object with the desired MaxDegreeOfParallelism:
Dim options=New ExecutionDataflowBlockOptions() With { .MaxDegreeOfParallelism=5}
Dim myFileWorker=New ActionBlock(Of String) Function(path) DoSomething(path),options)
I'm not sure if this is the right approach but what I'm trying to do is create a queue that constantly runs so that I can keep adding things to it. Basically I would like to be able to add things to the queue and then it process it on a first come first served basis. I have the code below:
Namespace Managers
Public Class SQLQueueManager
Public Shared Sqlitems As New Queue
Public Sub StartProcessing()
Dim t1 As New Threading.Thread(AddressOf Process)
t1.Start()
End Sub
Public Shared Sub Process()
Do
SyncLock Sqlitems.SyncRoot
If Sqlitems.Count > 0 Then
SqlManager.ExecNonQuery(Sqlitems.Peek)
Sqlitems.Dequeue()
End If
End SyncLock
Loop
End Sub
End Class
End Namespace
i start this queue off using the following:
Dim sqlT As Thread
sqlT = New Thread(AddressOf SQLQueueManager.Process)
sqlT.Start()
and i add items to the queue using:
SQLQueueManager.Sqlitems.Enqueue(...)
Thanks
A custom approach like this can work. Consider, perhaps, using ThreadPool.QueueUserWorkItem :
ThreadPool.QueueUserWorkItem -- MSDN
I'm trying to run a multi-threaded console app in VB and am having thread cross-over. Basically I want to run 5 threads, have them continually access a queue, process, and repeat until there's nothing left. When all threads have processed I want them to do something else. I'm attempting to use SyncLock to prevent multiple threads from accessing but it does not seem to be working. Any help would be appreciated!
Dim iThread As Integer
Dim manualEvents(4) As ManualResetEvent
Sub Main()
For i = 0 To 4
manualEvents(i) = New ManualResetEvent(False)
ThreadPool.QueueUserWorkItem(AddressOf DoOne)
Next
For Each handle As WaitHandle In manualEvents
handle.WaitOne()
Next
' do other stuff
EndSub
Private Sub DoOne()
Dim lockObject As New Object()
SyncLock (lockObject)
If QueryQueue.DoOne() Then
DoOne()
Else
manualEvents(iThread).Set()
iThread = iThread + 1
End If
End SyncLock
End Sub
The problem is with the locked resource, you're using lockObject as a synchronization lock resource which should be shared accros threads.
You have to make it an instance field.
Private Shared lockObject As New Object()
Private Sub DoOne()
SyncLock (lockObject)
If QueryQueue.DoOne() Then
DoOne()
Else
manualEvents(iThread).Set()
iThread = iThread + 1
End If
End SyncLock
End Sub
The problem is that you are creating and using a new instance of an object for locking on each thread. The naive solution is to promote lockObject from a local variable to class variable. That way each thread is using the same object to lock on. I say this is naive because you have exchanged one problem for another (albeit less severe). The new problem is that you have now made your parallel algorithm a serial algorithm since only one thread can being doing work at any given time.
The solution would be to lock access to the queue only while it is being changed. Then operate on the dequeued objects outside the lock so that the threads can perform work concurrently.
If .NET 4.0 is available to you could refactored the code like this.
Public Class Example
Private m_Queue As ConcurrentQueue(Of SomeObject) = New ConcurrentQueue(Of SomeObject)()
Public Shared Sub Main()
' Populate the queue here.
Dim finished = New CountdownEvent(1)
For i As Integer = 0 to 4
finsihed.AddCount()
ThreadPool.QueueUserWorkItem(AddressOf DoOne, finished)
Next
finished.Signal()
finished.Wait()
End Sub
Private Shared Sub DoOne(ByVal state As Object)
Try
Dim item As SomeObject = Nothing
Do While m_Queue.TryDequeue(item) Then
' Process the dequeued item here.
Loop
' There is nothing left so do something else now.
Finally
Dim finished = DirectCast(state, CountdownEvent)
finished.Signal()
End Try
End Sub
End Class
I used ConcurrentQueue to avoid having to use SyncLock entirely. I used CountdownEvent as a more scalable alternative to wait for work items to complete.
You need to share the same lockObject across the threads by making it an instance variable.
I'm looking to pass two or more parameters to a thread in VB 2008.
The following method (modified) works fine without parameters, and my status bar gets updated very cool-y.
But I can't seem to make it work with one, two or more parameters.
This is the pseudo code of what I'm thinking should happen when the button is pressed:
Private Sub Btn_Click()
Dim evaluator As New Thread(AddressOf Me.testthread(goodList, 1))
evaluator.Start()
Exit Sub
This is the testthread method:
Private Sub testthread(ByRef goodList As List(Of OneItem), ByVal coolvalue As Integer)
StatusProgressBar.Maximum = 100000
While (coolvalue < 100000)
coolvalue = coolvalue + 1
StatusProgressBar.Value = coolvalue
lblPercent.Text = coolvalue & "%"
Me.StatusProgressBar.Refresh()
End While
End Sub
First of all: AddressOf just gets the delegate to a function - you cannot specify anything else (i.e. capture any variables).
Now, you can start up a thread in two possible ways.
Pass an Action in the constructor and just Start() the thread.
Pass a ParameterizedThreadStart and forward one extra object argument to the method pointed to when calling .Start(parameter)
I consider the latter option an anachronism from pre-generic, pre-lambda times - which have ended at the latest with VB10.
You could use that crude method and create a list or structure which you pass to your threading code as this single object parameter, but since we now have closures, you can just create the thread on an anonymous Sub that knows all necessary variables by itself (which is work performed for you by the compiler).
Soo ...
Dim Evaluator = New Thread(Sub() Me.TestThread(goodList, 1))
It's really just that ;)
Something like this (I'm not a VB programmer)
Public Class MyParameters
public Name As String
public Number As Integer
End Class
newThread as thread = new Thread( AddressOf DoWork)
Dim parameters As New MyParameters
parameters.Name = "Arne"
newThread.Start(parameters);
public shared sub DoWork(byval data as object)
{
dim parameters = CType(data, Parameters)
}
Dim evaluator As New Thread(Sub() Me.testthread(goodList, 1))
With evaluator
.IsBackground = True ' not necessary...
.Start()
End With
Well, the straightforward method is to create an appropriate class/structure which holds all your parameter values and pass that to the thread.
Another solution in VB10 is to use the fact that lambdas create a closure, which basically means the compiler doing the above automatically for you:
Dim evaluator As New Thread(Sub()
testthread(goodList, 1)
End Sub)
In addition to what Dario stated about the Delegates you could execute a delegate with several parameters:
Predefine your delegate:
Private Delegate Sub TestThreadDelegate(ByRef goodList As List(Of String), ByVal coolvalue As Integer)
Get a handle to the delegate, create parameters in an array, DynamicInvoke on the Delegate:
Dim tester As TestThreadDelegate = AddressOf Me.testthread
Dim params(1) As Object
params(0) = New List(Of String)
params(1) = 0
tester.DynamicInvoke(params)
Just create a class or structure that has two members, one List(Of OneItem) and the other Integer and send in an instance of that class.
Edit: Sorry, missed that you had problems with one parameter as well. Just look at Thread Constructor (ParameterizedThreadStart) and that page includes a simple sample.
Pass multiple parameter for VB.NET 3.5
Public Class MyWork
Public Structure thread_Data
Dim TCPIPAddr As String
Dim TCPIPPort As Integer
End Structure
Dim STthread_Data As thread_Data
STthread_Data.TCPIPAddr = "192.168.2.2"
STthread_Data.TCPIPPort = 80
Dim multiThread As Thread = New Thread(AddressOf testthread)
multiThread.SetApartmentState(ApartmentState.MTA)
multiThread.Start(STthread_Data)
Private Function testthread(ByVal STthread_Data As thread_Data)
Dim IPaddr as string = STthread_Data.TCPIPAddr
Dim IPport as integer = STthread_Data.TCPIPPort
'Your work'
End Function
End Class
I think this will help you...
Creating Threads and Passing Data at Start Time!
Imports System.Threading
' The ThreadWithState class contains the information needed for
' a task, and the method that executes the task.
Public Class ThreadWithState
' State information used in the task.
Private boilerplate As String
Private value As Integer
' The constructor obtains the state information.
Public Sub New(text As String, number As Integer)
boilerplate = text
value = number
End Sub
' The thread procedure performs the task, such as formatting
' and printing a document.
Public Sub ThreadProc()
Console.WriteLine(boilerplate, value)
End Sub
End Class
' Entry point for the example.
'
Public Class Example
Public Shared Sub Main()
' Supply the state information required by the task.
Dim tws As New ThreadWithState( _
"This report displays the number {0}.", 42)
' Create a thread to execute the task, and then
' start the thread.
Dim t As New Thread(New ThreadStart(AddressOf tws.ThreadProc))
t.Start()
Console.WriteLine("Main thread does some work, then waits.")
t.Join()
Console.WriteLine( _
"Independent task has completed main thread ends.")
End Sub
End Class
' The example displays the following output:
' Main thread does some work, then waits.
' This report displays the number 42.
' Independent task has completed; main thread ends.
With VB 14, you can do the following with Tuples:
Shared Sub _runner(data as (goodList As List(Of OneItem), coolvalue As Integer))
Console.WriteLine($"goodList: {data.goodList}")
Console.WriteLine($"coolvalue: {data.coolvalue}")
' do stuff...
End Sub
Dim thr As New Thread(AddressOf _runner)
thr.Start((myGoodList, cval))