I need help creating a TaskScheduler to prevent threading overload - vb.net

I want to add workers into a queue, but only have the first N workers processing in parallel. All samples I find are in C#.
This is probably simple for a programmer, but I'm not one. I know enough about VB to write simple programs.
But my first application runs fine until it suddenly hits 100% CPU and then crashes. Help, please (Yes, I've wasted 5 hours of work time searching before posting this...)
More Context: Performing a recursive inventory of directory structures, files, and permissions across file servers with over 1 million directories/subdirectories.
The process runs serially, but it will take months to complete. Management is already breathing down my neck. When I try using Tasks, it goes to about 1000 threads, then hits 100% CPU, stops responding, and crashes. This is on a 16-core server with 112 GB RAM.
--Added
So, with the sample provided on using Semaphores, this is what I've put in:
Imports System.Threading
Imports System.Threading.Tasks

Public Class InvDir
    Private mSm As Semaphore

    Public Sub New(ByVal maxPrc As Integer)
        mSm = New Semaphore(maxPrc, maxPrc)
    End Sub

    Public Sub GetInventory(ByVal Path As String, ByRef Totals As Object, ByRef MyData As Object)
        mSm.WaitOne()
        Task.Factory.StartNew(Sub()
                                  Dim CurDir As New IO.DirectoryInfo(Path)
                                  Totals.SubDirectoryCount += CurDir.GetDirectories().Count
                                  Totals.FilesCount += CurDir.GetFiles().Count
                                  For Each CurFile As IO.FileInfo In CurDir.EnumerateFiles()
                                      MyData.AddFile(CurFile.Name, CurFile.Extension, CurFile.FullName, CurFile.Length)
                                  Next
                              End Sub).ContinueWith(Function(x) mSm.Release())
    End Sub
End Class

You're attempting multithreading with disk I/O. It might be getting slower because you're throwing more threads at it. No matter how many threads there are, the disk can physically only seek one position at a time. (In fact, you mentioned that it works serially.)
If you do want to limit the number of concurrent threads, you could use a Semaphore. A semaphore is like a SyncLock except you can specify how many threads are allowed to execute the code at a time. In the example below, the semaphore allows three threads to execute. Any more than that have to wait until one finishes. Here is some modified code from the MSDN page:
Imports System.Threading

Public Class Example
    ' A semaphore that simulates a limited resource pool.
    Private Shared _pool As Semaphore

    <MTAThread> _
    Public Shared Sub Main()
        ' Create a semaphore that can satisfy up to three
        ' concurrent requests. Use an initial count of zero,
        ' so that the entire semaphore count is initially
        ' owned by the main program thread.
        _pool = New Semaphore(0, 3)

        ' ... start the worker tasks here ...

        ' Release the semaphore so that up to three workers
        ' can enter at a time. Without this call, every
        ' WaitOne() below would block forever.
        _pool.Release(3)
    End Sub

    Private Sub SomeWorkerMethod()
        ' This is the method that would be called using a Task.
        _pool.WaitOne()
        Try
            ' Do whatever
        Finally
            _pool.Release()
        End Try
    End Sub
End Class
Every new thread must call _pool.WaitOne(). That tells it to wait its turn until there are fewer than three threads executing. Every thread blocks until the semaphore allows it to pass.
Every thread must also call _pool.Release() to let the semaphore know that it can allow the next waiting thread to begin. That's important, even if there's an exception. If threads don't call Release() then the semaphore will just block them forever.
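For what it's worth, here is a minimal sketch of how the two pieces could fit together for the inventory scenario: WaitOne on the calling thread before each task starts (as in the code you added) and Release in a Finally inside the task so it runs even when an exception is thrown. The class and member names (InventoryWorker, ScanDirectory, maxWorkers) are illustrative, not taken from your project:
Imports System.IO
Imports System.Threading
Imports System.Threading.Tasks

Public Class InventoryWorker
    ' Allows at most maxWorkers directory scans to be in flight at once.
    Private ReadOnly _pool As Semaphore

    Public Sub New(ByVal maxWorkers As Integer)
        _pool = New Semaphore(maxWorkers, maxWorkers)
    End Sub

    Public Function ScanDirectory(ByVal path As String) As Task
        ' Blocks the caller until a slot is free, which throttles submission.
        _pool.WaitOne()
        Return Task.Factory.StartNew(
            Sub()
                Try
                    Dim curDir As New DirectoryInfo(path)
                    For Each curFile As FileInfo In curDir.EnumerateFiles()
                        ' Record curFile.Name, curFile.Length, etc. here.
                    Next
                Finally
                    ' Always release the slot, even if enumeration throws.
                    _pool.Release()
                End Try
            End Sub)
    End Function
End Class
Because WaitOne happens before StartNew, at most maxWorkers tasks are ever in flight, so the thread pool never balloons into hundreds of threads.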
If it's really going to take five months, what about cloning the drive and running the check on multiple instances of the same drive, each looking at different sections?

VB.NET multithreading, block thread until notification received

Before I begin, I have to apologize for two things. One is that it is very difficult for me to explain things in a concise manner. Two is that I need to be somewhat vague due to the nature of the company I work for.
I am working on enhancing the functionality of an application that I've inherited. It is a very intensive application that runs a good portion of my company's day to day business. Because of this I am limited to the scope of what I can change--otherwise I'd probably rewrite it from scratch. Anyways, here is what I need to do:
I have several threads that all perform the same task but on different data input streams. Each thread interacts through an API from another software system we pay licensing on to write out to what are called channels. Unfortunately we have only licensed a certain number of concurrently running channels, so this application is supposed to turn them on and off as needed.
Each thread should wait until there is an available channel, lock the channel for itself and perform its processing and then release the channel. Unfortunately, I don't know how to do this, especially across multiple threads. I also don't really know what to search Google or this site for, or I'd probably have my answer. This was my thought:
A class that handles the distribution of channel numbers. Each thread makes a call to a member of this class. When it does, it would enter a queue and block until the channel handling class recognizes that we have a channel, signals the waiting thread that a channel is available, and passes it the channel id. I have no idea where to begin even looking this up. Below is some horribly written pseudocode of how, in my mind, I would think it would work.
Public Class ChannelHandler
    Private Shared WaitQueue As New Queue(Of Thread)

    '// calling thread adds itself to the queue
    Public Shared Sub WaitForChannel(ByRef t As Thread)
        WaitQueue.Enqueue(t)
    End Sub

    Public Shared Sub ReleaseChannel(chanNum As Integer)
        '// my own processing to make the chan num available again
    End Sub

    '// this would be running on a separate thread, polling my database
    '// for an available channel; when it finds one, somehow signal
    '// the first thread in the queue that it's got a channel and here's the id
    Public Shared Sub ChannelLoop()
        While True
            If WaitQueue.Count > 0 Then
                If thereIsAChannelAvailable Then '// I can figure this out on my own
                    Dim t As Thread = CType(WaitQueue.Dequeue(), Thread)
                    lockTheChannel(theAvailableChannelNumber) '// performed by me
                    '// signal the thread, passing it the channel number
                    t => SignalReady(theAvailableChannelNumber) '// how to signal?
                End If
            End If
        End While
    End Sub
End Class
and then
'// this inside the function that is doing the processing:
ChannelHandler.WaitForChannel(CurrentThread)
While waitingForSignal '// how?
    Block() '// how?
End While
Dim channelNumber As Integer = getChannelNumberThatWasSignaledBack() '// how do I get this back?
'// perform processing with channelNumber
ChannelHandler.ReleaseChannel(channelNumber)
I am working with the .NET Framework 3.5 in VB.NET. I am sure there has got to be some sort of mechanism already built for this, but as I said I have no idea exactly what keywords I should be searching for. Any input pointing me in the right direction (ie specific .NET framework classes to use or code samples) would be greatly appreciated. If I need to elaborate on anything, please let me know and I will to the best of my ability.
Edit: The other problem that I have is that these channels can be turned on/off from outside of this application, manually by the user (or as a result of a user-initiated event). I am not concerned with a channel being shut down while a thread is using it (it would throw an exception and then pick back up the next time it came through). But the issue is that there is not a constant number of threads fighting over a constant number of channels (if a user turns one on manually, the available count is reduced, etc.). Both numbers are variable, so I can't rely on there being no external forces (i.e. something outside this set of threads, which is why I do some processing via my DB to determine an available channel number).
What I would do:
Replace the System.Threading.Thread class with the System.Threading.Tasks.Task class.
If a new Task needs to be created, but the List(Of Task) (or, in your example, Queue(Of Task)) count is greater than the maximum permitted, use the Task.WaitAny method.
EDIT:
As I answered the previous block on my phone (which is pretty challenging for writing code), let me now write an example of how I would do it:
Imports System.Threading.Tasks
Imports System.Collections.Generic
Public Class Sample
    Private Const MAXIMUM_PERMITTED As Integer = 3
    Private _waitQueue As New Queue(Of Task)

    Public Sub AssignChannel()
        Static queueManagerCreated As Boolean
        If Not queueManagerCreated Then
            Task.Factory.StartNew(Sub() ManageQueue())
            queueManagerCreated = True
        End If
        Dim newTask As New Task(Sub()
                                    ' Connect to 3rd party software
                                End Sub)
        SyncLock _waitQueue
            _waitQueue.Enqueue(newTask)
        End SyncLock
    End Sub

    Private Sub ManageQueue()
        Dim tasksRunning As New List(Of Task)
        While True
            If _waitQueue.Count <= 0 Then
                Threading.Thread.Sleep(10)
                Continue While
            End If
            ' Once the permitted number of tasks is running, wait for
            ' one of them to finish before starting the next.
            If tasksRunning.Count >= MAXIMUM_PERMITTED Then
                Dim endedTaskPos As Integer = Task.WaitAny(tasksRunning.ToArray())
                If endedTaskPos > -1 AndAlso
                   endedTaskPos < tasksRunning.Count Then
                    tasksRunning.RemoveAt(endedTaskPos)
                Else
                    Continue While
                End If
            End If
            Dim taskToStart As Task
            SyncLock _waitQueue
                taskToStart = _waitQueue.Dequeue()
            End SyncLock
            tasksRunning.Add(taskToStart)
            taskToStart.Start()
        End While
    End Sub
End Class
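Calling it could look roughly like the snippet below; this is purely illustrative, and in a real application you would keep a single Sample instance alive for the lifetime of the process and call AssignChannel once per incoming request:
' Illustrative usage only.
Dim channels As New Sample()
For i As Integer = 1 To 10
    channels.AssignChannel() ' queued; at most MAXIMUM_PERMITTED connect at once
Next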

Should I double check before and after locking a list?

I have an in-application service which allows me to feed it messages from various sources, which are put into a simple list. The service, running in its own thread, periodically processes all messages in the list into various files, one file for each source, which are then managed for size.
My question is about the proper way to check for messages and performing a lock around the code which accesses the list. There are only two places which access the list; one is where a message is added to the list and the other is where the messages are dumped from the list into a processing list.
Adding a message to the list:
Public Sub WriteMessage(ByVal messageProvider As IEventLogMessageProvider, ByVal logLevel As EventLogLevel, ByVal message As String)
SyncLock _SyncLockObject
_LogMessages.Add(New EventLogMessage(messageProvider, logLevel, Now, message))
End SyncLock
End Sub
Processing the list:
Dim localList As New List(Of EventLogMessage)
SyncLock _SyncLockObject
If (_LogMessages.Count > 0) Then
localList.AddRange(_LogMessages)
_LogMessages.Clear()
End If
End SyncLock
' process list into files...
My questions are: should I do a double check when I am processing the list, as shown below? And why, or why not? Are there any dangers in accessing the list's Count property outside of the lock? Is either of the methods better or more efficient? And why, or why not?
Dim localList As New List(Of EventLogMessage)
If (_LogMessages.Count > 0) Then
SyncLock _SyncLockObject
If (_LogMessages.Count > 0) Then
localList.AddRange(_LogMessages)
_LogMessages.Clear()
End If
End SyncLock
End If
' process list into files...
I understand that in this particular case, it may not matter if I do a double check given the fact that, outside of the processing function, the list can only grow. But this is my working example and I’m trying to learn about the finer details of threading.
Thank you in advance for any insights…
After some further research, thank you 'the coon', and some experimental programming, I have some further thoughts.
Concerning the ReaderWriterLockSlim, I have the following example which seems to work fine. It allows me to read the number of messages in the list without interfering with other code which may be trying to read the number of messages in the list, or the messages themselves. And when I desire to process the list, I can upgrade my lock to write mode, dump the messages into a processing list and process them outside of any read/write locks, thus not blocking any other threads which may want to add, or read, more messages.
Please note, that this example uses a simpler construct for the message, a String, as opposed to the previous example which used a Type along with some other metadata.
Private _ReadWriteLock As New Threading.ReaderWriterLockSlim()
Private Sub Process()
' create local processing list
Dim processList As New List(Of String)
Try
' enter read lock mode
_ReadWriteLock.EnterUpgradeableReadLock()
' if there are any messages in the 'global' list
' then dump them into the local processing list
If (_Messages.Count > 0) Then
Try
' upgrade to a write lock to prevent others from writing to
' the 'global' list while this reads and clears the 'global' list
_ReadWriteLock.EnterWriteLock()
processList.AddRange(_Messages)
_Messages.Clear()
Finally
' alway release the write lock
_ReadWriteLock.ExitWriteLock()
End Try
End If
Finally
' always release the read lock
_ReadWriteLock.ExitUpgradeableReadLock()
End Try
' if any messages were dumped into the local processing list, process them
If (processList.Count > 0) Then
ProcessMessages(processList)
End If
End Sub
Private Sub AddMessage(ByVal message As String)
Try
' enter write lock mode
_ReadWriteLock.EnterWriteLock()
_Messages.Add(message)
Finally
' always release the write lock
_ReadWriteLock.ExitWriteLock()
End Try
End Sub
The only problem I see with this technique is that the developer must be diligent about acquiring and releasing the locks. Otherwise, deadlocks will occur.
As to whether this is more efficient than using a SyncLock, I really could not say. For this particular example and its usage, I believe either would suffice. I would not do the double check for the very reasons ‘the coon’ gave about reading the count while someone else is changing it. Given this example, the SyncLock would provide the same functionality. However, in a slightly more complex system, one where multiple sources might read and write to the list, the ReaderWriterLockSlim would be ideal.
Concerning the BlockingCollection list, the following example works like the one above.
Private _Messages As New System.Collections.Concurrent.BlockingCollection(Of String)

Private Sub Process()
    ' Drain whatever is currently in the collection; TryTake returns False
    ' (instead of blocking) once it is empty.
    Dim item As String = Nothing
    While _Messages.TryTake(item)
        ProcessMessage(item)
    End While
End Sub

Private Sub AddMessage(ByVal message As String)
    ' add a message to the 'global' list
    _Messages.Add(message)
End Sub
Simplicity itself…
Theory:
Once a thread acquires the _SyncLockObject lock, all other threads entering that method have to wait for the lock to be released.
So the extra check before the lock is redundant; it has no effect. It is also not safe, because you're not using a concurrent list.
If one thread happens to check the Count in the first, unlocked test while another thread is clearing or adding to the collection, you can read a stale value, and if any unlocked code enumerates the list while it is being modified you'll get the exception "Collection was modified; enumeration operation may not execute." Also, the second check can only be executed by one thread at a time (since it's synced).
This applies for your Add method as well. While the lock is owned by one thread (meaning the execution flow has reached that line), no other threads will be able to process or add to the list.
You should be careful to also lock if you are just reading from the list in some other places in your application. For more complex read/write scenarios (such as a custom concurrent collection), I recommend using ReaderWriterLockSlim.
Practice:
Use a BlockingCollection, since it is thread safe (i.e. it handles concurrency internally).
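As a minimal sketch of the shape that advice points at (the LogPump name is made up here): producers simply call Add, while a single consumer drains the collection with GetConsumingEnumerable, which blocks while the collection is empty and ends once CompleteAdding is called.
Imports System.Collections.Concurrent
Imports System.Threading.Tasks

Module LogPump
    Private ReadOnly _messages As New BlockingCollection(Of String)()

    Sub Main()
        ' One consumer: blocks while the collection is empty, wakes on Add.
        Dim consumer = Task.Factory.StartNew(
            Sub()
                For Each msg In _messages.GetConsumingEnumerable()
                    Console.WriteLine(msg) ' write to file, etc.
                Next
            End Sub)

        ' Producers just Add; no explicit locking needed.
        _messages.Add("first message")
        _messages.Add("second message")

        ' Signal that no more items are coming, then wait for the consumer to drain.
        _messages.CompleteAdding()
        consumer.Wait()
    End Sub
End Module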

Code takes much longer to execute on a separate thread in .NET

In my VB.NET program there is a time-consuming function that gets data and updates the UI at a periodic interval. I moved this function to another thread, but it now takes much longer to execute. Using the Stopwatch class, I calculated that when it is part of the main thread, it takes 130 ms, but in the separate thread it takes 542 ms, so that's more than 4 times slower.
My CPU is a Core i5 M520 (2 cores), so I don't know why it is taking so much longer.
I am using the System.Threading.Thread class. I also tried to set the new thread's priority higher, but this had no effect.
Why is the separate thread taking so much longer and is there a way I can speed it up?
Thanks
The code:
Public Sub update(ByVal temp As Visual)
    SyncLock mUpdateQueue
        If Not mUpdateQueue.Contains(temp) Then
            mUpdateQueue.Enqueue(temp)
        End If
    End SyncLock
    If Not mainThread.IsAlive Then ' need to do this better
        mainThread = New Thread(AddressOf DataFetchThread)
        mainThread.Start()
    End If
End Sub

Private Sub DataFetchThread()
    Dim s As New Stopwatch()
    s.Start()
    Dim temp As Visual = Nothing
    While mUpdateQueue.Count > 0
        SyncLock mUpdateQueue
            temp = mUpdateQueue.Peek()
        End SyncLock
        mDataCollector.updateV(temp)
        SyncLock mUpdateQueue
            mUpdateQueue.Dequeue()
        End SyncLock
    End While
    s.Stop()
    Debug.WriteLine("thread run time: " & s.ElapsedMilliseconds)
End Sub
mDataCollector.updateV(temp): This function gets data from a database and plots the points on a PictureBox to create a graph. It wouldn't make a lot of sense to add all of the code here.
To ask this question in another way: Is it normal that the second thread takes much longer to execute or is there something wrong with my code?
You are accessing the mUpdateQueue variable from multiple threads and using locks to guard access to it. This is fine, but using locks has an overhead (to acquire the lock, and during the time that the other threads wait to acquire it). This is probably why your new thread is taking longer: it is waiting on the locking.
You could try using the ReaderWriterLockSlim class which may provide faster access to your variables. Just remember that it implements IDisposable so you need to call Dispose on it when you're done with it.
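As a rough sketch only (whether it actually beats SyncLock for this workload would need measuring), the queue access could be wrapped in a small class that takes a read lock for cheap checks, a write lock for Enqueue/Dequeue, and disposes the lock with the object. The GuardedQueue name and the use of Object in place of your Visual type are placeholders:
Imports System.Collections.Generic
Imports System.Threading

Public Class GuardedQueue
    Implements IDisposable

    Private ReadOnly _lock As New ReaderWriterLockSlim()
    Private ReadOnly _queue As New Queue(Of Object)()

    Public ReadOnly Property Count As Integer
        Get
            ' Multiple readers may check the count at the same time.
            _lock.EnterReadLock()
            Try
                Return _queue.Count
            Finally
                _lock.ExitReadLock()
            End Try
        End Get
    End Property

    Public Sub Enqueue(ByVal item As Object)
        _lock.EnterWriteLock()
        Try
            _queue.Enqueue(item)
        Finally
            _lock.ExitWriteLock()
        End Try
    End Sub

    Public Function TryDequeue(ByRef item As Object) As Boolean
        ' Dequeue mutates the queue, so it needs the write lock too.
        _lock.EnterWriteLock()
        Try
            If _queue.Count = 0 Then Return False
            item = _queue.Dequeue()
            Return True
        Finally
            _lock.ExitWriteLock()
        End Try
    End Function

    Public Sub Dispose() Implements IDisposable.Dispose
        ' ReaderWriterLockSlim holds OS resources, so release them.
        _lock.Dispose()
    End Sub
End Class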

Dividing work into multiple threads

I've read a lot of different questions on SO about multithreaded applications and how to split work up between them, but none really seem to fit what I need for this. Here's how my program currently basically works:
Module Module1
    ' string X declared out here
    Sub Main()
        ' Start given number of threads of Main2()
    End Sub

    Sub Main2()
        ' Loops forever
        ' Call X = nextvalue(X), display info as needed
    End Sub

    Function nextvalue(ByVal Y As String)
        ' Determines the next Y in the sequence
    End Function
End Module
This is only a rough outline of what actually happens in my code by the way.
My problem is that if multiple threads start running Main2(), they are all dealing with the same X value. The loop inside Main2 executes multiple times per millisecond, so I can't just stagger the loops. There is often duplication of work.
How can I properly divide up the work so that the two threads running simultaneously never have the same work to run?
You should synchronize the generation and storage of X so that the composite operation appears atomic to all threads.
Module Module1
Private X As String
Private LockObj As Object = New Object()
Private Sub Main2()
Do While True
' This will be used to store a snapshot of X that can be used safely by the current thread.
Dim copy As String
' Generate and store the next value atomically.
SyncLock LockObj
X = nextValue(X)
copy = X
End SyncLock
' Now you can perform operations against the local copy.
' Do not access X outside of the lock above.
Console.WriteLine(copy)
Loop
End Sub
End Module
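For completeness, a Main that fans out the worker threads could look something like the sketch below, added to the same module; the thread count of 4 is arbitrary, and Console.ReadLine is only there to keep the demo process alive:
Sub Main()
    ' Start a fixed number of worker threads, all running Main2.
    Dim workers As New List(Of System.Threading.Thread)
    For i As Integer = 1 To 4
        Dim t As New System.Threading.Thread(AddressOf Main2)
        t.IsBackground = True ' workers die with the application
        t.Start()
        workers.Add(t)
    Next
    Console.ReadLine()
End Sub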
A thread manager is required to manage the threads and the work that they do. Say it is desirable to split up the work into 10 threads.
Start the manager
Manager creates 10 threads
Assign work to the manager (queue up the work, let's say it queues up 10000 work items)
Manager assigns a work item to complete for each of the 10 threads.
As threads finish their work, they report back to the manager that they are done and receive another work item. The queue of work should be thread safe so that items can be enqueued and dequeued. The manager handles the management of work items. The threads just execute the work.
Once this is in place, work items should never be duplicated amongst threads.
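A bare-bones sketch of that manager shape, using a plain Queue guarded by SyncLock so it also works on older framework versions (all names here are illustrative):
Imports System.Collections.Generic
Imports System.Threading

Module WorkManager
    ' A shared queue of work items; access is always under SyncLock.
    Private ReadOnly _workItems As New Queue(Of String)
    Private ReadOnly _queueLock As New Object()

    Sub Main()
        ' Queue up the work items (10,000 in the description above).
        SyncLock _queueLock
            For i As Integer = 1 To 10000
                _workItems.Enqueue("item " & i)
            Next
        End SyncLock

        ' Create the worker threads; each one pulls items until the queue is empty.
        Dim workers As New List(Of Thread)
        For i As Integer = 1 To 10
            Dim t As New Thread(AddressOf Worker)
            t.Start()
            workers.Add(t)
        Next
        For Each t In workers
            t.Join()
        Next
    End Sub

    Private Sub Worker()
        Do
            Dim item As String = Nothing
            SyncLock _queueLock
                If _workItems.Count = 0 Then Exit Do
                item = _workItems.Dequeue()
            End SyncLock
            ' Because every Dequeue happens inside the lock,
            ' each work item is handed to exactly one thread.
            Console.WriteLine(Thread.CurrentThread.ManagedThreadId & " processed " & item)
        Loop
    End Sub
End Module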
Use a lock so that only one thread can access X at a time. Once one thread is done with it, another thread is able to use it. This will prevent two threads from calling nextvalue(x) with the same value.

How to limit CPU usage in a while loop

How do you limit the CPU usage of a while loop?
In this case, the code which is inside the while loop:
Private Sub wait(ByVal time)
Dim sw As New Stopwatch
sw.Start()
Do While sw.ElapsedMilliseconds < time And StillOpen = True
Application.DoEvents()
Loop
sw.Stop()
End Sub
But now, here is the issue. This wait sub allows the main while loop to run once a second, which is the delay I want.
How can I limit the CPU this is taking up? For some reason, Task Manager says this simple task is using 50% of the CPU, yet it should probably take no more than 1 or 2 percent. Even though Task Manager reports that much CPU usage, my computer's speed is not being affected at all, which is odd considering it is a two-year-old laptop.
I don't want any users to freak out about it, but knowing how people are these days....
Anyway, the language is vb.net. Can someone please help me?
Thanks!
EDIT: To clarify, that code is not inside the while loop itself, but a call for the subroutine is, i.e. wait(1000)
Use a timer event! Nearly no CPU effort.
You could always perform some kind of sleep between iterations of the loop...
I'm not familiar with VB.NET, but a sleep of 100-200 ms will probably be more than enough to drop the CPU usage.
E.g.:
Do While (...)
    Application.DoEvents()
    System.Threading.Thread.Sleep(150)
Loop
Edit: After some research, I think the function you want is System.Threading.Thread.Sleep()
Your code is executing Application.DoEvents() constantly in the while loop, for the time duration specified in your time parameter. This will consume one core of your CPU, which is why you're seeing 50% processor usage (you have a dual-core processor, correct?). This is an ugly way to wait. You could instead call Thread.Sleep(), passing it the number of milliseconds you'd like your thread to wait.
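For example, one way to rework the original wait routine along those lines is to sleep briefly between checks; a 10-20 ms sleep keeps second-level timing accurate while dropping CPU usage to near zero. This is just a sketch of the idea, reusing the StillOpen flag from the question:
Private Sub wait(ByVal time As Integer)
    Dim sw As New Stopwatch()
    sw.Start()
    Do While sw.ElapsedMilliseconds < time AndAlso StillOpen
        Application.DoEvents()                ' keep the UI responsive
        System.Threading.Thread.Sleep(15)     ' yield the CPU instead of spinning
    Loop
    sw.Stop()
End Sub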
If you'd like your application to stay responsive, you might also spin off a timer, and block the UI from any action until the timer triggers. Something like (lightly tested):
// constructor or designer code
System.Windows.Forms.Timer timer = new System.Windows.Forms.Timer();
timer.Tick += new EventHandler(timer_Tick);
void Wait(int interval)
{
timer.Interval = interval;
timer.Start();
BlockUIOperations(); // implement yourself
}
void timer_Tick(object sender, EventArgs e)
{
timer.Stop();
EnableUIOperations(); // implement yourself
}
Here's my attempt at a translation into VB:
'' Add a Timer object to the form named "Timer".
'' Hook its Tick event to Timer_Tick
Private Sub Wait(ByVal interval As Integer)
Timer.Interval = interval
Timer.Start()
BlockUIOperations() '' implement yourself
End Sub
Private Sub Timer_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer.Tick
Timer.Stop()
EnableUIOperations() '' implement yourself
End Sub
Well, a CPU core always runs at full speed while it is executing your code, so the only practical way to limit CPU usage is to run in bursts and sleep in between.
Laptop CPUs usually have some SpeedStep technology or equivalent that will slow down the CPU when it's not working hard, but it's not reasonable to assume that your application would have access to control that, at least not directly. You might be able to affect it indirectly by measuring the CPU usage and adjusting the length of the work and sleep cycles to get the desired result.
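A crude sketch of that work/sleep duty cycle might look like the following; the 50 ms work slice and 50 ms sleep (roughly a 50% cap on one core) are arbitrary numbers, and DoOneUnitOfWork is a placeholder:
Imports System.Diagnostics
Imports System.Threading

Module Throttle
    Sub Main()
        ' Runs forever: work for about 50 ms, then sleep for 50 ms.
        Do
            Dim slice As Stopwatch = Stopwatch.StartNew()
            Do While slice.ElapsedMilliseconds < 50
                DoOneUnitOfWork()
            Loop
            Thread.Sleep(50) ' give the core back for the other half of the cycle
        Loop
    End Sub

    Private Sub DoOneUnitOfWork()
        ' Placeholder for one small piece of the real work.
    End Sub
End Module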
If you don't mind blocking the current thread, you could use a WaitHandle.
Public Sub Wait(ByVal ms As Integer)
Using wh As New ManualResetEvent(False)
wh.WaitOne(ms)
End Using
End Sub
Sub Main()
Console.WriteLine("Hello World!")
Wait(5000)
Console.WriteLine("Good-Bye!")
End Sub
Of course, something more complex can be constructed depending on what you are trying to accomplish.
This is perfect as a VB.net sleep replacement. Now my console app is NOT reported as non responsive since I have no sleep commands!
Just add Imports System.Threading above your module and place this just above your sub main
Public Sub Wait(ByVal ms As Integer)
Using wh As New ManualResetEvent(False)
wh.WaitOne(ms)
End Using
End Sub
Then, in your sub main, use
wait(100)
to pause your app for 100 milliseconds.
Have fun
You should take note of whether you are doing this on the main UI thread or on a thread you have spun off.
For threads, the easiest way is to just Thread.Sleep(x milliseconds).
On the main UI thread I tend to use a DoEvents function, in VB.NET and VB6 alike, like this:
Public Sub TimeKiller(ByVal secondstowait As Integer)
    Dim tmptime As DateTime = DateTime.Now
    Do While DateTime.Now < DateAdd("s", secondstowait, tmptime)
        Application.DoEvents()
    Loop
End Sub
On the question of CPU usage, I look at it like this: if you make a hard loop like
While True
End While
I would expect to see very high CPU usage, over 50%, because the UI thread is hard-blocking on this. In most cases Windows will limit the CPU usage of any given program so that its threads don't block the entire system.
The DoEvents call ensures that the Windows message pump fires correctly and responds correctly. It also ensures that the garbage collector fires on time.
Also, if you have other threads spun up off of your UI thread, your UI thread can respond to events fired from those other threads.
In such cases, where you're calling form controls from other threads and going through Form.InvokeRequired routines, the UI will be able to respond correctly.
Also, the only time you should be hard-looping on the main UI thread is in response to some user activity, when you need to put waits in so the user can see the progress of something.
If it is some kind of automated process that is always running, look at moving it to another thread.
Or, if it's something that runs periodically, use a timer, or a timer that kicks off a thread.
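For the "runs periodically" case, a background thread that sleeps between passes is usually enough; here is a sketch, assuming VB 2010 or later for the lambda syntax, with a one-second interval chosen arbitrarily:
Imports System.Threading

Module PeriodicWorker
    Sub Main()
        Dim worker As New Thread(
            Sub()
                Do
                    ' Do the periodic work off the UI thread here...
                    Thread.Sleep(1000) ' ...then sleep until the next pass.
                Loop
            End Sub)
        worker.IsBackground = True ' the thread dies with the application
        worker.Start()
        Console.ReadLine() ' keep the demo process alive
    End Sub
End Module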
Somebody please tell me if I am wrong on these assumptions....
Not sure about the Using wh As New ManualResetEvent(False) ... wh.WaitOne(ms) approach, as I have never heard of that and have no idea what it does.