Threadpool to Task Conversion - vb.net

My current solution uses the ThreadPool to process transactions. Every couple of minutes I grab 1-200 transactions and queue each one via the QueueUserWorkItem function. Something like this, where 'trans' is my collection of transactions:
For Each t As ManagerTransaction In trans
    Threading.ThreadPool.QueueUserWorkItem(AddressOf ProcessManagerTransaction, t)
Next
I want to switch it over to use the TPL; however, after much research I am still unsure of the best way to go about it. I have the following options, yet I haven't been able to find a general consensus on what the best practice is.
1) Threading.Tasks.Parallel.ForEach(trans, AddressOf ProcessManagerTransaction)
2) Task.Factory.StartNew(AddressOf ProcessManagerTransaction, t)
2a) Task.Factory.StartNew(Sub() ProcessManagerTransaction(t))
where "t" is an individual transaction in my "trans" collection.
And this combination of the two:
3) Task.Factory.StartNew(Function() Parallel.ForEach(trans, AddressOf ProcessManagerTransaction))

The first option is generally preferable because it does everything you want: parallelization and propagation of errors. Options 2 and 3 require additional means to propagate errors.
Option 2 might come into play if you need actual Task objects so you can compose them with other tasks.
I do not really see a case where I would use option 3.
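To illustrate the error-propagation point: Parallel.ForEach blocks until every iteration has finished and gathers any worker exceptions into a single AggregateException at the call site. A minimal sketch, reusing ProcessManagerTransaction and trans from the question (the logging line is just a stand-in):
Try
    Threading.Tasks.Parallel.ForEach(trans, AddressOf ProcessManagerTransaction)
Catch ex As AggregateException
    'Each exception thrown by a worker lands here instead of dying on a pool thread.
    For Each inner As Exception In ex.InnerExceptions
        Console.WriteLine(inner.Message) 'stand-in for real logging
    Next
End Try
With option 2 you would instead have to keep every Task you start and call Task.WaitAll (which throws the same AggregateException) or inspect each Task.Exception yourself.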

Reactor - Stop source when first empty

I have a requirement like this.
Flux<Integer> s1 = .....;
s1.flatMap(value -> anotherSource.find(value));
I need a way to stop s1 when anotherSource.find first returns empty. How do I do that?
Note:
One possible solution is to throw an error and then catch it to stop:
anotherSource.find(value).switchIfEmpty(Mono.error(..))
I am looking for a better solution than this.
You won't find a specific operator for this; you'll have to combine operators to achieve it. (Note that this doesn't make it a "hack" per se; reactive frameworks are generally intended to be used in a way where you combine basic operators together to achieve your use case.)
I would agree that using an error to achieve this is far from ideal though, as it potentially disrupts the flow of real errors in the reactive chain, so that should really be a last resort.
The approach I've generally taken in cases where I want the stream to stop based on an inner publisher is to materialise the inner stream, filter out the onComplete() signals and then re-add the onComplete() wherever appropriate (in this case, if it's empty.) You can then dematerialise the outer stream and it'll respond to the completed signal wherever you've injected it, stopping the stream:
s1.flatMap(value ->
        anotherSource
            .find(value)
            .materialize()
            .filter(s -> !s.isOnComplete())
            .defaultIfEmpty(Signal.complete()))
    .dematerialize()
This has the advantage of preserving any error signals, while also not requiring another object or special value.

Killing a BPMN subprocess with an external throw event

I have a usage issue I need some advice on.
I have a process with a main flow which loops, retrying a task every n hours until either a condition is met or a timeout is reached. So far so good.
There is a transactional sub process triggered to run in parallel to this main loop which, for as long as this main loop is active, carries out its own looping behaviour (every x days). This second loop should run for as long as the main loop continues, and be killed as soon as the main loop reaches one of its progression criteria.
The way I'd like to model it would be to use a message/signal throw event from the main flow after it has passed its progress criteria, with a corresponding catch message/signal as a boundary event on the sub process, which then triggers a sub process end/terminate event inside the boundaries of the sub process.
I've looked long and hard at resources and the standard, and I can't see any examples of people using boundary events in this way (as an input from outside the sub process, leading to an end event inside the sub process). Any idea if this is valid?
If not valid, anyone have a better method for having a main flow kill a sub process in this way?
Main Process: Start, parallel gateway (fork), first branch contains subprocess 1, second branch contains subprocess 2, exclusive gateway (join), end.
Subprocess 1: Start, loop, exit from the loop under some condition, then end.
Subprocess 2: Start, loop, no end node.
This way, subprocess 2 can't end the looping on its own. But subprocess 1 can end, and through the exclusive join gateway, subprocess 2 will be ended as well.
I'm not quite sure whether a parallel fork, followed by an exclusive join, is actually allowed formally in BPMN. But some tools can handle it, and I received this hint from a tool vendor (Bonita).

How can I use multi-threading (Parallel ForEach), and batch? Or should I handle it differently?

I'm creating an app to send out bulk emails, but I'm lacking an understanding of batching, multithreading, and the best way to handle this.
Say I do something like this:
Dim options As New ParallelOptions
options.MaxDegreeOfParallelism = Environment.ProcessorCount * 10
Parallel.ForEach(recipients.AsEnumerable(), options,
                 Sub(row)
                     SendEmail(row)
                 End Sub)
What if there is a large number of emails? I was thinking of adding a batch option, but I'm not sure whether it's beneficial. If I have 500,000 subscribers, could this parallel loop become unstable? Should there be a pause or sleep at some point? With email I think "Batch = #", but I'm unsure how to carry that concept over to multithreading like this, or whether I should be doing something entirely different.
I've also heard of senders getting blacklisted when this isn't handled correctly.
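As a sketch of how "Batch = #" could carry over to this loop: process the recipients in fixed-size batches, cap the parallelism inside each batch, and pause between batches so the send rate stays within whatever your mail server or provider tolerates (uncontrolled send rates are the usual blacklisting trigger). The batch size, pause, and DataRow element type below are assumptions, not tested values:
Const BatchSize As Integer = 500
Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = Environment.ProcessorCount}
Dim batch As New List(Of DataRow)
For Each row As DataRow In recipients.AsEnumerable()
    batch.Add(row)
    If batch.Count = BatchSize Then
        Parallel.ForEach(batch, options, Sub(r) SendEmail(r))
        batch.Clear()
        Threading.Thread.Sleep(TimeSpan.FromSeconds(5)) 'throttle between batches
    End If
Next
If batch.Count > 0 Then
    Parallel.ForEach(batch, options, Sub(r) SendEmail(r))
End If
The right batch size and pause come from your SMTP provider's rate limits rather than from CPU count, so treat them as knobs to negotiate, not constants to copy.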

How to go about testing goroutines?

An example of this problem is when a user creates a resource and deletes a resource. We will perform the operation and also increment (or decrement) a counter cache.
In testing, there is sometimes a race condition where the counter cache has not yet been updated by the goroutine.
EDIT: Sorry about the confusion. To clarify: the counter cache is not in memory; it is actually a field in the database. The race condition is not on a variable in memory; it is that the goroutine might be slow to write to the database itself!
I currently use a 1-second sleep after the operation to ensure that the counter cache has been updated before testing it. Is there another way to test a goroutine without the arbitrary 1-second sleep waiting for it to finish?
Cheers
In testing, there is sometimes a race condition where the counter cache has not yet been updated by the goroutine. I currently use a 1-second sleep after the operation to ensure that the counter cache has been updated before testing it.
Yikes, I hate to say it, but you're doing it wrong. Go has first-class features to make concurrency easy! If you use them correctly, it's impossible to have race conditions.
In fact, there's a tool, Go's race detector (go test -race), that will detect races for you. I'll bet it complains about your program.
One simple solution:
Have the main routine create a goroutine for keeping track of the counter.
The goroutine will just do a select and get a message to increment/decrement or read the counter. (If reading, it will be passed a channel on which to return the number.)
When you create/delete resources, send an appropriate message to the counter goroutine via its channel.
When you want to read the counter, send a read message, and then read from the reply channel.
(Another alternative would be to use locks. It would be a tiny bit more performant, but much more cumbersome to write and ensure it's correct.)
One solution is to let your counter offer a channel which is updated as soon as the value changes. In Go it is common practice to synchronize by communicating the result. For example, your Counter could look like this:
type Counter struct {
    value       int
    ValueChange chan int
}

func (c *Counter) Change(n int) {
    c.value += n
    c.ValueChange <- c.value
}
Whenever Change is called, the new value is passed through the channel, and whoever is waiting for the value unblocks and continues execution, therefore synchronizing with the counter. With this code you can listen on ValueChange for changes like this:
v := <-c.ValueChange
Concurrently calling c.Change is no problem anymore.
There is a runnable example on the Go Playground.

VB.NET, best practice for sorting concurrent results of threadpooling?

In short, I'm trying to "sort" incoming results of using threadpooling as they finish. I have a functional solution, but there's no way in the world it's the best way to do it (it's prone to huge pauses). So here I am! I'll try to hit the bullet points of what's going on/what needs to happen and then post my current solution.
The intent of the code is to get information about files in a directory, and then write that to a text file.
I have a list (Counter.ListOfFiles) of the file paths, sorted in a particular way. This is the guide that dictates the order in which I need to write to the text file.
I'm using a threadpool to collect the information about each file and build a StringBuilder with all of the text ready to write to the text file. I then call a procedure (SyncUpdateResource, included below), passing the StringBuilder (strBld) from that thread along with the path of the file that particular thread just described (Xref).
The procedure includes a SyncLock to hold all the other threads until it finds a thread passing the correct information. The "correct" information is when the Xref passed by the thread matches the first item in my list (FirstListitem). When that happens, I write to the text file, delete the first item in the list, and do it again with the next thread.
The way I'm using the monitor is probably not great; in fact, I have little doubt I'm using it in an offensively wanton manner. Basically, while the Xref (from the thread) <> the first item in my list, I'm doing a Monitor.PulseAll. I originally used Monitor.Wait, but it would eventually just give up trying to sort through the list, even when using a pulse elsewhere. I may have just been coding something awkwardly. Either way, I don't think it's going to change anything.
Basically the problem comes down to the fact that the monitor pulses through all of the items in its queue, and there's a good chance the item I'm looking for was passed over earlier in the queue, so it sorts through all of the items again before looping back around to find one that matches. The result is that my code will hit one of these and take a huge amount of time to complete.
I'm open to believing I'm just using the wrong tool for the job, or just not using tool I have correctly. I would strongly prefer some sort of threaded solution (unsurprisingly, it's much faster!). I've been messing around a bit with the Parallel Task functionality today, and a lot of the stuff looks promising, but I have even less experience with that vs. threadpool, and you can see how I'm abusing that! Maybe something with queue? You get the idea. I am directionless. Anything someone could suggest would be much appreciated. Thanks! Let me know if you need any additional information.
Private Sub SyncUpdateResource(strBld As Object, Xref As String)
    SyncLock (CType(strBld, StringBuilder))
        Dim FirstListitem As String = Counter.ListOfFiles.First
        Do While Xref <> FirstListitem
            FirstListitem = Counter.ListOfFiles.First
            'This makes the code much faster for reasons I can only guess at.
            Thread.Sleep(5)
            Monitor.PulseAll(CType(strBld, StringBuilder))
        Loop
        Dim strVol As String = Form1.Volname
        Dim strLFPPath As String = Form1.txtPathDir
        My.Computer.FileSystem.WriteAllText(strLFPPath & "\" & strVol & ".txt", strBld.ToString, True)
        Counter.ListOfFiles.Remove(Xref)
    End SyncLock
End Sub
This is a pretty typical multiple producer, single consumer application. The only wrinkle is that you have to order the results before they're written to the output. That's not difficult to do. So let's let that requirement slide for a moment.
The easiest way in .NET to implement a producer/consumer relationship is with BlockingCollection, which is a thread-safe FIFO queue.
In your case, the producer threads get items, do whatever processing they need to, and then put the item onto the queue. There's no need for any explicit synchronization; the BlockingCollection class implementation does that for you.
Your consumer pulls things from the queue and outputs them. You can see a really simple example of this in my article Simple Multithreading, Part 2. (Scroll down to the third example if you're just interested in the code.) That example just uses one producer and one consumer, but you can have N producers if you want.
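In VB.NET the skeleton might look something like this. It is only a sketch: WorkItem (a sequence key plus the built text) and BuildFileText are illustrative names, not anything from the question.
'Requires Imports System.Collections.Concurrent, System.Text, System.Threading.Tasks.
Dim sharedQueue As New BlockingCollection(Of WorkItem)
Dim producers As New List(Of Task)
For i As Integer = 0 To Counter.ListOfFiles.Count - 1
    Dim index As Integer = i 'per-iteration copy for the lambda
    producers.Add(Task.Run(Sub()
                               'Collect the file information and build its text block (hypothetical helper).
                               Dim text As StringBuilder = BuildFileText(Counter.ListOfFiles(index))
                               sharedQueue.Add(New WorkItem With {.SequenceKey = index, .Text = text})
                           End Sub))
Next
'When every producer has finished, tell the consumer no more items are coming.
Task.WhenAll(producers).ContinueWith(Sub(t) sharedQueue.CompleteAdding())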
Your requirements have a little wrinkle in that the consumer can't just write items to the file as it gets them. It has to make sure that it's writing them in the proper order. As I said, that's not too difficult to do.
What you want is a priority queue of some sort onto which you can place an item if it comes in out of order. Your priority queue can be a sorted list or even just a sequential list if the number of items you expect to get out of order isn't very large. That is, if you typically have only a half dozen items at a time that could be out of order, then a sequential list could work just fine.
I'd use a heap because it performs well. The .NET Framework doesn't supply a heap, but I have a simple one that works well for jobs like this. See A Generic BinaryHeap Class.
So here's how I'd write the consumer (the code is in pseudo-C#, but you can probably convert it easily enough).
The assumption here is that you have a BlockingCollection called sharedQueue that contains the items. The producers put items on that queue. Consumers do this:
var heap = new BinaryHeap<ItemType>();
int expectedSequenceKey = 0;  // assumes sequence keys start at 0
foreach (var item in sharedQueue.GetConsumingEnumerable())
{
    if (item.SequenceKey == expectedSequenceKey)
    {
        // output this item
        // then check the heap to see if other items need to be output
        expectedSequenceKey = expectedSequenceKey + 1;
        while (heap.Count > 0 && heap.Peek().SequenceKey == expectedSequenceKey)
        {
            var heapItem = heap.RemoveRoot();
            // output heapItem
            expectedSequenceKey = expectedSequenceKey + 1;
        }
    }
    else
    {
        // item is out of order
        // put it on the heap
        heap.Insert(item);
    }
}
// if the heap contains items after everything is processed,
// then some error occurred.
One glaring problem with this approach as written is that the heap could grow without bound if one of your producers crashes or goes into an infinite loop, since the sequence key it was responsible for never arrives. But then, your other approach probably would suffer from that as well. If you think that's an issue, you'll have to add some way to skip an item that you think won't ever be forthcoming. Or kill the program. Or something.
If you don't have a binary heap or don't want to use one, you can do the same thing with a SortedList<ItemType>. SortedList will be faster than List, but slower than BinaryHeap if the number of items in the list is even moderately large (a couple dozen). Fewer than that and it's probably a wash.
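If you would rather skip the heap entirely, a SortedDictionary can buffer the out-of-order items, since it keeps its keys sorted so the smallest pending sequence key is always first. A VB.NET sketch reusing the assumed sharedQueue and WorkItem names from the earlier sketch, with WriteItem standing in for the actual file write:
'Requires Imports System.Collections.Concurrent and System.Linq.
Dim pending As New SortedDictionary(Of Integer, WorkItem)
Dim expected As Integer = 0 'assumes sequence keys start at 0
For Each item As WorkItem In sharedQueue.GetConsumingEnumerable()
    If item.SequenceKey = expected Then
        WriteItem(item) 'hypothetical output helper
        expected += 1
        'Drain any buffered items that are now next in line.
        While pending.Count > 0 AndAlso pending.Keys.First() = expected
            WriteItem(pending(expected))
            pending.Remove(expected)
            expected += 1
        End While
    Else
        pending.Add(item.SequenceKey, item) 'arrived early: hold it
    End If
Next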
I know that's a lot of info. I'm happy to answer any questions you might have.