How would you create a cyclic task graph in TPL, and/or is this possible? - .net-4.0

My project has a requirement to gather data from a number of sources, then do things in response to the completion of the gathering of that data. Some of the gathering tasks have dependencies on prior gathering tasks. TPL has been a good fit because it naturally continues with tasks from their antecedents, and the "final" tasks that use the results are again dependents. Great. However, we would like to have a "sleep and regather" task that starts upon completion of the "final" tasks; this task's job is logically to be the antecedent of the "final" tasks and kick off the next cycle. In effect, the TPL's DAG becomes cyclic, or, if thought of sequentially, a loop.
Is it possible to express this cyclic requirement completely within the TPL API? If so, how? Our current implementation instead does a WaitAll() on the antecedents, and then a Task.StartNew() given a delegate that does a sleep followed by rebuilding a task graph with the WaitAll(). This works, but seems a bit artificial.

There are a few options here. What you are doing now seems reasonable.
However, you could potentially setup the entire operation as a producer/consumer scenario using BlockingCollection<T>. If your consuming enumerable used a ManualResetEvent that was set after the WaitAll completed, it could allow a single "item" to be consumed at a time, using tasks as you have it written now.
That being said, this seems like a perfect candidate for the TPL Dataflow library (in CTP).

Related

How best to handle errors and or missing data in a Neuraxle pipeline?

Let's assume you have a pipeline with steps that can fail for some input elements for example:
FetchSomeImagesFromIds -> Resize -> DoSomethingElse
In this case the 1st step downloads 10 out of a 100 images... and passes those to resize..
I'm looking for suggestions on how to report or handle this missing data at the pipeline level for example something like:
Pipeline.errors() -> PluginX: Succeed: 10, Failed: 90, Total: 100, Errors: key: error
My current implementation removes the missing keys from current_keys so that the key -> data mapping is kept and actually exits the whole program if there's anything missing.. given the previous problem with https://github.com/Neuraxio/Neuraxle/issues/418
Thoughts?
I think that using a Service in your pipeline would be the good way. Here's what I'd do if I think about it, although more solutions could exist:
Create your pipeline and pipeline steps.
Create a context and add to the context a custom memory bank service in which you can keep track of what data processed properly or not properly. Depending on your needs and broader context, it could be either a positive data bank, or negative one, in which you'd respectively either add the processed examples or substract them from the set.
Adapt the pipeline made at point 1, and its steps, such that it can use the service from the context in the handle_transform_data_container methods. You could even have a WhileTrue() step which would loop forever until a BreakIf() step would evaluate that everything has been processed for instance, if you want your pipeline to work until everything has been processed, and fetching the batches as they come without an end condition other than the BreakIf step and its lambda. The lambda would call the service indeed to know where the data processing is at.
At the end of the whole processing, wheter you breaked prematurely (without any while loop) or wheter you did break only at the end, you still have access to the context and to what's stored inside.
More info:
To see an example on how to use the service and context together and using this in steps, see this answer: https://stackoverflow.com/a/64850395/2476920
Also note that the BreakIf and While steps are core steps that are not yet developed in Neuraxle. We've recently had a brilliant ideas with Vincent Antaki where Neuraxle is a language, and therefore steps in a pipeline are like basic language keywords (While, Break, Continue, ForEach) and so forth. This abstraction is powerful in the sense that it's possible to control a data flow as a logical execution flow.
This is my best solution for now and this exactly has never been done yet. There may be much more other ways to do this, with creativity. One could even think of doing TryCatch steps in the pipeline to catch some errors and managing what happens in the execution flow of the data.
All of this is to be done in Neuraxle and is not yet done. We're looking for collaborators. That would do a nice paper as well: "Machine Learning Pipelines as a Programming Language" :)

Composite Tasks using OptaPlanner

I am trying to build Pools of Workers in a chained TWVRP scenario with multiple anchors. One Composite Task will be split into multiple smaller Task and distributed onto the chains in an optimal manner. Now, how can I ensure that all tasks that belong to the same composite task have the same start time? Can I solve this using custom moves or is using Drools to model this behaviour my only option?
I studied the documentation on custom moves but I just couldn't figure out how to use them in this case... Does anyone have a hint for me?
Make the startTime of a single Task a shadow variable that is the maximum previousTaskEndTime of all the single tasks that belong to the same CompositeTask.

Correct way to represent a while loop with one task in BPMN?

Which is the correct~er way in BPMN to represent a simple while loop that redirects to one task only?
I would say that using the loop activity is the better option as it helps keep the process model tidy.
Also be careful when creating loop in a process as usually task definition change between the first iteration and the second. e.g. first iteration is creation of a file, second will actually be an edition of the file: two different actions (create and edit) should not be in a single task definition.
Normally, the BPMN represents activities marching through time in a linear fashion similar to a Value Stream Map. To create a backward loop would disrupt the timeline.

(Fluent) NHibernate progress events for lengthy transactions?

We've hooked up the ISaveOrUpdateEventListener event and hoped we could tie it to a progress bar update for each node being visited during the save traversal of a pretty big model, BUT the event only fires once when the save operations starts (only on the node on which the Save( ) was inititated and not on any subnodes).
Are there any other events that are more appropriate to listen to for this?
We've also tried breaking up the save operation (of a hierarchical model) by doing the traversal ourselves, but that seems to degrade the performance even further.
Perhaps we're trying to solve a problem for which FNH wasn't aimed to be used. We're new to it.
We've also set up an alternative solution using SqlBulkCopy, as recommended elsewhere.
We've seen the comments that FNH is primarily supposed for smaller transactions (OLTP) and not the type of exhaustive model we're bound to by our problem (signal processing of huge data volumes).
Background:
We're trying to use Fluent NHibernate on a larger database project with data gathered from fairly complex real time analysis (high frequency, multiple input signals, long experiment times etc). In a prototype we've built we see pretty scary wait times for the moment, and need to hook in some sort of reliable progress indicator.
Yes, now confirmed - as mentioned in my comment above. One (possible) solution to this is to simply turn of Cascades and do the model traversal manually and do explicit Save( ) calls.
This works, although it's not as neat as just handling an event. Still, given the genuin design of NHibernate, I bet there's certainly an event somewhere that could be intercepted - the question is just under what name. ... I bet someone on here knows more.
Also to improve performance we used a Stateless Session, experiemented with differnet batch size, and periodically/explicitly call Flush() and Clear(). See articles below for further details:
http://davybrion.com/blog/2008/10/bulk-data-operations-with-nhibernates-stateless-sessions/
http://ideas-net.blogspot.com/2009/03/nhibernate-update-performance-issue.html
Hope this helps.

Restarting agent program after it crashes

Consider a distributed bank application, wherein distributed agent machines modify the value of a global variable : say "balance"
So, the agent's requests are queued. A request is of the form wherein value is added to the global variable on behalf of the particular agent. So,the code for the agent is of the form :
agent
{
look_queue(); // take a look at the leftmost request on queue without dequeuing
lock_global_variable(balance,agent_machine_id);
///////////////////// **POINT A**
modify(balance,value);
unlock_global_variable(balance,agent_machine_id);
/////////////////// **POINT B**
dequeue(); // once transaction is complete, request can be dequeued
}
Now, if an agent's code crashes at POINT B, then obviously the request should not be processed again, otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, thus :
agent
{
look_queue(); // take a look at the leftmost request on queue without dequeuing
*atomic*
{
lock_global_variable(balance,agent_machine_id);
modify(balance,value);
unlock_global_variable(balance,agent_machine_id);
dequeue(); // once transaction is complete, request can be dequeued
}
}
I am looking for answers to these questions :
How to identify points in code which need to be executed atomically 'automatically' ?
IF the code crashes during executing, how much will "logging the transaction and variable values" help ? Are there other approaches for solving the problem of crashed agents ?
Again,logging is not scalable to big applications with large number of variables. What can we in those case - instead of restarting execution from scratch ?
In general,how can identify such atomic blocks in case of agents that work together. If one agent fails, others have to wait for it to restart ? How can software testing help us in identifying potential cases, wherein if an agent crashes, an inconsistent program state is observed.
How to make the atomic blocks more fine-grained, to reduce performance bottlenecks ?
Q> How to identify points in code which need to be executed atomically 'automatically' ?
A> Any time, when there's anything stateful shared across different contexts (not necessarily all parties need to be mutators, enough to have at least one). In your case, there's balance that is shared between different agents.
Q> IF the code crashes during executing, how much will "logging the transaction and variable values" help ? Are there other approaches for solving the problem of crashed agents ?
A> It can help, but it has high costs attached. You need to rollback X entries, replay the scenario, etc. Better approach is to either make it all-transactional or have effective automatic rollback scenario.
Q> Again, logging is not scalable to big applications with large number of variables. What can we in those case - instead of restarting execution from scratch ?
A> In some cases you can relax consistency. For example, CopyOnWriteArrayList does a concurrent write-behind and switches data on for new readers after when it becomes available. If write fails, it can safely discard that data. There's also compare and swap. Also see the link for the previous question.
Q> In general,how can identify such atomic blocks in case of agents that work together.
A> See your first question.
Q> If one agent fails, others have to wait for it to restart ?
A> Most of the policies/APIs define maximum timeouts for critical section execution, otherwise risking the system to end up in a perpetual deadlock.
Q> How can software testing help us in identifying potential cases, wherein if an agent crashes, an inconsistent program state is observed.
A> It can to a fair degree. However testing concurrent code requires as much skills as to write the code itself, if not more.
Q> How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's nothing much you can do apart from trying to push the external contract down so it needs to modify more. This is pretty much the reason why databases are not as scalable as NoSQL stores - they might need to modify depending foreign keys, execute triggers, etc. Or try to promote immutability.
If you were Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.