How to move hand over hand along restarts - error-handling

Is it possible to move hand over hand along restarts like this:
    (handler-bind ((simple-error #'(lambda (condition)
                                     (write condition)
                                     (invoke-restart 'alle condition))))
      (restart-case
          (restart-case
              (error 'simple-error)
            (next (err)))
        (alle (err) (invoke-restart 'next))))
This currently fails with:
No restart NEXT is active.
[Condition of type SB-INT:SIMPLE-CONTROL-ERROR]
I want to be able to implement a general restart like "just-log-all-conditions" which then calls the correct restart for any condition signaled within its expression.

You might want to check whether that's really what you want to do. Typically a handler selects the best restart, since the handler sees all available restarts. Moving from restart to restart is unusual, IMHO. There is also no single 'correct' restart for a given condition: several restarts might be available and useful, and the choice can be made either programmatically or interactively by the user. A restart might also be useful for several different conditions.
The Common Lisp Condition System has several basic concepts:
- Conditions, typically implemented as CLOS classes.
- Signalling a condition object. That's typically done in user code.
- Handling the condition. Based on the condition, a handler is selected and called. While the handler is running, it can inspect the situation and decide what to do. A handler typically declines to handle the condition or selects one of the available restarts. In a typical development environment, this might involve presenting the restarts and asking the user for a choice.
- Restarting. The restart is then responsible for getting out of the condition situation. Transfer of control to the restart gets us out of the error context. We can enforce execution of code via 'UNWIND-PROTECT'. Once we are restarting, the context where the condition was signaled is gone.
That means that only the handler sees all the available restarts and a handler can also transfer control to the next handler.
Jumping from restart to restart is not really part of this model.
For some background on the condition system idea see this text by Kent Pitman: Condition Handling in the Lisp Language Family.

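A restart can only be invoked while it is still active, that is, from within the dynamic extent of the restart-case that established it. By the time the body of ALLE runs, control has already unwound to the restart-case that established ALLE, so NEXT must be established outside of it. Specify everything in reverse order, like this: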
    (restart-case
        (restart-case
            (handler-bind ((simple-error #'(lambda (condition)
                                             (write condition)
                                             (invoke-restart 'alle condition))))
              (error 'simple-error))
          (alle (err) (invoke-restart 'next)))
      (next () #|do nothing|#))
See CLHS for details.


How to properly implement a terminating end event that can trigger at any point during the process?

I’m currently modeling a process with 2 exception statuses (a patient dies & No Neurologist found).
If no Neurologist is found (this can only happen once in my process), the process stops.
Another exception status is triggered when a patient dies at any point during the process. If this exception status occurs, the process stops.
I have difficulties modeling these exception statuses. Attached you can find my current attempt. I’m not 100% sure it is correct.
[Diagram: example of my attempt]

Terminating events are rarely needed. There are usually more elegant, clearer solutions than this 'kill all switch'. Their purpose is to terminate any parallel activities and consume any tokens that exist in the same scope. The same can usually be achieved with interrupting (e.g. conditional) boundary events, which get triggered e.g. by a data change. A boundary event makes it clearly visible in the process where a cancellation can occur and under which circumstances, and it allows ending a process in a more controlled manner.
In your particular use case (the diagram you attached) you don't need to use terminating events at all. You are using two interrupting boundary events (escalation and error) on a scope created by the embedded sub-process. The scope of the embedded sub-process is already terminated when these interrupting events occur. A subsequent terminating event in the parent process' scope would cancel everything in this scope. In your case the parent scope is the root process instance, but since there is no token flow parallel to the embedded sub-process, there is nothing to cancel.
Also see:
https://docs.camunda.org/manual/latest/reference/bpmn20/events/terminate-event/
https://docs.camunda.org/manual/latest/reference/bpmn20/events/error-events/#error-boundary-event

Pause a sub-process BPMN

I've recently started at a new business and some of the processes are becoming a bit of a challenge to map out. Quite frequently we have a process that needs to go "on hold" when an event, which can occur at any point, is triggered. The problem I'm having mapping this out correctly is how to "restart" the process from where it left off, since it can effectively pause/unpause at any point.
Here's what I currently have:
[Diagram: process example]
Basically, I need to have "Something Happened 2" not fully interrupt the sub-process, it just needs to put it on "hold". The actual situation is essentially that a customer can make a complaint while we handle their overdue bill, so we put the process on hold wherever it was at until we resolve the complaint, and then restart the process.
I'm not entirely sure of the best approach to documenting this and couldn't find anything clear in the documentation, since a non-interrupting event seems to let the rest of the process continue forward in parallel.
Any help would be majorly appreciated.

If you really want to restart the whole sub-process from the beginning, then you could place an exclusive gateway in front of it. Once the complaint is dealt with, you can direct the sequence flow to that gateway, which would restart the sub-process. See below for an example (I have simplified your diagram a bit).

Understanding Eventual Consistency, BacklogItem and Tasks example from Vaughn Vernon

I'm struggling to understand how to implement Eventual Consistency with the exposed example of BacklogItems and Tasks from Vaughn Vernon. The statement I've understood so far is (considering the case where he splits BacklogItem and Task into separate aggregate roots):
A BacklogItem can contain one or more tasks. When the remaining hours of all the tasks of a BacklogItem are 0, the status of the BacklogItem should change to "DONE".
I'm aware of the rule that says that you should not update two aggregate roots in the same transaction, and that you should accomplish that with eventual consistency.
Once a Domain Service updates the amount of hours of a Task, a TaskRemainingHoursUpdated event should be published to a DomainEventPublisher which lives in the same thread as the executing code. This is where I'm at a loss, with the following questions:
I suppose that there should be a subscriber (also living in the same thread, I guess) that reacts to TaskRemainingHoursUpdated events. At which point in your Desktop/Web application do you perform this subscription to the Bus? At the very initialization of your app? In the application code? Is there any reasoning for placing domain subscribers in a specific place?
Should that subscriber (in the same thread) call a BacklogItem repository and perform the update? (But that would be a violation of the rule of not updating two aggregates in the same transaction since this would happen synchronously, right?)
If I want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?
If I use this message broker, should I have a background thread or something that just consumes events from a RabbitMQ queue and then dispatches the event to update the product?
I'd appreciate it if someone could shed some light on this, since it is quite complex to picture in its completeness.

So to start with, you need to recognize that, if the BacklogItem is the authority for whether or not it is "Done", then it needs to have all of the information to compute that for itself.
So somewhere within the BacklogItem is data that is tracking which Tasks it knows about, and the known state of those tasks. In other words, the BacklogItem has a stale copy of information about the task.
That's the "eventually consistent" bit; we're trying to arrange the system so that the cached copy of the data in the BacklogItem boundary includes the new changes to the task state.
That in turn means we need to send a command to the BacklogItem advising it of the changes to the task.
From the point of view of the backlog item, we don't really care where the command comes from. We could, for example, make it a manual process "After you complete the task, click this button here to inform the backlog item".
But for the sanity of our users, we're more likely to arrange an event handler to be running: when you see the output from the task, forward it to the corresponding backlog item.
"At which point in your Desktop/Web application do you perform this subscription to the Bus? At the very initialization of your app?"
That seems pretty reasonable.
"Should that subscriber (in the same thread) call a BacklogItem repository and perform the update? (But that would be a violation of the rule of not updating two aggregates in the same transaction since this would happen synchronously, right?)"
Same thread and same transaction are not necessarily coincident. It can all be coordinated in the same thread; but it probably makes more sense to let the consequences happen in the background. At their core, events and commands are just messages - write the message, put it into an inbox, and let the next thread worry about processing.
"If I want to achieve eventual consistency to fulfil the previously mentioned rule, do I really need a Message Broker like RabbitMQ even though both BacklogItem and Task live inside the same Bounded Context?"
No; the mechanics of the plumbing matter not at all.
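The "write the message, put it into an inbox, and let the next thread worry about processing" idea can be sketched roughly as follows. This is only an illustration in Python, not code from Vernon's book: `InProcessInbox`, `record_task_hours` and `wait_until_idle` are hypothetical names, the dictionary stands in for a BacklogItem repository, and an in-memory queue stands in for whatever plumbing is actually used.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskRemainingHoursUpdated:
    """Event published after a Task changes (event name taken from the question)."""
    backlog_item_id: str
    task_id: str
    remaining_hours: int


@dataclass
class BacklogItem:
    """Aggregate that keeps its own, possibly stale, copy of the task hours."""
    item_id: str
    known_hours: dict = field(default_factory=dict)
    status: str = "IN_PROGRESS"

    def record_task_hours(self, task_id, remaining_hours):
        # Command handler (hypothetical name): refresh the cached copy,
        # then let the BacklogItem recompute its own status.
        self.known_hours[task_id] = remaining_hours
        if all(hours == 0 for hours in self.known_hours.values()):
            self.status = "DONE"
        else:
            self.status = "IN_PROGRESS"


class InProcessInbox:
    """Hypothetical stand-in for any message plumbing (queue, table, broker)."""

    def __init__(self, backlog_items):
        self._queue = queue.Queue()
        self._backlog_items = backlog_items  # stand-in for a repository
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        # Called at the end of the Task's unit of work; returns immediately.
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            try:
                # A second, separate unit of work that touches only the BacklogItem.
                item = self._backlog_items[event.backlog_item_id]
                item.record_task_hours(event.task_id, event.remaining_hours)
            finally:
                self._queue.task_done()

    def wait_until_idle(self):
        # Demo helper only: block until every queued event has been handled.
        self._queue.join()


items = {"B1": BacklogItem("B1", {"T1": 3, "T2": 0})}
inbox = InProcessInbox(items)  # subscription happens once, at start-up
inbox.publish(TaskRemainingHoursUpdated("B1", "T1", 0))
inbox.wait_until_idle()
print(items["B1"].status)  # prints DONE
```

The Task update and the BacklogItem update happen in different units of work, even though both run in the same process. In this sketch, replacing the in-memory queue with a database table or a broker would mean rewriting only `InProcessInbox`, not the aggregates.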

blocked requests in io_service

I have implemented a client-server program using the boost::asio library.
In my implementation there are times when io_service.run() blocks indefinitely. If I pass another request to the io_service, the blocked call begins to execute normally.
Is there any way to see which requests are pending inside the io_service queue?
I have not used a work object to block the run call!

There is no official way to query the io_service for all pending requests. However, there are a few techniques to debug the problem:
- Boost 1.47 introduced handler tracking. Simply define BOOST_ASIO_ENABLE_HANDLER_TRACKING and Boost.Asio will write debug output, including timestamps, an identifier, and the operation type, to the standard error stream.
- Attach a debugger and dig through the layers to find and examine operation queues. This answer covers both understanding handler tracking and using a debugger to examine an operation queue for the epoll_reactor.
Finally, if you believe it is a bug, then it may be worth updating to the latest version or checking the revision history for relevant changes. Regardless, describing the problem in more detail may allow others to help identify the source of the problem and potential solutions.

I spent a few hours reading and experimenting (I need more boost::asio functionality for work as well) and it turns out: kind of.
But it is not as straightforward or readable as one might hope.
Under the hood (well, under the outermost hood) io_service has a bunch of other services registered, which do the work that the async_ operations in their respective areas require.
These are the "Services" described in the reference.
Sadly, the services stay registered whether there is work to do or not. For example, if your io_service has a UDP socket, it will still have all the corresponding services, even if the socket itself is inactive.
But you can ask your io_service which services it has. Let's say you want to know whether your io_service called m_io_service has a UDP datagram_socket_service. Then you can call something like:
    if (boost::asio::has_service<boost::asio::datagram_socket_service<boost::asio::ip::udp> >(m_io_service))
    {
        // Whatever
    }
That does not help a lot, because it will be true whether the socket is active or not. But once you know that you have that service, you can get a reference to it using use_service instead of has_service, with the same elegant amount of <>.
And now you can inspect the service to see what it is up to. Sadly, it will not tell you the names of the outstanding handlers (probably partly because it does not know them), but if it is a socket, you can get its implementation_type and with that check whether it currently is_open, or find its local_endpoint and remote_endpoint.
In case of a deadline_timer_service you can, among other stuff, find out when it expires_at.
See the reference for more information on what the service is and is not willing to tell you.
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/reference.html
This information should then hopefully allow you to determine which async_ operation did not return.
And if not, at the very least you can cancel any unexpectedly active services.

How do I get a list of worker threads of nservicebus

How do I get a list of the worker threads of NServiceBus? I need to register the worker thread ids in a database and then bind certain types of messages to specific worker threads. The real goal is handling poison messages: I want to prevent all threads except specified ones from handling poison messages. There will be a separate service that manages the threads through the database.

I would not try to do that. It is almost sure to run into problems.
Of course, in order to get some sort of "identity" for each thread, you could place something like this in your message handler:
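    // A [ThreadStatic] field initializer only runs once, on the first thread,
    // so ThreadLocal<T> (System.Threading, .NET 4+) is used here instead.
    private static readonly ThreadLocal<Guid> ThreadId =
        new ThreadLocal<Guid>(() => Guid.NewGuid());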
But again, I wouldn't do that! The guids would change every time the endpoint was restarted, for one.
You could also query the list of threads direct from .NET and try to determine which ones were the message handling threads, but that sounds so scary I don't even want to go into it.
The real issue: Poison Message Handling
As your comment states, the real problem is that a poison message is REALLY poison. Not only is it failing, but it's taking so long to do so that it's really screwing up all the other threads!
Since you are able to identify these messages based on certain properties of the message, I would detect and throw an exception before the operation that times out. All the time.
If you want to be able to test periodically to see if the issue has been fixed, you have a few options:
Test via other means, and return the messages to the source queue when it has been fixed.
Add an appSetting so that the quick-throw behavior is skipped when the config setting is enabled. Then periodically you can edit the config, restart the endpoint, see if it's fixed, and then switch it back if it isn't.
Create another message handler that maintains a lock-protected counter that starts at zero. Send it a control message to say "Hey, try one now", which increments the counter. Then your quick-throw behavior can decrement that value and allow one message through to see what happens. This is also dangerous, of course. Make sure your locking is tight, since you are now sharing this state between different message processing threads.
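The last option, a shared counter that lets exactly one poison message through on request, might look roughly like this. It is a language-neutral sketch written in Python rather than NServiceBus code: `TrialGate`, `PoisonMessageError`, `handle`, `is_poison` and `process` are all hypothetical names, and a real endpoint would wire the same idea into its own message handlers.

```python
import threading


class PoisonMessageError(Exception):
    """Raised immediately instead of running the slow, failing operation."""


class TrialGate:
    """Thread-safe count of poison messages that may be processed for real."""

    def __init__(self):
        self._lock = threading.Lock()
        self._permits = 0  # zero means: always fail fast

    def allow_one(self):
        # What the control-message handler ("Hey, try one now") would call.
        with self._lock:
            self._permits += 1

    def try_take(self):
        # Check and decrement under one lock so that two threads
        # can never consume the same permit.
        with self._lock:
            if self._permits > 0:
                self._permits -= 1
                return True
            return False


def handle(message, gate, is_poison, process):
    """Quick-throw behavior: reject known poison messages unless a permit exists."""
    if is_poison(message) and not gate.try_take():
        raise PoisonMessageError("known poison message, failing fast")
    return process(message)
```

Keeping the check and the decrement inside the same lock is what stops several worker threads from all letting a poison message through at once.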