nservicebus sagas - stuck trying to understand the purpose and benefit - nservicebus

I have read the documentation on the website multiple times. I keep rereading the same articles and still cannot understand what they are trying to achieve with sagas. Besides, there are almost no resources on the internet about this subject.
I am completely stuck trying to understand the purpose and benefit of defining so-called sagas. I understand handlers (IHandleMessages): these are interceptors. But I can't understand what a Saga is for. The language in the documentation assumes that I already know something special needed to grasp the idea, but I don't.
Can someone explain to me in simple words, ideally with a real-life example, a situation where I must or should define a Saga, and what the benefit of doing so is? I have created an app with multiple endpoints and a Saga definition as shown in the samples. It works (I guess), but I don't understand what these sagas were defined for. Many samples use the RequestTimeout() method in the Saga class. Why would anyone want to cause a timeout intentionally? I don't want to put any code fragments here, because they are unrelated. I need to understand why I would want to use "Sagas", whatever that means.
Thank you.

NServiceBus Saga is a variant of a Process Manager described in the Enterprise Integration Patterns book.
To understand when to use a Saga, one has to need it. Let's assume you're using only regular message handlers to implement a new user registration process. At some point, you discover that only 40% of brand-new registrants confirm their email address and become active user accounts. There are two things you'd like to address:
1. Remind new registrants to confirm their email by sending a reminder 24 hours after registration.
2. Remove registrant info (the email, for example) from the data store within 48 hours to be compliant with GDPR.
Now how do you do that with a regular message handler? A handler would receive the initial request (the first message, m1) to kick off registration by generating an email with a confirmation link, and that's it. Once the handler is done, it's done for good. But your process is not finished. It's a long-running logical process that has to span 48 hours before it completes. It's no longer the processing of a single message, but a workflow with multiple checkpoints, similar to a state machine. To move from one state to another, a certain condition has to be fulfilled. In the case of NServiceBus, those conditions are messages. A message to send a reminder after 24 hours (let's call it m2) is not going to be triggered by any user action. It's a "system" message, a timed message that should be kicked off automatically. The same goes for the message that instructs the system to remove registrant information if the validation link was not activated. A theme emerges: you need to schedule messages in the future to re-hydrate the workflow and continue from the state it was left in last time.
That's what timeouts are. They are requests to re-hydrate and continue the saga/workflow from where it left off, at a certain point in the future: minutes, hours, days, months, or years away.
This is what this kind of workflow would look like as a saga (oversimplified and doesn't take into consideration all the edge cases).
class RegistrationWorkflow :
    Saga<WorkflowState>,
    IAmStartedByMessages<RegisterUser>,
    IHandleMessages<ActivationReceived>,
    IHandleTimeouts<NoResponseFor24Hours>,
    IHandleTimeouts<NoResponseFor48Hours>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<WorkflowState> mapper)
    {
        // omitted for simplicity, see message correlation
        // https://docs.particular.net/nservicebus/sagas/message-correlation
    }

    public async Task Handle(RegisterUser message, IMessageHandlerContext context)
    {
        // remember who is registering and schedule the first checkpoint
        Data.RegistrationEmail = message.RegistrationEmail;
        await RequestTimeout<NoResponseFor24Hours>(context, TimeSpan.FromHours(24));
    }

    public async Task Handle(ActivationReceived message, IMessageHandlerContext context)
    {
        Data.ConfirmationReceived = true;
        // email was confirmed and account was activated
        await context.Send(new PromoteCandidateToUser
        {
            CandidateEmail = Data.RegistrationEmail
        });
        MarkAsComplete();
    }

    public async Task Timeout(NoResponseFor24Hours timeout, IMessageHandlerContext context)
    {
        if (Data.ConfirmationReceived)
        {
            return;
        }
        await context.Send(new SendReminderEmailToActivateAccount { Email = Data.RegistrationEmail });
        // schedule the second checkpoint, another 24 hours from now
        await RequestTimeout(context, TimeSpan.FromHours(24), new NoResponseFor48Hours());
    }

    public async Task Timeout(NoResponseFor48Hours timeout, IMessageHandlerContext context)
    {
        if (Data.ConfirmationReceived)
        {
            return;
        }
        await context.Send(new CleanupRegistrationInformationForGDPRCompliancy
        {
            RegistrationEmail = Data.RegistrationEmail
        });
        MarkAsComplete();
    }
}
Since this is a state machine, the state is persisted between Saga invocations. Invocation would be caused either by a message a saga can handle (RegisterUser and ActivationReceived) or by timeouts that are due (NoResponseFor24Hours and NoResponseFor48Hours). For this specific saga, the state is defined by the following POCO:
class WorkflowState : ContainSagaData
{
    public string RegistrationEmail { get; set; }
    public bool ConfirmationReceived { get; set; }
}
Timeouts are nothing but plain IMessages that get deferred. The timeouts used in this sample would be:
class NoResponseFor24Hours : IMessage {}
class NoResponseFor48Hours : IMessage {}
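To make the "timeouts are just deferred messages" idea concrete without any framework, here is a minimal sketch in plain Java (not C#, and not NServiceBus API). The class name, the simulated clock measured in hours, and the in-memory queue are all assumptions made for illustration; a real saga has its state and timeouts persisted by the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, framework-free model of the registration saga; not NServiceBus API.
class RegistrationSagaSketch {
    // Saga state that would be persisted between invocations.
    String registrationEmail;
    boolean confirmationReceived;
    boolean completed;

    // A deferred message: a name plus the simulated hour at which it becomes due.
    static final class Deferred {
        final String name;
        final int dueHour;
        Deferred(String name, int dueHour) { this.name = name; this.dueHour = dueHour; }
    }

    // Pending timeouts, ordered by due time.
    final PriorityQueue<Deferred> timeouts =
            new PriorityQueue<Deferred>((a, b) -> Integer.compare(a.dueHour, b.dueHour));
    // Messages the saga has sent, recorded by name only.
    final List<String> sent = new ArrayList<>();

    void handleRegisterUser(String email, int nowHour) {
        registrationEmail = email;
        timeouts.add(new Deferred("NoResponseFor24Hours", nowHour + 24));
    }

    void handleActivationReceived() {
        if (completed) return;
        confirmationReceived = true;
        sent.add("PromoteCandidateToUser");
        completed = true;
    }

    // Deliver every timeout that is due at or before nowHour.
    void advanceClockTo(int nowHour) {
        while (!timeouts.isEmpty() && timeouts.peek().dueHour <= nowHour) {
            Deferred due = timeouts.poll();
            if (completed) continue; // a completed saga ignores late timeouts
            if (due.name.equals("NoResponseFor24Hours")) {
                sent.add("SendReminderEmailToActivateAccount");
                timeouts.add(new Deferred("NoResponseFor48Hours", due.dueHour + 24));
            } else {
                sent.add("CleanupRegistrationInformationForGDPRCompliancy");
                completed = true;
            }
        }
    }

    public static void main(String[] args) {
        RegistrationSagaSketch saga = new RegistrationSagaSketch();
        saga.handleRegisterUser("a@example.com", 0);
        saga.advanceClockTo(48); // nobody confirmed: reminder, then cleanup
        System.out.println(saga.sent);
    }
}
```

If handleActivationReceived() is called before hour 24, the saga completes and both timeouts are dropped when they fall due, which is the behaviour the state machine is meant to capture.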
Hope this clarifies the idea of Sagas in general, what Timeouts are, and how they are used. I did not go into Message Correlation, Saga Concurrency, and some other details, as those can be found on the documentation site you've referred to. Which brings us to the next point.
You wrote: "I have read multiple times the documentation on their website. It is absolutely terrible. I am reading again and again the same articles and I cannot comprehend what they are trying to achieve."
The site has a feedback mechanism, and you should absolutely use it to provide that feedback.
You also wrote: "Besides, there are almost no resources on the internet related to this subject."
Hope to see you posting a blog (or a series of posts) on this topic. By doing so you'll have a positive contribution.
Full disclaimer: I work on NServiceBus

Related

SignalR, how to ensure that only one user can edit a given form at a time?

I have a dashboard with a list of items and a finite number of users. I want to show "an item is being edited" near said item to avoid simultaneous edits and overwrites of data.
This seems to me like updating a flag in the database and relatively simple signalr implementation with the javascript simply adding/removing a css class.
I have seen this:
Prevent multiple people from editing the same form
which describes a method with posting every X minutes and clearing the flag from the database when there are no more update messages from the user.
The issue is:
I was wondering if there is a SignalR method (like disconnect; I know it exists but I don't know if it fits this scenario) to do that elegantly rather than running a timer function. If so, is it possible for the server to miss the event and permanently leave the item flagged as "editing" when it is not?
You could implement a hub for this. Here is an example:
public class ItemAccessHub : Hub
{
    public override Task OnConnectedAsync()
    {
        // your logic to lock the object, set a state in the db
        return base.OnConnectedAsync();
    }

    public override Task OnDisconnectedAsync(Exception exception)
    {
        // your logic to unlock the object
        return base.OnDisconnectedAsync(exception);
    }
}
To get information from the query string, you can access the HttpContext:
Context.GetHttpContext().Request.Query.TryGetValue("item-id", out var itemId)
So you could start a connection when the user opens the form and send the id of the item in the query string:
/hub/itemAccess?item-id=ITEM_ID
When the user closes the form, disconnect the connection.
With this method the item is also unlocked when the user loses their network connection.
The OnDisconnectedAsync method is always invoked when a client disconnects, so you can do your cleanup in this method.
You can then also implement the update function in this hub.
I hope this is what you are looking for.
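The "lock on connect, unlock on disconnect" logic hinted at in the hub comments could look roughly like the sketch below. It is written in plain Java rather than C# to stay framework-free, and the class name, method names, and in-memory map are assumptions made for illustration; in the real application the flag would live in the database. The point it shows is that a disconnect should only release a lock that the disconnecting connection actually owns.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the "editing" flag; names are illustrative only.
class ItemLockRegistry {
    // itemId -> connectionId of the current editor
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    // Called on connect: returns true only if this connection now holds the lock.
    boolean tryLock(String itemId, String connectionId) {
        String owner = locks.putIfAbsent(itemId, connectionId);
        return owner == null || owner.equals(connectionId);
    }

    // Called on disconnect: releases the lock only if this connection owns it.
    boolean unlock(String itemId, String connectionId) {
        return locks.remove(itemId, connectionId);
    }

    boolean isLocked(String itemId) {
        return locks.containsKey(itemId);
    }

    public static void main(String[] args) {
        ItemLockRegistry registry = new ItemLockRegistry();
        System.out.println(registry.tryLock("item-1", "conn-A")); // true
        System.out.println(registry.tryLock("item-1", "conn-B")); // false
        System.out.println(registry.unlock("item-1", "conn-B"));  // false
        System.out.println(registry.unlock("item-1", "conn-A"));  // true
    }
}
```

Without the ownership check, a second user who was refused the lock would clear the first user's flag simply by closing the form.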

Kafka Error handling : Processor.output().send(message, kafkaTimeoutInMS) always returns true and its async

Maybe this issue has already been reported and resolved. I didn't find a solution or any open issues that talk about it, so I am creating a new one.
I am trying to handle errors while publishing data to a Kafka topic.
With Spring Cloud Stream and Kafka, we are publishing like this:
if (processor.output().send(message, kafkaTimeoutInMS) && acknowledgment != null)
{
    LOGGER.debug("Acknowledgment provided");
    LOGGER.info("Sending to Kafka successful");
    acknowledgment.acknowledge();
}
else
{
    LOGGER.error("Sending to Kafka failed", message);
}
The send() method always returns true. I tried stopping Kafka manually while running in debug mode, but it still returns true. I have read that it is asynchronous.
I tried setting
bindings: output: producer: sync: true
This didn't help.
I do see an error being reported, but I can't use it in my logic to decide whether the send failed or succeeded.
We acknowledge manually, so we are only supposed to acknowledge when the message has been sent to the topic successfully, and we need to log all failed messages.
Any suggestions?
I believe you've misinterpreted how spring-cloud-stream works.
As a framework, there is a certain contract between the user and the framework. When it comes to messaging, the acks, retries, DLQ, and many more aspects are handled automatically so that the user doesn't have to deal with them manually (as you are trying to do).
Consider spending a little time going through the user guide - https://docs.spring.io/spring-cloud-stream/docs/Fishtown.M3/reference/htmlsingle/
Also, here is a very basic example that demonstrates a typical interaction of the user (developer) with the framework. As you can see, all you're doing is implementing a simple handler which receives and returns a piece of data. The rest (the actual receive from Kafka and send to Kafka or any other messaging system) is handled by the framework-provided binders.
@SpringBootApplication
@EnableBinding(Processor.class)
public class ProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(ProcessorApplication.class);
    }

    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public String echo(String message) {
        return message;
    }
}
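As a side note on why an asynchronous send can report true even when the broker is down, here is a generic sketch in plain Java. It deliberately uses no Spring Cloud Stream or Kafka classes; the AsyncSendSketch class and its outcome future are hypothetical and only model the general shape of an asynchronous hand-off, where the immediate return value and the eventual result are two separate things.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical async sender; this is plain Java, not the Spring Cloud Stream or Kafka API.
class AsyncSendSketch {
    // The eventual delivery outcome, completed later by a (simulated) I/O thread.
    final CompletableFuture<Void> outcome = new CompletableFuture<>();

    // Returns true as soon as the message is accepted for sending,
    // which says nothing about whether the broker will ever receive it.
    boolean send(String message) {
        return true;
    }

    public static void main(String[] args) {
        AsyncSendSketch sender = new AsyncSendSketch();
        boolean accepted = sender.send("payload");

        // Decide whether to acknowledge or log only once the real outcome is known.
        CompletableFuture<String> decision = sender.outcome.handle(
                (ok, error) -> error == null ? "acknowledge" : "log failure: " + error.getMessage());

        // Simulate the broker being down.
        sender.outcome.completeExceptionally(new IllegalStateException("broker unavailable"));

        System.out.println(accepted);        // true
        System.out.println(decision.join()); // log failure: broker unavailable
    }
}
```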

Service Fabric self-deleting service

I'd like to add a service that executes some initialization operations for the system when it's first created.
I'd imagine it would be a stateless service (with cluster admin rights) that should self-destruct when it has done its thing. I am under the impression that exiting the RunAsync function allows me to indicate that I'm finished (or in an error state). However, it then still hangs around in the application's context, annoyingly looking like it's "active" when it's not really doing anything at all.
Is it possible for a service to remove itself?
I think maybe we could try using the FabricClient.ServiceManager's DeleteServiceAsync (using parameters based on the service context) inside an OnCloseAsync override but I've not been able to prove that might work and it feels a little funky:
var client = new FabricClient();
await client.ServiceManager.DeleteServiceAsync(new DeleteServiceDescription(Context.ServiceName));
Is there a better way?
Returning from RunAsync will end the code in RunAsync (indicate completion), so SF won't start RunAsync again (It would if it returned an exception, for example). RunAsync completion doesn't cause the service to be deleted. As mentioned, for example, the service might be done with background work but still listening for incoming messages.
The best way to shut down a service is to call DeleteServiceAsync. This can be done by the service itself or another service, or from outside the cluster. Services can self-delete, so for services whose work is done we typically see await DeleteServiceAsync as the last line of RunAsync, after which the method just exits. Something like:
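protected override async Task RunAsync(CancellationToken ct)
{
    var workCompleted = false;
    while (!workCompleted && !ct.IsCancellationRequested)
    {
        if (!DoneWithWork())
        {
            DoWork();
        }
        if (DoneWithWork())
        {
            workCompleted = true;
            // self-delete only once the work is actually finished
            await DeleteServiceAsync(...);
        }
    }
}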
The goal is to ensure that if your service is actually done doing the work it cleans itself up, but doesn't trigger its own deletion for the other reasons that a CancellationToken can get signaled, such as shutting down due to some upgrade or cluster resource balancing.
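The distinction between "finished the work" and "was asked to stop" can be shown in isolation. The following is a minimal sketch in plain Java, not Service Fabric code: the class, the work counter, and the deleteRequested flag (which stands in for the DeleteServiceAsync call) are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the RunAsync loop; not Service Fabric API.
class SelfDeletingWorkerSketch {
    // Stands in for the CancellationToken passed to RunAsync.
    final AtomicBoolean cancellationRequested = new AtomicBoolean(false);
    int remainingWorkItems;
    // Stands in for having called DeleteServiceAsync.
    boolean deleteRequested;

    SelfDeletingWorkerSketch(int workItems) {
        this.remainingWorkItems = workItems;
    }

    void run() {
        while (remainingWorkItems > 0 && !cancellationRequested.get()) {
            remainingWorkItems--; // one unit of work
        }
        // Delete only when the work itself is finished,
        // never merely because cancellation was signalled.
        if (remainingWorkItems == 0) {
            deleteRequested = true;
        }
    }

    public static void main(String[] args) {
        SelfDeletingWorkerSketch finished = new SelfDeletingWorkerSketch(3);
        finished.run();
        System.out.println(finished.deleteRequested); // true

        SelfDeletingWorkerSketch interrupted = new SelfDeletingWorkerSketch(3);
        interrupted.cancellationRequested.set(true); // e.g. an upgrade or rebalancing
        interrupted.run();
        System.out.println(interrupted.deleteRequested); // false
    }
}
```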
As mentioned already, returning from RunAsync will end this method only, but the service will continue to run and hence not be deleted.
DeleteServiceAsync certainly is the way to go. However, it's not quite as simple as just calling it, because if you're not careful it will deadlock on the current thread (especially in a local developer cluster). You would also likely get a few short-lived health warnings about RunAsync taking a long time to terminate and/or the target replica size not being met.
In any case, the solution is quite simple. Just do this:
private async Task DeleteSelf(CancellationToken cancellationToken)
{
    using (var client = new FabricClient())
    {
        await client.ServiceManager.DeleteServiceAsync(new DeleteServiceDescription(this.Context.ServiceName), TimeSpan.FromMinutes(1), cancellationToken);
    }
}
Then, as the last line of my RunAsync method, I call:
await DeleteSelf(cancellationToken).ConfigureAwait(false);
The ConfigureAwait(false) helps with the deadlock issue because the continuation does not try to resume on the captured "caller context"; it can run on any available thread instead.

How to invoke a web service after redeliveries exhausted in Apache Camel?

I have failed to find an enterprise integration pattern or recipe that promotes a solution for this problem:
After the re-delivery attempts have been exhausted, I need to send a web service request back to the originating source, to notify the sender of a failed delivery.
Upon exhaustion of all re-delivery attempts, should I move the message to a dead letter queue? Then create a new consumer listening on that DL queue? Do I need a unique dead letter queue for each of my source message queues? Should I add a message header, noting the source queue, before I move it to the dead letter queue? If all messages go to a single dead letter queue, how should my consumer know where to send the web service request?
Can you point me to a book, blog post, or article? What is the prescribed approach?
I'm working in a really old version of Fuse ESB but I expect that solutions in ServiceMix to be equally applicable.
Or maybe, what I'm asking for is an anti-pattern or code-smell. Please advise.
If you are new to Camel and really want to get an in-depth knowledge of it, I would recommend Camel in Action, a book by Claus Ibsen. There's a second edition in the works, with 14 out of 19 chapters already done so you may also give that a shot.
If that's a bit too much, online documentation is pretty okay, you can find out the basics just fine from it. For error handling I recommend starting with the general error handling page then moving on to error handler docs and exception policy documentation.
Generally, a dead letter channel is the way to go. Camel will automatically send to the DLC after retries have been exhausted; you just have to define the DLC yourself. As its name implies, it's a channel and doesn't really need to be a queue. You can write to a file, invoke a web service, submit a message to a message queue, or just write to logs. It's completely up to you.
// error-handler DLC, will send to HTTP endpoint when retries are exhausted
errorHandler(deadLetterChannel("http4://my.webservice.hos/path")
    .useOriginalMessage()
    .maximumRedeliveries(3)
    .redeliveryDelay(5000));

// exception-clause DLC, will send to HTTP endpoint when retries are exhausted
onException(NetworkException.class)
    .handled(true)
    .maximumRedeliveries(5)
    .backOffMultiplier(3)
    .redeliveryDelay(15000)
    .to("http4://my.webservice.hos/otherpath");
I myself have always preferred having a message queue and then consuming from there for any other recovery or reporting. I generally include failure details like exchange ID and route ID, message headers, error message, and sometimes even the stack trace. The resulting message, as you can imagine, grows quite a bit, but it tremendously simplifies troubleshooting and debugging, especially in environments where you have quite a number of components and services. Here's a sample DLC message from one of my projects:
public class DeadLetterChannelMessage {
    private String timestamp = Times.nowInUtc().toString();
    private String exchangeId;
    private String originalMessageBody;
    private Map<String, Object> headers;
    private String fromRouteId;
    private String errorMessage;
    private String stackTrace;

    @RequiredByThirdPartyFramework("jackson")
    private DeadLetterChannelMessage() {
    }

    @SuppressWarnings("ThrowableResultOfMethodCallIgnored")
    public DeadLetterChannelMessage(Exchange e) {
        exchangeId = e.getExchangeId();
        originalMessageBody = e.getIn().getBody(String.class);
        headers = Collections.unmodifiableMap(e.getIn().getHeaders());
        fromRouteId = e.getFromRouteId();
        Optional.ofNullable(e.getProperty(Exchange.EXCEPTION_CAUGHT, Exception.class))
            .ifPresent(throwable -> {
                errorMessage = throwable.getMessage();
                stackTrace = ExceptionUtils.getStackTrace(throwable);
            });
    }

    // getters
}
When consuming from the dead letter queue, the route ID tells you where the failure originated, so you can then implement routes that are specific to handling errors coming from there:
// general DLC handling route
from("{{your.dlc.uri}}")
    .routeId(ID_REPROCESSABLE_DLC_ROUTE)
    .removeHeaders(Headers.ALL)
    .unmarshal().json(JsonLibrary.Jackson, DeadLetterChannelMessage.class)
    .toD("direct:reprocess_${body.fromRouteId}"); // error handling route

// handle errors from `myRouteId`
from("direct:reprocess_myRouteId")
    .log("Error: ${body.errorMessage} for ${body.originalMessageBody}");
    // you'll probably do something better here, e.g.
    // .convertBodyTo(WebServiceErrorReport.class) // requires a converter
    // .process(e -> { //do some pre-processing, like setting headers/properties })
    // .toD("http4://web-service-uri/path"); // send to web-service

// for routes that have no DLC handling supplied
onException(DirectConsumerNotAvailableException.class)
    .handled(true)
    .useOriginalMessage()
    .removeHeaders(Headers.ALL)
    .to("{{my.unreprocessable.dlc}}"); // errors that cannot be recovered from
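The routing idea above (pick a handler by the originating route ID, and fall back to an "unreprocessable" destination when none exists) can be sketched without Camel. The following plain-Java example is only an illustration of that dispatch logic; the DeadLetterDispatcherSketch class, its DeadLetter type, and the method names are hypothetical and not part of any Camel API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher mirroring the "direct:reprocess_<routeId>" idea; not Camel API.
class DeadLetterDispatcherSketch {
    // Minimal stand-in for the dead letter message shown earlier.
    static final class DeadLetter {
        final String fromRouteId;
        final String errorMessage;
        DeadLetter(String fromRouteId, String errorMessage) {
            this.fromRouteId = fromRouteId;
            this.errorMessage = errorMessage;
        }
    }

    // routeId -> handler for failures that originated in that route
    private final Map<String, Consumer<DeadLetter>> handlers = new HashMap<>();
    // Failures with no matching handler end up here.
    final List<DeadLetter> unreprocessable = new ArrayList<>();

    void register(String routeId, Consumer<DeadLetter> handler) {
        handlers.put(routeId, handler);
    }

    void dispatch(DeadLetter letter) {
        Consumer<DeadLetter> handler = handlers.get(letter.fromRouteId);
        if (handler != null) {
            handler.accept(letter);
        } else {
            unreprocessable.add(letter);
        }
    }

    public static void main(String[] args) {
        DeadLetterDispatcherSketch dispatcher = new DeadLetterDispatcherSketch();
        List<String> notified = new ArrayList<>();
        // In a real system this handler would call the sender's web service.
        dispatcher.register("myRouteId", letter -> notified.add(letter.errorMessage));

        dispatcher.dispatch(new DeadLetter("myRouteId", "timeout"));
        dispatcher.dispatch(new DeadLetter("unknownRoute", "boom"));

        System.out.println(notified);                          // [timeout]
        System.out.println(dispatcher.unreprocessable.size()); // 1
    }
}
```

This is also one way to answer the "single queue or one per source" question: a single dead letter destination works as long as each message carries enough origin information to select the right handler.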

Which ServiceControl.Contracts messages should IHandle?

I'm trying to make sure that 3rd party dependencies are running, and built a service to do this based on the Monitoring 3rd party Sample Application, which emits ServiceControl CheckResult messages.
This works fine; ServicePulse alerts me when I stop/start my local and remote windows services, Databases, Flux Capacitors, etc.
I now want to build a windows service / nServiceBus Endpoint, like ServicePulse, but with logic that can attempt recovery, send emails etc. I don't really want to put this code into the 3rdParty monitor.
I followed the servicecontrol/external-integrations and servicecontrol/contracts tutorials, and created my MendStuffOrEmail endpoint - But it doesn't work; It doesn't receive any messages.
I was going to ask "what am I doing wrong?", but I think I know; I'm using IHandleMessages<ServiceControl.Contracts.MessageFailed> which is for failed messages.
I need to listen for the "CheckResult" type messages - but what are they? I have looked through the ServiceControl and ServicePulse code, but cannot work out what is being sent/received. How can I find this out, or has anyone else actually done this and already knows?
UPDATE
After more extensive rummaging, I also subscribed to CustomCheckFailed and CustomCheckSucceeded messages. I implemented IHandle interfaces for them, but I'm still not getting any messages. The log shows autosubscriber has taken out a subscription to them. What should I check for next?
I compared my code to Sean's posted example and found the mistake:
I had implemented two of the interfaces, IConfigureThisEndpoint and AsA_Server, in the wrong class (a 2am cut 'n' paste error).
The example listens for failed messages, but for anyone else trying to do this, you do need to subscribe to CustomCheckFailed and CustomCheckSucceeded messages (nuget ServiceControl.Contracts).
public partial class MessageHandler : IHandleMessages<CustomCheckFailed>,
    IHandleMessages<CustomCheckSucceeded>
{
    public void Handle(CustomCheckFailed message)
    {
        this.HandleImplementation(message);
    }

    partial void HandleImplementation(CustomCheckFailed message);

    public void Handle(CustomCheckSucceeded message)
    {
        this.HandleImplementation(message);
    }

    partial void HandleImplementation(CustomCheckSucceeded message);

    public IBus Bus { get; set; }
}
Then comes the logic to do something with the messages. (I left in my original test, sending an email, but our system has a library with all sorts of recovery and notification methods. You'll need something similar to stop an email flood):
public partial class MessageHandler
{
    partial void HandleImplementation(CustomCheckFailed message)
    {
        var messageBody = string.Format("Message with id {0} failed with reason {1}", message.CustomCheckId, message.FailureReason);
        MailMessageFactory.sendEmail("Failure Notification", messageBody);
        Console.Out.WriteLine(messageBody);
    }
}
And a similar file with the logic for recovery messages (CustomCheckSucceeded). You probably want a check in there to detect it is actually recovering from a failure, not just passing the test.
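One possible shape for that "is it actually recovering?" check, which also helps with the email flood, is to remember which checks are currently failing. The sketch below is plain Java rather than C#, and the CustomCheckTrackerSketch class and its method names are hypothetical; it only assumes that each check has a stable identifier such as the CustomCheckId used above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker of failing checks; not part of ServiceControl.Contracts.
class CustomCheckTrackerSketch {
    // IDs of the checks that are currently known to be failing.
    private final Set<String> failing = new HashSet<>();

    // Returns true only for the first failure of a check, so repeats can be ignored.
    boolean onCheckFailed(String customCheckId) {
        return failing.add(customCheckId);
    }

    // Returns true only if the check was failing before, i.e. this is a real recovery.
    boolean onCheckSucceeded(String customCheckId) {
        return failing.remove(customCheckId);
    }

    public static void main(String[] args) {
        CustomCheckTrackerSketch tracker = new CustomCheckTrackerSketch();
        System.out.println(tracker.onCheckSucceeded("db")); // false: it was never failing
        System.out.println(tracker.onCheckFailed("db"));    // true: first failure, notify
        System.out.println(tracker.onCheckFailed("db"));    // false: repeat, stay quiet
        System.out.println(tracker.onCheckSucceeded("db")); // true: recovered, notify
    }
}
```

This keeps state in memory only, so it forgets everything on restart; whether that is acceptable depends on how the endpoint is hosted.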
So anyway, fixed - on my dev pc.
The next problem was making it work on the server, which took a support call. It turns out ServiceControl ALSO needs a licence, available as part of the "Advanced", "Enterprise", and "Ultimate" editions - Not part of the platform on the standard licence.