Triggering an update on all microservices - asp.net-core

Using ASP.NET Core microservices, both API and worker roles, running in Azure Service Fabric.
We use Service Bus to do inter-microservice communication.
Consider the following situation:
Each microservice holds a local, in-memory copy of cached objects of type X.
One worker role is responsible for processing a message that would result in a rebuild of this cache for all instances.
We have multiple nodes, and thus multiple instances of each microservice in Service Fabric.
What would be the best approach to trigger this update?
I thought of the following approaches:
Calling SF for all service replicas and firing an HTTP POST at each replica to trigger the update.
This, however, does not seem to work, as worker roles don't expose any APIs.
Creating a dedicated 'broadcast' topic that each instance registers its own subscription on, thus using a pub/sub mechanism.
I fail to see how I can make sure each instance has its own subscription, while also avoiding ghost subscriptions when something like a crash happens.

You can use the OSS library Service Fabric Pub Sub for this.
Every service partition can create its own subscription for messages of a given type.
It uses the partition identifier for subscriptions, so crashes and moves won't result in ghost subscriptions.
It uses regular SF remoting, so you won't need to expose APIs for messaging.
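If you would rather stay on plain Azure Service Bus, the broadcast-topic idea from the question can also be made safe against ghost subscriptions: give every instance its own subscription and set AutoDeleteOnIdle so that subscriptions orphaned by a crash clean themselves up. A rough sketch, assuming the Azure.Messaging.ServiceBus package (the topic name, environment variable, and cache-rebuild call are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Azure.Messaging.ServiceBus.Administration;

var connectionString = Environment.GetEnvironmentVariable("SERVICEBUS_CONNECTION");

// Each instance creates its own subscription on a shared broadcast topic.
// AutoDeleteOnIdle makes subscriptions orphaned by a crashed instance disappear.
var admin = new ServiceBusAdministrationClient(connectionString);
var subscriptionName = $"cache-{Guid.NewGuid():N}"; // or a stable replica/instance id

if (!await admin.SubscriptionExistsAsync("cache-invalidation", subscriptionName))
{
    await admin.CreateSubscriptionAsync(
        new CreateSubscriptionOptions("cache-invalidation", subscriptionName)
        {
            AutoDeleteOnIdle = TimeSpan.FromMinutes(10) // ghost subscriptions self-destruct
        });
}

// Every live instance receives every broadcast message and rebuilds its local cache.
var client = new ServiceBusClient(connectionString);
var processor = client.CreateProcessor("cache-invalidation", subscriptionName);
processor.ProcessMessageAsync += async args =>
{
    RebuildLocalCache();
    await args.CompleteMessageAsync(args.Message);
};
processor.ProcessErrorAsync += _ => Task.CompletedTask; // log properly in real code
await processor.StartProcessingAsync();

void RebuildLocalCache() { /* rebuild the in-memory copy of X here */ }
```

The worker role that processes the original message then publishes a single message to the topic, and every instance, however many are running, rebuilds its copy.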

Related

How to organize scheduled data polling when the application scales?

I have a microservice that, among other things, is used as a "caching proxy" (I'm not sure this term is correct). It sits between the application API and the Azure API. This microservice periodically fetches some data from Azure for several resources and stores it in Redis. The application API, on the other side, requests the resource data but reads it not from Azure itself but from Redis.
(This is done in order to limit the scale of requests hitting the Azure API when having a high load on the application API.)
The periodical polling is currently implemented as a naive "while not canceled - fetch, update Redis and sleep for 15 seconds".
This worked well while I had only one instance of the microservice. But now, due to new requirements, my microservice scales automatically. And that means that if there are 5 instances of the microservice running right now, I'm hitting the Azure API 5 times more frequently than I should.
My question is how can I fix this to do "one request to Azure API per resource once in 15 seconds" - no matter how many microservice instances I have?
My constraints are:
make minimal changes, since the microservice is already in production;
use the existing resources as much as possible (apart from Redis the microservice is already using message queues - Azure Service Bus).
Ideas I have:
make only one instance a "master" - only this instance will fetch data from Azure. But what should I do when auto-scaling shuts this instance down? How can I detect this and decide on a new master instance? Maybe I could store the master instance identifier in a short-lived key in Redis and prolong it every time the resource data is retrieved from Azure? If there is no key in Redis, a new master instance is selected. (A sketch of this follows after this list.)
use Azure Service Bus message scheduling - on microservice application startup, the instance schedules a message for 15 seconds in the future, which will be received by only one microservice instance. On receiving this message, that instance fetches the data from Azure, updates Redis, and schedules another message for 15 seconds later. This time another microservice instance can receive the message and do the same - fetch data, update Redis, and schedule the next message. But I don't know how to avoid parallel message chains initiated when several microservice instances are started/restarted.
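A rough sketch of the master-election part of the first idea, assuming StackExchange.Redis (key name and timings are placeholders):

```csharp
using System;
using StackExchange.Redis;

// Idea 1 sketched: whoever claims the key first is the master. The key expires
// after 30 seconds, so if the master instance is scaled down or crashes,
// another instance wins the next election automatically.
public sealed class RedisMasterElection
{
    private readonly IDatabase _db;
    private readonly string _instanceId = Guid.NewGuid().ToString("N");

    public RedisMasterElection(IConnectionMultiplexer redis) => _db = redis.GetDatabase();

    // Call at the start of every 15-second polling cycle.
    public bool TryAcquireOrRenewMastership()
    {
        // SET poller:master <id> NX PX 30000 - claim the key only if nobody holds it.
        if (_db.StringSet("poller:master", _instanceId, TimeSpan.FromSeconds(30), When.NotExists))
            return true;

        // Already the master? Prolong the key, as described above.
        // (GET + EXPIRE is not atomic; a Lua script would be safer, but this is a sketch.)
        if (_db.StringGet("poller:master") == _instanceId)
        {
            _db.KeyExpire("poller:master", TimeSpan.FromSeconds(30));
            return true;
        }

        return false; // another instance is polling Azure right now
    }
}
```

Each instance would still run its usual 15-second loop, but only the one currently holding the key actually calls Azure and updates Redis; the others just sleep.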
Anyway, I don't see any good solution for my problem and would appreciate a hint.

Using NServiceBus with Azure Service Fabric

I've read other questions on StackOverflow regarding using NSB on SF and also the sample on github (outdated) and I'm still not sure how to configure NServiceBus properly for this platform.
I'm looking to set up a send only publish/subscribe workflow. What I can't determine through my research is how to set this up so that only one instance of a particular service responds to the message.
For example: 3 services running on the standard 5 nodes (so pretend 5 instances of each of the 3 services).
Existing load balancer routes an http request to a specific instance of Service A.
Service A publishes the "OrderComplete" event
Services B and C both subscribe to the event.
How can I make sure that only one instance of Services B and C respond instead of all 5 instances of Service B and all 5 instances of Service C?
All the services are currently Stateless services.
I was thinking of using the AzureServiceBus or AzureStorageQueue transport.
The stateless approach is fine. You do not need to go to stateful services with a single partition unless you want to leverage reliable collections for your services. But let's look at both options.
Going with Stateless services
It's OK to have multiple instances of your services. Yes, they all will create subscriptions. I'd argue that is exactly what you want - competing consumers. The more service instances you have, the more throughput you get, i.e. you can handle more messages.
What I can't determine through my research is how to set this up so that only one instance of a particular service responds to the message.
This will happen automatically due to the nature of the competing consumer transport (both ASB and ASQ).
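In practice that just means every instance of a given service starts with the same endpoint name, so all five share one input queue/subscription and each OrderComplete event is handled by exactly one of them. A minimal sketch, assuming NServiceBus 7-style configuration with the Azure Service Bus transport (names and the environment variable are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using NServiceBus;

var connectionString = Environment.GetEnvironmentVariable("ASB_CONNECTION");

// All 5 instances of Service B use the SAME endpoint name, so they compete
// on one input queue instead of each receiving its own copy of the event.
var endpointConfiguration = new EndpointConfiguration("ServiceB");
var transport = endpointConfiguration.UseTransport<AzureServiceBusTransport>();
transport.ConnectionString(connectionString);

var endpoint = await Endpoint.Start(endpointConfiguration);

// The event published by Service A; this endpoint subscribes automatically
// because it has a handler for an IEvent type.
public class OrderComplete : IEvent { }

// Exactly one Service B instance runs this handler per published event.
public class OrderCompleteHandler : IHandleMessages<OrderComplete>
{
    public Task Handle(OrderComplete message, IMessageHandlerContext context)
    {
        // react to the completed order...
        return Task.CompletedTask;
    }
}
```

Service C does the same under its own endpoint name ("ServiceC"), so B and C each receive the event exactly once.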
Going with Stateful services
With stateful services you need to be very careful. Yes, you could go with a single partition per service, hence having a single primary replica handling your messages. But then, arguably, you're wasting your cluster resources by not utilizing them for concurrent processing of many messages. If you decide to partition your service, then you won't be able to use reliable collections across the whole service, as partitions do not share reliable collections among themselves. And should you choose partitioned stateful services without reliable collections, you might as well use the stateless counterpart.
Note: NSB will provide support for running with stateful services to take advantage of reliable collections for persistence needs, but even then partitioning is something that would need to be thought through to align with business needs. If you do not have a need like that, I'd suggest sticking to stateless services and Azure Storage persistence.
In the NSB/SF sample on GitHub there is a stateful service that handles a command. What is important is that in the application it has a PartitionCount=1. The same goes for all other solutions with NSB I have seen: only one partition or instance of each service handles messages. Otherwise you would end up with one subscription per instance for each message, as you describe.
Perhaps you could adopt the Distributor to achieve load balancing between multiple instances of the same service, but AFAIK the Distributor only works with MSMQ, so you would have to rewrite it to work with SF and Azure Service Bus.
If you stick with single instances, it should work fine for you. You would still get some benefit from SF, as it ensures your services are up and running, but load balancing between multiple instances will require some work on your part.

CommonDomain / EventStore with Raven persistence for multi-tenant app

How should I setup EventStore's RavenPersistence in a multi-tenant application?
I have an Azure worker role that processes commands received through service bus.
Each message may belong to a different tenant. The actual tenant is sent in the message header, which means that I know which database to use only after I receive each message.
I'm using CommonDomain so my command handlers have IRepository injected.
Right now I build a new store while processing each message (I set DefaultDatabase), but I have a feeling this may not be the optimal way.
Is there a way to create a single event store and then just switch databases?
If not, can I cache the stores for each tenant?
Do you know about any multi-tenant sample that uses EventStore with RavenDB?
We do exactly the same - spawn a new instance of EventStore for every request. JOliver's EventStore was designed without multi-tenancy support in mind, so this is the only way ...
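On the caching question: if spawning a store per message ever shows up as a measurable cost, one middle ground is to cache a store per tenant. Whether an IStoreEvents instance is safe to reuse across messages is something you would have to verify for your version, so treat this as an unverified sketch (the factory delegate stands in for your existing wireup code that sets DefaultDatabase):

```csharp
using System;
using System.Collections.Concurrent;
using EventStore; // JOliver EventStore's IStoreEvents

// Caches one event store per tenant instead of rebuilding it for every message.
// NOTE: verify that a single IStoreEvents instance may be shared across messages
// of the same tenant before relying on this - that is an assumption here.
public sealed class TenantStoreCache : IDisposable
{
    private readonly ConcurrentDictionary<string, IStoreEvents> _stores =
        new ConcurrentDictionary<string, IStoreEvents>();
    private readonly Func<string, IStoreEvents> _factory;

    // The factory wraps your existing per-tenant wireup (DefaultDatabase etc.).
    public TenantStoreCache(Func<string, IStoreEvents> factory) => _factory = factory;

    public IStoreEvents GetStoreFor(string tenantId) =>
        _stores.GetOrAdd(tenantId, _factory);

    public void Dispose()
    {
        foreach (var store in _stores.Values)
            store.Dispose();
    }
}
```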

NServiceBus Sagas and REST API Integration best-practices

What is the most sensible approach to integrating NServiceBus Sagas with REST APIs?
The scenario is as follows,
We have a load balanced REST API. Depending on the load we can add more nodes.
The REST API is a wrapper around a DomainServices API, which means the DomainServices API can also be consumed directly.
We would like to use Sagas for workflow and implement the NServiceBus Distributor to scale out.
The question is: if we use the REST API from Sagas, the actual processing happens in the API farm, which in a way defeats the purpose of implementing the Distributor pattern.
On the other hand, using the DomainServices API directly from Sagas allows processing locally within the worker nodes. With this approach we would have to maintain API assemblies in multiple locations, but the throughput could be higher.
I am trying to understand the best approach. Personally, I'd prefer to consume the API (if readily available), but this could introduce chattiness to the system and could take longer to complete compared to in-process calls.
A typical sequence could be similar to publishing an online advertisement:
Advertiser submits a new advertisement request via a web application.
Web application invokes the relevant API endpoint and sends a command message.
Command message initiates a new publish-advertisement Saga instance.
Saga sends a command to validate caller permissions (in-process/out-of-process API call).
Saga sends a command to validate the advertisement data (in-process/out-of-process API call).
Saga sends a command to the fraud service (third-party service).
Once the content and fraud verifications are successful, Saga sends a command to the billing system.
Saga invokes an API call to save ad details (in-process/out-of-process API call).
And this goes on until the advertisement is expired, there are a number of retry and failure condition paths.
After a number of design iterations we came up with the following guidelines:
Treat REST API layer as the integration platform.
Assume API endpoints are capable of abstracting fairly complex micro-workflows. Micro-workflows are operations that execute in a single burst (not interruptible) and complete within a short time span (<1 second).
Assume API farm is capable of serving many concurrent requests and can be easily scaled-out.
Favor synchronous invocations over asynchronous message based invocations when the target operation is fairly straightforward.
When asynchronous processing is required, use a single message handler and invoke the API from the handler (sketched after this list). This delegates the work to the API farm and also eliminates the need for a Distributor and extra hardware resources.
Avoid Sagas unless the business workflow contains multiple transactions, compensation logic, and resumes. Tests reveal Sagas do not perform well under load.
Avoid consuming DomainServices directly from a message handler. This will do the work locally and also introduces a deployment hassle by distributing business logic.
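The single-message-handler guideline from the list above, as a sketch (assuming NServiceBus and HttpClient; the message type, route, and host are made up):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using NServiceBus;

// A plain message handler (no Saga) that delegates the real work to the REST
// API farm; the load balancer in front of the farm spreads the work out, so
// no Distributor or extra worker hardware is needed.
public class PublishAdvertisementHandler : IHandleMessages<PublishAdvertisement>
{
    private static readonly HttpClient ApiClient = new HttpClient
    {
        BaseAddress = new Uri("https://api.example.com/") // made-up host
    };

    public async Task Handle(PublishAdvertisement message, IMessageHandlerContext context)
    {
        var response = await ApiClient.PostAsJsonAsync("advertisements", message);
        response.EnsureSuccessStatusCode(); // throwing here lets NSB retry the message
    }
}

public class PublishAdvertisement : ICommand
{
    public string AdvertiserId { get; set; }
    public string Content { get; set; }
}
```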
Happy to hear your thoughts.
You are right in identifying that you will need Sagas to manage workflow. I'm willing to bet that your Domain hooks up to a common database. If that is true, then it will be faster to use your Domain directly and remove the serialization/network overhead; going through the REST API would also cost you the ability to easily manage the transactions at the database level.
Assuming you are directly calling your Domain, the performance becomes a question of how the Domain performs. You may take steps to optimize the database, drive down distributed-transaction costs, shard the data, etc. You may end up using the Distributor to have multiple Saga processing nodes, but it sounds like you have some more testing to do once a design is chosen.
Generally speaking, we use REST APIs to model the commands as resources (via POST) to allow interaction with NSB from clients who don't have direct access to messaging. This is a potential solution for getting things onto NSB from your web app.
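That pattern in miniature, assuming ASP.NET Core with NServiceBus's IMessageSession injected (route and type names are illustrative, and routing for the command is assumed to be configured elsewhere):

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using NServiceBus;

// POST /orders models the "place order" command as a resource: the HTTP call
// only captures intent and drops a message on the bus, then returns 202.
[ApiController]
[Route("orders")]
public class OrdersController : ControllerBase
{
    private readonly IMessageSession _messageSession;

    public OrdersController(IMessageSession messageSession) =>
        _messageSession = messageSession;

    [HttpPost]
    public async Task<IActionResult> Post(PlaceOrder command)
    {
        await _messageSession.Send(command); // routing to the owning endpoint is configured elsewhere
        return Accepted(); // 202: accepted for asynchronous processing
    }
}

public class PlaceOrder : ICommand
{
    public string OrderId { get; set; }
}
```

The client never touches the queue; the message is processed wherever the handling endpoint runs.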

How can a WCF request be correlated with multiple Workflow instances?

The scenario is as follows:
I have multiple clients that can register themselves on a workflow server, using WCF requests, to receive some kind of notifications. The notification information is received from an external system through another receive activity. The workflow should then take the notification information and call back all registered clients using a send activity and callback correlations (the clients expose callback interfaces implemented on their side, and the endpoint addresses are passed initially with the registration requests). The "long-running workflow service" approach is used with persistent storage.
Now, I'm looking for some way to correlate the incoming notification information received from the external system with the workflow instances persisted earlier, when the registration requests came in, so that all clients are notified via the endpoints that were passed with those registration requests. Is WF 4.0 capable of resuming and executing multiple workflow instances when the notification information is received, without my storing the endpoints manually and looping through them? If yes, how can I do that?
Also, if my approach is not correct, then please advise me on the best practice for building such a system with WCF services.
Your help is highly appreciated.
When you use request correlation with workflow services, the correlation key must always match a single workflow instance; you can't have multiple workflow instances react to a single message. So you either need to multicast the message using all the different correlation keys, or resume your workflow instances in some other way. That other way could be to store the request somewhere, like a SQL table, and have the workflows periodically check that location to see if they need to notify the client.
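For the multicast option, the shape is simply one send per stored correlation key, so that each call resumes exactly one persisted instance. A sketch with an invented client contract standing in for the generated WCF proxy of your workflow service's Receive activity:

```csharp
using System;
using System.Collections.Generic;

// Invented contract standing in for the generated WCF proxy of the workflow
// service's Receive activity; the real one comes from "Add Service Reference".
public interface INotificationClient : IDisposable
{
    void Notify(string clientId, string payload);
}

public static class NotificationMulticaster
{
    // Multicast: one call per registered client id (= correlation key), so each
    // call resumes exactly the one persisted workflow instance it correlates to.
    public static void Multicast(
        Func<INotificationClient> clientFactory,
        IEnumerable<string> registeredClientIds,
        string payload)
    {
        foreach (var clientId in registeredClientIds)
        {
            using (var client = clientFactory())
            {
                client.Notify(clientId, payload);
            }
        }
    }
}
```

The registered client ids (and their callback endpoints) would come from wherever the registration requests stored them, e.g. the SQL table mentioned above.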