Duplicate detection with NServiceBus on Azure Service Bus

I'm using NServiceBus as an abstraction layer over Azure Service Bus (in case we move away from Azure). I find that when working with multiple subscribers (who subscribe to the same events), the number of duplicate messages increases. I know Azure Service Bus (ASB) has a way of detecting these duplicates, and I can see that the feature is configurable through NServiceBus (according to the documentation). However, I can only find a sample of achieving duplicate detection by means of a configuration section. What I need is a sample of how to achieve this in code.
Thanks
Suraj

You can specify configuration using a code-based approach as well. NServiceBus has two contracts that can help with that: IConfigurationSource and IProvideConfiguration<T>. Here's an example of how you can take a configuration file section (UnicastBusConfig) and specify its values via code.
Specific to what you've asked, implementing IProvideConfiguration<AzureServiceBusQueueConfig> will allow you to configure the ASB transport, including the duplicate detection settings.
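A minimal sketch of what that could look like, assuming the legacy NServiceBus Azure Service Bus transport where the AzureServiceBusQueueConfig section exposes RequiresDuplicateDetection and DuplicateDetectionHistoryTimeWindow (verify the property names and units against the transport version you are running):

```csharp
using NServiceBus.Config;
using NServiceBus.Config.ConfigurationSource;

public class ProvideAzureServiceBusQueueConfig : IProvideConfiguration<AzureServiceBusQueueConfig>
{
    public AzureServiceBusQueueConfig GetConfiguration()
    {
        return new AzureServiceBusQueueConfig
        {
            RequiresDuplicateDetection = true,
            // Window within which ASB treats a repeated message ID as a duplicate
            // (an integer in this config section; check whether your version expects milliseconds).
            DuplicateDetectionHistoryTimeWindow = 60000
        };
    }
}
```

NServiceBus picks up IProvideConfiguration<T> implementations via assembly scanning, so no extra registration should be needed.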
The observation that the number of duplicates increases as you add subscribers feels like a symptom, not the problem. That is probably a different question, not related to the configuration. That said, I'd look into it before enabling the native de-duplication. While you can specify RequiresDuplicateDetection and DuplicateDetectionHistoryTimeWindow, be aware that ASB performs duplicate detection on the message ID property only. Also, it is better to build your handlers to be idempotent, rather than relying on the native de-duplication.
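As a rough illustration of that last point, a handler can de-duplicate on a business key against its own local store. This is a sketch only: PaymentAccepted and IProcessedMessageStore are hypothetical names, and the synchronous Handle signature shown is the classic pre-v6 one.

```csharp
using System;
using NServiceBus;

// Hypothetical event and local de-duplication store, for illustration only.
public class PaymentAccepted : IEvent
{
    public Guid PaymentId { get; set; }
}

public interface IProcessedMessageStore
{
    bool HasProcessed(Guid id);
    void MarkProcessed(Guid id);
}

public class PaymentAcceptedHandler : IHandleMessages<PaymentAccepted>
{
    readonly IProcessedMessageStore store;

    public PaymentAcceptedHandler(IProcessedMessageStore store)
    {
        this.store = store;
    }

    public void Handle(PaymentAccepted message)
    {
        // De-duplicate on a business key we control, instead of relying on the
        // broker, which only compares message IDs.
        if (store.HasProcessed(message.PaymentId))
            return;

        // ... perform the business work ...

        store.MarkProcessed(message.PaymentId);
    }
}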

Related

Separate or Merge Kafka Consumer and API services together

After recently reading about event-based architecture, I wanted to change my architecture into one making use of such strengths.
I have two services that expose an API (crud, graphql), each based around a different entity and using a different database.
However, now whenever someone deletes a certain type of row in service A, I need to delete a coupled row in service B.
So I added Kafka to my design, and whenever I delete the entity in service A, it publishes a notification message into Kafka.
In service B I am currently consuming the same topic, so whenever a new message is received the service will also handle the deletion of the matching entity, because it already has access to that table, since the same service exposes the CRUD API to users.
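Simplified, the consuming side looks roughly like this (a sketch using the Confluent.Kafka .NET client; the broker address, topic name, and delete helper are placeholders):

```csharp
using System;
using Confluent.Kafka;

class DeletionConsumer
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",      // placeholder broker address
            GroupId = "service-b",                    // one group: each event handled once
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        using var consumer = new ConsumerBuilder<string, string>(config).Build();
        consumer.Subscribe("entity-a-deleted");       // placeholder topic name

        while (true)
        {
            var result = consumer.Consume();
            // A DELETE ... WHERE id = @id is naturally idempotent: replaying the
            // event deletes zero rows, so at-least-once delivery is safe here.
            DeleteCoupledRow(result.Message.Key);
            consumer.Commit(result);                  // commit only after the row is gone
        }
    }

    static void DeleteCoupledRow(string id)
    {
        // placeholder: issue the DELETE against service B's own table
        Console.WriteLine($"Deleting coupled row {id}");
    }
}
```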
What I'm not sure about is whether putting the Kafka consumer and the API together in the same service is a good design. It seems to contradict the single-responsibility principle in microservices, and if there is an issue in one part of the service, it will likely affect the other.
However, creating a new service will also cause me issues: I will have 2 different services accessing the same table, and I will have to make sure I always maintain them together whenever making changes to the table or database.
What is the best practice in a situation such as this? Is it inevitable to have different services with data coupling, or is it not so bad to use the same service for two similar usages?
There is nothing wrong with using Kafka here; you could do the same with point-to-point service communication (JSON-RPC / gRPC), however.
The real problem you seem to be asking about is dual writes or race conditions leading to data inconsistency.
While you could use a single consumer group and one topic partition to preserve ordering and locking across consumers interested in those events, that does not lock out other consumer groups from interacting with the database to perform the same action. Therefore, Kafka itself won't help with this problem.
You'll need external, distributed locks (e.g. ZooKeeper can be used here) that fence off your database clients while you are performing actions against the database.
To the original question: Kafka Connect offers an API and is also a producer and consumer client (and would be recommended for database interactions), as are Confluent Schema Registry, ksqlDB, etc.
I believe that the consumer in your service B would not be considered "a service", or part of "the service", in the sense that it is not called as part of the code which services requests. Yet it does provide functionality that is required for the domain function of your microservice. So yes, I would consider the consumer part of the microservice in terms of team/domain responsibility.
There may be different opinions on whether the consumer code should share the same code base/repo as the "service" code. Some people believe it is better to limit the repo scope to a single "executable", others believe it is beneficial to keep the domain scope and have everything in a single repo. I probably belong to the latter group, but do not have a very strong opinion on it. I would argue it is more important to have central documentation / a wiki for the domain that points to the repos involved, etc.

Max IEndpointInstances per process

Is there an upper limit to the number of unique IEndpointInstances that can be hosted within a single process?
I'm considering a design that will see up to 100 unique IEndpointInstances, all listening on separate queues, active simultaneously.
Will this cause a problem for NServiceBus? Could the process deadlock or spin up so many threads as to be unresponsive and useless?
The question NServiceBus - How to get separate queue for each message type receiver subscribes to? seems to suggest that you cannot have multiple endpoints in a process, but this is an older post. I have built a small sample against NServiceBus 6 beta 4 that does work.
There is a similar question, NServiceBus Single Process, but Multiple Input queues, which concluded that, based on the OP's context, using Satellite Features was the recommended approach. However, in my case, I have 100 (functionally different) sagas (1 per queue), where each saga could need to receive similar messages, but I need to make sure that only the correct saga receives the message. Therefore, I don't think implementing a custom feature will meet my requirements. Or will Satellite Features support sagas?
One of the options is to self-host multiple endpoints. Using this approach, you host the endpoints yourself in the same process. There are a few things to take into consideration, such as:
Assembly scanning (might require custom scanning logic per endpoint).
Throughput (for heavy throughput endpoints I'd recommend a separate hosting process).
To update/redeploy a single endpoint, you'll be taking all of the other 99 endpoints down as well.
While there's no hard limit on how many endpoints can be co-hosted, 100 sounds like a lot. That said, it also depends on how heavy the load on those endpoints is. Whether you process 1 msg/sec or 1K msg/sec determines a lot about whether this is a viable option or not.
Have a look at the sample that does exactly that.
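For reference, a minimal self-hosting sketch, assuming the NServiceBus 6 hosting API (Endpoint.Start); the endpoint names, MSMQ transport, and in-memory persistence are illustrative choices only:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using NServiceBus;

class MultiEndpointHost
{
    static void Main() => RunAsync().GetAwaiter().GetResult();

    static async Task RunAsync()
    {
        var instances = new List<IEndpointInstance>();

        for (var i = 0; i < 100; i++)
        {
            // One configuration per endpoint, each with its own input queue.
            var configuration = new EndpointConfiguration($"Sales.Saga{i}");
            configuration.UseTransport<MsmqTransport>();
            configuration.UsePersistence<InMemoryPersistence>(); // sagas need persistence
            configuration.SendFailedMessagesTo("error");
            // Consider customizing assembly scanning here so each endpoint
            // only loads its own saga and handler types.

            instances.Add(await Endpoint.Start(configuration));
        }

        Console.ReadLine();

        // Note: stopping (or redeploying) takes all 100 down together.
        foreach (var instance in instances)
        {
            await instance.Stop();
        }
    }
}
```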

Mule outbound endpoint level statistics to facilitate integration testing

I am looking for a pragmatic solution to do integration testing of our Mule-based integration tier.
This article here has some excellent pointers about it, but looks a tad outdated. I am reproducing one key idea from the article here:
Keeping track of the delivery of messages to external systems. Interrogating all the systems that have been contacted with test messages after the suite has run to ensure they all received what was expected would be too tedious to realize. How to keep track of these test messages? One option could be to run Mule ESB with its logging level set to DEBUG and analyze the message paths by tracking them with their correlation IDs. This is very possible. I decided to follow a simpler and coarser approach, which would give me enough certitude about what happened to the different messages I have sent. For this, I decided to leverage component routing statistics to ensure that the expected number of messages were routed to the expected endpoints (including error messages to error processing components). Of course, if two messages get cross-sent to wrong destinations, the count will not notice that. But this error would be caught anyway because each destination will complain about the error, hence raising the count of error messages processed.
Using this technique, when I test my integration tier I will not have to stand up all the external systems and can test the integration tier in isolation, which would be great.
@David Dassot has provided a reference implementation as well; however, I think it was based on Mule 2.x, and hence I cannot find the classes in the Mule 3.x codebase.
Looking around I did find FlowConstructStatistics, but these are flow-specific statistics and I am looking for endpoint-specific statistics.
I do agree that as a workaround we could wrap all outbound endpoints within sub-flows and get this working, but I would like to avoid doing this ...
Any techniques that help query an endpoint for the number of calls made and the payloads passed through it would be great!
First take a look at JMX; perhaps what you need is available right there.
Otherwise, if logging is not enough and upgrading to the enterprise version is not OK for you, give the endpoint-level notifications a try.

NServiceBus hierarchy and structure

I am just starting out learning NServiceBus (and SOA in general) and have a few questions and points I need clarification on regarding how the solution is typically structured, and common best practices:
The documentation doesn't really explain what an endpoint is. From what I gather it is a unit of deployment, and your service will have 1 or more endpoints. Is this correct?
Is it considered best practice to have one VS solution per service you are developing, with a project for messages, then a project for each endpoint, and finally a project shared with the endpoints containing your domain layer?
From what I read, services are usually comprised of individual components. Can (or should) any component in the service access the same database, or should it be one database per component?
Thanks for any clarification or insight one can offer.
I will try and answer your questions the best I can...
I'm not sure about the term "Best Practices"; I would consider the terms "Best Thinking" or "Paradigm".
Q1: Yes, an endpoint is effectively a deployed process that consumes the messages from its queue.
It does not have to belong to a single (logical) "Service" (in the case of a web endpoint, for example), and an endpoint can have one or more handlers deployed to it.
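For example, two unrelated handlers can be deployed to the same endpoint. A sketch (the message types are made up, and newer NServiceBus versions use an async Handle signature):

```csharp
using System;
using NServiceBus;

// Hypothetical message contracts, typically kept in a separate "messages" assembly.
public class OrderPlaced : IEvent
{
    public Guid OrderId { get; set; }
}

public class CustomerRegistered : IEvent
{
    public Guid CustomerId { get; set; }
}

// Two unrelated handlers deployed to the same endpoint: both consume
// messages from that endpoint's single input queue.
public class OrderPlacedHandler : IHandleMessages<OrderPlaced>
{
    public void Handle(OrderPlaced message)
    {
        // ... business logic for placed orders ...
    }
}

public class CustomerRegisteredHandler : IHandleMessages<CustomerRegistered>
{
    public void Handle(CustomerRegistered message)
    {
        // ... business logic for new customers ...
    }
}
```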
Q2: I would go with one solution (and later repo) per logical domain service. Inside a solution I would create a project per message handler, because as you scale you will need to move your handlers between endpoints, or to their own endpoint, depending on scale. Messages, however, are contracts, so I would put them in their own solution(s), maybe split into commands and events. You may consider something like NuGet to publish your message packages.
Q3: A "Service" is a logical composition of autonomous components, each a vertical slice of functionality, so they can share the same database, but I would say that only one component has the authority to modify its own data. I would always try to think about what would happen when you need to scale.
Does this make sense?

How to configure NServiceBus using the gateway when MSDTC is not available at either end

I am new to NServiceBus, trying to introduce messaging into a WCF/RPC solution.
Because of architectural constraints and overhead (memory and CPU usage are already high), IT Operations will not allow MSDTC. (I'm also keen to avoid 2PC, fwiw.) I also require messages over HTTP, so the NSB bridge looks like a great solution.
Based on these posts (how-i-avoid-two-phase-commit and extending-nservicebus-avoiding-two-phase-commits) it looks to me as though it's possible to use NSB with the DTC disabled.
It sounds like EventStore does manage to avoid 2PC in the same way that I want to setup NSB but at the moment I just want to get NSB to work rather than adding event sourcing into the mix.
Questions:
Are there any examples of configuring NSB to work this way? I'm quite happy to add the extra complexity (a custom message handler with local message state storage) - without 2PC there isn't really another option. I already know of this example (IdempotentConsumer), but the test projects for this repo contain no code. It would be even more helpful if there were an example using NoSQL storage.
Will I need to alter the NSB bridge to deal with no DTC? I'm guessing no - bridge transactions are only against the local queue, but the process that consumes the local queue will obviously need to be coded to avoid 2PC. Correct?
Are there any other useful resources/posts around using NSB without MSDTC? The solution (how-i-avoid-two-phase-commit) seems not too complex but given I'm just starting out with NSB it would be great to find a quickstart for this...
I would have thought this would be a common scenario - but there doesn't seem to be much written about avoiding MSDTC while still using NSB. Surely there are others who are using a message bus but aren't allowed to use MSDTC... Is there another obvious way that I've missed?
thanks
2) Yes, you should be fine. Since you're doing the de-duplication yourself, you don't need the gateway to do it for you. Just configure it to use the InMemory persistence and you should be fine.
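A minimal sketch of that configuration, assuming NServiceBus 5.x-style APIs (verify the exact member names against your version):

```csharp
using System;
using NServiceBus;
using NServiceBus.Persistence;

class EndpointBootstrap
{
    static void Main()
    {
        var configuration = new BusConfiguration();

        // Run without MSDTC: only the local receive transaction, no 2PC enlistment.
        configuration.Transactions().DisableDistributedTransactions();

        // Keep the gateway's de-duplication state in memory, since the handlers
        // are doing their own de-duplication anyway.
        configuration.UsePersistence<InMemoryPersistence, StorageType.GatewayDeduplication>();

        using (var bus = Bus.Create(configuration).Start())
        {
            Console.ReadLine();
        }
    }
}
```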