Many to many filtering in Message Broker - rabbitmq

I have a Person object in the system. When a Person performs an action, there is an Administrator actor who is interested in monitoring these kinds of events.
Person
{
Id: string
}
PersonAction
{
ActionType: enum
PersonId: string
}
Currently I have this subscription implemented through a Service Bus topic and subscriptions: Administrators subscribe to the actions of all Persons in the system:
The Azure Service Bus broker has a PersonActions topic.
Every time a Person performs an action, a PersonAction event is sent to the topic.
Every Administrator creates its own subscription to the topic and monitors all Persons' actions.
Now I have a new requirement that introduces grouping of Persons and I need a way to allow Administrators to subscribe to PersonActions events based on groups they want to monitor:
Persons can be part of one or more groups.
Administrators are interested in monitoring groups of Persons and, hence, receiving all PersonAction events for groups they are monitoring.
Administrators may subscribe to one or several groups.
Here are my thoughts how to do this:
Add to PersonAction a routing property that contains the groups this Person is a member of.
When an Administrator creates a new subscription, they specify the set of groups they want to monitor, and that set should then somehow be used in the subscription filter to filter PersonAction messages in the topic.
So, cutting to the chase, I want to leverage Service Bus topic filtering capabilities to deliver PersonAction messages specifically to the Administrators that are interested in them, based on groups.
In general this doesn't seem to be a straightforward task with Service Bus (or any other message broker) because there is a many-to-many relation: one Person can be in multiple groups and an Administrator may want to subscribe to multiple groups. Filters usually support the case where an event has a single property (like "groupId=1234"), and in my case it's an array.
So far, I've come up with two solutions, but I don't really like either of them:
Use a LIKE SqlFilter. Concatenate all groups of the Person into a single comma-separated string (groups=1,2,5,8) and then have the filter groups LIKE '%1%' OR groups LIKE '%5%' (in reality group ids will be GUIDs, so don't mind the problem of one group id being a substring of another).
Add each group id as a property with an empty value and then use an EXISTS filter to check whether the event has this group id defined. The filter would be EXISTS(1) OR EXISTS(5) and the PersonAction properties would be {1:null, 2:null, 5:null, 8:null}.
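To make the two options concrete, here is a minimal local sketch of what each filter is meant to decide. It is plain Python and does not call Azure Service Bus; the function names and the sample group ids ("g-1" and so on) are made up for illustration, and substring matching only approximates what LIKE '%...%' would do.

```python
# Local simulation only: plain-Python stand-ins for the two filter ideas.
# Nothing here calls Azure Service Bus; all names are illustrative.

def encode_as_string(group_ids):
    """Option 1: a single comma-separated 'groups' property."""
    return {"groups": ",".join(group_ids)}

def matches_like(properties, wanted_groups):
    """Approximates: groups LIKE '%a%' OR groups LIKE '%b%' ..."""
    value = properties.get("groups", "")
    return any(group in value for group in wanted_groups)

def encode_as_properties(group_ids):
    """Option 2: one property per group id; the value is unused."""
    return {group: None for group in group_ids}

def matches_exists(properties, wanted_groups):
    """Approximates: EXISTS(a) OR EXISTS(b) ..."""
    return any(group in properties for group in wanted_groups)

person_groups = ["g-1", "g-2", "g-5", "g-8"]
admin_wants = ["g-5", "g-9"]        # overlaps with the Person's groups
other_admin_wants = ["g-3"]         # no overlap

as_string = encode_as_string(person_groups)
as_properties = encode_as_properties(person_groups)

print(matches_like(as_string, admin_wants), matches_exists(as_properties, admin_wants))
print(matches_like(as_string, other_admin_wants), matches_exists(as_properties, other_admin_wants))
```

Both encodings answer the same question (does the Person's group set intersect the Administrator's set?); they differ only in how the set is flattened into message properties.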
Is there a better way to do such filtering and how is many-to-many filtering rule done in message brokers?
Answers describing this for any message broker (not only Service Bus) would also be extremely helpful.

I'm not really that familiar with other brokers, but here is something that comes to mind for Azure Service Bus.
You could have 2 (3 with the bonus) levels of entities instead of 1 for such a scenario:
The first level is a topic that all PersonAction messages come into. It would have one subscription per group, each set up to auto-forward to that group's own topic.
The second level is where each group has its own topic. Administrators would subscribe to multiple topics based on the groups they want to monitor, but would have to de-duplicate messages.
You could remove this level and have direct subscriptions (one per group per administrator), but you would likely hit the limit of 2000 subscriptions per topic.
(Bonus) Auto-forward the messages from the subscriptions into administrator queues and enable Duplicate Detection.
Note that the number of billed operations would increase, as mentioned in the Auto Forward Considerations section of the docs.
Here is a more elaborate explanation of the same:
1. Input Topic
This is where the PersonAction messages would first come in.
This topic would have subscriptions that filter messages based on the group (either of your approaches would work; I'd prefer a Correlation Filter since it's more efficient) and auto-forward the messages into the respective group topics.
2. Topic per Group
This is where the PersonAction messages filtered by group go into.
At this point, there would be copies of the same message in different topics, one for each group the user is part of.
Administrators would create subscriptions to the required topics depending on the groups they want to monitor, but would have to handle the duplicate messages that they could potentially receive.
3. (Bonus) Administrator Queues
The subscriptions created by administrators could be set up to auto-forward messages into their personal queue, and these queues could have duplicate detection enabled, allowing the administrators to process the messages as-is without worrying about duplicates.
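The three levels above can be sketched as a small in-memory model. This is only an illustration of the message flow, not Service Bus code: the AdminQueue class, the route function, and the sample ids are hypothetical, and it assumes every forwarded copy keeps the same message id so that duplicate detection can recognize it.

```python
class AdminQueue:
    """Stand-in for an administrator queue with duplicate detection
    keyed on the message id (hypothetical, in-memory only)."""

    def __init__(self):
        self.seen_ids = set()
        self.messages = []

    def send(self, message):
        if message["message_id"] in self.seen_ids:
            return  # duplicate copy is dropped
        self.seen_ids.add(message["message_id"])
        self.messages.append(message)


def route(message, admin_groups, queues):
    """Level 1: the input topic copies the message once per group it carries.
    Level 2: each group topic delivers a copy to every subscribed admin.
    Level 3: the admin queue drops copies with an already-seen message id."""
    for group in message["groups"]:
        for admin, groups in admin_groups.items():
            if group in groups:
                queues[admin].send(message)


admin_groups = {"alice": {"g1", "g2"}, "bob": {"g5"}}
queues = {admin: AdminQueue() for admin in admin_groups}

action = {"message_id": "m1", "person_id": "p1", "groups": ["g1", "g2", "g8"]}
route(action, admin_groups, queues)

# alice monitors two of the Person's groups but keeps a single copy
print(len(queues["alice"].messages), len(queues["bob"].messages))
```

Without the third level, alice would see the action twice (once via g1 and once via g2), which is the de-duplication burden described for level 2.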

Related

How do I make a program that mimics Kafka operations?

As part of my big data course at university, I'm required to mimic Kafka. It involves setting up a mini-Kafka on a student's system, complete with a Producer, Subscriber and a Publish-Subscribe architecture.
High-Level Overview
You are required to set up a mini-Zookeeper, multiple Kafka Brokers, one of which is a leader, and multiple Producers and Consumers.
The number of Producers and Consumers must be dynamic and not hard-coded, i.e., the user must be able to specify the number of Producers and Consumers.
The number of topics must also be dynamic; the user should be able to create and delete topics on demand.
To help you get started with the project, the following sections will provide a detailed description of all the individual modules.
Architecture

Multiple subscriptions to a topic

I have been using pubsub for a bit of asynchronous work, and was wondering why someone may create multiple subscriptions for a single topic. My default values are as follows:
project_id = 'project'
topic_name = 'app'
subscription_name = 'general'
The routing of the actual function -- and how to process that -- is being done in the subscriber receiver itself.
What would be reasons why there would be various subscription names? The only thing I can think of is to spread items across multiple servers for processing, such as:
server1 -- `main-1`
server2 -- `main-2`
etc.
Are there any other reasons why a subscription name would not work well with one value?
In general, there are two paradigms for having multiple subscribers:
Load balancing: The goal is to parallelize the processing of the load by having multiple subscribers using the same subscription. In this scenario, every subscriber receives a subset of the messages. One can horizontally scale processing by creating more subscribers for the same subscription.
Fan out: The goal is to have multiple subscribers receive the entire feed of messages. This is accomplished by having multiple subscriptions. The reason to have fan out is if there are multiple downstream applications interested in the full feed of messages. Imagine there is a feed where the messages are user events on a shopping website. Perhaps one application backs up the data to files, another analyzes the feed for trends in what people are looking at, and another looks through activity to try to find potentially fraudulent transactions. In this scenario, every one of those applications acting as a subscriber needs the full feed of messages, which requires separate subscriptions.
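The difference between the two paradigms can be shown with a minimal in-memory sketch. It does not use the Pub/Sub client library; the function names are hypothetical, and the round-robin assignment is only an assumption to keep the example deterministic (a real service does not promise any particular distribution among subscribers of one subscription).

```python
from itertools import cycle


def load_balance(messages, subscribers):
    """One subscription shared by several subscribers:
    each message is handled by exactly one of them."""
    received = {subscriber: [] for subscriber in subscribers}
    for message, subscriber in zip(messages, cycle(subscribers)):
        received[subscriber].append(message)
    return received


def fan_out(messages, subscriptions):
    """Several subscriptions on one topic:
    each subscription receives the full feed."""
    return {subscription: list(messages) for subscription in subscriptions}


events = ["view:1", "view:2", "buy:3", "view:4"]

balanced = load_balance(events, ["server1", "server2"])
copied = fan_out(events, ["backup", "trends", "fraud"])

print(balanced)
print(copied["fraud"] == events)
```

The two can be combined: each fan-out subscription can itself be shared by several subscribers when one application needs to scale.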

Google PubSub : How to customize distribution of messages to consumers?

I have a scenario where we will be sending customer data to Pub/Sub and consuming it with Java subscribers. I have multiple subscribers subscribed to the same subscription. Is there a way to route all messages with the same customerID to the same subscriber?
I know Google Dataflow has session-based windowing. However, I wanted to know if we can achieve it using simple Java consumers.
Update June 2020: Filtering is now an available feature in Google Cloud Pub/Sub. When creating a subscription, one can specify a filter that looks at message attributes. If a message does not match the filter, the Pub/Sub service automatically acknowledges the message without delivering it to the subscriber.
In this case, you would need to have different subscriptions and each subscriber would consume messages from one of the subscriptions. Each subscription would have a filter set up to match the customer ID. If you know the list of customer IDs and it is short, you would set up an exact match filter for each customer ID, e.g.,
attributes.customerID = "customerID1"
If you have a lot of customer IDs and want to partition the set of IDs received by each subscriber, you could use the prefix operator to do so. For example, if the IDs are numbers, you could have filters such as:
hasPrefix(attributes.customerID, "0")
hasPrefix(attributes.customerID, "1")
hasPrefix(attributes.customerID, "2")
hasPrefix(attributes.customerID, "3")
...
hasPrefix(attributes.customerID, "9")
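As a rough sketch, the ten filter strings can be generated instead of typed by hand, and the partitioning can be checked locally before any subscriptions are created. The helper names are hypothetical, no Pub/Sub API is called, and the code assumes the attributes.<key> spelling of the Pub/Sub filter syntax.

```python
def prefix_filters(attribute_key, prefixes):
    """Build one hasPrefix filter expression per prefix."""
    return [f'hasPrefix(attributes.{attribute_key}, "{prefix}")' for prefix in prefixes]


def bucket_for(customer_id, prefixes):
    """Local check: which prefix bucket would this customer id fall into?"""
    for prefix in prefixes:
        if customer_id.startswith(prefix):
            return prefix
    return None  # no subscription would receive this id


digits = [str(d) for d in range(10)]

for expression in prefix_filters("customerID", digits):
    print(expression)

print(bucket_for("4711", digits))
```

Because the prefixes do not overlap, each numeric customer ID matches exactly one filter, so all messages for that customer go to a single subscription.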
Previous answer:
At this time, Google Cloud Pub/Sub has no way to filter messages delivered to particular subscribers. If you know a priori the number of subscribers you have, you could do it yourself, though. You could create as many topics as you have subscribers and then bucket customer IDs into different topics, publishing messages to the right topic for each customer ID. You'd create a single subscription on each topic and each subscriber would receive messages from one of these subscriptions.
The disadvantage is that if you have any subscribers that want the data for all customer IDs, then you'll have to have an additional subscription on each topic and that subscriber will have to get messages from all of those subscriptions.
Keep in mind that you won't want to create more than 10,000 topics or else you may run up against quotas.
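The bucketing step on the publisher side could look something like the sketch below. It only computes a topic name; the "customers-" prefix, the function name, and the choice of SHA-256 are assumptions made for the example. A stable hash is used rather than Python's built-in hash() so that every publisher process maps a given customer ID to the same topic.

```python
import hashlib


def topic_for(customer_id, topic_count, prefix="customers-"):
    """Map a customer id to one of topic_count topics, deterministically."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % topic_count
    return f"{prefix}{bucket}"


# One topic per subscriber, as in the answer above (4 is an arbitrary example)
subscriber_count = 4

for customer_id in ["customerID1", "customerID2", "customerID1"]:
    print(customer_id, "->", topic_for(customer_id, subscriber_count))
```

Note that changing the number of topics remaps most customer IDs, so the subscriber count needs to be fixed up front, as the answer assumes.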

How to use Azure service bus topics & Subscriptions to load balance messages

The many MSDN pages I have read about the Azure Service Bus allude to the ability to set up a "Load Balancing" pattern with the "Topic/Subscription" model, but never say how this is done.
My question is: is this possible? Essentially, we are looking to create Topics that would have a possible n number of subscribers that could be dynamically ramped up and down, based upon incoming load. So, it would not be using the traditional "multicast" pattern, but distributing the messages to the subscribers in round-robin fashion. The reason we want to use this pattern is that we want to take advantage of the rules and filtering that reside in the Topics and Subscriptions, while allowing for dynamic scaling.
Any ideas?

Lock message to single subscriber using Topics?

I apologize for such a non-specific question, but I'm in the research stage of a project and had one question about the Windows Enterprise Service Bus that I can't seem to get a clear answer to.
The project entails users sending different types of "jobs" as messages to the ESB, which should then hand off the message to one of several available servers for background processing.
Considering we will have multiple different "jobs", I thought it would be best to create a subscription per background server and have each message be filtered by its type, so that we wouldn't have to build a dequeuer ourselves. However, my concern is that I will not be able to lock a message to one subscription in time and the message will be processed by each subscription that handles the particular type of "job".
I've been hard-pressed to find good research material on this subject. It seems that a Queue and a Subscription are mostly handled the same way with the Service Bus, but the one thing I can't find out is whether, when you lock a message on a topic, it can be locked to only one subscriber.
Thanks for any help or guidance towards the answer.
A message sent to a topic is essentially duplicated/copied to all subscribers, so there is no way for one subscriber to "lock" the message. The approach for this is to have a single subscriber per type, and then have multiple receivers associated with that subscriber.
Unlike subscribers, receivers compete for messages, giving you the "only one gets it" behavior you appear to be after.