Message ordering in a distributed system

I want to build a distributed system where I have "threads" (a collection of messages with its own ID, not a system process) that are distributed across many servers. These threads must have two critical properties:
1. Each message in a thread must have an order number that reflects its position in the thread in time order, so that "thread1/message10", for example, addresses message #10 in thread #1.
2. When a new message is added to a thread, the system must be able to assign it an order number that is consistent across all instances of the thread on all servers, and this number must never change.
I want to know if there is a known solution, library, or algorithm that can help me implement the second property, because right now it looks like a big problem: due to many factors, different servers can receive the same message at different times, and that might affect its order number.
Just to outline my thoughts on the problem so far: say I have 3 servers hosting a distributed thread that already contains 5 messages, and each server sends a new message to its own copy of the thread and to the remaining two.
Naive ordering. Each server thinks its own message is number 6, and the two messages from the other servers get their numbers on arrival, depending on network latency and other random factors, so order numbers are not consistent across servers. This is unacceptable right away.
UTC-timestamp-based ordering. When a thread gets a new message, I take, say, the 10 preceding messages that already have correct order numbers, extract their timestamps, and determine the new message's order number by finding where its timestamp falls among those last 10 timestamps.
This might work, I guess, but it requires that a message's order number can be assigned and then changed at some point, which is unacceptable. I'm also not sure this will work correctly when the number of incoming messages is huge.
Thanks for all the help.

This is a fundamental problem in distributed systems known as Atomic Broadcast, with a number of solutions offering different performance and applicability trade-offs (see the survey referenced by the Wikipedia page). In practice, the most commonly used are based on Paxos (e.g., libpaxos) or on Totem (e.g., Corosync or Spread). A key issue when selecting one of these is what you expect to happen if the network partitions: should it stop ordering messages (block), or should it produce independent orders for each partition?
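For intuition, here is a minimal sketch (in Python, not one of the libraries above) of the deterministic ordering key such protocols build on: a Lamport timestamp paired with a server id. Every replica that holds the same set of messages derives the same total order from it, regardless of arrival order; what the real protocols add is agreement on when a position is final, i.e., that no message with a smaller key can still arrive:

```python
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class MsgKey:
    """Total-order key: compared by Lamport time, ties broken by server id."""
    lamport: int
    server_id: int

class Replica:
    def __init__(self, server_id: int):
        self.server_id = server_id
        self.clock = 0        # local Lamport clock
        self.messages = {}    # MsgKey -> payload

    def send(self, payload):
        """Create a message locally; returns (key, payload) to broadcast."""
        self.clock += 1
        key = MsgKey(self.clock, self.server_id)
        self.messages[key] = payload
        return key, payload

    def receive(self, key, payload):
        """Merge a remote message and advance the clock past its timestamp."""
        self.clock = max(self.clock, key.lamport) + 1
        self.messages[key] = payload

    def order(self):
        """Replicas holding the same message set compute the same order."""
        return [self.messages[k] for k in sorted(self.messages)]
```

In the three-server example from the question, the three concurrent messages carry distinct (lamport, server_id) keys, so every server places them in the same relative order, with ties going to the lower server id, no matter when each copy arrives.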


Upper limit on number of redis streams consumer groups?

We are looking at using Redis streams as a cluster-wide messaging bus, where each node in the cluster has a unique id. The idea is that each node, when spawned, creates a consumer group with that unique id on a central Redis stream, to guarantee each node in the cluster gets a copy of every message. In an orchestrated environment, cluster nodes will be spawned and removed on the fly, each having a unique id. Over time I can see this resulting in hundreds or even thousands of old/unused consumer groups all subscribed to the same Redis stream.
My question is this: is there an upper limit to the number of consumer groups that Redis can handle, and does a large number of (unused) consumer groups carry any real processing cost? It seems that a consumer group is just a pointer stored in Redis that points to the last-read entry in the stream, and is only accessed when a consumer of the group does a ranged XREADGROUP. That would lead me to assume (without diving into the Redis code) that the number of consumer groups really does not matter, save for the small amount of RAM that the consumer group pointers would eat up.
Now, I understand we should be smarter: a node should delete its own consumer group when it is killed, or we should clean these up on a schedule. But if a consumer group is just a record in Redis, I am not sure it is worth the effort, at least at the MVP stage of development.
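For concreteness, here is roughly what each node does on startup and in its read loop (a sketch with redis-py; the stream name and handler are placeholders):

```python
import uuid
import redis

r = redis.Redis(decode_responses=True)
STREAM = "cluster:bus"           # hypothetical central stream name
node_id = str(uuid.uuid4())      # this node's unique id

# Create this node's consumer group, starting at new messages ('$').
try:
    r.xgroup_create(STREAM, node_id, id="$", mkstream=True)
except redis.ResponseError:
    pass  # BUSYGROUP: the group already exists

def handle(fields):
    print("got", fields)         # stand-in for real message handling

# Every node reads the whole stream through its own group, so every
# node gets a copy of every message.
while True:
    for _stream, messages in r.xreadgroup(node_id, "consumer-1",
                                          {STREAM: ">"}, block=5000) or []:
        for msg_id, fields in messages:
            handle(fields)
            r.xack(STREAM, node_id, msg_id)
```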
TL;DR:
Is my understanding correct, that there is no practical limit on the number of consumer groups for a given stream and that they have no processing cost unless used?
Your understanding is correct: there's no practical limit to the number of CGs, and they do not impact operational performance.
That said, other than the wasted RAM (which could become significant, depending on the number of consumers in the group and PEL entries), this will add time complexity to invocations of XINFO STREAM ... FULL and XINFO GROUPS as these list the CGs. Once you have a non-trivial number of CGs, every call to these would become slow (and block the server while it is executing).
Therefore, I'd still recommend implementing some type of "garbage collection" for the "stale" CGs, perhaps as soon as the MVP is done. Like any computing resource (e.g. disk space, network, mutexes...) and given there are no free lunches, CGs need to be managed as well.
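A minimal sketch of such a collector with redis-py (the stream name and the one-week threshold are placeholders): it destroys any group whose consumers have all been idle past the threshold.

```python
import redis

r = redis.Redis(decode_responses=True)
STREAM = "cluster:bus"              # hypothetical stream name
MAX_IDLE_MS = 7 * 24 * 3600 * 1000  # arbitrary threshold: one week

for group in r.xinfo_groups(STREAM):
    consumers = r.xinfo_consumers(STREAM, group["name"])
    # Treat a group as stale when every consumer has been idle past the
    # threshold. Note: this also matches groups with no consumers at all,
    # including ones that were only just created, so schedule with care.
    if all(c["idle"] > MAX_IDLE_MS for c in consumers):
        r.xgroup_destroy(STREAM, group["name"])
```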
P.S. IIUC, you're planning to use a single consumer in each group, and have each CG/consumer correspond to a node in your app's cluster. If that is the case, I'm not sure that you need CGs and you can use the simpler XREAD (instead of XREADGROUP) while keeping the last ID locally in the node.
OTOH, assuming I'm missing something and that there's a real need for this use pattern, I'd imagine Redis being able to support it better by offering some form of expiry for idle groups.
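For completeness, the group-less variant from the P.S., sketched with redis-py; the node keeps the last-read ID itself instead of letting a CG track it:

```python
import redis

r = redis.Redis(decode_responses=True)
STREAM = "cluster:bus"   # hypothetical stream name
last_id = "$"            # real code would persist this locally across restarts

def handle(fields):
    print("got", fields)  # stand-in for real message handling

while True:
    for _stream, messages in r.xread({STREAM: last_id}, block=5000) or []:
        for msg_id, fields in messages:
            handle(fields)
            last_id = msg_id  # remember the position instead of using a CG
```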

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queues each user is currently being processed in.
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
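A sketch of the first option, using a Redis SET NX lock per user with a pika consumer (queue, header, and key names are made up):

```python
import pika
import redis

r = redis.Redis()
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="file_queue_main", durable=True)

def process_file(body):
    pass  # stand-in for the actual import work

def on_message(ch, method, properties, body):
    user_id = properties.headers["user_id"]  # assumes publishers set this header
    lock_key = f"lock:user:{user_id}"
    # Take a per-user lock; the expiry frees it if a worker dies mid-task.
    if r.set(lock_key, "1", nx=True, ex=600):
        try:
            process_file(body)
            ch.basic_ack(method.delivery_tag)
        finally:
            r.delete(lock_key)
    else:
        # Someone else is processing this user: requeue and move on.
        ch.basic_nack(method.delivery_tag, requeue=True)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="file_queue_main", on_message_callback=on_message)
channel.start_consuming()
```

One caveat: an immediate requeue can bounce the same message straight back to the same consumer in a tight loop, so real setups usually route the requeue through a delay queue or dead-letter exchange.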
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to pick the processing queue. That way you will always have the same user / file task queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
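A sketch of that mapping; zlib.crc32 is used rather than Python's built-in hash(), which is randomized per process and so wouldn't be stable across nodes:

```python
import zlib

NUM_QUEUES = 10

def processing_queue_for(user_id: str) -> str:
    """Deterministically map a user to one of the processing queues."""
    return f"file_processing_{zlib.crc32(user_id.encode()) % NUM_QUEUES}"

# The same user lands on the same queue, on every node, every time:
assert processing_queue_for("user-42") == processing_queue_for("user-42")
```

As for queue length: hashing guarantees stickiness, not balance, so a few heavy users can back up one queue while the others sit idle (the same downside the earlier answer notes for sticky routing).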

Best way to handle timeouts on rabbitmq message processing

I am trying to get my head around an issue I have recently encountered and I hope someone will be able to point me in the most reasonable direction of solving it.
I am using the Riak KV store and working with CRDT data, where I have some sort of counter inside each CRDT item stored in the database.
I have a RabbitMQ queue, where each message is a request to increase or decrease a certain number of the aforementioned counters.
Finally, I have a group of service workers that listen on the queue and, for each request, try to change the relevant counters accordingly.
The issue I have is as follows: while a single worker is processing a request, it may get stuck for a while on a write operation to the database – let's say on the second of three counter changes. Its connection with RabbitMQ gets lost (timeout), so the message-request goes back onto the queue (I cannot afford to miss one). It is then picked up by a second worker, which begins all processing anew. However, the first worker eventually finishes its work, and as a result I have processed a single message twice.
I can split those increments into single actions, but this still leaves me with the same dilemma: a counter's value can still be changed twice if some worker gets stuck on a write operation for a long period.
I have no way of making Riak KV CRDT writes faster, nor can I accept missing a message-request. I need to implement some means of checking whether a request has already been processed.
My initial thought was to use some alternative, quick KV store for recording RabbitMQ message IDs while they are being processed. That way other workers could tell whether they are about to process a message that is already being handled elsewhere.
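Something like the following is what I have in mind, sketched with Redis (the key prefix and TTL are placeholders):

```python
import redis

r = redis.Redis()

def try_claim(message_id: str, ttl_seconds: int = 3600) -> bool:
    """Atomically claim a message id. Returns False if some other worker
    already claimed it. The TTL keeps claims from leaking forever, but if
    it's shorter than the slowest write, duplicates can reappear."""
    return bool(r.set(f"processed:{message_id}", "1", nx=True, ex=ttl_seconds))
```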
I could use any help and pointers to materials I can read.
You can't have "exactly-once delivery" semantics. You can reduce either duplicate deliveries or missed deliveries, so it's up to you to decide which misbehavior is the least inconvenient.
First of all, are you sure it's the CRDTs that are too slow? Are you using simple counters, or counters inside maps? In my experience they are quite fast, although slower than plain KV operations. You could try:
- having simple CRDTs (no maps), and more CRDT objects, to spread the load (can you split the counters in two?)
- not using CRDTs, and instead using good old client-side sibling resolution on simple key/values.
- accumulating the count-update orders and applying them in batches; but then you're accepting an increase in latency, so it's equivalent to increasing the timeout (see the sketch below).
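For the last option, a sketch of the accumulate-and-flush idea (the flush interval and the Riak write are stand-ins):

```python
import threading
import time
from collections import Counter

pending = Counter()        # counter name -> accumulated delta
lock = threading.Lock()

def record(counter_name, delta):
    """Cheap in-memory accumulation instead of an immediate CRDT write."""
    with lock:
        pending[counter_name] += delta

def apply_to_riak(name, delta):
    pass  # stand-in for a single Riak counter update

def flush_loop(interval=5.0):
    """Every few seconds, apply all accumulated deltas in one pass.
    A crash loses whatever hasn't been flushed yet; that is the
    latency/durability trade-off this option accepts."""
    while True:
        time.sleep(interval)
        with lock:
            batch = dict(pending)
            pending.clear()
        for name, delta in batch.items():
            apply_to_riak(name, delta)

threading.Thread(target=flush_loop, daemon=True).start()
```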
Can you provide some metrics? For example, how long the updates take, what numbers you'd expect, and whether it's as slow with few updates as with many.

RabbitMQ: One queue per message type, or post-routing?

I use RabbitMQ as an integration/distribution system, a kind of ETL: pollers query tables from source databases and publish the results on RabbitMQ, and the results are consumed according to their source (1 queue per source app) to be saved in another form.
I'm asking whether it would be better to split queues per query AND source (app); currently it's split only by source, and messages are "post-routed" using a custom payload header.
The only advantage I see, which could also be a defect, is that there would be as many consumers as there are queries to run. But that could become a problem...
Thanks.
I would say that one queue per query could get out of hand quickly in terms of managing and monitoring them.
I find it works well to have one queue per destination, and to then use the routing key to specify how things should be handled within your consumer code (e.g., for the message type). That way, you get RabbitMQ to do the multiplexing for you, and the consumer code can run separately against the same messages at each destination point.
There are, of course, always many different ways, but I find that this tends to work well for ETL applications. If you have tons of destinations, perhaps you would want to move towards adding the destination to the routing key as well. If you don't have any ordering requirements (e.g., due to RDBMS foreign key constraints), you could also consider adding multiple consumers to the same queue to improve throughput. (For cases where you do have such ordering requirements, that's where the one queue per destination, and the multiplexing it provides, proves especially useful.)
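A sketch of that layout with pika and a topic exchange (exchange, queue, and routing-key names are made up; publishers are assumed to use destination.source.query keys):

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.exchange_declare(exchange="etl", exchange_type="topic", durable=True)

# One queue per destination; the routing key carries source and query.
channel.queue_declare(queue="dest.warehouse", durable=True)
channel.queue_bind(queue="dest.warehouse", exchange="etl",
                   routing_key="warehouse.#")

def handle(source, query, body):
    pass  # stand-in for the destination-specific save logic

def on_message(ch, method, properties, body):
    # Dispatch on the routing key instead of a custom payload header.
    _dest, source, query = method.routing_key.split(".", 2)
    handle(source, query, body)
    ch.basic_ack(method.delivery_tag)

channel.basic_consume(queue="dest.warehouse", on_message_callback=on_message)
channel.start_consuming()
```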

What is a real world use for ConcurrentBag<T>?

A ConcurrentBag allows multiple threads to add and remove items from the bag. It is possible for a thread to add an item to the bag and then take that same item right back out. The documentation says that ConcurrentBag is unordered, but how unordered is it? On a single thread, the bag acts like a stack. Does unordered mean "not like a linked list"?
What is a real world use for ConcurrentBag?
Because there is no ordering, ConcurrentBag has a performance advantage over ConcurrentStack/Queue. It is implemented by Microsoft using thread-local storage: every thread that adds items does so in its own space, and when retrieving items, they come from that local storage first. Only when it is empty does the thread steal items from another thread's storage. So instead of a simple list, a ConcurrentBag is a distributed list of items. It is almost lock-free and should scale better under high concurrency.
Unfortunately, in .NET 4.0 there was a performance issue (fixed in 4.5); see http://ayende.com/blog/156097/the-high-cost-of-concurrentbag-in-net-4-0
Bags are really useful for tracking instance counts. For example, if you want to keep a record of which hosts you're servicing web requests for, you can add their IP to the bag when you start servicing the request, and remove it when done.
Using a bag will allow you to tell at a glance which IPs you're currently servicing. It will also let you quickly query whether you're servicing a given IP address.
If you use a set for this rather than a bag, then having multiple concurrent requests from the same IP address will mess up your record-keeping.
Anything where you just need to keep track of what's there and don't need random access or guaranteed order. If you have threads that add items to be processed and threads that remove and process them, a concurrent bag works well as long as you don't care that they're processed in FIFO order.
Thanks to @Chris Jester-Young I came up with a good, real-world scenario that actually applies to a project I'm working on.
Find - Process - Store
Find - threads 1 & 2 find or scrape data (file system, web, etc.). The results are stored in ConcurrentBag1.
Process - threads 3 & 4 take items out of ConcurrentBag1, clean/transform/process the data, and then store the results in ConcurrentBag2.
Store - thread 5 gathers the results from ConcurrentBag2 and stores them in SQL.
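The same pipeline sketched in Python; the standard library has no direct ConcurrentBag equivalent, so thread-safe queues stand in for the two bags (the point here is the shape of the pipeline, not the ordering guarantees):

```python
import queue
import threading
import time

bag1 = queue.Queue()   # found/scraped raw data
bag2 = queue.Queue()   # cleaned/processed results

def scrape(source):
    return [f"{source}-item-{i}" for i in range(3)]  # stand-in scraper

def find(source):
    for item in scrape(source):
        bag1.put(item)

def process():
    while True:
        bag2.put(bag1.get().upper())   # stand-in clean/transform step

def store():
    while True:
        print("INSERT", bag2.get())    # stand-in for the SQL write

# Threads 1-2 find, threads 3-4 process, thread 5 stores.
threading.Thread(target=find, args=("filesystem",), daemon=True).start()
threading.Thread(target=find, args=("web",), daemon=True).start()
for _ in range(2):
    threading.Thread(target=process, daemon=True).start()
threading.Thread(target=store, daemon=True).start()

time.sleep(1)  # let the demo pipeline drain before the main thread exits
```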