While working with Redis and other queues, I wonder: is it possible for two consumers of the same queue to get the same value?
Does anyone know how Redis or Kafka solves this problem, and what the performance cost is?
I can't speak for Redis, but for Kafka it's simple: each consumer in a consumer group consumes only from the specific partitions assigned to it, and nobody else in the group reads from those partitions. Within a partition, it receives messages one by one.
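To make the answer above concrete, here is a small sketch of how key-based partitioning plus exclusive partition ownership prevents two consumers from seeing the same message. This is illustrative only: Kafka's real default partitioner uses murmur2, not MD5, and real assignment strategies (range, round-robin, sticky) are more elaborate than the round-robin shown here.

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash so the same key always lands in the same partition.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def assign_partitions(consumers: list, num_partitions: int = NUM_PARTITIONS) -> dict:
    # Each partition is owned by exactly one consumer in the group,
    # so no two consumers ever receive the same message.
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

assignment = assign_partitions(["consumer-a", "consumer-b"])
all_parts = sorted(p for parts in assignment.values() for p in parts)
assert all_parts == [0, 1, 2, 3]  # every partition has exactly one owner
```

Because ownership is exclusive, parallelism comes from adding partitions, not from having several consumers share one partition.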
I have been given a task to check whether a created RabbitMQ cluster is idle (has never been used) or not. I can think of only one case: the non-existence of queues and exchanges. If no queues have been created, then we can easily say that the cluster has not been used. But my task is to collect all the cases by which we can check whether a created cluster is idle or has been used. So I would like help gathering more cases or situations in which a RabbitMQ cluster has not been active for some time and is idle.
Because of RabbitMQ's behavior, a cluster that is currently not being used (but once was) looks exactly the same as one that has never been used (which is a good thing for performance).
Assuming no client deletes the queue it is using, and that cluster creation does not itself involve creating queues or exchanges, checking whether there are any existing queues (or any non-default exchanges) is your best bet at guessing whether any client has ever used a RabbitMQ cluster.
I am trying to understand how Redis Streams does partitioning, and whether a specific message can be sent to a specific partition (similar to what you can do with Kafka).
I have checked the redis-cli API, and there is nothing similar to partitioning; there is also nothing about this in the StackExchange.Redis library.
The only method is: IDatabase.StreamAdd(key, streamKey, streamValue, messageId)
Am I missing anything? Is the partitioning done only in a fixed way?
P.S. If partitioning can be done, can the partitioning key be composed?
You cannot achieve Kafka-style partitioning with a single Redis Stream. A Redis Stream's consumer group works like a load balancer, distributing messages in the stream to different consumers without any notion of partitions. If one consumer is faster than the others, it will consume more messages than the others, regardless of any logical partition of the messages.
If you want partitioning, you have to use multiple Redis Streams, i.e. multiple keys, with each key serving as a different partition. Your producers add messages belonging to a logical partition to its dedicated Redis Stream, and the different consumers XREAD messages from different keys.
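The multiple-streams scheme above can be sketched as a key-to-stream mapping. This is a sketch under assumptions: the stream name prefix and partition count are made up, and any stable hash would do. It also answers the P.S.: a composed partitioning key is just the concatenation of its parts before hashing.

```python
import hashlib

NUM_PARTITIONS = 8

def stream_for(*key_parts: str, prefix: str = "events") -> str:
    # A composed partition key is simply its parts joined together.
    composed = ":".join(key_parts)
    digest = hashlib.sha1(composed.encode()).digest()
    partition = int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
    return f"{prefix}:{partition}"

# The same composed key always maps to the same stream, so a producer
# would XADD to this key (or pass it to IDatabase.StreamAdd in
# StackExchange.Redis), and a dedicated consumer XREADs each stream.
assert stream_for("tenant-42", "user-7") == stream_for("tenant-42", "user-7")
```

As with Kafka, ordering is then guaranteed per stream (per partition), not across the whole logical topic.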
I'm using Redis as a simple pub/sub broker, managed by the redis-py library, using just the default 'main' channel. Is there a technique, in either Redis itself or the wrapping Python library, to count the number of messages in this queue? I don't have deep conceptual knowledge of Redis (in particular, how it implements broker functionality), so I am not sure whether such a question even makes sense.
Exact counts, lock avoidance, etc. are not necessary; I only need to check periodically (on the order of minutes) whether this queue is empty.
Redis Pub/Sub doesn't hold any internal queue of messages; see https://redis.io/topics/pubsub. A published message is delivered to whoever is subscribed at that moment and then discarded, so there is never anything to count.
If you need a more queue-based publish mechanism, you might want to check out Redis Streams. Redis Streams provides two commands that might help you: XLEN and XINFO.
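For illustration, a redis-cli session using a stream instead of a Pub/Sub channel might look like this (the stream name `jobs` and the returned entry ID are placeholders; XINFO STREAM output is truncated):

```
127.0.0.1:6379> XADD jobs * task "resize-image"
"1690000000000-0"
127.0.0.1:6379> XLEN jobs
(integer) 1
127.0.0.1:6379> XINFO STREAM jobs
 1) "length"
 2) (integer) 1
 ...
```

In redis-py the same checks are available as `r.xadd(...)`, `r.xlen(...)`, and `r.xinfo_stream(...)`, so a periodic emptiness check becomes a single XLEN call.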
I have a question about a tricky situation in an event-driven system that I would like advice on. Here is the situation:
In our system, I use Redis as an in-memory cache and Kafka for message queues. To increase the performance of Redis, I use Lua scripting to process data and, at the same time, push events into a blocking list in Redis. A separate process then picks up the events from that blocking list and moves them to Kafka. This process has three steps:
1) Read events from redis list
2) Produce in batch into kafka
3) Delete corresponding events in redis
Unfortunately, if the process dies between steps 2 and 3 (after producing all the events into Kafka, but before deleting the corresponding events in Redis), then after the process is restarted it will produce duplicate events into Kafka, which is unacceptable. Does anyone have a solution for this problem? Thanks in advance; I really appreciate it.
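One common mitigation for this crash window (a sketch, not the only option): have the Lua script attach a deterministic, monotonically increasing ID to every event it pushes onto the list. A replay after a crash between steps 2 and 3 then produces events whose IDs have already been seen, and they can be dropped before (or after) Kafka. All names below are hypothetical, and the `seen_ids` set would itself need durable storage (e.g. the last produced ID kept in Redis or recovered from the Kafka topic) to survive a relay restart.

```python
def relay_batch(events, produce, seen_ids):
    """Relay events from the Redis list to Kafka, skipping replayed ones."""
    produced = []
    for event in events:
        if event["id"] in seen_ids:
            continue  # duplicate caused by a crash-and-replay; drop it
        produce(event)          # step 2: produce to Kafka
        seen_ids.add(event["id"])
        produced.append(event)
    return produced             # step 3 (delete from Redis) happens after

out = []
seen = set()
batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
relay_batch(batch, out.append, seen)
# Simulated crash between steps 2 and 3: the same batch is read again.
relay_batch(batch, out.append, seen)
assert [e["id"] for e in out] == [1, 2]  # no duplicates reach Kafka
```

This turns the pipeline from "exactly once" (impossible to guarantee here) into "at least once plus idempotent delivery", which is the usual practical answer.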
Kafka is prone to reprocessing events, even if they were written exactly once. Reprocessing will almost certainly be caused by rebalancing clients. Rebalancing might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption and subsequent recreation of the client by the broker.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that guarantees related messages are consumed in a sequential fashion by the same client. That client can then maintain a durable record of what it has already processed.
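The client-side guard described above can be sketched as follows. Because Kafka preserves order within a partition, messages sharing a partition key arrive in sequence at one consumer, so the consumer only needs to remember the highest ID it has processed per key. The dict here is in-memory for illustration; a real client would persist this state (the "durable record" mentioned above).

```python
def process_once(message, last_processed, handle):
    """Handle a message unless it was already processed (a redelivery)."""
    key, msg_id = message["key"], message["id"]
    # IDs per key are monotonically increasing thanks to per-partition order.
    if last_processed.get(key, -1) >= msg_id:
        return False  # already handled; skip the reprocessed message
    handle(message)
    last_processed[key] = msg_id
    return True

handled = []
state = {}
stream = [
    {"key": "user-1", "id": 0},
    {"key": "user-1", "id": 1},
    {"key": "user-1", "id": 1},  # redelivery after a rebalance
]
for m in stream:
    process_once(m, state, handled.append)
assert len(handled) == 2  # the redelivered message was dropped
```

The essential design choice is that deduplication state is per partition key, so it stays small and never needs coordination between consumers.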
I would like to create a cluster for high availability and put a load balancer in front of it. In our configuration, we would like to create exchanges and queues manually, so that once exchanges and queues are created, no client should make a call to redeclare them. I am using a direct exchange with a routing key, so it is possible to route messages into different queues on different nodes. However, I have some issues with clustering and queues.
As far as I have read in the RabbitMQ documentation, a queue is specific to the node it was created on. Moreover, there can be only one queue with a given name in a cluster, and it must be alive at the time of publish/consume operations. If the node dies, then the queue on that node is gone and its messages may not be recoverable (depending on the configuration, of course). So, even if I route the same message to different queues on different nodes, I still have to figure out how to use them in order to continue consuming messages.
I wonder if it is possible to handle this failover scenario without using mirrored queues. Say I would like to switch to a new node in case of a failure and continue consuming from the same queue. The publisher just uses a routing key, and its messages can go into more than one queue, but the same is not possible for the consumers.
In short, what can I do to cope with failures in the environment described in the first paragraph? Is queue mirroring the best approach, despite its performance penalty in the cluster, or does a more practical solution exist?
Data replication (mirrored queues in RabbitMQ) is the standard approach to achieving high availability. I suggest using them. If you don't replicate your data, you will lose it.
If you are worried about performance: RabbitMQ does not scale well.
The only way I know to improve performance is to make your nodes bigger or create a second cluster; adding nodes to a cluster does not really improve things. Also, if you are planning to use TLS, it will decrease throughput significantly as well. If you have high-throughput requirements plus HA, I'd consider Apache Kafka.
If your use case allows you not to care about HA, then just re-declare the queues/exchanges whenever your consumers/publishers connect to the broker, which is absolutely fine. When you declare a queue that already exists, nothing bad happens: the queue won't be purged, etc. The same goes for exchanges.
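To illustrate the redeclare-on-connect claim, here is a toy in-memory model of the broker's idempotent `queue.declare` semantics (not the real client or broker):

```python
class ToyBroker:
    """Minimal model of RabbitMQ's idempotent queue.declare behavior."""

    def __init__(self):
        self.queues = {}

    def queue_declare(self, name):
        # Declaring an existing queue is a no-op; its messages are kept.
        self.queues.setdefault(name, [])

    def publish(self, queue, msg):
        self.queues[queue].append(msg)

broker = ToyBroker()
broker.queue_declare("orders")
broker.publish("orders", "msg-1")
broker.queue_declare("orders")  # client reconnects and redeclares
assert broker.queues["orders"] == ["msg-1"]  # nothing was purged
```

One real-broker caveat the toy model omits: redeclaring a queue with *different* arguments (e.g. a changed durability flag) is rejected by RabbitMQ with a channel-level PRECONDITION_FAILED error, so redeclaring clients must always use identical arguments.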
Also, check out the RabbitMQ sharding plugin; maybe that will do for your use case.