I am trying to understand how Redis Streams handles partitioning, and whether a specific message can be sent to a specific partition (similar to what you can do with Kafka).
I have checked the redis-cli API, and there is nothing resembling partitioning; there is also nothing about it in the StackExchange.Redis library.
The only method is `IDatabase.StreamAdd(key, streamKey, streamValue, messageId)`.
Am I missing anything? Is the partitioning done only in a fixed way?
P.S. If partitioning can be done, can the partitioning key be composed?
You cannot achieve Kafka-style partitioning with a single Redis Stream. A Redis Stream's consumer group acts like a load balancer: it distributes messages in the stream to different consumers without any notion of partition. If one consumer is faster than the others, it will simply consume more messages, regardless of which logical partition the messages belong to.
If you want partitioning, you have to use multiple Redis Streams, i.e. multiple keys, with each key acting as a different partition. Your producers add messages belonging to a logical partition to the dedicated Redis Stream, and different consumers XREAD messages from different keys.
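A minimal sketch of that idea in Python (the stream base name, partition count, and composed-key format are my assumptions, not anything Redis prescribes):

```python
import binascii

NUM_PARTITIONS = 4  # fixed up front, like Kafka topic partitions

def stream_for(base, partition_key):
    """Map a partition key to one of NUM_PARTITIONS stream keys."""
    slot = binascii.crc32(partition_key.encode()) % NUM_PARTITIONS
    return f"{base}:{slot}"

# a "composed" partitioning key is just a string you build yourself
key = stream_for("orders", "tenant-7:order-123")
print(key)  # e.g. "orders:2" -- the same input always lands on the same stream
# producer (with a live Redis): r.xadd(key, {"payload": "..."})
# consumers: each one XREADs only the stream keys assigned to it
```

Since the partition key is an ordinary string, composing it from several fields (tenant plus order id above) works naturally.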
Related
I'm learning Redis, and one thing I'm not certain about is how data is partitioned when multiple Redis clusters are present, i.e. whether each cluster stores part of the data. My understanding is as below.
When multiple ElastiCache Redis clusters are present, a key will be hashed using consistent hashing or a similar algorithm, and the request will be routed to any of the clusters with equal probability. Once the request is routed to a cluster, the key will be hashed again to determine which of the 16384 slots it falls into, and based on where that slot is allocated, the request will land on a particular shard of that cluster.
To summarize, when multiple clusters are present and cluster mode is enabled, two rounds of request-level load balancing are done: one to determine which cluster is used, and the other to determine which shard the request is routed to.
In terms of data partitioning, each cluster holds a partition of the entire data set, and within each cluster the data is further split across multiple shards.
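For the within-cluster step, the slot mapping is fully deterministic. A self-contained sketch to illustrate it (CRC16/XMODEM modulo 16384, plus Redis's hash-tag rule):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # hash-tag rule: if the key contains a non-empty {...} section,
    # only that part is hashed, so related keys can share a slot
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1 : end]
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))
```

The `{...}` hash-tag handling is what lets you force two keys onto the same shard when you need multi-key operations.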
Please let me know if my understanding is correct, thanks!
I'm using Redis as a simple pub/sub broker, managed by the redis-py library, using just a single default channel. Is there a technique, either in Redis itself or in the wrapping Python library, to count the number of messages in this queue? I don't have deep conceptual knowledge of Redis (in particular how it implements broker functionality), so I'm not sure whether such a question even makes sense.
Exact counts, lock avoidance, etc. are not necessary; I only need to check periodically (on the order of minutes) whether this queue is empty.
Redis Pub/Sub doesn't hold any internal queue of messages; see https://redis.io/topics/pubsub.
If you need a more queue-like publish mechanism, you might want to check Redis Streams. Redis Streams provides two commands that might help you: XLEN and XINFO.
The Redis team introduced the new Streams data type in Redis 5.0. Since Streams look like Kafka topics at first glance, it seems difficult to find real-world examples of using them.
In the Streams intro we have a comparison with Kafka streams:
Runtime consumer groups handling. For example, if one of three consumers fails permanently, Redis will continue to serve first and second because now we would have just two logical partitions (consumers).
Redis Streams are much faster. They are stored in and served from memory, so that much is clear.
We have some projects using Kafka, RabbitMQ and NATS. Now we are taking a deep look at Redis Streams, trying to use them as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:
Store all data in memory with AOF replication.
By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of followers to receive the data from the master. This alone looks like enough to kill any interest in trying Streams under high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
And the capping strategy: the real "capped resource" with Redis Streams is memory, so it doesn't really matter how many items you want to store or which capping strategy you use. Each time your consumer fails, you get either peak memory consumption (uncapped) or message loss (capped).
We use Kafka as an RTB bidder frontend which handles ~1,100,000 messages per second with a ~120-byte payload. With Redis we see ~170 MB/sec of memory consumption on write, so with a 512 GB RAM server we have a write "reserve" of ~50 minutes of data. If the processing system were offline for that long, we would crash.
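That reserve figure checks out arithmetically:

```python
rate_mb_per_s = 170            # observed write rate
ram_mb = 512 * 1024            # 512 GB server
headroom_min = ram_mb / rate_mb_per_s / 60
print(round(headroom_min))     # ~51 minutes of buffer before memory runs out
```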
Could you please tell us more about Redis Streams usage in the real world, and maybe some cases where you have tried to use them yourself? Or can Redis Streams only be used with small amounts of data?
Long time no see. This feels like a discussion that belongs on the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement - they provide different properties and capabilities despite the similarities. You are of course correct regarding the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partitioning your streams across period-based key names.
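A sketch of what period-based key names could look like (hourly buckets and the naming scheme are my assumptions):

```python
from datetime import datetime

def period_key(base, ts):
    # one stream per hour: old periods can be dropped wholesale with DEL
    # to reclaim RAM, and a cluster spreads the keys across shards
    return f"{base}:{ts:%Y-%m-%d-%H}"

print(period_key("bids", datetime(2019, 7, 1, 13)))  # bids:2019-07-01-13
```

Deleting a whole expired stream is O(1) bookkeeping compared with trimming a single giant one, which is the appeal of this layout.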
While working with Redis and other queues, I wonder: is it possible for two consumers of the same queue to get the same value?
Does anyone know how Redis or Kafka solves this problem, and what the performance is like?
I can't say for Redis, but for Kafka it's simple: an individual consumer consumes only from the specific partitions assigned to it - nobody else works with those partitions - and within a partition it receives messages one by one.
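The crucial property - each partition has exactly one owner within a consumer group - can be sketched with a simplified round-robin assignor (not Kafka's actual implementation):

```python
def assign(partitions, consumers):
    """Give every partition to exactly one consumer, round-robin."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}

ownership = assign(["p0", "p1", "p2", "p3"], ["alice", "bob"])
print(ownership)  # {'p0': 'alice', 'p1': 'bob', 'p2': 'alice', 'p3': 'bob'}
# one owner per partition => no two consumers in a group see the same message
```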
I am new to Apache Kafka and was playing around with it. If I have 2 brokers and one topic with 4 partitions, and one of my brokers is heavily loaded, will Kafka take care of balancing the incoming traffic from producers to the other, free broker? If so, how is it done?
If you have multiple partitions, it's the producer's responsibility/choice which partition each message is sent to.
Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message). link
In Kafka producer, a partition key can be specified to indicate the destination partition of the message. By default, a hashing-based partitioner is used to determine the partition id given the key, and people can use customized partitioners also. To reduce # of open sockets, in 0.8.0 (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or null, a producer will pick a random partition and stick to it for some time (default is 10 mins) before switching to another one. link
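The keyed path boils down to hashing the key modulo the partition count. A toy sketch (Kafka's real default partitioner uses murmur2; `crc32` here is just a stand-in deterministic hash, and the sticky fallback is simplified):

```python
import binascii

def pick_partition(key, num_partitions, sticky=0):
    if key is None:
        # no key: pick a partition and stick with it for a while (simplified)
        return sticky
    return binascii.crc32(key) % num_partitions

print(pick_partition(b"user-42", 4))  # same key always -> same partition
```

Because the mapping is a pure function of the key, per-key ordering is preserved - which is exactly why brokers never move messages between partitions behind your back.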
If you specify which partition you want the data to go into, it will always go into that specific partition. If you don't specify, the producer could send it to any partition. The Kafka broker never internally moves or balances messages/partitions.
I believe this design decision was made to provide certain guarantees for the ordering of messages within a Kafka partition.
The Kafka producer tends to distribute messages equally among all partitions unless you override this behavior; in that case you need to check whether the four partitions are distributed evenly among the brokers.
It also depends on what you mean by "one of the brokers is heavily loaded": whether it is because of that topic, or because the cluster has other topics (e.g. __consumer_offsets).
You can choose which brokers a partition resides on with Kafka's CLI tools, or with some kind of UI like Yahoo's kafka-manager.