Do Redis streams benefit from Cluster mode? Imagine you had 10 streams, would they be distributed across the cluster or all on the same node? I'm planning on using Redis streams for really high throughput (2m+ messages/s) so I'm worried about the performance of Redis streams at this scale.
Any guidance towards scaling Redis streams horizontally would be awesome if it doesn't scale out of the box in Cluster mode.
would they be distributed across the cluster or all on the same node?
It depends on the keys of these streams. Redis Cluster distributes streams by key: each key is mapped to one of 16384 hash slots by taking the CRC16 of the key modulo 16384. If the keys share the same hash tag, only the tag is hashed, so they land in the same slot and therefore on the same node. Otherwise the whole key is hashed and the streams are spread across the slots. Check this for detail.
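As a rough illustration of that mapping, here is a minimal Python sketch of the slot calculation. It assumes the CRC16 variant Redis Cluster uses is the XMODEM one, which Python's `binascii.crc_hqx` computes when seeded with 0. The function names and the example stream keys are made up for illustration.

```python
import binascii

NUM_SLOTS = 16384  # number of hash slots in a Redis Cluster


def hash_tag_or_key(key: str) -> str:
    """Return the part of the key that gets hashed.

    If the key contains '{' followed later by '}' with at least one
    character in between, only that substring (the hash tag) is hashed.
    Otherwise the whole key is hashed.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # a closing brace exists and the tag is non-empty
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to its hash slot: CRC16 of the hashed part, modulo 16384."""
    data = hash_tag_or_key(key).encode("utf-8")
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


if __name__ == "__main__":
    # Hypothetical stream names: without a hash tag each key is hashed on its own.
    for name in ("stream:0", "stream:1", "stream:2"):
        print(name, key_slot(name))
    # With a shared hash tag every key lands in the same slot.
    for name in ("{orders}:stream:0", "{orders}:stream:1"):
        print(name, key_slot(name))
```

So ten streams with unrelated names will generally end up in different slots, while ten streams sharing a hash tag will all sit on one node.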
Based on the latency comparisons given at https://gist.github.com/jboner/2841832, an SSD read is roughly similar in cost to a network read within the same datacenter.
I am trying to understand whether a Redis deployment on a separate node/cluster can be performant given the network latency it introduces. Wouldn't deploying Redis on the app nodes themselves be a better option? This is assuming the app nodes are using SSD disks and data is sharded across app nodes.
This is for a large deployment with more than 10 app nodes.
Obviously, if you can run Redis on the same node as your app you'll get better latency than over the network (and you can also use a Unix socket to reduce it further).
But here are the questions you need to ask yourself:
How are you going to shard the data between the app nodes?
What about high availability?
Are there cases where one app node will need data from another node?
Can you be sure the load will be evenly distributed between the nodes so no Redis node will get out of memory?
What about scale out? How are you going to reshard the data?
The Redis team introduced the new Streams data type in Redis 5.0. Since Streams look like Kafka topics at first glance, it is difficult to find real-world examples of using them.
In the streams intro we have a comparison with Kafka streams:
Runtime consumer groups handling. For example, if one of three consumers fails permanently, Redis will continue to serve the first and second, because now we would have just two logical partitions (consumers).
Redis Streams are much faster. They are stored in and served from memory, so this is to be expected.
We have some projects with Kafka, RabbitMQ and NATS. Now we are taking a deep look at Redis Streams, trying to use them as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:
Store all data in memory with AOF replication.
By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of followers to receive the data from the master. This looks like enough to kill any interest in trying streams under high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
And the cap strategy. The real "capped resource" with Redis Streams is memory, so it's not really so important how many items you want to store or which capping strategy you are using. So each time your consumer fails you would get either peak memory consumption or, with a cap, lost messages.
We use Kafka as an RTB bidder frontend which handles ~1,100,000 messages per second with ~120 byte payloads. With Redis we see ~170 MB/s of memory consumption on write, and with a 512 GB RAM server we have a write "reserve" of ~50 minutes of data. So if the processing system were offline for that long, we would crash.
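For reference, the "~50 minutes" figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted above and assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and that all of the RAM is available to the stream, which is optimistic. The function name is made up.

```python
MESSAGES_PER_SEC = 1_100_000   # from the post
PAYLOAD_BYTES = 120            # from the post
OBSERVED_WRITE_RATE = 170e6    # bytes/s of memory growth, from the post
RAM_BYTES = 512e9              # assumption: decimal GB, fully usable


def write_reserve():
    """Return (raw payload rate in bytes/s,
               implied per-message overhead in bytes,
               minutes until RAM is full)."""
    payload_rate = MESSAGES_PER_SEC * PAYLOAD_BYTES
    overhead = OBSERVED_WRITE_RATE / MESSAGES_PER_SEC - PAYLOAD_BYTES
    minutes = RAM_BYTES / OBSERVED_WRITE_RATE / 60
    return payload_rate, overhead, minutes


if __name__ == "__main__":
    payload_rate, overhead, minutes = write_reserve()
    print(f"raw payload rate: {payload_rate / 1e6:.0f} MB/s")
    print(f"implied overhead: {overhead:.1f} bytes per message")
    print(f"time until full:  {minutes:.1f} minutes")
```

The raw payload alone is 132 MB/s, so the observed 170 MB/s implies roughly 35 bytes of per-entry overhead, and the buffer lasts about 50 minutes.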
Could you please tell us more about Redis Streams usage in the real world, and maybe some cases where you have tried to use it yourself? Or maybe Redis Streams should only be used with smaller amounts of data?
Long time no see. This feels like a discussion that belongs in the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement - they provide different properties and capabilities despite the similarities. You are of course correct with regards to the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partition your streams across period-based key names.
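To make the period-based key idea a bit more concrete, here is one possible sketch in Python. The key format, the `bids` base name and the one-minute period are all assumptions for illustration, not anything prescribed by Redis.

```python
from datetime import datetime, timezone


def period_stream_key(base: str, ts: float, period_seconds: int = 60) -> str:
    """Build a stream key for the time bucket that contains `ts`.

    `ts` is a Unix timestamp in seconds. All timestamps inside the same
    period map to the same key; the next period gets a new key, which a
    cluster hashes independently (no hash tag is used here).
    """
    bucket_start = int(ts) - int(ts) % period_seconds
    stamp = datetime.fromtimestamp(bucket_start, tz=timezone.utc).strftime(
        "%Y%m%dT%H%M%S"
    )
    return f"{base}:{stamp}"


if __name__ == "__main__":
    import time

    # A producer would XADD to this key, and consumers would move on to
    # the next key once the period has rolled over.
    print(period_stream_key("bids", time.time()))
```

Because each period is its own key, the streams are spread over the cluster's hash slots, and a fully processed period can be dropped by deleting its key instead of trimming one ever-growing stream.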
I want to build a RabbitMQ system which is able to scale out for the sake of performance.
I've gone through the official document of RabbitMQ Clustering. However, its clustering doesn't seem to support scalability. That's because only through master queue we can publish/consume, even though the master queue is reachable from any node of a cluster. Other than the node on which a master queue resides, we can't process any publish/consume.
Why do we cluster then?
Why do we cluster then?
To ensure availability.
To enforce data replication.
To spread the load/data across queues on different nodes. Master queues can be stored on different nodes and replicated with a factor < number of cluster nodes.
Other than the node on which a master queue resides, we can't process
any publish/consume.
A client can be connected to any node of the cluster. This node will forward 'the request' to the master queue node and vice versa. The downside is that it generates an extra hop.
Answer to the question in the title, "Is RabbitMQ Clustering including scalability too?": yes, it does. This is achieved by simply adding nodes to or removing nodes from the cluster. Of course you have to consider high availability, that is, queue and exchange mirroring etc.
And just to make something clear regarding:
However, its clustering doesn't seem to support scalability. That's
because only through master queue we can publish/consume, even though
the master queue is reachable from any node of a cluster.
Publishing is done to an exchange; queues have nothing to do with publishing. A publishing client publishes only to an exchange with a routing key. It doesn't need any knowledge about the queue.
When I add a node to a Redis cluster, it has 0 hash slots. Why doesn't Redis Cluster automatically perform a resharding operation in order to make the new node fully functional?
As you can see here, Redis now supports automatic partitioning.
The process of adding a node consists of two steps:
Introduce the node to the other nodes via CLUSTER MEET so that all the nodes start to communicate via the cluster bus
Make the node act as a master via CLUSTER ADDSLOTS or as a slave via CLUSTER REPLICATE
The separation helps to keep the commands simple.
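Since the new master joins with 0 slots, someone has to decide how many slots it should take from each existing master. The sketch below only does that planning arithmetic for an even split. It does not talk to Redis at all, and the node names and the `plan_rebalance` helper are hypothetical.

```python
from typing import Dict

TOTAL_SLOTS = 16384  # number of hash slots in a Redis Cluster


def plan_rebalance(slots_per_master: Dict[str, int]) -> Dict[str, int]:
    """Return how many slots each existing master should hand over
    to one newly added, empty master so that it ends up with an
    even share (TOTAL_SLOTS // number of masters after the join).
    """
    masters_after_join = len(slots_per_master) + 1
    target = TOTAL_SLOTS // masters_after_join
    remaining = target  # slots the new node still needs
    plan: Dict[str, int] = {}
    for node, owned in sorted(slots_per_master.items()):
        # A donor gives away only what it holds above the target share.
        give = min(max(owned - target, 0), remaining)
        if give:
            plan[node] = give
            remaining -= give
    return plan


if __name__ == "__main__":
    current = {"node-a": 5461, "node-b": 5462, "node-c": 5461}
    print(plan_rebalance(current))
```

For a targeted reshard, where only the overloaded masters should donate, the same idea applies with a smaller input dictionary.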
Automatic resharding is part of the Redis 4.2 roadmap
In my experience, automatic resharding is not what I want.
The case I was dealing with is that some nodes had high read throughput (100k qps), so I added new nodes and resharded only these high-load nodes in order to relieve the pressure.
You may ask why the load is different. It's because we use hash tags (e.g. {user}123456) to ensure that the same kind of data is stored on the same node.
So automatic resharding would be useless here.
I have a three-node cluster but did not set up reliable queues. I am using puka for Python as the client.
For load balancing on EC2 I am using Route 53 and assign an equal weight to each private IP address. So, if I have three EC2 instances, I have 3 Route 53 entries.
So, my question is this: why the cluster? What is the difference between three non-clustered nodes on Route 53 and three clustered nodes on Route 53? Are all rabbits writable and readable?
My understanding is that if I want HA and reliable queues then RabbitMQ becomes master/slave, and a working cluster is required first before turning the cluster into reliable queues.
I am rather confused about how to best cluster and the differences between a cluster vs HA.
Thanks
Clustered nodes are equally weighted, that is, there is no master and no slave. The only advantage is that when a publisher pushes a message to a queue located on another node, the message will traverse from node to node (through Erlang's clustered VM layer) to reach its consumer/worker.
On the other hand, in HA mode, all queues and exchanges (as per some policy you specify) will be replicated across all the nodes. Moreover, there is only one master and one or more slaves, where the master is the oldest existing node, and when it dies the second-oldest node will take over as master.
Let me know if that was the answer you expected.
Here is an article outlining both HA and load-balancing techniques, and how to combine the two efficiently, across a RabbitMQ cluster.