Redis Streams vs Kafka Streams/NATS - redis

Redis team introduce new Streams data type for Redis 5.0. Since Streams looks like Kafka topics from first view it seems difficult to find real world examples for using it.
In streams intro we have comparison with Kafka streams:
Runtime consumer groups handling. For example, if one of three consumers fails permanently, Redis will continue to serve first and second because now we would have just two logical partitions (consumers).
Redis streams much faster. They stored and operated from memory so this one is as is case.
We have some project with Kafka, RabbitMq and NATS. Now we are deep look into Redis stream to trying using it as "pre kafka cache" and in some case as Kafka/NATS alternative. The most critical point right now is replication:
Store all data in memory with AOF replication.
By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of followers to receive the data from the master. This one looks like point to kill any interest to try streams in high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
And the cap strategy. The real "capped resource" with Redis Streams is memory, so it's not really so important how many items you want to store or which capped strategy you are using. So each time you consumer fails you would get peak memory consumption or message lost with cap.
We use Kafka as RTB bidder frontend which handle ~1,100,000 messages per second with ~120 bytes payload. With Redis we have ~170 mb/sec memory consumption on write and with 512 gb RAM server we have write "reserve" for ~50 minutes of data. So if processing system would be offline for this time we would crash.
Could you please tell more about Redis Streams usage in real world and may be some cases you try to use it themself? Or may be Redis Streams could be used with not big amount of data?

long time no see. This feels like a discussion that belongs in the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement - they provide different properties and capabilities despite the similarities. You are of course correct with regards to the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partition your streams across period-based key names.

Related

How to use backpressure with Redis streams?

Am I missing something, or is there no way to generate backpressure with Redis streams? If a producer is pushing data to a stream faster consumers can consume it, there's no obvious way to signal to the producer that it should stop or slow down.
I expected that there would be a blocking version of XADD, that would block the client until room became available in a capped stream (similar to the blocking version of XREAD that allows consumers to wait until data becomes available), but this doesn't seem to be the case.
How do people deal with the above scenario — signaling to a producer that it should hold off on adding more items to a stream?
I understand that some data stream systems such as Kafka do not require backpressure, but Redis doesn't appear to have a comparable solution, and it seems like this would be a relatively common problem for many Redis streams use cases.
If you have persistence (either RDB or AOF) turned on, your stream messages will be persisted, hence there's no need for backpressure.
And if you use replicas, you have another level of redudancy.
Backpressure is needed only when Redis does not have enough memory (or enough network bandwidth to the replicas) to hold the messages.
And, honestly, I have never seen this scenario.
Why would you want to ? Unless you run out of memory it is not an issue and each consumer slow and fast can read at their leisure.
Note not using consumer groups just publishing via XADD and readers read via XRANGE via position stored in a key which is closer to Kafka. Using one stream per partition.
Producer can check if the table size gets too big every 1K messages (via XLEN) to slow down if this is an issue and cant you cant throw HW at it 5 nodes with 20 Gig each is pretty easy with the streams spread across the cluster .. Don't understand this should be easy so im probably missing something.
There is also an XADD version that trims the size of the table to ensure you don't over fill with the above but that world require some pretty extreme stuff. For us this is 2 days worth for frequent stuff which sends the latest state and 9 months for others.
Another thing dont store large messages in the stream , use a blob or separate key/ store.

Redis PUB/SUB and high availability

Currently I'm working on a distributed test execution and reporting system. I'm planning to use Redis PUB/SUB as a message queue and message distribution system.
I'm new to Redis, so I'm trying to read as many docs as I can and play around with it. One of the most important topics is high availability. As I said, I'm not an expert, but I'm aware of the possible options - using Sentinel, replication, clustering, etc.
What's not clear for me is how the Pub/Sub feature and the HA options are related each other. What's the best practice to build a reliable messaging system with Redis? By reliable I mean if my Redis message broker is down there should be some kind of a backup node (a slave?) that should be able to take over this role.
Is there a purely server-side solution? Or do I need to create a smart wrapper around the Redis client to handle this? Will a Sentinel-driven setup help me?
Doing pub sub in Redis with failover means thinking about additional factors in the client side. A key piece to understand is that subscriptions are per-connection. If you are subscribed to a channel on a node and it fails, you will need to handle reconnect and resubscribe. Because subscriptions are done at the connection level it is not something which can be replicated.
Regarding the details as to how it works and what you can expect to see, along with ways around it see a post I made earlier this year at https://objectrocket.com/blog/how-to/reliable-pubsub-and-blocking-commands-during-redis-failovers
You can lower the risk surface by subscribing to slaves and publishing to the master, but you would then need to have non-promotable slaves to subscribe to and still need to handle losing a slave - there is just as much chance to lose a given slave as there is a master.
IMO, PUB/SUB is not a good choice, may be disque (comes from antirez, author of the Redis) fits better:
Disque, an in-memory, distributed job queue

Difference between Redis and Kafka

Redis can be used as realtime pub-sub just as Kafka.
I am confused which one to use when.
Any use case would be a great help.
Redis pub-sub is mostly like a fire and forget system where all the messages you produced will be delivered to all the consumers at once and the data is kept nowhere. You have limitation in memory with respect to Redis. Also, the number of producers and consumers can affect the performance in Redis.
Kafka, on the other hand, is a high throughput, distributed log that can be used as a queue. Here any number of users can produce and consumers can consume at any time they want. It also provides persistence for the messages sent through the queue.
Final Take:
Use Redis:
If you want a fire and forget kind of system, where all the messages that you produce are delivered instantly to consumers.
If speed is most concerned.
If you can live up with data loss.
If you don't want your system to hold the message that has been sent.
The amount of data that is gonna be dealt with is not huge.
Use kafka:
If you want reliability.
If you want your system to have a copy of messages that has been sent even after consumption.
If you can't live up with data loss.
If Speed is not a big concern.
data size is huge
Redis 5.0+ version provides the Stream data structure. It could be considered as a log data structure with delivery guarantees. It offers a set of blocking operations allowing consumers to wait for new data added to a stream by producers, and in addition to that, a concept called Consumer Groups.
Basically Stream structure provides the same capabilities as Kafka.
Here is the documentation https://redis.io/topics/streams-intro
There are two most popular Java clients that support this feature: Redisson and Jedis
Redisson provides ReliableTopic object if reliability of delivery is required. https://github.com/redisson/redisson/wiki/6.-distributed-objects/#613-reliable-topic

Zookeeper vs In-memory-data-grid vs Redis

I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apache™ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, that would be easier for me to understand Zookeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If distributed-memory computation, how does zookeeper differ from in-memory-data-grids?
If synchronization across cluster, than how does it differs from all other in-memory storages? The same in-memory-data-grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, than there are other alternatives. Imdg allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that zookeeper is not meant to store for much data, and definitely not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large #s of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation and for most data sets, won't be able to store the data with any performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there's no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that in fact), and tehre are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making a distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, Zookeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull" so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the only guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.

distributed cluster questions about performance

I'm using 6 servers to make a cluster and they are all disk nodes. I use rabbitmq for collecting log file for our website. Now at the peak hour, the publish rate is about 30k message per second. There are 2 main consumers(hdfs and elasticsearch) and each one need to handle all message, so the delivery rate hit about 60k per second.
In my scenario, a single server can hold 10k delivery rate and I use 6 node to load balance the pressure. My solution is that I created 2 queues on each node. Each message is with a random routing-key(something like message.0, message.1, etc) to distribute the pressure to every node.
What confused me is:
All message send to one node. Should I use a HA Proxy to load balance this publish pressure?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between Memory Node and Disk Node? What I know is the difference between memory node and disk node is only about the meta data such as queue configuration.
How can I imrove the performance in publish and delivery codes? I've researched and I know several methods:
disable the confirm mechanism(in publish codes?)
enable HiPE(I've done that and it helped a lot)
For example, input is 1w mps(message per second), there are two consumers to consume all message. Then the output is 2w mps. If my server can handle 1w mps, I need two server to handle the 2w-mps-pressure. Now a new consumer need to consume all message, too. As a result, output hits 3w mps, so I need another one more server. For a conclusion, one more consumer for all message, one more server?
"All message send to one node. Should I use a HA Proxy to load balance this publish pressure?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes, Durable Queues are backed up to disk so that they can be reinstated on server-restart, for example. This adds a nominal overhead, though the actual process occurs asynchronously.
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I imrove the performance in publish and delivery codes?"
Try this out.