Is it possible to use Redis Streams with persistent storage, or are streams limited to in-memory data only?
I know that it is possible to use Redis with persistent storage for the core data structures, but I have not been able to work out whether persistent storage is also available for streams in Redis.
Redis Streams are persisted like any other data type. Streams are a data structure in their own right, and a core one in the sense that they have been part of Redis core since 5.0.
There is no way to persist only some data types. Redis persists them all if AOF or RDB is set up.
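To check whether a given server would persist streams (and everything else), it is enough to look at the two persistence settings. The following is a minimal sketch, assuming the settings arrive as a plain dict of strings such as the one redis-py's `config_get` returns; the helper names `persistence_enabled` and `fetch_persistence_config` are made up for illustration.

```python
def persistence_enabled(config: dict) -> bool:
    """Return True if AOF or RDB snapshotting appears to be configured.

    `config` is assumed to map setting names to string values,
    e.g. {"appendonly": "yes", "save": "3600 1 300 100"}.
    """
    aof_on = config.get("appendonly", "no") == "yes"
    # An empty "save" value means no RDB snapshot rules are defined.
    rdb_on = config.get("save", "").strip() != ""
    return aof_on or rdb_on


def fetch_persistence_config(client) -> dict:
    """Collect both settings from a live server.

    `client` is assumed to be a redis-py `redis.Redis` instance;
    this function is not exercised here because it needs a running server.
    """
    merged = {}
    merged.update(client.config_get("appendonly"))
    merged.update(client.config_get("save"))
    return merged


print(persistence_enabled({"appendonly": "no", "save": ""}))
print(persistence_enabled({"appendonly": "yes", "save": ""}))
```

If neither setting is active, streams are lost on restart just like every other key.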
Pub/sub is not persisted at all, but that's because messages in pub/sub exist only while the message is being processed, i.e. being sent to all clients subscribed at that moment.
For more, see "What are the main differences between Redis Pub/Sub and Redis Stream?"
The Redis team introduced the new Streams data type in Redis 5.0. Since Streams look like Kafka topics at first glance, it is difficult to find real-world examples of their use.
The streams intro includes a comparison with Kafka streams:
Runtime consumer group handling. For example, if one of three consumers fails permanently, Redis will continue to serve the first and second, because now we would have just two logical partitions (consumers).
Redis Streams are much faster. They are stored in and served from memory, so this is simply a given.
We have projects that use Kafka, RabbitMQ and NATS. We are now taking a deep look at Redis Streams, trying to use them as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:
Store all data in memory with AOF replication.
By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of followers to receive the data from the master. This looks like a point that could kill any interest in trying streams under high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
Then there is the capping strategy. The real "capped resource" with Redis Streams is memory, so it's not really so important how many items you want to store or which capped strategy you are using. So each time your consumer fails, you get either peak memory consumption or, with a cap, lost messages.
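The trade-off described above can be illustrated without a server. This is only an analogy, assuming exact length-based trimming (what XADD's MAXLEN option provides): a `collections.deque` with `maxlen` stands in for the capped stream, and the function name is made up for illustration.

```python
from collections import deque


def simulate_capped_stream(total_messages: int, cap: int):
    """Append messages to a length-capped buffer while no consumer is reading.

    Returns the entries still present and the number that were trimmed away.
    """
    stream = deque(maxlen=cap)  # stands in for a stream trimmed to `cap` entries
    for message_id in range(total_messages):
        stream.append(message_id)  # the oldest entry is evicted once the cap is hit
    lost = max(0, total_messages - cap)
    return list(stream), lost


remaining, lost = simulate_capped_stream(total_messages=10, cap=5)
print(remaining, lost)  # [5, 6, 7, 8, 9] 5
```

With a cap, memory stays bounded but the five oldest messages are gone before the consumer comes back; without one, nothing is lost but memory keeps growing.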
We use Kafka as an RTB bidder frontend that handles ~1,100,000 messages per second with a ~120-byte payload. With Redis we see ~170 MB/s of memory consumption on write, so on a server with 512 GB of RAM we have a write "reserve" of ~50 minutes of data. If the processing system were offline for that long, we would crash.
Could you please tell us more about Redis Streams usage in the real world, and maybe some cases where you have tried using them yourself? Or are Redis Streams perhaps better suited to smaller amounts of data?
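The ~50 minute figure follows directly from the numbers quoted above. A quick check, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and that the whole 512 GB is available to the stream:

```python
def seconds_until_full(ram_bytes: float, fill_rate_bytes_per_sec: float) -> float:
    """Time until memory is exhausted at a constant write rate."""
    return ram_bytes / fill_rate_bytes_per_sec


MESSAGES_PER_SEC = 1_100_000
PAYLOAD_BYTES = 120

# Raw payload throughput, before any per-entry overhead in Redis.
raw_payload_rate = MESSAGES_PER_SEC * PAYLOAD_BYTES  # 132,000,000 bytes/s

observed_rate = 170e6  # ~170 MB/s of memory growth, as reported above
ram = 512e9            # 512 GB server

minutes = seconds_until_full(ram, observed_rate) / 60
bytes_per_message = observed_rate / MESSAGES_PER_SEC

print(f"raw payload rate: {raw_payload_rate / 1e6:.0f} MB/s")
print(f"memory per message: {bytes_per_message:.1f} bytes")
print(f"time until full: {minutes:.1f} minutes")
```

The gap between the 132 MB/s of raw payload and the observed 170 MB/s works out to roughly 35 bytes of overhead per message.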
Long time no see. This feels like a discussion that belongs on the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement - they provide different properties and capabilities despite the similarities. You are of course correct with regards to the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partition your streams across period-based key names.
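One way to read "period-based key names" is to derive the stream key from the time bucket a message falls into, so that each period gets its own key. In a cluster, different key names hash to different slots, which spreads the periods across nodes, and an expired period can be dropped by deleting its key. The sketch below assumes hourly buckets in UTC; the function name and the `bids` base name are made up for illustration.

```python
from datetime import datetime, timezone


def period_stream_key(base: str, unix_ts: float, fmt: str = "%Y%m%d%H") -> str:
    """Build a stream key for the period that contains `unix_ts`.

    The default format yields one key per UTC hour, e.g. "bids:1970010100".
    """
    bucket = datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime(fmt)
    return f"{base}:{bucket}"


print(period_stream_key("bids", 0))     # bids:1970010100
print(period_stream_key("bids", 3599))  # bids:1970010100 (same hour)
print(period_stream_key("bids", 3600))  # bids:1970010101 (next hour)
```

Producers would XADD to the key for the current period, and consumers would need to know which periods to read, which is a design cost of this approach.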
I am aware that Redis has the RDB and AOF persistence options, which to me amount more or less to a backup of the entire Redis cache store.
Is there a way to persist only selected keys?
One solution is to use a long TTL, but the data would still be lost in case of a power failure or crash.
My requirement is to persist not the entire Redis dataset but only selected keys.
Thanks,
Ashish
No - Redis' data persistence applies to the entire dataset that the server manages, meaning all keys in all numbered databases.
If you want to persist just a subset of keys, provision a separate Redis database (a separate server instance) for them and configure its persistence (AOF and/or RDB) accordingly.
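On the application side, this usually means holding two connections and choosing between them per key. A minimal sketch, assuming keys that must survive a restart share known prefixes; the prefixes and the function name are hypothetical, and plain strings stand in for the two client objects.

```python
# Hypothetical prefixes for keys that must be persisted.
DURABLE_PREFIXES = ("order:", "account:")


def choose_instance(key: str, durable, cache):
    """Return the client for the persistent instance or the cache-only one.

    `durable` is assumed to point at a server with AOF and/or RDB enabled,
    `cache` at a server with persistence switched off.
    """
    return durable if key.startswith(DURABLE_PREFIXES) else cache


print(choose_instance("order:42", "durable-client", "cache-client"))
print(choose_instance("session:abc", "durable-client", "cache-client"))
```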
Redis can be used for real-time pub/sub, just like Kafka.
I am confused about which one to use when.
Any use case would be a great help.
Redis pub/sub is essentially a fire-and-forget system: every message you produce is delivered to all consumers at once, and the data is not kept anywhere. With Redis you are also limited by memory, and the number of producers and consumers can affect performance.
Kafka, on the other hand, is a high-throughput distributed log that can be used as a queue. Any number of producers can publish, and consumers can consume whenever they want. It also provides persistence for the messages sent through the queue.
Final Take:
Use Redis:
If you want a fire-and-forget kind of system, where all the messages that you produce are delivered instantly to consumers.
If speed is the main concern.
If you can live with data loss.
If you don't want your system to hold on to messages that have been sent.
If the amount of data to be handled is not huge.
Use Kafka:
If you want reliability.
If you want your system to keep a copy of messages that have been sent, even after consumption.
If you can't live with data loss.
If speed is not a big concern.
If the data size is huge.
Redis 5.0+ version provides the Stream data structure. It could be considered as a log data structure with delivery guarantees. It offers a set of blocking operations allowing consumers to wait for new data added to a stream by producers, and in addition to that, a concept called Consumer Groups.
Basically, the Stream structure provides many of the same capabilities as Kafka.
Here is the documentation https://redis.io/topics/streams-intro
Two popular Java clients that support this feature are Redisson and Jedis.
Redisson provides a ReliableTopic object if reliable delivery is required. https://github.com/redisson/redisson/wiki/6.-distributed-objects/#613-reliable-topic
Redis is an in-memory database that also persists data to disk.
Q1: Does this mean that when the Redis server starts, it will automatically load all the data on disk into memory?
Q2: And when writing data to Redis, will it update both the memory and the disk?
Can anyone please help me answer these two questions?
Q1: So I wonder does this mean that when redis server starts, it will
automatically load all the data on the disk into memory?
Yes. Depending on the configuration, Redis takes snapshots of memory to disk, and when Redis is restarted it automatically loads the latest snapshot back into memory.
Q2: And when writing data to redis, will it both update in the memory
and the disk?
Redis prioritizes writes to memory, and writes to disk are done in the background (in a separate process or thread). So the answer is yes, it writes data to both memory and disk, but a server failure can still cause data loss, because Redis does not have to persist each write to disk immediately.
Check official docs about persistence to learn more about the topic.
I have a question that is bugging me quite heavily. What is the Redis pub/sub feature actually used for? I can only think of inter-process communication over TCP (either locally or distributed), however not much else.
Can someone please prove me wrong.
It's an easy way to plug into an event stream, generally between processes or machines. For instance, a user creates a published event. One process handles updating the database from the event, another updates user stats, another global stats, another updates the text search database, etc. They're all loosely coupled by subscribing to the channel. You can add new processes for testing updates and monitoring the system. It's a little different from a message queue in that there's no storing messages until they're processed, but Redis has other structures for those sorts of jobs.
Here is a real use case from my experience.
Let's say you have a web application deployed on 4 different servers (nodes, virtual machines), mostly in your virtual private cloud.
The web application maintains an in-memory Java map as a cache for static data that occasionally changes.
Every time the data changes in your database, all your servers need to update their own in-memory caches. This is the problem.
One way is to keep all the static data in Redis, or any other cache, on a separate server and have the cache updated by a scheduler. But then, just to access static content that only occasionally changes, you need a scheduler and a separate cache server such as Redis or memcached, and every server has to point to this external cache.
Using Redis pub/sub here:
All servers subscribe to a Redis channel, and whenever data is updated, added or deleted, Redis publishes a message to all subscribers. On receiving the message object and its update type (ADD, REMOVED, UPDATED), each server updates its in-memory static data map.
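The subscriber side of this setup can be sketched as follows. The original application is in Java; Python is used here only to keep the example short. The JSON message layout (`type`, `key`, `value`), the channel name and the function names are assumptions for illustration. The `run_subscriber` function uses the redis-py client and needs a running server, so only the pure `apply_update` handler is exercised below.

```python
import json


def apply_update(cache: dict, raw_message: str) -> None:
    """Apply one published change to the local in-memory cache.

    The message is assumed to be JSON such as
    {"type": "UPDATED", "key": "country:IE", "value": "Ireland"}.
    """
    event = json.loads(raw_message)
    kind, key = event["type"], event["key"]
    if kind in ("ADD", "UPDATED"):
        cache[key] = event["value"]
    elif kind == "REMOVED":
        cache.pop(key, None)  # ignore removals of keys we never cached


def run_subscriber(cache: dict, channel: str = "static-data-updates") -> None:
    """Block forever, applying every message published on `channel`."""
    import redis  # third-party redis-py package, imported lazily

    client = redis.Redis(decode_responses=True)
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] == "message":  # skip subscribe confirmations
            apply_update(cache, message["data"])


# Local demonstration of the handler, without a server.
local_cache = {}
apply_update(local_cache, json.dumps({"type": "ADD", "key": "a", "value": 1}))
apply_update(local_cache, json.dumps({"type": "UPDATED", "key": "a", "value": 2}))
apply_update(local_cache, json.dumps({"type": "REMOVED", "key": "b"}))
print(local_cache)  # {'a': 2}
```

Because pub/sub keeps no history, a server that is down when a message is published misses that update, so each node would still need to reload its map from the database on startup.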