We are developing a data pipeline application using Kafka, Storm and Redis. Real-time events from different systems are published to Kafka, and Storm processes them according to configured rules. State is managed in Redis.
We have a requirement to apply a different WAIT_TIME before processing, depending on the event. We are looking at the following options.
We initially looked at Storm windowing (sliding or tumbling windows), but it only lets us configure fixed intervals. We need a wait time that varies by rule.
We are also exploring storing the events in a Redis cache with a varying TTL; once an event expires (is evicted), we need a callback into Storm to process it.
Does Redis support a callback on expiry? Is there a better way to do this with Storm and Redis?
We resolved the problem by calculating an expiry time for each incoming event and storing the event in Redis against that expiry time (with the expiry time as the key). On top of that, a Storm scheduler queries for the events whose expiry time has passed and processes them.
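The approach described above can be sketched roughly as follows. This is a minimal in-memory sketch, not the poster's actual code: a heap stands in for Redis, and the class name DelayQueue, the rule table WAIT_TIME_BY_TYPE and the event fields are all hypothetical. With Redis itself, one common way to get the same behaviour is a sorted set whose score is the due time (ZADD to store, ZRANGEBYSCORE to poll), but the original post does not say which structure was used.

```python
import heapq
import itertools


class DelayQueue:
    """In-memory stand-in for a Redis structure keyed by expiry (due) time."""

    def __init__(self):
        self._heap = []                 # entries: (due_at, seq, event)
        self._seq = itertools.count()   # tie-breaker so events are never compared

    def add(self, event, now, wait_time):
        """Store the event under its due time (now + rule-specific wait)."""
        heapq.heappush(self._heap, (now + wait_time, next(self._seq), event))

    def pop_due(self, now):
        """Remove and return every event whose due time has passed, oldest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


# Hypothetical rules: wait time in seconds per event type.
WAIT_TIME_BY_TYPE = {"order": 30, "payment": 5}

queue = DelayQueue()
queue.add({"type": "order", "id": 1}, now=100, wait_time=WAIT_TIME_BY_TYPE["order"])
queue.add({"type": "payment", "id": 2}, now=100, wait_time=WAIT_TIME_BY_TYPE["payment"])

# The scheduler polls periodically and processes whatever is due.
first_poll = queue.pop_due(now=110)    # payment was due at 105
second_poll = queue.pop_due(now=130)   # order was due at 130
```

The polling interval bounds how late an event can be processed, so it should be shorter than the smallest wait time that matters.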
Related
I am trying to use Redis for pub/sub between two systems, one of which is not ours (another company maintains it). I would like to have a timestamp for each message I publish to a Redis channel. Can someone help with this?
I already use the Redis log at debug level (the most verbose one), but there is no time information for pub/sub messages in the log.
I tested redis monitor: redis-cli monitor. It is exactly what I want, but it decreases the performance of the system by 50%.
The only way seems to be to implement the time logging myself. Maybe SET some time information before the publish command? That would store the local time in Redis, slightly before the publish.
You cannot achieve this goal with pub/sub.
You might want to try Redis Streams. For each stream entry, the first part of the automatically generated ID is the Unix timestamp in milliseconds at which the ID was generated, i.e. when the message was received by Redis.
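To illustrate, an auto-generated stream entry ID has the form <millisecondsTime>-<sequenceNumber>, so the receive time can be recovered from the ID alone. The helper name parse_stream_id below is made up for this sketch, and the sample ID is only an example value.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_stream_id(entry_id):
    """Split '<milliseconds>-<sequence>' into (UTC datetime, sequence number)."""
    millis, sequence = entry_id.split("-")
    received_at = EPOCH + timedelta(milliseconds=int(millis))
    return received_at, int(sequence)


received_at, sequence = parse_stream_id("1526919030474-55")
print(received_at.isoformat(), sequence)
```

Note that the timestamp reflects the Redis server's clock at the time the entry was added, not the publisher's clock.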
I have a question about a tricky situation in an event-driven system, and I would like to ask for advice. Here is the situation:
In our system, I use Redis as a memcached-style cache and Kafka as the message queue. To improve Redis performance, I use Lua scripting to process data and, at the same time, push events onto a Redis list that is read with blocking operations. A separate process then picks events from that list and moves them to Kafka. So this process has 3 steps:
1) Read events from the Redis list
2) Produce them in a batch into Kafka
3) Delete the corresponding events in Redis
Unfortunately, if the process dies between steps 2 and 3, i.e. after producing all events into Kafka but before deleting the corresponding events in Redis, then once the process is restarted it will produce duplicate events into Kafka, which is unacceptable. Does anyone have a solution for this problem? Thanks in advance, I really appreciate it.
Kafka is prone to reprocessing events, even if they are written exactly once. Reprocessing will almost certainly be caused by rebalancing of clients. Rebalancing might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption and subsequent recreation of the client by the broker.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that ensures related messages are consumed sequentially by the same client. This client could then maintain a database record of what it has already processed.
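The client-side record of processed messages suggested above could look roughly like this. It is a minimal sketch under stated assumptions: every event carries a stable unique "id" field (not mentioned in the original posts), and a Python set stands in for the database table. The class name DedupingConsumer is hypothetical.

```python
class DedupingConsumer:
    """Processes each event at most once by remembering event ids."""

    def __init__(self):
        self.processed_ids = set()   # stand-in for a durable database table
        self.results = []            # stand-in for the real side effect

    def handle(self, event):
        """Process an event once; return False when it is a duplicate."""
        event_id = event["id"]       # assumption: producers attach a stable id
        if event_id in self.processed_ids:
            return False
        self.results.append(event["payload"])
        self.processed_ids.add(event_id)
        return True


consumer = DedupingConsumer()
batch = [{"id": "e1", "payload": "a"}, {"id": "e2", "payload": "b"}]

first_pass = [consumer.handle(e) for e in batch]
# Simulate the producer restarting and sending the same batch again.
second_pass = [consumer.handle(e) for e in batch]
```

In a real system the side effect and the "processed" record would need to be written atomically (for example in the same database transaction); otherwise a crash between the two reintroduces the same problem one level down.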
The Redis team introduced the new Streams data type in Redis 5.0. Since Streams look like Kafka topics at first glance, it seems difficult to find real-world examples of using them.
The Streams intro compares them with Kafka:
Runtime consumer group handling. For example, if one of three consumers fails permanently, Redis will continue to serve the first and second, because now we would have just two logical partitions (consumers).
Redis Streams are much faster. They are stored in and served from memory, so this is to be expected.
We have some projects using Kafka, RabbitMQ and NATS. We are now taking a deep look at Redis Streams, to try using them as a "pre-Kafka cache" and in some cases as a Kafka/NATS alternative. The most critical point right now is replication:
All data is stored in memory, with AOF persistence.
By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of followers to receive the data from the master. This looks like the point that kills any interest in trying Streams under high load.
Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the follower which is the most updated, and under certain specific failures may promote a follower that lacks some data.
And the cap strategy. The real "capped resource" with Redis Streams is memory, so it is not really important how many items you want to store or which capping strategy you use. Each time your consumer fails, you get either peak memory consumption or, with a cap, lost messages.
We use Kafka as an RTB bidder frontend which handles ~1,100,000 messages per second with a ~120 byte payload. With Redis we see ~170 MB/sec of memory consumption on write, and on a server with 512 GB of RAM we have a write "reserve" of ~50 minutes of data. So if the processing system were offline for that long, we would crash.
Could you please tell us more about Redis Streams usage in the real world, and maybe about some cases where you have tried to use them yourself? Or could Redis Streams perhaps be used only with smaller amounts of data?
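The ~50 minute figure can be checked with a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the post and assumes decimal units (1 GB = 1000 MB) and that the whole of RAM is available to the stream, which is optimistic.

```python
MESSAGES_PER_SECOND = 1_100_000
PAYLOAD_BYTES = 120
MEMORY_GROWTH_MB_PER_SECOND = 170   # figure reported in the post
RAM_GB = 512

# Raw payload rate, before any per-entry overhead in Redis.
payload_mb_per_second = MESSAGES_PER_SECOND * PAYLOAD_BYTES / 1_000_000

# How much larger the observed memory growth is than the raw payload rate.
overhead_factor = MEMORY_GROWTH_MB_PER_SECOND / payload_mb_per_second

# Time until RAM is exhausted if nothing is consumed or trimmed.
seconds_until_full = RAM_GB * 1000 / MEMORY_GROWTH_MB_PER_SECOND
minutes_until_full = seconds_until_full / 60

print(f"{payload_mb_per_second:.0f} MB/s payload, "
      f"x{overhead_factor:.2f} overhead, "
      f"{minutes_until_full:.0f} min until full")
```

So the raw payload is about 132 MB/s, the reported 170 MB/s implies roughly 29% overhead on top of it, and the buffer lasts about 50 minutes, matching the post.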
Long time no see. This feels like a discussion that belongs on the redis-db mailing list, but the use case sounds fascinating.
Note that Redis Streams are not intended to be a Kafka replacement: they provide different properties and capabilities despite the similarities. You are of course correct with regard to the asynchronous nature of replication. As for scaling the amount of RAM available, you should consider using a cluster and partitioning your streams across period-based key names.
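One way to read "period-based key names" is to derive the stream key from the time bucket an entry falls into, so that each period gets its own key. The sketch below is only an illustration of that idea: the function name stream_key, the "bids" prefix and the 10-minute bucket are assumptions, and bucket_minutes is assumed to divide 60 evenly.

```python
from datetime import datetime, timezone


def stream_key(base, when, bucket_minutes=10):
    """Return a key such as 'bids:20240101T1230' for the bucket containing `when`."""
    bucket_start_minute = when.minute - when.minute % bucket_minutes
    bucket = when.replace(minute=bucket_start_minute, second=0, microsecond=0)
    return f"{base}:{bucket:%Y%m%dT%H%M}"


key = stream_key("bids", datetime(2024, 1, 1, 12, 37, 5, tzinfo=timezone.utc))
print(key)
```

Because each period is a separate key, a cluster can place different periods on different nodes, and an old period can be dropped by deleting its key as a whole.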
I have a requirement where I push keys to Redis with an expiration time. I also have a subscriber listening for key expiration events, which then calls back into my other system to apply some business rules. Is it a good design to rely on Redis pub/sub for this use case?
The average TTL for keys will be around 15 minutes.
Any other design would require a scheduler/cron job (every minute) or some polling system.
Yes, I have been using Redis pub/sub for exactly the same use case in production, and it has been working quite well for me without any issues.
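For context, expiration events come from Redis keyspace notifications, which have to be enabled through the notify-keyspace-events setting; expired keys are then announced on channels named like __keyevent@0__:expired, with the key name as the message. The sketch below only shows the subscriber-side filtering. The message dictionaries mimic the shape a Python client such as redis-py hands to a subscriber, which is an assumption here, and the function name expired_key and the sample keys are hypothetical.

```python
def expired_key(message):
    """Return the expired key name from a pub/sub message, or None if unrelated."""
    if message.get("type") not in ("message", "pmessage"):
        return None   # e.g. subscribe confirmations carry no key
    channel = message["channel"]
    if not (channel.startswith(b"__keyevent@") and channel.endswith(b"__:expired")):
        return None   # some other keyspace event, such as a delete
    return message["data"].decode("utf-8")


# Sample messages, shaped like what a subscriber would receive (assumed).
messages = [
    {"type": "psubscribe", "pattern": None,
     "channel": b"__keyevent@*__:expired", "data": 1},
    {"type": "pmessage", "pattern": b"__keyevent@*__:expired",
     "channel": b"__keyevent@0__:expired", "data": b"order:42"},
    {"type": "pmessage", "pattern": b"__keyevent@*__:*",
     "channel": b"__keyevent@0__:del", "data": b"order:43"},
]

expired = [key for key in map(expired_key, messages) if key is not None]
```

Two caveats are worth weighing for this design: pub/sub delivery is fire and forget, so events published while the subscriber is disconnected are lost, and the expired event is generated when Redis actually deletes the key, which can be somewhat later than the moment the TTL reaches zero.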
Redis can be used for real-time pub/sub, just like Kafka.
I am confused about which one to use when.
Any use cases would be a great help.
Redis pub/sub is essentially a fire-and-forget system: every message you produce is delivered to all currently subscribed consumers at once, and the data is not kept anywhere. With Redis you are limited by memory. Also, the number of producers and consumers can affect Redis performance.
Kafka, on the other hand, is a high-throughput distributed log that can be used as a queue. Any number of producers can produce, and consumers can consume whenever they want. It also provides persistence for the messages sent through the queue.
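The difference between the two delivery models can be shown with a toy in-memory sketch. Neither class talks to Redis or Kafka; FireAndForgetChannel and AppendOnlyLog are hypothetical names used only to contrast "deliver now to whoever is listening" with "append to a log that consumers read from an offset".

```python
class FireAndForgetChannel:
    """Pub/sub style: a message reaches only the current subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)
        return len(self.subscribers)   # number of receivers; nothing is stored


class AppendOnlyLog:
    """Log style: messages are kept and read later by offset."""

    def __init__(self):
        self.entries = []

    def append(self, message):
        self.entries.append(message)
        return len(self.entries) - 1   # offset of the new entry

    def read_from(self, offset):
        return self.entries[offset:]


channel = FireAndForgetChannel()
received = []
receivers_before = channel.publish("early")   # nobody is subscribed yet: lost
channel.subscribe(received.append)
channel.publish("late")

log = AppendOnlyLog()
log.append("early")
log.append("late")
replayed = log.read_from(0)                   # a late consumer still sees both
```

The late subscriber on the channel only ever sees "late", while a consumer that starts reading the log afterwards can still replay both messages.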
Final Take:
Use Redis:
If you want a fire-and-forget kind of system, where all the messages that you produce are delivered instantly to consumers.
If speed is the main concern.
If you can live with data loss.
If you don't want your system to hold on to messages that have been sent.
If the amount of data to be handled is not huge.
Use Kafka:
If you want reliability.
If you want your system to keep a copy of the messages that have been sent, even after consumption.
If you can't live with data loss.
If speed is not a big concern.
If the data size is huge.
Redis 5.0 and later provide the Stream data structure. It can be considered a log data structure with delivery guarantees. It offers a set of blocking operations that allow consumers to wait for new data added to a stream by producers and, in addition to that, a concept called Consumer Groups.
Basically, the Stream structure provides capabilities similar to those of a Kafka topic.
Here is the documentation: https://redis.io/topics/streams-intro
The two most popular Java clients that support this feature are Redisson and Jedis.
Redisson provides a ReliableTopic object if reliable delivery is required. https://github.com/redisson/redisson/wiki/6.-distributed-objects/#613-reliable-topic