How to delete a Redis stream message after successful processing - redis

I use Redis as the transport in Messenger. I thought that after processing a message its removal from the stream was automatic, but alas it is not. I do not know how to delete a stream entry once it has been processed successfully.
I use Symfony 4.4 (latest) and Redis server 6.0.
Thanks

One way to do it is by using the XTRIM command.
After you process a couple of messages, you trim the stream to retain only the messages that were not yet processed. By calling XLEN you can get the stream size, and if you subtract the number of messages you processed, you are left with the right argument for XTRIM.

Simple steps:
Read a stream message and keep its ID.
When you are done processing the message, use the XDEL command to delete that specific entry from the Redis stream.
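A minimal sketch of these two steps with redis-py (the stream name "messages" and the process() handler are illustrative, not part of Messenger itself):

import redis

r = redis.Redis(host="localhost", port=6379)

# Step 1: read up to 10 entries from the stream, blocking up to 5 seconds.
entries = r.xread({"messages": "0"}, count=10, block=5000) or []

for stream_name, messages in entries:
    for message_id, fields in messages:
        process(fields)                 # hypothetical handler for your payload
        r.xdel("messages", message_id)  # step 2: delete the entry once processed

print("entries left in the stream:", r.xlen("messages"))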
BTW, Redis streams are not really meant to be used like this; it may be better to use the pub/sub functionality, as #Dudo mentioned. This is a good introduction to Redis streams: https://redis.io/topics/streams-intro

I want to add here that in Symfony 5 the ?delete_after_ack=true option was added:
MESSENGER_TRANSPORT_DSN=redis://localhost:6379/messages?delete_after_ack=true
In Symfony 5 the default is false; in Symfony 6 the default is true, so Symfony 6 will automatically remove acknowledged messages.
See also: https://symfony.com/doc/current/messenger.html#redis-transport
There are also other parameters like delete_after_reject or stream_max_entries, which trims the stream. Keep in mind that stream_max_entries trims messages away even if they have not been processed yet, so it should be set to a high enough value.
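For example (the value 100000 here is only illustrative), the options can be combined in the DSN:
MESSENGER_TRANSPORT_DSN=redis://localhost:6379/messages?delete_after_ack=true&stream_max_entries=100000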

Related

Kafka Streams Fault Tolerance with Offset Management in Parallel

Description:
I have one Kafka Streams application which is consuming from a topic.
The events are coming in at high volumes.
The Kafka Streams app consumes the events as a terminal operation, clubs the events into a bunch of, say, 1000 events, and writes them to AWS S3.
I have threads that write to S3 in parallel after consuming events from the Kafka topic.
I am not using kafka-connector-s3 due to some business application logic and processing.
Problem:
I want the application to be fault-tolerant; I don't want to lose messages.
--> CRASH SCENARIO
Suppose the application has 10 threads, all running and trying to put events into S3, and a crash happens. Since Kafka Streams has enable.auto.commit = false and we cannot commit the offsets manually, all the threads have already consumed messages from the Kafka topic.
In this case, Kafka Streams may have already committed the offsets after reading, even though the events were not yet written to S3.
I need a mechanism so that I can be sure up to which offset the events were written successfully to the S3 file.
And in crash scenarios, how should I deal with this, and how do I manage the Kafka offsets in Kafka Streams given that I am using, say, 10 threads? What if some threads fail to write to S3 and some succeed? How do I track, in offset order, which messages were successfully processed to S3 and which were not?
Let me know if my problem statement is not clear.
Thanks!
I can assure you that enable.auto.commit is set to false in Kafka Streams. The Javadocs at https://kafka.apache.org/26/javadoc/org/apache/kafka/streams/StreamsConfig.html state
"enable.auto.commit" (false) - Streams client will always disable/turn off auto committing
You are right that Kafka Streams will automatically commit at more or less regular intervals. However, Kafka Streams waits until records are processed before committing the corresponding offsets. That means you would at least get at-least-once guarantees and not lose messages.
As far as I understand your application, your terminal processor does not block until the records are sent to S3. That means Kafka Streams cannot know when the sending is completed; it just sees that the terminal processor completed its processing and then -- if the commit interval has elapsed -- it commits the offsets.
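The guarantee the question needs boils down to "commit offsets only after the write to S3 has succeeded". As an illustration of that pattern only (using a plain consumer with the confluent-kafka Python client, not Kafka Streams), a sketch could look like this; the topic, group id and the upload_batch_to_s3 helper are hypothetical:

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # adjust to your cluster
    "group.id": "s3-writer",
    "enable.auto.commit": False,              # commit manually, only after S3 succeeds
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])

batch = []
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg.value())
        if len(batch) >= 1000:
            upload_batch_to_s3(batch)             # hypothetical helper; raises on failure
            consumer.commit(asynchronous=False)   # offsets advance only after the upload
            batch = []
finally:
    consumer.close()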
You say
Not using kafka-connector-s3 due to some business application logics and processings.
Could you put the business application logic into the Kafka Streams application, write the results to a Kafka topic with the to() operator, and then use kafka-connector-s3 to send the messages in that topic to S3?
I am not a Connect expert, but I guess that would make sure messages are not lost and would make your implementation simpler.
Using Kafka Streams, you could aggregate 5000 messages from the source topic into one big message and send that big message to another topic, e.g. middle_topic. You then need another processor that sources from middle_topic and sinks to S3 using the S3 connector.

Is there a way to publish a message as a record is added to a key in Redis?

Here is my use case:
We use a Redis appender to write our log messages to Redis. These messages have MDC data (a trace ID) to track individual requests. We want other applications to be able to subscribe to the trace ID and get all the messages logged, as they are inserted. Can we have some sort of trigger that publishes the message as it is being added?
The appender does not give us the ability to publish to a channel, and we don't want to create a custom publisher for this use case. I am sure this use case is not unique, so I am hoping for a recommendation. Basically, I am looking for something like the ON INSERT triggers that an RDBMS has.
Redis Keyspace Notifications sound like they might fit your use case: https://redis.io/topics/notifications
You can subscribe to a variety of notification types and I would guess that one of those would fit your need.
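A minimal redis-py sketch of subscribing to keyspace notifications (the flags "K$" enable keyspace events for string commands such as SET; adjust them to the events you care about, and note this can also be configured in redis.conf):

import redis

r = redis.Redis()

# Keyspace notifications are off by default.
r.config_set("notify-keyspace-events", "K$")

p = r.pubsub()
p.psubscribe("__keyspace@0__:*")   # all keys in DB 0; narrow the pattern as needed

for message in p.listen():
    if message["type"] == "pmessage":
        # the channel carries the key name, the data carries the command, e.g. b"set"
        print(message["channel"], message["data"])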
Consider using the Stream (v5) data type for storing your log, and having consumers consume that stream for incoming updates.

Redis keyspace notifications - get values (small size) of SET operations

I'm working on creating a DB with Redis.
One of my requirements is that all the clients in the system will be able to listen to SET events and get information about both the key and the value that changed.
I know that a published value may be big (up to 512 MB), but I know that in my system the size of the value will not be more than 100 characters.
I have 3 possible solutions and I wonder which one will be better, or whether to consider other solutions:
1) After each SET operation, the client will also publish it (PUB/SUB).
2) Edit the setGenericCommand function to publish the value as well, and use keyspace notifications.
3) After the client receives a keyspace notification, it will fetch the value with a GET operation.
I would like to understand which approach is better.
Thank you!
So, first and foremost, remember that Pub/Sub is at-most-once delivery. If you really need to process every change in the client, you should consider a more resilient way to do so.
That said, assuming you're OK with Pub/Sub's promises, option 1 is the simplest and I'd go with that. At most, I'd provide the clients with a Lua wrapper that combines the SET and PUBLISH commands. This, of course, removes the need to actually listen to keyspace notifications, as you're basically implementing them yourself.
Option 2 means hacking Redis, which is great, but it also means you'll have to maintain your own build, which is meh.
Option 3 is also simple enough, but with option 1 you get away with a single round trip instead of two.
Another approach (4) is to write a custom module, but IMO that's too complex for this need. Go with option 1 and Lua, and may the force be with you.
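A sketch of that Lua wrapper with redis-py (the key, channel and value names are illustrative):

import redis

r = redis.Redis()

# Atomically SET a key and PUBLISH the new value to a channel.
# KEYS[1] = key to set, ARGV[1] = value, ARGV[2] = channel name.
set_and_publish = r.register_script("""
redis.call('SET', KEYS[1], ARGV[1])
return redis.call('PUBLISH', ARGV[2], ARGV[1])
""")

# The return value is the number of subscribers that received the message.
receivers = set_and_publish(keys=["user:42:status"], args=["online", "changes:user:42"])
print("delivered to", receivers, "subscribers")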

Redis publish/subscribe: see what channels are currently subscribed to

I am currently interested in seeing what channels are subscribed to in a Redis pub/sub application I have. When a client connects to our server, we register them to a channel that looks like:
user:user_id
The reason for this is that I want to be able to see who's "online". I currently blindly fire off messages to a channel without knowing whether a client is online, since it's not critical that they receive these types of messages.
In an effort to make my application smarter, I'd like to be able to discover whether a client is online or not using the pub/sub API, and if they are offline, cache their messages in a separate Redis queue which I can push to them when they get back online.
This does not have to be 100% accurate, but the more accurate it is, the better. I'm assuming a generic key does not get created when a channel gets subscribed to, so I cannot do something as trivial as:
redis-cli keys user* to find all online users.
The other strategy I've thought of is just maintaining my own Redis set whenever a user publishes to or removes themselves from a channel (which the client handles automatically when they come online or close the app). That would be an additional layer of complexity that I'd need to manage, and I'm hoping there is a more trivial approach using the data that's already available.
As of Redis 2.8 you can do:
PUBSUB CHANNELS [pattern]
The PUBSUB CHANNELS command has O(N) complexity, where N is the number of active channels.
So in your case:
redis-cli PUBSUB CHANNELS user*
would give you what you want.
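The same call from client code, with redis-py for illustration (the channel naming follows the user:user_id scheme from the question):

import redis

r = redis.Redis()

# Returns the currently active channels (those with at least one subscriber)
# matching the pattern.
online_channels = r.pubsub_channels("user*")
online_user_ids = [channel.decode().split(":", 1)[1] for channel in online_channels]
print(online_user_ids)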
There is currently no command for showing what channels "exist" by way of being subscribed to, but there is an "approved" issue and a pull request that implements this.
https://github.com/antirez/redis/issues/221
https://github.com/antirez/redis/pull/412
Due to the nature of this call, it is not something that can scale, and is thus a "DEBUG" command.
There are a few other ways to solve your problem, however.
If you have reason to believe that a channel may be subscribed to, you can send it a message and look at the result. The result is the number of subscribers that got the message. If you got 0, you know that they're not there.
Assuming that your user_ids are incremental, you might be interested in using SETBIT to set a 1 or 0 to a user's offset bit to track presence. You can then do cool things like the new BITCOUNT to see how many users are online, and GETBIT to determine if a specific user is online.
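A short redis-py sketch of both of these ideas (the key name online_users and user id 42 are illustrative):

import redis

r = redis.Redis()

# PUBLISH returns the number of subscribers that received the message,
# so a result of 0 means nobody was listening on that channel.
delivered = r.publish("user:42", "ping")
user_is_listening = delivered > 0

# Bitmap presence tracking: one bit per (incremental) user id.
r.setbit("online_users", 42, 1)          # mark user 42 as online
print(r.getbit("online_users", 42))      # 1 if user 42 is online
print(r.bitcount("online_users"))        # number of users currently online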
The way I have solved your problem more specifically in the past is by signaling a subscription manager that I have subscribed to a channel. The manager then "pings" the channel by sending a blank message to confirm that there is a subscriber, and occasionally pings the channel thereafter to determine if the user is still online. Not ideal, but better than using DEBUG CHANNELS in production.
From version 2.8.0, Redis has a PUBSUB command that helps in this case:
http://redis.io/commands/pubsub
Remark: at the time of writing, 2.8.0 is not stable yet (RC2).
I am unaware of any specific way to query what channels are being subscribed to, and you are correct that there isn't any key created when this happens. Also, I wouldn't use the KEYS command in production anyway, as it's really a debugging command.
You have the right idea about using a set to add the user when they're online, and then query this with SISMEMBER <set> <user_id> to determine if the messages should be sent to them or added to a Redis list for processing once they do come online.
You will need to figure out when a user logs off so you can remove them from the list of online users, but I don't know enough about your system to know exactly how you would go about that.
If the connected clients have the ability to send a message back to inform the server that the message(s) were consumed, you could use this to keep track of which messages should be stored for later retrieval.
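A sketch of that approach with redis-py (key names and the connect/disconnect hooks are illustrative):

import redis

r = redis.Redis()

def mark_online(user_id):
    r.sadd("online_users", user_id)      # call when the client connects

def mark_offline(user_id):
    r.srem("online_users", user_id)      # call when the client disconnects

def send(user_id, message):
    if r.sismember("online_users", user_id):
        r.publish(f"user:{user_id}", message)
    else:
        r.rpush(f"pending:{user_id}", message)   # store for later delivery

def drain_pending(user_id):
    # Deliver queued messages once the user comes back online.
    while (message := r.lpop(f"pending:{user_id}")) is not None:
        r.publish(f"user:{user_id}", message)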
Cheers,
Mike
* PUBSUB NUMSUB [channel-1 ... channel-N]
Returns the number of subscribers (not counting clients subscribed to patterns) for the specified channels.
https://redis.io/commands/pubsub

Redis Pub/Sub with Reliability

I've been looking at using Redis Pub/Sub as a replacement to RabbitMQ.
From my understanding Redis's pub/sub holds a persistent connection to each of the subscribers, and if the connection is terminated, all future messages will be lost and dropped on the floor.
One possible solution is to use a list (and blocking wait) to store all the message and pub/sub as just a notification mechanism. I think this gets me most of the way there, but I still have some concerns about the failure cases.
What happens when a subscriber dies and comes back online? How should it process all its pending messages?
When a malformed message comes through the system, how do you handle those exceptions? A dead-letter queue?
Is there a standard practice for implementing a retry policy?
When a subscriber (consumer) dies, your list will continue to grow until the client returns. Your producer could trim the list (from either side) once it reaches a specific limit, but that is something you would need to handle at the application level. If you include a timestamp within each message, your consumer can then act on the age of a message, assuming you have application logic you want to enforce on message age.
I'm not sure how a malformed message would enter the system, as the connection to Redis is usually TCP, with its integrity assurances. But if this happens, perhaps due to a bug in message encoding at the producer layer, you could provide a general mechanism for handling errors by keeping a queue per producer that receives the consumers' exception messages.
Retry policies will depend greatly on your application needs. If you need 100% assurance that a message has been received and processed, then you should consider using Redis transactions (MULTI/EXEC) to wrap the work done by a consumer, so you can ensure that a client doesn't remove a message unless it has completed its work. If you need explicit acknowledgement, then you could use an explicit ACK message on a queue dedicated to the producer process(es).
Without knowing more about your application needs, it's hard to know how to choose wisely. Generally, if your messages require full ACID protection, then you probably also need to use redis transactions. If your messages are only meaningful when they are timely, then transactions may not be needed. It sounds as though you can't tolerate dropped messages, so your approach of using a list is good. If you need to implement a priority queue for your messages, you can use the sorted set (the Z-commands) to store your messages, using their priority as the score value, along with a polling consumer.
If you want a pub/sub system where subscribers won't lose messages when they die, consider using Redis Streams instead of Redis Pub/sub.
Redis Streams have their own architecture and their own pros/cons compared to Redis Pub/Sub. With Redis Streams, a subscriber can issue the command:
the last message I received was X, now give me the next message;
if there is no new message, then wait for one to arrive.
Antirez's article linked above is a good intro to Redis streams with more info.
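A minimal redis-py sketch of that consume loop (the stream name and handle() are illustrative; for competing consumers with explicit acknowledgement you would use consumer groups via XREADGROUP/XACK instead):

import redis

r = redis.Redis()

last_id = "0"   # or the last ID this subscriber persisted before it died

while True:
    # "Everything after last_id; if there is nothing new, block for up to 5 seconds."
    results = r.xread({"events": last_id}, count=100, block=5000) or []
    for _stream, messages in results:
        for message_id, fields in messages:
            handle(fields)          # hypothetical message handler
            last_id = message_id    # persist this somewhere durable in real code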
What I did is use a sorted set, using the timestamp as the score and the key to the data as the member value. I use the score of the last item to retrieve the next few items and then fetch the keys. Once the work is done, I wrap both the ZREM and the DEL in a MULTI/EXEC transaction.
Essentially what Edward said, but with the twist of storing only the keys in the sorted set, as my messages can be pretty big.
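A sketch of that pattern with redis-py (key names and the handle() worker are illustrative; the pipeline mirrors the ZREM/DEL pairing described above):

import time
import redis

r = redis.Redis()

def enqueue(message_key, payload):
    # Store the (possibly large) payload under its own key and index it by timestamp.
    r.set(message_key, payload)
    r.zadd("message_index", {message_key: time.time()})

def consume(last_score, batch_size=10):
    # Fetch the next few keys after the last score we processed ("(" = exclusive).
    items = r.zrangebyscore("message_index", f"({last_score}", "+inf",
                            start=0, num=batch_size, withscores=True)
    for key, score in items:
        handle(r.get(key))                   # hypothetical worker
        pipe = r.pipeline(transaction=True)  # MULTI/EXEC
        pipe.zrem("message_index", key)
        pipe.delete(key)
        pipe.execute()
        last_score = score
    return last_score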
Hope this helps!