We are using Redis 7.0.4.
We have created a stream with a consumer group. One program writes to that stream, and another multi-threaded program reads from it with auto-acknowledge. The reading program also deletes each message after processing it.
While messages were still being written, we increased the number of consumers in the reading program from 16 to 32.
After everything finished, exactly 16 messages were still present in the stream, even though all writes had completed.
Now we are unable to read those messages. We have tried an offset of 0-0 and the special ID '>', and we even hardcoded the oldest ID and tried from a new consumer group, but those messages are never delivered to the consumer.
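For reference, the reads we tried look roughly like this (a minimal sketch with a Jedis 4-style client; the stream, group, and consumer names are placeholders):

from("direct:ignore"); // (not Camel; plain Java below)

import java.util.List;
import java.util.Map;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.StreamEntryID;
import redis.clients.jedis.params.XReadGroupParams;
import redis.clients.jedis.resps.StreamEntry;

public class StreamReadAttempts {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Attempt 1: '>' asks only for entries never delivered to the group.
            List<Map.Entry<String, List<StreamEntry>>> fresh = jedis.xreadGroup(
                    "mygroup", "consumer-1",
                    XReadGroupParams.xReadGroupParams().count(16),
                    Map.of("mystream", StreamEntryID.UNRECEIVED_ENTRY));
            System.out.println("with '>': " + fresh);

            // Attempt 2: an explicit ID (0-0) reads from this consumer's pending list,
            // which stays empty when the group reads with auto-acknowledge.
            List<Map.Entry<String, List<StreamEntry>>> fromZero = jedis.xreadGroup(
                    "mygroup", "consumer-1",
                    XReadGroupParams.xReadGroupParams().count(16),
                    Map.of("mystream", new StreamEntryID("0-0")));
            System.out.println("from 0-0: " + fromZero);
        }
    }
}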
We have checked that no messages are pending in the original consumer group.
What are we doing wrong?
I have a publisher that sends messages to a consumer that moves a motor.
The motor has a work queue that I cannot access, and it works more slowly than the rate of the incoming messages, so I'm trying to control the traffic on the consumer side.
To keep updated and relevant data coming to the motor without the queue filling up and creating a traffic jam, I set the RabbitMQ queue size limit to 5 and basicQos to 1.
The idea is that the RabbitMQ queue will drop the old messages when it is filled up, so the newest commands are at the front of the queue.
Also, by setting basicQos to 1 I ensure that the consumer doesn't grab all the messages from the queue and bombard the motor at once, which is exactly what I'm trying to avoid, since I can't do anything once a command has been sent to the motor.
This way the consumer takes messages from the queue one by one, while new messages replace the old ones on the queue.
Practically this moves the bottleneck to the RabbitMQ queue instead of the motor's queue.
I also cannot check the motor's work queue, so all traffic control must be done on the consumer.
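For reference, the setup described above looks roughly like this with the Java client (the queue name and values are just illustrative):

import java.util.HashMap;
import java.util.Map;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class MotorConsumerSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();   // assumes a broker on localhost
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Cap the queue at 5 messages; when full, the oldest messages are dropped
        // (the default overflow behaviour), so only the newest commands remain.
        Map<String, Object> queueArgs = new HashMap<>();
        queueArgs.put("x-max-length", 5);
        channel.queueDeclare("motor-commands", true, false, false, queueArgs);

        // Prefetch of 1: the broker hands the consumer at most one unacknowledged message.
        channel.basicQos(1);
    }
}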
I added a messageId and tested, and found that many messages are still coming and going long after the publisher has been shut down.
I'm expecting around 5 messages after shutdown, since that's the size of the queue, but I'm getting hundreds.
I also added a few seconds of sleep inside the callback to make sure it isn't the motor's work queue that's acting up, but I'm still getting many messages after shutdown, and I can see in the logs that the callback is being called every time, so it's definitely still getting messages from somewhere.
Please help.
Thanks.
Moving the acknowledgment to the end of the callback solved the problem.
I'm guessing that with basicQos set to 1 it did execute the callback for each message one after another, but in the background it kept pulling messages from the queue.
So even after the publisher was shut down, the consumer still held messages it had already taken from the queue, and those were the messages I saw being executed.
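A minimal sketch of what the fixed consumer looks like with the Java client (the queue name and the processing step are placeholders): auto-ack is off, and basicAck is only sent once the command has been handled.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class MotorConsumer {
    public static void main(String[] args) throws Exception {
        Connection connection = new ConnectionFactory().newConnection();
        Channel channel = connection.createChannel();
        channel.basicQos(1);   // at most one unacknowledged message in flight

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            sendToMotor(delivery.getBody());   // hypothetical processing step
            // Ack only after the work is done; with manual ack and prefetch 1,
            // the broker will not deliver the next message until this one is acked.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        // Second argument 'false' disables auto-ack.
        channel.basicConsume("motor-commands", false, onMessage, consumerTag -> { });
    }

    private static void sendToMotor(byte[] body) {
        // forward the command to the motor
    }
}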
In my app, we are using a Camel route to read messages from a RabbitMQ queue.
The configuration looks like this:
from("rabbitmq:myexchange?routingKey=mykey&queue=q")
The producer can send 50k messages within a few minutes, and each message can take 1 second or more to process.
What I can see is that ALL messages are consumed very fast, but processing these messages can take many hours. Many hours of processing is expected, but does that mean the 50k messages are stored in memory? If so, I would like to disable this behavior, because I don't want to lose messages when the process goes down... Actually, we are losing most of the messages even when the process stays up, which is even worse. It looks like the connector is not designed to handle so many messages at once, but I cannot say whether that is because of the connector itself or because we did not configure it properly.
I tried with the autoAck option:
from("rabbitmq:myexchange?routingKey=mykey&queue=q&autoAck=false")
This way the messages are rolled back when something goes wrong, but keeping 50k messages unacknowledged at the same time does not seem like a good idea anyway...
There are a couple of things that I would like to share.
AutoAck - Yes, when you want to process the message after receiving it, you should set AutoAck to false and explicitly acknowledge the message once it is processed.
Setting Consumer PreFetch - You need to fine-tune the prefetch size. The prefetch size is the maximum number of messages that RabbitMQ will present to the consumer at a time, i.e. at most your total unacknowledged message count will equal the prefetch size. Depending on your system: if every message is critical you can set the prefetch size to 1; if you have a multi-threaded model for processing messages you can set the prefetch size to match the number of threads, where each thread processes one message; and so on.
In a way it acts architecturally like a buffer. If your process goes down while processing those messages, any message that was unacked before the process went down will still be in the queue, and the consumer will get it again for processing.
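Applied to the Camel route from the question, that could look something like this (assuming the camel-rabbitmq component's autoAck and prefetch options; the bean name is made up for illustration):

import org.apache.camel.builder.RouteBuilder;

public class VideoRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("rabbitmq:myexchange"
                + "?routingKey=mykey"
                + "&queue=q"
                + "&autoAck=false"          // ack only after the route has processed the exchange
                + "&prefetchEnabled=true"
                + "&prefetchCount=10")      // limit unacknowledged messages per consumer
            .to("bean:videoProcessor");     // hypothetical processing bean
    }
}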
I am creating a bulk video processing system using Spring Boot. The user provides all the video-related information through an xlsx sheet, and we process the videos in the backend. I am using RabbitMQ for queuing up the requests.
Let's say a user has uploaded a sheet with 100 rows; then there will be 100 messages in the RabbitMQ queue. In the backend, we auto-scale the subscribers (servers), so we start with only one subscriber and, based on the load (number of messages in the queue), scale up to 15 subscribers.
But our producer is very fast, and it allocates all the messages to our first subscriber (before the other subscribers come up), so none of our new subscribers get any messages from the queue.
If all the subscribers are available before the producer starts pushing messages, then the messages are distributed across all servers.
How can our new subscribers pull the messages from the queue that were produced earlier?
You are probably being affected by the listener container prefetchCount property - it defaults to 250 with recent versions of Spring AMQP (since 2.0).
So the first consumer will get up to 250 messages when it starts.
It sounds like you should reduce it to a small number, even all the way down to 1 so only one message is outstanding at each consumer.
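If you are using Spring Boot's listener container, a rough sketch of lowering it looks like this (equivalent to setting the spring.rabbitmq.listener.simple.prefetch property in recent Boot versions):

import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
            ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setPrefetchCount(1);   // each consumer holds at most one unacked message
        return factory;
    }
}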
I have streams of input data that I process. Each stream is sent in chunks of data. I can only process the (N+1)-th chunk of a stream i after I have finished processing the N-th chunk of the same stream i. Therefore, parallelization can happen by processing multiple streams at once, but I can never split one stream across multiple workers.
Chunks of one stream are added to the queue in order (although chunks from several streams can be added at the same time).
Most message queues, like RabbitMQ, guarantee ordered delivery when multiple workers operate on one queue. However, to achieve the behaviour I would like, I'd need to restrict the number of workers to 1 for each queue, so that the next chunk is only processed once the previous chunk is finished. To parallelize, I could create a queue for each stream, or a queue for each worker, and have another process that redirects the streams to the worker queues. In fact, the one-queue-per-worker approach is what I do right now, using RabbitMQ's consistent hashing and shovels. Of course, in terms of load balancing and dynamic scaling of the number of workers, that is far from ideal.
I've read a lot about Kafka, and how it is designed for time-series data (like logs). Yet, I couldn't figure out how I could apply Kafka - or any other message queue out there - to solve my problem.
I would greatly appreciate some hints on how to best use a message queue for my problem.
You could use Kafka, but you'd have to use some stream identification to hash messages on the Producer side, so that messages from one stream always go to the same partition.
Then, on the consumer side, you'd have to use the low-level consumer to spawn as many consuming threads as you have partitions, where each thread would consume from a single partition.
That would mean that you always process messages in order within each of your streams.
I haven't yet checked out how Kafka 0.9 Producer works, but there were some changes, so you should probably look into those if you want to use the latest version.
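A minimal sketch of the producer side of that idea (the topic name and serializers are assumptions): using the stream id as the record key makes the default partitioner send every chunk of that stream to the same partition, so the chunks stay in order.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChunkProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String streamId = "stream-42";     // identifies which stream this chunk belongs to
            String chunk = "chunk-payload";
            // Same key -> same partition -> chunks of one stream are consumed in order.
            producer.send(new ProducerRecord<>("chunks", streamId, chunk));
        }
    }
}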
Why don't you push the next chunk only after receiving the delivery acknowledgement for the previous chunk from the worker? Or use some kind of flag: once the previous chunk has been processed by the worker, the flag is set to true, and then you push the next chunk.
If you need to parallelize the work, create several queues with unique routing keys and, based on the routing keys, push the chunks to the respective queues. And keep a separate flag for every routing key.
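A rough sketch of that routing-key idea with the Java client (the exchange, queue, and routing-key names are made up):

import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ChunkRouter {
    public static void main(String[] args) throws Exception {
        Connection connection = new ConnectionFactory().newConnection();
        Channel channel = connection.createChannel();

        channel.exchangeDeclare("chunks", BuiltinExchangeType.DIRECT, true);

        // One queue per worker, each bound to its own routing key.
        int workers = 4;
        for (int w = 0; w < workers; w++) {
            channel.queueDeclare("chunks.worker-" + w, true, false, false, null);
            channel.queueBind("chunks.worker-" + w, "chunks", "worker-" + w);
        }

        // Route every chunk of a given stream to the same worker queue.
        int streamId = 42;
        String routingKey = "worker-" + (streamId % workers);
        channel.basicPublish("chunks", routingKey, null,
                "chunk-payload".getBytes(StandardCharsets.UTF_8));
    }
}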
I'm implementing a bolt in Storm that receives messages from a RabbitMQ spout (https://github.com/ppat/storm-rabbitmq).
Each event I have to process in Storm arrives as two messages from Rabbit, so I use a fieldsGrouping on the bolt so that the two messages arrive at the same bolt.
In my first approach I would:
Receive the first tuple and save the message in memory
Ack the first tuple
When the second tuple arrived, fetch the first from memory and emit a new tuple anchored to the second tuple from the spout.
This worked, but I could lose messages if a worker died, because I would ack the first tuple before getting the second and processing it.
I changed this to:
Receive the first tuple and save it in memory
When the second tuple arrived, fetch the first from memory, emit a new tuple anchored to both input tuples, and ack both input tuples.
The in-memory cache is a Guava cache with time-based expiration, and when a tuple is evicted due to timeout I fail() it in the topology so that it gets reprocessed later.
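For reference, the cache is built roughly like this (a Guava-style sketch; the timeout value is illustrative):

import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.tuple.Tuple;

public class PairCache {
    // Buffers the first tuple of each pair until its partner shows up; a tuple
    // that waits too long is failed so the spout replays it later.
    public static Cache<String, Tuple> create(OutputCollector collector) {
        RemovalListener<String, Tuple> onRemoval = notification -> {
            if (notification.wasEvicted()) {
                collector.fail(notification.getValue());   // replay the orphaned half of the pair
            }
        };
        return CacheBuilder.newBuilder()
                .expireAfterWrite(30, TimeUnit.SECONDS)    // illustrative timeout
                .removalListener(onRemoval)
                .build();
    }
}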
This seemed to work, but when I did some tests I got into a situation where the system would stop fetching messages from the Rabbit queue.
The prefetch on the queue is set to 5, and the spout has setMaxSpoutPending set to 7. In the Rabbit interface I see 5 unacked messages.
In the storm logs I see the same Tuples being evicted from the cache over and over again.
I understand that the problem is that the spout will only fetch 5 messages, and they can all be the first part of a pair. I can increase the prefetch, but that is no guarantee that this will not happen in production.
So my question is: How to implement a join while handling these problems in Storm?
Storm does not provide a good solution for this... What you would need is reliable storage that buffers the first tuple (i.e., a stateful operator). That way, you could ack the first tuple immediately and recover the state after a failure.
As far as I know, Trident supports some state handling. But I never used it.
As a second alternative, you could use a distributed key-value store (like Cassandra) as a buffer. Of course, this would be a hand-written solution, i.e., you would need to code all the Cassandra interactions yourself.
Last but not least, you could switch to a stream processing system that does support stateful operators, like Apache Flink. (Disclaimer: I am a committer at Flink.)
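For what it's worth, a rough sketch of such a stateful operator with Flink's keyed state (the event type and field names are invented; it assumes both halves of an event share a key, and you would use it as stream.keyBy(...).process(new PairJoin())):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Joins the two halves of an event: the first message per key is kept in
// fault-tolerant keyed state, and the pair is emitted when the second arrives.
public class PairJoin extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<String> firstHalf;

    @Override
    public void open(Configuration parameters) {
        firstHalf = getRuntimeContext().getState(
                new ValueStateDescriptor<>("first-half", String.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        String buffered = firstHalf.value();
        if (buffered == null) {
            firstHalf.update(value);             // buffer the first half, checkpointed by Flink
        } else {
            out.collect(buffered + "|" + value); // both halves arrived: emit the joined event
            firstHalf.clear();
        }
    }
}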