I have a set of Camel routes configured to read and write to RabbitMQ queues, more or less like this:
from("rabbitmq:$rabbitMQVhost?connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueue}&routingKey=${it.rabbitMQQueue}&SOME_MORE_PROPERTIES")
.log("Read message from queue ${it.rabbitMQQueue}")
.routeId(it.rabbitMQQueue)
.noAutoStartup()
.bean(it.rabbitMQBean)
.choice()
.`when`(PredicateBuilder.and(simple("$myCondition"), isNotNull(body())))
.split(body())
.toD("rabbitmq:$rabbitMQVhost?connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueueDestination}&autoDelete=false&routingKey=${it.rabbitMQQueueDestination}&bridgeEndpoint=true")
.endChoice()
.otherwise()
end()
Where SOME_MORE_PROPERTIES is basically autoDelete=false&autoAck=false and some message prefetch settings.
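For reference, with those properties spelled out the consumer URI looks more or less like this (a sketch; the prefetch option names, prefetchEnabled and prefetchCount, are the camel-rabbitmq ones, and the values shown are illustrative):

from("rabbitmq:$rabbitMQVhost" +
        "?connectionFactory=#customConnectionFactory" +
        "&queue=${it.rabbitMQQueue}" +
        "&routingKey=${it.rabbitMQQueue}" +
        "&autoDelete=false" +     // don't delete the queue when consumers disconnect
        "&autoAck=false" +        // ack manually after processing
        "&prefetchEnabled=true" +
        "&prefetchCount=10")      // illustrative prefetch value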
My ConnectionFactory is a org.springframework.amqp.rabbit.connection.CachingConnectionFactory.
Whenever a message comes in on my source queue, a thread is started to process it; however, after processing completes the thread hangs in WAIT state and is never released or terminated, so my application's memory saturates after a while and there's nothing the garbage collector can do about it.
After some time running, my application has accumulated a large number of these waiting threads.
If I manually restart the routes, the threads are terminated and the memory released.
Is there something I'm doing wrong in my routes configuration that is preventing the threads from terminating properly?
I'd like to avoid having to write a quartz job to restart the routes every once in a while.
Edit: I also recently updated from Camel 2.24.0 to the latest RC for Camel 3, but the issue is still happening.
OK, so it turns out that the threads sitting in WAIT state were supposed to be there: by default, the thread pool size for Camel consumers is 10, and in fact I had at most 10 * (number of routes) threads.
Now, configuring the thread pool size is something that can be done, but it's not as easy as it seems, as there are several different ways to do so.
What fixed this for me is to set the threadPoolSize parameter in the rabbitMQ URI:
from("rabbitmq:$rabbitMQVhost?threadPoolSize=5&connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueue}
By doing so, the number of threads in WAIT state after several messages have been processed on all routes is, as expected, 5 * (number of routes). That's better in my case: I don't have any big concurrency requirements, but I do have a significant number of routes, and keeping 10 idle threads, with their memory footprint, for each route was draining my memory pretty fast.
Leaving this here as there doesn't seem to be much documentation around this topic and I had to bang my head on it for days.
Related
We have been trying to use RabbitMQ to transfer data from Project A to Project B.
We created a producer that takes the data from Project A and puts it in a queue, which was relatively easy. Then we created a k8s pod for Project B, which listens to the appropriate queue with kombu's ConsumerMixin.
Overall, the integration was reasonable and straightforward. But when we started to process long-running messages, we noticed that they kept coming back into the queue.
After some research, we found that whenever processing a message takes more than 20 seconds, the message shows up in the queue again, even though the processing was successful.
The source of this issue lies in RabbitMQ's heartbeat. We set the heartbeat to 10 seconds, and RabbitMQ checks the connection twice before killing it. However, because the callback takes more than 20 seconds to run, and the .ack() (acknowledge) of the message happens at the end of the callback (to ensure it was successful), the heartbeat is blocked by the processing of this message (as described here: https://github.com/celery/kombu/issues/621#issuecomment-251836611).
We have been trying to find a workaround with threading, processing the message on a different thread to avoid blocking the heartbeat, but it didn't work. It also feels like we were hacking around things rather than solving the problem.
So my question is: is there a proper workaround to handle this situation, or what alternatives do we have? RabbitMQ seemed like the right choice, since we already use it in standalone projects with Celery, and it is also widely recommended.
I am struggling to find the proper setting to delay the timeout for workers in RabbitMQ.
By default, prefetchCount is set to 250 since version 2.0, and exactly this number of messages is received and processed.
I would like to keep the workers busy until they clear an entire queue (let's say 10k messages).
I can manipulate this number manually, for example by changing the default limit or assigning more threads, which multiplies the default number.
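For example, something like this (a sketch in Kotlin; the connectionFactory bean and the numbers are illustrative, not a recommendation):

import org.springframework.amqp.rabbit.connection.ConnectionFactory
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer

fun tunedContainer(connectionFactory: ConnectionFactory): SimpleMessageListenerContainer {
    val container = SimpleMessageListenerContainer(connectionFactory)
    container.setPrefetchCount(1000)    // override the default of 250
    container.setConcurrentConsumers(4) // more worker threads, multiplying the prefetch
    return container
}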
The results are always the same: once that number is reached, the workers stop their job and the application finishes its execution with:
o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
I would like them to finish when the queue is empty. Any ideas?
The logger.info("Successfully waited for workers to finish."); happens in only one place - doShutdown(). And that one is called from shutdown(), which is called from destroy() or stop().
I suspect that you exit from your application for some reason: you just don't block the main() thread to keep it running permanently.
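A minimal sketch of what I mean (AmqpConfig is a hypothetical @Configuration class declaring the container; the point is only that main() must not return):

import org.springframework.context.annotation.AnnotationConfigApplicationContext

fun main() {
    val context = AnnotationConfigApplicationContext(AmqpConfig::class.java)
    // If main() simply returned here, the context could be destroyed and you
    // would see "Successfully waited for workers to finish." in the log.
    Thread.currentThread().join() // block the main thread so the workers keep consuming
}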
Please, share a simple project we can play with.
In the RabbitMQ console pane I had accumulated 8000 messages in one day, but I am confused that their state is idle while the Ready and Total counters are equal to 1. What state should a queue be in when the job is completed, idle? And in what format is x-expires specified? It seems to me that I have something wrong =(
While it's difficult to fully understand what you are asking, it seems that you simply don't have anything pulling messages off of the queue in question.
In general, RabbitMQ will hold on to a message in a queue until a listener pulls it off and successfully ACKs it, indicating that the message was successfully processed. You can configure queues to behave differently by setting a time-to-live (TTL) on messages or choosing different queue durabilities (e.g. destroyed when there are no more listeners), but the default is to play it safe.
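For example, a per-queue message TTL is set through the x-message-ttl argument at declaration time (a sketch using the RabbitMQ Java client from Kotlin; the queue name and TTL value are illustrative):

import com.rabbitmq.client.ConnectionFactory

fun declareQueueWithTtl() {
    val channel = ConnectionFactory().newConnection().createChannel()
    // Messages sitting in "work-queue" for longer than 60 seconds are expired
    val args = mapOf<String, Any>("x-message-ttl" to 60000)
    channel.queueDeclare("work-queue", true, false, false, args)
}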
What is "GridInterceptingMessageHandler"? I did a search and I can find no mention of this on nservicebus.com. Also, I see the samples have the line:
.LoadMessageHandlers(First<GridInterceptingMessageHandler>.Then<SagaMessageHandler>())
What does that do exactly?
If you look at the source and its documentation you'll see the following:
Intercepts all messages, not allowing any through if the endpoint has had its number of worker threads reduced to zero.
GridInterceptingMessageHandler
NSB allows you to dynamically tune the number of worker threads an endpoint is using to process messages. If the number of worker threads has been reduced to zero, the endpoint becomes disabled and will not continue to process messages. Tuning the threads is useful if you would like to increase the speed of message processing (assuming everything else will scale as well) without having to restart the endpoint.
This is especially helpful if you want to slowly drain the system of messages so that you can perform upgrades or other maintenance duties. By default this is wired up for you; you would only reference it if you decided to override how the message handlers are loaded (as in the example).
What is the optimal way to configure/code NServiceBus to delay retrying messages?
In its default configuration, retry happens almost immediately, up to the number of attempts defined in the configuration file. I'd ideally like to retry again after an hour, etc.
Also, how does HandleCurrentMessageLater() work? What does the Later aspect refer to?
NSB retries are there to remedy temporary problems like deadlocks etc. Longer retries are better handled by creating another process that monitors the error queue and puts messages back into the source queue at whatever interval you like. Take a look at the ReturnToSourceQueue.exe that comes with NSB for reference.
Edit: NServiceBus now supports this; we call it Second Level Retries, see http://docs.particular.net/ for more details.
Here is a blog post on why NServiceBus doesn't include a retry delay that I wrote after asking Udi this very same question in his distributed systems architecture course:
NServiceBus Retries: Why no back-off delay?
And here is a discussion thread covering some of the points involved in building an error queue monitor/retry endpoint:
http://tech.groups.yahoo.com/group/nservicebus/message/10964
As for HandleCurrentMessageLater(), all that does is put the current message back at the end of the queue. If there are no other messages waiting, it's going to be processed again immediately.
As of NServiceBus 3.2.1, it provides an out-of-the-box solution to handle back-off delays in the event of consecutive message failures. The pre-existing retry mechanism still retries failures without a delay, to handle cases like database deadlocks, quickly self-healing network issues, etc.
Once a message has been retried the configured number of times, it is moved to a "Second Level Retry" queue. This queue, as configured below, will retry after 10-, 20-, and 30-second delays, then the message will be moved to the configured error queue. You're free to change these values to something that better suits your environment.
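The configuration referred to above looks roughly like this in the endpoint's app.config (a sketch based on the Particular documentation linked below; verify the exact attribute names there):

<configSections>
  <section name="SecondLevelRetriesConfig"
           type="NServiceBus.Config.SecondLevelRetriesConfig, NServiceBus.Core"/>
</configSections>
<!-- TimeIncrease of 10 seconds with 3 retries gives the 10, 20 and 30 second delays -->
<SecondLevelRetriesConfig Enabled="true" TimeIncrease="00:00:10" NumberOfRetries="3"/>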
You can also check out this link:
http://docs.particular.net/nservicebus/second-level-retries