Number of consumers decreasing as time passes in TorqueBox - ruby-on-rails-3

I have a Rails app running on TorqueBox. I am using message processors for some background jobs and have allotted 4 workers to that processor.
queues:
  /queue/company:
messaging:
  /queue/company:
    CompanyWorker:
      concurrency: 4
The CompanyWorker makes calls to some other sites. Those calls may raise exceptions, but I catch them inside the worker itself. However, looking at my log I can see that the number of threads/workers goes down as time passes; after 10-15 hours only one thread/processor is still working. How do I stop this from happening and keep all 4 workers/processors alive?
After some hours I only see
22:29:40,945 INFO [stdout] (Thread-124 (HornetQ-client-global-threads-1460048766))
with only thread 124 doing its job, and after a few more hours I need to restart the server to get all 4 processors working again.

Make sure all your processors are completing, i.e. not hanging indefinitely. Also make sure you're catching Throwable inside the processor so that nothing thrown can escape and kill the consumer thread.

Related

RabbitMQ + kombu - A long callback blocks the heartbeat leading to aborting the connection

We have been trying to use RabbitMQ to transfer data from Project A to Project B.
We created a producer that takes the data from Project A and puts it in a queue, which was relatively easy. Then we created a k8s pod for Project B, which listens to the appropriate queue with kombu's ConsumerMixin.
Overall, the integration was reasonable and straightforward. But when we started to process long messages, we noticed that they were coming back into the queue repeatedly.
After some research we found out that whenever processing a message takes more than 20 seconds, the message shows up in the queue again, even though the processing was successful.
The source of this issue is RabbitMQ's heartbeat. We set the heartbeat to 10 seconds, and RabbitMQ checks the connection twice before it kills it. However, because the callback takes more than 20 seconds, and the .ack() (acknowledgement) of the message happens at the end of the callback (to ensure it was successful), the heartbeat is blocked while this message is being processed (as described here: https://github.com/celery/kombu/issues/621#issuecomment-251836611).
We tried a workaround with threading (processing the message on a different thread so it doesn't block the heartbeat), but it didn't work, and it felt like we were hacking around the problem rather than solving it.
So my question is whether there is a proper workaround for this situation, or what alternatives we have. RabbitMQ seemed like the right choice since we already use it in standalone projects with Celery, and it is also widely recommended.

Apache Camel RabbitMQ leaving behind threads in WAIT state

I have a set of Camel routes configured to read and write to RabbitMQ queues, more or less like this:
from("rabbitmq:$rabbitMQVhost?connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueue}&routingKey=${it.rabbitMQQueue}&SOME_MORE_PROPERTIES")
.log("Read message from queue ${it.rabbitMQQueue}")
.routeId(it.rabbitMQQueue)
.noAutoStartup()
.bean(it.rabbitMQBean)
.choice()
.`when`(PredicateBuilder.and(simple("$myCondition"), isNotNull(body())))
.split(body())
.toD("rabbitmq:$rabbitMQVhost?connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueueDestination}&autoDelete=false&routingKey=${it.rabbitMQQueueDestination}&bridgeEndpoint=true")
.endChoice()
.otherwise()
end()
Where SOME_MORE_PROPERTIES is basically autoDelete=false&autoAck=false and some message prefetch settings.
My ConnectionFactory is a org.springframework.amqp.rabbit.connection.CachingConnectionFactory.
Whenever a message comes in on my source queue, a thread is started to process it; however, after processing completes the thread hangs in WAIT state, never being released or terminated, so my application's memory saturates after a while and there's nothing the garbage collector can do about it.
After some time running, my application is basically in this state:
If I manually restart the routes, the threads are terminated and the memory released.
Is there something I'm doing wrong in my routes configuration that is preventing the threads from terminating properly?
I'd like to avoid having to write a quartz job to restart the routes every once in a while.
Edit: I also recently updated from Camel 2.24.0 to the latest RC for Camel 3, but the issue is still happening.
OK, so it turns out that the threads sitting in WAIT state were supposed to be there, as by default the thread pool size for Camel RabbitMQ consumers is 10 (and in fact, I had at most 10 * my number of routes threads).
Now, configuring the thread pool size is something that can be done, but it's not as easy as it seems, as there are several different ways to do so.
What fixed this for me is to set the threadPoolSize parameter in the rabbitMQ URI:
from("rabbitmq:$rabbitMQVhost?threadPoolSize=5&connectionFactory=#customConnectionFactory&queue=${it.rabbitMQQueue}
By doing so, the number of threads in WAIT state after several messages have been processed on all routes is, as expected, 5 * my number of routes. That is better in my case: I don't have any big concurrency requirement, but I do have a significant number of routes, and keeping 10 idle threads, with their memory footprint, for each route was draining my memory pretty fast.
Leaving this here, as there doesn't seem to be much documentation around this topic and I had to bang my head against it for days.
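To make the fix concrete, here is a minimal, self-contained sketch of a consumer route with the option applied (Kotlin, in the same style as the route above; the vhost, queue name, and connection factory bean name are placeholders, and the real processing bean is replaced by a log endpoint):

import org.apache.camel.builder.RouteBuilder

// Minimal sketch with assumed names: a single consumer route where
// threadPoolSize=5 caps the number of consumer threads kept per route.
class MyRabbitRoute : RouteBuilder() {
    override fun configure() {
        from("rabbitmq:myVhost?threadPoolSize=5" +
                "&connectionFactory=#customConnectionFactory" +
                "&queue=myQueue&routingKey=myQueue" +
                "&autoDelete=false&autoAck=false")
            .routeId("myQueue")
            .log("Read message from queue myQueue")
            .to("log:processed")  // stand-in for the real processing bean
    }
}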

Spring AMQP RabbitMQ does not consume all messages, workers finish prematurely

I am struggling to find the proper setting to delay the timeout for workers in RabbitMQ.
Since version 2.0 the default prefetchCount is 250, and exactly that number of messages is received and processed.
I would like to keep the workers busy until they clear the entire queue (let's say 10k messages).
I can change this number manually, e.g. by raising the default limit or assigning more threads, which multiplies the default number.
The result is always the same: once that number is reached, the workers stop their job and the application finishes its execution with
o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.
I would like them to finish when the queue is empty. Any ideas?
The logger.info("Successfully waited for workers to finish."); happens in only one place - doShutdown(). And that is called from shutdown(), which is called from destroy() or stop().
So I suspect that your application exits for some reason: you just don't block the main() thread to keep it running permanently.
Please share a simple project we can play with.
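To illustrate the point about blocking main(), here is a minimal standalone sketch (Kotlin; the broker host and queue name are placeholders) that keeps the JVM alive while the container keeps consuming:

import org.springframework.amqp.core.MessageListener
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer

fun main() {
    val connectionFactory = CachingConnectionFactory("localhost")  // assumed local broker
    val container = SimpleMessageListenerContainer(connectionFactory)
    container.setQueueNames("work-queue")                          // placeholder queue name
    container.setMessageListener(MessageListener { message ->
        // process the message here
        println(String(message.body))
    })
    container.afterPropertiesSet()
    container.start()

    // If main() returns here, the container is destroyed and you see
    // "Successfully waited for workers to finish." before the queue is empty.
    // Block the main thread (or run inside a long-lived Spring context) instead.
    Thread.currentThread().join()
}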

Timeouts - Request them all in the beginning or one by one?

In general, when designing a system in which multiple events happen in some well pre-defined logical order, are there any benefits to either requesting all necessary timeouts at the beginning of the process, or always requesting only the "next" timeout (in other words, the timeout for the next event)?
To clarify, I'm talking about a scenario when you want a number of things to happen sequentially.
Event A should happen 3 hours after initialization, Event B 10 hours after initialization, and Event C 48 hours after initialization of some process.
When the process is started, should it request a timeout only for Event A (which would then in turn request a timeout for Event B, and so on), or should it immediately request a timeout for all the Events?
In our case the process might be stopped at any point in time - thus if it's stopped 5 hours after initialization, then Event A should have already happened, and Events B and C should not happen at all.
A process might also, in special cases, be initiated midway through (i.e. "start the process 5 hours in", in which case Event B should happen 5 hours later), and the timelines of individual processes might be updated manually (i.e. "let's postpone Event B by 2.5 hours for this single process instance").
Any thoughts appreciated.
If I understood your scenario correctly, you can model this with a saga that is started by an initial message kicking off the process. On handling the initial message you would request the timeouts you expect, and in each timeout handler you would check whether the other events/operations have been handled and act based on the current state...
Does that make sense?
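If it helps to see the "request everything up front" idea outside of any particular messaging framework, here is a small Kotlin sketch (a plain ScheduledExecutorService stands in for saga timeouts; the event names and delays are the ones from the question, and all class and function names are made up for illustration):

import java.util.concurrent.Executors
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit

class ProcessTimeouts {
    private val scheduler = Executors.newSingleThreadScheduledExecutor()
    private val pending = mutableListOf<ScheduledFuture<*>>()

    // Request all timeouts as soon as the process starts. The offset covers the
    // "start the process 5 hours in" case: events whose time has already passed
    // are simply skipped.
    fun start(offsetHours: Long = 0) {
        schedule("Event A", 3 - offsetHours)
        schedule("Event B", 10 - offsetHours)
        schedule("Event C", 48 - offsetHours)
    }

    private fun schedule(name: String, delayHours: Long) {
        if (delayHours <= 0) return
        pending += scheduler.schedule(Runnable { println("$name fired") }, delayHours, TimeUnit.HOURS)
    }

    // Stopping the process cancels every event that has not fired yet, so
    // Events B and C never happen if the process is stopped after 5 hours.
    fun stop() {
        pending.forEach { it.cancel(false) }
        scheduler.shutdown()
    }
}

fun main() {
    val timeouts = ProcessTimeouts()
    timeouts.start()          // or timeouts.start(offsetHours = 5) to start midway
    // ... later, when the process is stopped early:
    timeouts.stop()
}

Requesting only the next timeout would instead mean that Event A's handler schedules Event B, and so on; both variants can support stopping early or postponing a single event, the difference is mainly where the scheduling state lives.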

NServiceBus over AzureStorage: Dispatching the message took longer than a visibility timeout

I have NServiceBus running over the AzureStorageQueues Transport.
Sometimes, my message handler shows the following two entries in its log:
2018-08-27 12:27:23.0329 INFO 5 Handling Message...
2018-08-27 12:27:23.7359 WARN 5 Dispatching the message took longer than a visibility timeout. The message will reappear in the queue and will be obtained again.
NServiceBus.AzureStorageQueues.LeaseTimeoutException: The pop receipt of the cloud queue message '2ebd6dd4-f4a1-40c6-a52e-499e22bc9f2f' is invalid as it exceeded the next visible time by '00:00:09.7359860'.
I understand that there is a visibility timeout that can be configured, and that it is 30 seconds by default. The message being handled does take longer than those 30 seconds to process.
But what doesn't make sense is the timing of those two log entries. The handler is kicked off at 23.0329 seconds, while the warning pops up at 23.7359 seconds - a mere 0.7 seconds later. Why is that? I would expect the warning from NServiceBus to pop up only after the 30-second invisibility timeout.
Assuming you're using the default settings, messages are retrieved in batches, and all messages in a batch get the same visibility timeout value of 30 seconds. There's also a processing concurrency limit (calculated as max(2, number of logical processors)), which can cause some messages from the batch to wait for previous messages to finish processing. Therefore it's possible that your message is retrieved as part of a batch but is not processed right away, so the visibility timeout expires. For example, if the message had been retrieved about 39 seconds before your handler got to it, its 30-second visibility timeout had already expired roughly 9 seconds earlier, which lines up with the "exceeded the next visible time by 00:00:09.73" in the warning even though the handler had only been running for 0.7 seconds.
Adjusting the configuration (the visibility timeout and/or the concurrency) to suit your specific scenario should get rid of those repeated processing attempts.