Active MQ performance

I have a performance problem with ActiveMQ. I send small messages, each containing a string of 20 characters.
When I send 30,000 messages, ActiveMQ gets very slow. I have now increased the memory from 1GB to 2GB, and the performance is better.
My question is: why does ActiveMQ need so much memory for such small messages?

I would start debugging by taking a thread dump and a heap dump of the ActiveMQ process. You can use jmap to capture the heap dump and Samurai to analyze the thread dump.

Related

How to understand rabbitmq Variable Queue

When I read https://github.com/rabbitmq/internals/blob/master/variable_queue.md, I see that the variable_queue keeps messages in four queue data structures, but I am confused about why it is designed this way. Can anyone give me a more intuitive explanation?
Thanks.

"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever run into a situation where your queue had something like 44 million messages waiting to be processed? The reason for this design is that those 44 million messages have to go somewhere, either to disk or to memory, and keeping them all in memory would be really expensive.
The variable queue design seems meant to keep messages queued while buffering from disk, so that you are never left waiting for a message in any of the other queues.
Essentially you have a chain of queues feeding one another, with messages paged in from disk to save memory. Disk reads and writes are slow compared to memory access, so this design seems to add some concurrency that lets you keep receiving messages.

OutOfMemory error in a queue consumer application

I have a datapipeline component that reads SQS messages, generated by an S3 upload trigger, then parses and publishes each message for a batchpipeline component.
I have recently observed that in the production system my datapipeline keeps crashing with an OutOfMemory error under heavy load, but it never crashes when tested locally with similar loads. The batchpipeline never seems to crash in production.
How do I go about debugging it when I can't reproduce it locally?

Since I found a solution to the problem above after 2 weeks, I figured I'd document it for others and my future self.
I wasn't able to replicate the issue because the aws command
aws s3 cp --recursive dir s3://input-queue/dir
somehow wasn't uploading fast enough to stress my local datapipeline. So I brought down the datapipeline, and once there were 10k SQS messages in the queue, I started it again. As expected, it crashed with an Out Of Memory error after processing ~3000 messages. It turned out that the pipeline could handle continuous throughput, but it broke when it started up with a 10k message backlog.
My hypothesis was that the issue was happening because Java garbage collection was unable to clean up objects after execution. So I started analyzing the generated heap dump, and after some days of research I stumbled on the possible root cause of the Out of Memory error: there were ~5000 instances of my MessageHandlerTask class, when ideally they should have been GC'd after being processed instead of piling up.
Further investigation along that line of thought led me to the root cause: the code was using Executors.newFixedThreadPool() to create an ExecutorService for submitting tasks to. This implementation uses an unbounded queue of tasks, so if too many tasks are submitted, all of them wait in the queue, taking up a huge amount of memory.
The reality was similar: messages were being polled faster than they could be processed. Whenever there was a message backlog, this created a large number of valid MessageHandlerTask instances that filled the heap.
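The unbounded queue described above can be observed directly. This is a minimal sketch, assuming a standard JDK where Executors.newFixedThreadPool() returns a ThreadPoolExecutor (the cast relies on that implementation detail). The class and method names are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class UnboundedQueueDemo {

    // Returns how many more tasks the work queue of a fresh fixed pool will accept.
    static int remainingQueueCapacity(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Assumption: in the standard JDK the returned object is a ThreadPoolExecutor
            // backed by a LinkedBlockingQueue created without a capacity bound.
            ThreadPoolExecutor tpe = (ThreadPoolExecutor) pool;
            return tpe.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A capacity of Integer.MAX_VALUE means the backlog is effectively unlimited,
        // so every queued task stays on the heap until a worker picks it up.
        System.out.println(remainingQueueCapacity(4) == Integer.MAX_VALUE);
    }
}
```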
The fix was to create a ThreadPoolExecutor with an ArrayBlockingQueue of capacity 100, so that there is a cap on the number of MessageHandlerTask instances and their member variables.
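A rough sketch of such a bounded pool follows. The capacity of 100 comes from the fix described above. The thread count of 4, the class and method names, and the CallerRunsPolicy rejection handler are assumptions for illustration, since the original code is not shown. Some rejection strategy is needed because a full ArrayBlockingQueue makes the executor reject new tasks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolDemo {

    // Builds a pool whose backlog can never exceed queueCapacity tasks.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Assumption: when the queue is full, run the task on the submitting
                // thread. That slows down the poller instead of dropping messages.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits taskCount trivial tasks (stand-ins for MessageHandlerTask)
    // and returns how many of them completed.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool(4, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // All tasks still run, but at most 100 of them wait in the queue at any time.
        System.out.println(runTasks(10_000));
    }
}
```

With this setup the number of live task objects is bounded by the queue capacity plus the number of threads, regardless of how large the SQS backlog is.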
Having figured out the fix, I moved on to optimize the pipeline for maximum throughput by varying the maximumPoolSize of the ThreadPoolExecutor. It turned out there were some SQS connection exceptions happening at higher thread counts. Further investigation revealed that increasing the SQS connection pool size ameliorated this issue.
I ultimately settled on 40 threads for the given Xmx heap size of 1.5G, with an SQS connection pool size of 80 so that the task threads do not run out of SQS connections while processing. This helped me achieve a throughput of 44 messages/s with just a single instance of datapipeline.
I also found out why the batchpipeline never crashed in production despite having a similar ExecutorService implementation: the datapipeline could be stressed by too many concurrent S3 uploads, but the messages for batchpipeline were produced by datapipeline gradually. Besides, the batchpipeline had a much higher throughput, which I benchmarked at 347 messages/s with a maximumPoolSize of 70.

ActiveMQ broker storage usage

I have 3 ActiveMQ brokers. One of the three is running into an issue where it reports that the persistent store is full.
Sample error:
INFO | Usage(default:store:queue://foo.bar:store) percentUsage=99%, usage=537210471, limit=536870912, percentUsageMinDelta=1%;Parent:Usage(default:store) percentUsage=100%, usage=537210471, limit=536870912,percentUsageMinDelta=1%: Persistent store is Full, 100% of 536870912. Stopping producer (ID: AKUNTAMU-1-31754-1388571228628-1:1:1:1) to prevent flooding queue://foo.bar. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 155s)
I have configured my storeUsage limit as 100GB for persistent messages, but when I check the disk usage of the kahadb folder it is more than 100GB (it is 190GB).
My understanding is that the kahadb folder contains both the persistent messages and the journal log files.
Questions:
1) Can we query kahadb to see which queue is eating up space?
2) Inside the kahadb folder, how do we separate the space occupied by messages from that of other database-related files? Everything is in data*.log files.
3) For the other 2 brokers, the ActiveMQ web console shows the store usage as 0%, which confuses me. How do I validate that it is actually 0% on those two brokers?
Thanks in advance.

Whenever we configure ActiveMQ, we specify how much memory and disk space the broker may use.
There are 3 settings:
Memory Usage
Store Usage
Temp Usage
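Of these, StoreUsage is the maximum disk space the broker can use to store persistent messages, and TempUsage is the maximum disk space it can use to spool non-persistent messages. The limit in your log (536870912 bytes) is 512MB, not 100GB, so the store limit actually in effect on this broker is most probably 512MB.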
Have a look at this answer on how to find this value
https://stackoverflow.com/a/27549226/2551236
I haven't seen this limit being breached. Is there no consumer on the queue, or a slow consumer? Either way, if you want to increase the limit, you can tweak your activemq.xml file as described in the above answer.
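The numbers in the log line from the question can be checked with a little arithmetic. This is just a sketch with made-up helper names. The only inputs are the usage and limit values printed in the log and the 100GB the question says was configured.

```java
class StoreUsageMath {

    // Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // Integer percentage of the limit that is in use, rounded down.
    static long percentUsed(long usage, long limit) {
        return usage * 100 / limit;
    }

    public static void main(String[] args) {
        long limit = 536870912L;  // "limit=" value from the log line
        long usage = 537210471L;  // "usage=" value from the log line
        long configured = 100L * 1024 * 1024 * 1024;  // 100GB, as stated in the question

        System.out.println(toMiB(limit));               // 512.0
        System.out.println(percentUsed(usage, limit));  // 100
        System.out.println(configured / limit);         // 200
    }
}
```

So the limit the broker is enforcing is 200 times smaller than the intended 100GB, which suggests the 100GB setting is not the one being applied to this store.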
Hope this helps!
Good luck!