I have a performance problem with ActiveMQ. I send small messages containing a string of 20 characters.
When I send 30,000 messages, ActiveMQ gets very slow. I have now increased the memory from 1 GB to 2 GB, and the performance is better.
My question is: why does ActiveMQ need so much memory for such small messages?
I would start debugging with a thread dump and a heap dump of the ActiveMQ process id. You can use jmap for heap dump analysis and Samurai for thread dump analysis.
When I read https://github.com/rabbitmq/internals/blob/master/variable_queue.md, it says the variable_queue keeps messages in four queue data structures, but I am always confused about why this design was chosen. Can anyone give me a more intuitive explanation?
Thanks.
"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever run into a situation where your queue had something like 44 million messages waiting to be processed? The reason for this design is that those 44 million messages have to go somewhere, either to disk or to memory, and keeping them all in memory would be really expensive.
The variable queue design is meant to keep messages in a queue while providing a buffer against the disk, so you are never left waiting on any one of the other queues.
Essentially you have queues feeding queues, with messages read back from disk as needed, to save on memory. Reading and writing to disk is slow compared to memory, so this design adds some concurrency so you can keep getting your messages.
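As a rough illustration only (this is not RabbitMQ's actual Erlang implementation, just the idea behind it): a queue can be split into an in-memory head and tail with a paged-out middle, so the memory footprint stays bounded no matter how long the backlog gets.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch: in-memory head/tail segments with a paged-out middle.
class PagedQueue<T> {
    private final int maxInMemory;                             // memory budget, in messages
    private final Deque<T> head = new ArrayDeque<>();          // oldest messages, delivered first
    private final Deque<Long> pagedRefs = new ArrayDeque<>();  // cheap placeholders for messages paged out
    private final Deque<T> tail = new ArrayDeque<>();          // newest messages, still in memory
    private final Map<Long, T> disk = new HashMap<>();         // stands in for the on-disk store
    private long nextRef = 0;

    PagedQueue(int maxInMemory) { this.maxInMemory = maxInMemory; }

    void publish(T msg) {
        tail.addLast(msg);
        if (head.size() + tail.size() > maxInMemory) {
            // Over budget: move the oldest in-memory tail message to "disk" and
            // keep only a reference in the middle segment.
            long ref = nextRef++;
            disk.put(ref, tail.pollFirst());
            pagedRefs.addLast(ref);
        }
    }

    T deliver() {
        if (head.isEmpty() && !pagedRefs.isEmpty()) {
            // Refill the head from disk; a real broker does this in batches so
            // consumers rarely block on IO.
            head.addLast(disk.remove(pagedRefs.pollFirst()));
        }
        if (!head.isEmpty()) return head.pollFirst();
        return tail.pollFirst();                               // queue short enough to live entirely in memory
    }

    public static void main(String[] args) {
        PagedQueue<String> q = new PagedQueue<>(2);
        for (int i = 1; i <= 5; i++) q.publish("msg-" + i);
        for (int i = 1; i <= 5; i++) System.out.println(q.deliver()); // prints msg-1 .. msg-5 in order
    }
}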
I have a datapipeline component that reads SQS messages (generated by an S3 upload trigger), parses them, and publishes messages for a batchpipeline component.
I have recently observed that in the production system my datapipeline keeps crashing with an OutOfMemory error under heavy load, but it never crashes when tested locally with similar loads. The batchpipeline never seems to crash in production.
How do I go about debugging this when I can't reproduce it locally?
As I found a solution to my problem above after 2 weeks, I figured I'd document it for others and my future self.
I wasn't able to replicate the issue because the aws command-
aws s3 cp --recursive dir s3://input-queue/dir
somehow wasn't uploading messages fast enough to stress my local datapipeline. So I stopped the datapipeline and, once there were 10k SQS messages in the queue, started it again; as expected, it crashed with an OutOfMemory error after processing ~3000 messages. It turned out that the pipeline could handle continuous throughput, but it broke when starting against a 10k message backlog.
My hypothesis was that the issue was happening because Java garbage collection was unable to properly clean up objects after execution. So I started analyzing the generated heap dump, and after some days of research I stumbled on the probable root cause of the OutOfMemory error: there were ~5000 instances of my MessageHandlerTask class, when ideally they should have been GC'd after being processed instead of piling up.
Further investigation along that line of thought led me to the root cause: the code was using Executors.newFixedThreadPool() to create the ExecutorService that tasks were submitted to. This implementation uses an unbounded task queue, so if too many tasks are submitted, all of them wait in the queue, taking up huge amounts of memory.
The reality was similar: messages were being polled faster than they could be processed, which caused a lot of valid MessageHandlerTask instances to be created that filled the heap whenever there was a message backlog.
The fix was to create a ThreadPoolExecutor with an ArrayBlockingQueue of capacity 100, so that there is a cap on the number of pending MessageHandlerTask instances (and their member variables).
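A minimal sketch of that bounded executor (the thread count comes from the tuning described below; the CallerRunsPolicy is my illustrative choice for handling a full queue, not necessarily what the original code does):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutorExample {
    public static void main(String[] args) {
        // Bounded work queue: at most 100 pending MessageHandlerTask submissions can
        // pile up, unlike the unbounded queue behind Executors.newFixedThreadPool().
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                40, 40,                        // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,     // keep-alive (unused when core == max)
                new ArrayBlockingQueue<>(100),
                // When the queue is full, the submitting (polling) thread runs the
                // task itself, which naturally throttles SQS polling.
                new ThreadPoolExecutor.CallerRunsPolicy());

        executor.submit(() -> System.out.println("handle one SQS message here"));
        executor.shutdown();
    }
}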
Having figured out the fix, I moved on to optimizing the pipeline for maximum throughput by varying the maximumPoolSize of the ThreadPoolExecutor. It turned out that some SQS connection exceptions occurred at higher thread counts, and further investigation revealed that increasing the SQS connection pool size alleviated the issue.
I ultimately settled on 40 threads for the given Xmx heap size of 1.5G and an SQS connection pool size of 80, so that the task threads do not run out of SQS connections while processing. This helped me achieve a throughput of 44 messages/s with just a single instance of the datapipeline.
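For completeness, the connection pool sizing looks roughly like this with the AWS SDK for Java v1 (the value 80 mirrors the pool size above; everything else is illustrative):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class SqsClientFactory {
    public static AmazonSQS create() {
        // Enough HTTP connections for 40 worker threads plus the polling loop.
        ClientConfiguration config = new ClientConfiguration().withMaxConnections(80);
        return AmazonSQSClientBuilder.standard()
                .withClientConfiguration(config)
                .build();
    }
}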
I also found out why the batchpipeline never crashed in production despite having a similar ExecutorService implementation: it turns out the datapipeline could be stressed by too many concurrent S3 uploads, but the messages for the batchpipeline were produced by the datapipeline in a gradual fashion. Besides, the batchpipeline had a much higher throughput, which I benchmarked at 347 messages/s with a maximumPoolSize of 70.
I have 3 ActiveMQ brokers, and one of the three is running into an issue that says the persistent store is full.
Sample error:
INFO | Usage(default:store:queue://foo.bar:store) percentUsage=99%, usage=537210471, limit=536870912, percentUsageMinDelta=1%;Parent:Usage(default:store) percentUsage=100%, usage=537210471, limit=536870912,percentUsageMinDelta=1%: Persistent store is Full, 100% of 536870912. Stopping producer (ID: AKUNTAMU-1-31754-1388571228628-1:1:1:1) to prevent flooding queue://foo.bar. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 155s)
I have configured my storeUsage limit as 100GB for persistent messages, but when I go and check the disk usage of the KahaDB directory it is more than 100GB (it is 190GB).
My understanding is that the kahadb folder contains both the persistent messages and the journal log files.
Questions:
1) Can we query KahaDB to see which queue is eating up space?
2) Inside the kahadb folder, how do we segregate the space occupied by messages from other database-related files, given that everything is in data*.log files?
3) For the other 2 brokers, the ActiveMQ web console shows the store limit used as 0%, which confuses me. How do I validate whether it is actually zero percent on the other two brokers?
Thanks in advance.
Whenever we configure ActiveMQ, we provide settings for how much disk space the broker should use.
These settings have 3 parameters:
Memory Usage
Store Usage
Temp Usage
Of these, tempUsage is the maximum space the broker can use for non-persistent messages, while storeUsage caps the persistent store. Note that the limit being hit in your error message is 536870912 bytes, i.e. 512 MB, so the broker is most probably running with that store limit rather than the 100 GB you intended to configure.
Have a look at this answer on how to find this value
https://stackoverflow.com/a/27549226/2551236
I haven't seen this limit being breached before. Is there no consumer on the queue, or only a slow one? In any case, if you want to increase the limit you can tweak your activemq.xml file as mentioned in the answer above.
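For reference, the same three limits that normally live in the systemUsage section of activemq.xml can also be set programmatically on an embedded broker; a sketch with illustrative values (ActiveMQ 5.x):

import org.apache.activemq.broker.BrokerService;

public class BrokerUsageLimits {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.getSystemUsage().getMemoryUsage().setLimit(1024L * 1024 * 1024);       // 1 GB of heap for in-flight messages
        broker.getSystemUsage().getStoreUsage().setLimit(100L * 1024 * 1024 * 1024);  // 100 GB for the persistent (KahaDB) store
        broker.getSystemUsage().getTempUsage().setLimit(50L * 1024 * 1024 * 1024);    // 50 GB for non-persistent messages spooled to disk
        broker.start();
        // ... broker.stop() when done
    }
}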
Hope this helps!
Good luck!
As advised on the webpage activemq-performance-module-users-manual, I've tried measuring (on an Intel i7 laptop with Windows 7 and an SSD drive) the performance of producing persistent messages to an ActiveMQ queue:
mvn activemq-perf:producer -Dproducer.destName=queue://TEST.FOO -Dproducer.deliveryMode=persistent
against the default installation of ActiveMQ 5.12.1.
The performance I got is around 300-400 messages per second.
On the page activemq-performance I have been reading much higher numbers:
When running the server on one box and a single producer and consumer thread in separate VMs on the other box, using a single topic we got around 21-22,000 messages/second using 1-2K messages.
On the other hand, when the messages are not persistent, the producer's throughput grows to 49000 messages per second:
-Dproducer.deliveryMode=nonpersistent
And when persistent messages are sent asynchronously:
-Dproducer.deliveryMode=persistent -Dfactory.useAsyncSend=true
I get around 23000 messages sent per second.
From what I see here, stackoverflow-activemq-persistent-performance-on-different-operating-systems, it makes a difference which OS ActiveMQ is running on.
Can somebody give me some tips for getting better performance when writing persistent ActiveMQ messages?
Performance of sending persistent messages is all about disk IO, as the message must be written to disk before the broker signals the client that the send has completed. The faster the disk, the better your throughput will be, all else being equal.
To work around some of this you can send persistent messages in transactional batches, so that each send returns without waiting for a disk sync and the synchronization point is reduced to the transaction boundary.
Depending on the size of the text messages, you can also gain some performance by using compression; this can be turned on via an option on the ActiveMQConnectionFactory.
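A sketch of both suggestions combined: persistent sends in transacted batches plus compression enabled on the connection factory (broker URL, queue name and batch size are illustrative):

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BatchedPersistentProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        factory.setUseCompression(true);          // compress message bodies

        Connection connection = factory.createConnection();
        connection.start();
        // Transacted session: the disk sync is paid once per commit instead of once per send.
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        Queue queue = session.createQueue("TEST.FOO");
        MessageProducer producer = session.createProducer(queue);
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);

        for (int i = 1; i <= 10000; i++) {
            producer.send(session.createTextMessage("message " + i));
            if (i % 100 == 0) {
                session.commit();                 // batch boundary: one synchronization point per 100 messages
            }
        }
        session.commit();                         // commit any final partial batch
        connection.close();
    }
}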
I am using Logstash to collect the logs from my service. The volume of data is so large (20GB/day) that I am afraid some of the data will be dropped at peak times.
So I asked a question here and decided to add Redis as a buffer between ELB and Logstash to prevent data loss.
However, I am curious about when Logstash will exceed its queue capacity and drop messages.
I've done some experiments and the results show that Logstash can process all the data without any loss, e.g., local file --> Logstash --> local file, netcat --> Logstash --> local file.
Can someone give me a concrete example of when Logstash would eventually drop messages, so I can better understand why we need a buffer in front of it?
As far as I know, the Logstash queue is very small. Please refer to here:
Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase.
This helps reduce any data loss and in general avoids logstash trying to act as a data storage
system. These internal queues are not for storing messages long-term.
As you say, your daily log volume is 20GB, which is quite a large amount. So it is recommended to install Redis in front of Logstash. The other advantage of installing Redis is that when your Logstash process hits an error and shuts down, Redis can buffer the logs for you; otherwise all of those logs will be dropped.
The maximum queue size is configurable and the queue can be stored on-disk or in-memory. (Strongly advise in-memory due to high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, Logstash will stop reading further when it can't keep up, and it can resume reading later. It keeps track of active log files and the last read position. The files basically act like an enormous buffer, so it's really unlikely to lose data (unless files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the docs to see whether they support back pressure and whether they can replay missed messages if a network connection is lost.
Generally speaking, 20 GB a day is pretty low (even in 2014, when this was originally posted); we're talking about roughly 1000 messages a second. Logstash really doesn't need a Redis in front of it.
For very large deployments (multiple TB per day), it's common to encounter Kafka somewhere in the chain to buffer messages. At that stage there are typically many clients with different types of messages, flowing over a variety of protocols.