ActiveMQ persistence store for large message size - activemq

If a producer sends a very large message (let's say 120 MB), how do KahaDB and LevelDB handle it?
KahaDB: As I understand it, the journal file size is 32 MB by default. If I send a message larger than 32 MB, how is it handled? Do I need to change this size to a value appropriate for the message size?
LevelDB: By default, 100 MB is the log size for storing message data, after which rolling happens. If a message is larger than 100 MB, how is it handled?
Thanks,
ANuj

For KahaDB there is a rolling log of messages: messages and commands are stored in data files of a fixed length, and when a data file reaches the configured journal length a new file is created.
KahaDB simply appends the new message to the current journal and takes care of rolling over to a new journal file when needed.
Also, KahaDB holds indexes to messages in the form of a B-tree. These B-tree indexes hold references to the messages in the data logs, keyed by message ID, so KahaDB knows exactly where your message is stored. Adding any new configuration just to store this message should not be required.
Whether the whole message ends up in a single data log file, I am not sure; that may need a bit of research.
Before trying your luck at changing journalMaxFileLength for KahaDB, please go through this link (reading the comments there might be helpful).
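If you do decide to raise it, here is a rough sketch of doing that programmatically on an embedded broker (the 256 MB value is only an example; in activemq.xml the same setting is the journalMaxFileLength attribute on the kahaDB element):

```java
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class LargeMessageBroker {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
        // Journal data files are 32 MB by default; raise the length only if you
        // actually hit problems with very large messages (256 MB is illustrative).
        kahaDB.setJournalMaxFileLength(256 * 1024 * 1024);

        broker.setPersistenceAdapter(kahaDB);
        broker.start();
        broker.waitUntilStopped();
    }
}
```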
Good luck!
Hope it helps.

Related

RabbitMQ :: Message is never removed from stream queue

I have created a stream queue in my project's RabbitMQ and configured max-age to 1 minute. I sent a message to the queue and all the consumers consumed it, but the message remains in the queue as "ready" (I waited more than 1 minute). My worry is about messages accumulating on the disk of the RabbitMQ instance.
So my question is: are all the messages marked as "ready" kept on disk even after every consumer has consumed them? If so, how can I purge these messages from the disk of the RabbitMQ instance (max-age is not doing the job here)?
That is the design; see https://www.rabbitmq.com/streams.html#retention
Streams are implemented as an immutable append-only disk log. This means that the log will grow indefinitely until the disk runs out. To avoid this undesirable scenario it is possible to set a retention configuration per stream which will discard the oldest data in the log based on total log data size and/or age.
There are two parameters that control the retention of a stream. These can be combined. These are either set at declaration time using a queue argument or as a policy which can be dynamically updated. ...
max-age:
valid units: Y, M, D, h, m, s
e.g. 7D for a week
max-length-bytes:
the max total size in bytes
NB: retention is evaluated on per segment basis so there is one more parameter that comes into effect and that is the segment size of the stream. The stream will always leave at least one segment in place as long as the segment contains at least one message. When using broker-provided offset-tracking, offsets for each consumer are persisted in the stream itself as non-message data.
But I see what you mean.
I suggest you ask on the rabbitmq-users Google group where the RabbitMQ engineers hang out; they don't monitor SO closely.
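For reference, here is roughly how those retention arguments are set when a stream is declared, using the RabbitMQ Java client (the queue name, host, and limit values below are just illustrative):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class DeclareStream {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumes a local broker

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            Map<String, Object> arguments = new HashMap<>();
            arguments.put("x-queue-type", "stream");               // declare a stream, not a classic queue
            arguments.put("x-max-age", "7D");                      // retain roughly a week of data
            arguments.put("x-max-length-bytes", 20_000_000_000L);  // cap total log size (illustrative)

            // Streams must be durable, non-exclusive and non-auto-delete.
            channel.queueDeclare("my-stream", true, false, false, arguments);
        }
    }
}
```

Per the documentation quoted above, the same limits can also be applied as a policy, which can be updated dynamically.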
Same problem here; the messages are never deleted.
The solution that I found:
It's not possible to avoid storing the data on disk or to purge it, but it is possible to prevent excessive disk usage.
Add the argument x-stream-max-segment-size-bytes to the queue, decreasing the default segment size to something that suits your needs. I set 1 MB, for example. More details: https://www.rabbitmq.com/streams.html#declaring
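Building on the declaration sketch in the earlier answer (same channel and arguments map; 1 MB is just the value I chose), the segment size is passed as another queue argument when the stream is declared:

```java
// Smaller segments let the retention process reclaim disk space sooner.
arguments.put("x-stream-max-segment-size-bytes", 1_000_000);
channel.queueDeclare("my-stream", true, false, false, arguments);
```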
At least one segment file will always remain, so if you just send 1 message and wait, it will remain on disk forever. However, if you keep publishing, a new segment file gets created at some point and the retention process kicks in. Files that only contain messages older than the retention period will be deleted.

Artemis vs ActiveMQ 5 message store

In ActiveMQ 5, each queue had a folder containing its data and messages, everything.
This meant that in case of an issue, for example an out-of-disk-space error, some files could get corrupted before the server crashed. In that case, in ActiveMQ 5, we would find logs indicating the corrupted files, and we could delete the corrupted queue's folder, resulting in a small loss of messages instead of losing ALL of them.
In Artemis, it seems that messages are stored in the same files regardless of which queue they belong to, which means that if I get an out-of-disk-space error, I might have to delete all my messages.
First, can you confirm this change of behaviour, and secondly, is there a way to recover? As a bonus, if anyone knows why this change happened, I would like to understand.
Artemis uses a completely new message journal implementation as compared to 5.x. The same journal is used for all messages. However, it isn't subject to the same corruption problems as you've seen with 5.x. If records from the journal can't be processed then they are simply skipped.
If you get an out of disk space error you should never need to delete all your messages. The journal files themselves are allocated and filled with zeroes to meet their configured size before they are actually used so if you were going to run out of disk space you'd do so during that process before any messages were written to them.
The Artemis journal implementation was written from the ground up for high performance specifically in conjunction with the broker's non-blocking architecture.

ActiveMQ broker storage usage

I have 3 ActiveMQ brokers, and one of the three is running into an issue which says the persistent store is full.
Sample error:
INFO | Usage(default:store:queue://foo.bar:store) percentUsage=99%, usage=537210471, limit=536870912, percentUsageMinDelta=1%;Parent:Usage(default:store) percentUsage=100%, usage=537210471, limit=536870912,percentUsageMinDelta=1%: Persistent store is Full, 100% of 536870912. Stopping producer (ID: AKUNTAMU-1-31754-1388571228628-1:1:1:1) to prevent flooding queue://foo.bar. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 155s)
I have configured my storeUsage limit as 100 GB for persistent messages, but when I check the disk usage of the KahaDB directory it is more than 100 GB (it is 190 GB).
My understanding is that the KahaDB folder contains both the persistent messages and the journal log files.
Question:
1) Can we query KahaDB to see which queue is eating up space?
2) Inside the KahaDB folder, how do we separate the space occupied by messages from the space used by other database-related files, since everything is in data*.log files?
3) For the other 2 brokers, the ActiveMQ web console shows the store limit used as 0%, which is confusing. How can I validate whether it is actually zero percent on those two brokers?
Thanks in advance.
Whenever we configure ActiveMQ, we provide settings for how much disk space the broker may use.
The settings have 3 parameters:
Memory Usage
Store Usage
Temp Usage
Of these, tempUsage is the maximum disk space the broker can use for non-persistent messages, while storeUsage limits the persistent store. Judging by the error above, the limit actually being hit is 536870912 bytes, i.e. 512 MB, rather than the 100 GB you intended.
Have a look at this answer on how to find this value
https://stackoverflow.com/a/27549226/2551236
I haven't often seen this limit being breached; is there no consumer on the queue, or a slow one? In any case, if you want to increase the limit you can tweak your activemq.xml file as mentioned in the answer above.
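If you ever configure the broker programmatically rather than through activemq.xml, the same three limits live on the broker's SystemUsage object; a rough sketch (all values are illustrative, size them to your actual disks):

```java
import org.apache.activemq.broker.BrokerService;

public class UsageLimits {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        // Heap available for messages held in memory.
        broker.getSystemUsage().getMemoryUsage().setLimit(1L * 1024 * 1024 * 1024);   // 1 GB
        // Disk available to the persistent (KahaDB) store.
        broker.getSystemUsage().getStoreUsage().setLimit(100L * 1024 * 1024 * 1024);  // 100 GB
        // Disk available for spooling non-persistent messages.
        broker.getSystemUsage().getTempUsage().setLimit(50L * 1024 * 1024 * 1024);    // 50 GB

        broker.start();
        broker.waitUntilStopped();
    }
}
```

The percentUsage figures in the web console are simply current usage divided by these limits, so a broker with a generous limit and little stored data can legitimately show 0%.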
Hope this helps!
Good luck!

Message queueing fails after a few months

We had a problem the other day with MSMQ and I'm trying to understand what is going on.
We have about 10 services sending messages to each other; some use WCF, others use System.Messaging directly.
At some point not a single message would be sent anymore and all logs would fill up with
"Insufficient resources to perform this operation"
The messages are smaller than 4 MB and it had worked for many months, so message size was not the problem.
Looking further, the msmq\storage folder contained 1.07 gigabytes of message files, and 950 megabytes of them were files starting with a 'j':
j0002f0e.mq, j0002f0f.mq, etc.
These are journal files, and indeed one (WCF) service sending thousands of messages every day had useSourceJournal enabled. All those files are 4 MB in size (the maximum) and each contains multiple queue messages from the past.
Now, could this be the cause? Is there some 1 GB limit at which journal messages pile up and MSMQ then starts failing with that general "insufficient resources" error?
Should the journal queue be cleared every once in a while so that the storage folder is (almost) empty?
Journal messages are just like any other message: they take up space until your application does something with them. They aren't like temporary files that the system purges after a while. The idea is that if journaling is enabled (at the message or queue level) then the messages are important, as otherwise you wouldn't bother switching it on in the first place. Processing the journal messages should be part of your application (or at least part of a formal maintenance procedure). Journaling has a quota, just like regular messages.

When will LogStash exceed the queue capacity and drop messages?

I am using Logstash to collect the logs from my service. The volume of data is so large (20 GB/day) that I am afraid some of it will be dropped at peak times.
So I asked a question here and decided to add Redis as a buffer between ELB and Logstash to prevent data loss.
However, I am curious: when will Logstash exceed its queue capacity and drop messages?
I've done some experiments and the results show that Logstash can process all the data without any loss, e.g. local file --> Logstash --> local file, or netcat --> Logstash --> local file.
Can someone give me a concrete example of when Logstash will eventually drop messages, so I can better understand why we need a buffer in front of it?
As far as I know, the Logstash queue is very small; please refer here:
Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase.
This helps reduce any data loss and in general avoids logstash trying to act as a data storage
system. These internal queues are not for storing messages long-term.
As you say, your daily log volume is 20 GB, which is quite a large amount, so it is recommended to put a Redis in front of Logstash. Another advantage of installing Redis is that if your Logstash process errors out and shuts down, Redis can buffer the logs for you; otherwise all your logs would be dropped.
The maximum queue size is configurable and the queue can be stored on-disk or in-memory. (Strongly advise in-memory due to high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, logstash will stop reading further when it can't keep up and can resume later; it keeps track of the active log files and the last read position. The files basically act like an enormous buffer, so it's really unlikely to lose data (unless the files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the docs to see whether they support back pressure and whether they can replay missed messages if a network connection is lost.
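If it helps to picture it, the internal queue behaves like a small bounded buffer: a file input can simply wait for space (back pressure), while a UDP input has no way to make the sender wait, so events get dropped. A toy Java illustration of that difference (not Logstash code, just the queueing behaviour):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Tiny bounded buffer, in the spirit of Logstash's small internal queues.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(20);

        // Fill the queue to capacity to simulate a backed-up pipeline.
        for (int i = 0; i < 20; i++) {
            queue.put("event " + i);
        }

        // File-style input: put() would block here until a consumer frees a slot,
        // so nothing is lost -- the reader just waits (back pressure). It is
        // commented out because with no consumer in this toy it would block forever.
        // queue.put("event from a log file");

        // UDP-style input: the sender cannot be made to wait, so when the queue
        // is full the event has to be thrown away.
        boolean accepted = queue.offer("event from a UDP datagram");
        if (!accepted) {
            System.out.println("queue full, event dropped");
        }
    }
}
```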
Generally speaking, 20 GB a day is pretty low (even in 2014, when this was originally posted); we're talking about roughly 1000 messages a second, and logstash really doesn't need a redis in front.
For very large deployments (multiple TB per day), it's common to encounter Kafka somewhere in the chain to buffer messages. At that scale there are typically many clients with different types of messages, flowing over a variety of protocols.