In ActiveMQ 5, each queue had its own folder containing all of its data and messages.
This meant that if a problem occurred, for example an out-of-disk-space error, some files could get corrupted before the server crashed. In that case, ActiveMQ 5 would log which files were corrupted, and we could delete just the corrupted queue's folder, resulting in a small loss of messages instead of ALL messages.
In Artemis, it seems that messages from all queues are stored in the same files, regardless of which queue they belong to. This means that if I get an out-of-disk-space error, I might have to delete all my messages.
First, can you confirm this change of behaviour, and secondly, is there a way to recover? As a bonus, if anyone knows why this change was made, I would like to understand it.
Artemis uses a completely new message journal implementation as compared to 5.x. The same journal is used for all messages. However, it isn't subject to the same corruption problems as you've seen with 5.x. If records from the journal can't be processed then they are simply skipped.
If you get an out-of-disk-space error you should never need to delete all your messages. The journal files themselves are allocated and filled with zeroes to meet their configured size before they are actually used, so if you were going to run out of disk space you'd do so during that process, before any messages were written to them.
The Artemis journal implementation was written from the ground up for high performance specifically in conjunction with the broker's non-blocking architecture.
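For reference, a rough sketch of the journal-related settings in broker.xml is below. The values are only examples, and the element names are as I understand them from the Artemis configuration reference; in particular, max-disk-usage lets the broker hold off producers before the disk actually fills up.

```xml
<!-- broker.xml sketch: values are examples only -->
<core xmlns="urn:activemq:core">
   <!-- each journal file is preallocated (zero-filled) to this size before use -->
   <journal-file-size>10M</journal-file-size>
   <!-- number of journal files created up front -->
   <journal-min-files>2</journal-min-files>
   <!-- stop accepting new messages once disk usage passes this percentage,
        so the journal never hits a hard out-of-disk-space error mid-write -->
   <max-disk-usage>90</max-disk-usage>
</core>
```

With a disk-usage cap like that, the failure mode becomes "producers are held off" rather than "the journal is half-written", which is what the answer above is getting at.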
When I read https://github.com/rabbitmq/internals/blob/master/variable_queue.md, I saw that the variable_queue keeps messages in four queue data structures, but I am confused about why it is designed this way. Can anyone give me a more intuitive explanation?
Thanks.
"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever run into a situation where your queue had something like 44 million messages waiting to be processed? The reason for this design is that those 44 million messages have to go somewhere, either to disk or to memory, and keeping them all in memory would be really expensive.
It seems the variable queue design is meant to keep messages queued while maintaining a buffer against the disk, so you are never left waiting for a message in any one of the other queues.
Essentially you have queues feeding queues, with messages being read back from disk in the background to save memory. Reading from and writing to disk is slow compared to memory, so this design adds some concurrency so you can keep receiving messages without stalling on disk I/O.
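To make that concrete, here is a loose sketch of the idea in Java. It is not the actual Erlang implementation (which also tracks message contents and positions separately across q1/q2/delta/q3/q4); the class name and threshold are made up, and "disk" is just another in-memory deque standing in for paged storage. It only illustrates the principle: keep the newest and oldest messages in memory, and page the middle out when the queue grows too large.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Loose illustration of a queue whose head and tail stay in memory while the middle is paged out. */
public class PagedQueueSketch {
    private static final int IN_MEMORY_LIMIT = 1_000; // arbitrary paging threshold

    private final Deque<String> newest = new ArrayDeque<>(); // roughly q1: recently published
    private final Deque<String> paged  = new ArrayDeque<>(); // roughly delta: would live on disk
    private final Deque<String> oldest = new ArrayDeque<>(); // roughly q3/q4: next to deliver

    public void publish(String msg) {
        newest.addLast(msg);
        // Under memory pressure, push the older part of the in-memory tail out to "disk".
        while (newest.size() > IN_MEMORY_LIMIT) {
            paged.addLast(newest.pollFirst());
        }
    }

    public String deliver() {
        if (oldest.isEmpty()) {
            // Page a batch back in from "disk" before falling through to the tail,
            // so consumers rarely wait on a disk read for a single message.
            for (int i = 0; i < IN_MEMORY_LIMIT && !paged.isEmpty(); i++) {
                oldest.addLast(paged.pollFirst());
            }
        }
        // FIFO order is preserved: oldest, then paged-out middle, then newest.
        return oldest.isEmpty() ? newest.pollFirst() : oldest.pollFirst();
    }
}
```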
We've recently come across a problem with RabbitMQ: when our server's hard drive is full, RabbitMQ's vhosts get corrupted and become unusable.
The only way to make RabbitMQ functional again is to delete and recreate the corrupted vhosts.
Doing so, all of our queues and exchanges, along with the data in them, are then gone.
While this situation should not happen in production, we're searching for a way to prevent data loss if such an event does occur.
We've looked through the official RabbitMQ documentation, as well as Stack Exchange, but haven't found any solution to prevent data loss when a vhost is corrupted.
We plan on setting up a cluster at a later stage of development, which should at least help reduce the loss of data when a vhost is corrupted, but that's not possible for now.
Is there any reliable way to either prevent vhost corruption, or to fix the vhost without losing data?
Some thoughts on this (in no particular order):
RabbitMQ has multiple high-availability configurations - relying upon a single node provides no protection against data loss.
In general, you can have one of two possible guarantees with a message, but never both:
At least once delivery - a message will be delivered at least one time, and possibly more.
At most once delivery - a message may or may not be delivered, but if it is delivered, it will never be delivered a second time. (The sketch at the end of this answer shows how these two guarantees map onto consumer acknowledgement modes.)
Monitoring the overall health of your nodes (i.e. disk space, processor use, memory, etc.) should be done proactively by a tool specific to that purpose. You should never be surprised by running out of a critical system resource.
If you are running one node, and that node is out of disk space, and you have a bunch of messages on it, and you're worried about data loss, wondering how RabbitMQ can help you, I would say you have your priorities mixed up.
RabbitMQ is not a database. It is not designed to reliably store messages for an indefinite time period. Please don't count on it as such.
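To make the two guarantees above concrete on the consuming side, here is a minimal sketch using the RabbitMQ Java client. The queue name "work" and the handle method are placeholders; the point is only that manual acknowledgement gives you at-least-once (the broker redelivers anything that was never acked), while auto-ack gives you at-most-once (anything lost after the broker hands it off is gone).

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class AckModesSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                              // assumes a local broker
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("work", true, false, false, null);   // durable queue named "work"

        // At-least-once: manual acknowledgement. If the consumer dies before basicAck,
        // the broker redelivers the message, so it may be processed more than once.
        channel.basicConsume("work", false,
            (consumerTag, delivery) -> {
                handle(delivery.getBody());                        // hypothetical processing step
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            },
            consumerTag -> { });

        // At-most-once: autoAck = true. The broker considers the message delivered as soon
        // as it is pushed to the consumer, so a crash mid-processing loses it for good.
        channel.basicConsume("work", true,
            (consumerTag, delivery) -> handle(delivery.getBody()),
            consumerTag -> { });
    }

    private static void handle(byte[] body) {
        System.out.println(new String(body));
    }
}
```

On the publishing side the same trade-off shows up as publisher confirms and persistent messages, but the consumer acknowledgement mode is the simplest place to see it.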
We had a problem the other day with MSMQ and I'm trying to understand what is going on.
We have around 10 services sending messages to each other, some via WCF and others using System.Messaging directly.
At some point, not a single message could be sent anymore, and all the logs filled up with:
"Insufficient resources to perform this operation"
The messages are smaller than 4 MB, and everything had worked for many months, so message size was not the problem.
Looking further into the msmq\storage folder, there were 1.07 gigabytes of message files, and 950 megabytes of them were files starting with 'j':
j0002f0e.mq, j0002f0f.mq, etc.
These files are journal files, and indeed one (WCF) service that sends thousands of messages every day had useSourceJournal enabled. All those files are 4 MB in size (the maximum), and they all contain multiple queued messages from the past.
Now, could this be the cause? Is there some 1 GB limit at which journal messages pile up and MSMQ starts failing with that generic "insufficient resources" error?
Should the journal queue be cleared every once in a while so that the storage folder is (almost) empty?
Journal messages are just like any other message. They take up space until your application does something with them. They aren't like temporary files that the system purges after a while. The idea is that if journaling is enabled (at the message or queue level) then the messages are important, as otherwise you wouldn't bother switching it on in the first place. Processing the journal messages should be part of your application (or at least part of a formal maintenance procedure). Journaling has a quota, just like regular messages do.
I am using LogStash to collect the logs from my service. The volume of data is so large (20GB/day) that I am afraid some of it will be dropped at peak times.
So I asked a question here and decided to add Redis as a buffer between ELB and LogStash to prevent data loss.
However, I am curious: when will LogStash exceed its queue capacity and drop messages?
I've done some experiments, and the results show that LogStash can process all the data without any loss, e.g., local file --> LogStash --> local file, or netcat --> LogStash --> local file.
Can someone give me a concrete example of when LogStash will eventually drop messages, so I can better understand why we need a buffer in front of it?
As far as I know, the Logstash queue is very small. Please refer to the documentation here:
Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase. This helps reduce any data loss and in general avoids logstash trying to act as a data storage system. These internal queues are not for storing messages long-term.
As you say, your daily log volume is 20 GB, which is quite a large amount, so it is recommended to install a Redis in front of Logstash. Another advantage of installing Redis is that if your Logstash process errors out and shuts down, Redis can buffer the logs for you; otherwise all those logs would be dropped.
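For what it's worth, the usual way to wire that up is a "shipper" Logstash that pushes raw events into a Redis list and a separate "indexer" Logstash that drains the list at its own pace, so spikes just make the list longer instead of dropping events. This is only a sketch: the hostnames, file path, and key are placeholders, and the final output (elasticsearch here) can be whatever you were already using.

```
# shipper.conf - runs next to the service, does no heavy processing
input {
  file { path => "/var/log/myservice/*.log" }
}
output {
  redis { host => "redis.example.com" data_type => "list" key => "logstash" }
}

# indexer.conf - a separate Logstash instance that reads from the Redis list
input {
  redis { host => "redis.example.com" data_type => "list" key => "logstash" }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```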
The maximum queue size is configurable, and the queue can be stored on disk or in memory (I strongly advise in-memory due to the high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, logstash will stop reading further when it can't keep up and can resume reading later. It keeps track of active log files and the last read position. The files basically act like an enormous buffer, so it's really unlikely to lose data (unless the files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the docs to see whether they support back pressure and whether they can replay missed messages if a network connection is lost.
Generally speaking, 20 GB a day is pretty low (even in 2014, when this was originally posted); we're talking about 1000 messages a second. Logstash really doesn't need a Redis in front.
For very large deployments (multiple TB per day), it's common to encounter Kafka somewhere in the chain to buffer messages. At this stage there are typically many clients with different types of messages, flowing over a variety of protocols.
What is the intended usage of ForwardReceivedMessagesTo?
I read somewhere that it is meant to support auditing. Is there any harm in using it as a way to ensure that messages have been processed, and to reprocess them if they have not? Let's say a message was sent to queue_A#server_A and also forwarded to q_All#server_All, and before the message was handled, machine_A died irrecoverably. In such a case, I could have a handler pick up messages from q_All#server_All and check against a database table whether each message has been processed. If not, it would reprocess (publish or send) the message or save it in a database table.
Also, what are the performance implications of using ForwardReceivedMessagesTo? How is it different from journaling?
Yes, I am trying to avoid MSMQ clustering.
The feature is there to support auditing. If your machine dies during processing, then messages will back up at the sending machine and will continue to flow once the machine recovers; this means you must size the disk on the sending machine appropriately. You could leverage auditing to accomplish what you describe, and the overhead would be minimal. The performance implication is the time it takes to complete the distributed transaction to the other machine where your audit queue lives, which should be very small.