When I read https://github.com/rabbitmq/internals/blob/master/variable_queue.md, I see that the variable_queue keeps messages on four queue data structures, but I am confused about why it was designed this way. Can anyone give me a more intuitive explanation?
Thanks.
"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever run into a situation where your queue reached the 44-million-message range waiting to be processed? The reason for this design is that those 44 million messages have to go somewhere, either on disk or in memory, and keeping them all in memory would be really expensive.
The variable queue design is meant to keep messages moving through the queue while acting as a buffer in front of the disk, so you are never stuck waiting on a disk read in any one of the other queues.
Essentially you have queues feeding queues: the bulk of a large backlog sits on disk to save memory, while small in-memory queues at each end keep publishing and delivery fast. Reading from and writing to disk is slow compared to memory, so this design adds some concurrency, letting paging happen in the background while you keep getting your messages.
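To make that more concrete, here is a toy Python sketch of the general idea (this is not the actual Erlang implementation; the paging policy, batch sizes, and the way disk storage is modelled are simplified assumptions). The thing to notice is that the delivery end (q4/q3) and the publishing end (q1) stay in memory, while the bulk of a large backlog sits on disk behind the delta counter:

```python
from collections import deque

class VariableQueueSketch:
    """Toy model of RabbitMQ's variable_queue idea (NOT the real Erlang code).
    Logical order, oldest first:  q4 | q3 | delta | q2 | q1
      q1, q4 - messages held fully in memory (body + position)
      q2, q3 - messages whose body has been paged to disk; only a small
               index entry stays in memory
      delta  - messages that live entirely on disk (modelled as a counter)
    """

    def __init__(self, memory_limit=4):
        self.q1, self.q2, self.q3, self.q4 = deque(), deque(), deque(), deque()
        self.delta = 0
        self.memory_limit = memory_limit   # max message bodies kept in RAM
        self._disk = deque()               # stand-in for the on-disk store

    def publish(self, body):
        # If nothing is queued ahead of it and there is room in memory, the
        # message can go straight to the delivery end; otherwise it must
        # queue up behind the paged-out backlog to preserve ordering.
        if self.delta == 0 and not (self.q1 or self.q2 or self.q3) \
                and len(self.q4) < self.memory_limit:
            self.q4.append(body)
        else:
            self.q1.append(body)
        self._page_out_if_needed()

    def fetch(self):
        if not (self.q4 or self.q3):
            self._page_in()                # refill the delivery end from disk in batches
        if self.q4:
            return self.q4.popleft()
        if self.q3:
            self.q3.popleft()              # drop the in-memory index entry
            return self._disk.popleft()    # read the body back from "disk"
        if self.q1:                        # everything older has drained
            self.q4.extend(self.q1)
            self.q1.clear()
            return self.q4.popleft()
        return None

    def _page_out_if_needed(self):
        # Under memory pressure, push bodies from the newest end out to disk,
        # keeping only index entries (q1 -> q2); once even q2 grows too large,
        # drop the index entries as well (q2 -> delta).
        while len(self.q1) + len(self.q4) > self.memory_limit and self.q1:
            self._disk.append(self.q1.popleft())
            self.q2.append("index-entry")
        while len(self.q2) > self.memory_limit:
            self.q2.popleft()
            self.delta += 1

    def _page_in(self, batch=2):
        # Bring a batch of index entries back into memory (delta -> q3), then
        # let anything still waiting in q2 follow behind it.
        for _ in range(min(batch, self.delta)):
            self.q3.append("index-entry")
            self.delta -= 1
        if self.delta == 0:
            self.q3.extend(self.q2)
            self.q2.clear()

# Publish a backlog, then drain it: only a handful of bodies are ever held
# in memory at once, yet messages still come out in publish order.
vq = VariableQueueSketch(memory_limit=4)
for i in range(12):
    vq.publish(f"msg-{i}")
print([vq.fetch() for _ in range(12)])   # ['msg-0', ..., 'msg-11']
```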
Related
To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queue each user is currently being processed in.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the right tool for the job? For some reason, it feels like some sort of an anti-pattern. I appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on.
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
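For what it's worth, here is a minimal sketch of the first option above (one shared queue plus a distributed lock), assuming Redis for the lock and the pika client; the queue name, the user_id header, and the lock TTL are illustrative assumptions, not anything prescribed by RabbitMQ:

```python
import pika
import redis

LOCK_TTL_SECONDS = 300   # assumed upper bound on processing a single file

def process_file(body):
    ...   # placeholder for the actual import logic

r = redis.Redis()
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="file_queue_main", durable=True)

def handle(ch, method, properties, body):
    user_id = properties.headers["user_id"]   # assumes the publisher sets this header
    lock_key = f"import-lock:{user_id}"
    # SET NX EX acts as the distributed lock: only one consumer can hold it,
    # and it expires on its own if a consumer dies mid-file.
    if r.set(lock_key, "busy", nx=True, ex=LOCK_TTL_SECONDS):
        try:
            process_file(body)
        finally:
            r.delete(lock_key)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        # Someone is already working on this user: put the message back and move on.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="file_queue_main", on_message_callback=handle)
channel.start_consuming()
```

Note that a requeued message is typically redelivered quickly, so if one user's files arrive clustered together you may want a small delay (or a dead-letter exchange with a TTL) before redelivery, otherwise a consumer can spin on the same message.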
If the user has a unique ID, or the file being worked on has a unique ID, then hash that ID to pick the processing queue to publish to. That way the same user / file tasks will always be queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
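As a rough sketch of that idea (the queue names and the count of 10 are carried over from the question, and the helper name is made up), the routing decision becomes a single stateless function, so nothing has to remember which user is being processed where:

```python
import hashlib

NUM_PROCESSING_QUEUES = 10   # matches the 10 consumers from the question

def processing_queue_for(user_id: str) -> str:
    """Map a user ID to one of the file_processing_n queues.
    Uses a stable hash (not Python's built-in hash(), which is randomized
    per process) so the same user always lands on the same queue."""
    digest = hashlib.sha1(user_id.encode("utf-8")).hexdigest()
    index = int(digest, 16) % NUM_PROCESSING_QUEUES
    return f"file_processing_{index}"

# Every file from user "42" maps to the same queue, so that user's files are
# processed sequentially by whichever consumer owns that queue.
print(processing_queue_for("42"))   # always the same queue name for this user
```

The trade-off is the one already mentioned above: if a few users dominate the traffic, their queues back up while the others sit idle.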
In ActiveMQ 5, each queue had its own folder containing its data and messages, everything.
This meant that, in case of an issue, for example an out-of-disk-space error, some files would get corrupted before the server crashed. In that case, in ActiveMQ 5, we would find logs indicating the corrupted files, and we could delete the folder of the corrupted queue, resulting in a small loss of messages instead of ALL messages.
In Artemis, it seems that messages are stored in the same files regardless of which queue they belong to, which means that if I get an out-of-disk-space error, I might have to delete all my messages.
First, can you confirm this change of behaviour, and second, is there a way to recover? As a bonus, if anyone knows why this change was made, I would like to understand it.
Artemis uses a completely new message journal implementation as compared to 5.x. The same journal is used for all messages. However, it isn't subject to the same corruption problems as you've seen with 5.x. If records from the journal can't be processed then they are simply skipped.
If you get an out of disk space error you should never need to delete all your messages. The journal files themselves are allocated and filled with zeroes to meet their configured size before they are actually used, so if you were going to run out of disk space, you'd do so during that process, before any messages were written to them.
The Artemis journal implementation was written from the ground up for high performance specifically in conjunction with the broker's non-blocking architecture.
This article describes the control queues the NServiceBus master node uses to control message load, though it's still not clear to me how to interpret disproportions in the number of messages in these queues: https://docs.particular.net/nservicebus/msmq/distributor/
I'm observing slowness in my NServiceBus service, which has never experienced slowness before. For some reason, fewer parallel threads are created per master node than in the past, and there has been no change in the worker or master node configuration, such as the maximum number of threads to allocate. I'm trying to figure out whether it's the master node that does not want to feed the workers, or the workers that do not want to take on more work.
I see that the number of messages in the control queue jumps from 15 to 40, while the storage queue has only 5-8. Should I interpret that as the workers being ready to work while the distributor can't send them more messages? Thanks
The numbers in the control and storage queues will jump up and down as long as the distributor is handing out messages. A message coming into the control queue will immediately be popped off that queue and onto the storage queue. A message coming into the primary queue of the distributor will immediately result in the first message of the storage queue being popped off.
It's hard to interpret the numbers of messages in the queues of a running distributor, because, by the time you look at the numbers with Computer Management or Queue Explorer, they will have changed.
The extreme cases are these:
1. No messages in the primary input queue of the distributor and no work happening on any of the workers.
Input queue: 0
Control queue: 0
Storage queue: number of workers*configured threads per worker
2. All workers are working at full capacity. None able to take on more work.
Input queue: 0+ (grows as new messages come in)
Control queue: 0
Storage queue: 0
In a running system, it can be anything between these two extremes, so, unfortunately, it's hard to say much from just a snapshot of the control and storage queue.
Some troubleshooting tips:
If the storage queue is empty, the distributor can not hand out more work. It does not know where to send it. This happens if all the workers are fully occupied as they will not be sending any ready-messages back to the control queue until they finish up handling a message.
If the storage queue is consistently small compared to the total number of worker threads across all the workers, you are approaching the total maximum capacity of your workers.
I suggest you start looking at the logs of the workers and see if the work they are doing is taking longer than usual. Slower database/third party integration?
Another thing to check is if there has been anything IO-heavy added to the machine hosting the distributor. If the distributor was already running at close to max capacity, adding extra IO might slow down MSMQ on the box, giving you worse throughput.
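If it helps to see those mechanics in one place, below is a toy model of the distributor's three queues (plain Python, not NServiceBus code, and the numbers are invented). It reproduces the two extreme cases above and shows why the depths of the control and storage queues bounce around on a busy system:

```python
from collections import deque

class ToyDistributor:
    """Toy model of the MSMQ distributor mechanics described above."""

    def __init__(self):
        self.input_queue = deque()     # work arriving from upstream
        self.control_queue = deque()   # "I'm ready" messages from worker threads
        self.storage_queue = deque()   # ready-messages waiting to be matched with work

    def worker_reports_ready(self, worker_id):
        # A worker thread that finished (or just started up) sends a
        # ready-message to the control queue...
        self.control_queue.append(worker_id)
        # ...which the distributor immediately pops onto the storage queue.
        self.storage_queue.append(self.control_queue.popleft())

    def work_arrives(self, message):
        self.input_queue.append(message)
        self.dispatch()

    def dispatch(self):
        # A message on the input queue is handed to the worker whose
        # ready-message sits at the front of the storage queue. If the storage
        # queue is empty, every worker thread is busy and the work waits.
        while self.input_queue and self.storage_queue:
            message = self.input_queue.popleft()
            worker_id = self.storage_queue.popleft()
            print(f"sending {message!r} to worker {worker_id}")

# Idle system: 2 workers x 2 threads report ready, no work yet.
d = ToyDistributor()
for worker in ("A", "A", "B", "B"):
    d.worker_reports_ready(worker)
print(len(d.input_queue), len(d.control_queue), len(d.storage_queue))  # 0 0 4

# Saturated system: more work arrives than there are ready threads.
for i in range(6):
    d.work_arrives(f"msg-{i}")
print(len(d.input_queue), len(d.control_queue), len(d.storage_queue))  # 2 0 0
```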
We had a problem the other day with MSMQ and I'm trying to understand what is going on.
We have around 10 services sending messages to each other, some with WCF, others using System.Messaging directly.
At some point not a single message would be sent anymore and all logs would fill up with
"Insufficient resources to perform this operation"
The messages are smaller than 4MB and it has worked for many months so the message size was not the problem.
Looking further, in the msmq\storage folder there were 1.07 gigabytes of message files, and 950 megabytes of them were files starting with a 'j':
j0002f0e.mq, j0002f0f.mq, etc.
These files hold journal messages, and indeed one (WCF) service sending thousands of messages every day had useSourceJournal enabled. All those files are 4MB in size (the maximum), and they all contain multiple queue messages from the past.
Now, could this be the cause? Is there some 1GB limit at which journal messages pile up and MSMQ starts failing with that general 'insufficient resources' error?
Should the journal queue be cleared every once in a while so that the storage folder is (almost) empty?
Journal messages are just like any other message. They take up space until your application does something with them. They aren't like temporary files that the system purges after a while. The idea is that if journaling is enabled (at the message or queue level) then the messages are important, as otherwise you wouldn't bother switching it on in the first place. Processing the journal messages should be part of your application (or at least part of a formal maintenance procedure). Journaling has a quota, just like regular messages.
I am using LogStash to collect the logs from my service. The volume of the data is so large (20GB/day) that I am afraid that some of the data will be dropped at peak time.
So I asked a question here and decided to add Redis as a buffer between ELB and LogStash to prevent data loss.
However, I am curious about when LogStash will exceed its queue capacity and start dropping messages.
Because I've done some experiments and the result shows that LogStash can completely process all the data without any loss, e.g., local file --> LogStash --> local file, netcat --> LogStash --> local file.
Can someone give me a solid example of when LogStash will eventually drop messages? That would give me a better understanding of why we need a buffer in front of it.
As far as I know, the Logstash queue is very small; please refer to the documentation here:
"Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase. This helps reduce any data loss and in general avoids Logstash trying to act as a data storage system. These internal queues are not for storing messages long-term."
As you say, your daily log volume is 20GB, which is quite a large amount, so it is recommended to install a Redis in front of Logstash. The other advantage of installing Redis is that when your Logstash process hits an error and shuts down, Redis can buffer the logs for you; otherwise all of those logs would be dropped.
The maximum queue size is configurable, and the queue can be stored on disk or in memory (I strongly advise in-memory due to the high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, logstash will stop reading further when it can't keep up, and it can resume reading later; it keeps track of the active log files and the last read position. The files basically act like an enormous buffer, so it's really unlikely to lose data (unless the files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the docs: whether they support back pressure, and whether they can replay missed messages if a network connection was lost.
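As a toy illustration of that difference (plain Python, not Logstash code), think of the internal queue as a small bounded buffer: an input that reads from a file can simply stop and remember its position when the buffer is full, while an input receiving UDP datagrams has nowhere to put an event once the buffer is full, so the event is lost:

```python
import queue

internal_queue = queue.Queue(maxsize=20)   # mirrors the "20 pending events" limit

def file_like_input(lines, offset):
    """A file input can exert back pressure: if the queue is full it just
    stops reading and remembers its position, resuming later with no loss."""
    for i in range(offset, len(lines)):
        try:
            internal_queue.put(lines[i], timeout=0.1)
        except queue.Full:
            return i          # remember where we stopped; nothing is lost
    return len(lines)

def udp_like_input(datagram, dropped):
    """A network input cannot pause the sender: if the queue is full, the
    datagram has already arrived and there is nowhere to keep it."""
    try:
        internal_queue.put_nowait(datagram)
    except queue.Full:
        dropped.append(datagram)   # this event is gone for good

# With a stalled pipeline (nothing consuming the queue), the file input stops
# at line 20 and can resume later, while further UDP events are dropped.
lines = [f"line-{i}" for i in range(30)]
resume_at = file_like_input(lines, 0)
print("file input paused at line", resume_at)    # 20

dropped = []
for i in range(5):                               # the queue is already full
    udp_like_input(f"datagram-{i}", dropped)
print("udp events dropped:", len(dropped))       # 5
```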
Generally speaking, 20GB a day is pretty low (even in 2014, when this was originally posted); we're talking about 1000 messages a second. Logstash really doesn't need a Redis in front of it.
For very large deployments (multiple TB per day), it's common to encounter Kafka somewhere in the chain to buffer messages. At that stage there are typically many clients with different types of messages, flowing over a variety of protocols.