Do you risk triggering race conditions when sending large messages in Erlang?

In Erlang, if two processes A and B send messages to a process C simultaneously, will there be a race condition?
C ! {very large message} sent by A
C ! {very large message} sent by B
Will C receive the complete message from A and then proceed to the message from B, or is C likely to receive chunks of A's message interleaved with chunks of B's message?

Message receiving is an atomic operation.
If you are interested in how it is done, read the source code of the VM. Simplified, the sending process performs these steps:
Allocate memory for the message (a heap fragment that will be handed over to the target process).
Copy the message to that memory space
Take the external lock on the target process
Link the message into the mailbox linked list
Release the external lock on the target process
As you can see, the copying is done outside (before) the critical section, and the critical section itself is very fast: it is just juggling a few pointers. Each message is therefore linked into C's mailbox as a whole; two senders can interleave messages, but never chunks of one message.
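Here is a minimal Java sketch of that pattern (the real BEAM code is C; this is only an illustration of why receiving is atomic): the expensive copy happens before the lock is taken, so the critical section does nothing but link one complete message into the mailbox.

import java.util.ArrayDeque;
import java.util.Queue;

public class Mailbox {
    private final Queue<byte[]> messages = new ArrayDeque<>();
    private final Object externalLock = new Object();

    // Called on the sender's side.
    public void send(byte[] message) {
        // Allocate and copy outside the critical section.
        byte[] copy = message.clone();
        // Short critical section: just link the copy into the list.
        synchronized (externalLock) {
            messages.add(copy);
        }
    }

    // Called by the receiving process. Messages come out whole, so two
    // concurrent senders can interleave messages, but never their contents.
    public byte[] receive() {
        synchronized (externalLock) {
            return messages.poll(); // null if the mailbox is empty
        }
    }
}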

Related

RabbitMQ :: Message is never removed from stream queue

I have created a stream queue in my project's RabbitMQ and configured max-age to 1 minute. I sent a message to the queue and all the consumers consumed it, but the message remains in the queue as "ready" (I waited more than 1 minute). My worry is about the accumulation of messages on the disk of the RabbitMQ instance.
So my question is: are all the messages marked as "ready" kept on disk even after every consumer has consumed them? If so, how can I purge these messages from the disk of the RabbitMQ instance (max-age is not working for this)?
That is the design; see https://www.rabbitmq.com/streams.html#retention
Streams are implemented as an immutable append-only disk log. This means that the log will grow indefinitely until the disk runs out. To avoid this undesirable scenario it is possible to set a retention configuration per stream which will discard the oldest data in the log based on total log data size and/or age.
There are two parameters that control the retention of a stream. These can be combined. These are either set at declaration time using a queue argument or as a policy which can be dynamically updated. ...
max-age:
valid units: Y, M, D, h, m, s
e.g. 7D for a week
max-length-bytes:
the max total size in bytes
NB: retention is evaluated on per segment basis so there is one more parameter that comes into effect and that is the segment size of the stream. The stream will always leave at least one segment in place as long as the segment contains at least one message. When using broker-provided offset-tracking, offsets for each consumer are persisted in the stream itself as non-message data.
But I see what you mean.
I suggest you ask on the rabbitmq-users Google group where the RabbitMQ engineers hang out; they don't monitor SO closely.
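For reference, this is roughly how those retention arguments are set at declaration time with the RabbitMQ Java client; the queue name and the concrete values here are only examples:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.HashMap;
import java.util.Map;

public class DeclareStream {
    public static void main(String[] argv) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();

        // A stream is declared as a durable queue with x-queue-type=stream;
        // the retention arguments are the ones from the docs quoted above.
        Map<String, Object> args = new HashMap<>();
        args.put("x-queue-type", "stream");
        args.put("x-max-age", "7D");                    // discard data older than a week
        args.put("x-max-length-bytes", 2_000_000_000L); // and/or cap the total log size

        channel.queueDeclare("my-stream", true, false, false, args);
        conn.close();
    }
}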
Same problem here; the messages are never deleted.
The solution that I found:
It's not possible to avoid storing data on disk or to purge it manually, but it is possible to prevent excessive disk usage.
Add the argument x-stream-max-segment-size-bytes to the queue, decreasing the default segment size to something that suits your needs. I set 1 MB, for example. More details: https://www.rabbitmq.com/streams.html#declaring
At least one segment file will always remain, so if you just send 1 message and wait, it will remain on disk forever. However, if you keep publishing, a new segment file gets created at some point and the retention process kicks in. Files that only contain messages older than the retention period will be deleted.
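With the Java-client declaration sketched in the previous answer, that is one extra queue argument (the 1 MB figure is just this answer's example):

// Shrink the segment size so retention kicks in sooner; at most one
// (mostly empty) segment file then lingers on disk.
args.put("x-stream-max-segment-size-bytes", 1_000_000);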

How to understand rabbitmq Variable Queue

When I read https://github.com/rabbitmq/internals/blob/master/variable_queue.md, I learned that the variable_queue keeps messages in four queue data structures, but I am confused about why it was designed this way. Can anyone give me a more intuitive explanation?
Thanks.
"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever ran into a time where your queue ran into the 44 million messages range waiting to be processed? The reason for this design is those 44 million message have to go somewhere either the disk or memory, and going into memory would be really expansive.
Seems like the design for a variable queue is meant to keep messages in a queue while creating a buffer from the disk so you are never waiting for a message in any one of the other queues.
Essentially you have a queue of a queue of a queue that feeds queues messages being read from the disk to save on memory. Reading and writing to the disk is slow compared to writing/reading from memory, thus having this design seems to add some concurrency so you can keep getting your messages.
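A toy Java sketch of the underlying idea (RabbitMQ itself is Erlang, and the real variable_queue uses four queues, q1 through q4, plus message-position state; this collapses them into head/middle/tail just to show the paging principle):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PagedQueue<T> {
    private static final int MEMORY_LIMIT = 1000; // hypothetical threshold

    private final Deque<T> head = new ArrayDeque<>();  // oldest messages, next to deliver
    private final DiskLog<T> middle = new DiskLog<>(); // paged-out middle section
    private final Deque<T> tail = new ArrayDeque<>();  // newest messages, just published

    public void publish(T msg) {
        tail.addLast(msg);
        // When memory fills up, page the oldest part of the tail to disk;
        // the head stays in memory so delivery never blocks on a disk read.
        if (head.size() + tail.size() > MEMORY_LIMIT) {
            while (tail.size() > MEMORY_LIMIT / 2) {
                middle.append(tail.pollFirst());
            }
        }
    }

    public T deliver() {
        if (head.isEmpty()) {
            // Refill from the paged-out middle first, to preserve ordering.
            middle.readBatch(MEMORY_LIMIT / 2).forEach(head::addLast);
        }
        if (head.isEmpty()) {
            // Disk section is empty: pull straight from the in-memory tail.
            while (!tail.isEmpty() && head.size() < MEMORY_LIMIT / 2) {
                head.addLast(tail.pollFirst());
            }
        }
        return head.pollFirst(); // null when the queue is empty
    }

    // Stub standing in for the on-disk segment store.
    static class DiskLog<T> {
        private final Deque<T> data = new ArrayDeque<>(); // pretend this lives on disk
        void append(T msg) { data.addLast(msg); }
        List<T> readBatch(int n) {
            List<T> out = new ArrayList<>();
            while (!data.isEmpty() && out.size() < n) out.add(data.pollFirst());
            return out;
        }
    }
}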

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides to which file_processing queue to distribute that message. Basically keeps track of in which file_processing queues the current user is being processed.
Here is a little animation of my current solution and expected behaviour (animation not reproduced here):
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (sketched after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
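A rough Java-client sketch of the first option; the in-process Set stands in for a real distributed lock (Redis, ZooKeeper, a database row lock, ...), and the user-id header is an assumption about how the user is identified:

import com.rabbitmq.client.*;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LockingConsumer extends DefaultConsumer {
    // Stand-in for a distributed lock; only works within a single JVM.
    private static final Set<String> busyUsers = ConcurrentHashMap.newKeySet();

    public LockingConsumer(Channel channel) { super(channel); }

    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties props, byte[] body) throws IOException {
        String userId = String.valueOf(props.getHeaders().get("user-id")); // assumed header
        if (!busyUsers.add(userId)) {
            // Someone is already working on this user: requeue and move on.
            getChannel().basicNack(envelope.getDeliveryTag(), false, true);
            return;
        }
        try {
            processFile(body); // hypothetical processing step
            getChannel().basicAck(envelope.getDeliveryTag(), false);
        } finally {
            busyUsers.remove(userId);
        }
    }

    private void processFile(byte[] body) { /* heavy work goes here */ }
}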
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to determine which processing queue to use. That way the same user's (or file's) tasks will always be queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
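A minimal sketch of that hashing idea, reusing the file_processing_n queue names from the question (the ID-to-queue mapping is the only logic that matters here):

import com.rabbitmq.client.Channel;
import java.io.IOException;

public class UserRouter {
    private static final int NUM_QUEUES = 10; // one dedicated consumer per queue

    // Stable hash: the same user always maps to the same queue, so their
    // files are processed sequentially by that queue's single consumer.
    static String queueFor(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), NUM_QUEUES);
        return "file_processing_" + bucket;
    }

    static void publish(Channel channel, String userId, byte[] body) throws IOException {
        channel.basicPublish("", queueFor(userId), null, body);
    }
}

RabbitMQ's consistent-hash exchange plugin can do similar routing broker-side, but the earlier answer's caveat still applies: uneven traffic can back some consumers up while others sit idle.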

Artemis vs Activemq 5 message store

In ActiveMQ 5, each queue had a folder containing its data and messages, everything.
This meant that in case of an issue, for example an out-of-disk-space error, some files would get corrupted before the server crashed. In that case, in ActiveMQ 5, we would find logs indicating the corrupted files, and we could delete the corrupted queue's folder, resulting in a small loss of messages instead of ALL messages.
In Artemis, it seems that messages are stored in the same files, independently of which queue they belong to. This means that if I get an out-of-disk-space error, I might have to delete all my messages.
First, can you confirm this change of behaviour? Second, is there a way to recover? And as a bonus, if anyone knows why this change was made, I would like to understand it.
Artemis uses a completely new message journal implementation as compared to 5.x. The same journal is used for all messages. However, it isn't subject to the same corruption problems as you've seen with 5.x. If records from the journal can't be processed then they are simply skipped.
If you get an out of disk space error you should never need to delete all your messages. The journal files themselves are allocated and filled with zeroes to meet their configured size before they are actually used so if you were going to run out of disk space you'd do so during that process before any messages were written to them.
The Artemis journal implementation was written from the ground up for high performance specifically in conjunction with the broker's non-blocking architecture.

How to recover from JVM subprocess running OOM?

I have two JVM processes A and B. Process A communicates with the user and uses B as a slave, to do heavy computation: User -> A -> B.compute
Yet the method B.compute can run out of memory for certain inputs (it is impossible to know which in advance). In such a case I want to inform the user that the input data he gave me is not appropriate, and I want to restart B.
I found the following (not very detailed) solutions on google:
catch the error in B: catch (OutOfMemoryError e)
use JVM option -XX:OnOutOfMemoryError=restart-command
manually restart B from A
Which method is the most appropriate to use?
Please show a minimal (OS agnostic) working example.
Letting the JVM terminate abruptly is never a good design for any kind of application.
If you know that there are situations that will cause this, I would design process B to monitor its own memory usage and then terminate processing of data if it is going to run out of memory.
You can do this as simply as:
// Used heap = total allocated heap minus its free portion, in MB.
Runtime rt = Runtime.getRuntime();
long usedMem = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
You could set a threshold on free memory at which your B process stops processing the input, throws away all partial results, and informs A that the input is erroneous. The garbage collector will reclaim the unused memory and return B to being ready for more input (if you really have to, you could make an explicit call to System.gc() to force this, but I wouldn't recommend it).
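A minimal sketch of that idea; the 64 MB safety margin, the chunked input, and the stubbed-out work are all hypothetical:

import java.util.List;

public class BoundedWorker {
    // Assumed safety margin: abort before headroom drops below 64 MB.
    private static final long MIN_HEADROOM_BYTES = 64L * 1024 * 1024;

    static boolean hasHeadroom() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        // Memory still obtainable if the heap grows to its -Xmx limit.
        return rt.maxMemory() - used > MIN_HEADROOM_BYTES;
    }

    public static void process(List<int[]> chunks) {
        for (int[] chunk : chunks) {
            if (!hasHeadroom()) {
                // Throw away partial results and report the bad input to A
                // instead of letting the whole JVM die with an OutOfMemoryError.
                System.err.println("input too large, aborting");
                return;
            }
            // ... heavy computation on this chunk would go here ...
        }
    }
}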