I know that monitoring a "process" on a C node is not supported, but if I call link on an Erlang node for a pid originating from a C node, my C node first gets an ERL_LINK message and then, a few moments later, an ERL_EXIT message, since the linked Erlang process apparently crashed.
Presumably this happened because my C node did not respond to the ERL_LINK, so Erlang decided that my C node "process" had died.
I would like to know whether this is supported at all, and how the C node needs to handle the ERL_LINK message to make it work.
The good news:
It is actually pretty simple. It turns out I was wrong in thinking that the linked Erlang process died because my C node did not process the ERL_LINK message.
The C node does not have to do anything when an Erlang node links to a pid originating from it.
Once link is called, the C node gets an ERL_LINK message with from/to filled in with the appropriate pids.
If the Erlang process dies, the C node gets an ERL_EXIT message whose contents are whatever Erlang terms were given as the exit reason. If the C node dies or loses the connection, the linked Erlang process gets an appropriate EXIT message.
The bad news:
There is no support in erl_interface for the C node to send ERL_EXIT or ERL_LINK back to Erlang. It seems this was considered at some point, but the code was left in a folder called "not_used".
We have been trying to use RabbitMQ to transfer data from Project A to Project B.
We created a producer that takes the data from Project A and puts it in a queue, which was relatively easy. Then we created a k8s pod for Project B, which listens to the appropriate queue using kombu's ConsumerMixin.
Overall, the integration was reasonable and straightforward. But when we started to process long messages, we noticed that they were coming back into the queue repeatedly.
After some research, we found that whenever processing a message takes more than 20 seconds, the message shows up in the queue again, even though the processing was successful.
The source of this issue is RabbitMQ's heartbeat. We set the heartbeat to 10 seconds, and RabbitMQ checks the connection twice before it kills it. However, because the callback takes more than 20 seconds to run, and the .ack() (acknowledge) of the message happens at the end of the callback (to ensure it was successful), the heartbeat is blocked while this message is being processed (as described here: https://github.com/celery/kombu/issues/621#issuecomment-251836611).
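For illustration, a minimal sketch of that blocking shape with kombu's ConsumerMixin (the queue name and process_payload are placeholders, not our real code):

```python
# Minimal sketch (placeholder names): a ConsumerMixin whose callback does all the
# work inline and only acks at the end. While process_payload() runs, the
# connection is not serviced, so no heartbeats go out for the whole duration.
from kombu import Connection, Queue
from kombu.mixins import ConsumerMixin

task_queue = Queue("project-b-tasks")  # placeholder queue name


class Worker(ConsumerMixin):
    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=[task_queue],
                         callbacks=[self.on_message],
                         prefetch_count=1)]

    def on_message(self, body, message):
        process_payload(body)   # long-running work (> 20 s in our case)
        message.ack()           # ack only after success; heartbeats starve meanwhile


def process_payload(body):
    ...  # placeholder for the real processing


if __name__ == "__main__":
    with Connection("amqp://guest:guest@rabbitmq:5672//", heartbeat=10) as conn:
        Worker(conn).run()
```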
We have been trying to find a workaround with threading, processing the message on a different thread to avoid blocking the heartbeat, but it didn't work. It also feels like we were hacking around the problem rather than solving it.
So my question is whether there is a proper way to handle this situation, or what alternatives we have. RabbitMQ seemed like the right choice, since we already use it in standalone projects with Celery, and it is also widely recommended.
I'm getting a strange behavior in my DEV environment. When I send a message to my queue, it's dispatched correctly, but the next message (same or different content) always fails and is sent directly to my dead-letter queue. This pattern then repeats: one OK, one sent to the dead-letter queue.
In my local setup everything works fine, but not in my DEV environment, which makes it a little difficult to debug/troubleshoot. I'm not sure what could be wrong or different. I'm new to RabbitMQ, so maybe I need to include more information (if so, please let me know).
Does anyone have an idea of what could be causing this? Or has anyone experienced something like this before?
RabbitMQ version is 3.8.2.
My rabbitmq.config file is:
[{rabbitmq_management,[{tcp_config,[{port,15672}]}]}, {rabbit,[{total_memory_available_override_value,3999997952}, {tcp_listeners,[5672]}, {loopback_users,[]}]}].
My two queues are configured this way (see the declaration sketch after the list):
**my-queue.dev**
Type: Classic
Features: D, DLX
**my-queue.dev.deadletter**
Type: Classic
Features: D
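For reference, a roughly equivalent declaration sketched with pika (routing dead-lettered messages through the default exchange is an assumption on my part; the real topology may differ):

```python
# Sketch only: declaring the two queues described above with pika.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Dead-letter queue: durable ("D"), no extra features.
channel.queue_declare(queue="my-queue.dev.deadletter", durable=True)

# Main queue: durable ("D") with a dead-letter exchange ("DLX") that routes
# rejected/expired messages to the dead-letter queue via the default exchange.
channel.queue_declare(
    queue="my-queue.dev",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "",                       # default exchange
        "x-dead-letter-routing-key": "my-queue.dev.deadletter",
    },
)

connection.close()
```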
Kind regards!
I have two nodes (x and y) on a CAN bus using CANopen. Using a temporary node z, I send an NMT message to put all nodes into the pre-operational state, and then a command to put y into the operational state. I then send a bunch of extended-ID messages on the bus intended for node y; node x does not know of these in its dictionary. While sending to y, node monitoring on node x says it is in the pre-operational state. All seems fine.

Upon completion of sending data to node y, I send a command to put all nodes into the operational state. Node x is stuck in the pre-operational state according to its NMT state code. Debugging, I found that the RX FIFO in the CANopen stack on x is overflowing. Shouldn't it be ignoring all these extended messages when in pre-operational mode? I even tried stopped mode, with the same result of a stuck x. What's going on here?
For any CAN bus node, you have to read all incoming messages continuously and ignore the ones of no interest. Filter settings in the CAN controller can help a bit, but to build rugged applications you must always be prepared for any CAN message with any ID to appear at any time. The best way to ensure this is to read the RX FIFO continuously, and each time keep reading until it is empty.
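As a minimal sketch of that receive pattern, here is what it might look like with python-can (the interface, channel, filter values and handle_frame are assumptions for illustration):

```python
# Sketch: keep draining the RX side and ignore frames of no interest, optionally
# narrowing traffic with driver-level filters so the FIFO cannot back up.
import can


def handle_frame(msg: can.Message) -> None:
    ...  # placeholder for the frames this node actually cares about


# Example filter: accept only standard (11-bit) IDs 0x000-0x07F; everything else
# is dropped by the driver instead of piling up in the RX FIFO.
filters = [{"can_id": 0x000, "can_mask": 0x780, "extended": False}]

with can.Bus(interface="socketcan", channel="can0", can_filters=filters) as bus:
    while True:
        msg = bus.recv(timeout=1.0)   # keep reading; never let the FIFO fill up
        if msg is None:
            continue                  # nothing pending right now
        if msg.is_extended_id:
            continue                  # 29-bit traffic meant for other nodes: ignore
        handle_frame(msg)
```

The same idea applies regardless of stack: the application can discard anything it does not recognise, but only after it has been pulled out of the FIFO.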
A CANopen node remains in the pre-operational state as long as there are errors. Optionally, it may send out an EMCY message telling the nature of the error, and then another with all bits set to zero when the error is cleared, in which case the NMT master should wait for the EMCY clear message before sending out "start remote node".
I am using RabbitMQ version 3.0.2 and I see close to 1000 messages in the error queue. I want to know:
At what point messages are moved to the error queues?
Is there a way to know why a certain message is being moved to an error queue?
Is there any way to move message from error queue to normal queue?
Thank you
Messages are moved to the error queue when a) they fail to deserialize, or b) the consumer throws an exception while processing the message five times.
Not really... If you peek at the message in the queue, the payload headers might contain a note, but I don't think we did that. If you turn logging on (NLog, log4net, etc.) you should be able to see the exceptions in your log. You'll have to correlate message IDs at that point to figure out exactly why.
There is no built-in way via MassTransit, mostly because there doesn't seem to be a great, generic way to handle this; everyone wants some process around it. Dru did create a BusDriver app (in the main MT source repo) that can be used to move messages back to the exchange in question. This default behaviour is there so that you at least know things have been failing if you don't put infrastructure in place to handle it.
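MassTransit aside, a generic way to push messages back is a small script along these lines, sketched here with pika (the queue names are assumptions; BusDriver or the RabbitMQ shovel plugin are the more usual options):

```python
# Sketch: drain an error queue and republish each message to the original queue.
# Generic RabbitMQ approach (not MassTransit-specific); names are assumptions.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

while True:
    method, properties, body = channel.basic_get(queue="my_service_error")
    if method is None:
        break                                   # error queue is empty
    channel.basic_publish(
        exchange="",                            # default exchange -> queue name
        routing_key="my_service",               # original queue (assumed name)
        body=body,
        properties=properties,                  # keep original headers/properties
    )
    channel.basic_ack(method.delivery_tag)      # remove it from the error queue

connection.close()
```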
To add to Travis' answer: during development I found some other reasons for messages going onto the error queue:
The published message type has no consumer
A saga and a consumer are expecting the same concrete message type. Even if you try to differentiate using "Accepts" and ".Selected", a saga and a consumer should not be programmed to receive the same message type.
We've been using Rabbit successfully for about a year. Recently we upgraded to v2.6.1, because we want to use clusters with replicated message queues.
My testing has hit a puzzling behavior that smells like a Rabbit bug to me. The test that uncovers this is working with a two-node cluster. Both nodes are running v2.6.1. Both nodes have disk. Both nodes are running on Mac OS, though I doubt this is pertinent.
I'm also running Alice on the node that runs the test. The test uses it to programmatically do a stop_app on one of the nodes, because the test is trying to validate that if the cluster master fails and a slave is elevated to take its place, we don't lose messages.
So, the test has a small thread pool, which is given tasks that periodically 1) publish messages, and 2) toggle the state of the Rabbit master node (stopped if running; started if stopped). Other threads are consuming messages from queues.
I'm using publisher confirms, and I'm also acknowledging the messages in the consumers (using autoAck=false for channel.basicConsume()).
When the master node is stopped, I see both the producers and consumers catching ShutdownSignalException. They handle this by attempting to reconnect to the cluster. This works fine. When reconnected, they continue with their business.
Sometimes, what I see is that a consumer has successfully fetched a message from the broker, and is calling channel.basicAck() when it gets that ShutdownSignalException.
Later, when the consumer has reconnected, it again pulls down the same message. (The message bodies are tagged with a UUID, so I know it is the same one.) This time, when the consumer attempts to basicAck() the message, it again gets ShutdownSignalException, but this one has the following text in it: "reply-text=PRECONDITION_FAILED - unknown delivery tag 7".
In fact, that is the same delivery tag that was offered to the consumer by the broker before the master went down and the consumer reconnected.
Googling suggests that this event means that the consumer is attempting to ack the same message more than once.
But, how can this be so? If the first ack succeeded, then the message should have been removed from the broker's queues, and the consumer shouldn't see the same message again.
Yet, if the first ack did not succeed, then the consumer shouldn't be dinged for attempting to re-ack the message.
Anyone seen this before? It smells like a bug in Rabbit's replicated queues to me, but I'm still new to Rabbit, and so am willing to believe there's a subtlety here in consuming from a clustered broker that I haven't yet grokked!
Thanks, --Steve
I'm not sure if my case matches yours, but I have seen a similar "unknown delivery tag" error on attempts to ack after a reconnect, with the same message then arriving again. Initially it looked like a bug to me, but in fact this is expected behavior. A consumer with QoS > 1 may have some messages in its local buffer, and their delivery tags are all invalid after a reconnect. On the other hand, attempting to ack even the current message after a reconnect doesn't make sense, because that message was already nacked automatically when the connection was lost, which is why I received it again.
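To illustrate the rule this relies on: delivery tags are only valid on the channel that delivered the message, so after a reconnect the consumer has to forget any old tags and only ack deliveries received on the new channel. A minimal sketch with pika (the queue name and process are assumptions):

```python
# Sketch: delivery tags are channel-scoped. After a connection is lost, every
# unacked delivery is requeued by the broker and redelivered with a new tag on
# the new channel; acking an old tag yields PRECONDITION_FAILED "unknown delivery tag".
import pika


def process(body):
    ...  # placeholder for the real message handling


def consume_forever(params, queue_name="work"):   # assumed queue name
    while True:
        connection = pika.BlockingConnection(params)
        channel = connection.channel()
        channel.basic_qos(prefetch_count=10)
        try:
            for method, properties, body in channel.consume(queue_name):
                process(body)
                channel.basic_ack(method.delivery_tag)  # tag from *this* channel only
        except pika.exceptions.AMQPConnectionError:
            # Connection lost: forget any remembered tags. The broker has already
            # requeued the unacked messages; they will arrive again with new tags.
            continue
```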