Uncatchable errors in node.js - error-handling

I'm trying to write a simple TCP socket server that broadcasts information to all connected clients. When a user connects, they get added to the list of clients, and when the stream emits the close event, they get removed from that list.
This works well, except that sometimes I'm sending a message just as a user disconnects.
I've tried wrapping stream.write() in a try/catch block, but no luck. It seems like the error is uncatchable.

The solution is to add a listener for the stream's 'error' event. This might seem counter-intuitive at first, but the justification for it is sound.
stream.write() sends data asynchronously. By the time node has realized that writing to the socket has failed, your code has already moved on past the call to stream.write(), so there's no way for node to raise the error there.
Instead, what node does in this situation is emit an 'error' event from the stream. EventEmitter is coded such that if there are no listeners for an 'error' event, the error is thrown as a top-level exception, and the process ends.
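A minimal sketch of the approach described above. The helper names addClient and broadcast and the use of a Set for the client list are assumptions made for illustration, not part of the original question. The last few lines use a bare EventEmitter to show the rule itself: with no 'error' listener, emit() throws the error.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: register a stream and make sure its 'error'
// event always has a listener, so a failed write cannot crash the process.
function addClient(clients, stream) {
  clients.add(stream);
  stream.on('error', (err) => {
    console.error('dropping client:', err.message);
    clients.delete(stream);
  });
  stream.on('close', () => clients.delete(stream));
}

// Write the message to every client that is still registered.
function broadcast(clients, message) {
  for (const stream of clients) {
    stream.write(message);
  }
}

// Demonstration of the EventEmitter rule with a bare emitter:
// with no 'error' listener, emit() throws the error itself.
const bare = new EventEmitter();
let thrown = null;
try {
  bare.emit('error', new Error('no listener'));
} catch (err) {
  thrown = err; // catchable here only because we called emit() ourselves
}
```

With the net module, this could be wired up as net.createServer((socket) => addClient(clients, socket)). For a real socket the 'error' event is emitted later from the event loop, which is why a try/catch around stream.write() never sees it.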

Peter is quite right.
There is also another way: you can make a catch-all error handler with
process.on('uncaughtException', function (error) {
  // process error
});
This will catch every exception that is thrown and not caught elsewhere.
It's usually better to do this Peter's way if possible. However, if you were writing, say, a test framework, it may be a good idea to use process.on('uncaughtException', ...).
Here is a gist which covers (I think) all the different ways of handling errors in node.js:
http://gist.github.com/636290

I had the same problem with the time server example from here
My clients get killed, and the time server then tries to write to a closed socket.
Setting an error handler does not work, as the error event only fires on reception, and the time server does no receiving (see the stream event documentation).
My solution is to set a handler on the stream close event.
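stream.on('close', function() {
  // subscribers.remove() is not a built-in Array method; it is assumed to be a helper defined elsewhere
  subscribers.remove(stream);
  stream.end();
  console.log('Subscriber CLOSE: ' + subscribers.length + " total.\n");
});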
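Since plain JavaScript arrays have no remove() method, the same close handler can be sketched with indexOf and splice. The helper names removeSubscriber and subscribe are hypothetical, and the fake stream at the end is only an EventEmitter standing in for a real socket.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: remove a stream from a plain array of subscribers
// and return how many subscribers are left.
function removeSubscriber(subscribers, stream) {
  const index = subscribers.indexOf(stream);
  if (index !== -1) {
    subscribers.splice(index, 1);
  }
  return subscribers.length;
}

// Register the stream and drop it again as soon as it closes.
function subscribe(subscribers, stream) {
  subscribers.push(stream);
  stream.on('close', () => {
    const remaining = removeSubscriber(subscribers, stream);
    console.log('Subscriber CLOSE: ' + remaining + ' total.');
  });
}

// Usage with a stand-in stream.
const subscribers = [];
const fake = new EventEmitter();
subscribe(subscribers, fake);
fake.emit('close'); // logs "Subscriber CLOSE: 0 total."
```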

Related

Boost ASIO and file descriptor reuse

I have a multi-threaded (Linux) server that registers async_writes and async_reads on the same native file descriptor through a socket object. I noticed that under very heavy load, when the server was dropping connections, on a very rare occasion a client would receive a garbled first message.
Tracking it down, the async_read detects an error on the socket and closes the socket. This closes the native file descriptor. If that file descriptor is reused before the original async_write has a chance to fire, it will find its native file descriptor valid and proceed to send its message (which is really a message from a previous session).
The only way I could see to fix this was to make the async_read and async_write callbacks know if there were other callbacks registered and only close the socket if it were the last one.
Has anyone seen this issue?
Haven't seen it, but it sounds plausible. Although I am surprised to see a new native file descriptor getting the exact same number as a recently closed descriptor.
You might want to put the socket in a shared_ptr and query shared_ptr::unique() (or check use_count() == 1) in both async_read and async_write. That'd be the easiest way to let the other callback know if both callbacks are registered. If unique() returns true, you can be sure that no one else is still using this socket and can close it.
So if the connection gets dropped, async_read can check unique(). If it is true, close the socket. And let go of the shared_ptr in either case.
Then, when async_write also fires, it will find unique() true and can close the socket, unless async_read has already closed it.
The only drawback is of course: async_write has to fire also (perhaps with an error code) in order to close the socket.
Oh I've seen exactly this in production code. (Much fun: we would be talking a proprietary protocol on a TCP socket to mysql server). The problem is when some thread "handles" (mis-handles) errors by closing sockets using the native handle (fd). Don't. Use shutdown (perhaps with cancel) instead and let the destructor take care of close. Of course, the real problem is the non-owning copies of the handle (fd) that are the cause of the resource race.
Critical Note:
"Tracking it down, the async_read detects an error on the socket and closes the socket. This closes the native file descriptor"
That's patently UNTRUE for Asio itself. Perhaps you have (third-party) code in the completion handlers doing that, but as I mention above, you cannot afford to do that.

Using Sagas with Recoverability

We are having an issue with recovery for messages originating from Sagas.
When a Saga sends a message for processing, the message handler can sometimes fail with an exception. We currently use a try/catch and when an exception is thrown, we "Reply" with a failed message to the Saga. The issue with this approach is that Recoverability retries don't happen since we are handling the error in the message handler.
My thought was to add custom logic to the pipeline: if the Command message implements some special interface, the custom logic would send a failed message response to the Saga if an exception occurs (after the retries fail). But I'm not sure where to plug into the pipeline so that I can send messages after the retries fail.
Is this a valid approach? If not, how can I send failure messages from the handler back to the Saga after the retries are exhausted?
You can use immediate dispatch to not wait for a handler to complete.
However, I would like to suggest an alternate approach. Why not create a Timeout in the saga? If the reply from the processing-handler isn't received within a certain TimeSpan, you take an alternate path. The processing-handler gets 5 minutes and if it doesn't respond within 5 minutes, we do something else. If it still responds after 6 minutes, we know we've already taken the alternate path (use a boolean flag or so and store that inside the saga data) and put aside the reply that arrived too late.
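The timeout-plus-flag logic above can be sketched as a small state object. This is a language-neutral illustration written in JavaScript, not the NServiceBus saga API; the class name ProcessingSaga, its methods, and the log array are all hypothetical.

```javascript
// Hypothetical saga state; not the NServiceBus API.
class ProcessingSaga {
  constructor() {
    this.timedOut = false;  // the boolean flag stored in the saga data
    this.completed = false; // set once a reply arrived in time
    this.log = [];          // records which path was taken
  }

  // Called when the timeout (for example 5 minutes) elapses.
  onTimeout() {
    if (this.completed) {
      return; // the reply already arrived, nothing to do
    }
    this.timedOut = true;
    this.log.push('alternate path');
  }

  // Called when the processing handler replies.
  onReply(reply) {
    if (this.timedOut) {
      // The alternate path was already taken; put the late reply aside.
      this.log.push('late reply ignored');
      return;
    }
    this.completed = true;
    this.log.push('processed ' + reply);
  }
}

// Usage: the timeout fires first, then the reply arrives too late.
const saga = new ProcessingSaga();
saga.onTimeout();
saga.onReply('result');
```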
If you want to start a discussion based on this, check our community platform.

Catch a disconnect event from ActiveMQ

Using the 1.6 version of NMS (1.6.3 activemq)
I'm setting up a listener to wait for messages.
The listener has a thread of its own (not mine), and my code goes out of scope (until the listener's function is called).
If the ActiveMQ server disconnects, I get a global exception which I can only catch globally.
(The thread that created the listener will not catch it. I have nothing to wrap with "try" and "catch".)
Is there a way to set a callback function, something like OnError += ErrorHandlingFunction, the same way I attach the listener, so that I can deal with this issue locally and not through a global exception catcher?
Is there a better way to deal with this issue? (I can't use Transport Failure, as I don't have any other options but to wait a while and disconnect, maybe log something or send a message that the server is offline.)
There is no mechanism in the client for hooking in the async message listener to find out if the connection dropped during the processing of a message. You should really examine why you think you need such a thing there.
NMS API methods you use in the async callback will throw an exception when not connected so if you did something like try to ACK a message in the async message event handler then it would throw an exception if the connection was down.

How to detect alarm-based blocking RabbitMQ producer?

I have a producer sending durable messages to a RabbitMQ exchange. If the RabbitMQ memory or disk exceeds the watermark threshold, RabbitMQ will block my producer. The documentation says that it stops reading from the socket, and also pauses heartbeats.
What I would like is a way to know in my producer code that I have been blocked. Currently, even with a heartbeat enabled, everything just pauses forever. I'd like to receive some sort of exception so that I know I've been blocked and I can warn the user and/or take some other action, but I can't find any way to do this. I am using both the Java and C# clients and would need this functionality in both. Any advice? Thanks.
Sorry to tell you but with RabbitMQ (at least with 2.8.6) this isn't possible :-(
I had a similar problem, which centred around trying to establish a channel when the connection was blocked. The result was the same as what you're experiencing.
I did some investigation into the actual core of the RabbitMQ C# .Net Library and discovered the root cause of the problem is that it goes into an infinite blocking state.
You can see more details on the RabbitMQ mailing list here:
http://rabbitmq.1065348.n5.nabble.com/Net-Client-locks-trying-to-create-a-channel-on-a-blocked-connection-td21588.html
One suggestion (which we didn't implement) was to do the work inside of a thread and have some other component manage the timeout and kill the thread if it is exceeded. We just accepted the risk :-(
The RabbitMQ client uses a blocking RPC call that waits for a reply indefinitely.
If you look at the Java client API, what it does is:
AMQChannel.BlockingRpcContinuation k = new AMQChannel.SimpleBlockingRpcContinuation();
k.getReply(-1);
Now -1 passed in the argument blocks until a reply is received.
The good thing is you could pass in your timeout in order to make it return.
The bad thing is you will have to update the client jars.
If you are OK with doing that, you could pass in a timeout wherever a blocking call like above is made.
The code would look something like:
try {
    return k.getReply(200);
} catch (TimeoutException e) {
    throw new MyCustomRuntimeorTimeoutException("RabbitTimeout ex", e);
}
And in your code you could handle this exception and perform your logic in this event.
Some related classes that might require this fix would be:
com.rabbitmq.client.impl.AMQChannel
com.rabbitmq.client.impl.ChannelN
com.rabbitmq.client.impl.AMQConnection
FYI: I have tried this and it works.

Monitor and handle MSGW messages on a job on an IBM i-series (AS/400) from Java

Does anyone know how one can automatically reply to messages with status MSGW that block a job on an IBM i-series (AS/400)?
I'm using the jt400/jtopen library to access a program on an AS/400 from Java. I'm using the com.ibm.as400.access.ProgramCall class, which works fine, unless the program fails for some reason. As with almost any program, failures will happen sometimes, but unfortunately, in this case, it does not result in a status message or an exception. Instead, the calling thread just hangs. What's worse, any call to the AS/400 to get information on the Job (another class in jt400 that mostly does what you would expect) backing the queue will hang as well.
I could of course monitor the thread in which the call runs and simply kill it after waiting for a while, but that's a last resort. Getting an error message back from the system would be nice.
You could try executing this command with the com.ibm.as400.access.CommandCall.run() method before invoking your PCML:
CHGJOB INQMSGRPY(*DFT)
It makes the job answer inquiry messages with their default reply (often 'C', cancel) instead of waiting in MSGW.
You should make sure the messages are logged, though, so that you can find the problem that generated them.
I don't believe Java can directly trap errors that occur on the other side of that API. What I've done is to 'harden' the RPG (IBM i side) program so that it monitors for errors rather than let the default error handler get them. When an error occurs, the RPG program gracefully terminates and passes back an error code or even the entire message back to the Java application.
I've found that you can use the timeout mechanism of ExecutorService to interrupt a ProgramCall in MSGW.
You must discard the AS400 object afterwards, and the server job is still in MSGW, but at least you can continue on the Java side.
(You need to use a separate AS400 object if you want to investigate the hanging job.)