NServiceBus crashes when losing connectivity to RabbitMQ host - nservicebus

We're having a problem where NServiceBus crashes about 4-5 minutes after the connection to the RabbitMQ server is lost.
To reproduce, I started my app, confirmed that RabbitMQ saw the connections, disconnected my network cable, and waited. After about 5 minutes the NServiceBus host crashed.
When running in Debug, I got the following error message:
Additional information: The runtime has encountered a fatal error. The address of the error was at 0xf6a94323, on thread 0xf8b8. The error code is 0x80131623. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
On our server we have the following in EventLog:
Application: NServiceBus.Host.exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: The following critical error was encountered by NServiceBus:
Repeated failures when communicating with the broker
NServiceBus is shutting down.
Stack:
at System.Environment.FailFast(System.String, System.Exception)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
This is our RabbitMQ connection string:
<add name="NServiceBus/Transport" connectionString="host=our_host_address;VirtualHost=OurVirtualHost;UserName=OurUser;Password=******;PrefetchCount=1;DequeueTimeout=30" />
What's causing this crash? Is there a way to recover from it or catch it? How can we handle disconnections from the RabbitMQ server gracefully?

This happens because of the circuit breaker: if the endpoint cannot communicate with the broker for a sustained period, it shuts the service down rather than leaving it hanging, unable to do its work.
You can configure the endpoint to wait longer before shutting down when the connection is dropped; see "controlling behavior when broker connection is lost" in the documentation for more information.
In addition, you can set the Windows service recovery options to restart the service on failure.
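To make the behavior concrete, here is a minimal sketch of the circuit-breaker idea described above. It is not the NServiceBus implementation: the class name `RepeatedFailureCircuitBreaker`, its members, and the timing values are hypothetical and chosen only for illustration. The first failure arms a timer, any success disarms it, and if no success arrives before the wait time elapses, the trigger action runs.

```csharp
using System;
using System.Threading;

// Illustrative sketch only; names and structure are assumptions,
// not the actual NServiceBus circuit breaker.
public sealed class RepeatedFailureCircuitBreaker : IDisposable
{
    private readonly TimeSpan timeToWait;
    private readonly Action<Exception> onTriggered;
    private readonly Timer timer;
    private readonly object gate = new object();
    private Exception lastException;
    private bool armed;

    public RepeatedFailureCircuitBreaker(TimeSpan timeToWait, Action<Exception> onTriggered)
    {
        this.timeToWait = timeToWait;
        this.onTriggered = onTriggered;
        // Created disabled; it only starts counting once a failure is reported.
        timer = new Timer(_ => Trigger(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Call whenever communication with the broker fails.
    public void Failure(Exception exception)
    {
        lock (gate)
        {
            lastException = exception;
            if (armed) return;          // already counting down
            armed = true;
            timer.Change(timeToWait, Timeout.InfiniteTimeSpan);
        }
    }

    // Call whenever communication succeeds; this disarms the breaker.
    public void Success()
    {
        lock (gate)
        {
            if (!armed) return;
            armed = false;
            timer.Change(Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        }
    }

    private void Trigger()
    {
        Exception exception;
        lock (gate)
        {
            if (!armed) return;         // a success arrived just in time
            armed = false;
            exception = lastException;
        }
        onTriggered(exception);
    }

    public void Dispose() => timer.Dispose();
}
```

A host could pass something like `ex => Environment.FailFast("Repeated failures when communicating with the broker", ex)` as the trigger action, which would match the event log entry in the question. A longer wait time gives the network more time to recover before the process is terminated.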

Related

WCF service crashed without any exceptions

We have a WCF service hosted in windows service. Unfortunately, the service has crashed and upon examining the event viewer logs, found the below error message.
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in
the .NET Runtime at IP 000007FEF9A258EF (000007FEF98A0000) with exit
code 80131506.
We have already implemented proper exception handling, yet the service still crashes when an exception occurs. We are also unable to work out what the actual error is from the event viewer logs.
Could you please guide me on the above error? Please let me know if you need any more information

Check if BeginPeek is still Subscribed

I am using BeginPeek() (no parameters) to subscribe to messages coming in to my private queue. This is done in a service hosted in the NServiceBus host. When NServiceBus encounters a transport connection timeout exception (I'm seeing "circuit breaker armed" logs and timeout exception logs), the peek event subscription seems to get lost. When database connectivity becomes stable and new messages come in to my queue, the service is no longer notified.
Any ideas or suggestions on how to address this?

WCF MSMQ DllNotFoundException

I am trying to access a remote WCF service (using netMsmqBinding) hosted in a windows service and am getting the error:
Message: System.TypeInitializationException: The type initializer for 'System.ServiceModel.Channels.Msmq' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'mqrt.dll': A dynamic link library (DLL) initialization routine failed. (Exception from HRESULT: 0x8007045A)
at System.ServiceModel.Channels.UnsafeNativeMethods.MQGetPrivateComputerInformation(String computerName, IntPtr properties)
I have read that this error may come up if MSMQ is not installed, but MSMQ is not supposed to be installed on the local machine; it is installed on the remote machine it is trying to talk to.
What else can cause this?
Any machine wishing to participate in the transmission of messages requires MSMQ to be installed.
This is because MSMQ uses a messaging pattern called store and forward, which is what makes MSMQ robust against transmission failures.
Go to Programs and Features and then Turn Windows features on or off. Find Microsoft Message Queue (MSMQ) Server and enable it.
credit to: https://stackoverflow.com/a/26705197/782856

WCF + NetTcp: high load make the channel stop working (calls/second rate)

First of all, sorry, I'm not fluent in English.
I'm trying to figure out why my WCF services stop working in an environment with a high calls/second rate. I'm not sure that just increasing the timeout will solve the issue.
We have 2 web services:
The first is hosted on IIS 7.5, Windows Server 2008 R2 Enterprise SP1 x64, with AppFabric (and WAS).
The second is hosted in a Windows service, Windows 2003 R2 SP1 x86.
Both web services have a minimal configuration: no authentication, no transactions, no special message handling. Here is the binding:
<netTcpBinding>
<binding transactionFlow="false">
<security mode="None">
<message clientCredentialType="None" />
<transport clientCredentialType="None"></transport>
</security>
<reliableSession enabled="false"/>
</binding>
</netTcpBinding>
We are trying to use the Net.Tcp binding because of its reliability and speed.
FACT 1 - The Net.Tcp binding is the primary cause
When the load is high, the Net.Tcp channel stops working. That's it! But BasicHttp keeps working like a charm.
The Windows service: the net.tcp channel stays down for a few minutes (3 to 10) before it starts working again (BY ITSELF, without us changing anything. Goblins are working hard).
AppFabric/IIS/WAS: the net.tcp channel stays down and needs a manual restart.
The BasicHttpBinding configuration is similar to the net.tcp one: no special message handling and no security settings or anything like that.
FACT 2 - No useful logging
We couldn't find any hint or trick to figure out what's happening. I have tried memory dumps, event logs, and System.Diagnostics, and found nothing relevant. The most relevant hint is an error from SMSvcHost 4.0.0.0:
An error occurred while dispatching a duplicated socket: this handle
is now leaked in the process. ID: 2272 Source:
System.ServiceModel.Activation.TcpWorkerProcess/62875109 Exception:
System.TimeoutException: This request operation sent to
http://schemas.microsoft.com/2005/12/ServiceModel/Addressing/Anonymous
did not receive a reply within the configured timeout (00:01:00). The
time allotted to this operation may have been a portion of a longer
timeout. This may be because the service is still processing the
operation or because the service was unable to send a reply message.
Please consider increasing the operation timeout (by casting the
channel/proxy to IContextChannel and setting the OperationTimeout
property) and ensure that the service is able to connect to the
client.
Server stack trace: at
System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at
System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult
result) at
System.ServiceModel.Channels.ServiceChannel.EndCall(String action,
Object[] outs, IAsyncResult result) at
System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage
methodCall, ProxyOperationRuntime operation) at
System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage
message)
Exception rethrown at [0]: at
System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at
System.ServiceModel.Activation.WorkerProcess.EndDispatchSession(IAsyncResult
result) Process Name: SMSvcHost Process ID: 1532
Do you have any tips or configuration tricks to help me solve this issue?
What's the best configuration for high-load scenarios?
If you generated a service reference in Visual Studio, or with the svcutil tool, make sure you always call the Close or Abort methods of your proxies. I encountered a similar problem some days ago because I forgot to call these methods.
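The Close/Abort advice above can be sketched as a small helper. This is a minimal sketch, assuming the proxy is any `ICommunicationObject` (generated `ClientBase<T>` proxies are); the class and method names `ProxyCleanup.CloseOrAbort` are hypothetical.

```csharp
using System;
using System.ServiceModel;

public static class ProxyCleanup
{
    // Closes the proxy gracefully and falls back to Abort when a
    // graceful close is not possible, so the connection is always released.
    public static void CloseOrAbort(ICommunicationObject proxy)
    {
        if (proxy == null) return;

        try
        {
            if (proxy.State == CommunicationState.Faulted)
            {
                // Close() throws on a faulted channel, so abort directly.
                proxy.Abort();
            }
            else
            {
                proxy.Close();
            }
        }
        catch (CommunicationException)
        {
            proxy.Abort();
        }
        catch (TimeoutException)
        {
            proxy.Abort();
        }
    }
}
```

A typical call site would invoke the service inside a `try` block and call `ProxyCleanup.CloseOrAbort(client)` in the matching `finally`, so that proxies are released even when a call fails. Under high load, proxies that are never closed can hold connections open and contribute to the kind of stall described in the question.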
If you are already calling the Close() and Abort() methods correctly and still receive this error, consider the following scenario:
You run a Microsoft .NET Framework 3.0-based or .NET Framework 3.5-based Windows Communication Foundation (WCF) service.
The WCF service uses the Net.Tcp Port Sharing Service (Smsvchost.exe) and is hosted on a computer that is running Internet Information Services (IIS).
One of the following conditions is true:
The CPU usage is high on the computer that is running IIS.
A throttle occurs in a service model for the WCF service.
Multiple requests are sent to the WCF service at the same time.
In this scenario, the WCF service takes longer than one minute to process a request from a client application. Additionally, an error message that resembles the following event entry is logged in the event log:
Log Name: System
Source: SMSvcHost 3.0.0.0
Date:
Event ID: 8
Task Category: Sharing Service
Level: Error
Keywords: Classic
User: LOCAL SERVICE
Computer:
Description: An error occurred while dispatching a duplicated socket: this handle is now leaked in the process.
ID: 2620
Source: System.ServiceModel.Activation.TcpWorkerProcess
Exception:
System.TimeoutException: This request operation sent to did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.
Note: You must restart IIS to recover the WCF service from this issue.
Cause:
This issue occurs because the Smsvchost.exe process times out after one minute when it tries to transfer an incoming connection request to the W3wp.exe worker process. Additionally, this time-out is not configurable.
When the CPU has a heavy workload, or when many concurrent connection requests are incoming, the Smsvchost.exe process cannot transfer the incoming connection to the W3wp.exe worker process within one minute. Therefore, the Smsvchost.exe process times out and eventually stops responding. When this issue occurs, the Smsvchost.exe process cannot route later requests to the W3wp.exe worker process until IIS is restarted.
Solution:
Microsoft suggests applying hotfix 2504602, which is described in the corresponding Microsoft Knowledge Base (KB) article. This hotfix is available for WCF in the .NET Framework 3.0 SP2, the .NET Framework 3.5 SP1, and the .NET Framework 4.
In addition, Microsoft claims to have solved this issue in the .NET Framework 4.5, so you should upgrade to the latest version.
If you upgrade to the .NET Framework 4.5 and the problem persists, the workaround is to modify the smsvchost.exe.config file to increase the timeout, the number of pending accepts, and various other parameters.

WCF channel state not being updated

I have a problem with my WCF application. I use a netTcpBinding, and on the client side I use ClientBase<> to connect to the host and ICommunicationObject.State to check whether the channel is still available.
The problem is that after the "receiveTimeout" elapses, the TCP connection is cut, but on the client side the state is still "Opened" when I check it. When I then try to use the channel, I get exceptions.
To confirm that the TCP socket is disconnected, I monitor it with TCPView. It is cut off after the timeout, but the state of the channel is not updated.
I also added diagnostic logging to the server config.
I get an exception right after the timeout (at the same time as the disconnection happens).
Here is the exception (on the server side):
System.ServiceModel.CommunicationObjectAbortedException, System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
System.Net.Sockets.SocketException, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
A TCP error (995: The I/O operation has been aborted because of either a thread exit or an application request) occurred while transmitting data.
If I try to call the service again from the client, I get this exception on the client side:
System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue.
I think the exception on the client side is normal. But I don't know whether I need to handle the exceptions on the server side.
Does anyone have an idea?
Thanks very much.
The CommunicationState will stay 'Opened' unless you explicitly Close() the channel or the channel faults. Unfortunately, in your scenario there is no way to determine whether the channel is actually available until you attempt to use it, other than catching the resulting exception.
I would suggest that you do not keep a channel open beyond the point where it is being used, and that you explicitly Close() it once you are done.
We have a wrapper that encapsulates the call, including the creation of the proxy, the service call itself, and the subsequent closure of the channel, and this works well.
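A wrapper along these lines might look like the following. This is only a sketch of the idea, not the answerer's actual code: the names `ServiceInvoker` and `Invoke` are hypothetical, and it assumes a `ChannelFactory<TChannel>` that has already been configured with the netTcpBinding endpoint.

```csharp
using System;
using System.ServiceModel;

public static class ServiceInvoker
{
    // Creates a fresh channel, runs one call on it, and always releases it:
    // Close() on success, Abort() if the call or the close itself fails.
    public static TResult Invoke<TChannel, TResult>(
        ChannelFactory<TChannel> factory,
        Func<TChannel, TResult> call)
    {
        TChannel channel = factory.CreateChannel();
        // Channels created by ChannelFactory also implement ICommunicationObject.
        var communicationObject = (ICommunicationObject)channel;

        try
        {
            TResult result = call(channel);
            communicationObject.Close();
            return result;
        }
        catch
        {
            // The channel may be faulted or half-closed; Abort never throws.
            communicationObject.Abort();
            throw;
        }
    }
}
```

With a hypothetical contract `IOrderService`, a call would read `ServiceInvoker.Invoke(factory, s => s.GetOrder(42))`. Because each call gets its own short-lived channel, a stale 'Opened' state after a receive timeout is no longer a concern; the cost is the overhead of opening a channel per call, which the cached `ChannelFactory` keeps relatively small.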