Azure Service Bus Relay Occasional FaultException - wcf

We can't determine why the Azure BasicHttpRelay is throwing an occasional FaultException with no details. We've enabled WCF diagnostic tracing, but the stack trace information it captures is the same as what we already have. It seems like the WCF client channel fails for a brief period and then recovers shortly afterward.
We do cache the WCF channel (created via CreateChannel), but this is the first time we've experienced this strange behavior. We have other Azure Service Bus relay solutions that work fine with this approach.
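For context, a minimal sketch of the channel-caching pattern we use (IRelayService and the endpoint configuration name "relayEndpoint" are placeholders, not our actual contract or config):

using System.ServiceModel;

[ServiceContract]
public interface IRelayService
{
    [OperationContract]
    string Ping(string text);
}

public static class RelayClient
{
    // The factory is created once; "relayEndpoint" refers to a client endpoint in app.config.
    private static readonly ChannelFactory<IRelayService> Factory =
        new ChannelFactory<IRelayService>("relayEndpoint");

    private static IRelayService cachedChannel;

    public static IRelayService GetChannel()
    {
        // Recreate the channel only if it has never been created or has faulted.
        if (cachedChannel == null ||
            ((IClientChannel)cachedChannel).State == CommunicationState.Faulted)
        {
            cachedChannel = Factory.CreateChannel();
        }
        return cachedChannel;
    }
}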
Error Message:
There was an error encountered while processing the request.
Stack Trace:
at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at [our WCF method]...
FaultException - FaultCode Details:
Name: ServerErrorFault
Namespace: http://schemas.microsoft.com/netservices/2009/05/servicebus/relay
IsPredefinedFault: false
IsReceiverFault: false
IsSenderFault: false
Soap Message
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header />
<s:Body>
<s:Fault>
<faultcode xmlns:a="http://schemas.microsoft.com/netservices/2009/05/servicebus/relay">a:ServerErrorFault</faultcode>
<faultstring xml:lang="en-US">There was an error encountered while processing the request.</faultstring>
<detail>
<ServerErrorFault xmlns="http://schemas.microsoft.com/netservices/2009/05/servicebus/relay" xmlns:i="http://www.w3.org/2001/XMLSchema-instance" />
</detail>
</s:Fault>
</s:Body>
</s:Envelope>
Through debugging, we can see the server properly responds to the message requests (via IDispatchMessageInspector), but the client fails to handle the response appropriately (IClientMessageInspector reports fault). Subsequent relay requests will succeed after the client channel seemingly corrects itself. These failures seem to be intermittent and not load-driven. We never see these FaultException errors using basicHttpBinding outside the Azure relay.
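For reference, a rough sketch of the kind of client-side inspector used for this (names are illustrative; it just dumps the raw reply when it is a fault):

using System.Diagnostics;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;

public class FaultLoggingInspector : IClientMessageInspector
{
    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        return null; // no correlation state needed
    }

    public void AfterReceiveReply(ref Message reply, object correlationState)
    {
        if (reply.IsFault)
        {
            // Buffer the message so it can be logged and still consumed downstream.
            MessageBuffer buffer = reply.CreateBufferedCopy(int.MaxValue);
            Trace.WriteLine(buffer.CreateMessage().ToString());
            reply = buffer.CreateMessage();
        }
    }
}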
Does anyone have any suggestions? We are using Azure SDK 1.8.
I've tried configuring a new Service Bus Relay namespace using the owner shared secret, but I'm still seeing the same results.

After reaching out to MS, this issue turned out to be an MS bug in the Relay or the SDK, specifically when using HTTP connectivity mode. At this point, the only workaround is to ensure you have the appropriate outgoing TCP ports open to get reliable connectivity with the Azure Relay.
Allow Outgoing TCP Ports: 9350 - 9354
MS has told us that they are still working on resolving the root cause. Hopefully this workaround will help others. Our corporate firewall had these TCP ports blocked, which forced all communication over port 80, and that appears to be what triggers this issue. A positive side effect is that opening these ports gives faster connectivity to the relay when starting up your listeners (AutoDetect doesn't have to probe TCP port availability every time).
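As a side note, once those ports are open you can also skip the AutoDetect probe entirely by forcing TCP connectivity before you open your relay host. A minimal sketch against the Microsoft.ServiceBus SDK we're on (verify the API against your SDK version):

using Microsoft.ServiceBus;

// Force the relay listener/client to use TCP (ports 9350-9354) instead of
// letting AutoDetect fall back to HTTP when the TCP probe fails.
ServiceBusEnvironment.SystemConnectivity.Mode = ConnectivityMode.Tcp;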

Related

Unable to load DLL 'mqrt.dll'

I've developed a WCF Service which is hosted as a Windows Service and exposes a MSMQ endpoint.
I have the client app on SERVER1, and the MSMQ and WCF Service on SERVER2.
When the SERVER1/ClientApp attempts to push a message onto the SERVER2 MSMQ, I get the following error:
System.TypeInitializationException: The type initializer for 'System.ServiceModel.Channels.Msmq' threw an exception. ---> System.DllNotFoundException: Unable to load DLL 'mqrt.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
at System.ServiceModel.Channels.UnsafeNativeMethods.MQGetPrivateComputerInformation(String computerName, IntPtr properties)
at System.ServiceModel.Channels.MsmqQueue.GetMsmqInformation(Version& version, Boolean& activeDirectoryEnabled)
at System.ServiceModel.Channels.Msmq..cctor()
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.Msmq.EnterXPSendLock(Boolean& lockHeld, ProtectionLevel protectionLevel)
at System.ServiceModel.Channels.MsmqOutputChannel.OnSend(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.OutputChannel.Send(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [7]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at FacilityManager.Service.NotificationsProcessorServiceReference.INotificationsProcessor.SendNewReactiveTaskNotifications(NewReactiveTaskDataContract newReactiveTaskDataContract)
Both SERVER1 and SERVER2 are running Windows Server 2008 R2 Enterprise (6.1 SP1), and both have had MSMQ installed via the Add Features in Server Manager.
I understand that the DLL is missing (fairly obvious from the error!), but I've no idea what I should be installing to get the dll where it should be.
A search in Windows Explorer shows that the DLL is present in the following directories on both servers....
C:\Windows\System32
C:\Windows\SysWOW64
C:\Windows\winsxs\x86_microsoft-windows-msmq-runtime-core_31bf3856ad364e35_6.1.7601.17514_none_5768e2ad17453bd6
C:\Windows\winsxs\amd64_microsoft-windows-msmq-runtime-core_31bf3856ad364e35_6.1.7601.17514_none_b3877e30cfa2ad0c
Any help appreciated.
An obvious aside: if you don't have the Windows feature Microsoft Message Queue (MSMQ) Server installed, then you will get this error. Simply go to Programs and Features and then "Turn Windows features on or off".
I'm none the wiser but things are working now.
After hours on SO and Google, I ended up just checking that MSMQ was installed on both Servers by writing a quick console application with the code grabbed from here...
https://stackoverflow.com/a/16104212/192999
I ran the console app on both Server1 and Server2 and both came back with a result of True to IsMsmqInstalled.
I then ran my application and the "Unable to load DLL 'mqrt.dll'" error was no longer being raised.
I don't know if the call to NativeMethods.LoadLibrary("Mqrt.dll"); registered the DLL or something, but it certainly fixed my problem.
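For reference, the linked answer's check boils down to roughly this (my paraphrase, not the exact code from that post):

using System;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
    // Returns a non-null module handle if mqrt.dll can be found and loaded.
    [DllImport("kernel32", SetLastError = true, CharSet = CharSet.Unicode)]
    public static extern IntPtr LoadLibrary(string fileName);
}

internal class Program
{
    private static void Main()
    {
        bool isMsmqInstalled = NativeMethods.LoadLibrary("Mqrt.dll") != IntPtr.Zero;
        Console.WriteLine("IsMsmqInstalled: " + isMsmqInstalled);
    }
}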
I hope this helps someone in the future!
This can be caused by your service on SERVER2 starting up and finishing its initialization before MSMQ is done initializing itself. The easiest way to test this is to restart the service hosting the WCF MSMQ endpoint. If the WCF service is hosted in IIS, perhaps bouncing the app pool will do the same thing, but I do not know for sure -- I've never dealt with an IIS hosted MSMQ endpoint.
If restarting the service fixes your problem and your own service is a Windows service, you can then add MSMQ as a dependency to your own service so that it will delay its startup until MSMQ is ready. This answer on Server Fault describes how to do it. Incidentally, the service you want to depend on is called "Message Queuing".
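If you prefer to script it rather than follow the Server Fault steps by hand, something like this from an elevated prompt should set the dependency (MyWcfHostService is a placeholder for your own service name; note that depend= replaces the service's existing dependency list):

sc config MyWcfHostService depend= MSMQ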

WCF + NetTcp: high load make the channel stop working (calls/second rate)

First of all, sorry, I'm not fluent in English.
I'm trying to figure out why my WCF services stop working in an environment with a high calls-per-second rate. I'm not sure that just increasing the timeout will solve the issue.
We have 2 webservices:
The first is hosted in IIS 7.5 on Windows Server 2008 R2 Enterprise SP1 x64, with AppFabric (and WAS).
The second is hosted in a Windows service on Windows 2003 R2 SP1 x86.
Both web services have a minimal configuration: no authentication, no transactions, no special message handling. Check the binding:
<netTcpBinding>
<binding transactionFlow="false">
<security mode="None">
<message clientCredentialType="None" />
<transport clientCredentialType="None"></transport>
</security>
<reliableSession enabled="false"/>
</binding>
</netTcpBinding>
We are trying to use the Net.Tcp binding because of its reliability and speed.
FACT 1 - The Net.Tcp binding is the primary factor
When the load is high, the Net.Tcp channel stops working. That's it! Meanwhile, BasicHttp keeps working like a charm.
On the Windows service host, the net.tcp channel stays down for a few minutes (3-10 m) before coming back up BY ITSELF, without us changing anything (goblins are working hard).
On the AppFabric/IIS/WAS host, the net.tcp channel stays down and needs a manual restart.
The BasicHttpBinding configuration is similar to the net.tcp one: no special message handling, no security concerns, or anything like that.
FACT 2 - No useful logging
We couldn't find any tip or trick to figure out what's happening. I have tried memory dumps, event logs, and System.Diagnostics tracing, and found nothing relevant. The most relevant clue is an error from SMSvcHost 4.0.0.0:
An error occurred while dispatching a duplicated socket: this handle is now leaked in the process. ID: 2272 Source: System.ServiceModel.Activation.TcpWorkerProcess/62875109 Exception: System.TimeoutException: This request operation sent to http://schemas.microsoft.com/2005/12/ServiceModel/Addressing/Anonymous did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.
Server stack trace:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activation.WorkerProcess.EndDispatchSession(IAsyncResult result)
Process Name: SMSvcHost
Process ID: 1532
Do you have any tip or configuration trick to help me solve this issue?
What's the best configuration for high-load scenarios?
If you generated a service reference in Visual Studio, or with the svcutil tool, make sure you always call the Close or Abort methods of your proxies. I encountered a similar problem some days ago because I forgot to call these methods.
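A minimal sketch of that pattern (MyServiceClient stands in for your generated proxy type):

var client = new MyServiceClient();
try
{
    client.DoWork();
    client.Close();   // graceful shutdown when the channel is still healthy
}
catch (Exception)
{
    client.Abort();   // tear down a faulted channel; Close() would throw here
    throw;
}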
In case you are calling the Close() and Abort() methods correctly and still receive this error, consider the following scenario:
You run a Microsoft .NET Framework 3.0-based or .NET Framework 3.5-based Windows Communication Foundation (WCF) service.
The WCF service uses the Net.Tcp Port Sharing Service (Smsvchost.exe) and is hosted on a computer that is running Internet Information Services (IIS).
One of the following conditions is true:
The CPU usage is high on the computer that is running IIS.
A throttle occurs in a service model for the WCF service.
Multiple requests are sent to the WCF service at the same time.
In this scenario, the WCF service takes longer than one minute to process a request from a client application. Additionally, an error message that resembles the following event entry is logged in the event log:
Log Name: System
Source: SMSvcHost 3.0.0.0
Date:
Event ID: 8
Task Category: Sharing Service
Level: Error
Keywords: Classic
User: LOCAL SERVICE
Computer:
Description: An error occurred while dispatching a duplicated socket: this handle is now leaked in the process.
ID: 2620
Source: System.ServiceModel.Activation.TcpWorkerProcess
Exception:
System.TimeoutException: This request operation sent to did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.
Note: You must restart IIS to recover the WCF service from this issue.
Cause:
This issue occurs because the Smsvchost.exe process times out after one minute when it tries to transfer an incoming connection request to the W3wp.exe worker process. Additionally, this time-out is not configurable.
When the CPU has a heavy workload, or when many concurrent connection requests are incoming, the Smsvchost.exe process cannot transfer the incoming connection to the W3wp.exe worker process within one minute. Therefore, the Smsvchost.exe process times out and eventually stops responding. When this issue occurs, the Smsvchost.exe process cannot route later requests to the W3wp.exe worker process until IIS is restarted.
Solution:
Microsoft suggests applying hotfix 2504602, which is described in the corresponding Microsoft Knowledge Base (KB) article. This hotfix is available for WCF in the .NET Framework 3.0 SP2, the .NET Framework 3.5 SP1, and the .NET Framework 4.
In addition, Microsoft claims to have solved this issue in the .NET Framework 4.5, so you should upgrade to the latest version.
If you upgrade to the .NET Framework 4.5 and the problem persists, the workaround is to modify the SMSvcHost.exe.config file to increase the timeout, the number of pending accepts, and various other parameters.
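For illustration, the relevant section of SMSvcHost.exe.config looks roughly like the following; the attribute values here are placeholders to tune for your environment, not recommendations:

<configuration>
  <system.serviceModel.activation>
    <net.tcp listenBacklog="20"
             maxPendingConnections="200"
             maxPendingAccepts="5"
             receiveTimeout="00:02:00" />
  </system.serviceModel.activation>
</configuration>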

WCF Remote MSMQ - I can write to a remote queue, but cannot receive

Server: Windows Server 2008 R2
.NET Version: 4.5
I'm using WCF to connect two servers - app and queue. I want app to be able to send/receive messages from queue. For some reason, app can send messages, but CANNOT receive them.
The netMsmq binding looks like:
<binding name="JobExecutionerBinding" receiveErrorHandling="Move">
<security>
<transport msmqAuthenticationMode="None" msmqProtectionLevel="None" />
</security>
</binding>
And the service binding looks like:
Now, the client binding looks like:
<endpoint address="net.msmq://queue/private/jobs"
binding="netMsmqBinding"
bindingConfiguration="JobExecutionerBinding"
contract="JobExecution.Common.IJobExecutionService"
name="SimpleEmailService"
kind=""
endpointConfiguration=""/>
I changed a few names for security's sake.
So, the WCF client can send to the remote queue without a problem. It even properly queues the outgoing message and forwards it on later in the event that the remote queue server is down. But every time I start up the WCF service, I get this:
There was an error opening the queue. Ensure that MSMQ is installed and running, the queue exists and has proper authorization to be read from. The inner exception may contain additional information. ---> System.ServiceModel.MsmqException: An error occurred while opening the queue: The queue does not exist or you do not have sufficient permissions to perform the operation. (-1072824317, 0xc00e0003). The message cannot be sent or received from the queue. Ensure that MSMQ is installed and running. Also ensure that the queue is available to open with the required access mode and authorization.
at System.ServiceModel.Channels.MsmqQueue.OpenQueue()
at System.ServiceModel.Channels.MsmqQueue.GetHandle()
at System.ServiceModel.Channels.MsmqQueue.SupportsAccessMode(String formatName, Int32 accessType, MsmqException& msmqException)
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.MsmqVerifier.VerifyReceiver(MsmqReceiveParameters receiveParameters, Uri listenUri)
at System.ServiceModel.Channels.MsmqTransportBindingElement.BuildChannelListener[TChannel](BindingContext context)
at System.ServiceModel.Channels.Binding.BuildChannelListener[TChannel](Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, BindingParameterCollection parameters)
at System.ServiceModel.Description.DispatcherBuilder.MaybeCreateListener(Boolean actuallyCreate, Type[] supportedChannels, Binding binding, BindingParameterCollection parameters, Uri listenUriBaseAddress, String listenUriRelativeAddress, ListenUriMode listenUriMode, ServiceThrottle throttle, IChannelListener& result, Boolean supportContextSession)
at System.ServiceModel.Description.DispatcherBuilder.BuildChannelListener(StuffPerListenUriInfo stuff, ServiceHostBase serviceHost, Uri listenUri, ListenUriMode listenUriMode, Boolean supportContextSession, IChannelListener& result)
at System.ServiceModel.Description.DispatcherBuilder.InitializeServiceHost(ServiceDescription description, ServiceHostBase serviceHost)
at System.ServiceModel.ServiceHostBase.InitializeRuntime()
at ...
I've been all over StackOverflow and the internet for 8 hours. Here's what I've done:
Ensured that ANONYMOUS LOGIN, Everyone, Network, Network Service, and Local Service have full control
Stopped the remote MSMQ server and observed what the WCF service does; I get a different error, so I'm sure the WCF service is talking to the MSMQ server when it starts up
Disabled Windows Firewall on both boxes and opened all ports via EC2 security groups
Set AllowNonauthenticatedRpc and NewRemoteReadServerAllowNoneSecurityClient to 1 in the registry
Configured MS DTC on both servers (the queue is transactional, but I get the same error regardless of whether the queue is transactional or not)
Confirmed that the WCF server starts up fine if I use the local queue, and receives without a problem
Help!!! I can't scale my app without a remote queueing solution.
It's not clear from your post which tier cannot read and, more importantly, which queue.
However, reading remote queues transactionally is not supported:
Message Queuing supports sending transactional messages to remote queues, but does not support reading messages from a remote queue within a transaction. This means that reliable, exactly-once reception is not available from remote queues. See
Reading Messages from Remote Queues
I suspect that somewhere your system is still performing transactional remote reads even though you mentioned you disabled it.
From a best practice point of view, even if you got it to work, your design will not scale which is a shame as it is something you mentioned you wanted.
Remote reading is a high-overhead and therefore inefficient process. Including remote read operations in an application limits scaling.
You should always remote write, not remote read.
A better way is to insert a message broker or router service that acts as the central point for messaging. Your app and queue services (confusing names by the way) should merely read transactionally from their local queues.
i.e.
app should transactionally read its local queue (see the sketch after this list)
app should transactionally send to the remote broker
broker transactionally reads local queue
broker transactionally sends to remote queue
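A rough System.Messaging sketch of what "transactionally read its local queue" looks like (the queue path and message body type are illustrative):

using System;
using System.Messaging;

// Read one message transactionally from a *local* private queue.
using (var queue = new MessageQueue(@".\private$\jobs"))
using (var tx = new MessageQueueTransaction())
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    tx.Begin();
    Message message = queue.Receive(TimeSpan.FromSeconds(30), tx);
    // ... process message.Body here ...
    tx.Commit();   // the message is removed only when the transaction commits
}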
Similarly, if your queue tier wanted to reply, the reverse of the above process would occur.
Later, if you wish to improve performance, you can introduce a dynamic router that redirects a message to a different remote queue on another machine based on dynamic rule sets or environmental conditions such as stress levels.
Remote transactional reads are supported as of MSMQ 4.0 (Windows Server 2008). If you are facing this issue, be sure to check out https://msdn.microsoft.com/en-us/library/ms700128(v=vs.85).aspx

Why a WCF service hosted inside a Windows service dies after some time

I have hosted a WCF service inside a Windows service using C#. It works fine and I was able to communicate with the WCF service from a client application.
But the issue is that if I leave the client idle for 10 minutes or so and then try to connect again, I get the following error:
Server stack trace:
at System.ServiceModel.Channels.CommunicationObject.ThrowIfDisposedOrNotOpen()
at System.ServiceModel.Channels.ServiceChannel.Call(String action,
Boolean oneway, ProxyOperationRuntime operation, Object[] ins,
Object[] outs, TimeSpan timeout)
It is not the Windows service that is down, it is your client proxy.
You say that you leave the client idle. You should not do this. You should close the client after you have made your request. Then open it when needed.
This happens when your service binding's ReceiveTimeout setting is left at its default value (10 minutes).
To set this to "forever", you can set the following on the binding in your config file:
receiveTimeout="Infinite"
or by code:
binding.ReceiveTimeout = TimeSpan.MaxValue;
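For reference, in config the attribute sits on the binding element itself, roughly like this (shown for netTcpBinding with an illustrative binding name; the same attribute exists on other bindings):

<netTcpBinding>
  <binding name="longIdleBinding" receiveTimeout="Infinite" />
</netTcpBinding>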

A SelfHosted WCF Service over Basic HTTP Binding doesn't support more than 1000 concurrent requests

I have a self-hosted WCF service over BasicHttpBinding, consumed by an ASMX client. I'm simulating a concurrent user load of 1200 users. The service method takes a string parameter and returns a string. The data exchanged is less than 10KB. The processing time for a request is fixed at 2 seconds by a Thread.Sleep(2000) statement. Nothing additional. I have removed all the DB hits / business logic.
The same piece of code runs fine for 1000 concurrent users. I get the following error when I bump up the number to 1200 users.
System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)
--- End of inner exception stack trace ---
at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
at WCF.Throttling.Client.Service.Function2(String param)
This exception is often reported for DataContract mismatches and large data exchanges, but rarely for a load test. I have browsed enough and have tried most of the suggested options, which include:
Enabled trace and message logging on the server side, but no errors were logged.
To overcome port exhaustion, MaxUserPort is set to 65535 and TcpTimedWaitDelay to 30 seconds.
MaxConcurrentCalls is set to 600, and MaxConcurrentInstances is set to 1200 (see the throttling sketch after this list).
The Open, Close, Send and Receive timeouts are set to 10 minutes.
The HttpWebRequest KeepAlive is set to false.
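For reference, a sketch of how those throttle values can be applied to a self-hosted ServiceHost in code (ThrottlingTestService and the base address are placeholders; the numbers just mirror the ones above):

using System;
using System.ServiceModel;
using System.ServiceModel.Description;

var host = new ServiceHost(typeof(ThrottlingTestService), new Uri("http://localhost:8080/throttle"));

// Reuse an existing throttling behavior if one was already added from config.
var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttle == null)
{
    throttle = new ServiceThrottlingBehavior();
    host.Description.Behaviors.Add(throttle);
}
throttle.MaxConcurrentCalls = 600;
throttle.MaxConcurrentInstances = 1200;
throttle.MaxConcurrentSessions = 1200;

host.Open();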
I have not been able to nail down the issue for the past two days.
Any help would be appreciated.
Thank you.
If there are no errors in the service-side WCF logs, I suspect you are hitting some kind of limit in the HTTP.SYS driver layer, leading to requests being turned away before the service application sees them. I think the default limit in the request queue for a particular application may be 1000.
I'm no expert on HTTP.SYS but you may get some insight by running:
netsh http show servicestate
I have seen similar problems on different servers, depending on their CPUs and RAM. You did not mention the server type or how it was upgraded (XP Pro or Server 2003 upgraded to Server 2008), etc. The way I resolved the issue was by checking x:\Windows\Microsoft.NET\Framework[version]\config\machine.config. Apparently selecting "unlimited" connections through IIS does not actually mean unlimited connections. In my case, requests started erroring out once 11 requests arrived at the exact same millisecond.
The issue was related to the number of connections coming from the same source. The performance benchmark tool resided on the same PC, which therefore has the same IP. machine.config contains a constraint on the number of connections from the same source.
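If that is what you are hitting, the per-source limit can be raised on the client side, either through system.net/connectionManagement in machine.config/app.config or in code; a sketch of the code form (1200 simply mirrors the simulated load):

using System.Net;

// Raise the default limit of concurrent outbound connections per endpoint
// for this (client/benchmark) process.
ServicePointManager.DefaultConnectionLimit = 1200;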