NServiceBus Subscriber failing on server - nservicebus

I have a rather simple Pub/Sub setup which works fine on our developer machines, but when I deploy to our test servers it throws this error for all messages:
System.NullReferenceException: Object reference not set to an instance of an object.
at NServiceBus.Unicast.UnicastBus.HandleTransportMessage(IBuilder childBuilder, TransportMessage msg) in c:\BuildAgent\work\nsb.master_6\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 1328
at NServiceBus.Unicast.UnicastBus.TransportMessageReceived(Object sender, TransportMessageReceivedEventArgs e) in c:\BuildAgent\work\nsb.master_6\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 1247
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.OnTransportMessageReceived(TransportMessage msg) in c:\BuildAgent\work\nsb.master_6\src\impl\unicast\transport\NServiceBus.Unicast.Transport.Transactional\TransactionalTransport.cs:line 480
We already have other SendOnly endpoints, distributors, and workers running on the same servers, so MSMQ etc. should be installed correctly. This is, however, the first time we are using Pub/Sub on these servers.
If I use the exact same binaries and config on a developer machine it runs smoothly, but not on the servers, which are Windows Server 2008 R2 with PowerShell v3.
We are using a fluent configuration for the subscriber:
return NServiceBus.Configure.With()
    .DefineEndpointName(queuePrefix)
    .Log4Net(_serviceBusLog.Build())
    .StructureMapBuilder()
    .JsonSerializer()
    .License(ConfigTable.GetConfigString(ConfigTableKeys.NServiceBus, "License"))
    .MsmqTransport()
        .IsTransactional(true)
    .RunTimeoutManagerWithInMemoryPersistence()
    .EnablePerformanceCounters()
    .UnicastBus()
    .CreateBus()
    .Start(() => NServiceBus.Configure.Instance.ForInstallationOn<NServiceBus.Installation.Environments.Windows>().Install());
We also have our own UnicastBus config which scans for message handlers (their message types) and then automatically creates the endpoint mappings. This was my first concern, so I disabled it and used the app.config way of setting up endpoints (see the sketch below), but the error still occurs.
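For reference, the app.config route looks roughly like this; the queue and assembly names here are placeholders, not our real ones:

<configSections>
  <section name="UnicastBusConfig" type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
</configSections>
<UnicastBusConfig>
  <MessageEndpointMappings>
    <!-- Route subscriptions for events in this assembly to the publisher's input queue -->
    <add Messages="MyCompany.Events" Endpoint="PublisherEndpointQueue" />
  </MessageEndpointMappings>
</UnicastBusConfig>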
Note that the error occurs for every single message.
Note that we are running version 3.3.5 of NSB.
I'm still traversing the server settings, as I believe there must be some difference that makes it tick, but I have not found it yet.
Does anyone have any recommendations on what to look for?
Kind regards

It appears that I have found the error.
After testing a raw, simple console Pub/Sub on the server, I added a try/catch in the handler and caught... my own exception...
I'm embarrassed.
But it appears that the exception is not forwarded correctly to the log by NSB, and I was therefore completely thrown off from the real problem.
I do not know if this is something that is fixed in later versions of NSB, but I hope so.
Until then I'm using my own try/catch logic to add a custom log entry, roughly as sketched below.
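A minimal sketch of that workaround; the handler, message type, and logger names are illustrative, not from the actual project:

public class MyEventHandler : NServiceBus.IHandleMessages<MyEvent>
{
    private static readonly log4net.ILog Log = log4net.LogManager.GetLogger(typeof(MyEventHandler));

    public void Handle(MyEvent message)
    {
        try
        {
            // actual handling logic here
        }
        catch (System.Exception ex)
        {
            // Log explicitly so the real failure isn't swallowed, then rethrow so
            // NServiceBus still retries and eventually moves the message to the error queue.
            Log.Error("Failed to handle MyEvent", ex);
            throw;
        }
    }
}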
Kind regards.

Related

MassTransit - Socket exception with AmazonMQ when starting bus

I'm trying to get a basic PoC app running with MassTransit using our Amazon MQ instance, and running into the following problem when I call StartAsync on IBusControl:
MassTransit.ActiveMqTransport.ActiveMqConnectException: Connection exception: (user)#(host)
---> Apache.NMS.NMSConnectionException: Error connecting to (host) ---> System.Net.Sockets.SocketException (0xFFFFFFFE): Unknown error (0xfffffffe)
at Apache.NMS.ActiveMQ.Transport.Tcp.TcpTransportFactory.DoConnect(String host, Int32 port, String localAddress, Int32 localPort)
Note: in the exception above, I've replaced the user and host values to remove sensitive information. We know that the credentials we are using are in fact correct, since we have integration tests for NMS and ActiveMQ that use the same credentials. But when trying to connect using MassTransit, we get the above error.
I've tried a number of different approaches but they all produce the same result. Here's some example code to give a general idea of how we're trying to connect:
var busControl = Bus.Factory.CreateUsingActiveMq(configurator =>
{
    configurator.Host(host, activeMqHostConfigurator =>
    {
        activeMqHostConfigurator.Username(activeMqConfiguration.UserName);
        activeMqHostConfigurator.Password(activeMqConfiguration.Password);
    });
});
await busControl.StartAsync(new CancellationTokenSource(TimeSpan.FromSeconds(10)).Token);
The call to StartAsync is what throws the exception. I doubt this is an issue with MassTransit; it's more likely something that I'm missing, but I cannot see what's wrong, and I've had my team review it as well.
As I mentioned in my comment, this ended up not being related to MassTransit. It was due to the host being inactive.
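Not a MassTransit-specific fix, but since the root cause was an unreachable broker, a quick TCP reachability probe can separate "broker is down/unreachable" from a genuine configuration problem. The default port below assumes the usual AmazonMQ OpenWire-over-TLS listener (61617); adjust it to whatever your broker exposes:

using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class BrokerProbe
{
    // Returns true if a plain TCP connection to the broker endpoint can be established.
    public static async Task<bool> CanReachAsync(string host, int port = 61617, int timeoutMs = 5000)
    {
        using (var client = new TcpClient())
        {
            var connectTask = client.ConnectAsync(host, port);
            var finished = await Task.WhenAny(connectTask, Task.Delay(timeoutMs));
            return finished == connectTask && client.Connected;
        }
    }
}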

Topshelf Windows Service times out Error 7000 7009

I have a Windows service written in VB.NET, using Topshelf as the service host.
Once in a while the service doesn't start. In the event log, the SCM writes errors 7000 and 7009 (the service did not respond in a timely fashion). I know this is a common issue, but I (think I) have tried everything, with no result.
The service only relies on WMI and has no time-consuming operations.
I read this question (Error 1053: the service did not respond to the start or control request in a timely fashion), but none of the answers worked for me.
I tried:
Set Topshelf's start timeout.
Requested additional time in the first line of the OnStart method.
Set a periodic timer which requests additional time from the SCM.
Removed Topshelf and built the service with the Visual Studio service template.
Moved the initialization code and the OnStart code to a new thread so OnStart returns immediately.
Built in RELEASE mode.
Set GeneratePublisherEvidence = false in the app.config file (per application).
Unchecked "Check for publisher's certificate revocation" in the Internet settings (per machine).
Deleted all alternate data streams (in case some DLL was marked as coming from the web and blocked).
Removed any debug code.
Increased Windows' general service timeout to 120000 ms.
Also:
The service doesn't try to communicate with the user's desktop in any way.
UAC is disabled.
The service runs under the LOCAL SYSTEM account.
I believe that the code of the service itself is not the problem because:
It has been in production for over two years.
Usually the service starts fine.
There is no exception logged in the event log.
The "on error" recovery options for the service don't get invoked (since the service doesn't actually fail; it just doesn't respond to the SCM in time).
I've commented out almost everything in it pursuing this error! ;-)
Any help is welcome, since I'm completely out of ideas and I've been struggling with this for over 15 days...
For me, the 7009 error was produced by my .NET Core app because I was using this construct:
var builder = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json");
and the appsettings.json file obviously couldn't be found in C:\WINDOWS\system32. Changing it to Path.Combine(AppContext.BaseDirectory, "appsettings.json") solved the issue.
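In other words, the working version looks roughly like this:

// Resolve appsettings.json relative to the executable rather than the current
// working directory, which is C:\WINDOWS\system32 when the SCM starts the service.
var builder = new ConfigurationBuilder()
    .AddJsonFile(Path.Combine(AppContext.BaseDirectory, "appsettings.json"));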
More general help: for Topshelf you can add custom exception handling, which is where I finally found some meaningful error info, unlike the Event Viewer:
HostFactory.Run(x =>
{
    ...
    x.OnException(e =>
    {
        // Write the exception details somewhere the Event Viewer won't hide them.
        using (var fs = new StreamWriter(@"C:\log.txt"))
        {
            fs.WriteLine(e.ToString());
        }
    });
});
I've hit the 7000 and 7009 issue failing straight away (even though the error message says "A timeout was reached (30000 milliseconds)") because of a mismatch between the name Topshelf is configured with and the name the service gets installed under.
The bottom line: what you pass to HostConfigurator.SetServiceName(name) needs to match exactly the SERVICE_NAME of the Windows service that gets installed.
If they don't match, it'll fail straight away and you get the two event log messages.
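For example (the service class and names here are placeholders; the key line is SetServiceName matching the installed service):

HostFactory.Run(x =>
{
    x.Service<MyService>(s =>
    {
        s.ConstructUsing(name => new MyService());
        s.WhenStarted(svc => svc.Start());
        s.WhenStopped(svc => svc.Stop());
    });

    // Must match the SERVICE_NAME the Windows service is actually registered under,
    // otherwise startup fails immediately with events 7000/7009.
    x.SetServiceName("MyServiceName");
    x.SetDisplayName("My Service");
});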
I had this start happening to a service after the Windows Creators Update was installed. Basically it made the whole computer slower, which is what I think triggered the problem. Even one of the Windows services had a timeout issue.
What I learned online is that the constructor for the service needs to be fast, but OnStart has more leeway with the SCM. My service had a C# wrapper that called InitializeComponent() in its constructor. I moved that call to OnStart and the problem went away.
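Roughly the change, assuming a standard ServiceBase-derived wrapper (names are illustrative):

public partial class MyService : System.ServiceProcess.ServiceBase
{
    public MyService()
    {
        // Keep the constructor as lean as possible; the SCM is strict about it.
    }

    protected override void OnStart(string[] args)
    {
        // Moved out of the constructor: OnStart gets more leeway from the SCM.
        InitializeComponent();
        // ... remaining startup work ...
    }
}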

Configuring DataProtectionSecurityStateEncoder, to resolve CryptographicException in Web Farm

We have an Authenticated WCF service running in a web farm that is intermittently throwing this error:
MessageSecurityException: The SecurityContextSecurityToken has an invalid Cookie. The following error occurred when processing the Cookie: 'Error decoding the Cookie element of SecurityContextSecurityToken.'. ---> CryptographicException: The DataProtectionSecurityStateEncoder is unable to decode the byte array. Ensure that a 'UserProfile' is loaded, if this is a 'web farm scenario' ensure all servers are running as the same user with the roaming profiles or provide a custom SecurityStateEncoder'. ---> CryptographicException: Key not valid for use in specified state.
I've spent a fair bit of time digging into the above, and I believe I understand the error... however, I can't find any information on how to configure the DataProtectionSecurityStateEncoder.
I would like to configure the encoder to use the local computer settings (we've synced machine keys, etc.), but I'm completely stuck. Can anyone point me in the right direction?
An update with a possible solution: I believe we may be able to resolve this with:
protected void Application_BeginRequest(object sender, EventArgs e)
{
    OperationContext.Current.Host.Credentials.SecureConversationAuthentication.SecurityStateEncoder =
        new DataProtectionSecurityStateEncoder(false);
}
We are still in the process of testing this; however, the issue has been parked while we resolve some high-priority work that has come up.
Noting it here in case anyone else has a similar issue and can't find a solution (and who, like us, has a web farm running outside of a domain).
I will come back and update this answer when we return to the work.
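For anyone reading later: another option we are considering is setting the encoder once when the host is created instead of per request, e.g. via a custom ServiceHostFactory referenced from the .svc file. A sketch only, not something we have verified in the farm yet:

using System;
using System.ServiceModel;
using System.ServiceModel.Activation;
using System.ServiceModel.Security;

public class MachineScopedEncoderHostFactory : ServiceHostFactory
{
    protected override ServiceHost CreateServiceHost(Type serviceType, Uri[] baseAddresses)
    {
        var host = base.CreateServiceHost(serviceType, baseAddresses);

        // false = protect with machine-scope DPAPI instead of the user profile
        // (the same setting the Application_BeginRequest approach above applies).
        host.Credentials.SecureConversationAuthentication.SecurityStateEncoder =
            new DataProtectionSecurityStateEncoder(false);

        return host;
    }
}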

WCF Net.Msmq Service occasionally faults

I have a self-hosted WCF service (it runs inside a Windows service). This service listens for messages on an MSMQ queue. The service is PerCall and transactional, running on Windows Server 2008 R2, .NET 4.0, MSMQ 5.0.
Once every couple of weeks the service will stop processing messages. The Windows service remains running, but the WCF ServiceHost itself stops. The ServiceHost faults with the following exception:
Timestamp: 3/21/2015 5:37:06 PM
Message: HandlingInstanceID: a26ffd8b-d3b4-4b89-9055-4c376d586268
An exception of type 'System.ServiceModel.MsmqException' occurred and was caught.
03/21/2015 13:37:06
Type : System.ServiceModel.MsmqException, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Message : An error occurred while receiving a message from the queue: The transaction's operation sequence is incorrect. (-1072824239, 0xc00e0051). Ensure that MSMQ is installed and running. Make sure the queue is available to receive from.
Source : System.ServiceModel
Help link :
ErrorCode : -1072824239
Data : System.Collections.ListDictionaryInternal
TargetSite : Boolean TryReceive(System.TimeSpan, System.ServiceModel.Channels.Message ByRef)
dynatrace_invocationCount : 0
Stack Trace :
at System.ServiceModel.Channels.MsmqInputChannelBase.TryReceive(TimeSpan timeout, Message& message)
at System.ServiceModel.Dispatcher.InputChannelBinder.TryReceive(TimeSpan timeout, RequestContext& requestContext)
at System.ServiceModel.Dispatcher.ErrorHandlingReceiver.TryReceive(TimeSpan timeout, RequestContext& requestContext)
Searching for the particular exception ("The transaction's operation sequence is incorrect") doesn't yield a lot of info, and most suggestions for how to remedy a faulted service are to restart the ServiceHost within the Faulted event.
I can do that, but I'm hoping there is a known, fixable cause for this exception and/or a cleaner way to handle it.
We have faced this issue in our product and we opened a ticket with Microsoft; in the end they admitted it's a bug in the .NET Framework and said it would be fixed soon.
The issue was reported on Windows Server 2008 and 2012, but never on 2016 or Windows 10.
So we did two things: we recommended all customers upgrade to Windows Server 2016, and we added code to handle the Faulted event on the service host and restart the service. (You can simulate the same error by restarting the MSMQ service while the WCF service host is open.)
The code to restore the service is below.
First, add an event handler for your host to handle the Faulted event:
SH.Faulted += new EventHandler(SH_Faulted);
//SH is the ServiceHost
Then, inside the event handler:
private static void SH_Faulted(object sender, EventArgs e)
{
    if (SH.State != CommunicationState.Opened)
    {
        int intSleep = 15 * 1000;
        //Abort the host
        SH.Abort();
        //Remove the event handler
        SH.Faulted -= new EventHandler(SH_Faulted);
        //Sleep to make sure that MSMQ has enough time to recover; better to make this configurable.
        System.Threading.Thread.Sleep(intSleep);
        try
        {
            ReConnectCounter++;
            LogEvent(string.Format("Service '{0}' faulted, restarting service, count # {1}", serviceName, ReConnectCounter));
            //Restart the service again here
        }
        catch (Exception ex)
        {
            //Failed... you can retry if you like
        }
    }
}
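The "restart the service again here" step is left out above; a sketch of what it could look like (the service type name is a placeholder):

private static void RestartServiceHost()
{
    // Recreate the host, re-attach the Faulted handler, and open it again.
    SH = new ServiceHost(typeof(MyQueuedService));
    SH.Faulted += new EventHandler(SH_Faulted);
    SH.Open();
}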
Eventually the error will happen again, but your service will continue working fine until Microsoft solves the issue or you upgrade to Windows Server 2016.
Update:
After further investigation, and with help from Microsoft, we found the root cause of the issue, which is the ordering of the timeouts below:
Machine-level DTC timeout (20 minutes) >= DefaultTimeout (15 minutes) >= WCF service transactionTimeout > receiveTimeout
So adding the below should fix this issue:
<system.transactions>
  <defaultSettings timeout="00:05:00"/>
</system.transactions>
More detailed article:
https://blogs.msdn.microsoft.com/asiatech/2013/02/18/wcfmsmq-intermittent-mq_error_transaction_sequence-error/
We have the same problem in our production environment. Unfortunately, there is an issue open with Microsoft about it, but it has been marked "Closed as Deferred" since 2013. The following workaround is mentioned by EasySR20:
If you set the service's receiveTimeout a few seconds less than the service's transactionTimeout, this will prevent the exception from happening and taking down the service host. These are both settings that can be set in the server's app.config file.
I haven't confirmed this resolves the issue, but it's one option.
We have implemented the service fault restart option instead.
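For anyone wanting to try that workaround, the two settings live in the server's app.config roughly as below; the values are only examples, the point being that receiveTimeout sits a little below transactionTimeout:

<system.serviceModel>
  <bindings>
    <netMsmqBinding>
      <!-- receiveTimeout a few seconds below the transaction timeout -->
      <binding name="TransactionalMsmqBinding" receiveTimeout="00:09:30" />
    </netMsmqBinding>
  </bindings>
  <behaviors>
    <serviceBehaviors>
      <behavior>
        <serviceTimeouts transactionTimeout="00:10:00" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>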

NServiceBus message handler not going to 'error' queue on exception

I have a sample NServiceBus application to test the waters. All is going well; sending and handling are working correctly.
I have deliberately thrown an exception within a certain message handler to see what happens, but nothing does. The exception is logged correctly to the console, yet the message is pulled off the queue and NOT placed in the error queue as I'd expect. The five retries didn't occur either. Is this correct behaviour?
Also, the queue was created correctly at startup when first specified.
The config and bootstrap code for the server (where the handler resides) are below.
config:
<MsmqTransportConfig
InputQueue="SiteServer1"
NumberOfWorkerThreads="1"
MaxRetries="5"
ErrorQueue="SiteServer1Errors"
/>
program.cs:
var bus = NServiceBus.Configure.With()
    .Log4Net()
    .CastleWindsorBuilder(container)
    .XmlSerializer()
    .MsmqTransport()
    .UnicastBus()
    .LoadMessageHandlers()
    .CreateBus()
    .Start();
Am I missing anything here?
I modified the bootstrapper code to include
.IsTransactional(true)
on the bus config, and now it is working! It seems that non-transactional messages are disposable, which makes sense!
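For completeness, the bootstrap from the question with that one change applied looks roughly like this:

var bus = NServiceBus.Configure.With()
    .Log4Net()
    .CastleWindsorBuilder(container)
    .XmlSerializer()
    .MsmqTransport()
        .IsTransactional(true) // without this, a failed handler simply discards the message
    .UnicastBus()
    .LoadMessageHandlers()
    .CreateBus()
    .Start();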
Are you running Windows Server 2008? If so, you will find an event log in the Event Viewer under Applications and Services Logs -> Microsoft -> Windows -> MSMQ -> End2End. This will record every action taken by the MSMQ subsystem on your machine.
I am guessing that NSB has tried to send the message to the error queue. However, what is really happening is that the MSMQ subsystem on your machine has consumed the message but has not been able to deliver it to the error queue for some reason.
I would look in the MSMQ log for an idea of what is going on.