NServiceBus Unicast debugging

I have several websites that push messages to a remote MSMQ (so they can then be published to a list of subscribers). Because the websites don't consume messages -- it's just a send-and-forget set up -- I only have the following config in my web.config for the sites:
<UnicastBusConfig>
  <MessageEndpointMappings>
    <add Messages="MyNamespace.Messages" Endpoint="PublishingInputQueue#remoteserver" />
  </MessageEndpointMappings>
</UnicastBusConfig>
I'm trying to chase down a bug where, intermittently, messages appear to be sent successfully via Bus.Send(...) (no exceptions are thrown), but they never make it to the destination queue.
Is there any other configuration I can add to help diagnose this issue? Is there a way to set up an error queue for messages that fail with UnicastBus? Is it as simple as adding an error queue to the MsmqTransportConfig (which I currently don't have in my config)?
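For reference, here is a sketch of what that section could look like in NServiceBus 2.x (the queue names are placeholders). One caveat: as far as I know, the error queue configured there receives messages whose processing failed on a receiving endpoint, so it may not capture sends that silently go missing.
<configSections>
  <section name="MsmqTransportConfig" type="NServiceBus.Config.MsmqTransportConfig, NServiceBus.Core" />
</configSections>
<MsmqTransportConfig InputQueue="WebsiteInputQueue" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />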
Update
A little more context...
As mentioned in the comments, I'm seeing a bunch of messages apparently piled up in the outgoing message queue of the web server(s). (I can't view the actual messages, but can see a count > 0 when clicking on "Outgoing Queues" in Computer Management.)
The publisher that created the destination queue (on remoteserver) has .IsTransactional(true) in its NServiceBus startup code.
The clients that push to that queue via UnicastBus have .IsTransactional(false) in their NServiceBus startup code (a sketch of that startup code follows below).
This doesn't seem to cause an issue for other client websites that push to this destination queue. HOWEVER:
The webservers that don't seem to fail are Windows Server 2008
The webservers that do fail are Windows Server 2003
No clue if that makes a difference or not
The domain user the application pools run under has full permissions to the destination queue.
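For reference, the startup code on the web clients is along these lines; this is a sketch, and the WithWeb/DefaultBuilder/XmlSerializer calls are assumptions rather than a copy of our actual code:
var bus = NServiceBus.Configure.WithWeb()
    .DefaultBuilder()
    .XmlSerializer()
    .MsmqTransport()
        .IsTransactional(false) // the publisher endpoint uses true here
        .PurgeOnStartup(false)
    .UnicastBus()
        .ImpersonateSender(false)
    .CreateBus()
    .Start();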

For completeness' sake:
It turned out that the Active Directory objects used for the MSMQ service on the servers were out of sync. I believe it occurred when our DevOps team created a single server instance, set it all up, and then cloned it for the other server instances in the cluster. This initial server worked correctly (explaining the intermittent success), but the cloned instances had an out-of-sync AD object for MSMQ. Deleting the objects to allow for recreation fixed the issue.

Related

WCF - MSMQ endpoint not found in new environment

The setup
I have a WCF service hosted in IIS/AppFabric on Windows Server 2012 R2.
The service is bound to a local transactional MSMQ queue via netMsmqBinding, and my operations are decorated with TransactionScopeRequired = true.
The service operations receive calls from a BizTalk server, handle them, and send responses back to a remote queue (on the same BizTalk server), also via netMsmqBinding.
<endpoint name="Outbound"
          address="net.msmq://int01test.mydomain.com/private/queue.name"
          binding="netMsmqBinding"
          bindingConfiguration="QueueBindingConfigurationOutbound"
          contract="My.Outbound.Contract" />
<netMsmqBinding>
  <binding name="QueueBindingConfigurationOutbound">
    <security>
      <transport msmqAuthenticationMode="WindowsDomain" msmqProtectionLevel="Sign" />
    </security>
  </binding>
</netMsmqBinding>
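For context, the service operations behind these bindings are decorated roughly as follows; this is a sketch, and the contract, method, and message type names are hypothetical:
using System.ServiceModel;

[ServiceContract]
public interface IInbound
{
    // Queued (netMsmqBinding) operations must be one-way.
    [OperationContract(IsOneWay = true)]
    void Submit(InboundMessage message);
}

public class InboundService : IInbound
{
    // TransactionScopeRequired = true enlists the MSMQ receive (and any
    // outbound send) in one transaction, so a failure rolls the message
    // back onto the queue.
    [OperationBehavior(TransactionScopeRequired = true)]
    public void Submit(InboundMessage message)
    {
        // handle the call, then send the response via the Outbound endpoint
    }
}

public class InboundMessage { /* payload omitted */ }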
In the testing environment this works as intended.
Physical setup in testing environment:
Server int01test.mydomain.com hosts a BizTalk server and my inbound queue. This runs under service account mydomain\inttestuser.
Server app01test.mydomain.com hosts my application (IIS/AppFabric), my database (SQL server) and my outbound queue. This runs under service account mydomain\apptestuser.
The problem
When this solution is promoted to the acceptance testing environment, calls are still handled, but the responses are blocked with this error message:
System.ServiceModel.EndpointNotFoundException: An error occurred while
opening the queue:Unrecognized error -1072824317 (0xc00e0003). The
message cannot be sent or received from the queue. Ensure that MSMQ is
installed and running. Also ensure that the queue is available to open
with the required access mode and authorization. --->
System.ServiceModel.MsmqException: An error occurred while opening the
queue:Unrecognized error -1072824317 (0xc00e0003). The message cannot
be sent or received from the queue. Ensure that MSMQ is installed and
running. Also ensure that the queue is available to open with the
required access mode and authorization.
Differences
In the testing environment, my service and my database are running on a single server instance. (The BizTalk server and its queue, the target of my outbound messages, are on another server, though.)
In the acceptance testing environment, my solution is deployed on two load balanced servers and the database is on a separate cluster.
There are also more strict external firewall rules to mimic the production environment.
Even the BizTalk server is clustered, though we communicate machine-to-machine rather than cluster-to-cluster right now.
So setup in QA Environment is:
Server int01qa.mydomain.com (clustered with int02qa.mydomain.com) hosts a BizTalk server and my inbound queue. This runs under service account mydomain\intqauser.
Server app01qa.mydomain.com (clustered with app02qa.mydomain.com) hosts my application (IIS/AppFabric) and my outbound queue. This runs under service account mydomain\appqauser.
Server db01qa.mydomain.com hosts my database.
What we've already tried
We have disabled authentication on the remote queue.
We have granted full control to the account my service runs under, as well as to "Everyone".
We have successfully sent MSMQ messages manually between the two servers (a sketch of such a test follows this list).
I have configured my service to send responses to a local private queue instead: same error.
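The manual test was a plain System.Messaging send straight to the remote queue, bypassing WCF; roughly like this (the queue path mirrors the inbound queue and should be adjusted to the environment):
using System;
using System.Messaging;

class ManualSend
{
    static void Main()
    {
        var queue = new MessageQueue(@"FormatName:DIRECT=OS:int01qa.mydomain.com\private$\queue.name");
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send("ping", tx); // transactional send to a transactional queue
            tx.Commit();
        }
    }
}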
The problem turned out to be that MSMQ couldn't find a certificate for the app pool user. That is, the
0xc00e0003, MQ_ERROR_QUEUE_NOT_FOUND
was really caused by a
0xC00E002F, MQ_ERROR_NO_INTERNAL_USER_CERT
Changing security settings to
<transport msmqAuthenticationMode="None" msmqProtectionLevel="None" />
enabled messages to be sent.
The real solution, of course, is not to disable security but to ensure that the app pool user's certificate is installed in MSMQ.
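One way to (re)create that certificate programmatically is the native MQRegisterCertificate API, which has no managed wrapper in System.Messaging. A sketch, assuming the process runs as the app pool account whose internal certificate is missing:
using System;
using System.Runtime.InteropServices;

static class MsmqCertificate
{
    private const int MQCERT_REGISTER_ALWAYS = 1;

    [DllImport("mqrt.dll")]
    private static extern int MQRegisterCertificate(int dwFlags, IntPtr lpCertBuffer, int dwCertBufferLength);

    // Passing a null buffer asks MSMQ to create and register a fresh
    // internal certificate for the calling user.
    public static void RegisterInternal()
    {
        int hr = MQRegisterCertificate(MQCERT_REGISTER_ALWAYS, IntPtr.Zero, 0);
        if (hr != 0)
            Marshal.ThrowExceptionForHR(hr);
    }
}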
We came across this issue and didn't want to disable authentication. We tried a number of different approaches; we think it was something to do with the user certificate not existing.
We went to the app pool of the client application (which calls the WCF endpoint via MSMQ) and changed the Load User Profile property to True. The call then worked. As an aside, setting it back to False continued to work, presumably because the certificate issue had already been sorted out.

WebLogic migratable JMS consumer doesn't follow the service to the new managed server if the old server remains running

I have a JMS service targeted at a migratable target (using an Auto-Migrate Exactly-Once policy) in a cluster of two managed servers. At any point in time the service is hosted on one of them, and the consumer (which is targeted at the cluster) is supposed to receive messages seamlessly no matter where the service is hosted.
When I manually switch the host of the migratable target (by clicking migrate) without turning the hosting managed server off, the consumer fails to receive messages sent to the queues, unless I turn off the previously hosting managed server, forcing the consumer to the new host.
I can rule out sender problems; I can see the messages in the queue right after they are sent.
I'll be grateful if anyone can advise on how to configure either the consumer or the migratable service to work seamlessly when migration happens.
I think that may just be a misunderstanding of how migration works. The docs describe Auto-Migrate Exactly-Once as follows:
indicates that if at least one Managed Server in the candidate list
is running, then the JMS service will be active somewhere in the
cluster if servers should fail or are shut down (either gracefully or
forcibly). For example, a migratable target hosting a path service
should use this option so if its hosting server fails or is shut down,
the path service will automatically migrate to another server and so
will always be active in the cluster. Note that this value can lead to
target grouping. For example, if you have five exactly-once migratable
targets and only one server member is started, then all five
migratable targets will be activated on that server member.
The docs also state:
Manual Service Migration—the manual migration of pinned JTA and
JMS-related services (for example, JMS server, SAF agent, path
service, and custom store) after the host server instance fails
Your server/service has neither failed nor shut down; you are forcing it to migrate with a healthy host still running, so it has not met the criteria for migration.
See more here as well.
I have some experience that sounds reminiscent of what you're looking at. There was some WLS-specific capability around recognizing reconfiguration in JMS destinations as part of their clustered server design.
In one case I had to call a WLS-specific method, weblogic.jms.extensions.WLSession.setExceptionListener(), on their implementation of the JMS Session interface. It is analogous to the standard JMS Connection.setExceptionListener().
With this WLS-specific capability, the WLSession.setExceptionListener() callback would occur at a point where the consuming client should tear down and re-establish the connection / session / consumer in reaction to a reconfiguration (migration) that had happened.

Using a web service to drop message onto an ActiveMQ Queue fails on failover

I have two ActiveMQ (5.6.0) brokers. They use a shared KahaDB store, so only one can be running at once.
I have an ASP.NET web service that puts a message on a queue. Locally, if I start and stop the brokers, the web service fails over correctly.
When I test with the brokers on separate machines it sometimes works, but often I get "SocketException: Connection reset" errors and the message is lost.
The connection string I am using is below. Note that I am aware NMS does not understand the priorityBackup option, but I have left it there for the future.
failover:(tcp://MACHINE1:61616,tcp://MACHINE2:62616)?transport.initialReconnectDelay=1000&transport.timeout=10000&randomize=false&priorityBackup=true
How can I make my failover between brokers foolproof?
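For reference, the web service produces messages over NMS roughly like this; a sketch, with the queue name and message body as placeholders:
using System;
using Apache.NMS;
using Apache.NMS.ActiveMQ;

class Producer
{
    static void Main()
    {
        var factory = new ConnectionFactory(
            "failover:(tcp://MACHINE1:61616,tcp://MACHINE2:62616)?transport.initialReconnectDelay=1000&transport.timeout=10000&randomize=false");
        using (IConnection connection = factory.CreateConnection())
        using (ISession session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge))
        {
            connection.Start();
            IDestination queue = session.GetQueue("my.queue");
            using (IMessageProducer producer = session.CreateProducer(queue))
            {
                producer.DeliveryMode = MsgDeliveryMode.Persistent; // survive a broker restart
                producer.Send(session.CreateTextMessage("hello"));
            }
        }
    }
}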
The shared KahaDB store was on a simple share. Currently, ActiveMQ (or Windows) cannot reliably acquire or release the lock in this configuration. The shared database must sit on a 'real' SAN so that both broker instances see the database as a local filestore, not a network location.
See this page for more info: http://activemq.apache.org/shared-file-system-master-slave.html

Can't read from remote transactional private queue using WCF in workgroup mode (can do using System.Messaging!)

I have spent days reading MSDN, forums, and articles about this, and cannot find a solution to my problem.
As a PoC, I need to consume a queue from more than one machine, since I need fault tolerance on the consumer side. Performance is not an issue, since fewer than 100 messages a day should be exchanged.
I have coded two trivial console applications, one as the client, the other as the server, using Framework 4.0 (also tested on 3.5). Messages use transactions.
Everything runs fine on a single machine (Windows 7), even when running multiple consumer application instances.
Now I have a 2012 and a 2008 R2 virtual test server running in the same domain (but I don't want to use AD integration anyway). I am using an IP address or "." in the endpoint address attribute to prevent DNS/AD resolution side effects.
Everything works fine IF the queue is hosted by the consumer and the producer is submitting messages to the remote private queue. This is also true if I swap the consumer/producer roles of the 2012 and 2008 servers.
But I have NEVER been able to make this run using WCF when the consumer is reading from a remote queue and the producer is submitting messages locally. Submission never fails; my problem is on the consumer side.
My wish is to make this run using netMsmqBinding, but I also tried msmqIntegrationBinding. For each test, I adapted code and configuration, then confirmed it ran OK when the consumer was consuming from the local queue.
The last test I did used WCF (msmqIntegrationBinding) only on the producer (local queue) and System.Messaging.MessageQueue on the consumer (remote queue): it works fine! => My goal is to make the same work using WCF and netMsmqBinding on both sides.
From my point of view, I have proved this problem is a WCF issue, not an MSMQ one. This has nothing to do with security, authentication, firewalls, transport, protocol, MSMQ version, etc.
Error info from MS Service Trace Viewer:
Using msmqIntegrationBinding, when receiving the message (opening the queue was OK): An error occurred while receiving a message from the queue: The transaction specified cannot be imported. (-1072824242, 0xc00e004e). Ensure that MSMQ is installed and running. Make sure the queue is available to receive from.
Using netMsmqBinding, on opening the queue: An error occurred when converting the '172.22.1.9\private$\Test' queue path name to the format name: The queue path name specified is invalid. (-1072824300, 0xc00e0014). All operations on the queued channel failed. Ensure that the queue address is valid. MSMQ must be installed with Active Directory integration enabled and access to it is available.
If someone can help me find out why my configuration cannot be handled by WCF, a much more elegant and configurable approach than System.Messaging, I would greatly appreciate it!
Thank you.
You may need to post your consumer code and config to give more of an idea, but it could be the construction of the queue name, e.g.
FormatName:DIRECT=TCP:192.168.0.2\SomeQueue
There are several different ways to connect to a queue, and it changes depending on whether you are remote or local as well.
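For example, opening the remote private queue by direct format name from System.Messaging looks roughly like this (note the private$ segment, which is required for private queues; this is a sketch, not your exact code):
using System;
using System.Messaging;

class Consumer
{
    static void Main()
    {
        var queue = new MessageQueue(@"FormatName:DIRECT=TCP:192.168.0.2\private$\SomeQueue");
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            Message message = queue.Receive(TimeSpan.FromSeconds(30), tx); // transactional receive
            tx.Commit();
        }
    }
}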
I have found this article helpful in the past:
http://blogs.msdn.com/b/johnbreakwell/archive/2009/02/26/difference-between-path-name-and-format-name-when-accessing-msmq-queues.aspx
Also, the MessageQueue constructor docs on MSDN:
http://msdn.microsoft.com/en-us/library/ch1d814t.aspx
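For the netMsmqBinding side, the equivalent of direct addressing is a net.msmq URI plus a binding that stays away from AD. A sketch assuming workgroup mode; the contract name is hypothetical, and the security settings are deliberately wide open for testing:
<endpoint address="net.msmq://172.22.1.9/private/Test"
          binding="netMsmqBinding"
          bindingConfiguration="Workgroup"
          contract="IMyContract" />
<netMsmqBinding>
  <binding name="Workgroup" useActiveDirectory="false">
    <security mode="None" />
  </binding>
</netMsmqBinding>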

NServiceBus, why can I Bus.Send, but not Bus.Publish? Need help debugging

Ok,
Quick background
We're using NServiceBus 2.0 with pretty much the standard config; nothing "crazy" going on.
The app is .NET 3.5.
In the Dev environment, the Publisher and Subscriber are on the same box, Windows 7.
In the Staging environment, the Publisher and Subscriber are on different boxes, one Windows 7, the other Windows Server 2008.
Behaviour
In the Dev environment, the Publisher and Subscriber work fine, which suggests the code itself is OK in terms of starting up, configuring containers, etc., and that the messages are all configured correctly, i.e. size, serialization, etc.
In the Staging environment, the Publisher SUCCESSFULLY receives the subscription request.
It also successfully stores the subscriber in the subscription table (SQL Server; we're using DBSubscription), and the "queuename#machinename" is correct.
Problem
On Bus.Publish(), nothing happens. No outgoing queue is created, no message is sent or created anywhere, and no error is thrown.
Extra Info
Interestingly, a Bus.Send from the Publisher works fine! Except, of course, that I have to add this to the config:
<UnicastBusConfig>
  <MessageEndpointMappings>
    <add Messages="Library.Messages" Endpoint="subscriberqueue#machinename" />
  </MessageEndpointMappings>
</UnicastBusConfig>
Also the Publisher CAN resolve:
ping machinename
SO, what's going on, and what should I be looking out for?
Why does SEND work, and PUBLISH doesn't?
How can I get PUBLISH working?
Turn the logging threshold up to DEBUG and see if the publisher logs "sending message to..." during the call to Publish.
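NServiceBus 2.x logs through log4net, so that might look something like this in the publisher's app.config (the appender choice is illustrative):
<log4net>
  <appender name="Console" type="log4net.Appender.ConsoleAppender">
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%d %-5p %c - %m%n" />
    </layout>
  </appender>
  <root>
    <level value="DEBUG" />
    <appender-ref ref="Console" />
  </root>
</log4net>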