Current Setup
We have a UI (well, more than one UI, but that is not relevant), and we have 2 load-balanced app servers. The UI talks to an alias, behind which sit the 2 load-balanced app servers.
The app servers are also self-hosting NServiceBus endpoints. The app server (this could be either App Server 1 or App Server 2) that is dealing with the current request is capable of doing the following using the self-hosted NServiceBus:
Send a message locally (this is a calculation that can be run at any time, and it doesn't matter who triggers it; it is just a trigger to do the calculation)
Send a command to the publisher on the Ancillary Services Box (the publisher pushes a new event to Worker 1 and Worker 2)
Send a command to Worker 1 directly on the Ancillary Services Box
Send a command to Worker 2 directly on the Ancillary Services Box
The "App Server(s)" current App.Config
As such, the App.Config for each app server has something like this:
<UnicastBusConfig ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
    <add Assembly="Messages" Type="PublisherCommand" Endpoint="Publisher" />
    <add Assembly="Messages" Type="Worker1Command" Endpoint="Worker1" />
    <add Assembly="Messages" Type="Worker2Command" Endpoint="Worker2" />
    <!-- This one is sent locally only -->
    <add Assembly="Messages" Type="RunCalculationCommand" Endpoint="Dealing" />
  </MessageEndpointMappings>
</UnicastBusConfig>
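For those mappings to be read at all, the UnicastBusConfig section also has to be declared under configSections. A minimal sketch of what that declaration usually looks like (assuming the standard NServiceBus.Core section type; adjust the assembly name to the NServiceBus version in use):
<configSections>
  <section name="UnicastBusConfig" type="NServiceBus.Config.UnicastBusConfig, NServiceBus.Core" />
</configSections>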
The “Publisher” current App.Config
Currently the “Publisher” App.Config looks like this:
<UnicastBusConfig ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
  </MessageEndpointMappings>
</UnicastBusConfig>
The “Worker(s)” current App.Config
Currently the worker App.Configs only have to subscribe to one other endpoint, the “Publisher”, so their config files look like this:
<UnicastBusConfig ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
    <add Assembly="Messages" Type="SomeEvent" Endpoint="Publisher" />
  </MessageEndpointMappings>
</UnicastBusConfig>
All other messages to the workers right now come directly from one of the app servers, as shown in the App.Config above for the app servers.
This is all working correctly.
The thing is, we have a single point of failure: if the "Ancillary Services Box" dies, we are stuffed.
So we are wondering if we could make use of multiple "Ancillary Services Boxes" (each with a Publisher/Worker1/Worker2). Ideally they would work exactly as described above, and as shown in the diagram above, where if "Ancillary Services Box 1" is available it is used, otherwise we use "Ancillary Services Box 2".
I have read about the distributor (but have not used it), which, if I understand it correctly, we may be able to use in the AppServer(s) themselves, treating each AppServer as both a Distributor and a worker (to cover the case where we need to SendLocal the RunCalculationCommand).
The "Ancillary Services Box" would then have to use the Distributor for each of the contained endpoints:
So we may end up with something like this:
Could someone help me understand whether I am even thinking about this the right way, or whether I am way off.
Essentially what I want to know is:
Is the distributor the correct approach to use?
What would the worker / publisher configs look like? They would have to change somehow to point to the distributor, no? As I stated, right now the app servers send a message directly to the workers, so the app server config has the worker endpoint address, and the worker is only set up to point to the publisher.
What would the app servers' config look like? Would this stop sending directly to the publisher / workers?
What would the publisher config look like? Should this point to the distributor?
The distributor is a good approach here, but it comes at the cost of increased infrastructure complexity. To avoid introducing another single point of failure, the distributor and its queues must be run on a Windows Failover Cluster, meaning both MSMQ and DTC must be configured as clustered services. This can be oh so much fun.. :D
I've renamed what you call "worker" to endpoints, from Worker1 to Endpoint1 and Worker2 to Endpoint2. This is because "worker" is very clearly defined as something specific when you introduce the distributor. An actual physical endpoint on a machine that is receiving messages from a distributor is a worker. So Endpoint1#ServicesMachine01, Endpoint2#ServicesMachine02 etc. are all workers. Workers get work from the distributor.
Scenario 01
In the first scenario you see the app server get a request from the load balancer and send it to the Endpoint1#Cluster01 or Endpoint2#Cluster01 queue on the distributor, depending on the command. The distributor then finds a ready worker for the message in that queue and sends the command along to it. So for Worker1Command, EITHER Endpoint1#ServicesBox01 OR Endpoint1#ServicesBox02 ends up getting the command from the distributor and processes it as normal.
Scenario 02
In scenario two it's pretty much the same. The PublisherCommand is sent to Endpoint3#Cluster01. The distributor picks one of the ready Endpoint3 workers, in this case Endpoint3#ServicesBox02, and gives it the command. ServicesBox02 processes the message and publishes the SomeEvent to Endpoint1#Cluster01 and Endpoint2#Cluster01. These are picked up by the distributor and in this case sent on to Endpoint1#ServicesBox01 and Endpoint2#ServicesBoxN.
Notice how the messages ALWAYS flow THROUGH the distributor and the queues on Cluster01. This is actual load balancing of MSMQ.
The app server config changes to make sure the commands go through the cluster:
<UnicastBusConfig ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
    <add Assembly="Messages" Type="PublisherCommand" Endpoint="Endpoint3#Cluster01" />
    <add Assembly="Messages" Type="Worker1Command" Endpoint="Endpoint1#Cluster01" />
    <add Assembly="Messages" Type="Worker2Command" Endpoint="Endpoint2#Cluster01" />
    <!-- This one is sent locally only -->
    <add Assembly="Messages" Type="RunCalculationCommand" Endpoint="Dealing" />
  </MessageEndpointMappings>
</UnicastBusConfig>
ServicesBox config changes slightly to make sure subscriptions go through the distributor as well.
<UnicastBusConfig ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
    <add Assembly="Messages" Type="SomeEvent" Endpoint="Endpoint3#Cluster01" />
  </MessageEndpointMappings>
</UnicastBusConfig>
No changes for the publisher config. It doesn't need to point to anything. The subscribers will tell it where to publish.
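One thing the configs above don't show is how each worker enlists with its distributor in the first place. In NServiceBus 2.x/3.x that is typically done on the worker side through the distributor address attributes of UnicastBusConfig; the following is a rough sketch only, with illustrative queue names rather than ones taken from this setup:
<UnicastBusConfig DistributorControlAddress="Endpoint1.distributor.control#Cluster01"
                  DistributorDataAddress="Endpoint1#Cluster01"
                  ForwardReceivedMessagesTo="audit">
  <MessageEndpointMappings>
  </MessageEndpointMappings>
</UnicastBusConfig>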
Related
The setup
I have a WCF service hosted in IIS/AppFabric running on Windows Server 2012 R2.
The service is bound to a local transactional MSMQ queue via netMsmqBinding. My operations are decorated with TransactionScopeRequired = true.
The service operations receive calls from a BizTalk server, handle them and send responses back to a remote queue (on the same BizTalk server), also via a netMsmqBinding.
<endpoint name="Outbound" address="net.msmq://int01test.mydomain.com/private/queue.name" binding="netMsmqBinding" bindingConfiguration="QueueBindingConfigurationOutbound" contract="My.Outbound.Contract" />
<netMsmqBinding>
  <binding name="QueueBindingConfigurationOutbound">
    <security>
      <transport msmqAuthenticationMode="WindowsDomain" msmqProtectionLevel="Sign" />
    </security>
  </binding>
</netMsmqBinding>
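For completeness, the inbound side described earlier (the service bound to a local transactional queue) follows the same shape; this is only a sketch with made-up names, not the actual config:
<endpoint name="Inbound" address="net.msmq://localhost/private/inbound.queue.name" binding="netMsmqBinding" bindingConfiguration="QueueBindingConfigurationInbound" contract="My.Inbound.Contract" />
<netMsmqBinding>
  <binding name="QueueBindingConfigurationInbound" exactlyOnce="true">
    <security>
      <transport msmqAuthenticationMode="WindowsDomain" msmqProtectionLevel="Sign" />
    </security>
  </binding>
</netMsmqBinding>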
In the testing environment this works as intended.
Physical setup in testing environment:
Server int01test.mydomain.com hosts a BizTalk server and my inbound queue. This runs under service account mydomain\inttestuser.
Server app01test.mydomain.com hosts my application (IIS/AppFabric), my database (SQL server) and my outbound queue. This runs under service account mydomain\apptestuser.
The problem
When this solution is promoted to the acceptance testing environment, calls are still handled, but the responses are blocked with the error message:
System.ServiceModel.EndpointNotFoundException: An error occurred while
opening the queue:Unrecognized error -1072824317 (0xc00e0003). The
message cannot be sent or received from the queue. Ensure that MSMQ is
installed and running. Also ensure that the queue is available to open
with the required access mode and authorization. --->
System.ServiceModel.MsmqException: An error occurred while opening the
queue:Unrecognized error -1072824317 (0xc00e0003). The message cannot
be sent or received from the queue. Ensure that MSMQ is installed and
running. Also ensure that the queue is available to open with the
required access mode and authorization.
Differences
In the testing environment, my service and my database are running on a single server instance. (The BizTalk server and its queue, the target of my outbound messages, are on another server though.)
In the acceptance testing environment, my solution is deployed on two load balanced servers and the database is on a separate cluster.
There are also more strict external firewall rules to mimic the production environment.
Even the BizTalk server is clustered, though we communicate machine-to-machine rather than cluster-to-cluster right now.
So setup in QA Environment is:
Server int01qa.mydomain.com (clustered with int02qa.mydomain.com) hosts a BizTalk server and my inbound queue. This runs under service account mydomain\intqauser.
Server app01qa.mydomain.com (clustered with app02qa.mydomain.com) hosts my application (IIS/AppFabric) and my outbound queue. This runs under service account mydomain\appqauser.
Server db01qa.mydomain.com hosts my database.
What we've already tried
We have disabled authentication on the remote queue.
We have granted full control to the account which my service is running under as well as to "everyone".
We have, successfully, sent msmq messages manually between the two servers.
I have configured my service to send responses to a local private queue, same error.
The problem turned out to be that MSMQ couldn't find a certificate for the app pool user. That is, the 0xc00e0003 (MQ_ERROR_QUEUE_NOT_FOUND) was really caused by a 0xC00E002F (MQ_ERROR_NO_INTERNAL_USER_CERT).
Changing security settings to
<transport msmqAuthenticationMode="None" msmqProtectionLevel="None" />
enabled messages to be sent.
The real solution, of course, is not to disable security but to ensure that the app pool user's certificate is installed in MSMQ.
We came across this issue and didn't want to disable authentication. We tried a number of different approaches, but we think it was something to do with the User Certificate not existing.
We went to the App Pool of the client application (which calls the WCF endpoint via MSMQ) and changed the Load User Profile property to True. The call then worked. As an aside, changing it back to False continued to work - presumably because it had already sorted the certificate issue out.
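If you want that setting to be explicit rather than flipped in the IIS manager, the same property lives on the app pool's processModel element in applicationHost.config; a small sketch with a hypothetical pool name:
<system.applicationHost>
  <applicationPools>
    <add name="ClientAppPool">
      <processModel identityType="ApplicationPoolIdentity" loadUserProfile="true" />
    </add>
  </applicationPools>
</system.applicationHost>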
Given the following scenario:
I have two servers; each of them has RabbitMQ installed, and they form a cluster. I have configured them for HA queues using mirroring.
Node A (has master queue)
Node B (has slave queue)
We use NServiceBus as our messaging framework. We have Service A (a load-balanced WCF service) which should publish messages to a RabbitMQ exchange, and Service B (clustered) which should dequeue messages and process them. The problem is how I should configure NServiceBus on both nodes. I cannot specify multiple host names in the connection string like this:
<connectionStrings>
  <add name="NServiceBus/Transport" connectionString="host=nodeA, nodeB" />
</connectionStrings>
This is because the feature has been deprecated in the current NServiceBus release. It makes sense. I cannot specify the cluster name either:
<connectionStrings>
  <add name="NServiceBus/Transport" connectionString="host=clustername" />
</connectionStrings>
This option does not work.
I also tried localhost, which works for Node A, but not for Node B (which has the slave queue).
What should I define as the host to make it work (for both services, A and B)? What is needed for Node B to dequeue messages from the master queue?
There might be things I do not understand but help me out, please.
The RabbitMQ docs give advice about connecting to a cluster from a client: it's not a RabbitMQ concern, and you have to use other technologies such as a load balancer.
Generally, it's not advisable to bake in node hostnames or IP addresses into client applications: this introduces inflexibility and will require client applications to be edited, recompiled and redeployed should the configuration of the cluster change or the number of nodes in the cluster change. Instead, we recommend a more abstracted approach: this could be a dynamic DNS service which has a very short TTL configuration, or a plain TCP load balancer, or some sort of mobile IP achieved with pacemaker or similar technologies.
NServiceBus follows this suggestion: v3.x of the RabbitMQ transport drops the facility to specify multiple hostnames in the connection string, as detailed here.
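In practice that means a single-host connection string pointing at whatever abstraction fronts the cluster (a load balancer VIP or DNS alias) rather than at an individual node; a sketch with a hypothetical alias:
<connectionStrings>
  <add name="NServiceBus/Transport" connectionString="host=rabbit.mydomain.com" />
</connectionStrings>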
You need to put localhost in the connection string, like this:
<connectionStrings>
  <add name="NServiceBus/Transport" connectionString="host=localhost" />
</connectionStrings>
Then it works :)
I have several websites that push messages to a remote MSMQ (so they can then be published to a list of subscribers). Because the websites don't consume messages -- it's just a send-and-forget setup -- I only have the following config in my web.config for the sites:
<UnicastBusConfig>
  <MessageEndpointMappings>
    <add Messages="MyNamespace.Messages" Endpoint="PublishingInputQueue#remoteserver" />
  </MessageEndpointMappings>
</UnicastBusConfig>
I'm trying to chase down a bug where intermittently messages appear to be successfully sent via Bus.Send(..) (no exceptions are thrown), but the messages never make it to the destination queue.
Is there any other configuration I can do to help diagnose this issue? Is there a way to set up an error queue for messages that fail with UnicastBus? Is it as simple as establishing an error queue in the MsmqTransportConfig (which I currently don't have in my config)?
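For reference, the section I'm referring to would look something like this (a sketch only, with illustrative queue names and NServiceBus 2.x style attributes; the ErrorQueue is where messages go once retries are exhausted):
<MsmqTransportConfig InputQueue="WebsiteQueue" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />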
Update
A little more context...
As mentioned in the comments, I'm seeing a bunch of messages apparently piled up in the outgoing message queue of the web server(s). (I can't view the actual messages, but can see a count > 0 when clicking on "Outgoing Queues" in Computer Management.)
The publisher that created the destination queue (on remoteserver) has .IsTransactional(true) in the NServiceBus startup code.
The clients that push to that queue via UnicastBus have .IsTransactional(false) in their NServiceBus startup code.
This doesn't seem to cause an issue for other client websites that push to this destination queue. HOWEVER:
The webservers that don't seem to fail are Windows Server 2008
The webservers that do fail are Windows Server 2003
No clue if that makes a difference or not
The domain user the application pools run under has full permissions to the destination queue.
For completeness' sake:
It turned out that the Active Directory objects used for the MSMQ service on the servers were out of sync. I believe it occurred when our DevOps team created a single server instance, set it all up, and then cloned it for the other server instances in the cluster. This initial server worked correctly (explaining the intermittent success), but the cloned instances had an out-of-sync AD object for MSMQ. Deleting the objects to allow for recreation fixed the issue.
I have been technically testing a WCF service recently and have got to the point where my lack of understanding is not allowing me to progress and find a solution to a timeout problem we are seeing.
We are load testing a WCF service which is hosted on IIS7 on Windows Server 2008. The system set up to fire the messages actually fires them at a BizTalk application. BizTalk then processes the messages and sends them on to the endpoint of the WCF service. The WCF service is also using .NET 2.0 in its app pool (I guess this means it could actually be 3.0 or 3.5, as these were not full releases?).
We fire 40 messages within a second and 90% of them time out due to the send timeout on the client (BizTalk). We thought at first this was strange because we expected the server's basicHttpBinding receive timeout to trigger first, but it turned out that was set at 10 minutes and the client send timeout was set at 1 minute and 30 seconds.
What I understand:
WCF services have config files which contain behaviors and HTTP bindings. The server endpoint we are sending an XML message to is using basicHttpBinding. Timeouts: open/close is 1 minute, send and receive are 10 minutes. The server's timeout which we know is involved so far is the sendTimeout: 1 minute.
I understand WCF's architecture works by creating an instance of either a channel factory or a service host, and creating a channel stack which contains the behaviors and binding settings from the config as channels. There is a TransportAdaptor which is used to move the XML message once it has been processed through the channel stack.
I understand from IIS that http.sys handles the incoming requests. It passes requests to the worker process, and when that is busy, it places requests onto the kernel-mode queue? I understand there are some machine.config settings that can be set to increase or limit this queue?
I also know how to make an app pool into a web garden, and I have read you can increase the number of threads per core from the default of 12; this is done via a registry setting or, later on in .NET, a web config change.
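(For reference, the knob in question: on .NET 3.5 the per-CPU concurrent request limit, which defaults to 12, is a registry DWORD named MaxConcurrentRequestsPerCPU under the ASP.NET 2.0.50727.0 key; from .NET 4 onwards it can also be set in Aspnet.config, roughly as below, with illustrative values.)
<!-- Aspnet.config in the framework directory, .NET 4 and later -->
<system.web>
  <applicationPool maxConcurrentRequestsPerCPU="12" requestQueueLimit="5000" />
</system.web>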
I just read about InstanceContextMode and how it can affect the server's service too... but I'm unsure what that is set to in this case.
We recorded some performance counters (.NET ones), and I noticed the number of current requests minus (Queued + Disconnected) = 12. Which indicates we are using 1 core, and the number of threads on that core is set to 12?
Can anyone help me get a clearer picture and piece my knowledge together into something that is more complete?
The WCF behavior has a throttle setting. Here is an example (grabbed from MSDN):
<service
  name="Microsoft.WCF.Documentation.SampleService"
  behaviorConfiguration="Throttled" />
..... .....
<behaviors>
  <serviceBehaviors>
    <behavior name="Throttled">
      <serviceThrottling
        maxConcurrentCalls="1"
        maxConcurrentSessions="1"
        maxConcurrentInstances="1" />
    </behavior>
  </serviceBehaviors>
</behaviors>
By default (if not specified), the service is throttled to 10 concurrent calls.
I find that a sensible production setting for high-volume clients running short calls is more like 100. Of course it depends on your implementation, but the default definitely hurts performance on my test and production systems.
Ok,
Quick background
We're using NServiceBus 2.0 with pretty much the standard config, nothing "crazy" going on.
App is .NET 3.5
The dev environment has the Publisher and Subscriber on the same box, Windows 7.
The staging environment has the Publisher and Subscriber on different boxes, one Windows 7, the other Windows Server 2008.
Behaviour
In the dev environment, Publisher and Subscriber work fine, which suggests the code itself is OK in terms of starting up, configuring containers etc., and all the messages being configured correctly, i.e. size, serialization etc.
In the staging environment, the Publisher SUCCESSFULLY receives the subscription request.
It also successfully stores the subscriber in the Subscription table (SQL Server, we're using DBSubscription), and the "queuename#machinename" is correct.
Problem
On Bus.Publish() nothing happens. No Outgoing Queue is created, no message sent or created anywhere, no error is thrown.
Extra Info
Interestingly, a Bus.Send from the Publisher works fine! Except, of course, I have to add this to the config:
<UnicastBusConfig>
  <MessageEndpointMappings>
    <add Messages="Library.Messages" Endpoint="subscriberqueue#machinename"/>
  </MessageEndpointMappings>
</UnicastBusConfig>
Also the Publisher CAN resolve:
ping machinename
SO, what's going on, and what should I be looking out for?
Why does SEND work, and PUBLISH doesn't?
How can I get PUBLISH working?
Turn the logging threshold to debug and see if the publisher logs "sending message to..." during the call to publish.
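If the publisher runs under the NServiceBus generic host, one way to do that is the Logging configuration section (a sketch, assuming the NServiceBus.Config.Logging section shipped with NServiceBus 2.x; a self-hosted endpoint can achieve the same by setting log4net's root logger level to DEBUG):
<configSections>
  <section name="Logging" type="NServiceBus.Config.Logging, NServiceBus.Core" />
</configSections>
<Logging Threshold="DEBUG" />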