Increasing negative number displayed in the Queue Size field of the Member Clients Table - GemFire

After enabling subscription conflation on my regions, I saw an increasing negative number (-XXXXX) in the Queue Size field of the Member Clients Table on the GemFire Pulse website. Is there any reason a negative number would appear in the Queue Size field?
GemFire Version : 9.8.6
Number of Regions : 1
1 Client Application updating regions every 0.5 seconds (Caching Proxy)
1 Client Application reading data from regions (Caching Proxy - Register interest for all keys)
1 Locator and 1 Cache Server in the same virtual machine
Queue Size: the size of the queue used by the server to send events to a subscription-enabled client or a client that has continuous queries running on the server. [https://gemfire.docs.pivotal.io/910/geode/developing/events/tune_client_message_tracking_timeout.html]
Additional Discovery
Pulse Website (Negative Number in Queue Size)
JConsole (showClientQueueDetail)
numVoidRemovals: 4486
@ClientCacheApplication(locators = {
        @ClientCacheApplication.Locator(host = "192.168.208.20", port = 10311) }, name = "Reading-Testing", subscriptionEnabled = true)
@EnableEntityDefinedRegions(basePackageClasses = Person.class, clientRegionShortcut = ClientRegionShortcut.CACHING_PROXY, poolName = "SecondPool")
@EnableGemfireRepositories(basePackageClasses = PersonRepository.class)
@EnablePdx
@Import({ GemfireCommonPool.class })
public class PersonDataAccess {
....
}
@Configuration
public class GemfireCommonPool {
    @Bean("SecondPool")
    public Pool init() {
        PoolFactory poolFactory = PoolManager.createFactory();
        poolFactory.setPingInterval(8000);
        poolFactory.setRetryAttempts(-1);
        poolFactory.setMaxConnections(-1);
        poolFactory.setReadTimeout(30000);
        poolFactory.addLocator("192.168.208.20", 10311);
        poolFactory.setSubscriptionEnabled(true);
        return poolFactory.create("SecondPool");
    }
}
Additional Discovery 2
When I remove the poolName field in @EnableEntityDefinedRegions, the Pulse website no longer displays a negative number for the queue size. However, showClientQueueDetail still displays a negative queue size.
Is this a coding error on my part or a conflation issue?
Thank you so much.

Related

ReactiveRedisTemplate List Operations - set Expire and TTL

This integration test uses ReactiveRedisTemplate list operations to:
push a List to the cache using the ReactiveRedisTemplate list operation leftPushAll()
retrieve the List as a Flux from the cache using the ReactiveRedisTemplate list operation range(), fetching all elements of the list from index 0 to the last index
@Test
public void givenList_whenLeftPushAndRange_thenPopCatalog() {
    // catalog is a List<Catalog> of size 367
    Mono<Long> lPush = reactiveListOps.leftPushAll(LIST_NAME, catalog).log("Pushed");
    StepVerifier.create(lPush).expectNext(367L).verifyComplete();

    Mono<Long> sz = reactiveListOps.size(LIST_NAME);
    Long listsz = sz.block();
    assert listsz != null;
    Assert.assertEquals(catalog.size(), listsz.intValue());

    Flux<Catalog> catalogFlux =
        reactiveListOps
            .range(LIST_NAME, 0, listsz)
            .map(c -> {
                System.out.println(c.getOffset());
                return c;
            })
            .log("Fetched Catalog");

    List<Catalog> catalogList = catalogFlux.collectList().block();
    assert catalogList != null;
    Assert.assertEquals(catalog.size(), catalogList.size());
}
This test works fine. My question is: how are the EXPIRY and TTL of these List objects stored in the cache controlled?
The reason I ask is that on my local Redis server I notice they remain in the cache for a number of hours, but when I run this code against a Redis server hosted on AWS, the List objects only seem to remain in the cache for 30 minutes.
Is there a configuration property that can be used to control the TTL of list objects?
Thank you, ideally I'd like to have more control over how long these objects remain in the cache.

Akka.NET with persistence dropping messages when CPU is under high pressure?

I am doing some performance testing of my PoC. What I saw is that my actor is not receiving all the messages sent to it, and the performance is very low. I sent around 150k messages to my app, which caused a peak of 100% processor utilization. But when I stop sending requests, 2/3 of the messages have not been delivered to the actor. Here are some simple metrics from Application Insights:
As confirmation, the number of events persisted in Mongo is almost the same as the number of messages my actor received.
Secondly, the performance of processing messages is very disappointing: I get around 300 messages per second.
I know Akka.NET message delivery is at-most-once by default, but I don't get any error saying that messages were dropped.
Here is code:
Cluster shard registration:
services.AddSingleton<ValueCoordinatorProvider>(provider =>
{
    var shardRegion = ClusterSharding.Get(_actorSystem).Start(
        typeName: "values-actor",
        entityProps: _actorSystem.DI().Props<ValueActor>(),
        settings: ClusterShardingSettings.Create(_actorSystem),
        messageExtractor: new ValueShardMsgRouter());
    return () => shardRegion;
});
Controller:
[ApiController]
[Route("api/[controller]")]
public class ValueController : ControllerBase
{
    private readonly IActorRef _valueCoordinator;

    public ValueController(ValueCoordinatorProvider valueCoordinatorProvider)
    {
        _valueCoordinator = valueCoordinatorProvider();
    }

    [HttpPost]
    public Task<IActionResult> PostAsync(Message message)
    {
        _valueCoordinator.Tell(message);
        return Task.FromResult((IActionResult)Ok());
    }
}
Actor:
public class ValueActor : ReceivePersistentActor
{
    public override string PersistenceId { get; }
    private decimal _currentValue;

    public ValueActor()
    {
        PersistenceId = Context.Self.Path.Name;
        Command<Message>(Handle);
    }

    private void Handle(Message message)
    {
        Context.IncrementMessagesReceived();
        var accepted = new ValueAccepted(message.ValueId, message.Value);
        Persist(accepted, valueAccepted =>
        {
            _currentValue = valueAccepted.BidValue;
        });
    }
}
Message router:
public sealed class ValueShardMsgRouter : HashCodeMessageExtractor
{
    public const int DefaultShardCount = 1_000_000_000;

    public ValueShardMsgRouter() : this(DefaultShardCount)
    {
    }

    public ValueShardMsgRouter(int maxNumberOfShards) : base(maxNumberOfShards)
    {
    }

    public override string EntityId(object message)
    {
        return message switch
        {
            IWithValueId valueMsg => valueMsg.ValueId,
            _ => null
        };
    }
}
akka.conf:
akka {
    stdout-loglevel = ERROR
    loglevel = ERROR
    actor {
        debug {
            unhandled = on
        }
        provider = cluster
        serializers {
            hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
        }
        serialization-bindings {
            "System.Object" = hyperion
        }
        deployment {
            /valuesRouter {
                router = consistent-hashing-group
                routees.paths = ["/values"]
                cluster {
                    enabled = on
                }
            }
        }
    }
    remote {
        dot-netty.tcp {
            hostname = "desktop-j45ou76"
            port = 5054
        }
    }
    cluster {
        seed-nodes = ["akka.tcp://valuessystem@desktop-j45ou76:5054"]
    }
    persistence {
        journal {
            plugin = "akka.persistence.journal.mongodb"
            mongodb {
                class = "Akka.Persistence.MongoDb.Journal.MongoDbJournal, Akka.Persistence.MongoDb"
                connection-string = "mongodb://localhost:27017/akkanet"
                auto-initialize = off
                plugin-dispatcher = "akka.actor.default-dispatcher"
                collection = "EventJournal"
                metadata-collection = "Metadata"
                legacy-serialization = off
            }
        }
        snapshot-store {
            plugin = "akka.persistence.snapshot-store.mongodb"
            mongodb {
                class = "Akka.Persistence.MongoDb.Snapshot.MongoDbSnapshotStore, Akka.Persistence.MongoDb"
                connection-string = "mongodb://localhost:27017/akkanet"
                auto-initialize = off
                plugin-dispatcher = "akka.actor.default-dispatcher"
                collection = "SnapshotStore"
                legacy-serialization = off
            }
        }
    }
}
So there are two issues going on here: actor performance and missing messages.
It's not clear from your writeup, but I'm going to make an assumption: 100% of these messages are going to a single actor.
Actor Performance
The end-to-end throughput of a single actor depends on:
The amount of work it takes to route the message to the actor (i.e. through the sharding system, hierarchy, over the network, etc)
The amount of time it takes the actor to process a single message, as this determines the rate at which a mailbox can be emptied; and
Any flow control that affects which messages can be processed when - i.e. if an actor uses stashing and behavior switching, the amount of time an actor spends stashing messages while waiting for its state to change will have a cumulative impact on the end-to-end processing time for all stashed messages.
You will have poor performance due to item 3 on this list. The design that you are implementing calls Persist and blocks the actor from doing any additional processing until the message is successfully persisted. All other messages sent to the actor are stashed internally until the previous one is successfully persisted.
Akka.Persistence offers four options for persisting messages from the point of view of a single actor:
Persist - highest consistency (no other messages can be processed until persistence is confirmed), lowest performance;
PersistAsync - lower consistency, much higher performance. Doesn't wait for the message to be persisted before processing the next message in the mailbox. Allows multiple messages from a single persistent actor to be processed concurrently in-flight - the order in which those events are persisted will be preserved (because they're sent to the internal Akka.Persistence journal IActorRef in that order) but the actor will continue to process additional messages before the persisted ones are confirmed. This means you probably have to modify your actor's in-memory state before you call PersistAsync and not after the fact.
PersistAll - high consistency, but batches multiple persistent events at once. Same ordering and control flow semantics as Persist - but you're just persisting an array of messages together.
PersistAllAsync - highest performance. Same semantics as PersistAsync but it's an atomic batch of messages in an array being persisted together.
To get an idea as to how the performance characteristics of Akka.Persistence changes with each of these methods, take a look at the detailed benchmark data the Akka.NET organization has put together around Akka.Persistence.Linq2Db, the new high performance RDBMS Akka.Persistence library: https://github.com/akkadotnet/Akka.Persistence.Linq2Db#performance - it's a difference between 15,000 per second and 250 per second on SQL; the write performance is likely even higher in a system like MongoDB.
One of the key properties of Akka.Persistence is that it intentionally routes all of the persistence commands through a set of centralized "journal" and "snapshot" actors on each node in a cluster - so messages from multiple persistent actors can be batched together across a small number of concurrent database connections. There are many users running hundreds of thousands of persistent actors simultaneously - if each actor had their own unique connection to the database it would melt even the most robustly vertically scaled database instances on Earth. This connection pooling / sharing is why the individual persistent actors rely on flow control.
You'll see similar performance using any persistent actor framework (i.e. Orleans, Service Fabric) because they all employ a similar design for the same reasons Akka.NET does.
To improve your performance, you will need to either batch received messages together and persist them in a group with PersistAll (think of this as de-bouncing) or use asynchronous persistence semantics using PersistAsync.
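For illustration, here is a minimal sketch of the question's Handle method switched to PersistAsync (assuming the Message and ValueAccepted types from the question; a sketch of the technique, not a drop-in fix):
private void Handle(Message message)
{
    Context.IncrementMessagesReceived();
    var accepted = new ValueAccepted(message.ValueId, message.Value);

    // PersistAsync lets the actor keep draining its mailbox while the write
    // is in flight, so apply in-memory state changes up front rather than
    // relying on the confirmation callback running before the next message.
    _currentValue = message.Value;

    PersistAsync(accepted, valueAccepted =>
    {
        // Persistence confirmed - a safe place for side effects such as
        // acknowledgements or metrics.
    });
}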
You'll also see better aggregate performance if you spread your workload out across many concurrent actors with different entity ids - that way you can benefit from actor concurrency and parallelism.
Missing Messages
There could be any number of reasons why this might occur - most often it's going to be the result of:
Actors being terminated (not the same as restarting) and dumping all of their messages into the DeadLetter collection;
Network disruptions resulting in dropped connections - this can happen when nodes are sitting at 100% CPU - messages that are queued for delivery at the time can be dropped; and
The Akka.Persistence journal receiving timeouts back from the database will result in persistent actors terminating themselves due to loss of consistency.
You should look for the following in your logs:
DeadLetter warnings / counts
OpenCircuitBreakerExceptions coming from Akka.Persistence
You'll usually see both of those appear together - I suspect that's what is happening to your system. The other possibility could be Akka.Remote throwing DisassociationExceptions, which I would also look for.
You can fix the Akka.Remote issues by changing the heartbeat values for the Akka.Cluster failure-detector in configuration https://getakka.net/articles/configuration/akka.cluster.html:
akka.cluster.failure-detector {
    # FQCN of the failure detector implementation.
    # It must implement akka.remote.FailureDetector and have
    # a public constructor with a com.typesafe.config.Config and
    # akka.actor.EventStream parameter.
    implementation-class = "Akka.Remote.PhiAccrualFailureDetector, Akka.Remote"

    # How often keep-alive heartbeat messages should be sent to each connection.
    heartbeat-interval = 1 s

    # Defines the failure detector threshold.
    # A low threshold is prone to generate many wrong suspicions but ensures
    # a quick detection in the event of a real crash. Conversely, a high
    # threshold generates fewer mistakes but needs more time to detect
    # actual crashes.
    threshold = 8.0

    # Number of the samples of inter-heartbeat arrival times to adaptively
    # calculate the failure timeout for connections.
    max-sample-size = 1000

    # Minimum standard deviation to use for the normal distribution in
    # AccrualFailureDetector. Too low standard deviation might result in
    # too much sensitivity for sudden, but normal, deviations in heartbeat
    # inter arrival times.
    min-std-deviation = 100 ms

    # Number of potentially lost/delayed heartbeats that will be
    # accepted before considering it to be an anomaly.
    # This margin is important to be able to survive sudden, occasional,
    # pauses in heartbeat arrivals, due to for example garbage collect or
    # network drop.
    acceptable-heartbeat-pause = 3 s

    # Number of member nodes that each member will send heartbeat messages to,
    # i.e. each node will be monitored by this number of other nodes.
    monitored-by-nr-of-members = 9

    # After the heartbeat request has been sent the first failure detection
    # will start after this period, even though no heartbeat message has
    # been received.
    expected-response-after = 1 s
}
Bump the acceptable-heartbeat-pause = 3 s value to something larger, like 10, 20, or 30 seconds, if needed.
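For example, a minimal override merged into the akka.conf from the question (the 20 s figure is illustrative, not a tuned recommendation):
akka.cluster.failure-detector {
    # tolerate longer heartbeat gaps while nodes sit at 100% CPU
    acceptable-heartbeat-pause = 20 s
}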
Sharding Configuration
One last thing I want to point out with your code - the shard count is way too high. You should have about ~10 shards per node. Reduce it to something reasonable.
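For instance, a sketch of the router with a saner default (the value 50 is illustrative; size it at roughly 10 shards per node you plan to run):
public sealed class ValueShardMsgRouter : HashCodeMessageExtractor
{
    // ~10 shards per node; 50 comfortably covers a cluster of up to ~5 nodes.
    public const int DefaultShardCount = 50;

    public ValueShardMsgRouter() : base(DefaultShardCount)
    {
    }

    public override string EntityId(object message)
    {
        return message switch
        {
            IWithValueId valueMsg => valueMsg.ValueId,
            _ => null
        };
    }
}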

Gemfire WAN with Peer to Peer combined

We are using the multi-site WAN configuration. We have two clusters across geographical distances in North America and Europe.
Context: Cluster 1 has two members, A and B, that are both gateway senders. Cluster 2 has two members, C and D, that are both gateway receivers. When member A in cluster 1 starts, it reads data from the database and loads it into the GemFire cache, which gets sent to cluster 2. Everything so far is good.
Problem: If both members in cluster 2 are restarted at the same time, they lose all the GemFire regions/data. At that point, we could restart member A in cluster 1; it again loads data from the DB, which gets pushed to cluster 2. But we would prefer to avoid restarting member A, and to do so without persisting to hard disk.
Is there a solution where if cluster 2 is restarted, it can request a full copy of data from cluster 1?
Not sure if it's possible, but could we somehow set up peer-to-peer replication for the gateway receivers in cluster 2 (on top of WAN), so they would be updated automatically upon restart?
Thanks
Getting a full copy of the data over WAN is not supported at this time. What you could do instead is run a function on all members of site A that simply iterates over all the data and puts it back again in the region, i.e. something like:
public void execute(FunctionContext context) {
    RegionFunctionContext ctx = (RegionFunctionContext) context;
    Region localData = PartitionRegionHelper.getLocalDataForContext(ctx);
    // Re-putting each entry causes the gateway sender to queue it for the remote site.
    for (Object key : localData.keySet()) {
        Object val = localData.get(key);
        localData.put(key, val);
    }
}

MassTransit capping message rates at 10

I have a MassTransit consumer service set up to work with RabbitMQ, and I can't figure out how to increase the speed of the consumer - it seems to hard-cap at 10 messages received per second.
I have tried the steps listed here: https://groups.google.com/forum/#!msg/masstransit-discuss/plP4n2sixrY/xfORgTPqcwsJ, with no success - setting the prefetch and the concurrent consumers to 25 does nothing other than increase the acknowledged messages; it doesn't increase the rate at which the messages are downloaded.
My config is as follows:
ServiceBusFactory.ConfigureDefaultSettings(x =>
{
    x.SetConcurrentReceiverLimit(25);
    x.SetConcurrentConsumerLimit(25);
});

_bus = ServiceBusFactory.New(
    sbc =>
    {
        sbc.UseRabbitMq(x =>
            x.ConfigureHost(
                "rabbitmq://localhost/Dev/consume?prefetch=25",
                y =>
                {
                    y.SetUsername(config.Username);
                    y.SetPassword(config.Password);
                }));
        sbc.UseLog4Net();
        sbc.ReceiveFrom("rabbitmq://localhost/Dev/consume?prefetch=25");
        sbc.Subscribe(x => RegisterConsumers(x, container));
        sbc.UseJsonSerializer();
        sbc.SetConcurrentConsumerLimit(25);
    });
I'm setting the concurrent consumer limit in two places as I'm not sure whether I need to set it on the default or in the bus configuration, and the consumers are registered via unity - I have omitted the consumer subscription as all subscribers are receiving.
I'm a little confused as to whether there's anything else I need to set or if I need to change the order in which I'm setting the configs.
Any help greatly appreciated.
After spending a romantic evening with the problem and trying out different things suggested by Chris, I've found out that there is yet another thing you have to do to make it work like it should.
Specifically, yes, you need to set the prefetch on the consumer queue address:
sbc.UseRabbitMq(
    f =>
        f.ConfigureHost(
            new Uri("rabbitmq://guest:guest@localhost/masstransit_consumer"),
            c =>
            {
            }));

int pf = 20; // prefetch

// set consumer prefetch (required!)
sbc.ReceiveFrom(string.Format("rabbitmq://guest:guest@localhost/masstransit_consumer?prefetch={0}", pf));
But this is still not enough.
The key is available in the code of the mtstress tool Chris mentions in the comment below his answer. It turns out the tool calls:
int _t, _ct;
ThreadPool.GetMinThreads( out _t, out _ct );
ThreadPool.SetMinThreads( pf, _ct );
Adding this to my code resolves the issue. I wonder, though, why this is not required with the MSMQ transport...
Update #1
After further investigation I've found a possible culprit. It's in the ServiceBusBuilderImpl.
There is a method to raise the limit, ConfigureThreadPool.
The problem here is that it calls CalculateRequiredThreads, which should return the number of required threads. Unfortunately, the latter returns a negative value on both my client Windows 7 machine and my Windows Server. Thus, ConfigureThreadPool effectively does nothing, as the negative value is then ignored when calling ThreadPool.SetMin/MaxThreads.
What about this negative value? It seems CalculateRequiredThreads calls ThreadPool.GetMinThreads and ThreadPool.GetAvailableThreads and uses a formula to come up with the number of required threads:
var requiredThreads = consumerThreads + (workerThreads - availableWorkerThreads);
The problem here is that on my machines this effectively does:
40 (my limit) + 8 (workerThreads) - 1023 (availableThreads)
which of course returns
-975
The conclusion: the above code from the MassTransit internals seems to be wrong. When I manually raise the limit in advance, ConfigureMinThreads respects it (as it sets the limit only if it is higher than the value read).
Without setting the limit manually in advance, the limit fails to be set, and thus the code uses only as many threads as the default thread pool limit (which seems to be 8 on my machine).
Apparently someone assumed this formula would yield
40 + 8 - 8
in a default scenario. Why GetMinThreads and GetAvailableThreads return such unrelated values is yet to be determined...
Update #2
Changing
static int CalculateRequiredThreads(int consumerThreads)
{
    int workerThreads;
    int completionPortThreads;
    ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);

    int availableWorkerThreads;
    int availableCompletionPortThreads;
    ThreadPool.GetAvailableThreads(out availableWorkerThreads, out availableCompletionPortThreads);

    var requiredThreads = consumerThreads + (workerThreads - availableWorkerThreads);
    return requiredThreads;
}
to
static int CalculateRequiredThreads(int consumerThreads)
{
    int workerThreads;
    int completionPortThreads;
    ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);

    int availableWorkerThreads;
    int availableCompletionPortThreads;
    ThreadPool.GetAvailableThreads(out availableWorkerThreads, out availableCompletionPortThreads);

    var requiredThreads = consumerThreads + (workerThreads - availableWorkerThreads);
    return requiredThreads;
}
resolves the issue. Both return 1023 here, and the output of the formula is the correct number of expected threads.
What amount of work is being performed by your consumer? If it runs fast enough, it's likely that the .NET runtime need not create additional threads to handle the inbound message rate.
We have many systems in production that use specified counts where we match the consumer limit with the prefetch count, and in all of those cases under load, the unacknowledged message count shown by RabbitMQ is equal to those settings. We typically see nearly the same number of threads processing messages. Initially the .NET runtime is conservative in the allocated threads used, but it quickly ramps up to the full thread count when consumers are simply waiting on a remote operation such as an HTTP request or SQL command.
If there is an area of the consumer that is single threaded, it might be limiting thread scaling based on that bottleneck, so verify that your threading model is properly configured as well.

How to enforce message queue sequence with multiple WCF service instances

I want to create a WCF service which uses an MSMQ binding as I have a high volume of notifications the service is to process. It is important that clients are not held up by the service and that the notifications are processed in the order they are raised, hence the queue implementation.
Another consideration is resilience. I know I could cluster MSMQ itself to make the queue more robust, but I want to be able to run an instance of my service on different servers, so if a server crashes notifications do not build up in the queue but another server carries on processing.
I have experimented with the MSMQ binding and found that you can have multiple instances of a service listening on the same queue, and left to themselves they end up doing a sort of round-robin with the load spread across the available services. This is great, but I end up losing the sequencing of the queue as different instances take a different amount of time to process the request.
I've been using a simple console app to experiment, which is the epic code dump below. When it's run I get an output like this:
host1 open
host2 open
S1: 01
S1: 03
S1: 05
S2: 02
S1: 06
S1: 08
S1: 09
S2: 04
S1: 10
host1 closed
S2: 07
host2 closed
What I want to happen is:
host1 open
host2 open
S1: 01
<pause while S2 completes>
S2: 02
S1: 03
<pause while S2 completes>
S2: 04
S1: 05
S1: 06
etc.
I would have thought that as S2 has not completed, it might still fail and return the message it was processing to the queue; therefore S1 should not be allowed to pull another message off the queue. My queue is transactional, and I have tried setting TransactionScopeRequired = true on the service, but to no avail.
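For reference, the attempt looked roughly like this (the standard WCF attribute applied to the operation implementation):
[OperationBehavior(TransactionScopeRequired = true)]
public void EchoNumber(int number)
{
    Thread.Sleep(this.sleep);
    Console.WriteLine("{0}: {1:00}", this.name, number);
}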
Is this even possible? Am I going about it the wrong way? Is there some other way to build a failover service without some kind of central synchronisation mechanism?
class WcfMsmqProgram
{
    private const string QueueName = "testq1";

    static void Main()
    {
        // Create a transactional queue
        string qPath = ".\\private$\\" + QueueName;
        if (!MessageQueue.Exists(qPath))
            MessageQueue.Create(qPath, true);
        else
            new MessageQueue(qPath).Purge();

        // S1 processes as fast as it can
        IService s1 = new ServiceImpl("S1");
        // S2 is slow
        IService s2 = new ServiceImpl("S2", 2000);

        // MSMQ binding
        NetMsmqBinding binding = new NetMsmqBinding(NetMsmqSecurityMode.None);

        // Host S1
        ServiceHost host1 = new ServiceHost(s1, new Uri("net.msmq://localhost/private"));
        ConfigureService(host1, binding);
        host1.Open();
        Console.WriteLine("host1 open");

        // Host S2
        ServiceHost host2 = new ServiceHost(s2, new Uri("net.msmq://localhost/private"));
        ConfigureService(host2, binding);
        host2.Open();
        Console.WriteLine("host2 open");

        // Create a client
        ChannelFactory<IService> factory = new ChannelFactory<IService>(binding, new EndpointAddress("net.msmq://localhost/private/" + QueueName));
        IService client = factory.CreateChannel();

        // Periodically call the service with a new number
        int counter = 1;
        using (Timer t = new Timer(o => client.EchoNumber(counter++), null, 0, 500))
        {
            // Enter to stop
            Console.ReadLine();
        }

        host1.Close();
        Console.WriteLine("host1 closed");
        host2.Close();
        Console.WriteLine("host2 closed");

        // Wait for exit
        Console.ReadLine();
    }

    static void ConfigureService(ServiceHost host, NetMsmqBinding binding)
    {
        host.AddServiceEndpoint(typeof(IService), binding, QueueName);
    }

    [ServiceContract]
    interface IService
    {
        [OperationContract(IsOneWay = true)]
        void EchoNumber(int number);
    }

    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
    class ServiceImpl : IService
    {
        private string name;
        private int sleep;

        public ServiceImpl(string name, int sleep = 0)
        {
            this.name = name;
            this.sleep = sleep;
        }

        public void EchoNumber(int number)
        {
            Thread.Sleep(this.sleep);
            Console.WriteLine("{0}: {1:00}", this.name, number);
        }
    }
}
batwad,
You are trying to manually create a service bus. Why don't you try to use an existing one?
NServiceBus, MassTransit, ServiceStack
At least two of those work with MSMQ.
Furthermore, if you absolutely need ordering, it may actually be for another reason: you want to be able to send a message and you don't want dependent messages to be processed before the first message. You are looking for the Saga pattern. NServiceBus and MassTransit will both allow you to manage sagas easily; they will both allow you to simply trigger the initial message and then trigger the remaining messages based on conditions. This makes the plumbing of your distributed application a snap.
You can then even scale up to thousands of clients, queue servers and message processors without having to write a single line of plumbing code or run into these issues.
We tried to implement our own service bus over MSMQ here; we gave up because one issue after another kept creeping up. We went with NServiceBus, but MassTransit is also an excellent product (it's 100% open source; NServiceBus isn't). ServiceStack is awesome at making APIs and using message queues - I'm sure you could use it to make services that act as queue front-ends in minutes.
Oh, did I mention that NSB and MT both require under 10 lines of code to fully implement queues, senders and handlers?
----- ADDED -----
Udi Dahan (one of the main contributors to NServiceBus) talks about this in:
"In-Order Messaging a Myth" by Udi Dahan
"Message Ordering: Is it Cost Effective?" with Udi Dahan
Chris Patterson (one of the main contributors to MassTransit) answers the
"Using Sagas to ensure proper sequential message order" question
StackOverflow questions/answers:
"Preserve message order when consuming MSMQ messages in a WCF application"
----- QUESTION -----
I must say that I'm baffled as to why you need to guarantee message order - would you be in the same position if you were using an HTTP/SOAP protocol? My guess is no; so why is it a problem with MSMQ?
Good luck, hope this helps,
Ensuring in-order delivery of messages is one of the classic sticky issues with high-volume messaging.
In an ideal world, your message destinations should be able to handle out-of-order messages. This can be achieved by ensuring that your message source includes some kind of sequencing information. Ideally this takes the form of some kind of x-of-n batch stamp (message 1 of 10, 2 of 10, etc.). Your message destination is then required to reassemble the data into order once it has been delivered.
However, in the real world there is often no scope for changing downstream systems to handle messages arriving out of order. In this instance you have two choices:
Go entirely single-threaded - actually, you can usually find some kind of 'grouping id' which means you can go single-threaded in a for-each-group sense, so you still have concurrency across different message groups.
Implement a re-sequencer wrapper around each of the consumer systems that needs to receive in-order messages (sketched below).
Neither solution is very nice, but that's the only way I think you can have concurrency and in-order message delivery.
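To make option 2 concrete, here is a minimal sketch of a re-sequencer (hypothetical types, not from any framework; it assumes each message carries a contiguous sequence number starting at 1). It buffers out-of-order arrivals and releases messages to the order-sensitive handler strictly in sequence:
using System;
using System.Collections.Generic;

class Resequencer<T>
{
    private readonly Action<T> _handler;                 // downstream, order-sensitive consumer
    private readonly SortedDictionary<long, T> _buffer = new SortedDictionary<long, T>();
    private long _next = 1;                              // assumes sequence numbers start at 1

    public Resequencer(Action<T> handler)
    {
        _handler = handler;
    }

    // Called by each competing consumer as messages arrive, possibly out of order.
    public void Accept(long sequence, T message)
    {
        lock (_buffer)
        {
            _buffer[sequence] = message;
            // Flush every message that is now deliverable in order.
            T next;
            while (_buffer.TryGetValue(_next, out next))
            {
                _buffer.Remove(_next);
                _handler(next);
                _next++;
            }
        }
    }
}
The trade-off is visible in the code: while a gap is outstanding (say, a failed message returned to the queue), everything behind it sits in the buffer, so you have re-introduced a serialization point in front of that consumer.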