BizTalk Singleton - repeats consumed messages?

I tried a singleton in BizTalk and that seems to work so far.
However, this does lead to an incident that I cannot really explain.
The instance has now been running for about half a day, and it seems that some messages are repeated over and over again.
What makes me wonder is the time span between the repeats, which is almost exactly 30 minutes.
I use the out-of-the-box BizTalk file adapter, into which I dropped a handful of files at 12:10.
As you can see from the SQL query (attachment), these files were repeated every half hour. Apart from this problem, the orchestration works as expected.
Have I forgotten an essential part of the Singleton concept?
Do I have to delete processed messages?
If you have an idea what it could be, please give me a hint.
Attached is a picture of the orchestration and an evaluation of the staging table.
StagingTable eval:
https://owncloud.kurdy.de/index.php/s/FNMKeF9JJY6BZiy

What you are missing is a Listen shape with your ReceiveFollower in one branch and a configured Delay in the other, and possibly a shape that sets the exit-loop condition, unless you want your singleton to run forever.
You do have to be careful about zombies with this sort of singleton. A zombie occurs if the orchestration has just hit the delay and is in the process of tearing itself down when another message arrives that matches the subscription. You will then get an error like the following:
0xC0C01B4C The instance completed without consuming all of its messages. The instance and its unconsumed messages have been suspended.
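To make the listen-with-delay idea concrete outside of BizTalk, here is a rough Python sketch (not orchestration code). It waits for either the next message or a timeout, and leaves the loop when the timeout wins. The names `run_singleton`, `inbox`, `idle_timeout` and `handle` are made up for the illustration.

```python
import queue


def run_singleton(inbox, idle_timeout, handle):
    """Consume messages until none arrives within idle_timeout seconds.

    Returns the number of messages that were handled.
    """
    processed = 0
    while True:
        try:
            # "Receive" branch of the listen: wait for the next follower message.
            message = inbox.get(timeout=idle_timeout)
        except queue.Empty:
            # "Delay" branch: nothing arrived in time, so set the exit condition.
            break
        handle(message)
        processed += 1
    return processed
```

A message that arrives just after the timeout fires simply stays in `inbox` in this sketch. In BizTalk that same window is where the zombie described above comes from.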

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue each message goes to. Basically, it keeps track of which file_processing queue each user is currently being processed in.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on.
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
If the user has a unique ID, or the file being worked on has a unique ID then hash the ID to get the processing queue to enter. That way you will always have the same user / file task queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
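The hashing suggestion could look roughly like this in Python. It is only a sketch: `queue_for_user` is a made-up helper, the count of 10 comes from the question, and numbering the queues from 0 is an assumption.

```python
import hashlib

NUM_QUEUES = 10  # assumed: one queue per consumer, as in the question


def queue_for_user(user_id: str, num_queues: int = NUM_QUEUES) -> str:
    """Map a user ID to one fixed processing queue name."""
    # Use a stable digest rather than Python's built-in hash(), which is
    # randomized per process for strings and would route differently on
    # each publisher.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_queues
    return f"file_processing_{index}"
```

The publisher would use the returned name as the routing key, so no manager process is needed. The trade-off is the one already mentioned: a few very busy users can back up one queue while others sit idle.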

Best way to handle timeouts on rabbitmq message processing

I am trying to get my head around an issue I have recently encountered and I hope someone will be able to point me in the most reasonable direction of solving it.
I am using Riak KV store and working on CRDT data, where I have some sort of counter inside each CRDT item stored in database.
I have a rabbitmq queue, where each message is a request to increase or decrease a certain amount of aforementioned counters.
Finally, I have a group of service-workers, that listens on the queue, and for each request try to change the amount of counters accordingly.
The issue I have is as follows: while a single worker is processing a request, it may get stuck for a while on a write operation to the database, let's say on the second change of counters out of three. Its connection with rabbitmq gets lost (timeout), so the message-request goes back onto the queue (I cannot afford to miss one). Then it is picked up by a second worker, which begins all processing anew. However, the first worker finishes its work, and as a result I have processed a single message twice.
I can split those increments into single actions, but this still leaves me with a dilemma: I can still change the value of a counter twice if some worker gets stuck on a write operation for a long period.
I do not have possibility of making Riak KV CRDT writes work faster, nor can I accept missing out a message-request. I need to implement some means of checking whether a request was already processed before.
My initial thought was to use some alternative, quick KV store for storing the IDs of rabbitMQ messages that are being processed. That way other workers could tell whether they are about to process a message that is already being handled elsewhere.
I could use any help and pointers to materials I can read.
You can't have "exactly-once delivery" semantics. You can reduce either duplicate deliveries or missed deliveries, so it's up to you to decide which misbehavior is the least inconvenient.
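The idea from the question, recording message IDs in a fast store before processing, might be sketched like this. `ProcessedStore` is an in-memory stand-in for whatever shared key/value store you pick, and `handle_delivery` and `apply_update` are hypothetical names. It also assumes the publisher puts a unique ID on every message.

```python
import threading


class ProcessedStore:
    """In-memory stand-in for a fast shared key/value store (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()

    def claim(self, message_id):
        """Atomically record message_id; True only for the first caller."""
        with self._lock:
            if message_id in self._seen:
                return False
            self._seen.add(message_id)
            return True


def handle_delivery(store, message_id, apply_update):
    """Apply the update once per message_id; return True if applied now."""
    if not store.claim(message_id):
        return False  # redelivery of a message that was already claimed
    apply_update()
    return True
```

Note which misbehavior this picks: because the ID is claimed before the work is done, a worker that dies after claiming leaves the message unprocessed. Claiming after the work instead brings the duplicates back. A real store would also need an atomic "set if absent" operation and probably an expiry on claims.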
First of all, are you sure it's the CRDTs that are too slow? Are you using simple counters or counters inside maps? In my experience they are quite fast, although slower than plain KV. You could try:
- having simple CRDTs (no maps), and more CRDT objects, to lower the load on each one (can you split the counters in two?)
- not using CRDTs but using good old sibling resolution on the client side on simple key/values.
- accumulating the counter update requests and applying them in batches, but then you're accepting an increase in latency, so it's equivalent to increasing the timeout.
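For the batching option, a minimal sketch could look like the following. `CounterBatcher` and `write_counter` are invented for the example; `write_counter` stands for whatever call actually updates a counter in Riak.

```python
from collections import defaultdict


class CounterBatcher:
    """Sum up counter deltas and write them out in batches."""

    def __init__(self, write_counter, max_pending=100):
        self._write = write_counter  # hypothetical callable: (key, delta) -> None
        self._max_pending = max_pending
        self._pending = defaultdict(int)
        self._count = 0

    def add(self, key, delta):
        """Queue one increment or decrement; flush when the batch is full."""
        self._pending[key] += delta
        self._count += 1
        if self._count >= self._max_pending:
            self.flush()

    def flush(self):
        """Write one combined delta per key, skipping keys that net to zero."""
        for key, delta in self._pending.items():
            if delta != 0:
                self._write(key, delta)
        self._pending.clear()
        self._count = 0
```

If you go this way, the queue messages should only be acknowledged after the flush that contains them has succeeded; otherwise a crash loses the pending updates.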
Can you provide some metrics? For example, how long the updates take, what numbers you'd expect, whether it's as slow with few updates as with many, etc.

Broker Queue - Move Poisoned Messages to Table

Currently I have a queue that stores merge queries which are run once it is read off the queue. This all works well, and currently if there is an error with the merge the queue will disable and I have to manually remove the message (or fix the merge, as it were).
I was wondering whether it was possible to simply move the poisoned message to a table? The queues run important (and different) merges that must continually run to ensure data is updated. It is not beneficial to me for the queue to, say, become disabled overnight and gain a huge backlog.
Is there any way for me to simply push the bad message into a table? I have attempted this myself however I wound up having a TRY...CATCH inside a TRANSACTION, which performs a rollback on the error anyway (thus invoking the 5 rollbacks to disable rule). Most solutions online mention only manually removing the message.
Any suggestions? Is this just a bad idea? If so, why?
Thanks.
The disable-after-5-rollbacks behaviour can be switched off by setting POISON_MESSAGE_HANDLING (STATUS = OFF) in the CREATE/ALTER QUEUE statement. You can then use TRY...CATCH to deal with failed transactions yourself.
Like you, I don't find this feature very useful, so I almost always turn it off in my applications and deal with problem messages in whatever way seems best.

Does firing off an indefinite WAITFOR increase the log file size?

In the last release of my app, I added a command that tells it to wait until something arrives in the Service Broker queue:
WAITFOR (RECEIVE CONVERT(int, message_body) AS Message FROM MyQueue)
The DBAs tell me that since the addition, the log sizes have gone through the roof. Could this be correct? Or should I be looking elsewhere?
I haven't tested this in Service Broker, but I assume the same ACID compliance mechanisms would be in play. It depends on whether your code is leaving a transaction open or not. If it is leaving a transaction open and not committing it, the log will continue to grow until something closes it, and only at that point will the old areas finally be marked for re-use.
I haven't rolled Service Broker out in prod yet, but the testing/reading I did did not include any WAITFOR.
Instead, the Service Broker MVPs like Denny Cherry would typically keep querying the queue instead of doing a WAITFOR.
Can you post some of the other code and also tell us why you're using WAITFOR? Maybe there's something I'm not getting that would be a good use case scenario. Thanks!

Service Broker Design

I’m looking to introduce SQL Server Service Broker.
I have a remote orders database and a local processing database. All activity on the processing database has to happen in sequence, so this seems a perfect job for Service Broker!
I’ve set up the infrastructure and I can send and receive messages; now I’m looking at the design of the processing. As I said, all processes for one order need to be completed in sequence, so I’ll put them in one conversation.
One of these processes is a request for external flat file data. We then wait (could be several days) and import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I’ve had some ideas but I’m sure I’m missing a trick somewhere
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion it is like a chain of responsibility. As far as I can understand, we have the following workflow.
1.) Process the message.
2.) Wait for the external file. This can be a busy wait, or, if the external data provider sends you a notification, it can be done in a non-polling manner.
3.) Once the data is received, process it.
So my suggestion would be to use 3 different queues, one for each part. When one part is done, it forwards the work by putting a new message in the next queue in the chain.
I am assuming that processing one order will not disrupt the processing of another.
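To show the shape of the chained queues, here is a small in-memory Python sketch, not Service Broker code. All names in it are invented for the illustration (`first_half`, `file_arrived`, `second_half`, the `waiting` set and the handler functions), and it assumes something posts a message when a flat file comes back.

```python
import queue


def drain(inbox, handler):
    """Run handler on every message currently in inbox."""
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return
        handler(message)


first_half = queue.Queue()    # stage 1: tasks up to the file request
file_arrived = queue.Queue()  # stage 2: one message per returned flat file
second_half = queue.Queue()   # stage 3: import and remaining tasks
waiting = set()               # orders whose file has been requested
done = []


def process_first_half(order_id):
    # Request the file and stop; nothing stays open while we wait.
    waiting.add(order_id)


def on_file_arrived(order_id):
    # Forward the order to the last stage only once its file is back.
    if order_id in waiting:
        waiting.discard(order_id)
        second_half.put(order_id)


def process_second_half(order_id):
    done.append(order_id)
```

Each stage finishes its own unit of work and commits, so the days-long wait lives in the gap between queues rather than in an open transaction or a polling loop.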
I am thinking MSMQ with a Windows sequential workflow might also be a candidate for this task.