We currently have a process that involves sending a third party an XML file containing changes that have occurred within our system.
We are moving to NServiceBus, and the changes are modelled as individual commands sent to an endpoint.
We do not want to send these changes as individual files; instead we want to receive a batch of commands and concatenate their information into a single file.
How might one go about batching multiple commands into a single export file?
Have you looked at Sagas? Sagas allow you to model long-running business processes. So if a file has multiple parts, the Saga could begin when it gets the first part, and complete when it has all the parts it needs.
http://cdn.nservicebus.com/sagas.aspx
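For illustration, here is a rough saga sketch in the NServiceBus 4/5 style. ChangePart, BatchId, PartsExpected and XmlFragment are made-up names standing in for however you correlate and count the parts of one export; they are not part of any real API.

    using NServiceBus;
    using NServiceBus.Saga;

    // Hypothetical command: one part of an export, carrying its batch id and expected part count.
    public class ChangePart : ICommand
    {
        public string BatchId { get; set; }
        public int PartsExpected { get; set; }
        public string XmlFragment { get; set; }
    }

    public class ExportBatchSagaData : ContainSagaData
    {
        public virtual string BatchId { get; set; }
        public virtual int PartsExpected { get; set; }
        public virtual int PartsReceived { get; set; }
    }

    public class ExportBatchSaga : Saga<ExportBatchSagaData>,
                                   IAmStartedByMessages<ChangePart>
    {
        protected override void ConfigureHowToFindSaga(SagaPropertyMapper<ExportBatchSagaData> mapper)
        {
            // Correlate every part of the same batch with the same saga instance.
            mapper.ConfigureMapping<ChangePart>(m => m.BatchId).ToSaga(s => s.BatchId);
        }

        public void Handle(ChangePart message)
        {
            Data.BatchId = message.BatchId;
            Data.PartsExpected = message.PartsExpected;
            Data.PartsReceived++;

            // Append message.XmlFragment to the batch here (saga data or a staging store).

            if (Data.PartsReceived == Data.PartsExpected)
            {
                // All parts have arrived: write the single export file, then end the saga.
                MarkAsComplete();
            }
        }
    }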
You can send the commands in using Bus.Send(IMessage[] messages). Note that it takes an array, and the messages will be packed into one queue message over the wire. On the receiving side, the handler will be invoked once per message. In your handler you should be able to just keep appending to your file, and that is also where you could place logic to determine when to "roll" the file if necessary.
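As a sketch of what that handler could look like (ChangeCommand, XmlFragment and the output path are invented for the example; the IHandleMessages<T> handler runs once per message even when several messages were sent in one array):

    using System.IO;
    using NServiceBus;

    public class ChangeCommand : ICommand
    {
        public string XmlFragment { get; set; }
    }

    public class ChangeCommandHandler : IHandleMessages<ChangeCommand>
    {
        // Sending side, as described above: Bus.Send(new IMessage[] { cmd1, cmd2, cmd3 });
        public void Handle(ChangeCommand message)
        {
            // Append each change to the current export file; add your own
            // "roll the file" logic here (e.g. by size, count or time window).
            File.AppendAllText(@"C:\exports\current-batch.xml", message.XmlFragment);
        }
    }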
I have a bunch of XML files, say hundreds, in my source directory. I have made my flow's processing strategy synchronous so that only one XML file is executed at a time, since performance is not much of a priority for me. But I do have batch processing in my flow. As I understand it, the flow thread creates a child thread to execute my batch processing and control moves on. All of my transformation code lies in the batch job, which takes about 30 seconds per XML file, so there is not much logic in my main flow except the file inbound endpoint and the batch execute component (to trigger the batch job). The problem is that the file inbound endpoint keeps polling for files, the whole bunch of XMLs gets picked up in very little time, and Mule runs out of memory and behaves unexpectedly.
I came across the fork-join pattern very late, and it may or may not fit my requirement.
So, is there any configuration that makes my batch job complete fully before the next file is picked up? Help me out; I have already made the processing strategy synchronous!
Shouldn't you in this case just adjust the polling frequency at the file inbound endpoint?
https://docs.mulesoft.com/mule-user-guide/v/3.7/file-connector
Polling Frequency
(Applies to inbound File endpoints only.)
Specify how often the endpoint should check for incoming messages. The default value is 1000 ms.
Set maxThreadsActive and maxBufferSize
https://docs.mulesoft.com/mule-user-guide/v/3.6/tuning-performance#calculating-threads
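For example, a Mule 3 configuration sketch along those lines; the paths, names and tuning values are made up, and the exact attributes should be checked against the linked docs:

    <file:connector name="slowFileConnector" autoDelete="true">
        <!-- Cap how many files may be picked up and buffered at once (assumed tuning values). -->
        <receiver-threading-profile maxThreadsActive="1" maxBufferSize="1"/>
    </file:connector>

    <flow name="processXmlFiles" processingStrategy="synchronous">
        <!-- Poll the source directory every 30 seconds instead of the 1000 ms default. -->
        <file:inbound-endpoint path="/opt/data/in" connector-ref="slowFileConnector"
                               pollingFrequency="30000"/>
        <!-- xmlTransformJob stands in for the existing batch job. -->
        <batch:execute name="xmlTransformJob"/>
    </flow>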
The scenario is this:
An S3 bucket full of CSV files, each with hundreds of formatted lines.
N Mule servers, clustered or not; both options are available.
A single Mule flow installed on every Mule server.
The Mule flow's behaviour is simple: it polls S3 to lazily fetch the available files, retrieves each file's contents, transforms the CSV lines into SQL statements and inserts them into the DB.
Problems:
The flows on the different Mule servers all successfully poll S3, retrieve the files, process them and insert into the DB, so files, and the records inside them, are processed several times.
Wish List:
load balancing is done between all active servers.
the flows installed on the different Mule servers are identical (we don't modify the flow to make each server pick different files).
files, and the records inside them, are not processed twice.
Failed Approach:
We tried a processed/not-processed mechanism shared by all Mule servers, in clustered mode. We used Mule 3.5's Object Store to keep a list of the files that have been processed, visible to all servers. The problem is that we are not balancing: all the workload ends up on one server while the rest are idle almost all the time.
Questions:
What would be the best architecture if we want load balancing?
Maybe we need a dedicated Mule app to do the S3 file download and let this app divide the workload equally between the Mule servers?
Configure your S3 bucket to push events to an SQS queue (see here), and have your Mule servers pull events from that queue instead of polling S3. This way, each event will be pulled by one worker only.
It works as follows: In each worker, you need to repeatedly call ReceiveMessage() to get the next message in the queue. Once a worker gets a message, that message becomes invisible to other workers for a certain amount of time (which you can control by setVisibilityTimeout()). After a worker processes a message, it should call deleteMessage() to remove it completely from the queue. In case of failure in the worker, deleteMessage() is not called, and so after the visibility timeout period, another worker will pick up that message.
In other words, the Queue in SQS doesn't deal with distributing the work. The workers pull messages from the queue when they are ready, and this is what creates the load balancing.
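As a minimal sketch of that pull loop using the plain AWS SDK for .NET (the queue URL and timeout values are invented; a Mule SQS connector would be doing the equivalent calls under the covers):

    using System;
    using System.Threading.Tasks;
    using Amazon.SQS;
    using Amazon.SQS.Model;

    class SqsWorker
    {
        static async Task Main()
        {
            var sqs = new AmazonSQSClient();
            const string queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/s3-new-files"; // made up

            while (true)
            {
                // Long-poll for up to 20 s; a received message stays hidden from
                // other workers for the visibility timeout while this worker processes it.
                var response = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
                {
                    QueueUrl = queueUrl,
                    MaxNumberOfMessages = 1,
                    WaitTimeSeconds = 20,
                    VisibilityTimeout = 300
                });

                foreach (var message in response.Messages)
                {
                    // message.Body holds the S3 event JSON naming the new object:
                    // download the CSV, transform it and insert into the DB here.

                    // Delete only after successful processing; if the worker crashes before
                    // this call, the message becomes visible again and another worker retries it.
                    await sqs.DeleteMessageAsync(queueUrl, message.ReceiptHandle);
                }
            }
        }
    }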
I have a process which reads a file and uploads it to a database. The flow goes as below.
File connector
Processing within a for-each loop (Update to database)
The problem with the above approach was that whenever an exception occurred, processing stopped at that record and the rest of the records were not processed. As a workaround, I have changed the flow as below:
File connector
For-each - within the for-each, a flow-ref is placed to call a separate flow which does the processing.
What I noticed is that, at the point of calling the new flow, separate threads are used for processing, so an exception no longer causes all the records to fail. Now I am facing another difficulty: after the processing completes, I need to generate a report with the complete processing details (number of records processed, rejected, etc.). Since all the records are processed asynchronously in different threads, I am not able to figure out when the processing is completed. Is there a way to monitor from another Mule flow whether the processing is complete, so that I can generate the report when it is?
at the point of calling the new flow, separate threads are used for processing
A flow-ref doesn't necessarily imply a new thread: you can tune the processing-strategy of the ref-ed flow to force synchronous processing.
With this, you'll be able to remain synchronous, have a custom exception strategy in the ref-ed flow and achieve your goal of not breaking processing when an error occurs.
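A Mule 3 sketch of that shape, with invented flow names and DB step: the ref-ed flow is synchronous and handles per-record failures in its own catch exception strategy, so the for-each still visits every record, and the main flow regains control when the loop finishes, which is also where the completion report could be built:

    <flow name="mainFlow">
        <file:inbound-endpoint path="/opt/data/in"/>
        <foreach>
            <flow-ref name="processRecordFlow"/>
        </foreach>
        <!-- Reached only after every record has been attempted: build the report here. -->
        <logger message="File processed; generating report" level="INFO"/>
    </flow>

    <flow name="processRecordFlow" processingStrategy="synchronous">
        <db:update config-ref="dbConfig">
            <db:parameterized-query>UPDATE orders SET status = 'DONE' WHERE id = #[payload.id]</db:parameterized-query>
        </db:update>
        <catch-exception-strategy>
            <!-- Record the failure (e.g. increment a rejected-records counter) and carry on. -->
            <logger message="Record failed: #[exception.message]" level="WARN"/>
        </catch-exception-strategy>
    </flow>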
For the purpose of logging, a certain file on the server needs to be written to repeatedly. The file will be accessed by several hundred users, possibly simultaneously. How can I manage concurrency in VB.NET so that the file is not corrupted?
To do this directly from the clients is very hard.
But if you introduce one layer of indirection, it should become solvable.
For example, you might have a server-based component (such as a web-service or even a windows-service) and all of your clients send their messages to it. It and it alone is responsible for logging to the file. It will need to manage a queue in one form or another.
A common server-based component that people use to handle this type of scenario is (drumroll) a database. You could use a database as the endpoint of your logging and would then have fine-grained control over the way locking occurs.
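To illustrate the "one component owns the file and drains a queue" idea, here is a C# sketch (the class name and file path are made up, and the same pattern translates directly to VB.NET); in your case it would live inside the web service or Windows service rather than in each client:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    // Single writer: many callers enqueue lines, one background task appends them to the file,
    // so the file is never written to from two threads at once.
    public sealed class SerializedFileLogger : IDisposable
    {
        private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
        private readonly Task _writer;

        public SerializedFileLogger(string path)
        {
            _writer = Task.Run(() =>
            {
                foreach (var line in _queue.GetConsumingEnumerable())
                    File.AppendAllText(path, line + Environment.NewLine);
            });
        }

        public void Log(string line) => _queue.Add(line);

        public void Dispose()
        {
            _queue.CompleteAdding();   // drain remaining entries, then stop the writer
            _writer.Wait();
            _queue.Dispose();
        }
    }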
I’m looking to introduce SQL Server Service Broker.
I have a remote orders database and a local processing database; all activity on the processing database has to happen in sequence, which seems a perfect job for Service Broker!
I’ve set up the infrastructure; I can send and receive messages, and now I’m looking at the design of the processing. As I said, all the processes for one order need to be completed in sequence, so I’ll put them in one conversation.
One of these processes is a request for external flat-file data; we then wait (it could be several days) and import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I’ve had some ideas, but I’m sure I’m missing a trick somewhere:
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion this is a chain of responsibility. As far as I can understand, we have the following workflow:
1.) Process the order message.
2.) Wait for the external file; this can be a busy wait, or, if the external source gives you a notification, it can be done in a non-polling manner.
3.) Once the data is received, process it.
So my suggestion would be to use three different queues, one for each part; when one part is done, it forwards a new message to the next queue in the chain.
I am assuming that processing one order will not disrupt the processing of another.
I am thinking MSMQ with a Windows Workflow sequential workflow might also be a candidate for this task.
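A rough C# sketch of one link in such a chain using plain MSMQ (System.Messaging); the queue paths and the stage split are made up, and with Service Broker the same hand-off would be a SEND on the next queue/conversation:

    using System;
    using System.Messaging;

    // Each stage reads its own queue, does its part of the order,
    // then forwards the order id to the next queue in the chain.
    class OrderStage
    {
        static void Main()
        {
            using (var inQueue  = new MessageQueue(@".\private$\orders-stage1"))
            using (var outQueue = new MessageQueue(@".\private$\orders-await-flatfile"))
            {
                inQueue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

                while (true)
                {
                    var message = inQueue.Receive();          // blocks until a message arrives
                    var orderId = (string)message.Body;

                    // ... run the first half of the processing for this order here ...

                    // Hand the order over to the next stage; that stage only runs
                    // once the external flat file for the order has arrived.
                    outQueue.Send(orderId, $"order {orderId}");
                }
            }
        }
    }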