I do have bunch of xml files say hundreds in my source directory. I have made my flow processing strategy to be synchronous to execute only 1 xml file at a time as performance
is not much priority to me. But i do have batch processing in my flow. So what i under stand is flow thread is creating a child thread to execute my Batch processing and control moves forward. My whole transformation code lies in batch processing which takes 30secs to execute a xml. So nothing much logic in my main flow except file inbound EP and batch execute component(to trigger batch job). So file inbound endpoint is keep on pollingfiles and whole bunch xmls getting picked in very less time make my mule memory out and unexpected behavior occurs.
Came to know fork-join pattern very late and it may or not fit into my req.
So is there any configuration to make my batch process completely and
execute and pick the next files. Help me out. I already made processing strategy synchronous!!
Shouldn't you in this case just adjust the polling frequency at the file inbound endpoint?
https://docs.mulesoft.com/mule-user-guide/v/3.7/file-connector
Polling Frequency
(Applies to inbound File endpoints only.)
Specify how often the endpoint should check for incoming messages. The default value is 1000 ms.
Set maxThreadsActive and maxBufferSize
https://docs.mulesoft.com/mule-user-guide/v/3.6/tuning-performance#calculating-threads
Related
I have a huge file that can have anywhere from few hundred thousand to 5 million records. Its tab-delimited file. I need to read the file from ftp location , transform it and finally write it in a FTP location.
I was going to use FTP connector get the repeatable stream and put it into mule batch. Inside mule batch process idea was to use a batch step to transform the records and finally in batch aggregate FTP write the file to destination in append mode 100 records at a time.
Q1. Is this a good approach or is there some better approach?
Q2. How does mule batch load and dispatch phase work (https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch ) Is it waiting for entire stream of millions of records to be read in memory before dispatching a mule batch instance ?
Q3. While doing FTP write in batch aggregate there is a chance that parallel threads will start appending content to FTP at same time thereby corrupting the records. Is that avoidable. I read about File locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks) . My assumption is it will simply raise File lock exception and not necessarily wait to write FTP in append mode.
Q1. Is this a good approach or is there some better approach?
See answer Q3, this might not work for you. You could instead use a foreach and process the file sequentially though that will increase the time for processing significantly.
Q2. How does mule batch load and dispatch phase work
(https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch
) Is it waiting for entire stream of millions of records to be read in
memory before dispatching a mule batch instance ?
Batch doesn't load big numbers of records in memory, it uses file based queues. And yes, it loads all records in the queue before starting to process them.
Q3. While doing FTP write in batch aggregate there is a chance that
parallel threads will start appending content to FTP at same time
thereby corrupting the records. Is that avoidable. I read about File
locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks) .
My assumption is it will simply raise File lock exception and not
necessarily wait to write FTP in append mode
The file write operation will throw a FILE:FILE_LOCK error if the file is already locked. Note that Mule 4 doesn't manage errors through exceptions, it uses Mule errors.
If you are using DataWeave flatfile to parse the input file, note that it will load the file in memory and use significantly more memory than the file itself to process it, so you probably are going to get an out of memory error anyway.
I am facing a problem while working with Apache Nifi. Is there a way to stop ExecuteSQL processor once it is completed fetching all the data in the table, instead of fetching repeatedly until I stop it manually?
Generally processors are meant to be scheduled on some frequency through their scheduling tab. Processors in the middle of the graph with incoming relationships usually leave their scheduling at 0 seconds, which means run as fast as possible when data is queue. Source processors typically run on some interval based on Timer Driver or Cron Driven scheduling.
That being said... ExecuteSQL supports being triggered by incoming flow files, so you might be able to do something like put a ListenHTTP processor in front of ExecuteSQL and whenever you want to trigger it you would invoke the http end-point for ListenHTTP. This way you can leave it running, but it will only be triggered when you want it to be.
I have a process which reads a file and uploads it to a database. The flow goes as below.
File connector
Processing within a for-each loop (Update to database)
The problem with the above approach was that at any time when an exception occurs, the processing stops at that record and the rest of the records are not processed. As a work around, I have changed the flow as below:
File connector
For each - Within the for-each a flow-ref is placed to call a separate flow which does the processing.
The thing that I noticed is that, at the point of calling the new flow, separate threads are used for processing, due to which an exception does not cause all the records to fail. Now I am facing another difficulty, which is that after the processing completes, I need to initiate a report with the complete processing details (no of records processed, rejected, etc). Since all the records are processed asynchronously in different threads, I am not able to figure out when the processing is completed. Is there a way to monitor whether the processing is complete from another mule flow, so that I can generate the report when it is complete?
at the point of calling the new flow, separate threads are used for processing
A flow-ref doesn't necessarily imply a new thread: you can tune the processing-strategy of the ref-ed flow to force synchronous processing.
With this, you'll be able to remain synchronous, have a custom expression strategy in the ref-ed flow and achieve your goal of not breaking processing when an error occurs.
We currently have an process that involves sending a third-party an xml file containing changes that have occurred within our system.
We are moving to use NServiceBus and the changes are modelled as individual commands sent to an endpoint.
We do not want to send these changes as individual files; instead we want to batch receive a number of commands, concatenating this information into a single file.
How might one go about batching multiple commands into a single export file?
Have you looked at Sagas? Sagas allow you to model long-running business processes. So if a file has multiple parts, the Saga could begin when it gets the first part, and complete when it has all the parts it needs.
http://cdn.nservicebus.com/sagas.aspx
You can send the commands in using Bus.Send(IMessage[]messages). Note it takes an array and the messages will be packed into one queue message over the wire. On the receiving side, the handler will be invoked once per message. In your handler you should be able to just keep appending to your file. In the handler you could place logic to determine when to "roll" the file if necessary.
I’m looking to introduce SS Service Broker,
I have a remote orders database and a local processing database, all activity on the processing database has to happen in sequence, this seems a perfect job for Service Broker!
I’ve set up the infrastructure, I can send and receive messages and now I’m looking at the design of the processing. As I said all processes for one order need to be completed in sequence so I’ll put them in one conversation.
One of these processes is a request for external flat file data, we then wait (could be several days) and then import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half.
I’ve had some ideas but I’m sure I’m missing a trick somewhere
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion it is like chain of responsibility. As far as i can understand we have the following workflow.
1.) Process for message.
2.) Wait for external file, now this can be a busy wait or if external data provides you a notification then we can actually do it in non-polling manner.
3.) Once data is received then process the data.
So my suggestion would be to use 3 different Queues one for each part, when one is done it will forward or put a new message in chained queue.
I am assuming, one order processing will not disrupt another order processing.
I am thinking MSMQ with Windows Sequential Work flow, might also be a candidate for this task.