Multiple file transfer in Mule

I want to develop a flow in Mule that polls a folder, picks up 3 files, and transfers them to a separate folder. The flow should log an error or send an email if any one of the files is missing, and do the processing only if all 3 files are present.
I developed a flow with a File endpoint that picks up all the files in the folder and transfers them to the destination folder. But I don't know how to keep a count of the received files (i.e. 3) or read the file names in this case, and then direct the flow with a Choice component.
Any help would be much appreciated.

Hi, get the files and send them to a VM, then use a Mule requester to fetch the files after the polling time. Use a For Each and set the counter value to 3.

Using a File inbound endpoint might not help, as it will create 3 separate threads (I mean it will execute your flow once for each file in the folder).
You can try it like this:
1) Use a Quartz scheduler at the beginning that triggers at a specific interval
2) Use a Java component that uses Java IO to poll the folder and read the 3 file names
3) Do your business logic on whether all 3 files are available; if yes, read them and process them (move to a different folder, etc.)
This is a cleaner approach than dealing with multiple flows.
Another option might be to override the File endpoint's message dispatcher but that is more complicated than using Quartz for a simple use case.
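The check in steps 2 and 3 can be sketched as follows. This is only an illustration of the all-or-nothing logic, written in Python for brevity; in Mule the same logic would live in the Java component. The file names in EXPECTED_FILES are hypothetical, since the question does not say what the three files are called.

```python
import shutil
from pathlib import Path

# Hypothetical file names; the question does not name the three files.
EXPECTED_FILES = ("orders.csv", "customers.csv", "products.csv")


def transfer_if_complete(source_dir, target_dir, expected=EXPECTED_FILES):
    """Move the expected files only when every one of them is present.

    Returns a (moved, missing) tuple of file-name lists.
    """
    source = Path(source_dir)
    target = Path(target_dir)

    # Collect the names that are not in the polled folder yet.
    missing = [name for name in expected if not (source / name).is_file()]
    if missing:
        # This is where the flow would log an error or send an email.
        return [], missing

    target.mkdir(parents=True, exist_ok=True)
    for name in expected:
        shutil.move(str(source / name), str(target / name))
    return list(expected), []
```

Nothing is moved unless all three names are found, so a partial set stays in the source folder until the next scheduled run.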

Related

Make airflow read from S3 and post to slack?

I have a requirement where I want my airflow job to read a file from S3 and post its contents to slack.
Background
Currently, the airflow job has an S3 key sensor that waits for a file to be put in an S3 location and if that file doesn't appear in the stipulated time, it fails and pushes error messages to slack.
What needs to be done now
If airflow job succeeds, it needs to check another S3 location and if file there exists, then push its contents to slack.
Is this use case possible with Airflow?
You have already figured out that the first step of your workflow has to be an S3KeySensor.
As for the subsequent steps, depending on what you mean by "..it needs to check another S3 location and if file there exists,..", you can go about it in the following way:
Step 1
a. If the file at the other S3 location is also supposed to appear there after some time, then of course you will require another S3KeySensor
b. Otherwise, if this other file is expected to be there already (or may be absent, but need not be waited upon), we check for its presence using the check_for_key(..) function of S3Hook. This can be done within the python_callable of a simple PythonOperator, or of any other custom operator that you are using for step 2.
Step 2
By now, it is ascertained that the second file is present in the expected location (or else we wouldn't have come this far). Now you just need to read the contents of this file using the read_key(..) function. After this, you can push the contents to Slack using the call(..) function of SlackHook. You might be tempted to use SlackAPIOperator (which you can, of course), but reading the file from S3 and sending its contents to Slack should still be combined into a single task. So you are better off doing both in a generic PythonOperator, using the same hooks that the native operators use.
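A minimal sketch of such a callable is shown below. To keep it independent of any particular Airflow version, the hooks are passed in: s3_hook is assumed to be an S3Hook (only its check_for_key and read_key methods are used), and post_message is a hypothetical callable that sends a string to Slack, for example a thin wrapper you write around SlackHook.

```python
def post_s3_file_to_slack(s3_hook, post_message, key, bucket_name):
    """Read one S3 object and hand its contents to Slack in a single task.

    s3_hook      -- object with check_for_key() and read_key(), such as
                    Airflow's S3Hook
    post_message -- hypothetical callable that sends a string to Slack
    Returns True if a message was posted, False if the key was absent.
    """
    # Step 1b: the file may legitimately be missing, so check first.
    if not s3_hook.check_for_key(key, bucket_name=bucket_name):
        return False

    # Step 2: read the file and push its contents in the same task.
    contents = s3_hook.read_key(key, bucket_name=bucket_name)
    post_message(contents)
    return True
```

A function like this could be used as the python_callable of a PythonOperator placed downstream of the S3KeySensor, with the key and bucket supplied through op_kwargs.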

Mule SFTP component

Hi, I have the following questions about the SFTP component. Any help would be much appreciated:
1) Can we get the size of a file picked up by the SFTP component? I need to restrict the transfer based on file size.
2) Can I get the number of files and the file names picked up by the SFTP component?
3) Is this understanding correct: the SFTP component picks up all the files from the server, keeps them in memory, and processes them one by one until it has finished all the files?
4) If the server has 5 files, can the SFTP component process all 5 files in parallel rather than one by one?
1-Mule does not populate the file-size field for SFTP as it does for FILE. There are Jira tickets open on this matter, but MuleSoft has called it an enhancement and not given it priority. I disagree. Perhaps ping MuleSoft; if enough users do, maybe they will raise the priority and address it. The size is used internally; it is simply not exposed as it is with the FILE connector.
2-No, not really. It gives them back one at a time, not as a list.
3 & 4-It only loads the entire file into memory if you tell it not to stream, or if you do something else that forces a memory load, such as an object-to-string transformer. The files or file streams are passed back one by one, but unless you restrict threading and make your flow synchronous, it will go async and multi-threaded and process multiple files in parallel. Flows default to async; subflows are synchronous.
You can use the SFTP endpoint to retrieve files, and then use a Java or script call to get each file's attributes and filter so that you only process the files you are actually interested in, such as ones larger than your minimum size. This would seem more in line with what you are looking for in point 1. There are other options, but this is more straightforward than the others I can quickly think of.
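As a rough illustration of that last suggestion, the sketch below filters files by size once they have been retrieved to a local directory. It is written in Python rather than as a Mule Java or script component, and the function name and the idea of a local staging directory are assumptions made for the example.

```python
from pathlib import Path


def files_at_least(directory, min_bytes):
    """Return the regular files in `directory` of at least min_bytes, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        # st_size is the file size in bytes as reported by the file system.
        if path.is_file() and path.stat().st_size >= min_bytes
    )
```

Reading the size from the file attributes avoids loading the file contents into memory just to measure them.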
I found 1 way to get the file size, if we provide transformer-refs="Object_to_Byte_Array" in and then do #[payload.size()] to get the size of file in Bytes? Will this cause any issue?

How to separate the latest file from Multiple files in Mule

I have 5000 files in a folder, and new files are loaded into the same folder on a daily basis. Each day I need to get only the latest file from among all the files.
Is it possible to achieve this scenario in Mule out of the box?
I tried keeping the File component inside a Poll component (to make use of the watermark), but it is not working.
Is there any way we can achieve this? If not, please suggest the best way (any links are welcome).
Mule Studio: 5.3, RunTime 3.7.2.
Thanks in advance
Short answer: there is not really any quick out-of-the-box solution. But there are other ways. I'm not saying this is the right or only way of solving it, but I have previously implemented a similar scenario like this:
A normal File inbound endpoint with a database table as a file log. Each time a new file is processed, a component checks whether its name appears in the table. Using a choice or a filter, I only continue if it isn't in there already, and after processing I add the file name to the table.
This is quite a "heavy" solution though. A simpler approach would be to use an idempotent filter with an object store. For example a Redis server: https://github.com/mulesoft/redis-connector/blob/master/src/test/resources/redis-objectstore-tests-config.xml
It is actually very simple if your incoming file name contains a timestamp: you can configure the file inbound connector by setting file:filename-regex-filter pattern="myfilename_#[function:timestamp].csv". I hope this helps.
Maybe you can use a Quartz scheduler (specify the time in a cron expression), followed by a Groovy script in which you start the file connector. Keep the file connector in another flow.
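The file-log idea can be sketched like this. It is a minimal illustration in Python using SQLite; the table name file_log, its single-column schema, and the function names are assumptions, not something taken from the original implementation.

```python
import sqlite3


def open_file_log(path=":memory:"):
    """Open (or create) the file-log table. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS file_log (name TEXT PRIMARY KEY)")
    return conn


def process_once(conn, filename, handler):
    """Run handler(filename) only if filename is not yet in the log.

    Returns True when the file was processed, False when it was skipped.
    """
    seen = conn.execute(
        "SELECT 1 FROM file_log WHERE name = ?", (filename,)
    ).fetchone()
    if seen:
        return False

    handler(filename)
    # Record the name only after processing, as described above.
    conn.execute("INSERT INTO file_log (name) VALUES (?)", (filename,))
    conn.commit()
    return True
```

With 5000 old files already in the folder, each of them would be processed once on the first run unless the log is seeded with their names beforehand; after that, only new names get through.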

Large file to LookUp other large file when files are dependent- Mule ESB

Could you please suggest an approach? I have two files, each with 80k to 90k products, and the two files are interlinked (one file has information about the other). I need to generate a single file by looking up one file against the other. The files will probably arrive at the same time, under different names.
Both files are CSV, and I need to generate a new CSV.
Is the only way to keep one of these files in memory and look things up by iterating over it?
I planned to use Batch inside DataMapper. Is there any way to keep the first file in a DataMapper user-defined table or something like that, and then have the new file do a lookup against it? (I'm not provided with an external DB.)
If one of the files had only 5,000 or 10k lines, I could keep that one in memory and have the 80k file look up against it. I'm not comfortable keeping an 80k or 90k file in memory.
I have referred to this link: Mule ESB - design a multi file processing flow when files are dependent on each other.
Could you please suggest the best solution?
Also, any idea how long it takes to process the files? Thanks in advance.
Mule studio:5.3.1 and Runtime: 3.7.2
I would think of the problem as two distinct events from Mule's perspective, and plan to keep state from the first one in a "database" of some kind. This doesn't have to be an Oracle cluster or anything, you can run H2 in process or Redis on the same server as Mule for example.
I think you're on the right track with the Batch idea. When the first file is received, I'd create a database record for each of its rows in a batch job. Then when the second file is received, I'd run a second batch job that looks up the relevant information from the database and generates the CSV file you need. It could also remove the records that have been matched from the database in a subsequent batch step.
For the transformations, I'd recommend trying DataWeave instead of DataMapper. It's a better way to write transformation logic, and MuleSoft has deprecated DataMapper, to be removed as of Mule 4.0.
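The two-step idea can be sketched outside Mule as follows. This is only an illustration in Python with SQLite standing in for the in-process database; the column names product_id, name and price and the table name staged are hypothetical, since the question does not describe the file layouts.

```python
import csv
import sqlite3


def stage_first_file(conn, first_csv):
    """First job: store one record per row of the first file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged (product_id TEXT PRIMARY KEY, name TEXT)"
    )
    reader = csv.DictReader(first_csv)
    conn.executemany(
        "INSERT OR REPLACE INTO staged VALUES (?, ?)",
        ((row["product_id"], row["name"]) for row in reader),
    )
    conn.commit()


def merge_second_file(conn, second_csv, out_csv):
    """Second job: stream the second file and look each row up in the table.

    Writes the combined rows to out_csv and returns the number of matches.
    """
    reader = csv.DictReader(second_csv)
    writer = csv.writer(out_csv, lineterminator="\n")
    writer.writerow(["product_id", "name", "price"])
    matched = 0
    for row in reader:
        hit = conn.execute(
            "SELECT name FROM staged WHERE product_id = ?", (row["product_id"],)
        ).fetchone()
        if hit is None:
            continue  # no counterpart in the first file
        writer.writerow([row["product_id"], hit[0], row["price"]])
        matched += 1
    return matched
```

Both files are read row by row, so neither has to be held in memory as a whole; connecting to a file path instead of ":memory:" keeps the staged rows on disk.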

Executing Abaqus Model in Taverna

I'm pretty new to both Taverna and Abaqus, but I am trying to run an Abaqus model remotely on an HPC using a "Tool" in Taverna. This works fine if I already have my model file and inputs on the HPC, but I need a way of uploading the files dynamically in Taverna (I am trying to wrap Abaqus models generically).
I've tried adding an input port that takes a file list, but I don't know how I can copy it to the "location" that I've set for the tool. Could a beanshell service be the answer, or can I iterate through the file list and copy the files up before executing the Abaqus model?
Thanks
When you say that you created an input port that takes a file list, I guess you mean an input to the tool service.
Assuming the input port is called my_file_list, when the tool service is run, it will take a list of data values on port my_file_list. As an example, say it has "hello", "hi" and "hola" as the three values in the list.
On the location where the tool service is run, it executes in a temporary directory - a different directory for each execution of the service. It is normally something like /tmp/usecase-2029778474741087696
Three files will be created in the temporary directory; those files contain the (in this example) three values the tool service received on port my_file_list. The files could be called
/tmp/usecase-2029778474741087696/tempfile.0.tmp containing hello
/tmp/usecase-2029778474741087696/tempfile.1.tmp containing hi
/tmp/usecase-2029778474741087696/tempfile.2.tmp containing hola
There will also be a file called my_input_list. That file will contain
/tmp/usecase-2029778474741087696/tempfile.0.tmp
/tmp/usecase-2029778474741087696/tempfile.1.tmp
/tmp/usecase-2029778474741087696/tempfile.2.tmp
The script of your tool service would normally read the contents of my_input_list line by line and do something with the contents of the listed file(s).
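As a sketch of that pattern, the helper below reads the list file line by line and opens each listed file. It is written in Python and would have to be invoked from the tool's command; the function name is made up for the example, and the default file name simply follows the example above.

```python
from pathlib import Path


def read_listed_files(list_file="my_input_list"):
    """Yield (path, contents) for each file named in the list file.

    The default name follows the example above; adjust it to your tool.
    """
    for line in Path(list_file).read_text().splitlines():
        path = line.strip()
        if not path:
            continue  # tolerate blank lines
        yield path, Path(path).read_text()
```

Because the paths come from the list file itself, the script does not depend on how the temporary files happen to be numbered.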
I have also seen some scripts that 'cheat' and iterate directly over tempfile*.tmp, but that would be "a bad thing". The problem with that trick is that if you want to add a second list of files to the tool service, then the file my_input_list could contain
/tmp/usecase7932018053449784034/tempfile.4.tmp
/tmp/usecase7932018053449784034/tempfile.5.tmp
/tmp/usecase7932018053449784034/tempfile.6.tmp
as other temporary files were used for the other file list port.
I hope that helps
The tool service allows you to upload files - but if you are using the HPC through a job submission node, then you would have to modify your command line tool to then use the job file staging command to further push the files as part of the job. The files would be available in the current (temporary) directory of the specified tool script.
I would try to do it through the Tool service and not involve the beanshell - then you can keep your workflow simpler.
A good thing to remember is that you can write multiple shell commands in the box.
Similarly you would probably want to retrieve back the results so that you can process them further in the workflow (unless they are massive - in which case you should just output their remote filenames and send them in again to the next HPC job)
The exact commands to use for staging files and retrieving them depends on the HPC job submission system. Which one are you using?
Thanks for the input guys.
It was my misunderstanding of how Taverna uses the File list. All the files in the list are copied to the temp "sandbox" and are therefore available for use.
Another nice easy way is to zip the directory and pass the zipped files into an input port for the service. Then just unzip the files inside the command.
Thanks again