Iterate over a directory and extract only file names without reading the payload - mule

I am using the Mule 4.4 community edition on premise.
Thanks to earlier help (here), I have been able to read and process a large file without consuming excessive memory, which is all good.
Now building on this further - my use case is to read all .csv files from within a directory.
And then process them one by one:
\opt\out\
students.csv
teachers.csv
colleges.csv
....
So my plan was to list the files in the directory:
<sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
<non-repeatable-iterable />
<sftp:matcher filenamePattern="#['*.csv' ]"
directories="EXCLUDE" symLinks="EXCLUDE" />
</sftp:list>
And then I wanted to only read file names from directory and not read payload.
As per this early access article we are advised to use <non-repeatable-iterable />. However, after the list file operation as per article when I try to extract attributes:
<set-payload doc:name="Set Payload" value="#[output application/json --- payload map $.attributes]"/>
No attributes are available... (my plan is to extract the file names and then run a for loop for each file name and then a choice condition to determine if file name has student, use student transformer, if teacher use teacher transformer, etc.)
However, as attributes are not available, I am not able to pass file names to the for loop (yet to be written).
So I changed from <non-repeatable-iterable /> to <repeatable-in-memory-iterable />
Code below:
<sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
<repeatable-in-memory-iterable />
<sftp:matcher filenamePattern="#['*.csv' ]"
directories="EXCLUDE" symLinks="EXCLUDE" />
</sftp:list>
Using the above, I can extract the attributes of file names.
I am confused about the following:
The files to be processed in the above directory will be large (each file 700 MB), so while iterating the directory by using repeatable-in-memory-iterable, will it cause any memory issues? (I do not want to read file content, simply get file names at this stage)
Here is the complete flow so far (note: the for loop does not yet do anything with each file, which I will plug in...)
<flow name="employee-process-flow">
<http:listener doc:name="Listener" config-ref="HTTP_Listener_config" path="/processFiles"/>
<set-variable value='#[now() as String { format: "ddMMuu" }]' doc:name="Set todays date as ddmmyy" doc:id="c6a91a41-65b1-46df-a720-9c13fe360b6b" variableName="today"/>
<sftp:list doc:name="List" config-ref="SFTP_Config" directoryPath="/opt/out">
<repeatable-in-memory-iterable />
<sftp:matcher filenamePattern="#['*.csv' ]"
directories="EXCLUDE" symLinks="EXCLUDE" />
</sftp:list>
<set-payload doc:name="Set Payload" value="#[output application/json --- payload map $.attributes]"/>
<foreach doc:name="For Each" >
<logger level="INFO" doc:name="Logger" message="we are here"/>
</foreach>
</flow>

The List operation returns a list of messages, and each has a payload and attributes. The content of the files is returned as the payload, in a lazy way, meaning that the file's content is read only if you try to access that element's payload.
It makes sense that if you use a non-repeatable-iterable and don't access the payload of each item in the <foreach>, then you should not have any memory issues, because the contents are not read.
By using in memory repeatable streaming it is possible that the entire payload is being read into memory. Try reading a file a few gigabytes in size and see what happens there.
I'm not sure what the problem is with the attributes. It should work the same in any streaming mode.
Note that if you plan on doing something with the attributes—other than printing them—then you should output to application/java instead of JSON, to avoid unneeded conversions to and from JSON. For example, in your flow the output is used as input for the <foreach>, so it would be better for it to be Java.
Example:
output application/java --- payload map $.attributes
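The plan described in the question (loop over the file names and pick a transformer per file) could be sketched roughly as below. This is only an outline under some assumptions: the attributes are assumed to expose a fileName field, and the sub-flow names (process-students-file, process-teachers-file) are hypothetical placeholders that do not exist in the original flow.

```xml
<!-- Keep only the attributes; the lazy payload of each listed item is never accessed -->
<set-payload doc:name="Keep attributes only"
             value="#[output application/java --- payload map $.attributes]" />
<foreach doc:name="For Each">
    <!-- Inside the loop, payload is the attributes object of one file -->
    <choice doc:name="Route by file name">
        <!-- Assumption: the attributes object has a fileName field -->
        <when expression="#[payload.fileName contains 'students']">
            <!-- Hypothetical sub-flow that would read and transform the students file -->
            <flow-ref name="process-students-file" />
        </when>
        <when expression="#[payload.fileName contains 'teachers']">
            <!-- Hypothetical sub-flow that would read and transform the teachers file -->
            <flow-ref name="process-teachers-file" />
        </when>
        <otherwise>
            <logger level="WARN" doc:name="Logger"
                    message="#['No transformer for ' ++ payload.fileName]" />
        </otherwise>
    </choice>
</foreach>
```

Each sub-flow would then read its own file by path, so only one file's content is streamed at a time.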

Related

Mule: How to print the file name in logger?

I want to print the mule configuration file name, in the logger in the flow, how can I get it?
Suppose the configuration file name in test.xml, inside that a flow is having logger, which prints test.xml, how can I get this?
<flow name="filenameFlow">
<http:listener config-ref="HTTP_Listener_Configuration" path="/Hello" doc:name="HTTP"/>
<logger message="#[app.name.toString()]" level="INFO" doc:name="Logger"/>
</flow>
[name.flow] is not correct.
You should use #[flow.name], which is the correct form. Don't mislead with your answers.
Thanks,
#[app.name] should print out the name of your application, in your case "test". This is not, however, the name of the XML file. #[flow.name] will give you the name of the flow currently executing.
Try these expressions:
1) #[message.outboundProperties['originalFileName']]
2) #[header:originalFilename]
I have done almost the same thing a few days ago.
Add a global element of type property placeholder, give location: mule-deploy.properties.
In logger, use ${config.resources}.
It will work if there is only one config file.
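A minimal sketch of that approach for Mule 3, assuming mule-deploy.properties can be resolved by the placeholder and contains a config.resources entry (for example config.resources=test.xml). The flow and listener are taken from the question; the log message text is just an illustration.

```xml
<!-- Global element: loads mule-deploy.properties so its keys can be used as ${...} -->
<context:property-placeholder location="mule-deploy.properties" />

<flow name="filenameFlow">
    <http:listener config-ref="HTTP_Listener_Configuration" path="/Hello" doc:name="HTTP"/>
    <!-- ${config.resources} is resolved at startup; with several config files it holds the whole comma-separated list -->
    <logger message="Config file(s): ${config.resources}" level="INFO" doc:name="Logger"/>
</flow>
```

Because the placeholder is resolved once at startup, it cannot tell which of several config files a given flow lives in, which is why this only helps with a single config file.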
Just as @dlb explained, I also wonder whether there is a better solution for your requirement. I am assuming that you want to make the log more transparent, so that it is easier to locate which flow caused an event or error.
As such, it makes more sense to log the flow name rather than the config file name, since a config file may contain multiple flows. You can use the category in the logger component for this purpose:
<logger level="INFO" category="${application-prefix}.myMainFlow" doc:name="Logger" message="#['payload is ---\n' + payload]"/>
In every logger component (loggers should be placed at important milestones), enter ${application-prefix}.flowName as the category (the property lets you reuse the application's name in all logs, and flowName should be hardcoded). You will then find logs like the one below at runtime:
INFO 2016-09-07 17:00:27,566 [[test].HTTP_Listener_Configuration.worker.01] com.myOrg.myApp.myMainFlow: payload is ---
Hello World
#[message.outboundproperties[originalFilename]]
Try this expression.

Mule Server 3.6 > Anypoint Studio > Raw JSON as POST data

Still learning Mulesoft's Anypoint Studio... I am confused as how will I be able to access raw JSON POST data via the HTTP Listener then use the Choice flow control to execute conditions based on a value from a given JSON index. Anyone can show/tell me how to do this?
The JSON HTTP body will automatically become the payload of your message in Mule probably represented as Stream.
Just for demo purposes, try logging the payload after your http:listener using:
<object-to-string-transformer />
<logger level="INFO" message="#[payload]" />
The best way to query JSON is to transform it to a Map using the JSON module transformers.
<json:json-to-object-transformer returnClass="java.util.HashMap" />
And then query it using MEL like standard MVEL or Java syntax.
For a JSON document like: {"person" : {"name" : "bob"}}
<logger message="#[payload.person.name]" level="INFO" />
You can use these expressions in your choice router also:
<choice>
<when expression="#[payload.person.name == 'bob']">
<!-- do something ... -->
</when>
</choice>

Message Splitter

I need to split a message into 3 different payloads, transform them, and send them to 3 routers. The payload initially has a header, a body (or detail), and a footer. These 3 different payloads need to be extracted and sent to 3 different routers. What would be the most efficient way to do it?
It depends on your body/payload type. If your payload is XML, you can easily split it using xpath and route it using content based routing similar to:
<splitter expression="#[xpath('//nodes/node')]" />
<choice>
<when expression="#[xpath('//node/id').text == 'myid']">
<!-- Route somewhere -->
</when>
<otherwise>
<!-- Route somewhere else -->
</otherwise>
</choice>
The expression splitter above can take any MEL expression to split up your payload. There are many other splitters, for example if your payload is already a java Collection, you can simply use the collection-splitter.
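For the Java Collection case mentioned above, a minimal Mule 3 sketch might look like this. The flow name is a hypothetical placeholder, and the payload is assumed to already be a java.util.List holding the parts (for example header, body, and footer).

```xml
<flow name="splitCollectionFlow">
    <!-- Assumption: payload is already a java.util.List of parts -->
    <collection-splitter doc:name="Collection Splitter" />
    <!-- From here on, each element of the list is processed as its own message -->
    <logger level="INFO" message="#[payload]" doc:name="Logger" />
</flow>
```

A choice router placed after the splitter, as in the XPath example, can then send each part to its own destination.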
Other splitter info can be found here: http://www.mulesoft.org/documentation-3.2/display/32X/Message+Splitting+and+Aggregation
Also there are other routers that can help you with fork and join patterns if you need to process messages asynchronously as well. Here's a good post on that: http://java.dzone.com/articles/aggregation-mule-%E2%80%93-%E2%80%9Cfork-and

Mule ESB: File outputpattern doesn't translate the pattern

I'm using Mule ESB CE 3.4. I have a requirement where I'm reading the configuration information from database and using it as the file name for the file outbound endpoint. Here is an example code (the code may not work as I have only given an outline)
<file:connector name="File-Data" autoDelete="false" streaming="true" validateConnections="true" doc:name="File" />
.....
<!-- Gets the configuration from database using a transformer. The transformer populates the configuration entries in a POJO and puts that in a session. -->
<custom-transformer class="com.test.DbGetConfigsTransformer" doc:name="Get Integration Configs"/>
....<!-- some code to process data -->
<logger message="$$$: #[sessionVars['currentFeed'].getFilePattern()]" doc:name="Set JSON File Name" />
<file:outbound-endpoint path="/temp" outputPattern="#[sessionVars['currentFeed'].getFilePattern()]" responseTimeout="10000" mimeType="text/plain" connector-ref="File-Data" doc:name="Save File"/>
The above code throws the following error:
1. The filename, directory name, or volume label syntax is incorrect (java.io.IOException)
java.io.WinNTFileSystem:-2 (null)
2. Unable to create a canonical file for /temp/Test_User_#[function:datestamp:YYYYMMddhhmmss.sss] (org.mule.api.MuleRuntimeException)
org.mule.util.FileUtils:354 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/api/MuleRuntimeException.html)
3. Failed to route event via endpoint: DefaultOutboundEndpoint{endpointUri=file:///temp, connector=FileConnector
In the database table, the field is called FilePattern and it has the value Test_User_#[function:datestamp:YYYYMMddhhmmss.sss]. If I hardcode the value or move this value to the mule configuration file
file.name=Test_User_#[function:datestamp:YYYYMMddhhmmss.sss]
and use the configuration property syntax (e.g. ${file.name} in the outputPattern), it works. But if I read the same value from the db and use it, it does not work and throws the error above. The logger displays the following (which is read from the db):
$$$: Test_#[function:datestamp:YYYYMMddhhmmss.sss]
Any help is much appreciated.
If your datestamp format does not vary, you should just store the environment prefix in your db and use something like:
outputPattern="#[sessionVars['prefix']+server.dateTime.format('YYYYMMddhhmmss.sss')]"
If you need to use your current database values, you can use basic Java string methods to find the correct substrings. For example:
#[sessionVars['currentFeed'].getFilePattern().substring(0,sessionVars['currentFeed'].getFilePattern().indexOf('function')-2)+server.dateTime.format('YYYYMMddhhmmss.sss')]
If you use different datestamp formats, you can find that part as well using similar String methods. However, I still suggest you come up with an implementation that only stores the environment prefix in the db.

Sending a file attachment with Mule

I have seen this answer but it does not show how you use the MEL to send the file in the value field. If you enter some value in there that is the content of the file. I assume you have to move the payload from the file endpoint connector to the attachment value property using MEL.
Also, how can you set the content type dynamically?
Mule SMTP - send email with attachment
Thanks
Jaco.
You can use a file transformer such as file-to-byte-array-transformer (or file-to-string-transformer for text files) to turn the file into a payload you can attach. You can also use Mule variables, properties, etc. for defining the content type or other params. Example:
<file:inbound-endpoint path="/tmp/attachments" responseTimeout="10000"/>
<file:file-to-byte-array-transformer/>
<set-variable variableName="ct" value="text/plain" />
<set-attachment attachmentName="#[message.outboundProperties.filename]" value="#[payload]" contentType="#[flowVars['ct']]"/>
<set-payload value="this is my message"/>
<smtp:outbound-endpoint...