A simplified version of the process I am trying to accomplish: I will have sets of files which should be processed, in order, and only as complete sets. For proof of concept, I have created a flow to collect a set of two files named "File1*YYMMDD*.txt" and "File2*YYMMDD*.txt", which together constitute the set for date YYMMDD. I use a file inbound endpoint to watch for files and use the date portion of the name to define a correlation ID. A collection-aggregator then groups these into a set of 2, and a file outbound endpoint dispatches the files from the set:
<configuration>
<default-threading-profile doThreading="false" />
</configuration>
<flow name="Aggregator">
<file:inbound-endpoint path="G:/SourceDir" moveToDirectory="G:/SourceDir/Archive"
responseTimeout="10000" doc:name="get-working-files"
pollingFrequency="5000" fileAge="600000">
<file:filename-regex-filter pattern="File1(.*).txt|File2(.*).txt" caseSensitive="false"/>
<message-properties-transformer>
<add-message-property key="MULE_CORRELATION_GROUP_SIZE" value="2" />
<add-message-property key="MULE_CORRELATION_ID"
value="#[message.inboundProperties
.originalFilename
.substring(5, message.inboundProperties.originalFilename.lastIndexOf('.'))]" />
</message-properties-transformer>
</file:inbound-endpoint>
<collection-aggregator timeout="86400000" failOnTimeout="false" doc:name="Collection Aggregator">
</collection-aggregator>
<foreach doc:name="For Each">
<logger message="Processing: #[message.inboundProperties.originalFilename]" level="INFO"
doc:name="Some process"/>
<file:outbound-endpoint responseTimeout="10000" doc:name="Destination"
outputPattern="#[function:datestamp:yyyyMMdd.HHmmss].#[message.inboundProperties.originalFilename]"
path="G:/DestDir"/>
</foreach>
</flow>
The issues I have are two-fold.
1) If I have only one file from the set, say File2150102.txt, the flow correctly identifies the set as incomplete and waits. After about 1 minute, however, a lock is obtained on the file again and it is accepted as the second file in the collection. The file is processed through the outbound endpoint and archived, and then the same processing is attempted for the file a second time and fails because the file has already been removed:
INFO 2015-07-14 11:19:51,205 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.FileMessageReceiver: Lock obtained on file: G:\SourceDir\File2150102.txt
INFO 2015-07-14 11:21:01,241 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.FileMessageReceiver: Lock obtained on file: G:\SourceDir\File2150102.txt
INFO 2015-07-14 11:21:01,273 [[fileset].connector.file.mule.default.receiver.01] org.mule.api.processor.LoggerMessageProcessor: Processing: File2150102.txt
INFO 2015-07-14 11:21:01,304 [[fileset].connector.file.mule.default.receiver.01] org.mule.lifecycle.AbstractLifecycleManager: Initialising: 'connector.file.mule.default.dispatcher.452370795'. Object is: FileMessageDispatcher
INFO 2015-07-14 11:21:01,304 [[fileset].connector.file.mule.default.receiver.01] org.mule.lifecycle.AbstractLifecycleManager: Starting: 'connector.file.mule.default.dispatcher.452370795'. Object is: FileMessageDispatcher
INFO 2015-07-14 11:21:01,320 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.FileConnector: Writing file to: G:\DestDir\20150714.112101.File2150102.txt
WARN 2015-07-14 11:21:01,336 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.ReceiverFileInputStream: Failed to move file from G:\SourceDir\File2150102.txt to G:\SourceDir\archive\File2150102.txt
INFO 2015-07-14 11:21:01,336 [[fileset].connector.file.mule.default.receiver.01] org.mule.api.processor.LoggerMessageProcessor: Processing: File2150102.txt
INFO 2015-07-14 11:21:01,336 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.FileConnector: Writing file to: G:\DestDir\20150714.112101.File2150102.txt
WARN 2015-07-14 11:21:01,476 [[fileset].connector.file.mule.default.receiver.01] org.mule.transport.file.FileMessageReceiver: Failure trying to remove file G:\SourceDir\File2150102.txt from list of files under processing
I can find no setting that controls this re-polling of the same file. My polling frequency is set at 5 seconds, I require a file age of 10 minutes, and I gave the collection a very long timeout of 10 days, so it should sit and wait until another file is found; I do not want it picking up the same file a second time.
2) In a more complex case, I have the files File1150201.txt, File2150201.txt, File1150202.txt, File1150203.txt, and File2150203.txt in the directory. The flow starts grabbing files, correctly finds and processes the set for "150201", and dispatches it. It finds the file for 150202, recognizes that it needs the second file, and does not process it. It then finds the complete set for "150203" and does process it. I need it not to process this set until the "150202" set has been processed. Can someone tell me how to get it to wait on the incomplete set and not continue with other sets? I have the correct processing order, just not the ability to wait for the missing file and keep the sets in sequence when there is an incomplete set.
I am not sure whether I understand it correctly, but for your issue 1, the matching (and the waiting for incomplete sets) works for me with the test flow below:
<file:connector name="File" autoDelete="false" streaming="false" validateConnections="true" doc:name="File">
<file:expression-filename-parser />
</file:connector>
<file:connector name="File1" autoDelete="false" outputAppend="true" streaming="false" validateConnections="true" doc:name="File"/>
<vm:connector name="VM" validateConnections="true" doc:name="VM">
<receiver-threading-profile maxThreadsActive="1"></receiver-threading-profile>
</vm:connector>
<flow name="fileaggreFlow2" doc:name="fileaggreFlow2">
<file:inbound-endpoint path="C:\InFile" moveToDirectory="C:\InFile\Archive" responseTimeout="10000" connector-ref="File" doc:name="File">
</file:inbound-endpoint>
<message-properties-transformer overwrite="true" doc:name="Message Properties">
<add-message-property key="MULE_CORRELATION_ID" value="#[message.inboundProperties.originalFilename.substring(5,13)]"/>
<add-message-property key="MULE_CORRELATION_GROUP_SIZE" value="2"/>
<add-message-property key="MULE_CORRELATION_SEQUENCE" value="#[message.inboundProperties.originalFilename.substring(0,5)]"/>
</message-properties-transformer>
<vm:outbound-endpoint exchange-pattern="one-way" path="Merge" doc:name="VM" connector-ref="VM"/>
</flow>
<flow name="fileaggreFlow1" doc:name="fileaggreFlow1" processingStrategy="synchronous">
<vm:inbound-endpoint exchange-pattern="one-way" path="Merge" doc:name="VM" connector-ref="VM"/>
<logger level="INFO" doc:name="Logger"/>
<processor-chain doc:name="Processor Chain">
<collection-aggregator timeout="1000000" failOnTimeout="true" storePrefix="#[MULE_CORRELATION_ID]" doc:name="Collection Aggregator"/>
<logger message="#[payload]" level="INFO" doc:name="Logger"/>
<foreach doc:name="For Each">
<logger message="Processing: #[message.inboundProperties.originalFilename]" level="INFO" doc:name="Some process"/>
<file:outbound-endpoint path="C:\TestFile" outputPattern="#[message.inboundProperties.originalFilename.substring(5,17)]" responseTimeout="10000" connector-ref="File1" doc:name="Destination"/>
</foreach>
</processor-chain>
</flow>
It would help if you could post the complete flow. My file names are File120150107.txt (and so on...)
I think your issue is just because you set failOnTimeout to "false". Make it "true":
<collection-aggregator timeout="86400000" failOnTimeout="true" doc:name="Collection Aggregator">
The aggregator will wait for the files (2 or 3 files, based on your requirement) until it reaches the specified time (here, 86400000 ms). Once that time is exceeded, it will fail.
In your case (failOnTimeout="false"), if you send, for example, 4 files and only 2 are received within the time, the incomplete collection will be processed anyway (it won't wait for the remaining 2 files).
Check how many files you are planning to process and how long they will take to process (example: 4 files), and adjust the timeout accordingly.
I'm trying to use the Mule inbound file connector with a poll scope and got an error saying the endpoint couldn't be started. If I remove the poll scope and use the file connector with its default polling, it works fine without any file path changes.
I was wondering why the poll scope is giving an error. If a file inbound connector is not allowed to be wrapped in a poll scope, why does Anypoint Studio show the poll scope in the "Wrap in" option?
I found a similar question, but I didn't see a detailed explanation:
Mule won't allow POLL message processor to read file using file Inbound?
Thanks in advance for your responses.
Use the mule-module-requester (https://github.com/mulesoft/mule-module-requester) together with the poll scheduler.
Relevant post: http://blogs.mulesoft.com/dev/mule-dev/introducing-the-mule-requester-module/
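A minimal sketch of that combination (the endpoint name, path, and frequency are assumptions, not from the question):
<file:endpoint name="inputFile" path="/tmp/mule/input" doc:name="File"/>
<flow name="requesterPollFlow">
<poll doc:name="Poll">
<fixed-frequency-scheduler frequency="30" timeUnit="SECONDS"/>
<mulerequester:request resource="inputFile" doc:name="Mule Requester"/>
</poll>
<!-- The payload may be null when the directory is empty -->
<logger message="#[payload]" level="INFO" doc:name="Logger"/>
</flow>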
Another way is to set the FTP flow's initialState="stopped" and let the poll scheduler start the flow. After the FTP processing, stop the flow again.
See the sample code:
<ftp:connector name="FTP" pollingFrequency="1000"
validateConnections="true" moveToDirectory="/work/ftp/processed"
doc:name="FTP" />
<flow name="scheduleStartFTPFlow">
<poll doc:name="Poll">
<fixed-frequency-scheduler frequency="1"
timeUnit="MINUTES" />
<expression-component doc:name="START FTP FLOW"><![CDATA[if(app.registry.processFTPFlow.isStopped()){
app.registry.processFTPFlow.start();
}]]></expression-component>
</poll>
<logger message="Poll Logging: #[payload]" level="INFO"
doc:name="Logger" />
</flow>
<flow name="processFTPFlow" initialState="stopped">
<ftp:inbound-endpoint host="localhost" port="21"
path="/data/ftp" user="Sanjeet" password="sanjeet123" responseTimeout="10000"
doc:name="FTP" connector-ref="FTP" />
<logger message="Logging FTP #[payload]" level="INFO" doc:name="Logger" />
<expression-component doc:name="STOP FTP FLOW"><![CDATA[app.registry.processFTPFlow.stop();]]></expression-component>
</flow>
Please provide an SSCCE.
Based on your question, you do not need poll at all. The file connector already has this feature of checking for files periodically. Here is an example which polls for files every 123 milliseconds (0.123 seconds):
<file:inbound-endpoint path="/tmp" responseTimeout="10000" doc:name="File" pollingFrequency="123"/>
My suggestion is to use the quartz connector alongside the file connector and set the interval on the quartz connector. Or use the file connector itself with its pollingFrequency, so there is no need to wrap the file endpoint in a poll scope.
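For reference, a rough quartz sketch (the job name, interval, and path are made-up values, not taken from the question):
<quartz:connector name="Quartz" validateConnections="true" doc:name="Quartz"/>
<flow name="quartzFilePollFlow">
<!-- repeatInterval is in milliseconds: fire the job every 10 seconds -->
<quartz:inbound-endpoint jobName="filePollJob" repeatInterval="10000" connector-ref="Quartz" doc:name="Quartz">
<quartz:endpoint-polling-job>
<quartz:job-endpoint address="file:///tmp/in"/>
</quartz:endpoint-polling-job>
</quartz:inbound-endpoint>
<logger message="Picked up: #[message.inboundProperties.originalFilename]" level="INFO" doc:name="Logger"/>
</flow>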
You can create a file endpoint in the global elements section and then use the Mule requester to invoke that endpoint inside a poll scope:
<file:connector name="File1" autoDelete="true" streaming="true" validateConnections="true" doc:name="File"/>
<file:endpoint connector-ref="File1" name="File" responseTimeout="10000" doc:name="File" path="/"/>
<flow name="pocforloggingFlow1">
<poll doc:name="Poll">
<mulerequester:request resource="File" doc:name="Mule Requester"/>
</poll>
</flow>
I want to run one thread at a time in my Mule flow, and I also want to take input only one by one: only once the flow has completed for the first input should Mule pick up the second input. Which strategy should I use?
If I use the synchronous strategy and there are two or more files in the folder watched by the Mule flow, it picks up all the input at once.
And if I use the asynchronous strategy with 1 thread at a time, then I am not able to complete the full flow before it takes another input.
<flow name="Catalog_command_Execution" doc:name="Catalog_command_Execution" processingStrategy="synchronous">
<file:inbound-endpoint path="${inputCAT.path}" responseTimeout="10000" connector-ref="File" doc:name="Catalog File"/>
<object-to-string-transformer doc:name="File Mapping"/>
<custom-transformer class="com.tcs.sdm.kcm.cmdExecution.CmdCAT" doc:name="CAT cmd Execution"/>
<logger message="******************Entered file #[message.inboundProperties.originalFilename] for command execution has been Processed*********" level="INFO" category="Audit_LogCAT" doc:name="Logger"/>
<catch-exception-strategy doc:name="Catch Exception Strategy">
<logger message="*******************************Entered Catalog file for command execution is having error: #[exception.causeException]****************" level="INFO" category="Audit_LOgCAT" doc:name="Logger"/>
</catch-exception-strategy>
</flow>
<flow name="CatalogueFlow_AB" doc:name="CatalogueFlow_AB" processingStrategy="allowOneThread">
<wmq:inbound-endpoint queue="${wmq.queue.nameCT_AB}" doc:name="WMQ" connector-ref="WMQ"/>
<object-to-string-transformer doc:name="File Mapping"/>
<logger level="INFO" doc:name="CAT Logger" category="Audit_LogCAT" message="******************Entered Catalogue SOAP File with Province Name AB is Processed from queue*********"/>
<custom-transformer class="com.tcs.sdm.kcm.catalog.ServiceController_AB" doc:name="Java"/>
<catch-exception-strategy doc:name="Catch Exception Strategy">
<logger level="INFO" doc:name="CAT Exception Logger" category="Audit_LogCAT" message="*******************************Entered Catalogue SOAP File with Province Name AB is having error: #[exception.causeException]****************"/>
</catch-exception-strategy>
</flow>
For the kind of scenario you are looking at, processing one file after another, Mule's synchronous processing strategy should serve the purpose.
If you see Mule picking up more than one file, then the flow needs to be looked at to find out why this is happening.
Update:
The processing strategy on your flow with the WMQ inbound is not synchronous. Make it synchronous and then it should work as expected:
<flow name="CatalogueFlow_AB" doc:name="CatalogueFlow_AB" processingStrategy="synchronous">
Hope this helps.
Old thread, but have you tried setting the WMQ consumer count to 1?
A flow can be synchronous, but that doesn't mean the inbound connector will work in a synchronous manner. For a file-based connector you can set the dispatcher to be non-threaded, and for WMQ you should try setting the consumers to 1; see the sketch below.
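A rough sketch of both (host, port, and queue manager are placeholder values; numberOfConsumers is the standard JMS connector attribute, which I am assuming the WMQ connector inherits):
<wmq:connector name="WMQ" hostName="localhost" port="1414" queueManager="QM1" numberOfConsumers="1" validateConnections="true" doc:name="WMQ"/>
<!-- For the file connector, disable dispatcher threading -->
<file:connector name="FileNonThreaded" autoDelete="true" streaming="false" validateConnections="true" doc:name="File">
<dispatcher-threading-profile doThreading="false"/>
</file:connector>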
I have a JMS connector. I am receiving messages from a queue and processing each message in a flow, calling the DB to get data based on some IDs in the message and writing the response output to files; I am using dynamic outbound endpoints to decide the output location.
<jms:connector name="tibco" numberOfConsumers="20" ..... >
.....
</jms:connector>
<flow name="realtime" doc:name="ServiceId-8">
<jms:inbound-endpoint queue="${some.queue}" connector-ref="tibco" doc:name="JMS">
<jms:transaction action="ALWAYS_BEGIN"/>
</jms:inbound-endpoint>
<processor ref="proc1"></processor>
<processor ref="proc2"></processor>
<component doc:name="Java">
<spring-object bean="comp1"/>
</component>
<processor ref="proc3"></processor>
<collection-splitter doc:name="Collection Splitter"/>
<processor ref="endpointprocessor"></processor>
<foreach collection="#[message.payload.consumerEndpoints]" counterVariableName="endpoints" doc:name="Foreach">
<choice doc:name="Choice">
<when expression="#[consumerEndpoint.getOutputType().equals('txt') and consumerEndpoint.getChannel().equals('file')]">
<processor-chain>
<file:outbound-endpoint path="#[consumerEndpoint.getPath()]" outputPattern="#[consumerEndpoint.getClientId()]-#[attributes['eventId']]%#[consumerEndpoint.getTicSeedCount()]-#[attributes['dateTime']].tic" responseTimeout="10000" doc:name="File"/>
</processor-chain>
</when>
<when expression="#[consumerEndpoint.getOutputType().equals('txt') and consumerEndpoint.getChannel().equals('ftp')]">
<processor-chain>
<ftp:outbound-endpoint path="#[consumerEndpoint.getPath()]" outputPattern="#[consumerEndpoint.getClientId()]-#[attributes['eventId']]%#[consumerEndpoint.getTicSeedCount()]-#[attributes['dateTime']].tic" host="#[consumerEndpoint.getHost()]" port="#[consumerEndpoint.getPort()]" user="#[consumerEndpoint.getChannelUser()]" password="#[consumerEndpoint.getChannelPass()]" responseTimeout="10000" doc:name="FTP"/>
</processor-chain>
</when>
</choice>
</foreach>
<rollback-exception-strategy doc:name="Rollback Exception Strategy">
<processor ref="catchExceptionCustomHandling"></processor>
</rollback-exception-strategy>
</flow>
The above is not the complete flow; I pasted the important parts for understanding.
Question 1: As I have not defined any threading strategy at any level, and the connector has numberOfConsumers="20", if I drop 20 messages in the queue, how many threads will start? The prefetch size on the JMS queue is set to 20.
Question 2: Do I need to configure a threading strategy at the receiver end and/or at the flow level?
Sometimes, when the load is very high (say 15k messages in the queue in a minute), I see message processing get slow, and a thread dump shows something like the following:
"TIBCO EMS Session Dispatcher (7905958)" prio=10 tid=0x00002aaadd4cf000 nid=0x3714 waiting for monitor entry [0x000000004af1e000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.mule.endpoint.DynamicOutboundEndpoint.createStaticEndpoint(DynamicOutboundEndpoint.java:153)
- waiting to lock <0x00002aaab711c0e0> (a org.mule.endpoint.DynamicOutboundEndpoint)
Any help and pointers will be appreciated.
Thanks.
Message processing was getting slow because of the dynamic endpoints: I saw thread congestion when dynamic outbound endpoints were created and used. I was using Mule 3.3.x, and after looking at the Mule 3.4.x code I realized that dynamic outbound endpoint creation is handled more appropriately there. I upgraded to 3.4 and the issue is almost gone.
I tried to create a DataMapper example in Mule in which both the inbound and outbound endpoints are File. It looks something like the following.
When I execute this program, the output folder remains empty. Logically, I assume that I need to put a HashMap-to-XML transformer between the DataMapper and the output File. Moreover, I created the CSV-to-XML mapping by selecting from the example option in the DataMapper.
Initially I tried to use an FTP endpoint, but it started resulting in errors, so I replaced FTP with the File endpoint.
Here I am sharing the configuration XML file:
<mule xmlns:file="..." ...>
<data-mapper:config name="sample_mapper_grf" transformationGraphPath="sample_mapper.grf" doc:name="DataMapper"/>
<flow name="CSV_to_XML_Data_MapperFlow1" doc:name="CSV_to_XML_Data_MapperFlow1">
<file:inbound-endpoint path="/home/jay/CSV_XML_/input" responseTimeout="10000" doc:name="Input File"/>
<data-mapper:transform config-ref="sample_mapper_grf" doc:name="DataMapper"/>
<file:outbound-endpoint path="/home/jay/CSV_XML_/output/" responseTimeout="10000" doc:name="Output File"/>
</flow>
</mule>
(DataMapper configuration image not shown here.)
Add a Groovy component after the DataMapper and try to dump the contents:
println "post mapping payload " + payload
return payload
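In the flow XML, that snippet would typically live in a scripting component, roughly like this (a sketch, assuming the scripting namespace is declared on the <mule> element):
<scripting:component doc:name="Groovy">
<scripting:script engine="Groovy"><![CDATA[
// Dump the payload produced by the DataMapper, then pass it through unchanged
println "post mapping payload " + payload
return payload
]]></scripting:script>
</scripting:component>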
I got it resolved by adding an object-to-string-transformer after the DataMapper.
Here is the configuration XML:
<mule ....>
<data-mapper:config name="sample_mapper_grf" transformationGraphPath="sample_mapper.grf" doc:name="DataMapper"/>
<flow name="CSV_to_XML_Data_MapperFlow1" doc:name="CSV_to_XML_Data_MapperFlow1">
<file:inbound-endpoint path="/home/jay/CSV_XML_/input" responseTimeout="10000" doc:name="Input File"/>
<data-mapper:transform config-ref="sample_mapper_grf" doc:name="DataMapper"/>
<object-to-string-transformer doc:name="Object to String"/>
<file:outbound-endpoint path="/home/jay/Output" responseTimeout="10000" doc:name="File" outputPattern="#[function:dateStamp].xml"/>
</flow>
</mule>
I have a Mule flow as under:
<flow name="flow1" doc:name="f1">
<file:inbound-endpoint path="C:\input" responseTimeout="10000"
doc:name="File" />
</flow>
<flow name="flow2" doc:name="f2">
<http:inbound-endpoint address="http://localhost:8080"
doc:name="HTTP" exchange-pattern="request-response" />
<flow-ref name="flow1" doc:name="Flow Reference" />
<file:outbound-endpoint path="C:\outputfile"
responseTimeout="10000" doc:name="File" />
</flow>
I am trying to move/upload multiple files from a source to a destination (which can be anything, e.g. FTP or a file outbound endpoint) by using this flow.
The reason for doing it this way is that I want to invoke the job from the CLI (command line interface) using curl.
But it is not working.
Edited
I need to pick up some files (multiple files) from a particular folder located on my hard drive, and then move them to some outbound process, which can be an FTP site or some other hard drive location.
But this flow needs to be invoked from the CLI.
Edited (Based on David's answer)
I now have the flow as under:
<flow name="filePickupFlow" doc:name="flow1" initialState="stopped">
<file:inbound-endpoint path="C:\Input" responseTimeout="10000" doc:name="File"/>
<logger message="#[message.payloadAs(java.lang.String)]" level="ERROR" />
</flow>
<flow name="flow2" doc:name="flow2">
<http:inbound-endpoint address="http://localhost:8080/file-pickup/start" doc:name="HTTP" exchange-pattern="request-response"/>
<expression-component>
app.registry.filePickupFlow.start();
</expression-component>
<file:outbound-endpoint path="C:\outputfile" responseTimeout="10000" doc:name="File"/>
</flow>
I am getting a couple of problems:
a) I am getting an error that the attribute initialState is not defined as a valid property of flow.
However, if I remove that attribute, the flow runs without waiting for "http://localhost:8080/file-pickup/start" to fire it up.
b) The files are not moved to the destination folder.
So how can I do this?
You can't reference a flow that has an inbound endpoint in it, because such a flow is already active and consuming events from its inbound endpoint, so you can't invoke it on demand.
The following, tested on Mule 3.3.1, shows how to start a "file pickup flow" on demand from an HTTP request.
<flow name="filePickupFlow" initialState="stopped">
<file:inbound-endpoint path="///tmp/mule/input" />
<!-- Do something with the file: here we just log its content -->
<logger message="#[message.payloadAs(java.lang.String)]" level="ERROR" />
</flow>
<flow name="filePickupStarterFlow">
<http:inbound-endpoint address="http://localhost:8080/file-pickup/start"
exchange-pattern="request-response" />
<expression-component>
app.registry.filePickupFlow.start();
</expression-component>
<set-payload value="File Pickup successfully started" />
</flow>
HTTP GETting http://localhost:8080/file-pickup/start would then start the filePickupFlow, which in turn will process the files in /tmp/mule/input.
Note that it is up to you to configure the file:connector with the behavior it must have for the files it processes: deleting them or moving them to another directory are the two main options; see the sketch below.
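For example (a sketch with made-up connector names and paths), either of these would select one of those two behaviors:
<!-- Option 1: delete each file once it has been consumed -->
<file:connector name="deletingConnector" autoDelete="true" streaming="false" validateConnections="true"/>
<!-- Option 2: keep the file but move it to an archive directory after processing -->
<file:connector name="archivingConnector" autoDelete="false" streaming="false" validateConnections="true"/>
<file:inbound-endpoint path="/tmp/mule/input" moveToDirectory="/tmp/mule/archive" connector-ref="archivingConnector"/>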
I guess in this case a file inbound endpoint that reads a file on demand will not be helpful.
Please try the following way:
<flow name="flow1" doc:name="f2">
<http:inbound-endpoint address="http://localhost:8080"
doc:name="HTTP" exchange-pattern="request-response" />
<component>
<spring-object bean="fileLoader"></spring-object>
</component>
<file:outbound-endpoint path="C:\outputfile"
responseTimeout="10000" doc:name="File" />
</flow>
Here the custom component will be a class which reads the file from your specified location.
Hope this helps.
You can use the Mule Requester module for a clean solution. See the details in the blog entry Introducing the Mule Requester.
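Sketched against this question's flows (the global endpoint name and flow name are assumptions), the requester-based version reads a file only when the HTTP request arrives, instead of polling:
<file:endpoint name="pickupEndpoint" path="C:\Input" doc:name="File"/>
<flow name="onDemandPickupFlow">
<http:inbound-endpoint address="http://localhost:8080/file-pickup/start" exchange-pattern="request-response" doc:name="HTTP"/>
<!-- Pull one file from the endpoint on demand; the payload may be null if the folder is empty -->
<mulerequester:request resource="pickupEndpoint" doc:name="Mule Requester"/>
<file:outbound-endpoint path="C:\outputfile" responseTimeout="10000" doc:name="File"/>
</flow>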