Mule - split a big JSON list into multiple smaller JSON lists

I have a list of about 200 JSON objects. I want to split that list into smaller lists of at most 20 objects each, and POST each sublist to an HTTP-based endpoint.
<flow name="send-to-next-step" doc:name="send-to-vm-flow">
<vm:inbound-endpoint exchange-pattern="one-way"
path="send-to-next-step-vm" doc:name="VM" />
<!-- received the JSON List payload with 200 objects-->
<!-- TODO do processing here to split the list into sub-lists and call sub-flow for each sub-list
<flow-ref name="send-to-aggregator-sf" doc:name="Flow Reference" />
</flow>
One possible way is to write a Java component that iterates over the list and calls the sub-flow after every 20 objects. Is there a better way of accomplishing this?

If your payload is a Java Collection, the Mule foreach scope has batching built in: http://www.mulesoft.org/documentation/display/current/Foreach
Example:
<foreach batchSize="20">
<json:object-to-json-transformer/>
<http:outbound-endpoint ... />
</foreach>

You could use the Groovy collate method for the batching, and then foreach or collection-splitter, depending on your needs:
<json:json-to-object-transformer returnClass="java.util.List"/>
<set-payload value="#[groovy:payload.collate(20)]"/>
<foreach>
<json:object-to-json-transformer/>
<http:outbound-endpoint exchange-pattern="request-response" host="0.0.0.0" port="8082" path="xx"/>
</foreach>
<set-payload value="#[groovy:payload.flatten()]"/>
This will send each batch of 20 objects to the HTTP endpoint and then flatten the payload back into the original list.
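If you don't need to restore the original list afterwards, a collection-splitter variant of the same idea (a sketch reusing the endpoint from above) sends each batch of 20 as its own message:
<json:json-to-object-transformer returnClass="java.util.List"/>
<set-payload value="#[groovy:payload.collate(20)]"/>
<!-- each sub-list of 20 objects becomes a separate message -->
<collection-splitter/>
<json:object-to-json-transformer/>
<http:outbound-endpoint exchange-pattern="request-response" host="0.0.0.0" port="8082" path="xx"/>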


Accessing variables inside a forEach in Mule

I have two questions:
If I declare two variables inside a forEach, such as flowVars.ABC and flowVars.DEF, how can I access those two variables outside the forEach block?
Each variable holds a JSON payload; how can I combine the data from both variables into a single JSON payload?
Can anyone assist me? I am unable to access the variables set inside the foreach or to combine the two JSON payloads.
This is my sample code
<flow name="test">
<foreach doc:name="For Each">
<scatter-gather doc:name="Scatter-Gather">
<set-variable variableName="ABC" value="#[payload]" mimeType="application/json" doc:name="ABC"/>
<set-variable variableName="DEF" value="#[payload]" mimeType="application/json" doc:name="DEF"/>
</scatter-gather>
</foreach>
<set-payload value="#[flowVars.ABC + flowVars.DEF]" mimeType="application/json" doc:name="adding 2 vars"/>
</flow>
You need to understand how scoping works with foreach. Any variables set inside the foreach scope will NOT be available outside of that scope. However, variables set outside of the foreach scope (e.g. a set-variable before the foreach) will be available inside the foreach scope. This should help you get around your issue. I'm taking out the scatter-gather because it really doesn't serve any purpose in your example:
<flow name="test">
<set-variable variableName="ABC value="#[payload] mimeType="application/json" doc:name="ABC"/>
<set-variable variableName="DEF value="#[payload] mimeType="application/json" doc:name="DEF"/>
<foreach doc:name="For Each">
<set-variable variableName="ABC" value="#[payload]" mimeType="application/json" doc:name="ABC"/>
<set-variable variableName="DEF" value="#[payload]" mimeType="application/json" doc:name="DEF"/>
</foreach>
<set-payload value="#[flowVars.ABC ++ flowVars.DEF]" mimeType="application/json" doc:name="adding 2 vars"/>
</flow>
Beyond this, I'm not sure if your code is a simplification or not, but as it stands now there are a couple of things that are questionable:
Why are you using a scatter-gather? If you don't really need to do multiple things asynchronously (like making calls to multiple services), it's just a complication in your code. Setting two vars doesn't qualify, in my opinion.
What is your code supposed to do? From my perspective it looks like you're just setting the payload to a duplicate of the last element in the original payload. If so, you could just do this in a transformer:
%dw 2.0
output application/json
---
if (not isEmpty(payload))
payload[-1] ++ payload[-1]
else
[]

How to process a list in parallel in Mule?

I have a list of objects, which right now I am processing in a foreach. The list is nothing but a set of string IDs, each of which kicks off other processing internally.
<flow name="flow1" processingStrategy="synchronous">
<quartz:inbound-endpoint jobName="integration" repeatInterval="86400000" responseTimeout="10000" doc:name="Quartz" >
<quartz:event-generator-job/>
</quartz:inbound-endpoint>
<component class="RequestFeeder" doc:name="RequestFeeder"/>
<foreach collection="#[payload]" doc:name="For Each">
<flow-ref name="createFlow" doc:name="createFlow"/>
<flow-ref name="queueFlow" doc:name="queueFlow"/>
<flow-ref name="statusCheckFlow" doc:name="statusCheckFlow"/>
<flow-ref name="resultsFlow" doc:name="resultsFlow"/>
<flow-ref name="sftpFlow" doc:name="sftpFlow"/>
<logger message="RequestType #[flowVars['rqstType']] complete" level="INFO" doc:name="Done"/>
</foreach>
<logger message="ALL 15 REQUESTS HAVE BEEN PROCESSED" level="INFO" doc:name="Logger"/>
</flow>
I want to process them in parallel, i.e. execute the same 4 flow-refs in parallel for all 15 requests coming in the list. This seems simple, but I haven't been able to figure it out yet. Any help appreciated.
An alternative to the scatter-gather approach is to simply split the collection and use a VM queue for the items in the list. This method can be simpler if you don't need to wait for and collect all 15 results, and will still work if you do.
Try something like this. Mule automatically uses a thread pool to run your flow, so the requestProcessor flow below will process your requests in parallel.
<flow name="scheduleRequests">
<quartz:inbound-endpoint jobName="integration" repeatInterval="86400000" responseTimeout="10000" doc:name="Quartz" >
<quartz:event-generator-job/>
</quartz:inbound-endpoint>
<component class="RequestFeeder" doc:name="RequestFeeder"/>
<collection-splitter />
<vm:outbound-endpoint path="requests" />
</flow>
<flow name="requestProcessor">
<vm:inbound-endpoint path="requests" />
<flow-ref name="createFlow" doc:name="createFlow"/>
<flow-ref name="queueFlow" doc:name="queueFlow"/>
<flow-ref name="statusCheckFlow" doc:name="statusCheckFlow"/>
<flow-ref name="resultsFlow" doc:name="resultsFlow"/>
<flow-ref name="sftpFlow" doc:name="sftpFlow"/>
</flow>
I reckon you still want those four flows to run sequentially, right?
If that were not the case, you could always change the threading profile.
Another thing you could do is to wrap the four flows in an async scope, although you may need a processor change.
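For instance, a rough sketch of that async idea, reusing the flow names from the question (untested; note that the final "all requests processed" logger would then fire before the work completes):
<foreach collection="#[payload]" doc:name="For Each">
<!-- each iteration hands its item to the async scope's thread pool and moves on -->
<async doc:name="Async">
<flow-ref name="createFlow" doc:name="createFlow"/>
<flow-ref name="queueFlow" doc:name="queueFlow"/>
<flow-ref name="statusCheckFlow" doc:name="statusCheckFlow"/>
<flow-ref name="resultsFlow" doc:name="resultsFlow"/>
<flow-ref name="sftpFlow" doc:name="sftpFlow"/>
</async>
</foreach>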
In any event, I think you'll be better off using the scatter-gather component:
https://developer.mulesoft.com/docs/display/current/Scatter-Gather
https://www.mulesoft.com/exchange#!/scatter-gather-flow-control
Without needing the foreach scope, it will split the list and execute each item in a different thread. You can define how many threads you want to run in parallel (so you don't just spin off new threads, you use a pool).
One final note though: scatter-gather is meant to aggregate the results of all the processed items. I reckon you could change that with a custom aggregation strategy, but I'm not really sure; please take a look at the docs for that.
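For illustration, a hedged sketch of both points: the threading-profile child caps the pool size, and custom-aggregation-strategy (with a hypothetical com.example.MyAggregationStrategy class) would override the default aggregation. Check the docs for the exact element order in your runtime:
<scatter-gather doc:name="Scatter-Gather">
<custom-aggregation-strategy class="com.example.MyAggregationStrategy"/>
<!-- run at most 4 routes in parallel instead of spawning unbounded threads -->
<threading-profile maxThreadsActive="4"/>
<flow-ref name="flow1"/>
<flow-ref name="flow2"/>
<flow-ref name="flow3"/>
</scatter-gather>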
HTH
You say 4 flows, but the list contains 5 flows. If you want all flows executed in sequence, but each item in the collection executed in parallel, you will want a splitter followed by a separate vm flow containing all (4/5) flows, as explained here: https://support.mulesoft.com/s/article/Concurrently-processing-Collection-and-getting-the-results.
If you want the flows inside the loop to execute in parallel then you choose a Scatter-Gather component.
It is important to be clear which of the two things you want to achieve, as the solutions are very different. The basic difference is: in Scatter-Gather a single message is sent to multiple recipients for processing in parallel, whereas in Splitter-Aggregator a single message is split into multiple sub-messages, processed individually, and then aggregated. See: http://muthurajud.blogspot.com/2016/07/eai-patterns-scattergather-versus.html
Mule's Scatter-Gather component makes parallel processing easy. A simple example would be the following:
<scatter-gather >
<flow-ref name="flow1" />
<flow-ref name="flow2" />
<flow-ref name="flow3" />
</scatter-gather>
So, the flows you want to execute in parallel can be kept inside the scatter-gather scope.

Aggregate data from for-each loop

Scenario: I am converting a CSV file to JSON format, then taking each JSON element and making a GET request API call. I am doing this in a for-each loop sequence. I am getting a JSON response for each call (extracting eventId and cost from each). Now I wish to combine all these responses together under a main listings header to make a bigger JSON payload.
For example:
{
  "listings": [
    {
      "eventId": "8993478",
      "cost": 34
    },
    {
      "eventId": "xxxxxyyyy",
      "cost": zz
    }
  ]
}
How would I do this for the entries from all iterations? I can do it for a single entry (using a Groovy script).
You could define a variable before the for-each loop as an empty list with:
<set-variable variableName="listings" value="#[[]]" />
Then, on each iteration inside the for-each loop add an element to the previous variable with:
<expression-transformer expression="#[flowVars.listings.add(flowVars.iterationMap)]" />
In the previous code fragment I used the variable flowVars.iterationMap to denote the map generated on each iteration.
Finally, if needed, you can add a set-payload transformer after the for-each loop:
<set-payload value="#[flowVars.listings]" />
HTH, Marcos
You can use the Batch module, but you would have to rewrite this logic a little differently. For example, you will no longer be able to use an aggregation flowVar like Marcos suggested. Instead, you would need to use a fixed-size batch:commit block (which would actually be better in many ways; for example, you could start sending bulks to the remote API while still processing some of the other records in the background).
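A rough sketch of that idea with the Mule 3 batch module, assuming the per-record API call happens in the step and the accumulated responses are pushed onward in fixed-size chunks (the HTTP requester config and /listings path are hypothetical):
<batch:job name="aggregateListingsBatch">
<batch:process-records>
<batch:step name="collectAndSend">
<!-- per-record work goes here, e.g. the GET call from the scenario -->
<batch:commit size="20" doc:name="Batch Commit">
<!-- inside the commit block the payload is the list of accumulated records -->
<json:object-to-json-transformer/>
<http:request config-ref="HTTP_Request_Configuration" path="/listings" method="POST"/>
</batch:commit>
</batch:step>
</batch:process-records>
</batch:job>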
I like Marcos' answer and it worked perfectly for my use case.
Simply creating an array in a flow variable and using the add() method on the array in a ForEach scope did the trick.
The OP's follow-up question was a good one. It prompted me to do an alternate test using the approach suggested. See both of my flows here:
<flow name="sampleAggregatorFlow" doc:description="this is a simple demo that shows how to aggregate results into an accumulator array">
<http:listener config-ref="manage-s3-api-httpListenerConfig" path="/aggregate" allowedMethods="GET" doc:name="HTTP"/>
<set-payload value="#[['red','blue','gold']]" doc:name="Set Payload"/>
<set-variable variableName="accumulator" value="#[[]]" doc:name="accumulator"/>
<foreach doc:name="For Each">
<expression-transformer expression="#[flowVars.accumulator.add(payload)]" doc:name="addEm"/>
</foreach>
<set-payload value="#[flowVars.accumulator]" doc:name="Set Payload"/>
<json:object-to-json-transformer doc:name="Object to JSON"/>
</flow>
<flow name="Copy_of_sampleAggregatorFlow" doc:description="this is a simple demo that shows how to aggregate results into an accumulator array">
<http:listener config-ref="manage-s3-api-httpListenerConfig" path="/aggregate2" allowedMethods="GET" doc:name="Copy_of_HTTP"/>
<set-payload value="#[['red','blue','gold']]" doc:name="Copy_of_Set Payload"/>
<set-variable variableName="accumulator" value="#[new java.util.ArrayList()]" doc:name="Copy_of_accumulator"/>
<foreach doc:name="Copy_of_For Each">
<expression-transformer expression="#[flowVars.accumulator.add(payload)]" doc:name="Copy_of_addEm"/>
</foreach>
<set-payload value="#[flowVars.accumulator]" doc:name="Copy_of_Set Payload"/>
<json:object-to-json-transformer doc:name="Copy_of_Object to JSON"/>
</flow>
Both flows produced the same outcome:
[
"red",
"blue",
"gold"
]
Tests conducted 12/26/2017 with Anypoint Studio 6.4.1 and with Mule Runtime 3.9.

Mule foreach: Splitter returned no results

I get a list of files from Amazon S3, iterate over the list of files, and process one file at a time. The corresponding flow is as follows:
<flow name="process-from-s3" doc:name="process-from-s3"
processingStrategy="synchronous">
<poll doc:name="Poll" frequency="${s3-poll-interval}">
<s3:list-objects config-ref="Amazon_S3" doc:name="Get List of files"
accessKey="${s3-access-key}" secretKey="${s3-secret-key}"
bucketName="${s3-read-bucket}" />
</poll>
<choice doc:name="Choice">
<foreach doc:name="For Each">
<set-session-variable variableName="s3_file_name" value="#[payload.getKey()]" doc:name="Session Variable"/>
<logger message="From bucket ( ${s3-read-bucket} ), received the file #[s3_file_name]" level="INFO" doc:name="Logger"/>
<flow-ref name="process_s3_file" doc:name="Flow Reference"/>
</foreach>
</flow>
The flow works well; however, it keeps spitting out the following log statement when no files are found.
[03-06 21:52:05] WARN Foreach$CollectionMapSplitter
[[myapp].connector.polling.mule.default.receiver.01]: Splitter returned no results.
If this is not expected, please check your split expression
How can I avoid this annoying log message? Should I wrap the foreach within a choice router that runs the foreach only if there is at least one element in the list? Any suggestions are welcome.
I would rather set the log level for org.mule.routing.Foreach$CollectionMapSplitter to ERROR than configure any additional logic for this warning. See Mule docs for configuring logger/log4j if you need to.
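For example, in a Mule 3.6+ project (which uses log4j2 by default), an entry along these lines inside the <Loggers> section of log4j2.xml should silence the warning; treat it as a sketch and match the category to the one in your own log output:
<!-- suppress "Splitter returned no results" warnings from the foreach splitter -->
<AsyncLogger name="org.mule.routing.Foreach$CollectionMapSplitter" level="ERROR"/>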

Mule MEL to read a database result set from a second 'database outbound endpoint'

I have a flow something like this:
A 'database inbound endpoint' which polls (every 5 minutes) a MySQL database server and gets a result set via a select query (this automatically becomes the current payload, i.e. #[message.payload])
A 'for each' component with a 'logger' component in it, using the expression #[message.payload]
The flow then has one more 'database outbound endpoint' component, which executes another select query and obtains a result set.
A 'for each' component with a 'logger' component in it, using the expression #[message.payload]
Note: both loggers print the result set of the first DB query. I mean, the second logger is also showing the result set of the first query, because that result set is stored as the payload.
So, my questions are:
What is the MEL to read the result set of the second database query in the above scenario?
Is there another way to read the second result set in the flow?
Here is the configuration XML
<jdbc-ee:connector name="oracle_database" dataSource-ref="Oracle_Data_Source" validateConnections="true" queryTimeout="-1" pollingFrequency="0" doc:name="Database"/>
<flow name="testFileSaveFlow3" doc:name="testFileSaveFlow3">
<poll frequency="1000" doc:name="Poll">
<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="selectTable1" queryTimeout="-1" connector-ref="oracle_database" doc:name="get data from table 1">
<jdbc-ee:query key="selectTable1" value="SELECT * FROM TABLE1"/>
</jdbc-ee:outbound-endpoint>
</poll>
<foreach doc:name="For Each">
<logger message="#[message.payload]" level="INFO" doc:name="prints result-set of table1"/>
</foreach>
<jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="selectTable2" queryTimeout="-1" connector-ref="oracle_database" doc:name="get data from table 2">
<jdbc-ee:query key="selectTable2" value="SELECT * FROM TABLE2"/>
</jdbc-ee:outbound-endpoint>
<foreach doc:name="For Each">
<logger message="#[message.payload]" level="INFO" doc:name="prints result-set of table2"/>
</foreach>
</flow>
thanks in advance.
This is not an issue with MEL. It is an issue with your flow logic.
The second result set is not available in the message.
The JDBC outbound endpoint is one-way, so the Mule flow will not wait for the reply (result set) from the second JDBC outbound endpoint in the middle of the flow. That is why the second logger also prints the first result set.
Type 1:
Try making your JDBC outbound endpoint request-response instead of one-way.
Type 2:
Try a Mule enricher: call the JDBC outbound endpoint to query the DB, store the result set in a variable, and loop over that variable.
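A minimal sketch of the enricher idea (Type 2), reusing the connector and query from the question; the inner endpoint must be request-response so the enricher receives a result, and flowVars.table2Results is an illustrative variable name:
<enricher target="#[flowVars.table2Results]" doc:name="Message Enricher">
<jdbc-ee:outbound-endpoint exchange-pattern="request-response" queryKey="selectTable2" queryTimeout="-1" connector-ref="oracle_database" doc:name="get data from table 2">
<jdbc-ee:query key="selectTable2" value="SELECT * FROM TABLE2"/>
</jdbc-ee:outbound-endpoint>
</enricher>
<!-- the payload is untouched, so iterate over the variable instead -->
<foreach collection="#[flowVars.table2Results]" doc:name="For Each">
<logger message="#[payload]" level="INFO" doc:name="prints result-set of table2"/>
</foreach>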
Hope this helps.