MuleSoft batch crashes - batch-processing

I am new to MuleSoft and have a question regarding batch processing. If a batch process crashes in between and some records have already been processed, what happens when the batch processing starts again? Duplicate data?

On restart it will try to continue processing the pending data in the batch queues, unless the queues were corrupted by the crash.

The answer depends on a few things. First, unless you configure the batch job scope to allow record-level failures, the entire job will stop when a record fails. Second, if you do configure it to allow failures, then the batch will continue to process all records. In that case, each batch step can be configured to accept only successful records (the default), only failed records, or all records.
So the answer to your question depends on configuration.
As for duplicate data, that part is entirely up to you.
If you have the job stop on failure, then when you restart it, the set of records you provide at that time will be the ones processed. If you submit records that have been processed once before, they will be processed again. You can provide filtering either on reentry to the batch job, or as the records are successfully processed.
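For example, one simple way to filter on reentry is to drop already-processed records before they enter the batch job. This is only a sketch: vars.processedIds, the record id field, and the flow/job names are hypothetical placeholders for however you persist the IDs of records that completed successfully (an Object Store, a database, etc.).
<flow name="reprocessFlow">
    <!-- assume vars.processedIds was loaded earlier from persistent storage -->
    <ee:transform doc:name="Drop already-processed records">
        <ee:message>
            <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload filter (record) -> not (vars.processedIds contains record.id)]]></ee:set-payload>
        </ee:message>
    </ee:transform>
    <batch:job jobName="reprocessJob" maxFailedRecords="-1">
        <batch:process-records>
            <batch:step name="processRecord">
                <logger message="#[payload]" level="INFO"/>
            </batch:step>
        </batch:process-records>
    </batch:job>
</flow>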

Before answering your question I would like to know a couple of things.
a. What do you mean by "batch crashes"?
Are you saying that during batch processing there is a JVM hit and the batch is started again?
Or is there a failure during processing of some records of the batch?
a.1 --> If there is a JVM hit, then the entire program halts. You need to restart the program, which results in processing the same set of records again.
a.2 --> To handle failed records inside the batch, you can configure three batch steps as below.
Set the batch job to continue irrespective of any error:
<batch:job jobName="Batch1" maxFailedRecords="-1">
In the batch job create 3 batch steps:
a. processRecord - processes every record in the batch queue (acceptPolicy ALL).
b. ProcessSuccessful - if there was no exception in a, the record goes to batch step b (acceptPolicy NO_FAILURES).
c. processFailure - if there was an exception in a, the record goes to batch step c (acceptPolicy ONLY_FAILURES).
The entire sample code is shown below.
<batch:job jobName="batchJob" maxFailedRecords="-1" >
<batch:process-records>
<batch:step name="processRecord" acceptPolicy="ALL" >
log.info("process any record COMES IN THE STEP");
</batch:step>
<batch:step name="ProcessSuccessful"
acceptPolicy="NO_FAILURES">
log.info("process only SUCCESSFUL record")
</batch:step>
<batch:step name="processFailure"
acceptPolicy="ONLY_FAILURES">
log.info("process only FAILURE record");
</batch:step>
</batch:process-records>
<batch:on-complete>
log.info("on complete phase , log the successful and
failure count");
</batch:on-complete>
</batch:job>
N.B. --> the code is as per Mule 4.0.

Related

Anypoint MQ messages remain in-flight and are not being processed

I have a flow which submits around 10-20 Salesforce bulk query job details to Anypoint MQ to be processed asynchronously.
I am using a normal queue, not a FIFO queue, and want to process one message at a time.
My subscriber configuration is given below. I am setting this whopping ack timeout of 15 minutes because it has taken up to 15 minutes for a job to change status from jobUpload to JobCompleted.
MuleRuntime: 4.4
MQ Connector Version: 3.2.0
<anypoint-mq:subscriber doc:name="Subscribering Bulk Query Job Details"
config-ref="Anypoint_MQ_Config"
destination="${anyPointMq.name}"
acknowledgementTimeout="15"
acknowledgementTimeoutUnit="MINUTES">
<anypoint-mq:subscriber-type >
<anypoint-mq:prefetch maxLocalMessages="1" />
</anypoint-mq:subscriber-type>
</anypoint-mq:subscriber>
Anypoint MQ Connector Configuration
<anypoint-mq:config name="Anypoint_MQ_Config" doc:name="Anypoint MQ Config" doc:id="ce3aaed9-dcba-41bc-8c68-037c5b1420e2">
<anypoint-mq:connection clientId="${secure::anyPointMq.clientId}" clientSecret="${secure::anyPointMq.clientSecret}" url="${anyPointMq.url}">
<reconnection>
<reconnect frequency="3000" count="3" />
</reconnection>
<anypoint-mq:tcp-client-socket-properties connectionTimeout="30000" />
</anypoint-mq:connection>
</anypoint-mq:config>
Subscriber flow
<flow name="sfdc-bulk-query-job-subscription" doc:id="7e1e23d0-d7f1-45ed-a609-0fb35dd23e6a" maxConcurrency="1">
<anypoint-mq:subscriber doc:name="Subscribering Bulk Query Job Details" doc:id="98b8b25e-3141-4bd7-a9ab-86548902196a" config-ref="Anypoint_MQ_Config" destination="${anyPointMq.sfPartnerEds.name}" acknowledgementTimeout="${anyPointMq.ackTimeout}" acknowledgementTimeoutUnit="MINUTES">
<anypoint-mq:subscriber-type >
<anypoint-mq:prefetch maxLocalMessages="${anyPointMq.prefecth.maxLocalMsg}" />
</anypoint-mq:subscriber-type>
</anypoint-mq:subscriber>
<json-logger:logger doc:name="INFO - Bulk Job Details have been fetched" doc:id="b25c3850-8185-42be-a293-659ebff546d7" config-ref="JSON_Logger_Config" message='#["Bulk Job Details have been fetched for " ++ payload.object default ""]'>
<json-logger:content ><![CDATA[#[output application/json ---
payload]]]></json-logger:content>
</json-logger:logger>
<set-variable value="#[p('serviceName.sfdcToEds')]" doc:name="ServiceName" doc:id="f1ece944-0ed8-4c0e-94f2-3152956a2736" variableName="ServiceName"/>
<set-variable value="#[payload.object]" doc:name="sfObject" doc:id="2857c8d9-fe8d-46fa-8774-0eed91e3a3a6" variableName="sfObject" />
<set-variable value="#[message.attributes.properties.key]" doc:name="key" doc:id="57028932-04ab-44c0-bd15-befc850946ec" variableName="key" />
<flow-ref doc:name="bulk-job-status-check" doc:id="c6b9cd40-4674-47b8-afaa-0f789ccff657" name="bulk-job-status-check" />
<json-logger:logger doc:name="INFO - subscribed bulk job id has been processed successfully" doc:id="7e469f92-2aff-4bf4-84d0-76577d44479a" config-ref="JSON_Logger_Config" message='#["subscribed bulk job id has been processed successfully for salesforce " ++ vars.sfObject default "" ++ " object"]' tracePoint="END"/>
</flow>
After the bulk query job subscriber, I check the status of the job up to 5 times, with an interval of 1 minute, inside an until-successful scope. It generally exhausts all 5 attempts, then the message is subscribed again and the same process repeats until the job gets completed. I have seen the until-successful scope get exhausted more than once for a single job.
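For reference, that until-successful scope looks roughly like this (a simplified sketch: bulk-job-status-check is the sub-flow referenced above, while get-bulk-job-status and the JobComplete check are illustrative placeholders):
<sub-flow name="bulk-job-status-check">
    <until-successful maxRetries="5" millisBetweenRetries="60000">
        <!-- hypothetical call that refreshes the bulk query job status -->
        <flow-ref name="get-bulk-job-status" />
        <!-- fail the attempt unless the job has reached JobComplete, so until-successful retries -->
        <validation:is-true expression="#[payload.state == 'JobComplete']" />
    </until-successful>
</sub-flow>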
Once the job's status changes to jobComplete, I fetch the result and send it to an AWS S3 bucket via a MuleSoft system API. Here I also use retry logic, because due to the large volume of data I always get this message on the first call:
HTTP POST on resource 'https://****//dlb.lb.anypointdns.net:443/api/sys/aws/s3/databricks/object' failed: Remotely closed.
But during the second retry it gets a successful response from the S3 bucket system API.
Now the main problem:
Though I am using a normal queue, I have noticed that messages remain in-flight for an indefinite amount of time and never get picked up by the Mule flow/subscriber. The screenshot below shows an example: there were 7 messages in flight that were not picked up even after many days.
I have kept maxConcurrency and prefetch maxLocalMessages at 1, yet more than 1 message is being taken out of the queue. Please help me understand this.

Camunda - Intermediate message event cannot correlate to a single execution

I created a small application (Spring Boot and Camunda) to process an order. The Order-Service receives the new order via REST and calls the start event of the BPMN order workflow. The order process contains two asynchronous JMS calls (customer check and warehouse stock check). If both checks return, the order process should continue.
The start event is called within a Spring REST controller:
ProcessInstance processInstance =
runtimeService.startProcessInstanceByKey("orderService", String.valueOf(order.getId()));
The send task (e.g. the customer check) sends the JMS message to an asynchronous queue.
The answer of this service is caught by another Spring component, which then tries to send an intermediate message:
runtimeService.createMessageCorrelation("msgReceiveCheckCustomerCredibility")
.processInstanceBusinessKey(response.getOrder().getBpmnBusinessKey())
.setVariable("resultOrderCheckCustomterCredibility", response)
.correlate();
I deactivated the warehouse service to see if the order process waits for the arrival of the second call, but instead I get this exception:
1115 06:33:08.564 WARN [o.c.b.e.jobexecutor] ENGINE-14006 Exception while executing job 67d2cc24-0769-11ea-933a-d89ef3425300:
org.springframework.messaging.MessageHandlingException: nested exception is org.camunda.bpm.engine.MismatchingMessageCorrelationException: ENGINE-13031 Cannot correlate a message with name 'msgReceiveCheckCustomerCredibility' to a single execution. 4 executions match the correlation keys: CorrelationSet [businessKey=1, processInstanceId=null, processDefinitionId=null, correlationKeys=null, localCorrelationKeys=null, tenantId=null, isTenantIdSet=false]
This is my process. I cannot see a way to post my bpmn file :-(
Why can't it correlate using the message name and the business key? The JMS queues are empty, there are other messages with the same businessKey waiting.
Thanks!
Just to narrow down the problem: do a runtimeService event-subscription query before you try to correlate and check what subscriptions are actually waiting. Maybe you have a duplicate message name? Maybe you (accidentally) have another instance of the same process running? Once you have identified the subscriptions, you could just notify the execution directly without using the correlation builder, as sketched below.
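Something like this sketch (it reuses the question's message and variable names; everything else is illustrative and needs adapting to your setup):
import java.util.Collections;
import java.util.List;
import org.camunda.bpm.engine.runtime.EventSubscription;

// inside a component that has the Camunda RuntimeService injected:
List<EventSubscription> subscriptions = runtimeService.createEventSubscriptionQuery()
        .eventType("message")
        .eventName("msgReceiveCheckCustomerCredibility")
        .list();

for (EventSubscription subscription : subscriptions) {
    // see which executions / process instances are actually waiting for this message
    System.out.println(subscription.getProcessInstanceId() + " / "
            + subscription.getActivityId() + " / " + subscription.getExecutionId());
}

// once you know which execution should receive the message,
// notify it directly instead of using the correlation builder
runtimeService.messageEventReceived(
        "msgReceiveCheckCustomerCredibility",
        subscriptions.get(0).getExecutionId(),
        Collections.<String, Object>singletonMap("resultOrderCheckCustomterCredibility", response));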

Mule Batch Max Failures not working

I'm using a Mule batch flow to process files. As per the requirement, I should stop processing the batch step after 10 failures.
So I've configured max-failed-records="10", but I still see around 99 failures in the logger that is kept in the on-complete phase. The file the app receives has around 8657 rows, so the loaded records will be 8657 records.
Logger in complete phase:
<logger message="#['Failed Records'+payload.failedRecords]" level="INFO" doc:name="Logger"/>
Below image is my flow:
This is the default behavior of Mule. As per the batch documentation, Mule loads 1600 records at once (16 threads x 100 records per block). Though max failures is set to 10, it will still process all of the already-loaded records; it just won't load the next record blocks once the max failure limit is reached.
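If you need the job to stop closer to the 10th failure, one option is to shrink the block size so fewer records are already loaded when the limit trips. A sketch, assuming Mule 3.8+ where block-size is configurable (job and step names are illustrative):
<batch:job name="fileBatch" max-failed-records="10" block-size="10">
    <batch:process-records>
        <batch:step name="processStep">
            <!-- your record processing goes here -->
            <logger message="#['Processing record: ' + payload]" level="INFO" doc:name="Logger"/>
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <logger message="#['Failed Records: ' + payload.failedRecords]" level="INFO" doc:name="Logger"/>
    </batch:on-complete>
</batch:job>
Smaller blocks reduce throughput, so it is a trade-off between stopping promptly and processing speed.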
Hope this helps.

Mule batch commit and record failures

My current scenario:
I have 10000 records as input to the batch.
As per my understanding, batch is only for record-by-record processing. Hence, I am transforming each record using a DataWeave component inside the batch step (note: I have not used any batch commit) and writing each record to a file. The reason for doing record-by-record processing is that if any particular record has invalid data, only that record fails and the rest are processed fine.
But in many of the blogs I see, they use a batch commit (with streaming) with the DataWeave component. So as per my understanding, all the records will be given to DataWeave in one shot, and if one record has invalid data, all 10000 records will fail (at DataWeave). Then the point of record-by-record processing is lost.
Is the above assumption correct, or am I thinking the wrong way?
That is the reason I am not using batch commit.
Now, as I said, I am sending each record to a file. Actually, I have the requirement of sending each record to 5 different CSV files. So, currently I am using a Scatter-Gather component inside my batch step to send it to five different routes.
As you can see in the image, the input phase gives a collection of 10000 records. Each record is sent to 5 routes using Scatter-Gather.
Is the approach I am using fine, or is there a better design that can be followed?
Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS. But with the current design, I am not able to capture failed records.
SHORT ANSWERS
Is the above assumption correct, or am I thinking the wrong way?
In short, yes, you are thinking the wrong way. Read my loooong explanation with an example to understand why; I hope you will appreciate it.
Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS. But with the current design, I am not able to capture failed records.
You probably forgot to set max-failed-records="-1" (unlimited) on the batch job. The default is 0: on the first failed record the batch will return and not execute subsequent steps.
Is the approach I am using fine, or is there a better design that can be followed?
I think it makes sense if performance is essential for you and you can't cope with the overhead created by doing this operation in sequence.
If instead you can slow down a bit, it could make sense to do this operation in 5 different steps; you will lose parallelism, but you can have better control over failing records, especially if using batch commit. A sketch of that alternative is shown below.
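A minimal Mule 3 sketch of that alternative (step names, paths and the CSV transformation are placeholders; repeat the pattern for all 5 target files):
<batch:job name="multiFileBatch" max-failed-records="-1">
    <batch:process-records>
        <batch:step name="writeFile1">
            <batch:commit size="100" doc:name="Batch Commit">
                <!-- payload here is the list of records accumulated by the commit block;
                     transform it to CSV (e.g. with DataWeave) before writing -->
                <file:outbound-endpoint path="/output" outputPattern="file1.csv" doc:name="File"/>
            </batch:commit>
        </batch:step>
        <batch:step name="writeFile2">
            <batch:commit size="100" doc:name="Batch Commit">
                <file:outbound-endpoint path="/output" outputPattern="file2.csv" doc:name="File"/>
            </batch:commit>
        </batch:step>
        <!-- ... writeFile3, writeFile4 and writeFile5 follow the same pattern ... -->
    </batch:process-records>
</batch:job>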
MULE BATCH JOB IN PRACTICE
I think the best way to explain how it works is through an example.
Take into consideration the following case:
You have a batch process configured with max-failed-records="-1" (no limit).
<batch:job name="batch_testBatch" max-failed-records="-1">
In this process we input a collection composed of 6 strings.
<batch:input>
<set-payload value="#[['record1','record2','record3','record4','record5','record6']]" doc:name="Set Payload"/>
</batch:input>
The processing is composed of 3 steps.
The first step just logs the record being processed; the second step also logs and then throws an exception on record3 to simulate a failure.
<batch:step name="Batch_Step">
<logger message="-- processing #[payload] in step 1 --" level="INFO" doc:name="Logger"/>
</batch:step>
<batch:step name="Batch_Step2">
<logger message="-- processing #[payload] in step 2 --" level="INFO" doc:name="Logger"/>
<scripting:transformer doc:name="Groovy">
<scripting:script engine="Groovy"><![CDATA[
if(payload=="record3"){
throw new java.lang.Exception();
}
payload;
]]>
</scripting:script>
</scripting:transformer>
</batch:step>
The third step will instead contain just the commit with a commit count value of 2.
<batch:step name="Batch_Step3">
<batch:commit size="2" doc:name="Batch Commit">
<logger message="-- committing #[payload] --" level="INFO" doc:name="Logger"/>
</batch:commit>
</batch:step>
Now you can follow me through the execution of this batch processing.
At the start, all 6 records will be processed by the first step, and the console log would look like this:
-- processing record1 in step 1 --
-- processing record2 in step 1 --
-- processing record3 in step 1 --
-- processing record4 in step 1 --
-- processing record5 in step 1 --
-- processing record6 in step 1 --
Step Batch_Step finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
Now things get more interesting: in step 2, record3 will fail because we explicitly throw an exception, but despite this the step will continue processing the other records. Here is how the log would look:
-- processing record1 in step 2 --
-- processing record2 in step 2 --
-- processing record3 in step 2 --
com.mulesoft.module.batch.DefaultBatchStep: Found exception processing record on step ...
Stacktrace
....
-- processing record4 in step 2 --
-- processing record5 in step 2 --
-- processing record6 in step 2 --
Step Batch_Step2 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
At this point despite a failed record in this step batch processing will continue because the parameter max-failed-records is set to -1 (unlimited) and not to the default value of 0.
At this point all the successful records will be passed to step3. This is because, by default, the accept-policy parameter of a step is set to NO_FAILURES (the other possible values are ALL and ONLY_FAILURES).
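For reference, that default written out explicitly on the example's third step would look like this (the attribute can be omitted when NO_FAILURES is the behaviour you want):
<batch:step name="Batch_Step3" accept-policy="NO_FAILURES">
    <batch:commit size="2" doc:name="Batch Commit">
        <logger message="-- committing #[payload] --" level="INFO" doc:name="Logger"/>
    </batch:commit>
</batch:step>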
Now step3, which contains the commit phase with a size of 2, will commit the records two by two:
-- committing [record1, record2] --
-- committing [record4, record5] --
Step: Step Batch_Step3 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
-- committing [record6] --
As you can see, this confirms that record3, which failed, was not passed to the next step and therefore was not committed.
Starting from this example, I think you can imagine and test more complex scenarios. For example, after the commit you could have another step that processes only failed records, to make an administrator aware of the failures by mail, as sketched below.
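A rough sketch of such a step (Mule 3; the SMTP endpoint values are placeholders):
<batch:step name="notifyFailures" accept-policy="ONLY_FAILURES">
    <logger message="-- record #[payload] failed --" level="INFO" doc:name="Logger"/>
    <!-- placeholder SMTP endpoint: replace host, credentials and addresses with real values -->
    <smtp:outbound-endpoint host="smtp.example.com" port="25"
        from="batch@example.com" to="admin@example.com"
        subject="Batch record failed" doc:name="SMTP"/>
</batch:step>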
Afterwards, you can always use external storage to store more advanced info about your records, as you can read in my answer to this other question.
Hope this helps

JMeter - Getting previous results in mail

I'm using JMeter; it runs automatically every 4 hours (through crontab). I'm sending the results file (CSV) by mail at the end of the test. I always see the file of the previous test, not the current one (I can tell by the hour).
The structure is this: one Test Plan (I checked 'Run Thread Groups consecutively' and 'Run tearDown Thread Groups after shutdown of main threads'), two Thread Groups, at the end of each of which I write results to a CSV file using 'View Results Tree', and at the end a tearDown Thread Group that uses an SMTP sampler to send the files created.
Any help would be appreciated.
EDIT:
These are the SMTP sampler settings:
and this is the writing to the file:
This might be due to the autoflush policy, which flushes the content of the buffer only when the buffer is full.
As you use a tearDown Thread Group, results are not guaranteed to be fully written, as the test is not really finished.
The fact that you think you are sending the previous test's file might be due to JMeter appending data to the same results file.
So :
1/ Ensure you move or delete the file once sent.
2/ Edit user.properties and add:
jmeter.save.saveservice.autoflush=true
This will make JMeter write any sample result to the file immediately after it is executed.