I'm using a Mule batch flow to process files. As per the requirement, the batch step should stop processing further records after 10 failures.
So I've configured max-failed-records="10", but I still see around 99 failures in the logger I placed in the on-complete phase. The file the app receives has around 8657 rows, so the loaded records count is 8657.
Logger in complete phase:
<logger message="#['Failed Records'+payload.failedRecords]" level="INFO" doc:name="Logger"/>
The image below shows my flow:
It's the default behavior of Mule. As per the batch documentation, Mule loads 1600 records at once (16 threads x 100 records per block). Even though the max failure limit is set to 10, Mule still processes all the records it has already loaded; it just won't load the next record blocks once the max failure limit is reached.
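If you want fewer extra records processed after the limit is hit, one option is to shrink the block size so that fewer records are in flight when the 10th failure occurs. A minimal sketch only (the block-size attribute exists since Mule 3.8; in Mule 4 it is called blockSize, and the job name here is made up):
<batch:job name="fileBatch" max-failed-records="10" block-size="10">
    <!-- smaller blocks mean fewer already-dispatched records keep being
         processed after the 10th failure is registered -->
    <batch:process-records>
        <!-- existing batch steps go here -->
    </batch:process-records>
</batch:job>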
Hope this helps.
I have a flow which submits around 10-20 Salesforce bulk query job details to Anypoint MQ to be processed asynchronously.
I am using a normal queue, not a FIFO queue, and I want to process one message at a time.
My subscriber configuration is given below. I am setting this whopping ack timeout of 15 minutes because it has taken up to 15 minutes for a job to change status from jobUpload to JobCompleted.
MuleRuntime: 4.4
MQ Connector Version: 3.2.0
<anypoint-mq:subscriber doc:name="Subscribering Bulk Query Job Details"
config-ref="Anypoint_MQ_Config"
destination="${anyPointMq.name}"
acknowledgementTimeout="15"
acknowledgementTimeoutUnit="MINUTES">
<anypoint-mq:subscriber-type >
<anypoint-mq:prefetch maxLocalMessages="1" />
</anypoint-mq:subscriber-type>
</anypoint-mq:subscriber>
Anypoint MQ Connector Configuration
<anypoint-mq:config name="Anypoint_MQ_Config" doc:name="Anypoint MQ Config" doc:id="ce3aaed9-dcba-41bc-8c68-037c5b1420e2">
<anypoint-mq:connection clientId="${secure::anyPointMq.clientId}" clientSecret="${secure::anyPointMq.clientSecret}" url="${anyPointMq.url}">
<reconnection>
<reconnect frequency="3000" count="3" />
</reconnection>
<anypoint-mq:tcp-client-socket-properties connectionTimeout="30000" />
</anypoint-mq:connection>
</anypoint-mq:config>
Subscriber flow
<flow name="sfdc-bulk-query-job-subscription" doc:id="7e1e23d0-d7f1-45ed-a609-0fb35dd23e6a" maxConcurrency="1">
<anypoint-mq:subscriber doc:name="Subscribering Bulk Query Job Details" doc:id="98b8b25e-3141-4bd7-a9ab-86548902196a" config-ref="Anypoint_MQ_Config" destination="${anyPointMq.sfPartnerEds.name}" acknowledgementTimeout="${anyPointMq.ackTimeout}" acknowledgementTimeoutUnit="MINUTES">
<anypoint-mq:subscriber-type >
<anypoint-mq:prefetch maxLocalMessages="${anyPointMq.prefecth.maxLocalMsg}" />
</anypoint-mq:subscriber-type>
</anypoint-mq:subscriber>
<json-logger:logger doc:name="INFO - Bulk Job Details have been fetched" doc:id="b25c3850-8185-42be-a293-659ebff546d7" config-ref="JSON_Logger_Config" message='#["Bulk Job Details have been fetched for " ++ payload.object default ""]'>
<json-logger:content ><![CDATA[#[output application/json ---
payload]]]></json-logger:content>
</json-logger:logger>
<set-variable value="#[p('serviceName.sfdcToEds')]" doc:name="ServiceName" doc:id="f1ece944-0ed8-4c0e-94f2-3152956a2736" variableName="ServiceName"/>
<set-variable value="#[payload.object]" doc:name="sfObject" doc:id="2857c8d9-fe8d-46fa-8774-0eed91e3a3a6" variableName="sfObject" />
<set-variable value="#[message.attributes.properties.key]" doc:name="key" doc:id="57028932-04ab-44c0-bd15-befc850946ec" variableName="key" />
<flow-ref doc:name="bulk-job-status-check" doc:id="c6b9cd40-4674-47b8-afaa-0f789ccff657" name="bulk-job-status-check" />
<json-logger:logger doc:name="INFO - subscribed bulk job id has been processed successfully" doc:id="7e469f92-2aff-4bf4-84d0-76577d44479a" config-ref="JSON_Logger_Config" message='#["subscribed bulk job id has been processed successfully for salesforce " ++ vars.sfObject default "" ++ " object"]' tracePoint="END"/>
</flow>
After the bulk query job subscriber, I check the status of the job 5 times with an interval of 1 minute inside an Until Successful scope. It generally exhausts all 5 attempts, then the message is subscribed again and the same process repeats until the job gets completed. I have seen the Until Successful scope get exhausted more than once for a single job.
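For reference, the status check described above is roughly this shape (a sketch only; bulk-job-status-poll is a hypothetical sub-flow name, not taken from the actual project):
<until-successful maxRetries="5" millisBetweenRetries="60000" doc:name="Until Successful">
    <!-- polls the bulk job status once per attempt; the scope errors out
         after 5 unsuccessful attempts and the message is later redelivered -->
    <flow-ref name="bulk-job-status-poll" doc:name="Check job status"/>
</until-successful>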
Once the job's status changes to jobComplete, I fetch the result and send it to an AWS S3 bucket via a MuleSoft system API. Here I also use retry logic, because with the large volume of data I always get this message on the first call:
HTTP POST on resource 'https://****//dlb.lb.anypointdns.net:443/api/sys/aws/s3/databricks/object' failed: Remotely closed.
But on the second retry it gets a successful response from the S3 bucket system API.
Now the main problem:
Even though I am using a normal queue, I have noticed that messages remain in the in-flight state for an indefinite amount of time and never get picked up by the Mule flow/subscriber. The screenshot below shows an example: there were 7 messages in flight that were not picked up even after many days.
I have set both maxConcurrency and the prefetch maxLocalMessages to 1, yet more than 1 message is being taken out of the queue at a time. Please help me understand this.
I have a message-driven-channel-adapter and I defined max-concurrent-consumers as 100 and concurrent-consumers as 2. When I ran a load test, I saw the number of concurrent consumers increase, but after the load test the number of consumers didn't drop back to the normal level. I'm checking it with the RabbitMQ management portal.
When the project is restarted (no load test), the GET (Empty) rate is 650/s, but after the load test it stays at about 2500/s and does not return to 650/s. I think the number of consumers is being scaled up but is not being reduced back to the original value.
How can I make it drop back to the normal level again?
Here is my message-driven-channel-adapter definition:
<int-jms:message-driven-channel-adapter id="inboundAdapter"
auto-startup="true"
connection-factory="jmsConnectionFactory"
destination="inboundQueue"
channel="requestChannel"
error-channel="errorHandlerChannel"
receive-timeout="-1"
concurrent-consumers="2"
max-concurrent-consumers="100" />
With receiveTimeout=-1, the container has no control over an idle consumer (it is blocked in the JMS client).
You also need to set max-messages-per-task for the container to consider stopping a consumer.
<int-jms:message-driven-channel-adapter id="inboundAdapter"
auto-startup="true"
connection-factory="jmsConnectionFactory"
destination-name="inboundQueue"
channel="requestChannel"
error-channel="errorHandlerChannel"
receive-timeout="5000"
concurrent-consumers="2"
max-messages-per-task="10"
max-concurrent-consumers="100" />
The time elapsed for an idle consumer is receiveTimeout * maxMessagesPerTask.
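With the example configuration above, that works out to 5000 ms x 10 = 50 seconds of idle time before a surplus consumer becomes eligible to be stopped.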
I am new to MuleSoft and have a question regarding batch processing. If the batch process crashes partway through and some records have already been processed, what happens when batch processing starts again? Duplicate data?
It will try to continue processing the pending data in the batch queues, unless it was corrupted by the crash.
The answer depends on a few things. First, unless you configure the batch job scope to allow record-level failures, the entire job will stop when a record fails. Second, if you do configure it to allow failures, then the batch will continue to process all records. In that case, each batch step can be configured to accept only successful records (the default), only failed records, or all records.
So the answer to your question depends on your configuration.
As far as duplicate data goes, that part is entirely up to you.
If you have the job stop on failure, then when you restart it, the set of records you provide at that time will be the ones processed. If you submit records that have already been processed once, they will be processed again. You can filter them out either on re-entry to the batch job, or as the records are successfully processed.
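A minimal sketch of that filtering idea, assuming each record carries a unique id field (the expression and step name are assumptions, not part of the question); in Mule 4 the core Idempotent Message Validator can reject records seen in an earlier run, provided it is backed by a persistent object store:
<batch:step name="filterAlreadyProcessed">
    <!-- raises an error for any record whose id was already validated,
         so re-submitted records fail this step instead of being re-processed -->
    <idempotent-message-validator idExpression="#[payload.id]"/>
</batch:step>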
Before answering your question I would like to know a couple of things.
a. What do you mean by "batch crashes"?
Are you saying that during batch processing the JVM goes down and the batch is started again? Or that there is a failure while processing some records of the batch?
a.1 --> If the JVM goes down, the entire program halts and you need to restart it, which results in processing the same set of records again.
a.2 --> To handle failed records inside the batch, you can do the following.
Set the batch job to continue irrespective of any error:
<batch:job jobName="Batch1" maxFailedRecords="-1">
In the batch job create 3 batch steps:
a. processRecord - processes every record in the batch queue.
b. ProcessSuccessful - entered only if there was no exception in step a.
c. processFailure - entered only if there was an exception in step a.
The entire sample code is shown below.
<batch:job jobName="batchJob" maxFailedRecords="-1">
    <batch:process-records>
        <batch:step name="processRecord" acceptPolicy="ALL">
            <!-- every record in the batch queue comes through this step -->
            <logger level="INFO" message="process any record that comes into the step"/>
        </batch:step>
        <batch:step name="ProcessSuccessful" acceptPolicy="NO_FAILURES">
            <!-- only records with no exception in the previous step enter here -->
            <logger level="INFO" message="process only SUCCESSFUL records"/>
        </batch:step>
        <batch:step name="processFailure" acceptPolicy="ONLY_FAILURES">
            <!-- only records that failed in a previous step enter here -->
            <logger level="INFO" message="process only FAILED records"/>
        </batch:step>
    </batch:process-records>
    <batch:on-complete>
        <!-- on-complete phase: log the successful and failed record counts -->
        <logger level="INFO" message='#["On complete: $(payload.successfulRecords) successful, $(payload.failedRecords) failed"]'/>
    </batch:on-complete>
</batch:job>
N.B. --> the code above is as per Mule 4.
My current scenario:
I have 10000 records as input to batch.
As per my understanding, batch is only for record-by-record processing. Hence, I am transforming each record using a DataWeave component inside the batch step (note: I have not used any batch commit) and writing each record to a file. The reason for doing record-by-record processing is that if any particular record contains invalid data, only that record fails and the rest are processed fine.
But in many of the blogs I see, they use a batch commit (with streaming) together with the DataWeave component. As per my understanding, all the records will then be given to DataWeave in one shot, and if one record has invalid data, all 10000 records will fail (at DataWeave). Then the point of record-by-record processing is lost.
Is the above assumption correct, or am I thinking about this the wrong way?
That is the reason I am not using batch commit.
Now, as I said, I am sending each record to a file. Actually, I have the requirement of sending each record to 5 different CSV files. So currently I am using a Scatter-Gather component inside my batch step to send it to five different routes.
As you can see in the image, the input phase gives a collection of 10000 records. Each record is sent to 5 routes using Scatter-Gather.
Is the approach I am using fine, or is there a better design I should follow?
Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS. But with the current design, I am not able to capture failed records.
SHORT ANSWERS
Is the above assumption correct, or am I thinking about this the wrong way?
In short, yes, you are thinking about it the wrong way. Read my loooong explanation with an example to understand why; hope you will appreciate it.
Also, I have created a 2nd batch step to capture ONLY FAILED RECORDS.
But with the current design, I am not able to capture failed records.
You probably forgot to set max-failed-records="-1" (unlimited) on the batch job. The default is 0: on the first failed record, the batch will stop and not execute the subsequent steps.
Is the approach I am using fine, or is there a better design I should follow?
I think it makes sense if performance is essential for you and you can't cope with the overhead created by doing this operation in sequence.
If instead you can afford to slow down a bit, it could make sense to do this operation in 5 different steps; you will lose parallelism but you can have better control over failing records, especially if using batch commit.
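For illustration, a rough sketch of that sequential alternative (the step names and the commit size are made up; the actual transformation and CSV file write would replace the logger inside each commit):
<batch:step name="Write_CSV_1">
    <batch:commit size="100" doc:name="Batch Commit">
        <!-- transform the accumulated records and write CSV file 1 here -->
        <logger message="-- writing #[payload.size()] records to CSV 1 --" level="INFO" doc:name="Logger"/>
    </batch:commit>
</batch:step>
<batch:step name="Write_CSV_2">
    <batch:commit size="100" doc:name="Batch Commit">
        <!-- transform the accumulated records and write CSV file 2 here -->
        <logger message="-- writing #[payload.size()] records to CSV 2 --" level="INFO" doc:name="Logger"/>
    </batch:commit>
</batch:step>
<!-- ...and so on for the remaining three CSV outputs -->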
MULE BATCH JOB IN PRACTICE
I think the best way to explain how it works is through an example.
Take into consideration the following case:
You have a batch job configured with max-failed-records="-1" (no limit).
<batch:job name="batch_testBatch" max-failed-records="-1">
In this process we input a collection composed of 6 strings.
<batch:input>
<set-payload value="#[['record1','record2','record3','record4','record5','record6']]" doc:name="Set Payload"/>
</batch:input>
The processing is composed of 3 steps.
The first step just logs the record being processed, while the second step also logs it and then throws an exception on record3 to simulate a failure.
<batch:step name="Batch_Step">
<logger message="-- processing #[payload] in step 1 --" level="INFO" doc:name="Logger"/>
</batch:step>
<batch:step name="Batch_Step2">
<logger message="-- processing #[payload] in step 2 --" level="INFO" doc:name="Logger"/>
<scripting:transformer doc:name="Groovy">
<scripting:script engine="Groovy"><![CDATA[
if(payload=="record3"){
throw new java.lang.Exception();
}
payload;
]]>
</scripting:script>
</scripting:transformer>
</batch:step>
The third step instead contains just a batch commit with a size of 2.
<batch:step name="Batch_Step3">
<batch:commit size="2" doc:name="Batch Commit">
<logger message="-- committing #[payload] --" level="INFO" doc:name="Logger"/>
</batch:commit>
</batch:step>
Now you can follow the execution of this batch processing.
At the start, all 6 records are processed by the first step, and the console log looks like this:
-- processing record1 in step 1 --
-- processing record2 in step 1 --
-- processing record3 in step 1 --
-- processing record4 in step 1 --
-- processing record5 in step 1 --
-- processing record6 in step 1 --
Step Batch_Step finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
Now things get more interesting in step 2: record3 will fail because we explicitly throw an exception, but despite this the step continues processing the other records. Here is how the log looks:
-- processing record1 in step 2 --
-- processing record2 in step 2 --
-- processing record3 in step 2 --
com.mulesoft.module.batch.DefaultBatchStep: Found exception processing record on step ...
Stacktrace
....
-- processing record4 in step 2 --
-- processing record5 in step 2 --
-- processing record6 in step 2 --
Step Batch_Step2 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
At this point, despite a failed record in this step, batch processing continues because the parameter max-failed-records is set to -1 (unlimited) rather than the default value of 0.
All the successful records are then passed to step 3; this is because, by default, the accept-policy parameter of a step is set to NO_FAILURES. (Other possible values are ALL and ONLY_FAILURES.)
Now step 3, which contains the commit with a size equal to 2, commits the records two by two:
-- committing [record1, record2] --
-- committing [record4, record5] --
Step: Step Batch_Step3 finished processing all records for instance d8660590-ca74-11e5-ab57-6cd020524153 of job batch_testBatch
-- committing [record6] --
As you can see, this confirms that record3, which had failed, was not passed to the next step and therefore was not committed.
Starting from this example I think you can imagine and test more complex scenarios; for example, after the commit you could have another step that processes only the failed records, to make an administrator aware of the failures by email.
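A sketch of such a step (the step name is made up, and the logger stands in for whatever mail or notification connector you would actually use):
<batch:step name="Batch_Step_Failures" accept-policy="ONLY_FAILURES">
    <!-- receives only the records that failed in earlier steps -->
    <logger message="-- notifying administrator about failed record #[payload] --" level="INFO" doc:name="Logger"/>
</batch:step>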
Afterwards you can always use external storage to keep more advanced information about your records, as you can read in my answer to another question.
Hope this helps
I am trying to send HTTP requests via JMeter. I have created a thread group with a loop count of 25, a ramp-up period of 120, and the number of threads set to 30. Within the thread group I have 20 HTTP requests. I am a little confused as to how JMeter runs these requests. Do all 20 requests within a thread group run in a single thread, with each loop over the thread group running concurrently on a different thread? Or does each of the 20 requests run in a different thread as and when one is available?
My other question is: on each loop, I want to vary the body of the POST data being sent via the HTTP request. Is it possible to pass the POST body via a file instead of inserting the data into the JMeter Body Data tab as shown below?
Instead of doing that, though, I want to define some kind of variable that picks a file based on the iteration of the thread group that is running; for example, on the second loop over the thread group I want to use test2.txt, on the third test3.txt, etc., and these text files will contain different POST data. Could anyone tell me if this is possible with JMeter and, if so, how I would go about doing it?
Point 1 - JMeter concurrency
JMeter starts with 1 thread and spawns more threads as per the ramp-up setting. In your case (30 threads and a 120-second ramp-up), another thread is added every 4 seconds. Each thread executes the 20 requests in order and, if there is another loop, starts over; if there are no loops left, the thread shuts down. To control load and concurrency, JMeter provides 2 options:
Synchronizing Timer - pauses all threads until the specified threshold is reached and then releases them all at the same time
Constant Throughput Timer - specifies the load in requests per minute.
Point 2 - Send file instead of text
You can replace your request body with the __FileToString function. If you want to parametrize it, you can use a nested function to provide the current iteration - see the example after Point 3 below.
Point 3 - adding iteration as a parameter
JMeter provides 2 options for incrementing a counter on each loop:
Counter config element - starts from a specified value and is incremented by a specified value each time it is called.
__counter function - starts from 1 and is incremented by 1 each time it is called. Can be "per-user" or "global".
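Putting Points 2 and 3 together, the Body Data could look something like the sketch below (assuming files named test1.txt, test2.txt, ... alongside the test plan; the per-user __counter increments once per evaluation, i.e. once per loop iteration here):
${__FileToString(test${__counter(TRUE,)}.txt,,)}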
See the How to Use JMeter Functions post series for comprehensive information on the above and other JMeter functions.