ServerFailureTriggerMBean.MaxStuckThreadTime & ServerFailureTriggerMBean.StuckThreadCount strange behaviour - weblogic

I'm facing strange behaviour with some parameters in WebLogic.
I have a J2EE batch job which runs for more than 10 minutes on a WebLogic server, and it causes an exception like
com.ibm.jbatch.container.exception.BatchContainerRuntimeException:
java.lang.InterruptedException
After some investigation, I found that the property MaxStuckThreadTime is set to 600 seconds (the default value) and the property StuckThreadCount is set to 25 (it was 0 in the past, without any issue).
If I understand correctly, this means the server should fail if and only if at least 25 threads have been busy for more than 600 seconds.
But I have at most 10 threads running at the same time on the server.
I ran some tests on my dev environment, and as soon as one thread is stuck (busy for 10 minutes), the InterruptedException is thrown. Is this the expected behaviour?
I don't have the rights to modify those values in production.
So, any idea to bypass this kind of error is welcome.
In the documentation, I found:
StuckThreadCount = The number of stuck threads after which the server is transitioned into FAILED state.
MaxStuckThreadTime = Sets the value of the MaxStuckThreadTime attribute.
So, from my point of view, the InterruptedException should only appear if both conditions are fulfilled, but I have the impression that only one stuck thread is enough to interrupt the batch.
Am I correct in saying that MaxStuckThreadTime is only taken into account if StuckThreadCount is different from 0?
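For reference, this is how I read the two values over WLST; a minimal sketch (the admin URL, credentials and server name are placeholders, and the exact MBean path may vary between WebLogic versions):
connect('weblogic', 'welcome1', 't3://localhost:7001')
serverConfig()
cd('/Servers/MyServer/OverloadProtection/MyServer/ServerFailureTrigger/MyServer')
print 'StuckThreadCount:  ', get('StuckThreadCount')
print 'MaxStuckThreadTime:', get('MaxStuckThreadTime')
disconnect()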
Thanks in advance for your help
Edit:
I tried to implement the proposal below, but so far without success.
So, in my weblogic-ejb-jar.xml, I've added the following code:
<work-manager>
    <name>BatchWorkManager</name>
    <ignore-stuck-threads>true</ignore-stuck-threads>
</work-manager>
<managed-executor-service>
    <name>batch-job-executor</name>
    <dispatch-policy>BatchWorkManager</dispatch-policy>
    <long-running-priority>10</long-running-priority>
</managed-executor-service>
and in my batch, I added
@Resource(name = "BatchWorkManager")
WorkManager myWM;
and the call to my batch looks like this:
@Override
public String process() throws Exception {
    myWM.schedule(new MyWork("MyBatchName"));
    return BatchStatus.COMPLETED.toString();
}
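For context, a minimal sketch of what a class like MyWork could look like (this is an assumption, the real MyWork isn't shown here; it also assumes the injected WorkManager is the commonj.work (JSR-237) API that WebLogic exposes):
import commonj.work.Work;

public class MyWork implements Work {

    private final String name;

    public MyWork(String name) {
        this.name = name;
    }

    public void run() {
        // the actual long-running batch logic lives here
    }

    public boolean isDaemon() {
        // daemon (long-lived) work runs on its own thread instead of a pooled one
        return true;
    }

    public void release() {
        // called by the container to ask the work to stop; no-op here
    }
}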
After a few minutes (defined by the MaxStuckThreadTime parameter), the job is put into FAILED status.
If I debug the code, I can see the values of the work manager:
stuckThreadActions = null name = "NO STUCK THREAD ACTIONS !"
stuckThreads = {BitSet#36226} "{}"
It seems the work manager is correctly set up ("NO STUCK THREAD ACTIONS !" is what I want).
So, I still don't understand why the batch is failing...
Any help is welcome.
For information, here is the stack trace I receive:
###<Apr 21, 2022, 12:40:00,793 PM CEST> <com.ibm.jbatch.container.impl.BatchletStepControllerImpl> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <33ef2b10-13cc-45be-bf47-e06daf40042c-0000003b> <1650537600793> <[severity-value: 16] [rid: 0:1] [partition-id: 0] [partition-name: DOMAIN] > <Caught exception executing step:
com.ibm.jbatch.container.exception.BatchContainerRuntimeException: java.lang.InterruptedException
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.executeAndWaitForCompletion(PartitionedStepControllerImpl.java:407)
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.invokeCoreStep(PartitionedStepControllerImpl.java:297)
    at com.ibm.jbatch.container.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:144)
    at com.ibm.jbatch.container.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:112)
    at com.ibm.jbatch.container.impl.JobThreadRootControllerImpl.originateExecutionOnThread(JobThreadRootControllerImpl.java:110)
    at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at weblogic.work.concurrent.TaskWrapper.call(TaskWrapper.java:151)
    at weblogic.work.concurrent.future.AbstractFutureImpl.runTask(AbstractFutureImpl.java:391)
    at weblogic.work.concurrent.future.AbstractFutureImpl.doRun(AbstractFutureImpl.java:436)
    at weblogic.work.concurrent.future.ManagedFutureImpl.run(ManagedFutureImpl.java:28)
    at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:348)
    at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:333)
    at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:54)
    at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
    at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:640)
    at weblogic.work.ExecuteThread.execute(ExecuteThread.java:406)
    at weblogic.work.ExecuteThread.run(ExecuteThread.java:346)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at com.ibm.jbatch.container.impl.PartitionedStepControllerImpl.executeAndWaitForCompletion(PartitionedStepControllerImpl.java:402)
    ... 17 more

You could configure a new work manager for running the batch job and configure stuck threads to be ignored, or launch the batch job as a long-running request.
A work manager can be configured globally via the WebLogic console, or locally for each deployed application. To define a work manager in an application, you can configure it in the weblogic.xml (or the equivalent for EAR files) packaged with your deployment. For example, I have this in my weblogic.xml file to define a work manager that ignores stuck threads...
<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.oracle.com/weblogic/weblogic-web-app http://xmlns.oracle.com/weblogic/weblogic-web-app/1.4/weblogic-web-app.xsd">
    ...
    <work-manager>
        <name>batch-job-wm</name>
        <max-threads-constraint>
            <name>batch-job-max-threads</name>
            <count>10</count>
        </max-threads-constraint>
        <ignore-stuck-threads>true</ignore-stuck-threads>
    </work-manager>
    <managed-executor-service>
        <name>batch-job-executor</name>
        <dispatch-policy>batch-job-wm</dispatch-policy>
        <long-running-priority>10</long-running-priority>
        <max-concurrent-long-running-requests>10</max-concurrent-long-running-requests>
    </managed-executor-service>
    <resource-env-description>
        <resource-env-ref-name>concurrent/batch-job-executor</resource-env-ref-name>
        <resource-link>batch-job-executor</resource-link>
    </resource-env-description>
    ...
</weblogic-web-app>
I reference that managed-executor-service in my web.xml...
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd" version="3.0">
    ...
    <resource-env-ref>
        <resource-env-ref-name>concurrent/batch-job-executor</resource-env-ref-name>
        <resource-env-ref-type>javax.enterprise.concurrent.ManagedExecutorService</resource-env-ref-type>
    </resource-env-ref>
</web-app>
In my web application, I can then access that task executor as follows...
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.DefaultManagedTaskExecutor;

@Configuration
public class ResourceConfig {

    @Bean
    public TaskExecutor batchTaskExecutor() {
        // looks up the container-managed executor bound by the resource-env-ref above
        DefaultManagedTaskExecutor taskExecutor = new DefaultManagedTaskExecutor();
        taskExecutor.setJndiName("java:comp/env/concurrent/batch-job-executor");
        return taskExecutor;
    }
}
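A trivial usage sketch (the launcher class and method names here are illustrative, not from my actual application): the batch work is simply handed to that executor.
import org.springframework.core.task.TaskExecutor;
import org.springframework.stereotype.Service;

@Service
public class BatchJobLauncher {

    private final TaskExecutor batchTaskExecutor;

    public BatchJobLauncher(TaskExecutor batchTaskExecutor) {
        this.batchTaskExecutor = batchTaskExecutor;
    }

    public void launch(Runnable batchWork) {
        // dispatched via the batch-job-wm work manager, so stuck-thread
        // detection is ignored for this work
        batchTaskExecutor.execute(batchWork);
    }
}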
When launching a batch job using that work manager, any stuck threads are ignored by WebLogic and the servers show as healthy even for long-running tasks.
An enhancement to this is to have the batch job launched as a long-running task. I think this will cause WebLogic to create a new thread for the task instead of taking a thread from the work manager thread pool. Also, WebLogic won't consider a thread assigned to a long-running task as being stuck.
To launch a long-running task, you need to set the LONGRUNNING_HINT to true in the ManagedTask that is launched. For more details, see the following (a small sketch follows the links)...
https://docs.oracle.com/javaee/7/api/javax/enterprise/concurrent/ManagedTask.html#LONGRUNNING_HINT
https://docs.oracle.com/javaee/7/api/javax/enterprise/concurrent/ManagedExecutorService.html
https://blogs.oracle.com/weblogicserver/post/concurrency-utilities-support-in-weblogic-server-1221-part-one-managedexecutorservice
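A minimal sketch of submitting work with that hint (the launcher class is illustrative; it assumes the concurrent/batch-job-executor resource reference shown above):
import java.util.Collections;
import java.util.Map;
import javax.annotation.Resource;
import javax.enterprise.concurrent.ManagedExecutorService;
import javax.enterprise.concurrent.ManagedExecutors;
import javax.enterprise.concurrent.ManagedTask;

public class LongRunningBatchLauncher {

    @Resource(name = "concurrent/batch-job-executor")
    private ManagedExecutorService executor;

    public void launch(Runnable batchWork) {
        Map<String, String> hints =
                Collections.singletonMap(ManagedTask.LONGRUNNING_HINT, "true");
        // Wrap the plain Runnable so the long-running hint is visible to the container.
        executor.submit(ManagedExecutors.managedTask(batchWork, hints, null));
    }
}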


Reactor Kafka health check in a Spring webflux app

I have a Reactor Kafka application that consumes messages from a topic indefinitely. I need to expose a health check REST endpoint that can indicate the health of this process; essentially, I am interested in knowing whether the Kafka receiver flux sequence has terminated, so that some action can be taken to start it. Is there a way to know the current status of a flux (completed/terminated, etc.)? The application is Spring WebFlux + Reactor Kafka.
Edit 1 - doOnTerminate/doFinally do not execute
Flux.range(1, 5)
    .flatMap(record -> Mono.just(record)
        .map(i -> {
            throw new OutOfMemoryError("Forcing exception for " + i);
        })
        .doOnNext(i -> System.out.println("doOnNext: " + i))
        .doOnError(e -> System.err.println(e))
        .onErrorResume(e -> Mono.empty()))
    .doFinally(signalType -> System.err.println("doFinally: Terminating with Signal type: " + signalType))
    .doOnTerminate(() -> System.err.println("doOnTerminate: executed"))
    .subscribe();
"C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\lib\idea_rt.jar=52295:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2019.2.4\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_211\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_211\jre\lib\rt.jar;C:\Users\akoul680\intellij-workspace\basics\target\classes;C:\Users\akoul680\.m2\repository\com\zaxxer\HikariCP\3.4.1\HikariCP-3.4.1.jar;C:\Users\akoul680\.m2\repository\org\apache\kafka\kafka-clients\2.2.0\kafka-clients-2.2.0.jar;C:\Users\akoul680\.m2\repository\com\github\luben\zstd-jni\1.3.8-1\zstd-jni-1.3.8-1.jar;C:\Users\akoul680\.m2\repository\org\lz4\lz4-java\1.5.0\lz4-java-1.5.0.jar;C:\Users\akoul680\.m2\repository\org\xerial\snappy\snappy-java\1.1.7.2\snappy-java-1.1.7.2.jar;C:\Users\akoul680\.m2\repository\org\apache\avro\avro\1.9.0\avro-1.9.0.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.9.8\jackson-core-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.9.8\jackson-databind-2.9.8.jar;C:\Users\akoul680\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.9.0\jackson-annotations-2.9.0.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-compress\1.18\commons-compress-1.18.jar;C:\Users\akoul680\.m2\repository\com\codahale\metrics\metrics-core\3.0.2\metrics-core-3.0.2.jar;C:\Users\akoul680\.m2\repository\org\junit\jupiter\junit-jupiter-api\5.3.2\junit-jupiter-api-5.3.2.jar;C:\Users\akoul680\.m2\repository\org\apiguardian\apiguardian-api\1.0.0\apiguardian-api-1.0.0.jar;C:\Users\akoul680\.m2\repository\org\opentest4j\opentest4j\1.1.1\opentest4j-1.1.1.jar;C:\Users\akoul680\.m2\repository\org\junit\platform\junit-platform-commons\1.3.2\junit-platform-commons-1.3.2.jar;C:\Users\akoul680\.m2\repository\org\slf4j\slf4j-api\1.7.26\slf4j-api-1.7.26.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-core\1.2.3\logback-core-1.2.3.jar;C:\Users\akoul680\.m2\repository\ch\qos\logback\logback-classic\1.2.3\logback-classic-1.2.3.jar;C:\Users\akoul680\.m2\repository\io\projectreactor\reactor-core\3.4.10\reactor-core-3.4.10.jar;C:\Users\akoul680\.m2\repository\org\reactivestreams\reactive-streams\1.0.3\reactive-streams-1.0.3.jar;C:\Users\akoul680\.m2
\repository\io\projectreactor\reactor-test\3.4.10\reactor-test-3.4.10.jar;C:\Users\akoul680\.m2\repository\commons-net\commons-net\3.6\commons-net-3.6.jar;C:\Users\akoul680\.m2\repository\com\box\box-java-sdk\2.32.0\box-java-sdk-2.32.0.jar;C:\Users\akoul680\.m2\repository\com\eclipsesource\minimal-json\minimal-json\0.9.1\minimal-json-0.9.1.jar;C:\Users\akoul680\.m2\repository\org\bitbucket\b_c\jose4j\0.4.4\jose4j-0.4.4.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcprov-jdk15on\1.52\bcprov-jdk15on-1.52.jar;C:\Users\akoul680\.m2\repository\com\jcraft\jsch\0.1.55\jsch-0.1.55.jar;C:\Users\akoul680\.m2\repository\org\apache\commons\commons-vfs2\2.4\commons-vfs2-2.4.jar;C:\Users\akoul680\.m2\repository\commons-logging\commons-logging\1.2\commons-logging-1.2.jar;C:\Users\akoul680\.m2\repository\org\bouncycastle\bcpkix-jdk15on\1.52\bcpkix-jdk15on-1.52.jar;C:\Users\akoul680\intellij-workspace\basics\lib\db2jcc4.jar" lrn.chapter14.ErrorHandling
2021-10-12T09:53:34,344 main r.util.Loggers - Using Slf4j logging framework
Exception in thread "main" java.lang.OutOfMemoryError: Forcing exception for 1
at lrn.chapter14.ErrorHandling.lambda$null$0(ErrorHandling.java:19)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onNext(FluxMapFuseable.java:281)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.request(FluxMapFuseable.java:354)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.request(FluxPeekFuseable.java:437)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.request(MonoPeekTerminal.java:139)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onSubscribe(MonoPeekTerminal.java:152)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableConditionalSubscriber.onSubscribe(FluxPeekFuseable.java:471)
at reactor.core.publisher.FluxMapFuseable$MapFuseableConditionalSubscriber.onSubscribe(FluxMapFuseable.java:263)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4361)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:426)
at reactor.core.publisher.FluxRange$RangeSubscription.slowPath(FluxRange.java:156)
at reactor.core.publisher.FluxRange$RangeSubscription.request(FluxRange.java:111)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:371)
at reactor.core.publisher.FluxRange.subscribe(FluxRange.java:69)
at reactor.core.publisher.Flux.subscribe(Flux.java:8468)
at reactor.core.publisher.Flux.subscribeWith(Flux.java:8641)
at reactor.core.publisher.Flux.subscribe(Flux.java:8438)
at reactor.core.publisher.Flux.subscribe(Flux.java:8362)
at reactor.core.publisher.Flux.subscribe(Flux.java:8280)
at lrn.chapter14.ErrorHandling.ex5(ErrorHandling.java:26)
at lrn.chapter14.ErrorHandling.main(ErrorHandling.java:12)
Process finished with exit code 1
You can't query the flux itself, but you can tell it to do something if it ever stops.
In the service that contains your Kafka listener, I'd recommend adding a terminated (or similar) boolean flag that's false by default. You can then ensure that the last operator in your flux is:
.doOnTerminate(() -> terminated = true)
...and then get the healthcheck endpoint to monitor that value, marking the container as unhealthy if that flag is ever true.
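For example, a minimal sketch of that healthcheck with Spring Boot Actuator (the class and field names are mine, not from your code) could look like this:
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class KafkaReceiverHealthIndicator implements HealthIndicator {

    private final AtomicBoolean terminated = new AtomicBoolean(false);

    // wire this into the receiver pipeline, e.g.:
    // kafkaReceiverFlux.doOnTerminate(this::markTerminated).subscribe();
    public void markTerminated() {
        terminated.set(true);
    }

    @Override
    public Health health() {
        return terminated.get()
                ? Health.down().withDetail("kafkaReceiver", "terminated").build()
                : Health.up().build();
    }
}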
doOnTerminate() is more reliable than doOnError() in this use case, as it executes whether the publisher terminates with an error or with a completion signal. As per the comment though, this isn't completely reliable - if your publisher terminates due to a JVM error or similar, that doOnTerminate() operator won't be run.
In my experience, if this happens it's usually due to an OutOfMemoryError, in which case -XX:+ExitOnOutOfMemoryError is a good VM option to use (the immediate exit can then trigger an immediate restart policy, without waiting for the healthcheck endpoint to be called and trigger the restart after a while).
Bear in mind there are other fatal JVM errors that wouldn't get caught by the above process though, so that's still not 100% reliable.

Talend (7.0.1) - Cannot modify mapred.job.name at runtime

I am having some trouble running a simple tHiveCreateTable job in Talend OS for Big Data (print of the job where I am getting this error).
The Hive connection is fine, and the job worked until Ranger was activated in the cluster.
After Ranger was activated, I started getting the following log:
[statistics] connecting to socket on port 3345
[statistics] connected
Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
[statistics] disconnected
This error occurs whether the job uses Tez or MapReduce, and the exception is thrown in the following line of the automatically generated code:
// For MapReduce Mode
stmt_tHiveCreateTable_1.execute("set mapred.job.name=" + queryIdentifier);
Do you know any solution or workaround for this?
Thanks in advance
It is possible to stop Talend 7 jobs from changing mapreduce.job.name and hive.query.name at runtime.
Edit the file
{talend_install_dir}/plugins/org.talend.designer.components.localprovider_7.1.1.20181026_1147/components/templates/Hive/SetQueryName.javajet
and comment out lines 6 and 11 like this:
// stmt_<%=cid %>.execute("set mapred.job.name=" + queryIdentifier_<%=cid %>);
// stmt_<%=cid %>.execute("set hive.query.name=" + queryIdentifier_<%=cid %>);
That solved the issue for me.

How to start and stop multiple weblogic managed servers at one go through WLST

I am writing code to start, stop, undeploy and deploy my application on WebLogic.
My components need to be deployed on a few managed servers.
When I do new deployments manually, I can start and stop the servers in parallel by ticking multiple boxes and selecting start and stop from the drop-down. See below.
But when trying this from WLST, I could only do it one server at a time.
For example:
start(name='ServerX',type='Server',block='true')
start(name='ServerY',type='Server',block='true')
shutdown(name='ServerX',entityType='Server',ignoreSessions='true',timeOut=600,force='true',block='true')
shutdown(name='ServerY',entityType='Server',ignoreSessions='true',timeOut=600,force='true',block='true')
Is there a way I can start and stop multiple servers in one command?
Instead of directly starting and stopping servers, you create tasks, then wait for them to complete.
e.g.
tasks = []
for server in cmo.getServerLifeCycleRuntimes():
    # to start all servers that are not already running
    if (server.getName() != 'AdminServer' and server.getState() != 'RUNNING'):
        tasks.append(server.start())
    # or to shut them down:
    # if (server.getName() != 'AdminServer' and server.getState() != 'SHUTDOWN'):
    #     tasks.append(server.shutdown())

# wait for the tasks to complete
while len(tasks) > 0:
    for task in tasks[:]:
        if task.getStatus() != 'TASK IN PROGRESS':
            tasks.remove(task)
    java.lang.Thread.sleep(5000)
I know this is an old post; today I was reading the book "Advanced WebLogic Server Automation" by Martin Heinzl, and on page 282 I found this:
def startCluster(clustername):
    try:
        start(clustername, 'Cluster')
    except Exception, e:
        print 'Error while starting cluster', e
        dumpStack()
I tried it and it started the managed servers in parallel.
Just keep in mind that the AdminServer must be started first and your script must connect to the AdminServer before trying it.
Perhaps this would not be useful for you, as the servers should be in a cluster, but I wanted to share this :)
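In case it helps, a minimal usage sketch of that function (the admin URL, credentials and cluster name are placeholders):
connect('weblogic', 'welcome1', 't3://adminhost:7001')
startCluster('MyCluster')
disconnect()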

Setting a timeout on webservice consumer built with org.apache.axis.client.Call and running on Domino

I'm maintaining an antediluvian Notes application which connects to a SAP back-end via a hand-rolled 'Webservice'.
The server is running Domino Release 7.0.4FP2 HF97.
The Webservice is not the more recent Webservice Consumer, but a large Java agent which uses the Apache soap.jar (org.apache.soap). Below is an example of the calling code.
private Call setupSOAPCall() {
    Call call = new Call();
    SOAPHTTPConnection conn = new SOAPHTTPConnection();
    call.setSOAPTransport(conn);
    call.setEncodingStyleURI(Constants.NS_URI_SOAP_ENC);
There has been a change in the SAP system, and the call is now taking 8 minutes to complete (verified by the SAP team).
I'm getting an error message as follows:
[SOAPException: faultCode=SOAP-ENV:Client; msg=For input string: "906 "; targetException=java.lang.NumberFormatException: For input string: "906 "]
I found a blog article describing the error message quite closely:
https://thejavablog.wordpress.com/category/jmeter/
and I've come to the hypothesis that it is a timeout message that is being returned to my Call object, and that this timeout message is being incorrectly parsed, hence the NumberFormatException.
Looking at my logs, I can see that there is a time difference of 62 seconds between my call and the response.
I recommended that the server setting in the server document (tab Internet Protocols/HTTP/Timeouts/Request timeouts) be changed from 60 seconds to 600 seconds, and the HTTP task restarted with
tell http restart
I've re-run the tests and I am getting the same error, and the time difference is still slightly more than 60 seconds, which is not what I was expecting.
I read Michael Ruhnau's blog entry
http://www.mruhnau.net/2014/06/how-to-overcome-domino-webservice.html
which points to this APR
http://www-01.ibm.com/support/docview.wss?uid=swg1LO48272
but I'm not convinced that this would apply in this case, since there is no way IBM would know that my Java agent is in fact making a SOAP call.
My current hypothesis is that I have to use the setTimeout() method either on
org.apache.axis.client.Call
https://axis.apache.org/axis/java/apiDocs/org/apache/axis/client/Call.html
or on the org.apache.soap.transport.http.SOAPHTTPConnection
https://docs.oracle.com/cd/B13789_01/appdev.101/b12024/org/apache/soap/transport/http/SOAPHTTPConnection.html
and that the timeout value is an apache default, not something that is controlled by the Domino server.
I'd be grateful for any help.
I understand your approach, and I hope this is the correct one to solve your problem.
Add a debug statement (a console write would be fine) that displays the default timeout, then try to increase it to 10 minutes.
SOAPHTTPConnection conn = new SOAPHTTPConnection();
System.out.println("time out is :" + conn.getTimeout());
conn.setTimeout(600000);//10 min in ms
System.out.println("after setting it, time out is :" + conn.getTimeout());
call.setSOAPTransport(conn);
Now keep in mind that Domino also has a Max LotusScript/Java execution time; check this value and (at least for a try) change it: http://www.ibm.com/support/knowledgecenter/SSKTMJ_9.0.1/admin/othr_servertasksagentmanagertab_r.html (it's the version 9 help, but this part should be identical).
I've since discovered that it wasn't my code generating the error; the default timeout for the Apache SOAP SOAPHTTPConnection is 0, i.e. no timeout.

Query process instances based on starting message name

ENV: Camunda 7.4, BPMN 2.0
Given a process which can be started by multiple message start events:
Is it possible to query process instances started by a specific message, identified by the message name?
If yes, how?
If no, why?
If not at the moment, when?
Are there some APIs for this, like IncidentMessages?
That is not an out-of-the-box feature, but it should be easy to build using process variables.
The basic steps are:
1. Implement an execution listener that sets the message name as a variable:
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.ExecutionListener;

public class MessageStartEventListener implements ExecutionListener {

    public void notify(DelegateExecution execution) throws Exception {
        execution.setVariable("startMessage", "MessageName");
    }
}
Note that via DelegateExecution#getBpmnModelElementInstance you can access the BPMN element that the listener is attached to, so you could determine the message name dynamically.
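For example, a dynamic variant could look like this (a sketch using the BPMN model API; the cast assumes the listener is only registered on message start events):
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.ExecutionListener;
import org.camunda.bpm.model.bpmn.instance.MessageEventDefinition;
import org.camunda.bpm.model.bpmn.instance.StartEvent;

public class DynamicMessageStartEventListener implements ExecutionListener {

    public void notify(DelegateExecution execution) throws Exception {
        // the element the listener is attached to; safe to cast because we
        // only attach this listener to message start events
        StartEvent startEvent = (StartEvent) execution.getBpmnModelElementInstance();
        MessageEventDefinition definition = (MessageEventDefinition)
                startEvent.getEventDefinitions().iterator().next();
        execution.setVariable("startMessage", definition.getMessage().getName());
    }
}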
2. Declare the execution listener at the message start events:
<process id="executionListenersProcess">
    <startEvent id="theStart">
        <extensionElements>
            <camunda:executionListener event="start"
                class="org.camunda.bpm.examples.bpmn.executionlistener.MessageStartEventListener" />
        </extensionElements>
        <messageEventDefinition ... />
    </startEvent>
    ...
</process>
Note that with a BPMN parse listener, you can add such a listener programmatically to every message start event in every process definition. See this example.
3. Make a process instance query filtering by that variable
RuntimeService runtimeService = processEngine.getRuntimeService();
List<ProcessInstance> matchingInstances = runtimeService
    .createProcessInstanceQuery()
    .variableValueEquals("startMessage", "MessageName")
    .list();