Ignite semaphore gets stuck on release() - ignite

I have started 3 server nodes and 2 client nodes of Ignite. If I stop one server node, both clients get stuck on the semaphore release() call.
Could you please tell me the reason?
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.9/Native Method)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.9/LockSupport.java:323)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4851)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4810)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1469)
at org.apache.ignite.internal.processors.cache.GridCacheProxyImpl.get(GridCacheProxyImpl.java:400)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl$Sync$1.call(GridCacheSemaphoreImpl.java:277)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl$Sync$1.call(GridCacheSemaphoreImpl.java:271)
at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1423)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl$Sync.compareAndSetGlobalState(GridCacheSemaphoreImpl.java:271)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl$Sync.tryReleaseShared(GridCacheSemaphoreImpl.java:238)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(java.base@11.0.9/AbstractQueuedSynchronizer.java:1382)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl.release(GridCacheSemaphoreImpl.java:790)
at org.apache.ignite.internal.processors.datastructures.GridCacheSemaphoreImpl.release(GridCacheSemaphoreImpl.java:778)
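For context, the semaphore is obtained and released in the usual way on both clients. A minimal sketch of the setup (the semaphore name, permit count, and failoverSafe flag below are assumptions, not the actual values used):
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSemaphore;
import org.apache.ignite.Ignition;

public class SemaphoreUsage {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();  // started as a client or server node, depending on configuration

        // Arguments: name, permit count, failoverSafe, create.
        // failoverSafe controls whether the semaphore stays usable when a node holding permits leaves.
        IgniteSemaphore sem = ignite.semaphore("mySemaphore", 1, true, true);

        sem.acquire();
        try {
            // ... work guarded by the semaphore ...
        } finally {
            sem.release();  // the call that hangs after a server node is stopped
        }
    }
}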

Related

JVM appears to be hung with an out-of-heap-space error when the response payload size is more than 3 MB in Mule 4

I am using Mule 4 to retrieve records from a database and return them in the response. All the components complete successfully, but it fails while streaming the response. I am calling the API from Postman and I see this error:
<h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
In Studio, I get logs like:
Pinging the JVM took 9 seconds to respond.
JVM appears hung: Timed out waiting for signal from JVM. Requesting thread dump.
Dumping JVM state.
JVM appears hung: Timed out waiting for signal from JVM. Restarting JVM.
JVM exited after being requested to terminate.
JVM Restarts disabled. Shutting down.
<-- Wrapper Stopped
Could anyone help me with this?
Thanks,
Sanjukta
Some information is not being streamed. You didn't provide any details of the implementation, but clearly something is consuming a lot of heap memory. It may not be the database, but some other component. Check the streaming configuration for your components.
To identify the cause locally, you can capture a heap dump and analyze it while the runtime in Studio is timing out on the ping, before it crashes. The ping timeouts are probably caused by high garbage-collection activity.
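For example, with the standard JDK tools (the PID below is a placeholder for the Mule/Studio JVM process):
jmap -dump:live,format=b,file=mule-heap.hprof <pid>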
This is a symptom that your JVM heap is full. Check your settings in Anypoint Studio to see how much memory is allocated.
Check this article
https://help.mulesoft.com/s/article/Out-Of-Memory-in-Studio-Application-How-to-increase-the-maximum-heap-size?r=6&ui-force-components-controllers-recordGlobalValueProvider.RecordGvp.getRecord=1
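As a rough illustration (the values are placeholders and depend on your application), the heap can be increased through the VM arguments of the run configuration in Studio:
-Xms1024m
-Xmx2048m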

How to handle or debug a runtime threading exception

I am getting the following exception:
java.util.concurrent.RejectedExecutionException: XNIO007007: Thread is terminating
at org.xnio.nio.WorkerThread.execute(WorkerThread.java:568)
at org.xnio.AbstractIoFuture.runNotifier(AbstractIoFuture.java:354)
at org.xnio.AbstractIoFuture.runAllNotifiers(AbstractIoFuture.java:233)
at org.xnio.AbstractIoFuture.setCancelled(AbstractIoFuture.java:291)
at org.xnio.FutureResult.setCancelled(FutureResult.java:98)
at org.xnio.nio.WorkerThread$ConnectHandle.forceTermination(WorkerThread.java:339)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:490)
In my app there are 3 threads running: the main thread, a TCP server thread, and a TCP client thread. After running the app for a long time, it throws the above exception. I don't know which thread threw this exception, nor how to debug it.
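One generic way to see which thread an uncaught exception escapes from is to register a default uncaught-exception handler at startup. A minimal sketch (note it only reports exceptions that actually propagate out of a thread's run method; exceptions that XNIO catches and logs internally will not reach it):
// Register once at application startup.
Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
    System.err.println("Uncaught exception in thread: " + thread.getName());
    throwable.printStackTrace();
});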

WebSphere Application Server LDAP connection pool

We are using WebSphere Application Server 8.5.0.0. We have a requirement to query an LDAP server to get customer details. I tried to configure the connection pool as described here and here.
I passed the following JVM arguments:
-Dcom.sun.jndi.ldap.connect.pool.maxsize=5
-Dcom.sun.jndi.ldap.connect.pool.timeout=60000
-Dcom.sun.jndi.ldap.connect.pool.debug=all
Below is a sample code snippet
Hashtable<String,String> env = new Hashtable<String,String>();
...
...
env.put("com.sun.jndi.ldap.connect.pool", "true");
env.put("com.sun.jndi.ldap.connect.timeout", "5000");
InitialDirContext c = new InitialDirContext(env);
...
...
c.close();
I have two issues here
When I call the service for the 6th time, I get javax.naming.ConnectionException: Timeout exceeded while waiting for a connection: 5000ms. I checked the connection pool debug logs and noticed that the connections are not returned to the pool immediately, despite closing the context safely in a finally block. The connections are released after some time and expire some time after being released. If I then call the service again, it connects to the LDAP server, but new connections are created.
I executed the code and I can see the connection pool debug logs, but the logs are written to SystemErr.log. Is this an issue? Can I ignore it?
However, when I run the code as a standalone application (multithreaded, with a loop of 50 iterations), the connections are returned/released immediately.
Can anyone please let me know what I am doing wrong?
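For reference, the context is closed along these lines (a sketch of the pattern only; the actual lookup code is not shown above):
InitialDirContext ctx = null;
try {
    ctx = new InitialDirContext(env);
    // ... perform the LDAP search/lookup ...
} finally {
    if (ctx != null) {
        try {
            ctx.close();  // should return the connection to the pool
        } catch (NamingException ignore) {
            // ignore failures on close
        }
    }
}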

How to gracefully stop a Spark Streaming application on YARN?

I'm running a Spark Streaming application on YARN in cluster mode, and I'm trying to implement a graceful shutdown so that when the application is killed it finishes the execution of the current micro-batch before stopping.
Following some tutorials, I have set spark.streaming.stopGracefullyOnShutdown to true and added the following code to my application:
sys.ShutdownHookThread {
  log.info("Gracefully stopping Spark Streaming Application")
  ssc.stop(true, true)
  log.info("Application stopped")
}
However, when I kill the application with
yarn application -kill application_1454432703118_3558
the micro-batch being executed at that moment is not completed.
In the driver I see the first log line printed ("Gracefully stopping Spark Streaming Application") but not the last one ("Application stopped").
ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
INFO streaming.MySparkJob: Gracefully stopping Spark Streaming Application
INFO scheduler.JobGenerator: Stopping JobGenerator gracefully
INFO scheduler.JobGenerator: Waiting for all received blocks to be consumed for job generation
INFO scheduler.JobGenerator: Waited for all received blocks to be consumed for job generation
INFO streaming.StreamingContext: Invoking stop(stopGracefully=true) from shutdown hook
In the executor logs I see the following error:
ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.6.21:49767 disassociated! Shutting down.
INFO storage.DiskBlockManager: Shutdown hook called
WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@192.168.6.21:49767] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
INFO util.ShutdownHookManager: Shutdown hook called
I think the problem is related to how YARN sends the kill signal to the application. Any idea how I can make the application stop gracefully?
You should go to the executors page to see on which node your driver is running. SSH to that node and do the following:
ps -ef | grep 'app_name'
(replace app_name with your class name/app name). It will list a couple of processes; some will be children of others. Pick the PID of the parent-most process and send it a SIGTERM:
kill pid
After some time you'll see that your application has terminated gracefully.
Also, you then don't need to add those shutdown hooks.
Use the spark.streaming.stopGracefullyOnShutdown config to help shut down gracefully.
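For example, assuming the job is launched with spark-submit, the setting can be passed on the command line:
spark-submit --conf spark.streaming.stopGracefullyOnShutdown=true <other options> <application jar>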
You can stop the Spark Streaming application by invoking ssc.stop when a custom condition is triggered, instead of using awaitTermination. As the following pseudocode shows:
ssc.start()
while (true) {
  Thread.sleep(10000)  // poll the stop condition every 10 seconds
  if (someFileExists)  // the custom stop condition
    ssc.stop(stopSparkContext = true, stopGracefully = true)
}
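One common way to implement such a condition (an assumption for illustration, not something from the question) is to check for a marker file, for example with the Hadoop FileSystem API:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class StopMarker {
    // Returns true once an external process creates the marker file (the path is a placeholder).
    public static boolean stopRequested() throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        return fs.exists(new Path("/tmp/stop_streaming_marker"));
    }
}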

Resque + killed worker: does not leave a failed job

I noticed that when I kill a Resque worker while it's processing something, it won't leave a failed job. The job is simply gone.
Thus, the job will never be finished, and jobs play an important role in my application.
This only happens when I kill a worker. If my job raises an exception, I can retry it later.
Is it possible to avoid this behavior?
Thanks.
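If the goal is to stop a worker without losing the job it is processing, one option (based on Resque's documented signal handling; worth verifying against your Resque version) is to send QUIT instead of TERM or KILL, so the worker finishes the current job before exiting:
kill -QUIT <resque_worker_pid>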