Ignite RestartProcessFailureHandler failed to restart the stuck node - ignite

Ignite version: 2.8.1-1
I have configured RestartProcessFailureHandler to handle system-critical errors such as SYSTEM_WORKER_BLOCKED. However, when the error occurs, the restart never happens, even after hours. Is this expected behavior?
I do see log entries indicating that a restart has been requested, but it never seems to be executed.
As an alternative, if the failure handler is not suitable for this case, I am thinking of enabling the REST API as a liveness check and restarting the service when the check fails. Please advise.
Thanks.
[2022-03-08T02:14:32,561][ERROR][disco-event-worker-#44%ignite-instance%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=RestartProcessFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet []]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810) [ignite-core-2.8.1.jar:2.8.1]
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805) [ignite-core-2.8.1.jar:2.8.1]
at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234) [ignite-core-2.8.1.jar:2.8.1]
at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) [ignite-core-2.8.1.jar:2.8.1]
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2796) [ignite-core-2.8.1.jar:2.8.1]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.8.1.jar:2.8.1]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
...
[2022-03-08T02:14:32,603][ERROR][node-restarter][] Restarting JVM on Ignite failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=sys-stripe-6, igniteInstanceName=ignite-instance, finished=false, heartbeatTs=1646705660633]]]
....

There is no standard way for a JVM to restart itself from within a Java application, so Ignite relies on external tooling to provide that capability. According to the docs for org.apache.ignite.failure.RestartProcessFailureHandler
https://ignite.apache.org/docs/2.11.1/perf-and-troubleshooting/handling-exceptions#failures-handling
the standard ignite.sh|bat scripts support restarting when the JVM process exits with the special restart exit code.
If you run Ignite as part of your own application, you can write your own start script. Pass IGNITE_SUCCESS_FILE=<path to a marker file that Ignite creates when a restart is requested> as a JVM option when starting the Java process. When the failure handler fires, the JVM exits with org.apache.ignite.IgniteSystemProperties#IGNITE_RESTART_CODE; your script then needs to check that the IGNITE_SUCCESS_FILE was created, remove it, and start the JVM again.
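The start-script loop described above could also be written as a small Java supervisor. This is only a sketch under assumptions: the class name RestartingLauncher and the marker file name ignite-restart.marker are made up for illustration, and passing the marker as -DIGNITE_SUCCESS_FILE=... is my reading of "as a JVM option". It restarts the child only when the marker file exists, which is the check the answer describes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical supervisor: relaunches a child JVM while a restart marker file keeps appearing. */
final class RestartingLauncher {

    /** Returns true (and deletes the marker) if the child left a restart marker behind. */
    static boolean consumeRestartMarker(Path marker) {
        try {
            return Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the child command line; the -D form of the option is an assumption. */
    static List<String> childCommand(Path marker, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-DIGNITE_SUCCESS_FILE=" + marker.toAbsolutePath());
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java RestartingLauncher <java options and main class of the Ignite app>");
            return;
        }
        Path marker = Paths.get("ignite-restart.marker"); // hypothetical location
        Files.deleteIfExists(marker);                      // drop a stale marker from an earlier run

        int exitCode;
        do {
            Process child = new ProcessBuilder(childCommand(marker, Arrays.asList(args)))
                    .inheritIO()   // child logs go to this console
                    .start();
            exitCode = child.waitFor();
        } while (consumeRestartMarker(marker)); // marker present: restart was requested

        System.exit(exitCode);
    }
}
```

It would be started with something like java RestartingLauncher -cp my-app.jar com.example.MyIgniteApp, where the jar and main class are placeholders for your own application.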

How to fix ClassNotFoundException: kotlinx.coroutines.debug.AgentPremain in debug?

I am running several projects as Spring Boot applications. One of them cannot start and throws:
java.lang.ClassNotFoundException: kotlinx.coroutines.debug.AgentPremain
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:304)
at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Exception in thread "main" FATAL ERROR in native method: processing of -javaagent failed
Disconnected from the target VM, address: '127.0.0.1:64279', transport: 'socket'
Process finished with exit code 1
The command line has the option:
-javaagent:[...].m2/repository/org/jetbrains/kotlinx/kotlinx-coroutines-core/1.5.0/kotlinx-coroutines-core-1.5.0.jar
And this jar exists.
This happened after updating IntelliJ IDEA to 2021.2.
This issue happens only in debug mode.
Any idea how to solve this?
It is a known problem in the Kotlin plugin. As a workaround, turn off the coroutine agent: go to Settings/Preferences | Build, Execution, Deployment | Debugger | Data Views | Kotlin and enable "Disable coroutine agent".
Please follow https://youtrack.jetbrains.com/issue/KTIJ-19345 for updates.

ai.grakn.redismock.exception.ParseErrorException

I am running a JUnit test of code that calls jedis.get(key), from the command line via mvn test. The test seems to succeed, but I see multiple stack traces from the RedisServer thread. I am using redis-mock 1.0.6 and jedis 2.9.0.
In IntelliJ, setting a breakpoint on the throw doesn't yield much in the way of diagnostics. The server is trying to read messageInput and gets an EOFException in consumeCount. Are these errors significant? What causes them?
Exception in thread "Thread-3" Exception in thread "Thread-5" ai.grakn.redismock.exception.ParseErrorException
at ai.grakn.redismock.SliceParser.consumeCount(SliceParser.java:83)
at ai.grakn.redismock.RedisCommandParser.parse(RedisCommandParser.java:27)
at ai.grakn.redismock.RedisClient.nextCommand(RedisClient.java:69)
at ai.grakn.redismock.RedisClient.run(RedisClient.java:45)
at java.lang.Thread.run(Thread.java:748)
ai.grakn.redismock.exception.ParseErrorException
at ai.grakn.redismock.SliceParser.consumeCount(SliceParser.java:83)
at ai.grakn.redismock.RedisCommandParser.parse(RedisCommandParser.java:27)
at ai.grakn.redismock.RedisClient.nextCommand(RedisClient.java:69)
at ai.grakn.redismock.RedisClient.run(RedisClient.java:45)
at java.lang.Thread.run(Thread.java:748)
I've had this exception come up when I used a single Jedis instance in multiple threads; it's not thread-safe.
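One way to avoid sharing a single client between threads is to lend each thread its own client from a small pool. The sketch below shows only that pattern with the standard library; Client is a hypothetical stand-in and not a Jedis class. With real Jedis, the library's JedisPool plays this role (borrow with getResource(), hand back with close()).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of a tiny client pool; Client is a hypothetical stand-in, not a Jedis class. */
final class ClientPoolSketch {

    /** Hypothetical non-thread-safe client (think: one Jedis connection). */
    static final class Client {
        String get(String key) {
            return "value-of-" + key;
        }
    }

    private final BlockingQueue<Client> idle;

    ClientPoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new Client());
        }
    }

    /** Borrows a client for exactly one command, so no two threads use it at the same time. */
    String get(String key) {
        Client client;
        try {
            client = idle.take(); // blocks until some thread returns a client
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a client", e);
        }
        try {
            return client.get(key);
        } finally {
            idle.add(client); // always hand the client back
        }
    }

    /** Number of clients currently not borrowed. */
    int idleCount() {
        return idle.size();
    }
}
```

Any number of test threads can call get on the same ClientPoolSketch; each call holds one client exclusively for the duration of the command.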

Apache zeppelin error

I have started Apache Zeppelin and it is running successfully on the configured port.
When I execute simple Spark commands like
sc.version
println(zeppelin)
I just see an "ERROR" string near the run button, without any error output on my console.
My error log:
ERROR [2016-10-21 22:38:05,837] ({pool-2-thread-6} Job.java[run]:189) - Job failed
org.apache.spark.SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:492)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7$$anonfun$apply$8.apply(SparkConf.scala:490)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:490)
at org.apache.spark.SparkConf$$anonfun$validateSettings$7.apply(SparkConf.scala:478)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:478)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_1(SparkInterpreter.java:440)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:354)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:137)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:743)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Also, I have set the Spark and Java paths in the zeppelin-env.sh file.
I know this is quite a late answer, but it may help anyone who faces a similar issue.
I had a similar issue after adding an external dependency {com.twitter:algebird-core_2.11:0.11.0} to the Spark interpreter. Whenever a dependency is added for a component, it may be incompatible with the Zeppelin version or may not exist on the remote. Once I removed that dependency, I started seeing other errors on the Zeppelin console.
Also make sure "zeppelin.spark.printREPLOutput" is set to true. It is true by default, but it is worth checking.

zeppelin interpreter error even after giving correct details

I am getting the error below.
I have entered the MySQL settings in the interpreter:
com.mysql.jdbc.Driver
jdbc:mysql://:3306/
username and password
I restarted the interpreter and bound it, but I still get the error
when running use and select commands:
java.lang.NullPointerException
at org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:201)
at org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:288)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Adding the jar to the $ZEPPELIN_HOME/lib folder, as user3921855 suggested, didn't work for me.
I got it working by adding the MySQL connector to the dependency section of the interpreter config (e.g. Artifact: mysql:mysql-connector-java:5.1.38).
IMPORTANT:
You need to restart the Zeppelin daemon for the interpreter to pick up the new jar. I don't know why, because it reported that it had restarted the sub-process. It might be a bug.
Stop / Start reminder:
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh stop
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start

How do you implement ExitOnOutOfMemoryError parameter on JRockit R28?

My WebLogic Servers use JRockit JVM R28. We need to have the WebLogic JVMs configured to automatically shutdown/kill/exit when an OutOfMemoryError occurs.
A JRockit JVM Parameter called "ExitOnOutOfMemory" will let us accomplish this. However Oracle documentation provides incorrect and conflicting information.
1.) http://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionXX.html says to simply put "-XXexitOnOutOfMemory" into the startup scripts. However, JRockit doesn't recognize this parameter.
2.) http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm#BABCDAIB says to put "-XX:+ExitOnOutOfMemoryError" into the startup scripts. However, JRockit does not recognize this option either. I believe they mistakenly copied this from the HotSpot documentation.
How do I implement this parameter?
-XX:+ExitOnOutOfMemoryError works as expected with JRockit R28.2.2:
$ jrockit-jdk1.6.0_29/bin/java -Xmx20m -XX:+ExitOnOutOfMemoryError OOM
java.lang.OutOfMemoryError: allocLargeObjectOrArray: [B, size 40976
at jrockit/vm/Allocator.allocLargeObjectOrArray(JIZ)Ljava/lang/Object;(Native Method)
at jrockit/vm/Allocator.allocObjectOrArray(Allocator.java:349)
at jrockit/vm/Allocator.allocArray(Allocator.java:257)
at OOM.<init>(OOM.java:3)
at OOM.main(OOM.java:9)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
[ERROR] Exit on OutOfMemory requested. Exiting.
JRockit aborted: Exit requested on OOM (51)
Which version of JRockit are you using? Did you spell the parameter correctly?
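The source of the OOM class used in the run above was not posted. To reproduce the check on your own JVM, a stand-in along these lines should do. This is a hypothetical sketch, not the original class: the name OomDemo, the "exhaust" argument, and the 40960-byte chunk size are my own choices, and the code avoids Java 7 syntax so it should compile for a Java 6 JVM.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the OOM test class; the original source was not posted. */
final class OomDemo {

    /** Allocates maxChunks byte arrays of chunkBytes each and keeps them all reachable. */
    static int fill(int chunkBytes, int maxChunks) {
        List<byte[]> retained = new ArrayList<byte[]>();
        while (retained.size() < maxChunks) {
            retained.add(new byte[chunkBytes]); // never released, so the heap keeps growing
        }
        return retained.size();
    }

    public static void main(String[] args) {
        if (args.length == 0 || !"exhaust".equals(args[0])) {
            System.err.println("usage: java -Xmx20m <exit-on-OOM flag> OomDemo exhaust");
            return;
        }
        // With a small -Xmx this runs out of heap long before the limit is reached.
        fill(40960, Integer.MAX_VALUE);
    }
}
```

Run it with a small heap and the flag under test, for example java -Xmx20m -XX:+ExitOnOutOfMemoryError OomDemo exhaust, then check whether the JVM exits or rejects the option.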