Apache Storm: Supervisor kills and restarts worker process

Using Storm 1.2.2
Kafka 1.1.0
After submitting the topology, the supervisor launches a worker process. Checking the worker.log file for that launched worker process shows that, somewhere during the loading of the executors, the worker process gets killed by the supervisor.
Following are the supervisor logs:
{"#timestamp":"2020-01-09 11:18:57,719","message":"SLOT 6700: Assignment Changed from LocalAssignment(topology_id:trident-Topology-578320979, executors:[ExecutorInfo(task_start:22, task_end:22), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:42, task_end:42), ExecutorInfo(task_start:18, task_end:18), ExecutorInfo(task_start:10, task_end:10), ExecutorInfo(task_start:14, task_end:14), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:38, task_end:38), ExecutorInfo(task_start:30, task_end:30), ExecutorInfo(task_start:34, task_end:34), ExecutorInfo(task_start:50, task_end:50), ExecutorInfo(task_start:46, task_end:46), ExecutorInfo(task_start:26, task_end:26), ExecutorInfo(task_start:39, task_end:39), ExecutorInfo(task_start:47, task_end:47), ExecutorInfo(task_start:7, task_end:7), ExecutorInfo(task_start:51, task_end:51), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:35, task_end:35), ExecutorInfo(task_start:31, task_end:31), ExecutorInfo(task_start:27, task_end:27), ExecutorInfo(task_start:43, task_end:43), ExecutorInfo(task_start:23, task_end:23), ExecutorInfo(task_start:11, task_end:11), ExecutorInfo(task_start:19, task_end:19), ExecutorInfo(task_start:15, task_end:15), ExecutorInfo(task_start:24, task_end:24), ExecutorInfo(task_start:12, task_end:12), ExecutorInfo(task_start:8, task_end:8), ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:32, task_end:32), ExecutorInfo(task_start:40, task_end:40), ExecutorInfo(task_start:36, task_end:36), ExecutorInfo(task_start:28, task_end:28), ExecutorInfo(task_start:20, task_end:20), ExecutorInfo(task_start:16, task_end:16), ExecutorInfo(task_start:48, task_end:48), ExecutorInfo(task_start:44, task_end:44), ExecutorInfo(task_start:21, task_end:21), ExecutorInfo(task_start:33, task_end:33), ExecutorInfo(task_start:41, task_end:41), ExecutorInfo(task_start:37, task_end:37), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:9, task_end:9), ExecutorInfo(task_start:13, task_end:13), ExecutorInfo(task_start:17, task_end:17), ExecutorInfo(task_start:5, task_end:5), ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:25, task_end:25), ExecutorInfo(task_start:45, task_end:45), ExecutorInfo(task_start:49, task_end:49)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0), owner:root) to null","thread_name":"SLOT_6700","level":"WARN"}
{"#timestamp":"2020-01-09 11:18:57,724","message":"Killing 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:00,808","message":"STATE RUNNING msInState: 120187 topo:trident-Topology-578320979 worker:5e19382e-c3e5-4c8d-8706-185e00e658a8 -> KILL msInState: 0 topo:trident-Topology-578320979 worker:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:00,809","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:00,828","message":"SLOT 6700 force kill and wait...","thread_name":"SLOT_6700","level":"WARN"}
{"#timestamp":"2020-01-09 11:19:00,831","message":"Force Killing 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:01,432","message":"Worker Process 5e19382e-c3e5-4c8d-8706-185e00e658a8 exited with code: 137","thread_name":"Thread-30","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,851","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,858","message":"SLOT 6700 all processes are dead...","thread_name":"SLOT_6700","level":"WARN"}
{"#timestamp":"2020-01-09 11:19:03,859","message":"Cleaning up 29a1f333-55f1-45c2-988d-daf0712c2862:5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,859","message":"GET worker-user for 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,859","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/pids/3100","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,860","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/heartbeats","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,871","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/pids","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,872","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8/tmp","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,872","message":"Deleting path /data/workers/5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,873","message":"REMOVE worker-user 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,874","message":"Deleting path /data/workers-users/5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,876","message":"Removed Worker ID 5e19382e-c3e5-4c8d-8706-185e00e658a8","thread_name":"SLOT_6700","level":"INFO"}
{"#timestamp":"2020-01-09 11:19:03,876","message":"STATE KILL msInState: 3068 topo:trident-Topology-578320979 worker:null -> EMPTY msInState: 0","thread_name":"SLOT_6700","level":"INFO"}
After the worker with ID 5e19382e-c3e5-4c8d-8706-185e00e658a8 was killed, the supervisor launched a new worker process with a different ID. The loading of executors starts again, and after some of the executors have finished loading, the new worker process receives a kill signal from the supervisor as well.
Following are the worker logs at port 6700:
...
2020-01-09 14:42:19.455 o.a.s.d.executor main [INFO] Loading executor b-14:[10 10]
2020-01-09 14:42:20.942 o.a.s.d.executor main [INFO] Loaded executor tasks b-14:[10 10]
2020-01-09 14:42:20.945 o.a.s.d.executor main [INFO] Finished loading executor b-14:[10 10]
2020-01-09 14:42:20.962 o.a.s.d.executor main [INFO] Loading executor b-39:[37 37]
2020-01-09 14:42:22.547 o.a.s.d.executor main [INFO] Loaded executor tasks b-39:[37 37]
2020-01-09 14:42:22.549 o.a.s.d.executor main [INFO] Finished loading executor b-39:[37 37]
2020-01-09 14:42:22.566 o.a.s.d.executor main [INFO] Loading executor b-5:[46 46]
2020-01-09 14:42:25.267 o.a.s.d.executor main [INFO] Loaded executor tasks b-5:[46 46]
2020-01-09 14:42:25.269 o.a.s.d.executor main [INFO] Finished loading executor b-5:[46 46]
2020-01-09 14:42:31.175 o.a.s.d.executor main [INFO] Loading executor b-0:[4 4]
2020-01-09 14:42:37.512 o.s.c.n.e.InstanceInfoFactory Thread-10 [INFO] Setting initial instance status as: STARTING
2020-01-09 14:42:37.637 o.s.s.c.ThreadPoolTaskScheduler [Ljava.lang.String;@174cb0d8.container-0-C-1 [INFO] Shutting down ExecutorService
2020-01-09 14:42:37.851 o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer [Ljava.lang.String;@174cb0d8.container-0-C-1 [INFO] Consumer stopped
2020-01-09 14:42:37.855 o.s.i.k.i.KafkaMessageDrivenChannelAdapter Thread-10 [INFO] stopped org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter@2459333a
2020-01-09 14:42:37.870 o.s.s.c.ThreadPoolTaskScheduler [Ljava.lang.String;@6e355249.container-0-C-1 [INFO] Shutting down ExecutorService
2020-01-09 14:42:38.054 o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer [Ljava.lang.String;@6e355249.container-0-C-1 [INFO] Consumer stopped
After this, the log again starts with 'Launching worker for trident-Topology-578320979 ...' and the loading of all the executors and tasks repeats.
Can anyone please explain what "Worker Process 5e19382e-c3e5-4c8d-8706-185e00e658a8 exited with code: 137" means?
The following link, https://issues.apache.org/jira/browse/STORM-2176, explains the configuration property supervisor.worker.shutdown.sleep.secs, which is set to 1 second by default. It controls how long the supervisor will wait for a worker to exit gracefully before forcibly killing it with kill -9. When this happens, the supervisor logs that the worker terminated with exit code 137 (128 + 9, i.e. SIGKILL).
Would increasing the value of supervisor.worker.shutdown.sleep.secs help?
Or could it be because the JVM doesn't have enough memory? But then it should throw Exception in thread "main" java.lang.OutOfMemoryError: Java heap space, and no such exception is visible in any of the logs.
Is it recommended to try increasing the JVM memory using the worker.childopts configuration setting in storm.yaml?
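For reference, both settings live in storm.yaml on the supervisor nodes. A minimal sketch, with illustrative values that are assumptions rather than recommendations:

# Seconds the supervisor waits for a graceful worker exit before kill -9
# (1 second by default, per STORM-2176).
supervisor.worker.shutdown.sleep.secs: 5

# JVM options passed to each worker process; the 2 GB heap is only an example.
worker.childopts: "-Xmx2g"

Note that raising supervisor.worker.shutdown.sleep.secs only changes how the worker is killed, not why. The "Assignment Changed ... to null" WARN line in the supervisor log suggests the slot's assignment was removed and the supervisor was told to tear the worker down, so exit code 137 looks like the effect of that kill rather than its cause.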
Any help would be greatly appreciated.
P.S. I have been trying to find a solution for a few days, with no success.

Related

celery==5.2.7, rabbitmq:3.11.4: Celery stops unexpectedly if apply_async is called with eta set

celery==5.2.7
rabbitmq:3.11.4
Python 3.9.16
I am upgrading the third-party (3PP) software for my project from Celery 4.x.x to Celery 5.2.7.
Celery stops unexpectedly if apply_async is called with eta set:
@app.task(bind=True)
def healthcheck_exec(self, **parm):
    ......

healthcheck_exec.apply_async(None, parm, eta=next_planed_exec_datetime)
I turned on the debug log; there is no error detail in the log, only the following:
[2023-02-03 05:45:07,941: DEBUG/MainProcess] Canceling task consumer...
[2023-02-03 05:45:09,545: DEBUG/MainProcess] Canceling task consumer...
[2023-02-03 05:45:09,545: DEBUG/MainProcess] Closing consumer channel...
[2023-02-03 05:45:09,550: DEBUG/MainProcess] removing tasks from inqueue until task handler finished
I suspect there is a compatibility issue between RabbitMQ and Celery. Not sure; has anybody else met this issue?

Automation test report issues with JENKINS/MAVEN/ECLIPSE

I am facing issues generating reports after a successful Maven build in Jenkins.
First, I created the Java code on my local machine using Eclipse.
Second, after creating the code I converted it to a Maven project and completed all the required pom.xml setup inside Jenkins.
Third, I ran the script and it gave a success response.
Below are my Maven build success logs.
Started by user jenkin
Building in workspace /var/lib/jenkins/workspace/TestProject
Parsing POMs
Modules changed, recalculating dependency graph
Established TCP socket on 35275
[TestProject01] $ java -cp /var/lib/jenkins/plugins/maven-plugin/WEB-INF/lib/maven35-agent-1.12.jar:/var/lib/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven1/boot/plexus-classworlds-2.5.2.jar:/var/lib/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven1/conf/logging jenkins.maven3.agent.Maven35Main /var/lib/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven1 /var/cache/jenkins/war/WEB-INF/lib/remoting-3.27.jar /var/lib/jenkins/plugins/maven-plugin/WEB-INF/lib/maven35-interceptor-1.12.jar /var/lib/jenkins/plugins/maven-plugin/WEB-INF/lib/maven3-interceptor-commons-1.12.jar 35275
<===[JENKINS REMOTING CAPACITY]===>channel started
Executing Maven: -B -f /home/user/Documents/eclipse-workspace/TestProject01/pom.xml clean install
[INFO] Scanning for projects...
[HUDSON] Collecting dependencies info
[INFO]
[INFO] ------------------< TestProject01:TestProject01 >-------------------
[INFO] Building TestProject01 0.0.1-SNAPSHOT
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ TestProject01 ---
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ TestProject01 ---
[INFO] Installing /home/user/Documents/eclipse-workspace/TestProject01/pom.xml to /var/lib/jenkins/.m2/repository/TestProject01/TestProject01/0.0.1-SNAPSHOT/TestProject01-0.0.1-SNAPSHOT.pom
[WARNING] Attempt to (de-)serialize anonymous class org.jfrog.hudson.maven2.MavenDependenciesRecorder$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
[INFO]
[INFO] BUILD SUCCESS
[INFO] Total time: 2.821 s
[INFO] Finished at: 2019-02-05T19:09:16+05:30
[INFO] ------------------------------------------------------------------------
Waiting for Jenkins to finish collecting data
[JENKINS] Archiving /home/user/Documents/eclipse-workspace/TestProject01/pom.xml to TestProject01/TestProject01/0.0.1-SNAPSHOT/TestProject01-0.0.1-SNAPSHOT.pom
/home/user/Documents/eclipse-workspace/TestProject01/pom.xml is not inside /var/lib/jenkins/workspace/TestProject/home/user/Documents/eclipse-workspace/TestProject01/; will archive in a separate pass
channel stopped
[htmlpublisher] Archiving HTML reports...
[htmlpublisher] Archiving at PROJECT level /home/user/Documents/eclipse-workspace/TestProject01/test-output to /var/lib/jenkins/jobs/TestProject/htmlreports/HTML_20Report
Finished: SUCCESS
Now, the issue is that I do not know how to check whether all test cases passed. An HTML report is generated, but it stays the same after every successful build, which creates a lot of confusion.
Please help!

How to save each partition of a Dataframe/Dataset in parallel with partitionBy or InsertInto Hive

I currently use Spark 2.0.1 and I am trying to save my dataset into a partitioned Hive table with insertInto(), or to S3 storage with partitionBy("col"), with jobs running concurrently (in parallel). But with these two methods each partition of my dataset is saved sequentially, one by one. It is very, very SLOW.
I already know that I must use either insertInto() or partitionBy(), one at a time.
I assume that in Spark 2.0.1 a DataFrame is backed by a Resilient Distributed Dataset (RDD).
My current code:
df.write.mode(SaveMode.Append).partitionBy("col").save("s3://bucket/diroutput")
Or
df.write.mode(SaveMode.Append).insertInto("TableHivealreadypartitioned")
So I tried some things with df.foreachPartition, like this:
df.foreachPartition{datasetpartition => datasetpartition.foreach(row => row.sometransformation)}
You will find below an extract of the logs. In the first example it is insertInto("tablehivealreadypartitioned") into Hive; we can see that all the Spark "partitions" are written one by one. In the second example it is partitionBy().save(), which writes directly to S3; we can see again that all the Spark "partitions" are written one by one.
The dataframe we handle only has one "partition" and its size is about 200MB uncompressed (in memory).
The job can take 120-170 seconds to save the data with the option local[4].
[INFO] 2016-11-03 00:10:33,255 org.apache.spark.SparkContext logInfo - Created broadcast 2330 from broadcast at TorExitLookup.scala:43
[INFO] 2016-11-03 00:10:35,302 org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer logInfo - Sorting complete. Writing out partition files one at a time.
[INFO] 2016-11-03 00:10:35,363 com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream close - close closed:false s3://BUCKETS3/db/.hive-staging_hive_2016-11-03_00-10-29_426_1749488585639143697-1/-ext-10000/tsbucket=2016-11-02 09%3A00%3A00/part-00001
[INFO] 2016-11-03 00:10:35,380 org.apache.spark.mapred.SparkHadoopMapRedUtil logInfo - No need to commit output of task because needsTaskCommit=false: attempt_201611030010_0948_m_000001_0
[INFO] 2016-11-03 00:10:35,380 org.apache.spark.executor.Executor logInfo - Finished task 1.0 in stage 948.0 (TID 1385). 2652 bytes result sent to driver
[INFO] 2016-11-03 00:10:35,381 org.apache.spark.scheduler.TaskSetManager logInfo - Finished task 1.0 in stage 948.0 (TID 1385) in 5718 ms on localhost (1/2)
[INFO] 2016-11-03 00:11:23,033 org.apache.spark.storage.BlockManagerInfo logInfo - Removed broadcast_2330_piece0 on 10.0.193.149:34016 in memory (size: 6.9 KB, free: 414.4 MB)
[INFO] 2016-11-03 00:11:58,194 org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer logInfo - Sorting complete. Writing out partition files one at a time.
[INFO] 2016-11-03 00:12:00,210 org.apache.spark.storage.BlockManagerInfo logInfo - Removed broadcast_2329_piece0 on 10.0.193.149:34016 in memory (size: 6.9 KB, free: 414.4 MB)
[INFO] 2016-11-03 00:12:05,295 com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream close - close closed:false s3://BUCKETS3/db/.hive-staging_hive_2016-11-03_00-10-29_426_1749488585639143697-1/-ext-10000/tsbucket=2016-11-02 09%3A00%3A00/part-00000
[INFO] 2016-11-03 00:12:05,831 org.apache.spark.mapred.SparkHadoopMapRedUtil logInfo - No need to commit output of task because needsTaskCommit=false: attempt_201611030010_0948_m_000000_0
[INFO] 2016-11-03 00:12:05,835 org.apache.spark.executor.Executor logInfo - Finished task 0.0 in stage 948.0 (TID 1384). 2652 bytes result sent to driver
[INFO] 2016-11-03 00:12:05,835 org.apache.spark.scheduler.TaskSetManager logInfo - Finished task 0.0 in stage 948.0 (TID 1384) in 96173 ms on localhost (2/2)
[INFO] 2016-11-03 00:12:05,835 org.apache.spark.scheduler.DAGScheduler logInfo - ResultStage 948 (insertInto at ImportHive.scala:24) finished in 96,173 s
[INFO] 2016-11-03 00:12:05,835 org.apache.spark.scheduler.TaskSchedulerImpl logInfo - Removed TaskSet 948.0, whose tasks have all completed, from pool
[INFO] 2016-11-03 00:12:05,836 org.apache.spark.scheduler.DAGScheduler logInfo - Job 948 finished: insertInto at ImportHive.scala:24, took 96,188035 s
[INFO] 2016-11-03 00:12:17,171 org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer logInfo - Sorting complete. Writing out partition files one at a time.
[INFO] 2016-11-03 00:12:17,296 com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream close - close closed:false s3://BUCKETS3/rescue/tsbucket=2016-11-02 09%3A00%3A00/part-r-00001-f433a41e-1b59-49af-b232-cf701e0c6df9.zlib.orc
[INFO] 2016-11-03 00:12:17,388 org.apache.spark.mapred.SparkHadoopMapRedUtil logInfo - No need to commit output of task because needsTaskCommit=false: attempt_201611030012_0949_m_000001_0
[INFO] 2016-11-03 00:12:17,388 org.apache.spark.executor.Executor logInfo - Finished task 1.0 in stage 949.0 (TID 1387). 2652 bytes result sent to driver
[INFO] 2016-11-03 00:12:17,389 org.apache.spark.scheduler.TaskSetManager logInfo - Finished task 1.0 in stage 949.0 (TID 1387) in 6892 ms on localhost (1/2)
[INFO] 2016-11-03 00:12:57,467 org.apache.spark.storage.BlockManagerInfo logInfo - Removed broadcast_2333_piece0 on 10.0.193.149:34016 in memory (size: 6.9 KB, free: 414.4 MB)
[INFO] 2016-11-03 00:13:36,195 org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer logInfo - Sorting complete. Writing out partition files one at a time.
[INFO] 2016-11-03 00:13:43,689 com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream close - close closed:false s3://BUCKETS3/rescue/tsbucket=2016-11-02 09%3A00%3A00/part-r-00000-f433a41e-1b59-49af-b232-cf701e0c6df9.zlib.orc
[INFO] 2016-11-03 00:13:44,258 org.apache.spark.mapred.SparkHadoopMapRedUtil logInfo - No need to commit output of task because needsTaskCommit=false: attempt_201611030012_0949_m_000000_0
[INFO] 2016-11-03 00:13:44,259 org.apache.spark.executor.Executor logInfo - Finished task 0.0 in stage 949.0 (TID 1386). 2652 bytes result sent to driver
[INFO] 2016-11-03 00:13:44,259 org.apache.spark.scheduler.TaskSetManager logInfo - Finished task 0.0 in stage 949.0 (TID 1386) in 93762 ms on localhost (2/2)
[INFO] 2016-11-03 00:13:44,259 org.apache.spark.scheduler.DAGScheduler logInfo - ResultStage 949 (save at ImportHive.scala:30) finished in 93,762 s
[INFO] 2016-11-03 00:13:44,259 org.apache.spark.scheduler.TaskSchedulerImpl logInfo - Removed TaskSet 949.0, whose tasks have all completed, from pool
[INFO] 2016-11-03 00:13:44,259 org.apache.spark.scheduler.DAGScheduler logInfo - Job 949 finished: save at ImportHive.scala:30, took 93,772483 s
[INFO] 2016-11-03 00:13:44,260 org.apache.hadoop.mapreduce.lib.output.DirectFileOutputCommitter cleanupJob - Nothing to clean up since no temporary files were written.
[INFO] 2016-11-03 00:13:44,260 com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream close - close closed:false s3://BUCKETS3/rescue/_SUCCESS
[INFO] 2016-11-03 00:13:44,275 org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer logInfo - Job job_201611030012_0000 committed.
Unfortunately I still have not found a way to write/save each Spark partition of my dataset in parallel.
Has someone already done this?
Can you tell me how to proceed?
Is this the wrong direction?
Thanks for your help.
The dataframe we handle only has one "partition" and the size of it is about 200MB uncompressed (in memory)
This is your problem: Spark distributes work between the executors based on partitions.
In order to write in parallel you need your df to have multiple partitions.
You can do this by using:
df.repartition(number)
Also make sure you are setting:
hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
when writing to S3.
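Putting the two suggestions together, here is a minimal Scala sketch, assuming the df and SparkSession (spark) from the question; the partition count of 8 is an illustrative assumption, and the output path is taken from the question:

import org.apache.spark.sql.SaveMode

// Repartition so that several tasks can write concurrently;
// 8 is an illustrative value, tune it to your data and cluster.
val repartitioned = df.repartition(8)

// Use the v2 output committer, which commits task output directly and
// avoids the slow sequential rename phase that is especially costly on S3.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.algorithm.version", "2")

repartitioned.write
  .mode(SaveMode.Append)
  .partitionBy("col")
  .save("s3://bucket/diroutput")

With a single input partition there is only one write task, so only one output file can be in flight at a time no matter which save method or committer you use.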

Cloudbees Deploy failure due to NullPointerException in RunTargetImpl.getClickStackConfigMap

I've just created a new app in CloudBees with the ClickStart "Jetty 9 Embedded App", and when Jenkins builds the generated project (without any change) I get this error.
Can you help me find the origin of the problem?
Thanks.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 34.919s
[INFO] Finished at: Fri Sep 27 06:49:44 EDT 2013
[INFO] Final Memory: 16M/40M
[INFO] ------------------------------------------------------------------------
[JENKINS] Archiving /scratch/jenkins/workspace/code-elevator-2/pom.xml to /home/flagadajones/hudson_home/jobs/code-elevator-2/modules/localdomain.localhost$jetty9-embedded-clickstart/builds/2013-09-27_10-48-58/archive/localdomain.localhost/jetty9-embedded-clickstart/0.1-SNAPSHOT/jetty9-embedded-clickstart-0.1-SNAPSHOT.pom
[JENKINS] Archiving /scratch/jenkins/workspace/code-elevator-2/target/jetty9-embedded-clickstart-0.1-SNAPSHOT.jar to /home/flagadajones/hudson_home/jobs/code-elevator-2/modules/localdomain.localhost$jetty9-embedded-clickstart/builds/2013-09-27_10-48-58/archive/localdomain.localhost/jetty9-embedded-clickstart/0.1-SNAPSHOT/jetty9-embedded-clickstart-0.1-SNAPSHOT.jar
[JENKINS] Archiving /scratch/jenkins/workspace/code-elevator-2/target/jetty9-embedded-clickstart-0.1-SNAPSHOT-jar-with-dependencies.jar to /home/flagadajones/hudson_home/jobs/code-elevator-2/modules/localdomain.localhost$jetty9-embedded-clickstart/builds/2013-09-27_10-48-58/archive/localdomain.localhost/jetty9-embedded-clickstart/0.1-SNAPSHOT/jetty9-embedded-clickstart-0.1-SNAPSHOT-jar-with-dependencies.jar
Waiting for Jenkins to finish collecting data
channel stopped
[cloudbees-deployer] Deploying as (jenkins) to the flagadajones account
[cloudbees-deployer] Deploying code-elevator-2
[cloudbees-deployer] Resolved from archived artifacts as /home/flagadajones/hudson_home/jobs/code-elevator-2/modules/localdomain.localhost$jetty9-embedded-clickstart/builds/2013-09-27_10-48-58/archive/localdomain.localhost/jetty9-embedded-clickstart/0.1-SNAPSHOT/jetty9-embedded-clickstart-0.1-SNAPSHOT-jar-with-dependencies.jar
com.cloudbees.plugins.deployer.exceptions.DeployException
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.newDeployActor(RunEngineImpl.java:150)
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.newDeployActor(RunEngineImpl.java:52)
at com.cloudbees.plugins.deployer.engines.Engine.process(Engine.java:173)
at com.cloudbees.plugins.deployer.engines.Engine.perform(Engine.java:112)
at com.cloudbees.plugins.deployer.DeployPublisher.perform(DeployPublisher.java:103)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:812)
at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:784)
at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.post2(MavenModuleSetBuild.java:957)
at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:734)
at hudson.model.Run.execute(Run.java:1600)
at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:485)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: java.lang.NullPointerException
at com.cloudbees.plugins.deployer.impl.run.RunTargetImpl.getClickStackConfigMap(RunTargetImpl.java:256)
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl$DeployFileCallable.<init>(RunEngineImpl.java:297)
at com.cloudbees.plugins.deployer.impl.run.RunEngineImpl.newDeployActor(RunEngineImpl.java:141)
... 13 more
Build step 'Deploy applications' marked build as failure
Finished: FAILURE
The ClickStart generates an invalid configuration for the deployer plugin. Remove the target from the deploy publisher and add it back again. The issue is that the Maven artifact selector is not selected and the ClickStart has fed incorrect configuration to the target; this means you can't simply select the Maven artifact selector, you need to remove the target and add it back again.
Nuts, I know; I shall be chastising the ClickStart maintainers on Monday, rest assured ;-)

Tomcat 7 getting CreateJavaVM Failed Error

I have an Apache Tomcat 7.0.5 server on a Windows Server R2 machine, with a web app (Java JSF) deployed on it. Everything was working fine, but it suddenly stopped and won't run again. When I try to run it, I get "Error 1067: The process terminated unexpectedly", and in the logs I find these files:
tomcat7-stdout.2012-08-08.log
2012-08-08 18:00:06 Commons Daemon procrun stdout initialized
commons-daemon.2012-08-08.log
[2012-08-08 19:02:30] [info] Commons Daemon procrun finished
[2012-08-08 19:13:02] [info] Commons Daemon procrun (1.0.4.0 64-bit) started
[2012-08-08 19:13:02] [info] Running 'Tomcat7' Service...
[2012-08-08 19:13:02] [info] Starting service...
[2012-08-08 19:13:02] [error] CreateJavaVM Failed
[2012-08-08 19:13:03] [info] Service started in 1000 ms.
[2012-08-08 19:13:03] [info] Run service finished.
[2012-08-08 19:13:03] [info] Commons Daemon procrun finished
tomcat7-stderr.2012-08-08.log
2012-08-08 18:00:06 Commons Daemon procrun stderr initialized
Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future
Unrecognized VM option '+HeapDumpOnOutOfMemoryError '
Since I was having some 'perm gen' memory errors, I added some options to my Apache Tomcat properties, following this link: how to handle Perm Gen.
So my Java options look like this:
-Dcatalina.home=C:\Program Files\Apache Software Foundation\Tomcat 7.0
-Dcatalina.base=C:\Program Files\Apache Software Foundation\Tomcat 7.0
-Djava.endorsed.dirs=C:\Program Files\Apache Software Foundation\Tomcat 7.0\endorsed
-Djava.io.tmpdir=C:\Program Files\Apache Software Foundation\Tomcat 7.0\temp
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=C:\Program Files\Apache Software Foundation\Tomcat 7.0\conf\logging.properties
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled
-XX:PermSize=256m
-XX:MaxPermSize=256m
-XX:+HeapDumpOnOutOfMemoryError
Any ideas as to why the server won't start anymore? THANKS!
~Myy
After spending hours trying to figure this out, I finally found it. I had an extra space at the end of a VM option:
'+HeapDumpOnOutOfMemoryError '
which was giving me the "Unrecognized VM option" error.
Thanks for reading; hopefully when you get this error you will take extra care to verify the syntax.
To manage the memory you need to set the Xms and Xmx parameters:
Xms sets the initial heap size
Xmx sets the maximum heap size
Thus try adding:
-Xms256m -Xmx256m
Regards
The error is on the option "HeapDumpOnOutOfMemoryError", which is not even listed in your post. Can you post a screenshot of the "Java" tab from the Tomcat service screen (similar to the link you provided)? This will tell us the actual options you are using and also the memory you have set.
Copy the file msvcr71.dll from the bin directory of your Java installation to the bin directory of the Tomcat installation.