I am using a Cloudera quickstart VM 5.13.0.0 to run Spark applications in yarn-client mode. I have allocated 10 GB and 3 cores to my Cloudera VM. When I submit the application, it is ACCEPTED but never moves on to RUNNING. When I try to look for logs using yarn logs -applicationId, I do not see anything; the output is absolutely blank.
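For reference, this is roughly the command I use to fetch the logs (the application ID here is the one from the RM UI), and it comes back empty:
yarn logs -applicationId application_1577297544619_0002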
I have looked up this issue in several similar questions: here, here, here, here, here, here, and here.
I have tweaked practically every config that those links point to as a possible culprit, but I still do not have an answer to my problem, which on the face of it looks like the ones in the links above. Here are the config parameters of my Cloudera cluster:
mapreduce.map.memory.mb 128M
mapreduce.reduce.memory.mb 128M
mapreduce.job.heap.memory-mb.ratio 0.8
yarn.nodemanager.resource.memory-mb 1900M
yarn.nodemanager.resource.percentage-physical-cpu-limit 100
yarn.nodemanager.resource.cpu-vcores 1
yarn.scheduler.minimum-allocation-mb 1M
yarn.scheduler.increment-allocation-mb 100M
yarn.scheduler.maximum-allocation-mb 1600M
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.increment-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 2
yarn.scheduler.fair.continuous-scheduling-enabled unchecked
mapreduce.am.max-attempts 1
yarn.resourcemanager.am.max-retries, yarn.resourcemanager.am.max-attempts 1
yarn.app.mapreduce.am.resource.mb 1G
yarn.app.mapreduce.am.resource.cpu-vcores 1
ApplicationMaster Java Maximum Heap Size 512M
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.user-as-default-queue unchecked
yarn.scheduler.fair.preemption unchecked
yarn.scheduler.fair.preemption.cluster-utilization-threshold 0.8
yarn.scheduler.fair.sizebasedweight unchecked
Fair Scheduler Allocations (deployed) {"defaultFairSharePreemptionThreshold":null,"defaultFairSharePreemptionTimeout":null,"defaultMinSharePreemptionTimeout":null,"defaultQueueSchedulingPolicy":"drf","queueMaxAMShareDefault":-1.0,"queueMaxAppsDefault":null,"queuePlacementRules":[{"create":true,"name":"specified","queue":null,"rules":null},{"create":null,"name":"nestedUserQueue","queue":null,"rules":[{"create":true,"name":"default","queue":"users","rules":null}]},{"create":null,"name":"default","queue":null,"rules":null}],"queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"root","queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"default","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null},{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"users","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":"parent"}],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":null,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null}],"userMaxAppsDefault":1,"users":[]}
Here is what the queue description looks like when the application is still in ACCEPTED state:
Likewise, here is the record from the YARN RM UI (note that resources, both memory and CPU, are allocated, and Running Containers shows 1 container running):
Here is the Application Summary:
Here are the application logs (empty):
And, lastly, here is what the driver sees:
19/12/26 00:16:42 INFO Client:
client token: N/A
diagnostics: Application application_1577297544619_0002 failed 1 times due to AM Container for appattempt_1577297544619_0002_000001 exited with exitCode: 10
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1577297544619_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1577297544619_0002_01_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1577299469533
final status: FAILED
tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1577297544619_0002
user: shepanch
19/12/26 00:16:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at cloudera.jobs.ClouderaSampleJob$.delayedEndpoint$cloudera$jobs$ClouderaSampleJob$1(ClouderaSampleJob.scala:17)
at cloudera.jobs.ClouderaSampleJob$delayedInit$body.apply(ClouderaSampleJob.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at cloudera.jobs.ClouderaSampleJob$.main(ClouderaSampleJob.scala:6)
at cloudera.jobs.ClouderaSampleJob.main(ClouderaSampleJob.scala)
Is there anything that can be done to solve this issue?
After all the research, and apart from the reasons mentioned in the links in the question, I found that this can happen for a couple of other reasons:
When the client (driver) and the cluster have different versions of Spark. Once you ensure that both bundle the same Spark version, it runs fine.
You might need to set the property spark.driver.host. Make sure the IP passed there can be pinged from the guest VM; a sketch follows below.
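A minimal sketch of both checks, assuming yarn-client mode as in the question; the IP address and jar name are placeholders, and only the class name cloudera.jobs.ClouderaSampleJob comes from the stack trace above:
# compare the Spark version bundled with the driver and the one on the cluster;
# run the same command on the client machine and inside the Cloudera VM, the two should match
spark-submit --version

# pass a driver host that can be pinged from the guest VM
# (192.168.56.1 and cloudera-sample-job.jar are placeholders)
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.driver.host=192.168.56.1 \
  --class cloudera.jobs.ClouderaSampleJob \
  cloudera-sample-job.jar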
After a VM shutdown, GraphDB Workbench would not start.
I have installed GraphDB on a cloud-hosted VM. The machine was shut down without stopping GraphDB. When trying to start it again, the Workbench would not start, and the following message was displayed in the error log:
[ERROR] 2019-06-19 12:12:00,299 [Thread-10 | c.o.t.s.i.PluginManager] Problem shutting down literals-index
java.lang.RuntimeException: com.ontotext.trree.transactions.TransactionException: Failed to created journal file: /home/peio/graphdb-se-8.10.1/data/repositories/bgnews/storage/literals-index/numerics.index.precommit
As Damyan suggested, delete literals-index in the storage folder and it will be rebuilt on start-up.
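A minimal sketch of that fix, assuming the repository path from the error above and that GraphDB is started with the bin/graphdb script shipped in the distribution:
# stop GraphDB first if it is still running, then remove the corrupted index
rm -rf /home/peio/graphdb-se-8.10.1/data/repositories/bgnews/storage/literals-index
# start GraphDB again (as a daemon); the literals index is rebuilt on start-up
/home/peio/graphdb-se-8.10.1/bin/graphdb -d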
I am running Spoon (Pentaho EE 6.1) on my laptop (8 GB RAM) and have allocated 4 GB to Spoon. Still, it takes 3 minutes and 30 seconds to start. I don't have any plugins and my plugins directory is empty. I have also tried closing all other applications and processes, but with no luck. Am I missing anything obvious?
Sep 22, 2017 10:15:13 AM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-javascript (208) [org.apache.cxf.javascript.JavascriptServerListener]
Sep 22, 2017 10:15:13 AM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess
INFO: New Caching Service registered
2017/09/22 10:15:27 - General - Logging plugin type found with ID: CheckpointLog Table
2017/09/22 10:18:00,201 ERROR [KarafLifecycleListener] Error in Blueprint Watcher org.pentaho.osgi.api.IKarafBlueprintWatcher$BlueprintWatcherException: Unknown error in KarafBlueprintWatcher at org.pentaho.osgi.impl.KarafBlueprintWatcherImpl.waitForBlueprint(KarafBlueprintWatcherImpl.java:103) at org.pentaho.di.osgi.KarafLifecycleListener$2.run(KarafLifecycleListener.java:161) at java.lang.Thread.run(Unknown Source)
Caused by: org.pentaho.osgi.api.IKarafBlueprintWatcher$BlueprintWatcherException : Timed out waiting for blueprints to load: pdi-dataservice-server-plugin,pentaho-big-data-impl-shim-initializer,pentaho-big-data-impl-shim-hdfs,pentaho-big-data-impl-shim-hbase,pentaho-big-data-impl-shim-mapreduce,pentaho-big-data-impl-shim-pig,pentaho-big-data-impl-shim-oozie,pentaho-big-data-impl-shim-sqoop,pentaho-big-data-impl-vfs-hdfs,pentaho-big-data-kettle-plugins-common-named-cluster-bridge,pentaho-big-data-kettle-plugins-guiTestActionHandlers,pentaho-big-data-kettle-plugins-hdfs,pentaho-big-data-kettle-plugins-hbase,pentaho-big-data-kettle-plugins-mapreduce,pentaho-big-data-kettle-plugins-pig,pentaho-big-data-kettle-plugins-oozie,pentaho-big-data-kettle-plugins-sqoop,pentaho-hadoop-shims-mapr-osgi-jaas,pentaho-big-data-impl-clusterTests,pentaho-big-data-impl-shim-shimTests,pentaho-yarn-api,pentaho-yarn-impl-shim,pentaho-yarn-kettle-plugin,pentaho-metaverse-core,pentaho-metaverse-web,pentaho-requirejs-osgi-manager,pentaho-angular-bundle,common-ui-6.1.0.1,pentaho-marketplace-di at org.pentaho.osgi.impl.KarafBlueprintWatcherImpl.waitForBlueprint(KarafBlueprintWatcherImpl.java:88)
... 2 more
2017/09/22 10:18:42 - General - Starting agile-bi
2017/09/22 10:18:43 - class org.pentaho.agilebi.platform.JettyServer - WebServer.Log.CreateListener localhost:10001
Finally found the answer. It's a known issue, fixed in Version 7:
There seems to be a timing issue in the Karaf Blueprint Watcher where sometimes it will ask a bundle for its blueprint file before the bundle is in RESOLVED state. This triggers a (usually) parallel resolve that causes the bundle to be destroyed immediately after creation.
http://jira.pentaho.com/browse/PDI-15488
http://jira.pentaho.com/browse/PDI-14698
http://jira.pentaho.com/browse/PDI-15574
I connected to Hive, and when I try to show all databases using the command below, I get the following error:
techgene@slaveone:~/apps/hive-0.12.0$ hive
Logging initialized using configuration in jar:file:/home/techgene/apps/hive-0.12.0/lib/hive-common-0.12.0.jar!/hive-log4j.properties
hive> show databases;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Can you please provide a solution for this?
This problem usually occurs when a Hive CLI session is ended improperly. In that case, kill the improperly closed Hive CLI session as shown below, then launch the Hive CLI afresh.
ramisetty@aspire:~$ jps
3710 SecondaryNameNode
4103 RunJar -------------------------> hive CLI instance.
4019 TaskTracker
3467 DataNode
3242 NameNode
4366 Jps
3788 JobTracker
ramisetty@aspire:~$ kill -9 4103
ramisetty@aspire:~$
If the problem still persists, follow the solutions available under FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient.
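The same clean-up can be compressed into a couple of lines (a sketch of the steps above; RunJar is the name under which the Hive CLI appears in jps):
# kill any stale Hive CLI (RunJar) process, then start a fresh CLI
jps | grep RunJar | awk '{print $1}' | xargs -r kill -9
hive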
When I try to start the node agent of my GlassFish app server via PuTTY, I get the following warning:
Apr 25, 2014 5:03:03 AM com.sun.enterprise.admin.server.core.channel.RMIClient warn
WARNING: channel.client_init_error
Apr 25, 2014 5:03:03 AM com.sun.enterprise.admin.server.core.channel.RMIClient warn
WARNING: channel.client_init_error
and finally "CLI137 Command start-node-agent failed." a timeout.
The log file details are
2014-04-25T05:03:04.388-0500|WARNING|sun-appserver2.1|javax.enterprise.system.tools.admin|_ThreadID=10;_ThreadName=main;|ADM5801:Admin server channel creation failed.|#]
[#|2014-04-25T05:03:04.396-0500|SEVERE|sun-appserver2.1|javax.ee.enterprise.system.nodeagent|_ThreadID=10;_ThreadName=main;|NAGT0014:Unexpected Node Agent exception.
com.sun.appserv.server.ServerLifecycleException: java.lang.RuntimeException: Unable to save stub to /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config/admch
at com.sun.enterprise.admin.server.core.channel.AdminChannel.createRMIChannel(AdminChannel.java:111)
at com.sun.enterprise.ee.nodeagent.NodeAgentMain.startup(NodeAgentMain.java:204)
at com.sun.enterprise.ee.nodeagent.NodeAgentMain.main(NodeAgentMain.java:396)
Caused by: java.lang.RuntimeException: Unable to save stub to /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config/admch
at com.sun.enterprise.admin.server.core.channel.AdminChannel.saveStubToFile(AdminChannel.java:354)
at com.sun.enterprise.admin.server.core.channel.AdminChannel.createRMIChannel(AdminChannel.java:107)
... 2 more
Caused by: java.io.FileNotFoundException: /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config/admch (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
at com.sun.enterprise.admin.server.core.channel.AdminChannel.saveStubToFile(AdminChannel.java:348)
... 3 more
I am unable to figure out what exactly the issue is. I have got my permissions right. Please provide some input on this issue.
There IS obviously a permission problem. Make sure that the user your GlassFish runs as has permission to create a file in this path. Try to touch a file in this path from a shell.
Check that every directory in the path /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config/admch has the execute right set for the GF user or its group (chmod +x).
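A quick way to verify both points, sketched here assuming the server runs as a user named glassfish (substitute your actual service account):
# try to create a file in the target directory as the GlassFish user
sudo -u glassfish touch /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config/write-test
# inspect the permissions along the path and add execute rights where they are missing
ls -ld /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config
chmod +x /opt/vendor/sunone/SDK/nodeagents/ACSNA-TEST/agent/config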