yarn application accepted but not running cloudera despite resource allocation - hadoop-yarn

I am using a Cloudera quickstart VM 5.13.0.0 to run Spark applications in yarn-client mode. I have allocated 10GB and 3 cores to my Cloudera VM. When I submit the application, the application is ACCEPTED but never moves on to RUNNING. When I try to look for logs using yarn logs -applicationId I do not see anything. Its absolutely blank.
I have looked up this issue on:
here
here
here
here
here
here
here
I have practically meddled with all the configs that these links see a problem with. I still do not have an answer to my problem which on the face of it looks like the ones in the links above. Here are the config parameters of my cloudera cluster:
mapreduce.map.memory.mb 128M
mapreduce.reduce.memory.mb 128M
mapreduce.job.heap.memory-mb.ratio 0.8
yarn.nodemanager.resource.memory-mb 1900M
yarn.nodemanager.resource.percentage-physical-cpu-limit 100
yarn.nodemanager.resource.cpu-vcores 1
yarn.scheduler.minimum-allocation-mb 1M
yarn.scheduler.increment-allocation-mb 100M
yarn.scheduler.maximum-allocation-mb 1600M
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.increment-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 2
yarn.scheduler.fair.continuous-scheduling-enabled unchecked
mapreduce.am.max-attempts 1
yarn.resourcemanager.am.max-retries, yarn.resourcemanager.am.max-attempts 1
yarn.app.mapreduce.am.resource.mb 1G
yarn.app.mapreduce.am.resource.cpu-vcores 1
ApplicationMaster Java Maximum Heap Size 512M
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.user-as-default-queue unchecked
yarn.scheduler.fair.preemption unchecked
yarn.scheduler.fair.preemption.cluster-utilization-threshold 0.8
yarn.scheduler.fair.sizebasedweight unchecked
Fair Scheduler Allocations (deployed) {"defaultFairSharePreemptionThreshold":null,"defaultFairSharePreemptionTimeout":null,"defaultMinSharePreemptionTimeout":null,"defaultQueueSchedulingPolicy":"drf","queueMaxAMShareDefault":-1.0,"queueMaxAppsDefault":null,"queuePlacementRules":[{"create":true,"name":"specified","queue":null,"rules":null},{"create":null,"name":"nestedUserQueue","queue":null,"rules":[{"create":true,"name":"default","queue":"users","rules":null}]},{"create":null,"name":"default","queue":null,"rules":null}],"queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"root","queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"default","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null},{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"users","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":"parent"}],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":null,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null}],"userMaxAppsDefault":1,"users":[]}
Here is what the queue description looks like when the application is still in ACCEPTED state:
Likewise, here is the record from the Yarn RM UI (Note that the resources are allocated (memory/cpu) and Running Containers shows 1 container running):
Here is the Application Summary:
Here are the application logs (empty):
And, lastly, here is what the driver sees:
enter code here19/12/26 00:16:42 INFO Client:
client token: N/A
diagnostics: Application application_1577297544619_0002 failed 1 times due to AM Container for appattempt_1577297544619_0002_000001 exited with exitCode: 10
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1577297544619_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1577297544619_0002_01_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1577299469533
final status: FAILED
tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1577297544619_0002
user: shepanch
19/12/26 00:16:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at cloudera.jobs.ClouderaSampleJob$.delayedEndpoint$cloudera$jobs$ClouderaSampleJob$1(ClouderaSampleJob.scala:17)
at cloudera.jobs.ClouderaSampleJob$delayedInit$body.apply(ClouderaSampleJob.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at cloudera.jobs.ClouderaSampleJob$.main(ClouderaSampleJob.scala:6)
at cloudera.jobs.ClouderaSampleJob.main(ClouderaSampleJob.scala)
Is there anything that can be done to solve this issue?

After all the research and apart from the reasons mentioned in the links I have mentioned in the question, I found that this can happen due to various reasons:
when you have different versions of spark in the client (driver) and the cluster. Once you ensure that both bundle the same version of spark, it runs fine.
you might need to mention the property spark.driver.host. Make sure the IP passed in here can be pinged from the guest VM.

Related

Plain vanilla Apache Ignite cluster fails setting state back to ACTIVE

I've got a plain vanilla install of ignite 2.14, with the binaries downloaded from https://ignite.apache.org/download.cgi (exact link https://dlcdn.apache.org/ignite/2.14.0/apache-ignite-2.14.0-bin.zip). I'm on Windows 10, IGNITE_HOME is not set in the PATH (this is optional), and Ignite's using this java runtime:
OpenJDK Runtime Environment 1.8.0_201-2-redhat-b09 Oracle Corporation
OpenJDK 64-Bit Server VM 25.201-b09
I start an ignite node using the default configuration provided in the downloaded zip apache-ignite-2.14.0-bin.zip :
ignite.bat ..\config\default-config.xml
This starts fine. Following the instructions at https://ignite.apache.org/docs/latest/tools/control-script I can check the state and see I've got a single node cluster in state ACTIVE (the default-config.xml must not have native persistence enabled, so the cluster goes to ACTIVE state automatically).
I can then set the state to INACTIVE like so:
control.bat --set-state INACTIVE
This works fine. However if I set the state to active again like so:
control.bat --set-state ACTIVE
I get the error pasted below and the cluster stays in the INACTIVE state. I first came across this error when using Ignite in embedded server mode, but I can still reproduce it with a fresh out-of-the-box ignite install (not using embedded). I'm surprised that a plain vanilla install just calling a couple of basic commands falls over like this. Any idea what's happening?
This is the error:
C:\temp\apache-ignite-2.14.0-bin\bin>control.bat --set-state ACTIVE
Nov 17, 2022 4:27:17 PM
org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection
INFO: Client TCP connection established: /127.0.0.1:11211 Nov
17, 2022 4:27:17 PM
org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection
close INFO: Client TCP connection closed: /127.0.0.1:11211 Nov 17,
2022 4:27:17 PM org.apache.ignite.internal.client.util.GridClientUtils
shutdownNow WARNING: Runnable tasks outlived thread pool executor
service [owner=GridClientConnectionManager,
tasks=[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask#6d7b4f4c]]
Control utility [ver. 2.14.0#20220929-sha1:951e8deb] 2022 Copyright(C)
Apache Software Foundation User: info Time: 2022-11-17T16:27:16.344
null suppressed:
Command [SET-STATE] finished with code: 4 Error stack trace: class
org.apache.ignite.internal.client.GridClientException: null
suppressed:
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.handleClientResponse(GridClientNioTcpConnection.java:632)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.handleResponse(GridClientNioTcpConnection.java:563)
at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onMessage(GridClientConnectionManagerAdapter.java:691)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:116)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3734)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1211)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2508)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2273)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.lang.Thread.run(Thread.java:748)
Control utility has completed execution at: 2022-11-17T16:27:17.642
Execution time: 1298 ms Press any key to continue . . .
It's a known issue, which is, unfortunately, not fixed yet.
As a workaround, you can execute the command with the autoconfirmation flag --yes, as shown below:
control.bat --set-state ACTIVE --yes

Live Migration Failure: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

I have a 2 node OpenStack Mitaka environment consisting of a controller/compute node and a compute node.
I've followed the setup guide to enable instance live migration using LVM block storage. I.e.: There's no shared storage backend, just local LVM block storage.
Using OpenStack Horizon to perform the live migration a success message is displayed, however, the migration is far from successful. This worked pretty much out-of-the-box with our Juno installation. I've exhausted Google and cannot find any other instances of people facing the same problem. I thought it might have been a time synchronisation problem so have set both nodes to UTC. Still the problems persists.
Source machine /var/log/nova/nova-compute.log
2016-08-12 15:56:42.120 2230 ERROR nova.virt.libvirt.driver [req-b71ea7b0-5fa8-4b57-92d2-4edec62135c2 b017d86d1143461a92a267d4b912c104 88c686f09e1b427fb750f5c00716f84e - - -] [instance: 5763b6b6-370c-448c-8e8f-8b71eafaa8f1] Migration operation has aborted
2016-08-12 15:56:42.470 2230 ERROR nova.virt.libvirt.driver [req-b71ea7b0-5fa8-4b57-92d2-4edec62135c2 b017d86d1143461a92a267d4b912c104 88c686f09e1b427fb750f5c00716f84e - - -] [instance: 5763b6b6-370c-448c-8e8f-8b71eafaa8f1] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory
Target node /var/log/libvirt/libvirtd.log
2016-08-12 15:56:41.864+0000: 2170: error : qemuMonitorJSONGetMigrationStatsReply:2443 : internal error: info migration reply was missing return status
2016-08-12 15:56:41.864+0000: 2170: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/instance-0000006a.log': Device or resource busy
There are no other events captured in the source or target nova or libvirt logs.
I should also note that I am trying to use qemu+tcp (libvirt listening enabled, default tcp port, no auth) rather than qemu+ssh in order to keep things simple while testing. In fact, I intend to only use qemu+tcp anyway.
Which version of ubuntu did you deploy?
I had the same error with ubuntu 14.04 and mitaka version.
And I figured out that default kernel (3.13) makes this problem.
I upgraded the kernel from 3.13 to 4.40 and this problem is gone now.
I hope my experience help you solve this problem out.
Thanks

Why, dows 'neo4j console' work, and 'neo4j start' doesn't?

I want to use neo4j. I installed neo4j-community 2.3.2-1 from archlinux AUR and when I ise neo4j consoleeverything works fine. But when I want to start the server in the background with neo4j startthe server won't start with error message:
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server...process [20559]... waiting for server to be ready... Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.
The server did not try to start or 120 seconds, more like 2 seconds. In addition I callot find the log-files anywhere.
Google told me to tryout neo4j start-no-wait
When I do this command:
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server...process [21088]...Started the server in the background, returning...
But nothing is started and the webclient doesn't work like it does when I use neo4j console.
So my basic question is: Why does neo4j consolework and neo4j startdoes not? And how can I start neo4j in the background and stop it again without killing the process?
EDIT:
the console.log says:
2016-01-29 12:51:18.338+0100 INFO Successfully shutdown Neo4j Server
2016-01-29 12:51:18.348+0100 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#77e4c80f' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#77e4c80f' was successfully initialized, but failed to start. Please see attached cause exception.
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase#77e4c80f' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:67)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:234)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:97)
at org.neo4j.server.CommunityBootstrapper.start(CommunityBootstrapper.java:48)
at org.neo4j.server.CommunityBootstrapper.main(CommunityBootstrapper.java:35)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase#77e4c80f' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:462)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:194)
... 3 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/neo4j/data/graph.db/messages.log (Permission denied)
at org.neo4j.kernel.impl.factory.PlatformModule.createLogService(PlatformModule.java:261)
at org.neo4j.kernel.impl.factory.PlatformModule.<init>(PlatformModule.java:140)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.createPlatform(GraphDatabaseFacadeFactory.java:181)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:124)
at org.neo4j.kernel.impl.factory.CommunityFacadeFactory.newFacade(CommunityFacadeFactory.java:43)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:108)
at org.neo4j.server.CommunityNeoServer$1.newGraphDatabase(CommunityNeoServer.java:66)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:95)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: java.io.FileNotFoundException: /var/lib/neo4j/data/graph.db/messages.log (Permission denied)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.neo4j.io.fs.DefaultFileSystemAbstraction.openAsOutputStream(DefaultFileSystemAbstraction.java:61)
at org.neo4j.io.file.Files.createOrOpenAsOuputStream(Files.java:47)
at org.neo4j.logging.RotatingFileOutputStreamSupplier.openOutputFile(RotatingFileOutputStreamSupplier.java:254)
at org.neo4j.logging.RotatingFileOutputStreamSupplier.<init>(RotatingFileOutputStreamSupplier.java:138)
at org.neo4j.logging.RotatingFileOutputStreamSupplier.<init>(RotatingFileOutputStreamSupplier.java:122)
at org.neo4j.kernel.impl.logging.StoreLogService.<init>(StoreLogService.java:164)
at org.neo4j.kernel.impl.logging.StoreLogService.<init>(StoreLogService.java:43)
at org.neo4j.kernel.impl.logging.StoreLogService$Builder.toFile(StoreLogService.java:110)
at org.neo4j.kernel.impl.logging.StoreLogService$Builder.inStoreDirectory(StoreLogService.java:105)
at org.neo4j.kernel.impl.factory.PlatformModule.createLogService(PlatformModule.java:252)
... 13 more

Zookeeper error connection loss exception

I'm running a SeqWare VM on an amazon EC2 instance I'm trying to use the SeqWare query engine to query data from VCF files. When I first launch the instance and follow the instructions to import data, It works fine, and continues to work until I stop the instance. When I restart it. It won't let me import anything, nor create a new workspace. It always returns the error below. I looked at the processes and found that none of the required nodes were running, so I logged into root and went to the etc/init.d directory and start everything again, at which point, when T try to import data, I don't even get an error and I have to stop the process.
[seqware#master target]$ java -classpath seqware-distribution-0.13.6.7-qe-full.jar com.github.seqware.queryengine.system.importers.SOFeatureImporter -i ../../seqware-queryengine/src/test/resources/com/github/seqware/queryengine/system/FeatureImporter/consequences_annotated.vcf ALL.chr3.phase1_release_v3.20101123.snps_indels_svs.genotypes.3_100001-101000.vcf -o keyValueVCF.out -r hg_19 -s c111aea5-5e18-4c62-a8a7-ec82fe151301 -a ad_hoc -w VCFVariantImportWorker
[SeqWare Query Engine] 0 [main] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper exists failed after 3 retries
[SeqWare Query Engine] 1 [main] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher - hconnection Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:82)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:100)
at com.github.seqware.queryengine.impl.HBaseStorage.<init>(HBaseStorage.java:89)
at com.github.seqware.queryengine.factory.SWQEFactory$Storage_Type$3.buildStorage(SWQEFactory.java:109)
at com.github.seqware.queryengine.factory.SWQEFactory.getStorage(SWQEFactory.java:174)
at com.github.seqware.queryengine.factory.SWQEFactory.getQueryInterface(SWQEFactory.java:199)
at com.github.seqware.queryengine.impl.SimpleModelManager.<init>(SimpleModelManager.java:49)
at com.github.seqware.queryengine.impl.HBaseModelManager.<init>(HBaseModelManager.java:36)
at com.github.seqware.queryengine.impl.MRHBaseModelManager.<init>(MRHBaseModelManager.java:32)
at com.github.seqware.queryengine.factory.SWQEFactory.getModelManager(SWQEFactory.java:211)
at com.github.seqware.queryengine.system.importers.FeatureImporter.performImport(FeatureImporter.java:66)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.runMain(SOFeatureImporter.java:141)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.main(SOFeatureImporter.java:60)
[SeqWare Query Engine] 3 [main] FATAL org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:82)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:100)
at com.github.seqware.queryengine.impl.HBaseStorage.<init>(HBaseStorage.java:89)
at com.github.seqware.queryengine.factory.SWQEFactory$Storage_Type$3.buildStorage(SWQEFactory.java:109)
at com.github.seqware.queryengine.factory.SWQEFactory.getStorage(SWQEFactory.java:174)
at com.github.seqware.queryengine.factory.SWQEFactory.getQueryInterface(SWQEFactory.java:199)
at com.github.seqware.queryengine.impl.SimpleModelManager.<init>(SimpleModelManager.java:49)
at com.github.seqware.queryengine.impl.HBaseModelManager.<init>(HBaseModelManager.java:36)
at com.github.seqware.queryengine.impl.MRHBaseModelManager.<init>(MRHBaseModelManager.java:32)
at com.github.seqware.queryengine.factory.SWQEFactory.getModelManager(SWQEFactory.java:211)
at com.github.seqware.queryengine.system.importers.FeatureImporter.performImport(FeatureImporter.java:66)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.runMain(SOFeatureImporter.java:141)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.main(SOFeatureImporter.java:60)
I figured it out.The apache services were installed from the cloudera package. They weren't being restarted when the instance was being restarted and apparently, just running their script from the etc/init.d was the incorrect way to do it. I found the commands to restart them in the cloudera documentation.
I too faced this problem.I was able to solve this problem by providing jute.maxbuffer parameter while starting zookeeper.
For more info you can refer
https://issues.apache.org/jira/browse/SOLR-4793

Glassfish 3.1.1 start-local-instance fails with JAXBException

I am setting up a GlassFish cluster following the guide at http://javadude.wordpress.com/2011/04/25/glassfish-3-1-clustering-tutorial/. I started from fresh installs of GlassFish 3.1.1. I also have the same architecture as in the guide: two nodes with one instance each. The DAS is on node1.
I've tried starting from scratch several times and am able to create the cluster, nodes and instances without issue. I also have the DAS communicating with node2 via SSH. However, each time when I get to the point where I attempt to start instance2 it fails:
$ ./asadmin start-local-instance --node node1 --sync normal instance2
Previous synchronization failed at Feb 23, 2012 2:41:53 PM
Will perform full synchronization.
Removing all cached state for instance instance2.
CLI802 Synchronization failed for directory config, caused by:
javax.xml.bind.JAXBException
- with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.bind.v2.ContextFactory]
Command start-local-instance failed.
I spent the day Googling and searching GlassFish's Jira, but couldn't find a solution to this issue. I'd very much appreciate any ideas you have on how to solve this problem.
My operating system is CentOS 5.7 and my Java version is 1.6.0_20
Unfortunately my instance directory is empty, I'm assuming because it never started. So there is no log file. I set AS_DEBUG=true but it gives no stack trace. The last debug lines before the error are
Removing all cached state for instance instance2.
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/config
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/applications
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/generated
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/lib
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/docroot
Got exception: javax.xml.bind.JAXBException
Acting on a tip from a user in the Glassfish forum, I learned that Java 1.6.0_20 is an older release of Java that is not supported by Glassfish 3.1.1. I worked with a sysadmin to get Java 1.6.0_31 installed on both nodes of the cluster and that did the trick--both instances start up without errors.