Live Migration Failure: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory - migration

I have a 2 node OpenStack Mitaka environment consisting of a controller/compute node and a compute node.
I've followed the setup guide to enable instance live migration using LVM block storage. I.e.: There's no shared storage backend, just local LVM block storage.
Using OpenStack Horizon to perform the live migration a success message is displayed, however, the migration is far from successful. This worked pretty much out-of-the-box with our Juno installation. I've exhausted Google and cannot find any other instances of people facing the same problem. I thought it might have been a time synchronisation problem so have set both nodes to UTC. Still the problems persists.
Source machine /var/log/nova/nova-compute.log
2016-08-12 15:56:42.120 2230 ERROR nova.virt.libvirt.driver [req-b71ea7b0-5fa8-4b57-92d2-4edec62135c2 b017d86d1143461a92a267d4b912c104 88c686f09e1b427fb750f5c00716f84e - - -] [instance: 5763b6b6-370c-448c-8e8f-8b71eafaa8f1] Migration operation has aborted
2016-08-12 15:56:42.470 2230 ERROR nova.virt.libvirt.driver [req-b71ea7b0-5fa8-4b57-92d2-4edec62135c2 b017d86d1143461a92a267d4b912c104 88c686f09e1b427fb750f5c00716f84e - - -] [instance: 5763b6b6-370c-448c-8e8f-8b71eafaa8f1] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory
Target node /var/log/libvirt/libvirtd.log
2016-08-12 15:56:41.864+0000: 2170: error : qemuMonitorJSONGetMigrationStatsReply:2443 : internal error: info migration reply was missing return status
2016-08-12 15:56:41.864+0000: 2170: error : virNetClientProgramDispatchError:177 : Cannot open log file: '/var/log/libvirt/qemu/instance-0000006a.log': Device or resource busy
There are no other events captured in the source or target nova or libvirt logs.
I should also note that I am trying to use qemu+tcp (libvirt listening enabled, default tcp port, no auth) rather than qemu+ssh in order to keep things simple while testing. In fact, I intend to only use qemu+tcp anyway.

Which version of ubuntu did you deploy?
I had the same error with ubuntu 14.04 and mitaka version.
And I figured out that default kernel (3.13) makes this problem.
I upgraded the kernel from 3.13 to 4.40 and this problem is gone now.
I hope my experience help you solve this problem out.
Thanks

Related

yarn application accepted but not running cloudera despite resource allocation

I am using a Cloudera quickstart VM 5.13.0.0 to run Spark applications in yarn-client mode. I have allocated 10GB and 3 cores to my Cloudera VM. When I submit the application, the application is ACCEPTED but never moves on to RUNNING. When I try to look for logs using yarn logs -applicationId I do not see anything. Its absolutely blank.
I have looked up this issue on:
here
here
here
here
here
here
here
I have practically meddled with all the configs that these links see a problem with. I still do not have an answer to my problem which on the face of it looks like the ones in the links above. Here are the config parameters of my cloudera cluster:
mapreduce.map.memory.mb 128M
mapreduce.reduce.memory.mb 128M
mapreduce.job.heap.memory-mb.ratio 0.8
yarn.nodemanager.resource.memory-mb 1900M
yarn.nodemanager.resource.percentage-physical-cpu-limit 100
yarn.nodemanager.resource.cpu-vcores 1
yarn.scheduler.minimum-allocation-mb 1M
yarn.scheduler.increment-allocation-mb 100M
yarn.scheduler.maximum-allocation-mb 1600M
yarn.scheduler.minimum-allocation-vcores 1
yarn.scheduler.increment-allocation-vcores 1
yarn.scheduler.maximum-allocation-vcores 2
yarn.scheduler.fair.continuous-scheduling-enabled unchecked
mapreduce.am.max-attempts 1
yarn.resourcemanager.am.max-retries, yarn.resourcemanager.am.max-attempts 1
yarn.app.mapreduce.am.resource.mb 1G
yarn.app.mapreduce.am.resource.cpu-vcores 1
ApplicationMaster Java Maximum Heap Size 512M
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.user-as-default-queue unchecked
yarn.scheduler.fair.preemption unchecked
yarn.scheduler.fair.preemption.cluster-utilization-threshold 0.8
yarn.scheduler.fair.sizebasedweight unchecked
Fair Scheduler Allocations (deployed) {"defaultFairSharePreemptionThreshold":null,"defaultFairSharePreemptionTimeout":null,"defaultMinSharePreemptionTimeout":null,"defaultQueueSchedulingPolicy":"drf","queueMaxAMShareDefault":-1.0,"queueMaxAppsDefault":null,"queuePlacementRules":[{"create":true,"name":"specified","queue":null,"rules":null},{"create":null,"name":"nestedUserQueue","queue":null,"rules":[{"create":true,"name":"default","queue":"users","rules":null}]},{"create":null,"name":"default","queue":null,"rules":null}],"queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"root","queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"default","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null},{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"users","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":"parent"}],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":null,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null}],"userMaxAppsDefault":1,"users":[]}
Here is what the queue description looks like when the application is still in ACCEPTED state:
Likewise, here is the record from the Yarn RM UI (Note that the resources are allocated (memory/cpu) and Running Containers shows 1 container running):
Here is the Application Summary:
Here are the application logs (empty):
And, lastly, here is what the driver sees:
enter code here19/12/26 00:16:42 INFO Client:
client token: N/A
diagnostics: Application application_1577297544619_0002 failed 1 times due to AM Container for appattempt_1577297544619_0002_000001 exited with exitCode: 10
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1577297544619_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1577297544619_0002_01_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1577299469533
final status: FAILED
tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1577297544619_0002
user: shepanch
19/12/26 00:16:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at cloudera.jobs.ClouderaSampleJob$.delayedEndpoint$cloudera$jobs$ClouderaSampleJob$1(ClouderaSampleJob.scala:17)
at cloudera.jobs.ClouderaSampleJob$delayedInit$body.apply(ClouderaSampleJob.scala:6)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at cloudera.jobs.ClouderaSampleJob$.main(ClouderaSampleJob.scala:6)
at cloudera.jobs.ClouderaSampleJob.main(ClouderaSampleJob.scala)
Is there anything that can be done to solve this issue?
After all the research and apart from the reasons mentioned in the links I have mentioned in the question, I found that this can happen due to various reasons:
when you have different versions of spark in the client (driver) and the cluster. Once you ensure that both bundle the same version of spark, it runs fine.
you might need to mention the property spark.driver.host. Make sure the IP passed in here can be pinged from the guest VM.

GraphDB Workbench would not start

After a VM shutdown, GraphDB Workbench would not start.
I have installed GraphDB on a cloud-hosted VM. Incidentally, the machine was shut down without stopping GraphDB. When trying to start it again, the Workbench would not start and the following message is displayed in the error log.
[ERROR] 2019-06-19 12:12:00,299 [Thread-10 | c.o.t.s.i.PluginManager]
Problem shutting down literals-index java.lang.RuntimeException:
com.ontotext.trree.transactions.TransactionException: Failed to
created journal file: /home/peio/graphdb-se-8.10.1/data/repositor
ies/bgnews/storage/literals-index/numerics.index.precommit
As Damyan suggested, delete literals-index in the storage folder and it will be rebuilt on start-up.

Server Worklight Development Server failed to start

I have install latest version of IBM Worklight on Marketplace of eclipse juno. I have create a new project and then run my project Run As > Run on Worklight Development Server. then i have face an error on the server. Please Help me.
Thanks
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
objc[42627]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
ERROR: transport error 202: bind failed: Address already in use
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [debugInit.c:750]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
/Users/apple/Downloads/eclipse 2/plugins/com.worklight.studio.plugin_6.2.0.00-20140904-1709/liberty/wlp/bin/server: line 710: 42627 Abort trap: 6 "${JAVA_CMD}" "$#"
Your Log says Address Already in use. Restart your eclipse. Make sure process is killed.
Either your eclipse did not exit gracefully or some other application is using the port.
If some application is using the port go to server view of eclipse, expand "worklight development server" and change port in "server.xml" file. Hope that helps
According to your error message it is clearly saying that address already in use , this means the port number id using java.exe in background process.so this time if you stop server also this will not release.For this please do the following.
1.Download TCPView from http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx for windows.
2.Install TcpView , after that you will find the port number say 10080.Right click on port number and end process.Now Clean the project and run the server.
Hope this helps.

Zookeeper error connection loss exception

I'm running a SeqWare VM on an amazon EC2 instance I'm trying to use the SeqWare query engine to query data from VCF files. When I first launch the instance and follow the instructions to import data, It works fine, and continues to work until I stop the instance. When I restart it. It won't let me import anything, nor create a new workspace. It always returns the error below. I looked at the processes and found that none of the required nodes were running, so I logged into root and went to the etc/init.d directory and start everything again, at which point, when T try to import data, I don't even get an error and I have to stop the process.
[seqware#master target]$ java -classpath seqware-distribution-0.13.6.7-qe-full.jar com.github.seqware.queryengine.system.importers.SOFeatureImporter -i ../../seqware-queryengine/src/test/resources/com/github/seqware/queryengine/system/FeatureImporter/consequences_annotated.vcf ALL.chr3.phase1_release_v3.20101123.snps_indels_svs.genotypes.3_100001-101000.vcf -o keyValueVCF.out -r hg_19 -s c111aea5-5e18-4c62-a8a7-ec82fe151301 -a ad_hoc -w VCFVariantImportWorker
[SeqWare Query Engine] 0 [main] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper exists failed after 3 retries
[SeqWare Query Engine] 1 [main] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher - hconnection Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:82)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:100)
at com.github.seqware.queryengine.impl.HBaseStorage.<init>(HBaseStorage.java:89)
at com.github.seqware.queryengine.factory.SWQEFactory$Storage_Type$3.buildStorage(SWQEFactory.java:109)
at com.github.seqware.queryengine.factory.SWQEFactory.getStorage(SWQEFactory.java:174)
at com.github.seqware.queryengine.factory.SWQEFactory.getQueryInterface(SWQEFactory.java:199)
at com.github.seqware.queryengine.impl.SimpleModelManager.<init>(SimpleModelManager.java:49)
at com.github.seqware.queryengine.impl.HBaseModelManager.<init>(HBaseModelManager.java:36)
at com.github.seqware.queryengine.impl.MRHBaseModelManager.<init>(MRHBaseModelManager.java:32)
at com.github.seqware.queryengine.factory.SWQEFactory.getModelManager(SWQEFactory.java:211)
at com.github.seqware.queryengine.system.importers.FeatureImporter.performImport(FeatureImporter.java:66)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.runMain(SOFeatureImporter.java:141)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.main(SOFeatureImporter.java:60)
[SeqWare Query Engine] 3 [main] FATAL org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Unexpected exception during initialization, aborting
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:226)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:82)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:580)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:569)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:186)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:100)
at com.github.seqware.queryengine.impl.HBaseStorage.<init>(HBaseStorage.java:89)
at com.github.seqware.queryengine.factory.SWQEFactory$Storage_Type$3.buildStorage(SWQEFactory.java:109)
at com.github.seqware.queryengine.factory.SWQEFactory.getStorage(SWQEFactory.java:174)
at com.github.seqware.queryengine.factory.SWQEFactory.getQueryInterface(SWQEFactory.java:199)
at com.github.seqware.queryengine.impl.SimpleModelManager.<init>(SimpleModelManager.java:49)
at com.github.seqware.queryengine.impl.HBaseModelManager.<init>(HBaseModelManager.java:36)
at com.github.seqware.queryengine.impl.MRHBaseModelManager.<init>(MRHBaseModelManager.java:32)
at com.github.seqware.queryengine.factory.SWQEFactory.getModelManager(SWQEFactory.java:211)
at com.github.seqware.queryengine.system.importers.FeatureImporter.performImport(FeatureImporter.java:66)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.runMain(SOFeatureImporter.java:141)
at com.github.seqware.queryengine.system.importers.SOFeatureImporter.main(SOFeatureImporter.java:60)
I figured it out.The apache services were installed from the cloudera package. They weren't being restarted when the instance was being restarted and apparently, just running their script from the etc/init.d was the incorrect way to do it. I found the commands to restart them in the cloudera documentation.
I too faced this problem.I was able to solve this problem by providing jute.maxbuffer parameter while starting zookeeper.
For more info you can refer
https://issues.apache.org/jira/browse/SOLR-4793

Glassfish 3.1.1 start-local-instance fails with JAXBException

I am setting up a GlassFish cluster following the guide at http://javadude.wordpress.com/2011/04/25/glassfish-3-1-clustering-tutorial/. I started from fresh installs of GlassFish 3.1.1. I also have the same architecture as in the guide: two nodes with one instance each. The DAS is on node1.
I've tried starting from scratch several times and am able to create the cluster, nodes and instances without issue. I also have the DAS communicating with node2 via SSH. However, each time when I get to the point where I attempt to start instance2 it fails:
$ ./asadmin start-local-instance --node node1 --sync normal instance2
Previous synchronization failed at Feb 23, 2012 2:41:53 PM
Will perform full synchronization.
Removing all cached state for instance instance2.
CLI802 Synchronization failed for directory config, caused by:
javax.xml.bind.JAXBException
- with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.bind.v2.ContextFactory]
Command start-local-instance failed.
I spent the day Googling and searching GlassFish's Jira, but couldn't find a solution to this issue. I'd very much appreciate any ideas you have on how to solve this problem.
My operating system is CentOS 5.7 and my Java version is 1.6.0_20
Unfortunately my instance directory is empty, I'm assuming because it never started. So there is no log file. I set AS_DEBUG=true but it gives no stack trace. The last debug lines before the error are
Removing all cached state for instance instance2.
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/config
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/applications
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/generated
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/lib
Removing: /usr/local/glassfish3_1_1/glassfish/nodes/blade-50/instance2/docroot
Got exception: javax.xml.bind.JAXBException
Acting on a tip from a user in the Glassfish forum, I learned that Java 1.6.0_20 is an older release of Java that is not supported by Glassfish 3.1.1. I worked with a sysadmin to get Java 1.6.0_31 installed on both nodes of the cluster and that did the trick--both instances start up without errors.