Naming rabbitmq node with a preconfigured name - rabbitmq

I am setting up a Rabbitmq single node container built form a docker image. The Image is configured to persist to nfs mounted disc.
I ran into an issue when the image is restarted. Since every time a node restarted it gets unique name and the restarted node searching for the old nodes it’s reads from cluster_nodes.config file
Error dump shows:
Error during startup: {error,
{failed_to_cluster_with,
[rabbit#9c3bfb851ba3],
"Mnesia could not connect to any nodes."}}
How can I configure my image to use same name each time when it’s restarted instead of using arbitrary node name given by Kubernetes cluster?

Related

I am trying OpenShift origin, I cannot create application

I am trying OO on a RHEL Atomic Host. I spun up OO master as a container following this guide https://docs.openshift.org/latest/getting_started/administrators.html
After attaching a shell to the Master Container, I cannot deploy an app.
# oc new-app openshift/deployment-example
error: can't look up Docker image "openshift/deployment-example": Internal error occurred: Get https://registry-1.docker.io/v2/: net/htt p: request canceled while waiting for connection error: no match for "openshift/deployment-example"
The 'oc new-app' command will match arguments to the following types:
1. Images tagged into image streams in the current project or the 'openshift' project
- if you don't specify a tag, we'll add ':latest'
2. Images in the Docker Hub, on remote registries, or on the local Docker engine
3. Templates in the current project or the 'openshift' project
4. Git repository URLs or local paths that point to Git repositories
--allow-missing-images can be used to point to an image that does not exist yet.
See 'oc new-app -h' for examples.
The host needs proxy to access Internet. I have configured proxy in /etc/sysconfig/docker and that is how I could pull the origin image in the same place.
I have tried setting proxy for master and node with luck
https://docs.openshift.org/latest/install_config/http_proxies.html
It is possible that your proxy is terminating the connection. you can test by creating an internal registry, push image to that and then use
"oc new-app your.internal.registry/openshift/deployment-example"

elasticsearch-mesos not getting listed under frameworks of mesosUI

Iam trying to run elasticsearch-mesos on mesos.My machine is running ubuntu 14.04. I have running mesos cluster installed with mesosphere packages by following these instructions. When I run test frameworks it gets lister under frameworks of mesosUI but for elasticsearch-mesos its not getting listed under mesos webUI. I want to run elasticsearch-mesos on top of mesos. I followed instructions given here. When I run ./elasticsearch-mesos I am getting a message in terminal
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I tried running ./elasticsearch-mesos on both mesos masters and slaves.
The last few lines of terminal output is given below
2015-01-08 17:24:01,881:23844(0x7f175bfff700):ZOO_INFO#zookeeper_init#786: Initiating
client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f1762a3e6a0
sessionId=0 sessionPasswd=<null> context=0x7f1710002530 flags=0
I0108 17:24:01.881392 23858 sched.cpp:137] Version: 0.21.1
2015-01-08 17:24:01,881:23844(0x7f172b7fe700):ZOO_INFO#check_events#1703: initiated
connection to server [127.0.0.1:2181]
2015-01-08 17:24:01,897:23844(0x7f172b7fe700):ZOO_INFO#check_events#1750: session
establishment complete on server [127.0.0.1:2181], sessionId=0x14ac7c469270006,
negotiated timeout=10000
I0108 17:24:01.898455 23861 group.cpp:313] Group process (group(1)#127.0.1.1:38668)
connected to ZooKeeper
I0108 17:24:01.898509 23861 group.cpp:790] Syncing group operations: queue size (joins,
cancels, datas) = (0, 0, 0)
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
According to the README at https://github.com/mesosphere/elasticsearch-mesos,
you may need to modify mesos.master.url to point to the same ZK url that the Mesos master is using (maybe not localhost). If you're using a single-master Mesos cluster, you can skip the ZK url and point this parameter directly to the Mesos master.
Please also note that the elasticsearch framework is a bit outdated, so use with caution

Solr issue: ClusterState says we are the leader, but locally we don't think so

So today we run into a disturbing solr issue.
After a restart of the whole cluster one of the shard stop being able to index/store documents.
We had no hint about the issue until we started indexing (querying the server looks fine).
The error is:
2014-05-19 18:36:20,707 ERROR o.a.s.u.p.DistributedUpdateProcessor [qtp406017988-19] ClusterState says we are the leader, but locally we don't think so
2014-05-19 18:36:20,709 ERROR o.a.s.c.SolrException [qtp406017988-19] org.apache.solr.common.SolrException: ClusterState says we are the leader (http://x.x.x.x:7070/solr/shard3_replica1), but locally we don't think so. Request came from null
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:503)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:267)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:126)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
We run Solr 4.7 in Cluster mode (5 shards) on jetty.
Each shard run on a different host with one zookeeper server.
I checked the zookeeper log and I cannot see anything there.
The only difference is that in the /overseer_election/election folder I see this specific server repeated 3 times, while the other server are only mentioned twice.
45654861x41276x432-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x368-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x369-x.x.x.x:7070_solr-n_00000003xx
Not even sure if this is relevant. (Can it be?)
Any clue what other check can we do?
We've experienced this error under 2 conditions.
Condition 1
On a single zookeeper host there was an orphaned Zookeeper ephemeral node in
/overseer_elect/election. The session this ephemeral node was associated with no longer existed.
The orphaned ephemeral node cannot be deleted.
Caused by: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
This condition will also be accompanied by a /overseer/queue directory that is clogged-up with queue items that are forever waiting to be processed.
To resolve the issue you must restart the Zookeeper node in question with the orphaned ephemeral node.
If after the restart you see Still seeing conflicting information about the leader of shard shard1 for collection <name> after 30 seconds
You will need to restart the Solr hosts as well to resolve the problem.
Condition 2
Cause: a mis-configured systemd service unit.
Make sure you have Type=forking and have PIDFile configured correctly if you are using systemd.
systemd was not tracking the PID correctly, it thought the service was dead, but it wasn't, and at some point 2 services were started. Because the 2nd service will not be able to start (as they both can't listen on the same port) it seems to just sit there in a failed state hanging, or fails to start the process but just messes up the other solr processes somehow by possibly overwriting temporary clusterstate files locally.
Solr logs reported the same error the OP posted.
Interestingly enough, another symptom was that zookeeper listed no leader for our collection in /collections/<name>/leaders/shard1/leader normally this zk node contains contents such as:
{"core":"collection-name_shard1_replica1",
"core_node_name":"core_node7",
"base_url":"http://10.10.10.21:8983/solr",
"node_name":"10.10.10.21:8983_solr"}
But the node is completely missing on the cluster with duplicate solr instances attempting to start.
This error also appeared in the Solr Logs:
HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /roles.json
To correct the issue, killall instances of solr (or java if you know it's safe), and restart the solr service.
We figured out!
The issue was that jetty didn't really stop so we had 2 running processes, for whatever reason this was fine for reading but not for writing.
Killing the older java process solved the issue.

cloudstack fail to start primary and second storage

I use 2 hosts to establish my cloudstack cluster.All my hosts is Ubuntu 12.04 using NFSv3 and I use host1 as both the primary and secondary storage server.The management server is also in host1.I can mount host1's primary and second storage on host2,and I can access them through host2's mount points.But,when I try to add my first zone,I encountered the problem as show below,these messages can be found in my management-server.log:
2013-10-26 04:11:47,086 INFO [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Unable to start secondary storage vm for standby capacity, secStorageVm vm Id : 28, will recycle it and start a new one
2013-10-26 04:11:47,086 INFO [cloud.secstorage.PremiumSecondaryStorageManagerImpl] (secstorage-1:null) Primary secondary storage is not even started, wait until next turn
2013-10-26 04:57:16,615 WARN [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Exception while trying to start secondary storage vm
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to com.cloud.agent.api.Answer cannot be cast to com.cloud.agent.api.storage.PrimaryStorageDownloadAnswer
The log suggests that cloudstack fail to access host and start primary and secondary storages.I just don't how to test wheather the host is unreachable and how to start the primary and secondary storages.
It directly results in the two system vms stopped.I want to know some clues about how this problem occurs and some methods to debug.Any help will be appreicated!
Check that you have seeded secondary storage with the system VM template.
Prepare the System VM Template from the 4.2 Install Guide
The secondary storage system VM is responsible for adding templates to secondary storage. Without it, CloudStack cannot create new templates, which is why it has to be added manually.

Brisk TaskTracker not starting in a multi-node Brisk setup

I have a 3 node Brisk cluster (Briskv1.0_beta2). Cassandra is working fine (all three nodes see each other and data is balanced across the ring). I started the nodes with the brisk cassandra -t command. I cannot, however, run any Hive or Pig jobs. When I do, I get an exception saying that it cannot connect to the task tracker.
During the startup process, I see the following in the log:
TaskTracker.java (line 695) TaskTracker up at: localhost.localdomain/127.0.0.1:34928
A few lines later, however, I see this:
Retrying connect to server: localhost.localdomain/127.0.0.1:8012. Already tried 9 time(s).
INFO [TASK-TRACKER-INIT] RPC.java (line 321) Server at localhost.localdomain/127.0.0.1:8012 not available yet, Zzzzz...
Those lines are repeated non-stop as long as my cluster is running.
My cassandra.yaml file specifies the box IP (not 0.0.0.0 or localhost) as the listen_address and the rpc_address is set to 0.0.0.0
Why is the client attempting to connect to a different port than the log shows the task tracker as using? Is there anywhere these addresses/ports can be specified?
I figured this out. In case anyone else has the same issues, here's what was going on:
Brisk uses the first entry in the Cassandra cluster's seed list to pick the initial jobtracker. One of my nodes had 127.0.0.1 in the seed list. This worked for the Cassandra setup since all the other nodes in the cluster connected to that box to get the cluster topology but this didn't work for the job tracker selection.
looks like your jobtracker isn't running. What do you see when you run "brisktool jobtracker"?