What happens to an OpenStack instance if a server (node) goes offline?

I'm new to OpenStack and have a basic question about it. Assume that we have 3 master nodes (controllers) and 10 slave nodes (compute nodes) in our cloud, and we create 50 VMs (instances) on it. What will happen if one node (a controller or a compute node) goes offline (fails)? What is the best way to prevent a VM from shutting down when a server goes offline?
Best regards

This question requires more than a short Stack Overflow answer. Here are a few initial thoughts.
When a controller goes offline, the instance itself continues running, but if the failed controller hosts a router, the instance might be cut off from the network. Generally, if the controller has anything that the instance needs, that thing won't be available anymore. There are measures like HA routers that can help in such a case.
When the instance's compute host goes down, the instance doesn't run anymore. You can evacuate instances from a failed compute host, which means they are rebuilt on different hosts. If an instance's root disk resides on a volume, or on ephemeral storage that is shared with other compute hosts, evacuation amounts to a mere instance reboot. If the instance's ephemeral disk lives only on the failed host, it must be rebuilt from scratch.
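If you want to script the evacuation once you know which host has died, here is a rough sketch with python-novaclient (the credentials, endpoint and host names are placeholders, and the exact evacuate() parameters vary with the compute API microversion):
from keystoneauth1.identity import v3
from keystoneauth1 import session
from novaclient import client
# Placeholder credentials and endpoint - adapt to your cloud.
auth = v3.Password(auth_url="http://controller:5000/v3",
                   username="admin", password="secret", project_name="admin",
                   user_domain_id="default", project_domain_id="default")
nova = client.Client("2.1", session=session.Session(auth=auth))
# Find everything that was running on the dead compute host...
victims = nova.servers.list(search_opts={"host": "compute-03", "all_tenants": 1})
# ...and rebuild each instance on a surviving host.
for server in victims:
    nova.servers.evacuate(server, host="compute-07")
The same operation is available on the command line as nova evacuate.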
OpenStack has a project named Masakari whose goal is to make instances resilient by redundancy. In short, instance HA. The application keeps running even if an instance crashes.
By the way, master and slave are not correct terminology in this context. Use controller and compute instead.

Related

Can I use a load balancer in front of redis sentinels?

We run a container environment (Kubernetes) and we have a set of redis sentinels that watch over a bunch of redis instances.
Since it's a containerized environment, configuration is mostly dynamic. A sentinel container might die, another one replaces it, etc.
This poses a problem for application configuration. Normally, in a static setup, you provide the client with all the sentinel addresses and it works with them. With a frozen container configuration, if the environment changes, that configuration becomes outdated.
To solve this, we can put a load balancer in front of the redis sentinels. This way, even if the underlying containers/IPs change, the application configuration stays valid.
I'm aware that sentinels never forget other sentinels (and the same goes for slaves), but we can flush those stale entries when changes do happen.
We already do this today and haven't noticed any side effects, AFAIK, but of course I'd like to know if there's a risk of something going wrong because of this.
So the question is: can I use a load balancer in front of redis sentinels without any major issues?
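For reference, the client side looks roughly like this (a sketch with redis-py; the load balancer address and the master name "mymaster" are placeholders):
from redis.sentinel import Sentinel
# Point the client at the load balancer VIP instead of listing every sentinel pod.
sentinel = Sentinel([("sentinel-lb.example.internal", 26379)], socket_timeout=0.5)
# The sentinels still hand back the real master/replica addresses,
# so the data connections bypass the load balancer entirely.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
master.set("greeting", "hello")
print(replica.get("greeting"))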

Clone RabbitMQ admin users, etc. on replacement server

We have a couple of crusty AWS hosts running a RabbitMQ implementation in a cluster. We need to upgrade the hardware, and therefore we developed a Chef cookbook to spawn replacement servers.
One thing that we would rather not recreate by hand is the admin users, the queues, etc.
What is the best method to get that stuff from the old hosts to the new ones? I believe it's everything that lives in the /var/lib/rabbitmq/mnesia directory.
Is it wise to copy the files from one host to another?
Is there a programmatic means to do this?
Can it be coded into our Chef cookbook?
You can definitely export and import the configuration via the command line with the management plugin's CLI: https://www.rabbitmq.com/management-cli.html
I'm not sure about admin users, though.
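If you would rather do it programmatically (for example from the Chef run), the management plugin exposes the same definitions over its HTTP API, which is roughly what the CLI export/import uses underneath; a sketch with Python requests, using placeholder hosts and credentials:
import requests
OLD = "http://old-rabbit.example.com:15672"   # placeholder hosts
NEW = "http://new-rabbit.example.com:15672"
AUTH = ("admin", "secret")                    # placeholder credentials
# Pull users, vhosts, permissions, queues, exchanges and bindings from the old cluster.
definitions = requests.get(OLD + "/api/definitions", auth=AUTH).json()
# Push them into the new cluster; user passwords travel as hashes, so admin users keep their passwords.
resp = requests.post(NEW + "/api/definitions", json=definitions, auth=AUTH)
resp.raise_for_status()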
If you create new RabbitMQ nodes on your new hardware and join them to the existing cluster, the new nodes will get all the users. This is easy to try:
- run a Docker container with the rabbitmq image (with the management plugin) and create a user
- run another container and add that node to the cluster of the first one
- kill RabbitMQ on the first node, or delete its Docker container, and you will see that the newly created user still exists on the second (now master) node
I suggested Docker because it's faster to create a cluster this way, but if you already have a cluster you could use it for this test if you prefer.
For the queues and exchanges, I don't want to quote almost everything found in the RabbitMQ documentation page on high availability, but I will just say that you have to pay attention to the following:
- exclusive queues, because they are gone once the client connection is gone
- queue mirroring (if you have any set up; if not, it would be wise to consider it, if not outright necessary)
I would do the migration gradually, waiting for the queues to be emptied and then killing off the nodes on the old hardware. It may be doable in a big-bang fashion, but that seems riskier. If you have a running system, set up queue mirroring and try to find an appropriate moment to do a manual sync - but be careful, as syncing has a big impact on broker performance.
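If you do set up mirroring, the policy can be applied over the same management HTTP API (or with rabbitmqctl set_policy); another rough sketch with a placeholder host and credentials:
import requests
API = "http://new-rabbit.example.com:15672"   # placeholder host
AUTH = ("admin", "secret")                    # placeholder credentials
# Mirror all queues on the default vhost ("/" is URL-encoded as %2F) to every node.
# ha-sync-mode stays manual so you can pick a quiet moment to run rabbitmqctl sync_queue.
policy = {"pattern": ".*",
          "definition": {"ha-mode": "all", "ha-sync-mode": "manual"},
          "apply-to": "queues"}
requests.put(API + "/api/policies/%2F/ha-all", json=policy, auth=AUTH).raise_for_status()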
Additionally, there is the shovel plugin (I have to point out that I have not used or even explored it), but that may be another way to go since (quoting from the link):
In essence, a shovel is a simple pump. Each shovel connects to the source broker and the destination broker, consumes messages from the queue, and re-publishes each message to the destination broker (using, by default, the original exchange name and routing_key).

Running Redis with Docker (performance issue)

Has anyone else seen performance issues with running Redis in a Docker container environment?
Here's what I've noticed...
Setup A: Local machine, traditional Redis install
Setup B: Local machine, using the canonical Redis image https://registry.hub.docker.com/_/redis/
I've got an identical HTTP server on my local machine that fires as fast as the request/response cycle will allow.
Observations:
- A can sustain approximately 2x the throughput of B.
- B performs identically to A when benchmarked from within the container.
So, this leads me to believe that B is slower than A because of a networking issue: i.e. the networking relays introduced by running software in a virtualized environment are creating significant performance issues...
Just wondering if anyone else has noticed anything like this?
Docker's default networking option, --net=bridge, introduces overhead due to NAT packet rewriting, which becomes noticeable at high packet rates.
Network performance can be improved with --net=host, which instructs Docker not to create a separate network stack for the container and gives it full access to the host's network interfaces.
This option should be used carefully though, as it lets container processes open low-numbered ports like any other root process, and access local network services like D-bus, which can lead to processes in the container being able to do unexpected things.
In short: If you know what you are running inside the container it is safe. If you suspect unwanted or aggressive behavior - do not do it.
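If you want to quantify the difference, rerunning the image with host networking is a quick experiment; a sketch using the Docker SDK for Python plus redis-py (image name, container name and request count are arbitrary):
import time
import docker
import redis
# Start the same Redis image attached directly to the host's network stack
# (the equivalent of `docker run --net=host redis`).
docker.from_env().containers.run("redis:latest", name="redis-hostnet",
                                 network_mode="host", detach=True)
time.sleep(2)  # give Redis a moment to start
# Crude round-trip measurement; repeat against a bridged container to compare.
r = redis.Redis(host="127.0.0.1", port=6379)
start = time.perf_counter()
for _ in range(10000):
    r.ping()
print("avg RTT: %.1f us" % ((time.perf_counter() - start) / 10000 * 1e6))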

How does ServiceStack PooledRedisClientManager failover work?

According to the git commit messages, ServiceStack has recently added failover support. I initially assumed this meant that I could pull one of my Redis instances down and the pooled client manager would handle the failover elegantly, trying to connect to one of my alternate Redis instances. Unfortunately, my code just errors out and says that it can't connect to the initial Redis instance.
I am currently running instances of Redis 2.6.12 on Windows, with the master at port 6379 and a slave at 6380, and with sentinels set up to automatically promote the slave to master if the master goes down. I am currently instantiating my client manager like this:
PooledRedisClientManager pooledClientManager =
    new PooledRedisClientManager(
        new string[] { "localhost:6379" },
        new string[] { "localhost:6380" });
where the first array is read-write hosts (for the master), and the second array is read-only hosts (for the slave).
When I terminate the master at port 6379, the sentinels promote the slave to a master. Now, when I try to run my C# code, instead of failing over to port 6380, it simply breaks and returns the error "could not connect to redis Instance at localhost:6379".
Is there a way around this, or will failover simply not work the way I want it to?
PooledRedisClientManager.FailoverTo allows you to reset which hosts are read/write and which are read-only, and restarts the factory. This allows for a quick transition without needing to recreate clients.

Can Cloudbees instances within an app communicate directly?

I am looking to build an Akka-based application in the cloud, for a garage startup that I'm bootstrapping; by the nature of the app, it's semi-stateful, with as much as possible cached in RAM for performance. (It'll be tolerant of being shut down and restarted periodically, but we want to mostly operate via cached information inside the Actors.)
The architecture is designed for a cluster of servers, communicating between them as necessary so that a user session on node A can query a middleware Actor on node B when appropriate. So my question is, how hard is that in CloudBees?
My understanding from this page is that there is no automatic directory service to manage this sort of intra-cluster communication yet, but I can probably live with that - worse comes to worst, I should be able to manage discovery via the DB, with each node registering itself when it comes up and opening up many-to-many communication with the others.
What I want to check, though, is that this communication is straightforward. Does each node have a reliable local IP that it can advertise for others to contact it on, that is at least stable during this run of the application? Or is there another/better way for a node to advertise its address to the rest of the nodes running this app?
(I assume that the nodes of an app all share the same DB instance.)
Any guidance here would be greatly appreciated. I'd like to choose a hosting provider soon, and keep returning to CloudBees as the most promising-looking of the options...
There are currently no limitations on instances communicating with each other - the trick is in discovering membership. There is an API that will shortly be released that will allow you to track membership - but for now, the following may work:
To get the port, look at the file names in $PWD/.genapp/ports (applications can have multiple ports). For example, list the files in the directory System.getenv("PWD") + "/.genapp/ports"; generally there will be just one, and the file name is the port. There are other ways too - for example, the "sun.java.command" system property on JVM apps.
The hostname can be obtained via the usual means (e.g. InetAddress.getLocalHost().getHostName()): this host name will be the private name - i.e. it will resolve to a private IP - good for node-to-node communication.
Public IP/hostname: perform an HTTP GET (from the server) to the following URL: http://instance-data/latest/meta-data/public-hostname (the URL only resolves from the server side, of course).
(see http://developer-blog.cloudbees.com/2012/11/finding-port-or-address-of-your.html)
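Putting those three lookups together takes only a few lines in any language; here is a Python sketch for illustration, using the paths and metadata URL described above:
import os
import socket
import urllib.request
# 1. Port(s): every file name under $PWD/.genapp/ports is an allocated port.
ports_dir = os.path.join(os.environ.get("PWD", os.getcwd()), ".genapp", "ports")
ports = [int(name) for name in os.listdir(ports_dir)]
# 2. Private hostname - resolves to a private IP, good for node-to-node traffic.
private_host = socket.gethostname()
# 3. Public hostname - the metadata URL is only reachable from the server side.
with urllib.request.urlopen("http://instance-data/latest/meta-data/public-hostname") as resp:
    public_host = resp.read().decode().strip()
print(ports, private_host, public_host)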
You can then, as you say, register the appropriate port/private hostname with a DB on startup, and read that back on each node to "seed" the cluster (Akka doesn't have to know about all members - just enough seeds). I would use a two-phase startup: 1) register your own host/port, 2) look for other members and add them as seed members to the local Akka configuration (you may need to repeat step 2 periodically for a while as other nodes start up, to ensure the cluster is seeded enough). A rough sketch of that register/lookup step follows below.
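For example, with a shared PostgreSQL database as the registry (psycopg2, the DSN and the cluster_members table are purely illustrative choices, not something CloudBees provides):
import socket
import psycopg2
my_host = socket.gethostname()
my_port = 8080  # in practice, the port read from .genapp/ports above
conn = psycopg2.connect("dbname=app user=app host=db.example.internal")  # placeholder DSN
cur = conn.cursor()
# Phase 1: advertise this node.
cur.execute("INSERT INTO cluster_members (host, port) VALUES (%s, %s)", (my_host, my_port))
conn.commit()
# Phase 2: read all registered members and turn them into Akka seed-node addresses.
cur.execute("SELECT host, port FROM cluster_members")
seeds = ["akka.tcp://myapp@%s:%s" % (h, p) for h, p in cur.fetchall()]
print(seeds)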
From my reading of the Akka remoting setup here: http://doc.akka.io/docs/akka/snapshot/scala/remoting.html it looks like you can specify the port - so, if possible, I would set that to the app_port environment variable. That means each node can communicate via its private hostname on that port. However, HTTP traffic will also be routed to that port - can Akka handle this as well, or does it need a discrete port for Akka and another for any HTTP interface?