I have two questions about the orchestration tool Kubernetes.
1) What does the kube controller actually do? Sometimes I read that it actually creates pods (with the API server telling it how), and sometimes I read that it just watches the whole process and reacts to changes in etcd.
2) Why do I see the Replication Controller on the master in so many Kubernetes architecture overviews? I thought it was created for a service (which contains pods), so that it would always be placed on a node.
The kube-controller-manager is managing a bunch of the cluster's state asynchronously, including the replication controllers. It's made up of a number of different "controllers" that watch the apiserver to know what the desired state of the world is, then do work to try to get there when the actual state differs from the desired state.
For example, it's the component that creates more pods for a replication controller when not enough exist, or tears one down when too many exist.
It also manages things like external load balancers for services running in the cloud, which endpoints make up a service, persistent volumes and their claims, and many of the new features coming up in 1.1 like daemon sets and pod autoscaling.
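Regarding question 2: a replication controller is not a process that runs on a particular node; it is just an API object describing desired state, which the controller-manager on the master reconciles. A minimal sketch of such an object, with purely illustrative names:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web                # illustrative name
spec:
  replicas: 3              # desired state: keep 3 pods of this template running
  selector:
    app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx       # illustrative image
```

The controller-manager only creates or deletes pod objects through the apiserver when the observed count drifts from replicas; the kubelet on whichever node the scheduler picks is what actually starts the containers.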
I'm looking into altering the architecture of a hosting service intended to scale arbitrarily.
On a given machine, the service works roughly as follows:
Start a container running a Redis cluster client that joins a global cluster.
Start containers for each of the "Models" to be hosted.
Use the upstream Redis cluster to manage global model state; handle namespacing via the keys themselves.
I'm wondering if it might be possible to change to something like this:
For each Model, start a container running the Model and a Redis cluster client.
Reverse proxy the Redis service using something like Nginx to make it available on a certain path, e.g., <host_ip>:6379/redis-<model_name>. (Note: I can't just proxy from different ports, because in theory this is supposed to be able to scale past 65,535 models running globally.)
Join the Redis cluster by using said path.
Internalizing the Redis service to the container is an appealing idea to me because it is closer to what the hosting service is supposed to achieve. We do want to share compute; we don't want to share a KV store.
Anyway, I haven't seen anything that suggests this is possible, so sticking with the upstream cluster may be my only option. But in case anyone knows otherwise, I wanted to check and see.
I have one namespace and one deployment (replica set). My Apache logs should be written outside the pod; how is this possible in Kubernetes?
You should specify more precisely what you mean by "outside the pod", but as David Maze has already suggested in his comment, take a closer look at the Logging Architecture section in the official Kubernetes documentation.
Depending on what you mean by "outside the Pod", a different solution may be optimal in your case.
As you can read there:
Kubernetes provides no native storage solution for log data, but you can integrate many existing logging solutions into your Kubernetes
cluster ... Cluster-level logging architectures are described in assumption that a logging backend is present inside or outside of your cluster.
The three most popular cluster-level logging architectures mentioned there are:
Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application.
The second solution is widely used. Unlike the third one, where pushing the logs needs to be handled by your application container, the sidecar approach is application independent, which makes it a much more flexible solution.
And just so it isn't too simple, it can be implemented in two different ways (a sketch of the first one follows the list below):
Streaming sidecar container
Sidecar container with a logging agent
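For illustration, here is a minimal sketch of the streaming sidecar variant, assuming your Apache container is configured to write its access log to a file on a shared emptyDir volume (all names, images and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apache-logging-demo          # illustrative name
spec:
  containers:
  - name: apache
    image: httpd                     # assumes Apache is configured to log to /var/log/apache/access.log
    volumeMounts:
    - name: logs
      mountPath: /var/log/apache
  - name: log-streamer               # sidecar: streams the log file to its own stdout
    image: busybox
    command: ["/bin/sh", "-c"]
    args: ["tail -n+1 -F /var/log/apache/access.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log/apache
  volumes:
  - name: logs
    emptyDir: {}
```

The sidecar turns the log file into container stdout, so kubectl logs apache-logging-demo log-streamer (or any node-level agent) can pick the entries up outside the Pod. The second variant replaces the simple tail container with a full logging agent (e.g. fluentd) that ships the file straight to an external backend.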
So I have a question regarding setting up OKD for our needs - our team has already established that Kubernetes is basically the simplest way for us to manage our stack. We don't have too much workload; probably 3 dedicated servers could work through all of it, but we have a lot of services and tools that are best served by running in docker containers, and we also strongly benefit from running our fairly monolithic core application as a container to make deployment and maintenance simpler.
The question, though, is how many nodes we need; specifically, whether we need HA master nodes.
From the documentation, it seems that Infrastructure nodes are responsible for routing. Does this mean that even if the master node goes down, the other nodes are still available and routing works, so long as domains point at the infrastructure nodes? Or would a failed master make all the other nodes unreachable?
In our environment, router pods run on the infra nodes and we can safely turn off the master node without impact on applications.
master node: api, controllers, etcd
infra node: registry, router, metrics, logging etc.
With the master turned off you just can't manage the cluster; the rest works fine. It is good to have more than one master node for etcd redundancy, but with such a small environment I think it makes no sense to maintain more.
We are running one of our services in a newly created Kubernetes cluster. Because of that, we have now switched it from the previous "in-memory" cache to a Redis cache.
Preliminary tests on our application, which exposes an API, show that we experience timeouts from our application to the Redis cache. I have no idea why, and the issue pops up very irregularly.
So I'm thinking maybe the reason for these timeouts is actually network related. Is it a good idea to put in affinity so we always run the Redis cache on the same nodes as the application, to prevent network issues?
The issues have not arisen during "very high load" situations, so it's concerning me a bit.
This is an opinion question so I'll answer in an opinionated way:
Like you mentioned, I would try to put the Redis and application pods on the same node; that would rule out wire networking issues. You can accomplish that with Kubernetes pod affinity. You can also try a nodeSelector; that way you always pin your Redis and application pods to a specific node.
Another way to do this is to taint the nodes where you want to run these workloads and then add a toleration to both the Redis and the application pods.
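A minimal sketch of the pod affinity option, assuming the Redis pods carry a label such as app: redis (all names, labels and images below are illustrative, not taken from your setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-app                      # illustrative application pod
  labels:
    app: api-app
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: redis               # assumes your Redis pods are labelled app: redis
        topologyKey: kubernetes.io/hostname   # "same node as a Redis pod"
  containers:
  - name: api
    image: my-api:latest             # illustrative image
```

With the nodeSelector/taint approach you would instead label (and optionally taint) a dedicated node and point both the Redis and application pod specs at it; the effect is the same co-location, just pinned to one specific node rather than to wherever Redis happens to land.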
Hope it helps!
I am looking to build an Akka-based application in the cloud, for a garage startup that I'm bootstrapping; by the nature of the app, it's semi-stateful, with as much as possible cached in RAM for performance. (It'll be tolerant of being shut down and restarted periodically, but we want to mostly operate via cached information inside the Actors.)
The architecture is designed for a cluster of servers, communicating between them as necessary so that a user session on node A can query a middleware Actor on node B when appropriate. So my question is, how hard is that in CloudBees?
My understanding from this page is that there is no automatic directory service to manage this sort of intra-cluster communication yet, but I can probably live with that -- worse comes to worst, I should be able to manage discovery via the DB, with each node registering itself when it comes up and opening up many-to-many communications with the others.
What I want to check, though, is that this communication is straightforward. Does each node have a reliable local IP that it can advertise for others to contact it on, that is at least stable during this run of the application? Or is there another/better way for a node to advertise its address to the rest of the nodes running this app?
(I assume that the nodes of an app all share the same DB instance.)
Any guidance here would be greatly appreciated. I'd like to choose a hosting provider soon, and keep returning to CloudBees as the most promising-looking of the options...
There are no limitations currently on instances communicating with each other - the trick is in discovering membership. There is an API that will shortly be released that will allow you to track membership - but for now, the following may work:
To get the port, look at the file names in $PWD/.genapp/ports (as applications can have multiple ports) - e.g. list the files in the directory System.getenv("PWD") + "/.genapp/ports"; generally there will be just one, and the file name is the port. There are other ways - for example the "sun.java.command" system property on JVM apps.
The hostname can be obtained via the usual means (eg InetAddress.getLocalHost().getHostName()): this hostname will be the private name - ie it will resolve to a private IP - good for node-to-node communication.
Public IP/hostname: perform an HTTP GET (from the server) to the following URL: http://instance-data/latest/meta-data/public-hostname (it will only return the public IP on the server side, of course).
(see http://developer-blog.cloudbees.com/2012/11/finding-port-or-address-of-your.html)
You can then, as you say, register the appropriate port/private hostname with a DB on startup, and read that on each node to "seed" the cluster (Akka doesn't have to know about all members - just enough seeds). I would suggest a two-phase startup: 1) register your host/port; 2) look for other members and add them as seed members to the local Akka configuration (you may need to repeat this periodically for a while, as other nodes start up, to ensure the cluster is seeded enough).
From my reading of Akka setup here: http://doc.akka.io/docs/akka/snapshot/scala/remoting.html
It looks like you can specify the port - so if possible, I would set that to be the app_port environment variable - that means each node can communicate via the private hostname on that port. However, HTTP traffic will also be routed to it - can Akka handle this as well, or does it need a discrete port for Akka and another for any HTTP interface?