Do containers always require fewer resources, or do they create additional overhead?

I have been learning about Docker and its advantages. These advantages include, but are not limited to:
Rapid Deployment
Portability
Security
Isolation
Version control
Lightweight footprint with minimal overhead
My questions:
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, etc. in individual Docker containers, or is the application's virtual machine simply added on top of the container (i.e. does Docker create overhead)?
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Would it be better to run all of the apps in one container?
Video about docker: https://www.youtube.com/watch?v=YFl2mCHdv24

I found this forum reply by someone at Docker.
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, etc. in individual Docker containers, or is the application's virtual machine simply added on top of the container (i.e. does Docker create overhead)?
The post seems to indicate that Docker does come with overhead, and that it scales linearly with the number of containers being run on a given host system. This has to do with Docker being written in Golang and how Golang operates.
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Judging from the aforementioned Docker staff response, it seems like this would be the case. I wouldn't look at "overhead" in isolation, though; running 40 Java applications that need disparate versions directly on one host is unrealistic for technical reasons, whereas Docker lets you do this easily because it isolates each process.
So, if I may make a slanted comparison: the human management overhead of running 40 Java applications on one host would almost certainly cost more than the additional system resources you might save by forgoing containers.
Would it be better to run all of the apps in one container?
Assuming the context here is system overhead, and given that Docker's process overhead scales linearly with the number of running containers, fewer containers means less Docker process overhead. As with the previous question, though, lumping everything into one container introduces complexity that will cost some poor soul a lot of hours of troubleshooting and fixing.
If you have several pieces (web server, app server, database, memcache, ActiveMQ), putting them all in one container eventually becomes very inefficient because you can't scale them individually. If you need to scale your app server, you can't scale just the app server; you have to unnecessarily scale every other service in that container as well.
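To make that concrete, here is a minimal hedged sketch, assuming a hypothetical docker-compose.yml that defines web, app and db as separate services:

    # Service names (web, app, db) are hypothetical and assume a compose file
    # that splits the stack into separate services rather than one container.

    # Scale only the app tier to three replicas; web and db are left untouched.
    docker-compose up -d --scale app=3

    # If everything lived in a single container, the only option would be to
    # duplicate the entire stack, web server and database included.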

Docker - what real value does it bring for our team?

I am very, very new to Docker. Our team has had a very nice deployment lineup where we have different CI engines for different projects, including Jenkins and TeamCity.
Developers check in, CI takes over and deploys, and it's perfectly ready for the test team to test. I always thought this was a perfect model. Of course, some parts of our implementation have their flaws, but it worked very well for what we wanted.
Now, our DevOps team is introducing Docker, where the test teams get a Docker image from the Docker registry every time we run a build from TeamCity. While it sounds really fancy, I am still failing to understand the benefit of it.
After my research, my conclusion was that Docker can be a good lightweight replacement for a VM. But that only matters if you are using VMs, and we are not using any VMs. I just do not understand what the real value is here. Also, while searching I found a relatively good link on Docker:
https://www.ctl.io/developers/blog/post/what-is-docker-and-when-to-use-it/
where they discuss when you should use Docker, and one of the points says:
Use Docker whenever your app needs to go through multiple phases of development (dev/test/qa/prod, try Drone or Shippable, both do Docker CI/CD)
OK. However, they do not elaborate further on why Docker is useful when my app has to go through multiple phases.
And how is it so extremely helpful over a regular dev/test setup when the existing setup is already working smoothly?
First, you are right that it is similar to a VM. However, Docker is incredibly lightweight, which is the property that surprised me most in the beginning. Unlike virtual machines, which are fully isolated from one another, containers share the host's resources much more efficiently and can run side by side with very little overhead. You can also configure containers to talk to each other (via volume or port bindings).
Furthermore, in my team, Docker brings the following benefits:
Our application consists of one big application and several smaller microservices, but we want to release it all as one package with the inter-dependencies among the applications intact, which eliminates the problem of figuring out which versions of the application and microservices should be deployed together (compatibility), etc. That is, the image contains everything you need, and you can bring all of the applications, or just one of them, up or down using docker-compose. You do not need to deploy; you simply pull the image and fire up containers. If you wish to stop one of the microservices, you can do so without affecting the others.
Developers in the team can run the very same image on a local machine, for example to troubleshoot a problem that occurred in production, which means troubleshooting is done in the same environment as production. This brings environment standardization and no more "but it works on my machine" talk.
Another benefit it brings us: we build a Docker image, run our tests against it, and push it to the registry once all of those phases succeed, which translates into great portability (a rough sketch of this flow follows after these points).
The ability to version-control the containers. You can easily compare the current version with previous versions, and if you wish to roll back, that is done smoothly.
Isolating and securing applications. All containers are isolated and you can easily control what goes in and out.
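As promised above, a rough sketch of that build/test/push flow; the registry host, image name and tags (registry.example.com/myapp, 1.4.2, 1.4.1) are placeholders, and the test entry point is whatever your image actually provides:

    # CI side: build the image, run the test suite inside it, push only on success.
    docker build -t registry.example.com/myapp:1.4.2 .
    docker run --rm registry.example.com/myapp:1.4.2 ./run-tests.sh \
      && docker push registry.example.com/myapp:1.4.2

    # Developer side: pull the exact image that is running in production and
    # reproduce the problem locally, in the same environment.
    docker pull registry.example.com/myapp:1.4.2
    docker run -d --name myapp-debug -p 8080:8080 registry.example.com/myapp:1.4.2

    # Rolling back is simply running the previous tag again.
    docker run -d --name myapp -p 8080:8080 registry.example.com/myapp:1.4.1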
It took me a year before I got used to the idea, but now it seems simple enough.
I think part of that comes from the fact that people keep calling Docker a "virtual machine", which is not accurate. That's really just a nickname for what's happening behind the scenes. In a lot of ways, Docker will NOT replace a complete virtualization solution, such as VMware. It does, however, bring a new way of thinking about infrastructure, one that many people have a difficult time wrapping their heads around.
You can start by asking yourself: what makes a Linux distribution unique?
Aside from the kernel, everything else is just a "standard way" of organizing binaries, libraries, runtime and configuration files. You need your binaries in /bin, your libs in /lib, your configuration in /etc. User installations get placed under /usr...
Most distributions keep the main structure from the Unix legacy and add their own quirks. Each one has its own way of managing and distributing packages. Each maintains its own versions of libraries, drivers, etc.
The key ingredient is the kernel. That's something they all have in common. Nowadays, recent builds of the Linux kernel are compatible with pretty much all major distributions available. So, aside from /boot, almost everything else is just a matter of having the right files in the right place with the right permissions.
Now, imagine you take that whole distribution bundle (except the kernel) and place it in another directory of your running OS. Taking advantage of the same kernel you are already running, you isolate a new process so that it "thinks" / is now that directory. Bingo! This process now "thinks" it's running all by itself on another operating system.
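You can approximate that by hand with stock Linux tools. A hedged sketch, assuming you already have a distribution root filesystem unpacked at /srv/rootfs (the path is arbitrary; debootstrap or any extracted rootfs tarball can provide one):

    # Give the new process its own PID and mount namespaces, mount a private
    # proc inside the unpacked tree, and make it believe /srv/rootfs is /.
    sudo unshare --fork --pid --mount-proc=/srv/rootfs/proc \
        chroot /srv/rootfs /bin/bash

    # Inside that shell, `ps aux` shows only its own process tree, and the
    # "distribution" is whatever was unpacked under /srv/rootfs, yet it is
    # still the host's kernel doing all the work.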
Docker builds on top of Linux containers, which lets us do exactly that in a friendlier, easier way. Don't think of it as a virtual machine; think of it as process isolation. The running kernel shares the machine's resources with this process while keeping it isolated from the rest of the system. It's like jails on steroids.
That was a broad simplification. But, given the concept, think about the implications of this idea.
You can have, on the same host, multiple processes with completely different environments that might otherwise conflict with each other. One may be a legacy binary that needs old libraries in place (legacy systems never die). Another may be the most recent build of a bleeding-edge technology. Sharing the same kernel is efficient, valuable resource management.
The most value I found comes from managing the infrastructure. Once you install Docker on the hosts, configure a swarm, and define a way of deploying containers, you mostly forget about the hosts. Adding users, installing packages, customizing, editing configuration files... All that becomes a development task on your desktop. There's an incentive to script more, to automate more. To keep your hands away from the physical or virtual machines, unless absolutely necessary.
Gone are the days when someone changed some obscure setting on the server to work around some weird application behavior, forgot to tell anyone about it, and took a vacation. Changes to the environment can be committed to version control, tracked, and improved by everyone on the team. If your datacenter goes through a disaster, recreating the whole environment is a matter of rebuilding images and redeploying containers. Your infrastructure becomes consistent and reproducible, while keeping the doors open to a wide variety of operating systems and customized configurations for each application.
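For illustration, that swarm-style workflow might look roughly like this; the service name and image are placeholders, and this is a sketch, not a prescription:

    # One-time: turn a Docker host into a (single-node) swarm manager.
    docker swarm init

    # Deploying becomes declaring a service; Docker schedules the containers.
    docker service create --name web --replicas 3 -p 80:80 nginx

    # Scaling, or recreating everything after a disaster, is just re-running
    # the same declarations.
    docker service scale web=5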
Developers can take advantage of Docker by recreating dev/staging/production environments on their desktops. There's no need to pollute a dev machine with application servers and database installations, or even pay the toll of VirtualBox to emulate all that.
Testing can be automated with a higher level of isolation. The Selenium team already has official Docker images. Creating an entire test hub should be a walk in the park with those puppies.
Building custom software, such as compiling Nginx with third-party modules, can also be done inside containers built from specialized images. There's no need to keep an entire server dedicated to it, or to pollute your desktop with all the dependencies and build packages.
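A hedged sketch of that pattern; the image, paths and configure flags are purely illustrative, and a real build would also need the module's build dependencies installed in the image:

    # Run the compile inside a throwaway toolchain container; the host keeps
    # none of the compilers or -dev packages. The source tree is bind-mounted.
    docker run --rm -v "$PWD/nginx-src":/src -w /src gcc:12 \
        sh -c './configure --add-module=/src/extra/some_module && make'

    # The build artifacts land in the mounted directory, ready to be copied
    # into a minimal runtime image.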
Overall, we've been having a great experience with Docker. We've migrated our staging environment to this new platform, and plan to migrate other parts of the infrastructure as well, eventually into production. So far, so good.
I hope you can convince enough people to take a better look at it. I'll admit, it took me some time to get used to the idea. But once you get it, it's actually worth it.

To virtualize or not to virtualize a bare metal server for a kubernetes deployment

I'd like to deploy Kubernetes on a large physical server (24 cores), and I'm uncertain about a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster versus running on bare metal?
I have the following considerations:
Creating VMs will allow for workload isolation. New VMs for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal, a new namespace can be created for each developer to experiment in, and they can run their code there. After all, their code should be running in Docker containers anyway.
Security:
Having VMs would limit the amount of access given to future maintainers, limiting the damage that could be done. On the other hand, the primary task for any future maintainer would be adding and deleting nodes, and they would need bare-metal access to do that.
Authentication:
At the moment, devs would only touch the server when their code runs through the CI pipeline and their deployments go out. But what about viewing logs? Could we set up tiered kubectl authentication to allow devs to access only the namespaces assigned to them? (I believe this should be possible with the k8s namespace authorization plugin.)
A number of VMs already exist on the server. Would this be an issue?
128 cores and doubts.... That is a lot of cores for a single server.
For Kubernetes, however, this is not relevant:
Kubernetes can use servers of different sizes and utilize them to the maximum. However, if you combine the master processes and the node/worker processes on a single server, you might create unwanted resource contention. You can manage that with namespaces, as you already mention.
What we do is continuous integration with namespaces in a single dev/qa Kubernetes environment, in which each change gets its own namespace (so we run many, many namespaces) and full environment deployments are run in those namespaces, managed by a bunch of shell scripts. This works with a large server like yours as well as with smaller (or virtual) boxes. The main benefit of virtualization for you would be splitting the large box into smaller ones so that you can also use it for things Kubernetes cannot host (MS Windows, desktops, kernel modules for VPN purposes, etc.).
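To sketch the namespace-per-developer setup the question asks about (all names here are hypothetical, and the RoleBinding uses Kubernetes' built-in "edit" ClusterRole):

    # Create a namespace for one developer's experiments.
    kubectl create namespace dev-alice

    # Grant that developer edit rights only inside their own namespace.
    kubectl create rolebinding alice-edit \
        --clusterrole=edit \
        --user=alice@example.com \
        --namespace=dev-alice

    # The developer can then view logs for their own workloads, and only theirs.
    kubectl logs deployment/my-app --namespace=dev-alice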
I would separate dev and prod into different VMs. I once had a webapp inside Docker that spawned too many threads, and the Docker daemon on the host crashed. Luckily it was limited to one host. You can protect against this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
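For what it's worth, a hedged example of the kind of limits that guard against that failure mode (the image name and the numbers are arbitrary):

    # Cap memory, CPU and the number of processes/threads one container may
    # create, so a runaway app cannot starve the host or the Docker daemon.
    docker run -d --name webapp \
        --memory 512m \
        --cpus 1.0 \
        --pids-limit 256 \
        mycompany/webapp:latest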
I think the answer is "it depends!", which is not really an answer. Personally, I would split up the machine using VMs and deploy that way. You get better flexibility in how much of the server's resources you carve out, and you can easily create new environments and destroy them just as easily.
Even if those VMs are really big, I think it's still easier to manage, especially given that you already have existing VMs on the machine.
That said, there's no technical reason you can't run a single-node setup, but you may run into downtime problems with upgrades (if that's an issue), and if that server needs to be patched or rebooted, your entire cluster is down.
I would look at your environment's needs for HA and uptime, as well as how you are going to deploy VMs (if you go that route), and decide what works best for you.

Is a Docker/LXC container a running app or something in-memory?

I just read the excellent SO question asking "What is the difference between Docker and a VM?". However, the accepted answer left me wanting just a wee bit more.
I sort of understand a container (Docker/LXC - I don't get the difference) to use something called libcontainer and AuFS so that dozens, hundreds, even thousands of containers can share the same CPU, RAM and disk resources. But, the answer still doesn't explain exactly what a "container" is!
Is a container just an instance of this libcontainer running? Is it an application that uses libcontainer? Is it something Linuxy like a service/daemon process? So I ask:
What exactly is a "container"?
What are the exact computing/system resources multiple containers can share inside the same VM/physical?
Is Docker/LXC the "hypervisor" in the container equation? If not, what is the relationship between Docker, LXC and libcontainer?
the answer still doesn't explain exactly what a "container" is!
A container is basically an isolated process, with all the environment it needs for its job (a web server, a database, a CMS, any software...).
A container uses Linux kernel namespaces to isolate processes, networking and filesystems.
More broadly, it relies on process isolation: filesystem, process, network, resource (CPU, memory), logging (STDIN...) and shell isolation.
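You can see those namespaces on the host yourself; a small hedged illustration (the container name is made up):

    # Start a throwaway container and find its PID as seen by the host.
    docker run -d --name demo nginx
    pid=$(docker inspect --format '{{.State.Pid}}' demo)

    # The kernel namespaces isolating it are ordinary entries under /proc...
    ls -l /proc/"$pid"/ns
    # ...and lsns summarizes them alongside every other namespace on the host.
    sudo lsns --task "$pid"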
Docker/LXC - I don't get the difference
LXC is a set of tools to control containers; Docker is another set of tools (all bundled into the same program) that also adds an image format, so the contents of an 'image' can be passed around from machine to machine. Docker is vastly more talked-about than LXC.
Docker used to use the lxc library to control containers, but replaced it with its own library called...libcontainer.
What are the exact computing/system resources multiple containers can share inside the same VM/physical?
Containers on the same machine will share CPU, memory and the kernel. Additionally, Docker lets you optionally have them share the same network.
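A hedged illustration of that optional network sharing (the network and container names are made up):

    # Containers on the same user-defined network can reach each other by name.
    docker network create appnet
    docker run -d --name web --network appnet nginx
    docker run --rm --network appnet curlimages/curl http://web/

    # Or have one container share another's network stack entirely, so that
    # "localhost" means the same interface for both.
    docker run --rm --network container:web curlimages/curl http://localhost/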
Is Docker/LXC the "hypervisor" in the container equation?
The Linux kernel is the real "hypervisor", and Docker/LXC are sending it commands to create and control containers.

What are the advantages of running Docker on a VM?

Docker is an abstraction of the OS (kernel) and below; a VM is an abstraction of the hardware. What is the point of running Docker on a VM (like Azure), apart from app portability? Shouldn't they be hosting Docker directly on the hardware?
Docker doesn't provide effective isolation for kernel-level security exploits (there's only one ring 0, and it's shared across all containers). Thus, one could reasonably wish to have the additional isolation provided by a virtualization mechanism.
Keep in mind that much of Docker's value is not about security, but about containerization -- building and distributing portable applications in such a way as to ensure that coupling between layers occurs only where and how intended.
The advantage of a cloud system like Azure is that you can go online with your credit card and get a machine up and running in a few minutes. This is enabled by that machine being virtual. Also VMs let you share hardware across multiple users with hardware-level isolation.
If everything else was equal, i.e. you didn't need any of the features of a VM, then you would be correct that a physical machine should be used, as it will run more efficiently.

Stress testing a server and VPS's vs. Dedicated servers

We used to have a dedicated server (1&1) and only very infrequently ran into problems with it.
Recently, we migrated to a VPS (Wiredtree.com) with similar specs to our old dedicated server, but we notice frequent problems: running out of memory, MySQL having to restart, etc., both when knowingly running intensive scripts and also just randomly during normal use.
Because of this, we're considering migrating to another VPS, this time at Slicehost, to see if it performs better.
My question is twofold...
Are there straightforward ways we could stress test a VPS at Slicehost to see if the same issues occur, without having to actually migrate everything over?
Also, is it possible that the issues we're facing aren't just because of the provider (Wiredtree) but simply due to the difference between a dedicated box and a VPS (despite having similar specs)?
The best way to stress test an environment is to put it under load. If this VPS is hosting a web application, use one of the many available web server benchmark tools: ab, httperf, Siege or http_load. You don't necessarily care that much about the statistics from the tool itself, but more that it puts a predictable load on the server so that you can tune Apache to handle it, or at least not crash and burn.
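For example, a quick hedged ab run (the URL and numbers are placeholders; pick a concurrency that roughly matches your real traffic):

    # 10,000 requests total, 50 concurrent, against a representative page.
    ab -n 10000 -c 50 http://test-server.example.com/some/page

    # Watch memory, swap and CPU on the server while the test runs.
    vmstat 5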
The one problem you have with testing against Slicehost is that you are at the mercy of the Internet and your bandwidth to Slicehost. You may not be able to put enough load on the server to reach a meaningful conclusion.
Instead, you might find it just as valuable to run one of the many virtualization products on the market and set up a VM with comparable specs to the VPS plan you're considering. Local testing over your LAN will allow you to put a higher and more predictable load on the server.
In either case, you don't need to migrate everything, but you will need to set up an environment for your application to run in, with representative data in your database.
A VPS with similar specs to a dedicated server should perform approximately the same, but in order to get good performance, you still need to tune Apache, MySQL and any other long-lived server processes. In my experience, the out-of-the-box configuration of Apache in many Linux distributions is not ideal and will allow far too many child processes, overcommitting memory and sending the server into a swap-death spiral.
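As a concrete, hedged illustration of that tuning: the directives below are standard Apache prefork settings (MaxRequestWorkers was called MaxClients before Apache 2.4), but the numbers are placeholders you must size from your own per-child memory footprint, and the config file location varies by distribution.

    # In the prefork MPM configuration (e.g. /etc/apache2/mods-available/mpm_prefork.conf
    # on Debian/Ubuntu, or httpd.conf elsewhere), cap the worker count:
    #
    #   <IfModule mpm_prefork_module>
    #       StartServers           4
    #       MinSpareServers        4
    #       MaxSpareServers        8
    #       MaxRequestWorkers     64
    #       MaxConnectionsPerChild 1000
    #   </IfModule>
    #
    # Then check the config and reload, and re-run the load test to confirm
    # memory stays within bounds.
    sudo apachectl configtest && sudo apachectl graceful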