How to scale host server resources to run multiple applications at once? - virtual-machine

I have to set up a relatively big system consisting of Virtual Machines, where I will need to run several different applications. The applications will be provided to me as black boxes, either in form of software to be installed by myself (on a new VM), or in the form of Virtual Machine containing already everything for an application.
My task is to set up a host server and estimate its general resources, which will be then distributed between all Virtual Machines in my system. Some of the applications are more demanding than the others, and I have also time deadlines, so it could happen that all the application need to be executed simultaneously.
For each application I have the resources description it needs (but no corresponding time and performance estimates), so that I know how many processors and processors cores I normally will need for a single app. But how should I do with all of them running simultaneously? Should I simply add together the requirements or is there some common formula for scaling of the host servers general CPUs, Memory and Storage resources?
And one more questions. Such a system with distribution of real physical resources between several VMs - is it already a cluster? Or not yet?

Related

Isn't virtual machine quite a type of process?

I'm trying to understand the basic concepts of Docker, and lots of docs say that "Docker is not virtual machine, but a process". To me, this sentence looks quite awkward, since as far as I know, virtual machine it self also runs on host os, which makes itself a 'Process'.
Is there any big difference between the way the virtual machine works and the other normal applications/process do?
Docker is a brand name of a container management software system.
TL;DR:
Containers are a packaging concept.
VMs are a compatibility concept.
VMs are a security concept.
A container is not a process, it is an isolation of a collection of processes within a single-system-image. What is isolated? First, and foremost, the path name space. Processes within a given container share a path name space, so that they agree that /usr/bin/env is the same thing. Two processes in different containers, or perhaps inside the non-containered environment, would not necessarily see the same file for /usr/bin/env. This functionality has been a feature of UNIX derived systems for at least 40 years; under the service chroot().
More recently, containers have taken to isolate things that are not in the namespace, like processes, user ids and network interfaces. In older chroot-based systems, running ps in a container would show processes that were not in that container; although special handling hacked into to prevent a chrooted root user from gaining root access on the underlying system.
In these modern systems, not only is the pid space partitioned, but also user ids, so that root in a container does not correspond to root on the overall system.
All this is accomplished by controlling many features of the kernel in a single-system-image. The software that controls these features: Docker, amongst others.
A Virtual Machine is not part of a single-system-image. Each VM is its own logical computer, running its own kernel, shell, etc.. With some careful configuration, you can make it so various files appear within many of the VMs; but that is no different than mounting file systems exported by a network file system.
Why choose one over the other: containers share my os, and are handy to escape the .so verionitis hell caused by conflicting software systems; I can package my software in a container, and it is isolated from whatever the running system is. I cannot, however, package the kernel I need; so if my software requires ubuntu 14.02; and I am running 18.04, containers will not save me. Containers are a packaging concept.
VMs are handy to support multiple versions or types of operating systems in a single computer. Since each VM runs unique system software, I can run my 14.02 app on my 18.04 system and none is the wiser. VMs are a compatibility concept.
VMs are also handy as a security layer. Imagine that a web page has a js-bomb that can corrupt my kernel (I know, quite a stretch). If I run my browser in a container, I have corrupted my kernel. If I run it in a VM, I have corrupted that VMs kernel -- I merely have to delete it, or rewind it, and the corruption is gone. VMs are a security concept.

Do containers always require less resources or do they create additional overhead

I have been learning about the Docker and its advantages. These advantages include, but are not limited to:
Rapid Deployment
Portability
Security
Isolation
Version control
Lightweight footprint with minimal overhead
My questions:
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, ect. in individual docker containers or would the application virtual machine be added on top of the container (docker creates overhead)?
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Would it be better to run all of the apps in one container?
Video about docker: https://www.youtube.com/watch?v=YFl2mCHdv24
I found this forum reply by someone at Docker.
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, ect. in individual docker containers or would the application virtual machine be added on top of the container (docker creates overhead)?
The post seems to indicate Docker does come with overhead that scales linearly with the number of containers being ran on a given host system. This has to do with Docker being written in Golang and how Golang operates.
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Judging from the aforementioned Docker staff response, it seems like this would be the case. I'd not look at "overhead" in isolation though; running 40 different Java containers with disparate versions on one host is unrealistic for technical reasons. Docker allows you to do this easily because it isolates each process.
So if I may do a slanted comparison, the human management overhead created by managing 40 Java applications on one host would certainly be greater cost than the additional system resources you might save by foregoing containerizing them.
Would it be better to run all of the apps in one container?
Assuming the context here is system overhead, and taking into consideration the linear scaling of Docker process overhead with running container count, fewer containers = less Docker process overhead. Similar to the previous question though, you might be introducing complexity by lumping everything into one container that would cost some poor soul a lot of hours troubleshooting or fixing.
If you have several pieces (web server, app server, database, memcache, activemq), putting them in one container will eventually become very inefficient because you can't scale them individually. If you need to scale your app server, you can't just scale your app server, you have to also unnecessarily scale all the other services in the container.

To virtualize or not to virtualize a bare metal server for a kubernetes deployment

I'd like to deploy kubernetes on a large physical server (24 cores) and I'm uncertain as to a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster other than running on bare-metal.
I have the following considerations:
Creating vms will allow for work load isolation. New vms for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal a new NAMESPACE can be created for each developer for experimentation and they can run their code in it. After all their code should be running in docker containers.
Security:
Having vms would limit the amount of access given to future maintainers, limiting the amount of damage that could be done. While on the other hand the primary task for any future maintainers would be adding/deleting nodes and they would require bare metal access to do that.
Authentication:
At the moment devs would only touch the server when their code runs through the CI pipeline and their running deployments are deployed. But what about viewing logs? Could we setup tiered kubectl authentication to allow devs to only access whatever namespaces have been assigned to them (I believe this should be possible with the k8s namespace authorization plugin).
A number of vms already exist on the server. Would this be an issue?
128 cores and doubts.... That is a lot of cores for a single server.
For kubernetes however this is not relevant:
Kubernetes can use different sized servers and utilize them to the maximum. However if you combine the master server processes and the node/worker processes on a single server, you might create unwanted resource issues. You can manage those with namespaces, as you already mention.
What we do is use continuous integration with namespaces in a single dev/qa kubernetes environment in which changes have their own namespace (So we run many many namespaces) and run full environment deployments in those namespaces. A bunch of shell scripts are used to manage this. This works both with a large server as what you have, as well as it does with smaller (or virtual) boxes. The benefit of virtualization for you could mainly be in splitting the large box in smaller ones so that you can also use it for other purposes then just kubernetes (yes, kubernetes runs except MS Windows, no desktops, no kernel modules for VPN purposes, etc).
I would separate dev and prod in the form of different vms. I once had a webapp inside docker which used too many threads so the docker daemon on the host crashed. It was limited to one host luckily. You can protect this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
I think the answer is "it depends!" which is not really an answer. Personally, I would split up the machine using VM's and deploy that way. You've got better flexibility as to how much of the server's resources you carve out and you can easily create new environments, then destroy easily.
Even if these vms are really big, I think it's still easier to manage also given that you have existing vm's on the machine.
That said, there's not a technical reason that you can't run a single node server, but you may run into problems with downtime with upgrades (if that's an issue), as well as if that server needs patched or rebooted, then your entire cluster is down.
I would look at your environment needs for HA and uptime, as well as how you are going to deploy VM's (if you go that route), and decide what works the best for you.

I want to learn about virtualization

As a very beginner, I only know how to create VMs and install OS on these using Oracle VirtualBox. All the VMs created are dependent on the hardware resources (CPU, RAM etc.) of a single machine. If the machine goes down the VMs will go down. Need to know how VMs can be created using taking resources from different physical machines (manually or dynamically) to avoid failure of any VMs.
For example: There are 4 physical machines having 8 core and 16GB RAM each. Now, I want to create three VM having having 8 core and 16GB RAM taking from different physical machines. If one physical machine goes down, no VM will be down.
You can look up clustering solutions (e.g. VMware clusters, or Hyper-V failover clusters). In this model, if a physical host goes down, then the virtualization platform will power up the VMs on other hosts.
If you're looking for zero downtime, then VMware has something called Fault Tolerance in which a shadow copy of a VM is running on a different host and is continuously synchronized with the primary copy. If the primary host goes down, the shadow copy can take over with zero downtime (e.g. you don't have to boot from the shadow copy because it's already running). This feature, while cool, has a lot of real-world limitations in how it inter-operates with other features of VMware. For example, as of vSphere 6.0, you cannot do various kinds of migrations for such VMs, etc. I believe it also requires a more expensive license.
These solutions generally require some shared resources between the physical hosts (most notably storage). Otherwise they will not work (or at the very least, performance will greatly suffer).

What are the benefits of a Hypervisor VM?

I'm looking into using virtual machines to host multiple OSes and I'm looking at the free solutions which there are a lot of them. I'm confused by what a hypervisor is and why are they different or better than a "standard" virtual machine. When I mean standard I going to use the benchmark virtual machine VMWare Server 2.0.
For a dual core system with 4 GB of ram that would be capable of running a max of 3 VMs. Which is the best choice? Hypervisor or non-hypervisor and why? I've already read the Wikipedia article but the technical details are over my head. I need a basic answer of what can these different VM flavors do for me.
My main question relates to how I would do testing on multiple environments. I am concerned about the isolation of OSes so I can test applications on multiple OSes at the same time. Also which flavor gives a closer experience of how a real machine operates?
I'm considering the following:
(hypervisor)
Xen
Hyper-V
(non-hypervisor)
VirtualBox
VMWare Server 2.0
Virtual PC 2007
*The classifications of the VMs I've listed may be incorrect.
The main difference is that Hyper-V doesn't run on top of the OS but instead along with the system it runs on top of a thin layer called hypervisor. Hypervisor is a computer hardware platform virtualization software that allows multiple operating systems to run on a host computer concurrently.
Many other virtualization solution uses other techniques like emulation. For more details see Wikipedia.
Disclaimer, everything below is (broadly) my opinion.
Its helpful to consider a virtual machine monitor (a hypervisor) as a very small microkernel. It has very few jobs beyond accessing the underlying hardware, such as monitoring of event channels and granting guest domains access to specific resources .. while enforcing some kind of scheduler.
All guest machines are completely oblivious of the others, the isolation is true. Guests do not share memory with the privileged guest (or each other). So, in this instance, you could (roughly) think of each guest (even the privileged one) as a process, as far as the VMM is concerned. Typically, the first guest gets extra privileges so that it can manage the rest. This is the ideal technology to use when virtual machines are put into production and exposed to the world.
Additionally, some guests can be patched to become aware of the hypervisor, significantly increasing their performance.
On the other hand we have things like VMWare and QEMU, which rely on the host kernel to give it access to bare metal and enough memory to exist. They assume that all guests need to be presented with a complete machine, the limits put on the process presenting these (more or less) become the limits of the virtual machine. I say more or less because device mapper QoS is not commonly implemented. This is the ideal solution for trying code in some other OS, or some other architecture. A lot of people will call QEMU, Simics or even sometimes VMWare (depending on the product) a 'simulator'.
For production roll outs I use Xen, for testing something I just cross compiled I use QEMU, Simics or VirtualBox.
If you are just testing / rolling new code on various operating systems and architectures, I highly recommend #2. If your need is introspection (i.e. watching guest memory change as bad programs run in a guest) ... I'd need more explanation before answering.
Benefits of Hypervisor:
Hypervisor separates virtual machines logically, assigning each its own slice of underlying computing power, memory, and storage, thus preventing the virtual machines from interfering with each other.