What are the advantages of running Docker on a VM? - virtual-machine

Docker is an abstraction of the OS (kernel) and below; a VM is an abstraction of the hardware. What is the point of running Docker on a VM (as on Azure), apart from app portability? Should they not be hosting Docker directly on the hardware?

Docker doesn't provide effective isolation for kernel-level security exploits (there's only one ring 0, and it's shared across all containers). Thus, one could reasonably wish to have the additional isolation provided by a virtualization mechanism.
Keep in mind that much of Docker's value is not about security, but about containerization -- building and distributing portable applications in such a way as to ensure that coupling between layers occurs only where and how intended.

The advantage of a cloud system like Azure is that you can go online with your credit card and get a machine up and running in a few minutes. This is enabled by that machine being virtual. Also VMs let you share hardware across multiple users with hardware-level isolation.
If everything else was equal, i.e. you didn't need any of the features of a VM, then you would be correct that a physical machine should be used, as it will run more efficiently.

Related

Do containers always require less resources or do they create additional overhead

I have been learning about Docker and its advantages. These advantages include, but are not limited to:
Rapid Deployment
Portability
Security
Isolation
Version control
Lightweight footprint with minimal overhead
My questions:
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, etc. in individual Docker containers, or would the application virtual machine be added on top of the container (Docker creates overhead)?
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Would it be better to run all of the apps in one container?
Video about docker: https://www.youtube.com/watch?v=YFl2mCHdv24
I found this forum reply by someone at Docker.
Will it always be less stressful on a host machine to run multiple applications such as Zeppelin, Hadoop, Flume, etc. in individual Docker containers, or would the application virtual machine be added on top of the container (Docker creates overhead)?
The post seems to indicate that Docker does come with overhead that scales linearly with the number of containers being run on a given host system. This has to do with Docker being written in Golang and how Golang operates.
At some point does the number of containers running create an overhead that will cost more resources than it would to just run all of the tools directly off the host machine?
Judging from the aforementioned Docker staff response, it seems like this would be the case. I'd not look at "overhead" in isolation though; running 40 different Java containers with disparate versions on one host is unrealistic for technical reasons. Docker allows you to do this easily because it isolates each process.
So if I may make a slanted comparison, the human overhead of managing 40 Java applications on one host would certainly cost more than the system resources you might save by forgoing containers.
Would it be better to run all of the apps in one container?
Assuming the context here is system overhead, and taking into consideration the linear scaling of Docker process overhead with running container count, fewer containers = less Docker process overhead. Similar to the previous question though, you might be introducing complexity by lumping everything into one container that would cost some poor soul a lot of hours troubleshooting or fixing.
If you have several pieces (web server, app server, database, memcache, activemq), putting them in one container will eventually become very inefficient because you can't scale them individually. If you need to scale your app server, you can't just scale your app server, you have to also unnecessarily scale all the other services in the container.
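To put rough numbers on the scaling argument, here is a minimal sketch. The service names come from the paragraph above, but the memory figures are invented for illustration and are not measurements of any real deployment.

```python
# Hypothetical per-service memory footprints in MB (illustrative numbers only).
SERVICES_MB = {
    "web": 200,
    "app": 800,
    "database": 2000,
    "memcache": 500,
    "activemq": 600,
}


def memory_separate_containers(replicas):
    """Total memory when each service runs in its own container.

    `replicas` maps a service name to its instance count; services that
    are not listed run as a single instance.
    """
    return sum(mb * replicas.get(name, 1) for name, mb in SERVICES_MB.items())


def memory_single_container(copies):
    """Total memory when every service is bundled into one container,
    so scaling anything means running another copy of everything."""
    return copies * sum(SERVICES_MB.values())


if __name__ == "__main__":
    # Scale only the app server to three instances.
    print("separate containers:", memory_separate_containers({"app": 3}), "MB")
    print("one big container:  ", memory_single_container(3), "MB")
```

With these made-up figures, tripling only the app server costs 5700 MB with separate containers versus 12300 MB when everything is bundled together.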

How to setup HPC cluster on one server

I'm working on an application that needs to be tested in an HPC cluster.
I'm thinking about using xCAT as a resource manager.
I don't have many hardware resources: I have one HP desktop and a MacBook laptop.
The question: is it possible to set up a virtual cluster (using VirtualBox or KVM) on one physical machine?
Thanks,
The short answer here is yes, depending on how much memory and disk you have available on your one machine. I've done this numerous times on a MacBook Pro with 8 GB of RAM.
The long answer is that there is absolutely nothing magical about an HPC cluster. All you need to test basic parallel applications in a simulated cluster environment are two or more VMs which meet these criteria:
1. Same OS, as identical as possible.
2. Passwordless authentication (ssh key based auth).
3. Same software stack in the same location on all nodes (see #4 or use rsync).
4. At least one shared filesystem, e.g. NFS-mounted $HOME.
5. Shared network with name resolution configured (correct /etc/hosts on all nodes).
None of this requires job schedulers, provisioning tools or any complex networking. You can find many NFS setup howtos to help get one node set up to share $HOME to the others, this might be the most complicated part. VirtualBox does a good job of setting up local networking.
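For the name resolution requirement, a small helper can generate identical /etc/hosts lines to paste onto every VM. This is only a sketch: the `node01`-style names are hypothetical, and the base address assumes a VirtualBox host-only network in the 192.168.56.0/24 range, which may not match your setup.

```python
import ipaddress


def hosts_entries(base_ip, node_count, prefix="node"):
    """Return /etc/hosts lines for `node_count` nodes on consecutive IPs.

    `base_ip` is the address of the first node; the node names
    (node01, node02, ...) are made up for this example.
    """
    first = ipaddress.ip_address(base_ip)
    return [f"{first + i}\t{prefix}{i + 1:02d}" for i in range(node_count)]


if __name__ == "__main__":
    # Assumed addressing; adjust to whatever network your VMs actually use.
    for line in hosts_entries("192.168.56.101", 3):
        print(line)
```

Appending the same output to /etc/hosts on each node keeps name resolution consistent without needing DNS.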
On top of this you can layer a job scheduler like Slurm (highly recommended), provisioning tools like Warewulf or xCAT, parallel filesystems across the VMs (BeeGFS is easy to set up and a great introduction), etc. I have had a full-featured stateless cluster simulated on my MacBook Pro a number of times using tools from this list and VirtualBox VMs. It's a great way to learn about setting up an HPC cluster.

To virtualize or not to virtualize a bare metal server for a kubernetes deployment

I'd like to deploy kubernetes on a large physical server (24 cores) and I'm uncertain as to a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster rather than running on bare metal?
I have the following considerations:
Creating vms will allow for work load isolation. New vms for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal a new NAMESPACE can be created for each developer for experimentation and they can run their code in it. After all their code should be running in docker containers.
Security:
Having vms would limit the amount of access given to future maintainers, limiting the amount of damage that could be done. While on the other hand the primary task for any future maintainers would be adding/deleting nodes and they would require bare metal access to do that.
Authentication:
At the moment devs would only touch the server when their code runs through the CI pipeline and their deployments are rolled out. But what about viewing logs? Could we set up tiered kubectl authentication to allow devs to access only the namespaces that have been assigned to them? (I believe this should be possible with the k8s namespace authorization plugin.)
A number of vms already exist on the server. Would this be an issue?
128 cores and doubts.... That is a lot of cores for a single server.
For kubernetes however this is not relevant:
Kubernetes can use different sized servers and utilize them to the maximum. However if you combine the master server processes and the node/worker processes on a single server, you might create unwanted resource issues. You can manage those with namespaces, as you already mention.
What we do is use continuous integration with namespaces in a single dev/QA Kubernetes environment, in which each change has its own namespace (so we run many, many namespaces), and run full environment deployments in those namespaces. A bunch of shell scripts are used to manage this. This works with a large server like yours as well as with smaller (or virtual) boxes. The benefit of virtualization for you could mainly be in splitting the large box into smaller ones so that you can also use it for purposes other than Kubernetes (yes, Kubernetes runs a lot, but not MS Windows, desktops, kernel modules for VPN purposes, etc.).
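On the question of restricting devs to their own namespaces: one way this might be done is with Kubernetes RBAC (a Role plus a RoleBinding per namespace). That is my assumption about the mechanism, not something the setup above is confirmed to use. The sketch below only builds the manifests as JSON; the namespace name, user name, and the `log-reader` role name are hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespace_access_manifests(namespace, user, role_name="log-reader"):
    """Build a Namespace, a read-only Role, and a RoleBinding for one dev.

    All names are illustrative. `user` must also be valid as part of a
    Kubernetes object name, since it is reused in the RoleBinding name.
    """
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {
                "apiVersion": "v1",
                "kind": "Namespace",
                "metadata": {"name": namespace},
            },
            {
                "apiVersion": f"{RBAC_GROUP}/v1",
                "kind": "Role",
                "metadata": {"name": role_name, "namespace": namespace},
                # Read-only access to pods and their logs in this namespace.
                "rules": [
                    {
                        "apiGroups": [""],
                        "resources": ["pods", "pods/log"],
                        "verbs": ["get", "list", "watch"],
                    }
                ],
            },
            {
                "apiVersion": f"{RBAC_GROUP}/v1",
                "kind": "RoleBinding",
                "metadata": {"name": f"{role_name}-{user}", "namespace": namespace},
                "subjects": [
                    {"kind": "User", "name": user, "apiGroup": RBAC_GROUP}
                ],
                "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical namespace and user; pipe the output to `kubectl apply -f -`.
    print(json.dumps(namespace_access_manifests("dev-alice", "alice"), indent=2))
```

Because the Role is namespaced, the binding grants nothing outside that one namespace, which matches the per-developer split described in the question.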
I would separate dev and prod into different VMs. I once had a webapp inside Docker which used so many threads that the Docker daemon on the host crashed. Luckily it was limited to one host. You can protect against this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
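As a rough illustration of "setting limits", the sketch below builds a `docker run` command line that caps memory, CPU, and the number of processes/threads using the `--memory`, `--cpus`, and `--pids-limit` flags. The image name and the specific limit values are placeholders I made up, not recommendations.

```python
import shlex


def docker_run_args(image, memory="512m", cpus="1.0", pids_limit=200):
    """Return an argv list for `docker run` with basic resource limits.

    The default values are arbitrary examples; tune them per workload.
    """
    return [
        "docker", "run", "--detach",
        "--memory", memory,               # hard cap on container memory
        "--cpus", cpus,                   # fraction/number of CPUs allowed
        "--pids-limit", str(pids_limit),  # cap on processes and threads
        image,
    ]


if __name__ == "__main__":
    # "example/webapp:dev" is a hypothetical image name.
    print(shlex.join(docker_run_args("example/webapp:dev")))
```

The `--pids-limit` cap is the one most relevant to the runaway-thread case described above, since threads count against it.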
I think the answer is "it depends!", which is not really an answer. Personally, I would split up the machine using VMs and deploy that way. You've got better flexibility as to how much of the server's resources you carve out, and you can easily create new environments and then destroy them.
Even if these VMs are really big, I think it's still easier to manage, especially given that you have existing VMs on the machine.
That said, there's no technical reason that you can't run a single-node server, but you may run into downtime during upgrades (if that's an issue), and if that server needs to be patched or rebooted, your entire cluster is down.
I would look at your environment's needs for HA and uptime, as well as how you are going to deploy VMs (if you go that route), and decide what works best for you.

Is a google compute virtual machine highly available?

So I have a cloud virtual machine on google compute, does this mean by nature that it is highly available? If the VM is running on a single piece of hardware on GCE, if the piece of hardware breaks then the VM could go down. Is the VM running on some kind of RAID, but for servers? So if one of the machines goes down another machine will pick up and continue running the vm? Thanks.
The machine itself is not highly available. However, Google takes several steps to increase reliability:
Storage is replicated and independent of the physical machine the VM is running on (obviously not for local SSD). This means that even if the physical machine catches on fire, only the "runtime" state is lost but the attached disks are fine.
VMs can live-migrate. This is a setting you can control. If enabled, the VM will be migrated to a different physical machine on maintenance events. Live-migration can lead to brief performance degradation while memory etc. is synced to the other host but the machine is not shut down / restarted.
Even when the physical host suddenly dies, you can set your instance to restart automatically on a new machine. If you plan to use this mode, make sure your instance is able to cleanly boot to serving state without manual intervention.
If you need high availability, the best approach is to spread your instances among zones of the same region and use a network or HTTP(S) load balancer. These will automatically stop sending traffic to a machine if it becomes unhealthy. Also see this short YouTube video on Google's network architecture for more info.
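To see why spreading instances helps, here is a back-of-the-envelope calculation. It assumes instance failures are independent and that the load balancer itself never fails, which is a simplification; the 99% figure is an invented example, not a Google number.

```python
def combined_availability(per_instance, instances):
    """Probability that at least one of `instances` replicas is up,
    assuming each is up with probability `per_instance` independently."""
    return 1 - (1 - per_instance) ** instances


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "instance(s):", round(combined_availability(0.99, n), 6))
```

Under these assumptions, two instances at 99% each give roughly 99.99% combined, which is the intuition behind running in more than one zone.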
For high availability of your application data, there are highly available options like Datastore for database-like usage and Cloud Storage for file-oriented data. Keep in mind that Cloud SQL also runs on a single instance/physical machine, which means that you have to set up slaves/replicas to get high availability. However, you can also do that with your favorite DB system on plain Compute Engine instances if you are willing to maintain them yourself.

What are the benefits of a Hypervisor VM?

I'm looking into using virtual machines to host multiple OSes, and I'm looking at the free solutions, of which there are a lot. I'm confused about what a hypervisor is and why hypervisors are different from or better than a "standard" virtual machine. By "standard" I mean VMWare Server 2.0, which I'm using as my benchmark virtual machine.
I have a dual-core system with 4 GB of RAM that would be capable of running a max of 3 VMs. Which is the best choice: hypervisor or non-hypervisor, and why? I've already read the Wikipedia article, but the technical details are over my head. I need a basic answer about what these different VM flavors can do for me.
My main question relates to how I would do testing on multiple environments. I am concerned about the isolation of OSes so I can test applications on multiple OSes at the same time. Also which flavor gives a closer experience of how a real machine operates?
I'm considering the following:
(hypervisor)
Xen
Hyper-V
(non-hypervisor)
VirtualBox
VMWare Server 2.0
Virtual PC 2007
*The classifications of the VMs I've listed may be incorrect.
The main difference is that Hyper-V doesn't run on top of the OS; instead, it and the host system both run on top of a thin layer called the hypervisor. A hypervisor is hardware platform virtualization software that allows multiple operating systems to run on a host computer concurrently.
Many other virtualization solutions use other techniques, like emulation. For more details see Wikipedia.
Disclaimer, everything below is (broadly) my opinion.
It's helpful to consider a virtual machine monitor (a hypervisor) as a very small microkernel. It has very few jobs beyond accessing the underlying hardware, such as monitoring event channels and granting guest domains access to specific resources, while enforcing some kind of scheduler.
All guest machines are completely oblivious of the others, the isolation is true. Guests do not share memory with the privileged guest (or each other). So, in this instance, you could (roughly) think of each guest (even the privileged one) as a process, as far as the VMM is concerned. Typically, the first guest gets extra privileges so that it can manage the rest. This is the ideal technology to use when virtual machines are put into production and exposed to the world.
Additionally, some guests can be patched to become aware of the hypervisor, significantly increasing their performance.
On the other hand we have things like VMWare and QEMU, which rely on the host kernel to give them access to bare metal and enough memory to exist. They assume that all guests need to be presented with a complete machine; the limits put on the process presenting these (more or less) become the limits of the virtual machine. I say more or less because device mapper QoS is not commonly implemented. This is the ideal solution for trying code in some other OS, or some other architecture. A lot of people will call QEMU, Simics or even sometimes VMWare (depending on the product) a 'simulator'.
For production roll outs I use Xen, for testing something I just cross compiled I use QEMU, Simics or VirtualBox.
If you are just testing / rolling new code on various operating systems and architectures, I highly recommend the second kind (hosted tools like QEMU or VirtualBox). If your need is introspection (i.e. watching guest memory change as bad programs run in a guest) ... I'd need more explanation before answering.
Benefits of Hypervisor:
Hypervisor separates virtual machines logically, assigning each its own slice of underlying computing power, memory, and storage, thus preventing the virtual machines from interfering with each other.
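As a toy illustration of "its own slice" of memory, the check below tests whether a set of fixed guest allocations fits on a host. It is a deliberate simplification: the 512 MB host reserve is an arbitrary assumption, and it ignores any memory overcommit features a particular hypervisor may offer.

```python
def fits_on_host(host_mem_mb, guest_mem_mb, reserve_mb=512):
    """Return True if fixed guest memory slices plus a host reserve
    fit within the host's physical memory (no overcommit assumed)."""
    return sum(guest_mem_mb) + reserve_mb <= host_mem_mb


if __name__ == "__main__":
    # The 4 GB, 3-VM case from the question, with two example slice sizes.
    print(fits_on_host(4096, [1024, 1024, 1024]))
    print(fits_on_host(4096, [1536, 1536, 1536]))
```

On the 4 GB machine from the question, three 1 GB guests fit with room for the host, while three 1.5 GB guests do not.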