Apache Mesos vs. Apache CloudStack

The problem of managing infrastructure (private or public cloud) easily and at scale is addressed by Apache Mesos, Apache CloudStack and OpenStack.
I have a few questions in this regard and was hoping someone could shed some light on them.
Are there any articles that compare and contrast the above?
Why run one on top of the other at all? (I see from tutorials that one can run on top of the other.)
It seems like CloudStack is centered around VMs (hypervisors), while Mesos is centered around scheduling and allocating resources inside VMs for different co-existing software systems. Am I right in my conclusion?
If so, why does Mesos claim that it can manage bare-metal boxes without even needing a hypervisor? Is this for workloads that do not do well on VMs (especially LSM-based systems such as Solr/Lucene and HBase)?
Does it come down to the choice of Linux VMs vs. Linux containers for resource allocation? In other words, is Mesos a Linux-container-based framework and CloudStack a Linux-VM-based framework?

Related

Isn't a virtual machine just a kind of process?

I'm trying to understand the basic concepts of Docker, and lots of docs say that "Docker is not a virtual machine, but a process". To me this sentence looks quite awkward, since as far as I know a virtual machine itself also runs on the host OS, which makes it a 'process' too.
Is there any big difference between the way a virtual machine works and the way other normal applications/processes do?
Docker is the brand name of a container management system.
TL;DR:
Containers are a packaging concept.
VMs are a compatibility concept.
VMs are a security concept.
A container is not a process; it is an isolation of a collection of processes within a single system image. What is isolated? First and foremost, the path name space. Processes within a given container share a path name space, so they agree on what /usr/bin/env is. Two processes in different containers, or one in a container and one in the non-containerized environment, would not necessarily see the same file at /usr/bin/env. This functionality has been a feature of UNIX-derived systems for at least 40 years, in the form of the chroot() system call.
More recently, containers have also taken to isolating things that are not in the path name space, such as processes, user IDs and network interfaces. In older chroot-based systems, running ps in a container would show processes that were not in that container, although special handling was hacked in to prevent a chrooted root user from gaining root access to the underlying system.
In these modern systems, not only is the PID space partitioned, but user IDs are too, so that root in a container need not correspond to root on the overall system.
All this is accomplished by controlling many features of the kernel (on Linux, namespaces and cgroups) within a single system image. The software that controls these features is Docker, among others.
A virtual machine is not part of a single system image. Each VM is its own logical computer, running its own kernel, shell, etc. With some careful configuration, you can make various files appear within many of the VMs, but that is no different from mounting file systems exported by a network file system.
Why choose one over the other? Containers share my OS, and they are handy for escaping the .so versionitis hell caused by conflicting software systems: I can package my software in a container, and it is isolated from whatever the running system is. I cannot, however, package the kernel I need; so if my software requires the Ubuntu 14.04 kernel and I am running 18.04, containers will not save me. Containers are a packaging concept.
VMs are handy for supporting multiple versions or types of operating systems on a single computer. Since each VM runs its own system software, I can run my 14.04 app on my 18.04 system and none is the wiser. VMs are a compatibility concept.
VMs are also handy as a security layer. Imagine that a web page has a JS bomb that can corrupt my kernel (I know, quite a stretch). If I run my browser in a container, I have corrupted my host kernel. If I run it in a VM, I have corrupted only that VM's kernel; I merely have to delete it, or rewind it, and the corruption is gone. VMs are a security concept.
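To make the path name space idea concrete, here is a minimal, purely lexical simulation of what chroot() does to path lookup. The container root directories (/var/lib/c1, /var/lib/c2) are hypothetical, and unlike the real system call this sketch does not follow symlinks. It only shows that the same absolute path maps to different files under different roots, and that ".." cannot climb above the root.

```python
import posixpath


def resolve_in_root(root: str, path: str) -> str:
    """Map an absolute path, as seen inside a chroot-style container,
    to the host path it refers to (lexically; symlinks are not followed)."""
    # Force a single leading slash, then let normpath collapse '.', '..'
    # and duplicate slashes. For absolute paths, normpath drops any '..'
    # that would go above '/', which mirrors chroot's "no escape" rule.
    inside = posixpath.normpath("/" + path.lstrip("/"))
    host_root = posixpath.normpath(root)
    if inside == "/":
        return host_root
    return posixpath.join(host_root, inside.lstrip("/"))


if __name__ == "__main__":
    # Two hypothetical containers disagree about what /usr/bin/env is.
    for root in ("/", "/var/lib/c1", "/var/lib/c2"):
        print(root, "->", resolve_in_root(root, "/usr/bin/env"))
```

Running it prints one host path per root, showing that processes with different roots do not see the same /usr/bin/env.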
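On Linux, the user ID partitioning described above is expressed as a mapping that the kernel exposes in /proc/&lt;pid&gt;/uid_map. Each line there has three fields: first ID inside the namespace, first ID outside it, and range length. The sketch below parses that format and translates a container UID to a host UID. The "0 100000 65536" mapping in the example is an assumed, typical rootless-container layout, not something taken from the answer.

```python
from __future__ import annotations


def parse_id_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        inside, outside, length = (int(f) for f in fields)
        ranges.append((inside, outside, length))
    return ranges


def to_host_id(ns_id: int, ranges: list[tuple[int, int, int]]) -> int | None:
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= ns_id < inside + length:
            return outside + (ns_id - inside)
    # Unmapped IDs have no host identity; the kernel reports them as the
    # overflow ID (usually 65534).
    return None


if __name__ == "__main__":
    ranges = parse_id_map("0 100000 65536\n")
    print(to_host_id(0, ranges))     # container root is an unprivileged host UID
    print(to_host_id(1000, ranges))
```

Under this assumed mapping, "root" (UID 0) inside the container is UID 100000 on the host, which is why it does not carry host root privileges.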
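You can inspect these kernel features directly. Linux exposes each process's namespace memberships as symbolic links under /proc/self/ns. Two processes in the same container show identical identifiers, and processes in different containers show different ones. This is a small sketch that lists them. It returns an empty dict on systems without that directory, such as macOS, and skips entries that a restricted sandbox will not let it read.

```python
from __future__ import annotations

import os


def current_namespaces(ns_dir: str = "/proc/self/ns") -> dict[str, str]:
    """Return {namespace entry: identifier}, e.g. {'pid': 'pid:[4026531836]'}."""
    result: dict[str, str] = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, or /proc not mounted
    for name in sorted(names):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries may be unreadable in restricted sandboxes
    return result


if __name__ == "__main__":
    for kind, ident in current_namespaces().items():
        print(f"{kind:20} {ident}")
```

Running this once on the host and once inside a container makes the mnt, pid, net and user identifiers visibly differ.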

How to set up an HPC cluster on one server

I'm working on an application that needs to be tested on an HPC cluster.
I'm thinking about using xCAT as a resource manager.
I don't have many hardware resources: I have one HP desktop and a MacBook laptop.
The question: is it possible to set up a virtual cluster (using VirtualBox or KVM) on a single machine?
Thanks,
The short answer here is yes, depending on how much memory and disk you have available on your one machine. I've done this numerous times on a MacBook Pro with 8 GB of RAM.
The long answer is that there is absolutely nothing magical about an HPC cluster. All you need to test basic parallel applications in a simulated cluster environment are two or more VMs which meet these criteria:
1. Same OS, as identical as possible.
2. Passwordless authentication (SSH key-based auth).
3. Same software stack in the same location on all nodes (see #4, or use rsync).
4. At least one shared filesystem, e.g. an NFS-mounted $HOME.
5. A shared network with name resolution configured (correct /etc/hosts on all nodes).
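For the name-resolution item, the /etc/hosts entries can be generated rather than typed by hand on every VM. This is a minimal sketch. The node prefix, the 192.168.56.0/24 subnet (assumed here to be a VirtualBox host-only network) and the starting offset of 10 are all illustrative assumptions, so adjust them to whatever addresses your VMs actually get.

```python
import ipaddress


def cluster_hosts(prefix: str, count: int,
                  network: str = "192.168.56.0/24", first_host: int = 10) -> str:
    """Return /etc/hosts lines for nodes prefix01, prefix02, ... with
    consecutive addresses starting at (network address + first_host)."""
    net = ipaddress.ip_network(network)
    lines = []
    for i in range(count):
        ip = net.network_address + first_host + i
        # Refuse addresses outside the subnet or equal to its broadcast address.
        if ip not in net or ip == net.broadcast_address:
            raise ValueError(f"{ip} is not a usable host address in {net}")
        lines.append(f"{ip}\t{prefix}{i + 1:02d}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Append this output to /etc/hosts on every node.
    print(cluster_hosts("node", 4), end="")
```

Because the same text goes onto every node, all VMs agree on every hostname, which is what criterion 5 asks for.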
None of this requires job schedulers, provisioning tools or any complex networking. You can find many NFS setup how-tos to help you get one node sharing $HOME with the others; this might be the most complicated part. VirtualBox does a good job of setting up local networking.
On top of this you can layer a job scheduler like Slurm (highly recommended), provisioning tools like Warewulf or xCAT, parallel filesystems across the VMs (BeeGFS is easy to set up and a great introduction), etc. I have simulated a full-featured stateless cluster on my MacBook Pro a number of times using tools from this list and VirtualBox VMs. It's a great way to learn about setting up an HPC cluster.
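The NFS part usually comes down to two lines: one in /etc/exports on the node that serves $HOME, and one in /etc/fstab on each of the other nodes. This is a minimal sketch that produces both. The server name node01, the subnet and the mount options are assumptions for illustration. After editing /etc/exports on the server, `exportfs -ra` re-reads it.

```python
from __future__ import annotations


def nfs_home_config(server: str, network: str = "192.168.56.0/24",
                    export: str = "/home") -> tuple[str, str]:
    """Return (exports_line, fstab_line) for sharing `export` from `server`."""
    # No space between the client spec and "(": in /etc/exports a space
    # would apply the options to *all* hosts instead of this subnet.
    exports_line = f"{export} {network}(rw,sync,no_subtree_check)"
    # _netdev tells the boot process to wait for networking before mounting.
    fstab_line = f"{server}:{export} {export} nfs defaults,_netdev 0 0"
    return exports_line, fstab_line


if __name__ == "__main__":
    exports, fstab = nfs_home_config("node01")
    print("on the server, /etc/exports:", exports)
    print("on the clients, /etc/fstab: ", fstab)
```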
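If you do add Slurm, the part of slurm.conf that describes a small virtual cluster is short. The sketch below generates only the node and partition lines. A working file also needs authentication, logging and other settings, which are omitted here. The cluster name, node names, CPU count and memory are hypothetical. RealMemory is in megabytes and should not exceed what the VM really has, or Slurm will drain the node. SlurmctldHost is the newer spelling of the older ControlMachine parameter.

```python
from __future__ import annotations


def slurm_conf_fragment(cluster: str, controller: str, nodes: list[str],
                        cpus: int, mem_mb: int) -> str:
    """Return the node/partition portion of a slurm.conf for identical VMs."""
    names = ",".join(nodes)  # slurm.conf accepts comma-separated node names
    return "\n".join([
        f"ClusterName={cluster}",
        f"SlurmctldHost={controller}",
        f"NodeName={names} CPUs={cpus} RealMemory={mem_mb} State=UNKNOWN",
        f"PartitionName=debug Nodes={names} Default=YES MaxTime=INFINITE State=UP",
    ]) + "\n"


if __name__ == "__main__":
    print(slurm_conf_fragment("vcluster", "node01",
                              ["node01", "node02", "node03"], cpus=2, mem_mb=1800), end="")
```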

To virtualize or not to virtualize a bare metal server for a kubernetes deployment

I'd like to deploy Kubernetes on a large physical server (24 cores), and I'm uncertain about a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster versus running on bare metal?
I have the following considerations:
Creating VMs will allow for workload isolation. New VMs for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal, a new namespace can be created for each developer for experimentation, and they can run their code in it. After all, their code should be running in Docker containers.
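A per-developer namespace is usually paired with a ResourceQuota, so that one developer's experiment cannot consume the whole 24-core box. This is a minimal sketch that builds both objects as plain Python dicts and prints them as a JSON `List`, which `kubectl apply -f -` accepts on stdin. The "dev-" naming scheme and the quota sizes are assumptions. Note that once a quota covers CPU and memory, pods in that namespace must declare requests and limits (or get defaults from a LimitRange), or they will be rejected.

```python
from __future__ import annotations

import json


def dev_namespace(dev: str, cpu: str = "4", memory: str = "8Gi") -> list[dict]:
    """Return a Namespace plus a ResourceQuota capping what `dev` can use."""
    ns = f"dev-{dev}"  # hypothetical naming convention
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": ns}},
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "dev-quota", "namespace": ns},
            "spec": {"hard": {
                "requests.cpu": cpu, "requests.memory": memory,
                "limits.cpu": cpu, "limits.memory": memory,
            }},
        },
    ]


if __name__ == "__main__":
    # Usage: python3 this_script.py | kubectl apply -f -
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": dev_namespace("alice")}, indent=2))
```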
Security:
Having VMs would limit the amount of access given to future maintainers, limiting the damage that could be done. On the other hand, the primary task for any future maintainer would be adding/deleting nodes, and they would require bare-metal access to do that.
Authentication:
At the moment, devs would only touch the server when their code runs through the CI pipeline and their deployments are rolled out. But what about viewing logs? Could we set up tiered kubectl authentication so that devs can only access the namespaces assigned to them? (I believe this should be possible with Kubernetes' namespace-scoped authorization, i.e. RBAC.)
A number of VMs already exist on the server. Would this be an issue?
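For the log-viewing question, Kubernetes RBAC can grant read access scoped to a single namespace with a Role and a RoleBinding. This sketch builds both objects. It allows `get`, `list` and `watch` on pods and on the `pods/log` subresource, which is what `kubectl logs` needs. The user name, namespace and role name are hypothetical, and the user must already be known to the cluster's authentication setup. Afterwards, `kubectl auth can-i get pods/log -n dev-alice --as alice` is a quick way to check the result.

```python
from __future__ import annotations

import json

RBAC_API = "rbac.authorization.k8s.io"


def log_reader_rbac(user: str, namespace: str) -> list[dict]:
    """Return a Role and RoleBinding letting `user` read pods/logs in one namespace."""
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": "log-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group (pods live there)
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"log-reader-{user}", "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": "log-reader", "apiGroup": RBAC_API},
    }
    return [role, binding]


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": log_reader_rbac("alice", "dev-alice")}, indent=2))
```

Because a Role (unlike a ClusterRole) is namespaced, the binding grants nothing outside dev-alice.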
128 cores and doubts... That is a lot of cores for a single server.
For Kubernetes, however, this is not relevant:
Kubernetes can use servers of different sizes and utilize them to the maximum. However, if you combine the master processes and the node/worker processes on a single server, you might create unwanted resource contention. You can manage that with namespaces, as you already mention.
What we do is use continuous integration with namespaces in a single dev/QA Kubernetes environment, in which each change gets its own namespace (so we run many, many namespaces), and we run full environment deployments in those namespaces. A bunch of shell scripts are used to manage this. This works with a large server like yours just as well as with smaller (or virtual) boxes. The benefit of virtualization for you would mainly be splitting the large box into smaller ones, so that you can also use it for purposes other than just Kubernetes (yes, Kubernetes runs almost anything, except MS Windows, desktops, kernel modules for VPN purposes, etc.).
I would separate dev and prod into different VMs. I once had a webapp inside Docker that used so many threads that the Docker daemon on the host crashed. Luckily, it was limited to one host. You can protect against this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
I think the answer is "it depends!", which is not really an answer. Personally, I would split up the machine using VMs and deploy that way. You get more flexibility in how much of the server's resources you carve out, and you can easily create new environments and destroy them just as easily.
Even if these VMs are really big, I think it's still easier to manage, especially given that you already have existing VMs on the machine.
That said, there's no technical reason you can't run a single-node cluster, but you may run into downtime during upgrades (if that's an issue), and if that server needs to be patched or rebooted, your entire cluster is down.
I would look at your environment's needs for HA and uptime, as well as how you are going to deploy VMs (if you go that route), and decide what works best for you.
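One detail any "namespace per change" script has to handle is that namespace names must be valid DNS-1123 labels: lowercase alphanumerics and "-", starting and ending with an alphanumeric, at most 63 characters. Branch names such as feature/JIRA-123_Login are not. This is a hypothetical helper, not the answerer's actual scripts, that derives a safe, stable name from a branch. It shortens long names and appends a hash so that distinct long branches do not collide. The "ci-" prefix is an assumption.

```python
import hashlib
import re

MAX_LABEL = 63  # Kubernetes limit for namespace names (DNS-1123 label)


def namespace_for_branch(branch: str, prefix: str = "ci-") -> str:
    """Derive a valid, deterministic namespace name from a VCS branch name."""
    # Collapse every run of characters outside [a-z0-9] into a single "-".
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    name = prefix + (slug or "branch")
    if len(name) > MAX_LABEL:
        # Keep a readable head and add 8 hex chars of a hash for uniqueness.
        digest = hashlib.sha1(branch.encode("utf-8")).hexdigest()[:8]
        name = name[:MAX_LABEL - 9].rstrip("-") + "-" + digest
    return name


if __name__ == "__main__":
    print(namespace_for_branch("feature/JIRA-123_Login"))
```

Deterministic names matter here: every CI run for the same change lands in the same namespace, so a redeploy updates it instead of creating a new one.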
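The thread-explosion failure described above is what Docker's cgroup-backed limits are for. Threads count against the PID limit, so `--pids-limit` caps a runaway process. `--memory` and `--cpus` bound the rest. This sketch only builds the `docker run` argument list. The image name and limit values are hypothetical, and you would pass the list to `subprocess.run(args, check=True)` to actually start the container.

```python
from __future__ import annotations


def limited_run_args(image: str, pids: int = 512, memory: str = "2g",
                     cpus: float = 2.0, name: str | None = None) -> list[str]:
    """Build a `docker run` command line with PID, memory and CPU limits."""
    args = [
        "docker", "run", "--detach",
        f"--pids-limit={pids}",  # caps processes *and* threads in the container
        f"--memory={memory}",    # hard memory limit
        f"--cpus={cpus}",        # CPU quota, in cores
    ]
    if name:
        args.append(f"--name={name}")
    args.append(image)  # the image must come after all options
    return args


if __name__ == "__main__":
    # import subprocess; subprocess.run(limited_run_args("myapp:dev"), check=True)
    print(" ".join(limited_run_args("myapp:dev", name="webapp-dev")))
```

In Kubernetes, the equivalent for CPU and memory is `resources.limits` in the pod spec. Limits reduce the blast radius, but as the answer says, they are not a substitute for keeping prod on separate VMs.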

Configuring Development Environments

If the development environment runs on the host, is there a fast way to configure it and deploy it to multiple machines? If so, how?
Vagrant does this, but with virtual machines, which may be painfully slow.
What are some viable alternatives?
Absolutely, if you are talking about using a resource pool to deploy and configure your (dev) VMs.
For the sake of simplicity, I'll assume your virtual environment is hosted on VMware.
Here are a few things to start off with:
For automating the creation of VMs (deploying from templates, configuring networking, etc.) you can use VMware PowerCLI (PowerShell cmdlets).
If your dev environment is purely Windows, then for configuration management you can use PowerShell DSC (free) or, if your organization can afford it, go the more expensive route of SCCM.
If you have a lot of Linux boxes to configure, you fortunately have more than one option: Ansible, Chef, Puppet or SaltStack.
How you spin up and configure these machines on demand depends entirely on your needs. One of the more common ways to do it is to create a VM template (aka base VM) and then deploy VMs from that template.
This base template is usually a bare-bones OS plus some common utilities and tools. Once a VM is deployed from the template, you can use one of the configuration management (CM) tools mentioned above to install and configure it.
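The hand-off between "deploy from template" and "configure with a CM tool" is usually just a list of the new VMs' addresses. With Ansible, for example, that list is an inventory file. This minimal sketch writes one in Ansible's INI inventory format from a mapping of group to hosts. The group name, host names and IP addresses are hypothetical, and in practice they would come from whatever created the VMs. You would then run something like `ansible-playbook -i inventory.ini site.yml`, where site.yml is your (hypothetical) playbook.

```python
from __future__ import annotations


def ansible_inventory(groups: dict[str, dict[str, str]]) -> str:
    """Render {group: {hostname: address}} as an Ansible INI inventory."""
    chunks = []
    for group, hosts in groups.items():
        lines = [f"[{group}]"]
        # ansible_host tells Ansible which address to connect to for this name.
        lines += [f"{host} ansible_host={addr}" for host, addr in hosts.items()]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    print(ansible_inventory({"devvms": {"dev01": "10.0.0.21",
                                        "dev02": "10.0.0.22"}}), end="")
```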
Hope this gives some pointers in the right direction

Is virtualization still relevant with docker?

I've read this article:
How is Docker different from a normal virtual machine?
I fully intend to convert all my virtual machine images into Docker containers.
I can't see any angle from which VMs still make sense...
So what's the point of VMs now? OK... maybe desktop virtualization, to get PulseAudio working?
Once Docker solves this, what else?
UPDATE
Okay... so I can't run Docker on "non-Linux" flavour hosts...
For one thing, you can't run an operating system within your container that is different from the OS on the host.
On Windows and Mac OS X, boot2docker is used to run Docker: it is a VirtualBox VM running a minimal Linux OS, which in turn runs Docker.
The benefits of containers are clear and well known, but the disadvantages have been glossed over somewhat.
Specifically, you don't just need the same OS type (i.e. Linux); you also get the same kernel version (including any modifications you want). Since containers are an OS construct, there are resource islands per OS kernel version (and different implementations for Windows, BSD or any other non-Linux OS, where they exist).
VMs are secured with CPU-level isolation; containers are secured with OS-level isolation (with an arguably bigger attack surface).
There are many claims out there that containers are as slow and as big as VMs once you load your container up with everything you need for production and add lots of overlays, but these are all anecdotal, and no large-scale survey or trustworthy data is available yet.
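Because a container shares the host kernel, software that needs a particular kernel feature can only check what it got; it cannot bring its own kernel. This sketch parses the kernel release string that `platform.release()` returns on Linux, such as "5.15.0-91-generic", and compares it with a required minimum. Inside a container, that string is the host's kernel, not something the image controls. On macOS, `platform.release()` reports the Darwin version instead, so the comparison is only meaningful on Linux. The required version in the usage line is a hypothetical example.

```python
from __future__ import annotations

import platform
import re


def kernel_version(release: str | None = None) -> tuple[int, int, int]:
    """Parse 'major.minor[.patch]...' from a kernel release string."""
    release = platform.release() if release is None else release
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def kernel_at_least(required: tuple[int, int, int], release: str | None = None) -> bool:
    """True if the running (or given) kernel is at least `required`."""
    return kernel_version(release) >= required


if __name__ == "__main__":
    # Same answer inside and outside a container on the same host.
    print(platform.release(), kernel_at_least((4, 15, 0)))
```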