Docker: what real value does it bring to our team? - testing

I am very new to Docker. Our team has had a very nice deployment pipeline, with different CI engines for different projects, including Jenkins and TeamCity.
Developers usually check in and CI takes over: it deploys, and the build is ready for the test team to test. I always thought this was a perfect model. Of course, some parts of our implementation have their flaws, but it worked very well for what we wanted.
Now our DevOps team is introducing Docker, where test teams get a Docker image from the Docker registry every time we run a build from TeamCity. While it sounds really fancy, I am still failing to understand the benefit of it.
After my research, my conclusion was that Docker containers can be a good lightweight replacement for VMs. But that only helps if you are using VMs, and we are not using any. I just do not understand what the real value is here. Also, while searching I found a relatively good link on Docker:
https://www.ctl.io/developers/blog/post/what-is-docker-and-when-to-use-it/
There they discuss when you should use Docker, and one of the points says:
Use Docker whenever your app needs to go through multiple phases of development (dev/test/qa/prod, try Drone or Shippable, both do Docker CI/CD)
OK. However, they do not elaborate on why Docker is useful when my app has to go through multiple phases.
And how is it so helpful compared with a regular dev/test setup when the existing setup is already working smoothly?

First, you are right that Docker is similar to a VM in some respects. However, Docker is incredibly lightweight. This property is the one that surprised me most in the beginning. As opposed to virtual machines, containers share resources much more efficiently. Each virtual machine is fully isolated and carries its own guest operating system, whereas containers share the host kernel, so many of them can run simultaneously on a host machine with very little overhead. You can configure containers to be able to talk to each other (via volume or port bindings).
Furthermore, in my team, Docker brings the following benefits:
Our product consists of one big application and several microservices. We want to release them all as one package with their inter-dependencies, which eliminates the problem of figuring out which versions of the application and microservices should be deployed together (compatibility). That is, the images contain all you need, and you can bring all the applications up or down together or one by one using docker-compose. You do not need to deploy; you simply pull the image and start one or more containers. If you wish to stop one of the microservices, it can be done without affecting the others.
Developers in the team can run the very same image on a local machine, for example to troubleshoot a problem that occurred in production, which means troubleshooting can be done in the same environment as production. This brings environment standardization and no more "but it works on my machine" talk.
Another benefit: we build a Docker image, run our tests against it, and push it to the registry once all these phases succeed, which translates into great portability.
The ability to version control the containers. You can easily inspect the differences between the current version and previous versions. If you wish to roll back, that is done smoothly.
Isolating and securing applications. All containers are isolated and you can easily control what goes in and out.
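The build, test and push flow from the list above can be sketched roughly as follows. This is only an illustration, not our actual pipeline: the image name and test command in the usage note are hypothetical, and the helper names `pipeline_commands` and `run_pipeline` are made up for this example. It relies only on the standard `docker build -t`, `docker run --rm` and `docker push` commands.

```python
from typing import Callable, List, Sequence

Command = List[str]


def pipeline_commands(image: str, test_cmd: Sequence[str], context: str = ".") -> List[Command]:
    """Return the docker CLI calls for build -> test -> push, in order."""
    return [
        ["docker", "build", "-t", image, context],    # build the image from the context dir
        ["docker", "run", "--rm", image, *test_cmd],  # run the tests inside that image
        ["docker", "push", image],                    # publish it to the registry
    ]


def run_pipeline(commands: Sequence[Command], runner: Callable[[Command], int]) -> bool:
    """Run the commands in order and stop at the first non-zero exit code."""
    for cmd in commands:
        if runner(cmd) != 0:
            return False  # a failed build or test means nothing gets pushed
    return True
```

In a real CI job the runner could be something like `lambda cmd: subprocess.run(cmd).returncode`, called with a hypothetical image such as `registry.example.com/team/app:1.0`. Passing the runner in keeps the sketch easy to dry-run on a machine without Docker.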

It took me a year before I got used to the idea, but now it seems simple enough.
I think part of that comes from the fact that people keep calling Docker a "virtual machine", which is not accurate. That's really just a nickname for what's happening behind the scenes. In a lot of ways, Docker will NOT replace a complete virtualization solution, such as VMware. It does, however, bring forth a new way of thinking about infrastructure, one that many people have a difficult time wrapping their heads around.
You can start asking yourself: What makes a Linux distribution unique?
Aside from the kernel, everything else is just a "standard way" of organizing binaries, libraries, runtime and configuration files. You need your binaries in /bin, your libs in /lib, your configuration in /etc. User installations get placed under /usr...
Most distributions keep the main structure from the Unix legacy and add their own quirks. Each one has its own way to manage and distribute packages. Each maintains its own versions of libraries, drivers, etc.
The key ingredient is the kernel. That's something they all have in common. Nowadays, recent builds of the Linux kernel are compatible with pretty much all major distributions available. So, aside from /boot, most of everything else is just a matter of having the right files in the right place with the right permissions.
Now, imagine you take all that distribution bundle (except the kernel) and place it all in another directory of your running OS. Taking advantage of the same kernel you are already running, you isolate a new process so that it "thinks" that / is now that directory. Bingo! This process now "thinks" it's running all by itself on another operating system.
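As a rough illustration of that idea (not how Docker itself is implemented), the filesystem part can be sketched with a plain chroot. This assumes a Unix-like system, root privileges, and a directory that already contains a usable distribution tree. The function name `run_with_new_root` is made up for this example.

```python
import os
from typing import Sequence


def run_with_new_root(new_root: str, argv: Sequence[str]) -> int:
    """Run argv in a child process that sees new_root as "/".

    Returns the child's exit code, 127 if the chroot or exec failed
    (for example when not running as root), or -1 if it was killed.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.chroot(new_root)             # from here on, "/" is new_root
            os.chdir("/")                   # do not keep a cwd outside the new root
            os.execvp(argv[0], list(argv))  # replace the child with the target program
        finally:
            os._exit(127)                   # only reached if something above failed
    _, status = os.waitpid(pid, 0)          # parent: wait for the child to finish
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Real containers add more than this (separate views of processes, network and users, plus resource limits), so treat it as the smallest possible demonstration of "the process thinks / is somewhere else".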
Docker builds on top of Linux Containers, which allow us to do exactly that, but in a friendlier and easier way. Don't think of it as a virtual machine. Think of it as process isolation. The running kernel will share the machine's resources with this process, while keeping it isolated from the rest of the system. It's like jails on steroids.
That was a broad simplification. But, given the concept, think about the implications of this idea.
You can have, on the same host, multiple processes with completely different environments that might otherwise conflict with each other. One may be a legacy binary that needs old libraries in place (legacy systems that never die). Another may be the most recent build of a bleeding-edge technology. Sharing the same kernel makes for efficient and valuable resource management.
The most value I found comes from managing the infrastructure. Once you install Docker on the hosts, configure a swarm, and define a way of deploying containers, you mostly forget about the hosts. Adding users, installing packages, customizing, editing configuration files... All that becomes a development task on your desktop. There's an incentive to script more, to automate more. To keep your hands away from the physical or virtual machines, unless absolutely necessary.
Gone are the days when someone changed some obscure setting on the server to work around some weird application behavior, forgot to tell anyone about it and took a vacation. Changes to the environment can be committed to version control, tracked and improved by everyone on the team. If your datacenter goes through a disaster, recreating the whole environment is a matter of rebuilding images and redeploying containers. Your infrastructure becomes consistent and reproducible, while keeping the doors open to a wide variety of operating systems and customized configurations for each application.
Developers can take advantage of Docker by recreating dev/staging/production environments on their desktops. No need to pollute a dev machine with application servers and database installations, or even pay the toll of VirtualBox to emulate all that.
Testing can be automated with a higher level of isolation. The Selenium team already has official Docker images. Creating an entire test hub should be a walk in the park with those puppies.
Building custom software, such as compiling Nginx with third-party modules, can also be done inside containers from specialized images. No need to keep an entire server dedicated to it, or even pollute your desktop with all the dependencies and build packages.
Overall, we've been having a great experience with Docker. We've migrated our staging environment to this new platform, and plan to migrate other parts of the infrastructure as well, eventually into production. So far, so good.
I hope you can convince enough people to take a better look at it. I'll admit, it took me some time to get used to the idea. But once you get it, it's actually worth it.

Related

To virtualize or not to virtualize a bare metal server for a kubernetes deployment

I'd like to deploy Kubernetes on a large physical server (24 cores) and I'm uncertain about a number of things.
What are the pros and cons of creating virtual machines for the k8s cluster rather than running on bare metal?
I have the following considerations:
Creating VMs will allow for workload isolation. New VMs for experiments can be created and assigned to devs.
On the other hand, with k8s running on bare metal, a new namespace can be created for each developer for experimentation and they can run their code in it. After all, their code should be running in Docker containers.
Security:
Having VMs would limit the amount of access given to future maintainers, limiting the amount of damage that could be done. On the other hand, the primary task for any future maintainers would be adding/deleting nodes, and they would require bare-metal access to do that.
Authentication:
At the moment devs would only touch the server when their code runs through the CI pipeline and their deployments are rolled out. But what about viewing logs? Could we set up tiered kubectl authentication to allow devs to access only the namespaces that have been assigned to them? (I believe this should be possible with the k8s namespace authorization plugin.)
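For illustration, here is one way per-namespace log access could look, assuming the cluster uses RBAC authorization (an assumption; it may not be the authorizer actually enabled). The role name, the binding name and the user name are placeholders, and `namespace_log_reader` is a made-up helper. The sketch only builds the manifest, which could then be saved as JSON and applied with `kubectl apply -f`.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespace_log_reader(namespace: str, user: str) -> dict:
    """Build a Role and RoleBinding letting `user` read pods and logs in one namespace."""
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": "log-reader"},
        "rules": [{
            "apiGroups": [""],                   # "" is the core API group
            "resources": ["pods", "pods/log"],   # pods and their log subresource
            "verbs": ["get", "list", "watch"],   # read-only access
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": "log-reader-binding"},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": "log-reader", "apiGroup": RBAC_GROUP},
    }
    # A v1 List lets both objects be applied from a single file.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


def to_json(manifest: dict) -> str:
    """Serialize the manifest in a form kubectl can read."""
    return json.dumps(manifest, indent=2)
```

Because both objects are namespaced, the permissions stop at the namespace boundary: a dev bound this way can read logs in the assigned namespace and nowhere else.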
A number of VMs already exist on the server. Would this be an issue?
24 cores and doubts... That is a lot of cores for a single server.
For kubernetes however this is not relevant:
Kubernetes can use different-sized servers and utilize them to the maximum. However, if you combine the master server processes and the node/worker processes on a single server, you might create unwanted resource issues. You can manage those with namespaces, as you already mention.
What we do is use continuous integration with namespaces in a single dev/QA Kubernetes environment, in which each change gets its own namespace (so we run many, many namespaces), and we run full environment deployments in those namespaces. A bunch of shell scripts is used to manage this. This works with a large server like yours as well as with smaller (or virtual) boxes. The benefit of virtualization for you could mainly be in splitting the large box into smaller ones so that you can also use it for purposes other than just Kubernetes (Kubernetes runs a lot, but not MS Windows, desktops, kernel modules for VPN purposes, etc.).
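The scripts mentioned above are shell, but the per-change namespace idea can be sketched like this. It is a sketch under assumptions: the `ci` prefix and the function names are hypothetical. The one hard rule it encodes is that a namespace name has to be a valid DNS label (lowercase letters, digits and `-`, at most 63 characters, starting and ending with an alphanumeric).

```python
import re
from typing import List

MAX_LABEL_LENGTH = 63  # namespace names must be valid DNS labels


def namespace_for_change(change_id: str, prefix: str = "ci") -> str:
    """Derive a namespace name from a branch name or change id."""
    raw = f"{prefix}-{change_id}".lower()
    name = re.sub(r"[^a-z0-9]+", "-", raw)      # collapse runs of illegal characters into "-"
    name = name[:MAX_LABEL_LENGTH].strip("-")   # enforce the length, trim stray dashes
    if not name:
        raise ValueError("change id produced an empty namespace name")
    return name


def create_command(namespace: str) -> List[str]:
    """kubectl call that creates the namespace before deploying into it."""
    return ["kubectl", "create", "namespace", namespace]


def delete_command(namespace: str) -> List[str]:
    """kubectl call that removes the namespace once the change is merged or abandoned."""
    return ["kubectl", "delete", "namespace", namespace]
```

Deleting the namespace also removes everything deployed inside it, which is what makes throwaway full-environment deployments per change practical.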
I would separate dev and prod in the form of different VMs. I once had a webapp inside Docker which used too many threads, so the Docker daemon on the host crashed. Luckily, it was limited to one host. You can protect against this by setting limits, but it's a risk: one mistake in dev could bring down prod as well.
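The limits mentioned above could look like the following sketch. The values are placeholders, not recommendations, and `limited_run_command` is a made-up helper; `--pids-limit`, `--memory` and `--cpus` are existing `docker run` options.

```python
from typing import List


def limited_run_command(image: str, *, pids: int = 256, memory: str = "512m", cpus: float = 1.0) -> List[str]:
    """Build a `docker run` call that caps what a single container may consume."""
    return [
        "docker", "run", "--rm",
        "--pids-limit", str(pids),  # cap on processes/threads inside the container
        "--memory", memory,         # hard memory limit, e.g. "512m" or "2g"
        "--cpus", str(cpus),        # how much CPU the container may use
        image,
    ]
```

With a cap on the number of processes and threads, a runaway webapp like the one described should hit its own ceiling instead of exhausting the host. It does not remove the argument for keeping dev and prod on separate VMs.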
I think the answer is "it depends!", which is not really an answer. Personally, I would split up the machine using VMs and deploy that way. You get better flexibility in how much of the server's resources you carve out, and you can easily create new environments and then destroy them.
Even if these VMs are really big, I think it's still easier to manage, especially given that you have existing VMs on the machine.
That said, there's no technical reason that you can't run a single-node cluster, but you may run into problems with downtime during upgrades (if that's an issue), and if that server needs to be patched or rebooted, your entire cluster is down.
I would look at your environment's needs for HA and uptime, as well as how you are going to deploy VMs (if you go that route), and decide what works best for you.

What to know before setting up a new Web Dev Env?

Say you want to create a new environment for a team of developers to build a large website on a LAMP stack.
I am not interested in the knowledge needed for coding the website (PHP, JS, HTML, CSS, etc.). This stuff I know.
I am interested in what you need to know to set up a good environment and workflow with a test server, production server, version control, backups, etc.
What would be a good learning path?
As someone who has led this process at several companies, my recommendation is to gradually raise the "maturity" of your organisation as a software factory by incrementally consolidating a set of practices in an order that makes sense for your needs. The order I tend to follow (starting with things that I consider more basic and moving to the more advanced stuff):
Version control - control your sources. I used to work with SVN but I'm gradually migrating my team to Mercurial (I agree with meagar's recommendation for a distributed VCS). A great HG tutorial is hginit.
Establish a clear release process, label your releases in VCS, do clean builds in a controlled environment, test and release from these.
Defect tracking - be systematic about your bugs and feature requests. I tend to use Trac because it gives me a more or less complete solution for project management plus a wiki that I use as a knowledge base. But you have choices galore (Jira, Bugzilla, etc...)
Establish routine Testing practices. Unit tests e.g. by using one of the xUnit frameworks (make it a habit to at least write unit tests for new functions you write and old code you modify) and Integration / System tests (for webapps use some tool like Selenium).
Make your tests run frequently, as a part of an automated build process
Eventually, write your tests before you code (Test-Driven Development) and strive to increase coverage.
Go a step further in your build/test/release cycle by setting up a continuous integration system (to make sure your build and tests are run regularly, at least nightly). I recently started using Hudson and it is great for our Java/Maven projects, but you can use it for any other build process as well.
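To make the unit-testing step concrete, here is a minimal xUnit-style test. It is written with Python's `unittest` purely for illustration (for a LAMP project PHPUnit would be the natural fit), and `slugify` is a made-up function standing in for whatever code you are testing.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a page title into a URL-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class SlugifyTest(unittest.TestCase):
    def test_punctuation_and_spaces_become_dashes(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_blank_title_gives_empty_slug(self):
        self.assertEqual(slugify("   "), "")


def run_suite() -> bool:
    """Run the tests programmatically, the way an automated build would."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

An automated build would run something like this on every commit and mark the build as failed whenever `run_suite()` returns `False`.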
In terms of testing environments, I agree with meagar's recommendations. We have these layers:
Test at developers' workstations (should contain a full setup to run your code)
Staging environment: clone your production environment as closely as possible and deploy and run your app there. We also use VMs.
Production preview: we deploy our app to the production servers with production data but in a different "preview" URL for our internal use only. We run part of our automated Integration tests against this server, and do some additional manual testing with internal users
Production - and keep fingers crossed ;)
In terms of backup, at least for your source code, a distributed VCS gives you the advantage that your full repos are replicated on many machines, thus minimising the risk of data loss (which is much more critical with centralised repos, as is the case with SVN).
Before you do anything else, ask your developers what they want out of a test/production environment. You shouldn't be making this decision, they should. The answer to this depends entirely on what kind of workflow they're familiar with and what kind of software they'll be developing.
I'd personally recommend a distributed VCS like git or mercurial, local WAMP/LAMP stacks on each developer's workstation (shared "development" servers are silly) and a server running some testing VMs which are duplicates of your production environment. You can't ask for more specific advice than that without involving your developers.

In which practical ways can virtualization enhance your development environment?

Practical uses of virtualization in software development are about as diverse as the techniques to achieve it.
Whether running your favorite editor in a virtual machine, or using a system of containers to host various services, which use cases have proven worth the effort and boosted your productivity, and which ones were a waste of time?
I'll edit my question to provide a summary of the answers given here.
Also, it'd be interesting to read about the virtualization paradigms employed too, as they have become quite numerous over the years.
Edit : I'd be particularly interested in hearing about how people virtualize "services" required during development, over the more obvious system virtualization scenarios mentioned so far, hence the title edit.
Summary of answers :
Development Environment
Allows encapsulation of a particular technology stack, particularly useful for build systems
Testing
Easy switching of OS-specific contexts
Easy mocking of networked workstations in an n-tier application context
We deploy our application into virtual instances at our host (Amazon EC2). It's amazing how easy that makes it to manage our test, QA and production environments.
Version upgrade? Just fire up a few new virtual servers, install the software to be tested/QA'd/used in production, verify the deployment went well, and throw away the old instances.
Need more capacity? Fire up new virtual servers and deploy the software.
Peak usage over? Just dispose of no-longer-needed virtual servers.
Virtualization is used mainly for various server uses where I work:
Web servers - If we create a new non-production environment, the servers for it tend to be virtual ones so there is a virtual dev server, virtual test server, etc.
Version control and QA applications - Quality Center and SVN are run on virtual servers. The SVN box also runs CC.Net for our CI here.
There may be other uses but those seem to be the big ones at the moment.
We're testing the way our application behaves on a new machine after every development iteration, by installing it onto multiple Windows virtual machines and testing the functionality. This way, we can avoid re-installing the operating system and we're able to test more often.
We needed to test the setup of a collaborative network application in which data produced on some of the nodes was shared amongst cooperating nodes on the network in a setup with ~30 machines, which was logistically (and otherwise) prohibitive to deploy and set up. The test runs could be long, up to 48 hours in some cases. It was also tedious to deploy changes based on the results of our tests because we'd have to go around to each workstation and make the appropriate changes, which was a manual and error-prone process involving several tired developers.
One approach we used with some success was to deploy stripped-down virtual machines containing the software to be tested to various people's PCs and run the software in a simulated data-production/sharing mode on those PCs as a background task in the virtual machine. They could continue working on their day-to-day tasks (which largely consisted of producing documentation, writing email, and/or surfing the web, as near as I could tell) while we could make more productive use of the spare CPU cycles without "harming" their PC configuration. Deployment (and re-deployment) of the software was simplified, since we could essentially just update one image and re-use it on all the PCs. This wasn't the entirety of our testing, but it did make that particular aspect a lot easier.
We put the development environments for older versions of the software in virtual machines. This is particularly useful for Delphi development, as not only do we use different units, but different versions of components. Using the VMs makes managing this much easier, and we can be sure that any updated exes or dlls we issue for older versions of our system are built against the right stuff. We don't waste time changing our compiler setups to point at the right shares, or de-installing and re-installing components. That's good for productivity.
It also means we don't have to keep an old dev machine set up and hanging around just-in-case. Dev machines can be re-purposed as test machines, and it's no longer a disaster if a critical old dev machine expires in a cloud of bits.

Does my development environment mirror user's environment?

I am trying to get a better idea on this as so far I have had mixed answers in person.
I am a solo dev in a 5 man IT dept for a health care related business. My developer machine is running Win 7 RC1 (x64) but my users are all running Win XP Pro (x86). Is this a big deal? What kind of pitfalls should I be aware of? Is having a VM of the user image enough?
Should my environment completely mirror my end user's?
Your development environment doesn't need to mirror your user's environment, but your testing environment certainly should!
Having a VM of the users image for testing would probably be good enough.
First and foremost, as a developer, your machine will never look like your client's machine. Just accept that.
You will have tools and utilities installed that they won't have. That will fundamentally change the configuration of your machine from the outset. You have DLLs, applications, services, and possibly drivers installed that your users have never even heard of (and likely never will).
As far as the OS is concerned, Win7 and WinXP, despite claims to the contrary, are not the same animal. Don't believe the hype. Having said that, don't believe the anti-hype, either. Just be aware, as you well should, that any piece of software developed under one version of an OS is not guaranteed to behave the same way under another.
The short of it: Yes, it matters that your environment is different. Should you panic about it? No. Should you account for it in testing? Absolutely. As rigorously as possible.
Is this a big deal?
Yes, it is. Your OS is two generations ahead of the one your users have, and on top of that you are running a non-release version.
What kind of pitfalls should I be aware of?
Depends on what you are developing. Some libraries may be missing that you already have, then the versions may be different etc.
Should my environment completely mirror my end user's?
Not necessarily, but you definitely need to have a testing environment that corresponds to the one the users have.
If you were developing web applications, all that would not have been an issue (well, unless you used some fancy fonts that are not present in a clean OS by default).
It's unreasonable to develop on exactly the same type of system as your users. If nothing else, your life is made much easier by installing all sorts of developer tools your end users have no reason to install. I hear Visual Studio in particular likes to squirrel a number of potential dependencies onto a system.
However, you do need to test on a system more in line with that of your end users. If you have access to an image of such a system, your VM approach should be sufficient. If nothing else, you should try for a staged (or better yet, beta/trial) release so as to avoid pushing a completely broken app out the door.
In short, don't fret about the development environment but put some thought into your testing one!
It's effectively impossible for any two machines to be set up the same, so development and production environments will always be different. One advantage in them being VERY different is that you will be more aware of possible deployment problems.
The type of environment that you need to use for testing entirely depends on what you're developing.
If you're writing web applications, having a VM with the user's standard image should be more than enough (just make sure the VM contains all the browsers your users might be using). Web development is much easier in this respect (I'm also running Windows 7 and have a couple VMs to test various environments)
If you're writing a full desktop environment, you'll probably want to ask for a second computer that you can test on (even if just to test before a final release). I say that because of differences in hardware. Just imagine if something runs fine for you, but slows down the user's computer so much that everything else is unusable. Conversely, you might spend hours trying to make something faster in the VM when it would run just fine on a user's computer.

Setting up a development environment INSIDE a virtual machine

Here's the problem. I use around three different machines for development. My partner is using two. We have to go through the same freaking setup procedure on all five machines to get to work.
Working with a PHP project here, so:
Install and configure PDT, a PHP debugger, and some version of XAMPP.
Then possibly install an SVN client, and any other tools.
Again, on each of the five machines.
What if, instead, we did all of this once, in a virtual machine that is set up with the same stack and the same versions as the production server? Then each of us could grab a copy of the VM image, run that image on each of the five machines and do all of our development in that VM. Put Eclipse, Apache, MySQL, the works, all in that VM.
The only negative of this approach, and please correct me on the "only" part, is performance. Is it really that big of an issue though? The slowest machine out of the five is a Samsung NC10 powered by a 1.6 GHz Intel Atom processor.
Do you think this is possible and practically usable? Or am I crazy?
I use a VM for development (running on my laptop) and have never had performance problems. Another approach that you could take would be to image the drive in the state that you want. Use Acronis or Ghost to re-image each machine when you need to. Only takes about 5-10 minutes to restore an image on any modern PC.
I use a VM for all my "work" as it keeps it away from my "play". This setup allows me to use the office VPN without exposing my whole machine to the office environment (which I trust about as much as the internets. ;-) Also I don't have to worry about messing up my development environment by trying games or other software. My work VM is currently running inside VirtualBox but I have used VMware in the past. I have only noticed performance issues when using graphics-intensive programs like Webex or the Terminal Server Client.
It can certainly be done. What turns me off is the size of the VM image, which would normally be several GBs. Having it on a network share means it can take longer to transfer than your current setup process takes. I guess an external hard drive would be the easiest way to move it around.
Performance wouldn't be an issue with any web development.
I have to ask why your current machines need to be "re-imaged" each time you sit down for work?
If you're using Windows you'll probably want to use SYSPREP on the master image so that the 'mini-setup' runs when you boot up the virtual machines for the first time.
Otherwise, from Windows' point of view, the machines have the exact same SID, hostname and other things. Running multiple machines with the same SID on the same network can cause tons of headaches, even more so if you want them to communicate with each other.
I've run WebSphere for zSeries on a VMware virtual machine with no problem, and WebSphere is more resource intensive than any PHP stack. I find that having a multi-core machine, or at least hyper-threading, makes it run a lot faster.
With VMware, disk operations are slower. For PHP development I doubt it would be a problem, but you'd definitely notice it if you are compiling a large C++ project. There is also Sun's VirtualBox, which is free, and the latest version is rather nice (but I haven't looked at how slow disk operations are yet).
I am using that idea in practice. Virtual machines are generally great for development.
They let you run multiple operating systems and multiple separate development environments.
They preserve older development environments for later support.
They can be easily backed up; when a hard drive crashes, there is no need to start from the beginning.
They can be copied from one developer to another, so not everyone has to do tedious installations and configurations.
Downsides are:
Virtual machines are slower, so you need more powerful computers than you would otherwise. I would recommend having at least 4 GB of RAM, but preferably more like 16, plus fast multi-core processors and fast hard drives.
When copying Windows virtual machines, each copy in use should have its own product key. When you make a copy, it needs to be registered with a new product key.
Have you thought about a configuration management tool like Ansible, Chef or Puppet? With such software, automating these tasks is very easy! It can even create a fresh VM and then configure it.
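The core idea behind those tools is to describe the desired state of a machine once and let software check or enforce it. Below is a very small sketch of the checking half only. The tool names in the usage note are hypothetical stand-ins for the stack in the question, `missing_tools` is a made-up helper, and nothing here uses Ansible, Chef or Puppet themselves.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    required: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]
```

Running `missing_tools(["php", "svn", "mysql"])` on each of the five machines would at least tell you which ones have drifted from the agreed setup. A real configuration manager goes one step further and installs what is missing.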