Getting started with writing MPI programs

This coming semester, I am starting some research on large-scale distributed computing with MPI. What I am looking for help with is the initial stage: getting a solid development environment set up. Does anyone have recommendations for good tools to use for this?
I am also curious whether there exists some kind of simulator that would allow me to write MPI programs and distribute them to virtual (rather than physical) nodes.

You could download an MPI library such as Open MPI or MPICH and run it on a multi-core system (such as a recent desktop) with the number of processes equal to the number of cores. The processes would communicate without a network interconnect (over shared memory, for instance). That should be enough to explore with initially.
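As a minimal sketch of that approach, here is the classic MPI hello-world in C; it assumes only a working MPI installation (the mpicc wrapper and mpirun launcher ship with both Open MPI and MPICH):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                /* start the MPI runtime   */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks   */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut the runtime down   */
    return 0;
}
```

Compile with mpicc hello.c -o hello and launch with mpirun -np 4 ./hello to get four ranks on one machine; on a single host the library will typically use shared memory rather than the network.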
If you really want multiple nodes, you can experiment with several VMs on a virtual network before actually moving on to a physical cluster. One of the VMs would have to be configured to act as an NFS server, and the rest of the VMs could mount the home directories from it over NFS.

It depends on what your favourite language is. I dove into MPI using Python and the pypar module. It lets you concentrate on the MPI procedures without worrying too much about pointers and complicated C/C++ details. MPI on a single machine is programmed no differently from MPI on hundreds of machines; getting cross-machine setups working is more a matter of which MPI implementation and operating system you use.

Related

Isn't a virtual machine just a type of process?

I'm trying to understand the basic concepts of Docker, and lots of docs say that "Docker is not a virtual machine, but a process". To me this sentence looks quite awkward, since as far as I know a virtual machine itself also runs on a host OS, which makes it a process too.
Is there any big difference between the way a virtual machine works and the way other normal applications/processes do?
Docker is a brand name of a container management software system.
TL;DR:
Containers are a packaging concept.
VMs are a compatibility concept.
VMs are a security concept.
A container is not a process; it is an isolation of a collection of processes within a single system image. What is isolated? First and foremost, the path name space. Processes within a given container share a path name space, so that they agree that /usr/bin/env is the same thing. Two processes in different containers, or in the non-containerized environment, would not necessarily see the same file for /usr/bin/env. This functionality has been a feature of UNIX-derived systems for at least 40 years, under the name chroot().
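To make the path-name-space point concrete, here is a minimal sketch of chroot() in C; the jail directory /srv/jail is a hypothetical example, and the call requires root privileges:

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* After chroot(), "/" for this process (and its children) is
       the jail directory, so /usr/bin/env resolves inside the jail,
       not to the host's copy. Requires root. */
    if (chroot("/srv/jail") != 0) {        /* hypothetical jail path */
        perror("chroot");
        return 1;
    }
    if (chdir("/") != 0) {                 /* step into the new root */
        perror("chdir");
        return 1;
    }
    /* The jail must provide its own /bin/sh for this to work. */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```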
More recently, containers have taken to isolating things that are not in that name space, such as process ids, user ids and network interfaces. In older chroot-based systems, running ps in a container would show processes that were not in that container, although special handling was hacked in to prevent a chrooted root user from gaining root access on the underlying system.
In these modern systems, not only is the pid space partitioned but also the user ids, so that root in a container does not correspond to root on the overall system.
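On Linux this partitioning is exposed through kernel namespaces. The following sketch is my own illustration (not anything Docker-specific): it uses unshare() to give a child process its own PID and user namespaces, so the child sees itself as pid 1 with an unmapped uid:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    /* New user and PID namespaces for our children; CLONE_NEWUSER
       lets even an unprivileged user do this on modern Linux. */
    if (unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0) {
        perror("unshare");
        return 1;
    }

    pid_t child = fork();  /* first process in the new PID namespace */
    if (child == 0) {
        /* Inside: we are pid 1, and uid reads as the overflow id
           (65534) until a uid_map is written for the namespace. */
        printf("in namespace: pid=%d uid=%d\n",
               (int)getpid(), (int)getuid());
        return 0;
    }
    waitpid(child, NULL, 0);
    printf("on host: that child had pid %d here\n", (int)child);
    return 0;
}
```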
All this is accomplished by controlling many features of the kernel within a single system image. The software that controls these features is Docker, amongst others.
A Virtual Machine is not part of a single system image. Each VM is its own logical computer, running its own kernel, shell, etc. With some careful configuration you can make various files appear within many of the VMs, but that is no different from mounting file systems exported by a network file server.
Why choose one over the other? Containers share my OS, and are handy for escaping the .so versionitis hell caused by conflicting software systems; I can package my software in a container, and it is isolated from whatever the running system is. I cannot, however, package the kernel I need; so if my software requires Ubuntu 14.04 and I am running 18.04, containers will not save me. Containers are a packaging concept.
VMs are handy for supporting multiple versions or types of operating system on a single computer. Since each VM runs its own system software, I can run my 14.04 app on my 18.04 system and neither is the wiser. VMs are a compatibility concept.
VMs are also handy as a security layer. Imagine that a web page has a js-bomb that can corrupt my kernel (I know, quite a stretch). If I run my browser in a container, I have corrupted my kernel. If I run it in a VM, I have corrupted that VM's kernel; I merely have to delete it, or rewind it, and the corruption is gone. VMs are a security concept.

On virtual machines like the JVM and CLR, is there something similar to an operating system that provides programming APIs?

The JVM and CLR are virtual machines. Like bare-metal computers, they execute their own machine languages.
On real computers we have operating systems, which provide system calls and APIs. For example, the famous book Advanced Programming in the UNIX Environment describes the APIs provided by Unix and Linux. Windows, however, provides different APIs.
On top of virtual machines like the JVM and CLR, is there something that
plays the role of an operating system, and
provides programming APIs?
If there is nothing playing the role of an OS on virtual machines like the JVM and CLR, what provides the programming APIs (such as those of Java, C#, ...) that are similar to OS APIs?
Note: I am asking about VMs and what sits on top of them, not what is underneath them. Do VMs not run some virtual OS on top of them? If not, why is there no such need?
Thanks.
There are two kinds of virtual machines: system virtual machines and process virtual machines. System virtual machines virtualize a complete instruction set, including both user-mode and kernel-mode instructions, and can therefore run operating systems. Process virtual machines virtualize user-mode instructions and, usually, some system calls (such as those for managing threads, memory, and files) and can therefore only run applications or processes. That is, on top of a single process virtual machine, a single app or process runs. The JVM and CLR are process virtual machines.
While in theory it is indeed possible to develop an OS to run on a process virtual machine, this is of little practical use because the performance of programs running on that OS would be terrible due to the excessive layering of software.
Generally, system and process virtual machines are not themselves considered to be operating systems. However, process virtual machines do not necessarily require an OS to run on and may run on a bare-metal computer; the .NET Micro Framework is an example of such a VM, and such VMs are sometimes called operating systems. Some virtual ISAs, or subsets thereof, have been implemented entirely in hardware, just like x86 and ARM, and one could develop operating systems to run on them; however, they are almost never used in industry because of their low performance.
An "operating system" is a large, fuzzy ball of hair. You have the HAL, the kernel, userspace... do we count some userspace libraries (typically libc) too?
If you squint you can find some comparable concepts in the JVM/JRE, but generally a JVM runs on top of a bona fide operating system and thus does not reimplement all those aspects; instead it simply provides platform-independent abstractions over facilities that you can find on almost all systems.
For example, these days java.lang.Thread is usually just the Java representation of a native OS thread, but a JVM could choose to implement thread scheduling in userspace; Sun's JVM did so back in the 1.1 days, and some other JVMs still do today.
I'll answer this from the perspective of Java since I have no in-depth knowledge of .NET. I would assume, however, that the CLR and JVM are similar from this point of view.
Let's start with an operating system. Its purpose is to abstract away the hardware-specific interfaces and to provide a runtime environment for processes.
The OS uses device drivers to provide a uniform interface to similar devices (the processor, memory, disk drives, network cards and so on), and system calls to allow user-level code to interact with these devices. If you write code in C you call open, then read from and write to the device, before calling close. The ioctl (IO control) system call is also used a lot for device control. Each OS provides a standard set of these system calls (you can run Linux on an Intel or an ARM processor, but you get the same set of system calls on each). Incidentally, this is also how Docker works: by relying on a standard set of system calls, containers can be moved from one platform to another without problems.
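As a small sketch of that uniform interface (reading an arbitrary file, /etc/hostname here purely as an example):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[128];

    /* open/read/write/close form the uniform interface the OS
       exposes over very different underlying devices and files. */
    int fd = open("/etc/hostname", O_RDONLY);  /* example path */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        write(STDOUT_FILENO, buf, (size_t)n);  /* echo to stdout */
    close(fd);
    return 0;
}
```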
The OS also provides the ability to run multiple processes simultaneously. With newer multi-core machines this really can happen in parallel, but the OS also uses scheduling to share a CPU between multiple processes or threads. By switching processes or threads very quickly, it gives the impression that things are happening simultaneously, even on a single processor.
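To illustrate, here is a minimal POSIX-threads sketch (my example, not anything JVM-specific); each thread is a schedulable entity that the OS interleaves across the available cores:

```c
#include <pthread.h>
#include <stdio.h>

/* The OS decides when and on which core each thread runs,
   so the output order varies from run to run. */
static void *worker(void *arg)
{
    printf("worker %ld running\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);   /* wait for all workers */
    return 0;
}
```

Compile with cc -pthread threads.c.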
Now let's look at the JVM, which is a user-level process (from the OS point of view, just like any other user application). It has been designed to abstract away CPU- and OS-specific functionality from the Java application. The bytecodes generated by the Java compiler do not contain any system calls. If you look at the bytecode instruction set (defined in the Java Virtual Machine Specification) you will find that the instructions provide many familiar low-level features such as loading a register, bit manipulation and so on. In addition, there are many instructions that are higher level and relate more specifically to Java: things like invokestatic, which invokes a static method on a class, monitorenter/monitorexit for locking, newarray and so on.
The JVM takes these bytecodes and converts them from a CPU- and OS-independent form (that of a virtual machine) into instructions for the specific CPU architecture and OS that the JVM is running on. In some cases this can be a one-to-one mapping (for things like bitwise operators), but it can often be much more complex and involve the use of system calls to open files, access network interfaces, etc. The JVM also uses the OS to manage threads created by the application. In the very early days of Java, operating systems like Windows 95 did not have the concept of threads within a process, so the JVM had to provide its own implementation (this was called green threads and performed pretty badly).
To summarise: the JVM takes the platform-neutral bytecodes of the class files it is executing and converts them into the appropriate native CPU instructions and system calls to make the application run. The JVM does not provide any traditional OS services; it just uses them.

Best practices for setting up a development environment

I use Linux as my primary OS. I need some suggestions on how I should set up my desktop and development environment. I work mostly on .NET and Drupal, but sometimes on other LAMP products and on C/C++ with Qt. I'm also interested in mobile (Android, ...) and embedded development.
Currently I install everything on my main OS, even if I use it only a little. I use VMs a little (for a LAMP server).
Should I use a separate VM for each kind of development (one for .NET/Mono, another for C++, one for mobile, one for the database only, etc.)?
Or keep the primary development environment on the main OS and move the others into VMs?
My requirements:
the main OS should not be messed up
things must be easy to organize
performance should be optimal (settings tuned for the best performance of each component)
I'm interested to know how others are doing this.
There are both pros and cons to VMs.
Pros:
portability: you can move an image to a different server
easy backup (but lengthy)
replication (new member joins team)
Cons
performance
hardware requirements
size of backups (20-40 GB per VM ...)
management of backed-up images (the difference between images is not obvious)
keeping all images up to date (patching / Windows updates)
For your scenario, I would create a base VM with the core OS and shared components (web server, database), replicate it, and install the specific tools into separate VMs. If you combine tools within one VM, you may end up with the same mess as when using the base OS; the advantage is that it is much easier to get rid of it ;-)
Optimal performance != using VMs.
If you need to use VMs anyway, then yes: it could be better to use a separate VM for each thing that needs one, unless you need more than one of them at once.
Now that OCI containers are stable and well supported, using them through Docker, Podman or another similar tool is an increasingly popular option.
They are isolated but run under the same kernel, so:
they are almost as portable as virtual machines,
like virtual machines they can have their own virtual IP addresses, so they can run services that are not visible from the outside and without occupying ports on the host, but
they don't reserve any extra space on disk or in memory the way virtual machines do,
they are not slowed down by any virtualization layer, and
mounting directories from the host is easy and does not require any special support.
The usual approach is to have the checkout in the developer's normal home directory and mount it into containers for building, testing and running.
Also, building in containers is now supported by the Remote Development extension for Visual Studio Code.

In which practical ways can virtualization enhance your development environment?

Practical uses of virtualization in software development are about as diverse as the techniques to achieve it.
Whether running your favorite editor in a virtual machine or using a system of containers to host various services, which use cases have proven worth the effort and boosted your productivity, and which ones were a waste of time?
I'll edit my question to provide a summary of the answers given here.
It would also be interesting to read about the virtualization paradigms employed, as they have become quite numerous over the years.
Edit : I'd be particularly interested in hearing about how people virtualize "services" required during development, over the more obvious system virtualization scenarios mentioned so far, hence the title edit.
Summary of answers :
Development Environment
Allows encapsulation of a particular technology stack, particularly useful for build systems
Testing
Easy switching of OS-specific contexts
Easy mocking of networked workstations in an n-tier application context
We deploy our application into virtual instances at our host (Amazon EC2). It's amazing how easy that makes it to manage our test, QA and production environments.
Version upgrade? Just fire up a few new virtual servers, install the software to be tested/QA'd/used in production, verify the deployment went well, and throw away the old instances.
Need more capacity? Fire up new virtual servers and deploy the software.
Peak usage over? Just dispose of no-longer-needed virtual servers.
Virtualization is used mainly for various server uses where I work:
Web servers - If we create a new non-production environment, the servers for it tend to be virtual ones so there is a virtual dev server, virtual test server, etc.
Version control and QA applications - Quality Center and SVN are run on virtual servers. The SVN box also runs CC.Net for our CI here.
There may be other uses but those seem to be the big ones at the moment.
We're testing the way our application behaves on a new machine after every development iteration, by installing it onto multiple Windows virtual machines and testing the functionality. This way, we can avoid re-installing the operating system and we're able to test more often.
We needed to test the setup of a collaborative network application in which data produced on some of the nodes was shared amongst cooperating nodes on the network in a setup with ~30 machines, which was logistically (and otherwise) prohibitive to deploy and set up. The test runs could be long, up to 48 hours in some cases. It was also tedious to deploy changes based on the results of our tests because we'd have to go around to each workstation and make the appropriate changes, which was a manual and error-prone process involving several tired developers.
One approach we used with some success was to deploy stripped-down virtual machines containing the software to be tested to various people's PCs and run the software in a simulated data-production/sharing mode on those PCs as a background task in the virtual machine. They could continue working on their day-to-day tasks (which largely consisted of producing documentation, writing email, and/or surfing the web, as near as I could tell) while we could make more productive use of the spare CPU cycles without "harming" their PC configuration. Deployment (and re-deployment) of the software was simplified, since we could essentially just update one image and re-use it on all the PCs. This wasn't the entirety of our testing, but it did make that particular aspect a lot easier.
We put the development environments for older versions of the software in virtual machines. This is particularly useful for Delphi development, as not only do we use different units, but also different versions of components. Using the VMs makes managing this much easier, and we can be sure that any updated exes or dlls we issue for older versions of our system are built against the right stuff. We don't waste time changing our compiler setups to point at the right shares, or de-installing and re-installing components. That's good for productivity.
It also means we don't have to keep an old dev machine set up and hanging around just-in-case. Dev machines can be re-purposed as test machines, and it's no longer a disaster if a critical old dev machine expires in a cloud of bits.

What are the benefits of a Hypervisor VM?

I'm looking into using virtual machines to host multiple OSes, and I'm looking at the free solutions, of which there are a lot. I'm confused about what a hypervisor is and why hypervisors are different from, or better than, a "standard" virtual machine. By standard I mean something like VMware Server 2.0, which I'll use as the benchmark.
For a dual-core system with 4 GB of RAM, that would be capable of running at most 3 VMs. Which is the best choice, hypervisor or non-hypervisor, and why? I've already read the Wikipedia article, but the technical details are over my head. I need a basic answer of what these different VM flavors can do for me.
My main question relates to how I would do testing on multiple environments. I am concerned about the isolation of the OSes, so I can test applications on multiple OSes at the same time. Also, which flavor gives an experience closer to how a real machine operates?
I'm considering the following:
(hypervisor)
Xen
Hyper-V
(non-hypervisor)
VirtualBox
VMWare Server 2.0
Virtual PC 2007
*The classifications of the VMs I've listed may be incorrect.
The main difference is that Hyper-V doesn't run on top of the OS; instead, both Hyper-V and the OS run on top of a thin layer called the hypervisor. A hypervisor is a hardware virtualization platform that allows multiple operating systems to run on a host computer concurrently.
Many other virtualization solutions use other techniques, such as emulation. For more details see Wikipedia.
Disclaimer: everything below is (broadly) my opinion.
It's helpful to consider a virtual machine monitor (a hypervisor) as a very small microkernel. It has very few jobs beyond accessing the underlying hardware, such as monitoring event channels and granting guest domains access to specific resources, while enforcing some kind of scheduler.
All guest machines are completely oblivious to the others; the isolation is real. Guests do not share memory with the privileged guest (or with each other). So, in this instance, you could (roughly) think of each guest (even the privileged one) as a process, as far as the VMM is concerned. Typically, the first guest gets extra privileges so that it can manage the rest. This is the ideal technology to use when virtual machines are put into production and exposed to the world.
Additionally, some guests can be patched to become aware of the hypervisor (paravirtualization), significantly increasing their performance.
On the other hand, we have things like VMware and QEMU, which rely on the host kernel to give them access to the bare metal, plus enough memory to exist. They assume that all guests need to be presented with a complete machine; the limits put on the process presenting that machine (more or less) become the limits of the virtual machine. I say more or less because device-mapper QoS is not commonly implemented. This is the ideal solution for trying code on some other OS or some other architecture. A lot of people will call QEMU, Simics or even sometimes VMware (depending on the product) a 'simulator'.
For production roll-outs I use Xen; for testing something I just cross-compiled, I use QEMU, Simics or VirtualBox.
If you are just testing / rolling new code on various operating systems and architectures, I highly recommend the second kind. If your need is introspection (i.e. watching guest memory change as bad programs run in a guest), I'd need more explanation before answering.
Benefits of a hypervisor:
A hypervisor separates virtual machines logically, assigning each its own slice of the underlying computing power, memory, and storage, thus preventing the virtual machines from interfering with each other.