Is it possible to migrate one process from one core of a node to another core of another node in MPI?

If I want to remap the process-to-core mapping of an MPI program, can I migrate processes after they have been spawned? For example: Node 1 has P0, P3, P6 and Node 2 has P1, P4, P7. Can I migrate P1 to Node 1? Research papers on topology-aware MPI suggest remapping, which hints at picking a process and placing it on whichever node gives the best result.
Is this possible?

No. MPI does not have any migration functionality. Topology-aware MPI (which, as you remark, is pretty much research level, not production) uses knowledge of how the application communicates to map ranks to nodes. Normally ranks are placed on successive nodes; if you know which ranks communicate most often, those ranks can be mapped closer together.

To go off of what Victor said:
MPI libraries do allow you to place processes manually via a hostfile and/or mpirun flags (whether in MPICH, Open MPI, MVAPICH2, etc.). Profile your application with something like TAU and look at its communication matrix (see tau.uoregon.edu for documentation) before choosing the "best" process mapping for your application.
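If you want the library itself to attempt a communication-aware mapping at run time, the closest thing the MPI standard offers is communicator reordering rather than migration: each process declares its communication neighbours and the implementation may hand it a new rank. A minimal sketch in C (the ring-shaped neighbour pattern is purely an illustrative assumption; whether reordering changes anything depends on the MPI implementation):

/* Sketch: ask the MPI library to reorder ranks based on a declared
 * communication graph. This does not migrate running processes; it
 * returns a new communicator in which each process may hold a new rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumed pattern: each rank talks mostly to its two ring neighbours. */
    int neighbours[2] = { (rank + size - 1) % size, (rank + 1) % size };
    int weights[2]    = { 1, 1 };

    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, neighbours, weights,  /* incoming edges */
                                   2, neighbours, weights,  /* outgoing edges */
                                   MPI_INFO_NULL,
                                   1,                       /* reorder allowed */
                                   &ring);

    int new_rank;
    MPI_Comm_rank(ring, &new_rank);
    printf("world rank %d -> graph rank %d\n", rank, new_rank);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}

Many implementations simply return the identity mapping here, so measure before relying on it; for hard placement guarantees, the hostfile and mpirun flags mentioned above remain the practical tool.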

Related

2 separate systems, how to make them communicate

I have a DDS system (OMG DDS) that communicates with a ROS node over radio. The information being received is a struct with velocity, state, longitude, latitude, etc. This works well, and my DDS client has no problem printing the information being transmitted from the node over the radio. Now, I've got a GUI application written in Qt, which creates models and puts them on a predefined map. These models have defined set-information functions which, when triggered, update the map to give a smooth visualisation of the information received.
Now here is the problem: I have no idea how to make the GUI application communicate with my DDS client. I would rather not intertwine the two, since I've had enough trouble just making the DDS client and sender work and compile with ROS. I've thought about a separate queue system, which could be included in both the DDS client and the GUI application, but I don't know if this would work. I've also thought about writing to a SQL database, pushing new data, and pulling it in my GUI application when it is detected -- some sort of on_data_available function which triggers the pull function. I've heard the last one is a bad idea, since I'm working with only one set of data which is continuously updated (the model represents one USV), so a database is considered overkill, but I would love to get input here.
I'm sorry if this isn't sufficient information; I can't really provide code examples for various reasons. If anyone has any input, shout out, I would love to hear it. And if I'm not being specific enough, I'll try to rewrite it as best I can.
I've no idea how to make the GUI application communicate with my DDS-client
Your question is not specific to DDS or your GUI application -- you essentially ask for a simple and convenient inter-process communication (IPC) mechanism. As you can see when you follow the link, there are loads of different options.
Given that you already have your data as well as the associated type definitions available in DDS, I suspect that using DDS for this task would still be the easiest way to go. You could set it up to communicate over shared memory or local loopback. DDS will do all discovery and communication under the hood, including (cross-language) de/serialization. If you choose a different mechanism, you might end up doing more work yourself.
As an alternative, some DDS implementations (commercially) support native integration with SQL databases. Those will introspect the DDS data definitions and create all required tables for you. Updates from DDS are automatically forwarded to the database, and vice-versa. You could feed your GUI off of that database.

Testing reusable components / services across multiple systems

I'm currently starting a new project where we are hoping to develop a new system using reusable components and services.
We currently have 30+ systems that all have common elements, but at the moment we develop each system in isolation, so we often end up duplicating code, and of course we then have 30+ separate code bases to maintain and support.
What we would like to do is create a generic platform built from shared components, to enable quick development of new systems, reuse code and automated tests, and reduce the code base that needs to be maintained.
Our thoughts so far are that we would have a common code base for specific modules, for example User Management and Secure System Access; each of these modules could consist of its own generic web module, API and Context, creating a generic package of code.
We could then deploy these different components/packages to build up a new system and save coding the same modules over and over again: if the new system needs to manage users, you take the User Management package and it does what you need. However, because we have 30+ systems, we will deploy the components multiple times, once per system. We also appreciate that some of the systems will need unique functionality, so there would be the option either to add extensions to the generic modules for system-specific needs, or to not use one of the generic modules and create a new one while still reusing the rest of the generic components.
For example, suppose we have four generic components, A, B, C and D, that make up a system. These could be deployed to create the following system set-ups:
System 1 - A, B, C and D (Happy with all generic components)
System 2 - Aa, B, C and D (extended component A to include specific functionality)
System 3 - A, E, C and F (Can't reuse components B and D so create specific ones, but still reuse components A and C)
This throws up a few issues for me, as I need to be able to test the platform and each system to ensure everything works, and this is the first time I've come across having to test a set-up like this.
I've done some reading around microservices and how to test them, but that material usually approaches the problem for a single system built from microservices, whereas we are looking at multiple systems with different configurations.
My thoughts so far: for the generic components utilised by the different systems, I can create automated tests at the base-code level. Those tests confirm the generic functionality, so it should not be necessary to retest those functions for every system, other than perhaps a manual sanity check after deployment. Then, at each system level, additional automated tests can be added to check any system-specific functionality.
Ideally I'd like some sort of testing platform set up so that, if a change is made to a core component such as User Management, it would be possible to trigger all the automated tests at the core level and then all of the specific system tests for every system that shares the component, to ensure the change doesn't affect core functionality or have a knock-on effect on the specific systems. After that only a quick manual check would be required. I'm keen to remove the massive manual test overhead of checking 30+ systems each time a shared component is changed.
We work in an agile way and for our current projects we have a strong continuous integration process set up: when a developer checks in some code (Visual Studio), this triggers a CI build (TeamCity / Octopus) that runs all of the unit tests; provided these pass, it then triggers an integration build that runs my QA automated tests, which are a mixture of API-level tests and web tests using SpecFlow with PhantomJS or Selenium WebDriver. We would like to keep this sort of framework in place to preserve the quick feedback loops.
It all sounds great in theory, but where I'm struggling is trying to put something into practice and create a sound testing strategy to cover this kind of system set up.
So really what I'm hoping is that there is someone out there who has encountered something similar in the past, has thoughts on the best way to tackle it, and has proven that those approaches work.
I'm keen to get a better understanding of how I could set up a testing platform / rig to aid the continuous integration for all systems considering that each system could potentially look different, yet have shared code.
Any thoughts or links to blogs / whitepapers etc. that you think might help would be much appreciated!!
Your approach is quite good, and since I'll soon have to face the same issues as you, I can share my ideas so far. I'm pretty sure that how to
create a sound testing strategy to cover this kind of system set up
can't be squeezed into one post. The big picture looks like this (to me): you're in the middle of an Enterprise Application Integration effort, and the fundamental thing to cover with tests will be the data migration. You may also want to consider the concept of Service-Oriented Architecture (SOA) for your
generic platform using shared components
since it'll enable you to provide application functionality as services to other applications. An indirect benefit is that SOA dramatically simplifies testing: services are autonomous, stateless, with fully documented interfaces, and separate from the cross-cutting concerns of the implementation. There are plenty of resources on this, for example on end-to-end testing or on efficiently testing SOA.

MPI vs. Microsoft WCF vs. Microsoft TPL

I have a scientific program written in F# which I want to parallelize and run on 1 server with multiple processors (64) and for the future also in the cloud (Windows Azure?). The program will have a simple 1-1 communication between the nodes (no broadcast etc.).
If I used WCF, would it be as fast as MPI? What does MPI have that WCF does not? There is also Pure MPI.NET, written on top of WCF, which puzzles me even more. I do not know whether to use WCF, MPI.NET, or Pure MPI.NET running on WCF.
PS: I guess that TPL is out of the game for 64 processors and more, right?
It is difficult to give a concrete answer, because it all depends on the specific aspects of your application, its current architecture (I suppose you already have some app) etc.
As you mention MPI and WCF, I assume that the application is written as several components that communicate with each other. The best way to structure this kind of application is to use F# agents.
As far as I understand, you want to run the application on a single server first. If you write it using agents, the agents can just communicate directly with each other (so you don't need MPI or WCF).
TPL should work well on a single server (with lots of CPUs), but it will not scale to the distributed setting -- you cannot run a Task on another machine. However, you can use it inside individual components (e.g. agents) that will be distributed.
Regarding MPI vs. WCF -- I don't have enough experience to answer that. However, if you use an agent-based architecture, it should be easy to try various options. You may also check out fracture and related projects, which aim to implement high-performance sockets for F# (and possibly distributed agents in the future).
If you're doing it on one server, you could just run a single process and execute the code in parallel within it. That way you can share memory, which is easier and faster than passing messages as MPI and WCF do, although the communication overhead might not be that big, depending on your problem and solution.
Also, the changes to your code would be much smaller that way; F# can usually be turned into parallel code with little effort. Going to MPI/WCF would require you to rewrite large portions.
Googling for F# + parallel gives plenty useful info that you should read first, like this for a good start:
http://blogs.msdn.com/b/dsyme/archive/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-i-o-computations.aspx
So on one server I would use the parallel features of F#; it's designed to parallelize easily.
Later, when you want to go to the cloud, you would turn it into a client-server application. That's a different problem from parallelization, and I would treat and solve them separately.
On MPI vs. WCF: WCF is designed as an RPC technology, i.e. you call remote procedures and get answers. If you want to use it for parallel programming with separate processes, you would have to create the boilerplate code for that yourself (keeping track of subscribed clients, etc.).
MPI was designed to run exactly that kind of architecture and handles it much more easily (the first process gets number 0 and is the master, the others are slaves and get numbered incrementally, etc.).
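To make that concrete, here is a minimal master/worker skeleton in plain C MPI (MPI.NET exposes the same rank-based concepts; the "work" each rank does below is made up purely for illustration):

/* Rank 0 acts as the master and collects results; all other ranks are
 * workers. The per-rank computation is a placeholder. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = (double)rank;   /* pretend each rank computed something */
    double total   = 0.0;

    /* Every rank contributes; only rank 0 receives the combined result. */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("master (rank 0) collected %f from %d ranks\n", total, size);

    MPI_Finalize();
    return 0;
}

None of the bookkeeping (who is the master, how many peers exist, how they address each other) has to be written by hand, which is the point being made above.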
However, I don't think MPI will be a very good fit for the cloud, since that involves HTTP, protocols, security, etc. I'm not sure how well MPI works for those kinds of things; WCF will handle them very well indeed.
The reason there is an MPI.NET on top of WCF is that MPI represents a certain style of parallelizing code that a lot of people are familiar with, so you can keep those programming concepts on the .NET platform while leveraging WCF for the communication.
Something else you might want to look into, if you need to exchange a lot of data over the wire, is protocol buffers (see protobuf-net, for instance). They can easily be combined with WCF for communication and are very lean at serializing structured data, so you can send it over the wire efficiently.
Gert-Jan
WCF and MPI are different concepts. WCF is like person A asking person B to do something, whereas MPI is like person A creating clones of himself (all clones have the same ability/logic) which then work on specific parts of the problem and, once done, combine their results.
So which one fits your application depends on the problem it is trying to solve. It may even be a combination of both WCF and MPI: your client application asks a WCF service to do some task, the WCF service creates clones of the "problem solver" using MPI, and when the clones are done solving the problem (in parallel) they return the aggregated result to the WCF service, which then sends it back to the client application.
You might also want to take a look at the 'mbrace' product, which provides a cloud monad (http://blogs.msdn.com/b/dsyme/archive/2011/08/23/m-brace-f-in-the-cloud.aspx). It's still at a fairly early stage, though. I'm no expert, but it may be that you can run an mbrace-based solution as effectively a private cloud on your 64-processor setup. When you outgrow that, a move to Azure would be seamless.

Use Erlang as/instead of an expect script

I would like to reset passwords on a bunch of boxes over SSH. Any pointers on how Erlang could be used for this purpose?
Erlang is indeed a well-suited choice for this problem.
You should have a look at the ssh module. Start a connection with
ssh:connect(Host, Port, Options).
Then use the ssh_connection module to execute the right passwd command (hint: start a shell first) and log out.
Edit: the above is mostly wrong; this blog post might get you started faster.
You can even write a simple server that does all of these things on several hosts in parallel, resulting in the most multicore-capable multi-host ssh password changer on this very planet. Weekend project idea: make a web app out of it.
Simply don't use Erlang for such a thing.
Reading from here:
What sort of applications is Erlang particularly suitable for?
Distributed, reliable, soft real-time concurrent systems:
Telecommunication systems, e.g. controlling a switch or converting protocols.
Servers for Internet applications, e.g. a mail transfer agent, an IMAP-4 server, an HTTP server or a WAP Stack.
Telecommunication applications, e.g. handling mobility in a mobile network or providing unified messaging.
Database applications which require soft realtime behaviour.
Erlang is good at solving these sorts of problems because this is the problem domain it was originally designed for. Stating the above in terms of features:
Erlang provides a simple and powerful model for error containment and fault tolerance (supervised processes).
Concurrency and message passing are fundamental to the language. Applications written in Erlang are often composed of hundreds or thousands of lightweight processes. Context switching between Erlang processes is typically one or two orders of magnitude cheaper than switching between threads in a C program.
Writing applications which are made of parts which execute on different machines (i.e. distributed applications) is easy. Erlang's distribution mechanisms are transparent: programs need not be aware that they are distributed.
The OTP libraries provide support for many common problems in networking and telecommunications systems.
The Erlang runtime environment (a virtual machine, much like the Java virtual machine) means that code compiled on one architecture runs anywhere. The runtime system also allows code in a running system to be updated without interrupting the program.
What sort of problems is Erlang not particularly suitable for?
People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but there are some common situations where Erlang is not likely to be the language of choice.
The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
Another class of problem is characterised by a wide interface to existing C code. A typical example is implementing operating system device drivers.
Most (all?) large systems developed using Erlang make heavy use of C for low-level code, leaving Erlang to manage the parts which tend to be complex in other languages, like controlling systems spread across several machines and implementing complex protocol logic.
As suggested by Andrzej, you should look in other directions. Maybe a separate question on Stack Overflow asking "which language would be good for..." could be the first step...
UPDATE
If you still intend to use Erlang to reset your passwords, you might want to have a look at the Erlang SSH Channel Behaviour as well.
Reading from the doc:
SSH services are implemented as channels that are multiplexed over an SSH connection and communicate via the SSH connection protocol. This module provides a callback API that takes care of generic channel aspects, such as flow control and close messages, and lets the callback functions take care of the service-specific parts.

How do you test your applications for reliability under badly behaving I/O?

Almost every application out there performs I/O operations, either with the disk or over the network.
My applications work fine in the development environment, but I want to be sure they will still behave when the Internet connection is slow or unstable, or when the user attempts to read data from a badly written CD.
What tools would you recommend to simulate:
slow i/o (opening files, closing files, reading and writing, enumeration of directory items)
occasional i/o errors
occasional 'access denied' responses
packet loss in tcp/ip
etc...
EDIT:
Windows:
The closest solution to doing the job as described seems to be Holodeck, commercial software (>$900).
Linux:
No open solution has been found so far, but the same effect can be achieved as described by smcameron and krosenvold.
The decorator pattern is a good idea.
It would require me to wrap my I/O classes, resulting in a testing framework.
The only remaining untested code would be in 3rd-party libraries.
Yet I decided not to go this way, but to leave my code as it is and simulate I/O errors from the outside.
I now know that what I need is called 'fault injection'.
I had assumed it was a common part of the production toolchain, with plenty of solutions I just didn't know about.
(By the way, another similar good idea is 'fuzz testing', thanks to Lennart.)
To my mind, the problem is still not worth $900.
I'm going to implement my own open-source tool based on hooks (targeting Win32).
I'll update this post when I'm done with it. Come back in 3 or 4 weeks or so...
What you need is a fault-injection testing system. James Whittaker's 'How to Break Software' is a good read on this subject and includes a CD with many of the tools needed.
If you're on Linux you can do tons of magic with iptables:
iptables -I OUTPUT -p tcp --dport 7991 -j DROP
That rule drops outgoing TCP traffic to port 7991; you can simulate connections going up and down as well. There are lots of tutorials out there.
Check out "Fuzz testing": http://en.wikipedia.org/wiki/Fuzzing
At a programming level, many frameworks will let you wrap the I/O stream classes and delegate calls to the wrapped instance. I'd do this and add in a couple of wait calls in the key methods (writing bytes, closing the stream, throwing I/O exceptions, etc.). You could write a few of these with different failure or issue types and use the decorator pattern to combine them as needed.
This should give you quite a lot of flexibility with tweaking which operations would be slowed down, inserting "random" errors every so often etc.
The other advantage is that you could develop it in the same code as your software so maintenance wouldn't require any new skills.
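For languages without a stream-class hierarchy, the same decorator idea can be expressed with a function pointer and a wrapping struct. A rough C sketch (the struct, names, delay and error rate are all invented for illustration):

/* Decorator-style wrapper: a "reader" is a function pointer plus context;
 * flaky_read() wraps a real reader and adds latency and occasional
 * injected failures. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef ssize_t (*read_fn)(void *ctx, void *buf, size_t len);

/* The real reader: plain read() from a file descriptor. */
static ssize_t fd_read(void *ctx, void *buf, size_t len)
{
    return read(*(int *)ctx, buf, len);
}

struct flaky_reader {
    read_fn  inner;        /* reader being decorated */
    void    *inner_ctx;
    unsigned delay_ms;     /* added latency per call */
    int      error_every;  /* fail roughly one call in N */
};

static ssize_t flaky_read(void *ctx, void *buf, size_t len)
{
    struct flaky_reader *fr = ctx;

    usleep(fr->delay_ms * 1000);                       /* simulate a slow device */
    if (fr->error_every > 0 && rand() % fr->error_every == 0) {
        errno = EIO;                                   /* simulate an I/O error */
        return -1;
    }
    return fr->inner(fr->inner_ctx, buf, len);
}

int main(void)
{
    int fd = STDIN_FILENO;
    struct flaky_reader fr = { fd_read, &fd, 50, 4 };  /* 50 ms delay, ~1 in 4 fails */
    char buf[256];

    ssize_t n = flaky_read(&fr, buf, sizeof buf);
    if (n < 0)
        perror("flaky_read");
    else
        printf("read %zd bytes through the flaky decorator\n", n);
    return 0;
}

Stacking several such wrappers (slow, then occasionally failing, then access-denied) gives the mix-and-match behaviour described above.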
You don't say what OS, but if it's Linux or Unix-ish, you can wrap open(), read(), write(), or any other library or system call with an LD_PRELOAD-able library to inject faults.
Along these lines:
http://scaryreasoner.wordpress.com/2007/11/17/using-ld_preload-libraries-and-glibc-backtrace-function-for-debugging/
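A minimal sketch of such an interposer, assuming glibc/Linux (the file name, library name and 10% failure rate are invented for the example):

/* faulty_read.c -- LD_PRELOAD shim that wraps read() and occasionally
 * injects an EIO error. Build and use roughly like:
 *   gcc -shared -fPIC -o libfaultyread.so faulty_read.c -ldl
 *   LD_PRELOAD=./libfaultyread.so ./your_program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t) = NULL;

    if (!real_read)   /* look up libc's real read() on first use */
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    if (rand() % 10 == 0) {   /* roughly 10% of calls pretend the device failed */
        errno = EIO;
        return -1;
    }
    return real_read(fd, buf, count);
}

The same pattern works for open(), write(), close() and so on; wrap whichever calls you want to misbehave.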
I didn't end up writing my own file-system filter, as I had initially planned, because there's a simpler solution.
1. Network i/o
I've found at least 2 ways to simulate i/o errors here.
a) Running a virtual machine (such as VMware) allows you to configure bandwidth and packet-loss rates. VMware supports on-machine debugging.
b) Running a proxy on the local machine and tunnelling all the traffic through it. For UDP/TCP communications a proxifier (e.g. WideCap) can be used.
2. File i/o
I've managed to reduce this scenario to the previous one by mapping a drive letter to a network share which resides inside the virtual machine. The file I/O will then be slow.
A cheaper alternative also exists: set up a local FTP server (e.g. FileZilla), configure its speed limits and use Novell's NetDrive to access it.
You'll want to set up a test lab for this. What type of application are you building, anyway? Are you really expecting the application to be fed corrupt data?
A test technique I know the Microsoft Exchange Server people tried was sending noise to the server -- basically feeding every possible input with seemingly random data. They managed to crash the server quite often this way.
But still, if you can't trust input that hasn't been signed, then the general rules apply: track every operation which could potentially be untrusted (the result of corrupt data) and you should be able to handle most problems gracefully.
Just test your application's behaviour on random input; that should catch most problems, but you'll never be able to fully protect yourself from corrupt data. That's just not possible, as the data could be part of some internal buffer being handed off within the application itself.
Be mindful of when and how you decode data. That is all.
The first thing you'll need to do is define what "correct" means under these circumstances. You can only test against a definition of what behaviour is intended.
The tactics of testing will depend on technology. In the context of automated unit testing, I have found it very useful, in OO languages such as Java, to use various flavors of "mocking" or "stubbing" to pass e.g. misbehaving InputStreams to parts of my code that used file I/O.
Consider Holodeck for some of the fault injection. If you have access to spare hardware, you can simulate network impairment using netem, or a commercial product based on it, the Mini-Maxwell, which is much more expensive than free but possibly easier to use.