Rankfile for MPICH - process

I'm interested in the issue related to the MPICH. I know that OpenMPI allows you to run programs using Rankfile. The Rankfile contains information about which node/socket/core the specific rank of the MPI process will be bound to.
Is there the same or similar possibility in MPICH? I mean, every MPI process has to be bound to the core, which I select myself and write in the file.
Thank you!

Related

Synchronizing USRP source blocks - multiple B2xx devices

I am trying to create a synchronized usrp source block in gnu radio consisting of multiple B210 USRP devices. Lang: C++.
From what I have found I need to:
Instantiate multiple multi_usrp_sptr as each B210 requires one and multiple B210 devices cannot be addressed by using single sptr
Use external frequency and PPS sources - an option that can be selected from block or set programmatically
Synchronize re/tuning to achieve repeatable phase offset between nodes - this can be achieved using timed commands API https://kb.ettus.com/Synchronizing_USRP_Events_Using_Timed_Commands_in_UHD
Synchronize sample streams using time_spec property issue_stream cmd
The problem is how should I insert these timed commands and set time_spec of stream in GNU radio block or gr-uhd libs?
I looked into the gr-uhd folder where the sink/source code resided and found functions that could be altered.
Unfortunately I don't know how to copy or export this library to do these modifications and later compile to insert my custom blocks to GNU Radio, because gr-uhd seems to be built in and compiled at GR installation.
I attempted coping and then making the lib but that's not the way - it didn't succeed. Should I add my own source block via gr_modtool and insert only the commands I need?
Compatibility with uhd and its functions apart from just adding a few lines would be advantageous not to write the source from scratch.
Please advise
Edit
Experimental flowchart, based on Marcus Müller suggestion:
Experimental usrp synchronization flow
The problem is how should I insert these timed commands and set time_spec of stream in GNU radio block or gr-uhd libs?
For a USRP sink: add tags containing dictionaries with the correct command times to the streams. The GNU Radio API docs have information on how these dictionaries need to look like. The time field is what you need to set with an appropriate value.
For a USRP source: Use the set_start_time on the uhd_usrp_source block; use the same dictionaries described above to issue commands like tuning, gain setting at a coordinated time.
I was trying to find a proper way of synchronizing the USRPs via tags.
There are a few issues I came across in this approach:
Timed commands require the knowledge of the current moment in time, which is done via usrp.get_time_now(), even though I would request the USRP to give the time through tags I would have to somehow extract it from the output. (make some kind of loop and proper triggering) (source: https://kb.ettus.com/Synchronizing_USRP_Events_Using_Timed_Commands_in_UHD) or maybe plan everything not in a relative way - using absolute values instead of offsets. I have seen an approach to regularly reset the sense of time each PPS (set it to 0.0) and maybe then setting time of commands within range of 0.0-1.0 would be acceptable. Then the loop for reading and inserting time into commands would also be redundant.
I didn't found a way to create dicts in GR via blocks to make the solution scalable (without writing a few lines of code in textbox) or writing OOT block
In the end there is so little information to tell what kind of solution is most appropriate (PDU, events, are tags still relevant in GR
?), and the docs are so very scarce, that after some mailing I decided to add a simple class that inherits from the main top_bock.py and after instantiation of top_block it calls a few functions to synchronize the devices. This kind of solution is not the most flexible one, and the parent class top_block.py has to be called through the inheriting one, but it enables an easy programming interface.
Soon I will add an example of the code used in inheriting class just in case.
If there is any more neat, dynamic or scalable solution please let me know or point me to sources.

What happens if an MPI process crashes?

I am evaluating different multiprocessing libraries for a fault tolerant application. I basically need any process to be allowed to crash without stopping the whole application.
I can do it using the fork() system call. The limit here is that the process can be created on the same machine, only.
Can I do the same with MPI? If a process created with MPI crashes, can the parent process keep running and eventually create a new process?
Is there any alternative (possibly multiplatform and open source) library to get the same result?
As reported here, MPI 4.0 will have support for fault tolerance.
If you want collectives, you're going to have to wait for MPI-3.something (as High Performance Mark and Hristo Illev suggest)
If you can live with point-to-point, and you are a patient person willing to raise a bunch of bug reports against your MPI implementation, you can try the following:
disable the default MPI error handler
carefully check every single return code from your MPI programs
keep track in your application which ranks are up and which are down. Oh, and when they go down they can never get back. but you're unable to use collectives anyway (see my opening statement), so that's not a huge deal, right?
Here's an old paper (back when Bill still worked at Argonne. I think it's from 2003):
http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf . It lays out the kinds of fault tolerant things one can do in MPI. Perhaps such a "constrained MPI" might still work for your needs.
If you're willing to go for something research quality, there's two implementations of a potential fault tolerance chapter for a future version of MPI (MPI-4?). The proposal is called User Level Failure Mitigation. There's an experimental version in MPICH 3.2a2 and a branch of Open MPI that also provides the interfaces. Both are far from production quality, but you're welcome to try them out. Just know that since this isn't in the MPI Standard, the function prefixes are not MPI_*. For MPICH, they're MPIX_*, for the Open MPI branch, they're OMPI_* (though I believe they'll be changing theirs to be MPIX_* soon as well.
As Rob Latham mentioned, there will be lots of work you'll need to do within your app to handle failures, though you don't necessarily have to check all of your return codes. You can/should use MPI error handlers as a callback function to simplify things. There's information/examples in the spec available along with the Open MPI branch.

Obtaining Process Details

In general is there any way to get the details of the process (the process to which my program is translated to by the OS before execution). Is it possible to output the contents of the data structures (PCB for example) while my program is executing as a process?
I recommend running the program in linux .. you can then use readlf and objdump to get lot of information about the process(like its address space, dynamically linked libraries from glibc Ex:printf(), its symbol table etc)...
u can also see cat /proc/process's pid folder in linux, there to you can get a variety of information about the running process.
Ofcourse you can use a debugger to get the process's state as it is executing

Porting newlib to a custom ARM setup

this is my first post, and it covers something which I've been trying to get working on and off for about a year now.
Essentially it boils down to the following: I have a copy of newlib which I'm trying to get working on an LPC2388 (an ARM7TDMI from NXP). This is on a linux box using arm-elf-gcc
The question I have is that I've been looking at a lot of the tutorials talking about porting newlib, and they all talk about the stubs (like exit, open, read/write, sbrk), and I have a pretty good idea of how to implement all of these functions. But where should I put them?
I have the newlib distribution from sources.redhat.com/pub/newlib/newlib-1.18.0.tar.gz and after poking around I found "syscalls.c" (in newlib-1.18.0/newlib/libc/sys/arm) which contains all of the stubs which I have to update, but they're all filled in with rather finished looking code (which does NOT seem to work without the crt0.S, which itself does not work with my chip).
Should I just be wiping out those functions myself, and re-writing them? Or should I write them somewhere else. Should I make a whole new folder in newlib/libc/sys with the name of my "architecture" and change the target to match?
I'm also curious if there's proper etiquette on distribution of something like this after releasing it as an open source project. I currently have a script which downloads binutils, arm-elf-gcc, newlib, and gdb, and compiles them. If I am modifying files which are in the newlib directory, should I hand a patch which my script auto-applies? Or should I add the modified newlib to the repository?
Thanks for bothering to read! Following this is a more detailed breakdown of what I'm doing.
For those who want/need more info about my setup:
I'm building a ARM videogame console based loosely on the Uzebox project ( http://belogic.com/uzebox/ ).
I've been doing all sorts of things pulling from a lot of different resources as I try and figure it out. You can read about the start of my adventures here (sparkfun forums, no one responds as I figure it out on my own): forum.sparkfun.com/viewtopic.php?f=11&t=22072
I followed all of this by reading through the Stackoverflow questions about porting newlib and saw a few of the different tutorials (like wiki.osdev.org/Porting_Newlib ) but they also suffer from telling me to implements stubs without mentioning where, who, what, when, or how!
But where should I put them?
You can put them where you like, so long as they exist in the final link. You might incorporate them in the libc library itself, or you might keep that generic, and have the syscalls as a separate target specific object file or library.
You may need to create your own target specific crt0.s and assemble and link it for your target.
A good tutorial by Miro Samek of Quantum Leaps on getting GNU/ARM development up and running is available here. The examples are based on an Atmel AT91 part so you will need to know a little about your NXP device to adapt the start-up code.
A ready made Newlib porting layer for LPC2xxx was available here, but the links ot teh files appear to be broken. The same porting layer is used in Martin Thomas' WinARM project. This is a Windows port of GNU ARM GCC, but the examples included in it are target specific not host specific.
You should only need to modify the porting layer on Newlib, and since it is target and application specific, you need not (in fact probably should not) submit your code to the project.
When I was using newlib that is exactly what I did, blew away crt0.s, syscalls.c and libcfunc.c. My personal preference was to link in the replacement for crt0.s and syscalls.c (rolled the few functions in libcfunc into the syscalls.c replacement) based on the embedded application.
I never had an interest in pushing any of that work back into the distro, so cannot help you there.
You are on the right path though, crt0.S and syscalls.c are where you want to work to customize for your target. Personally I was interested in a C library (and printf) and would primarily neuter all of the functions to return 0 or 1 or whatever it took to get the function to just work and not get in the way of linking, periodically making the file I/O functions operate on linked in data in rom/ram. Basically without replacing or modifying any other files in newlib I had a fair amount of success, so you are on the right path.

Print complete control flow through gdb including values of variables

The idea is that given a specific input to the program, somehow I want to automatically step-in through the complete program and dump its control flow along with all the data being used like classes and their variables. Is their a straightforward way to do this? Or can this be done by some scripting over gdb or does it require modification in gdb?
Ok the reason for this question is because of an idea regarding a debugging tool. What it does is this. Given two different inputs to a program, one causing an incorrect output and the other a correct one, it will tell what part of the control flow differ for them.
So What I think will be needed is a complete dump of these 2 control flows going into a diff engine. And if the two inputs are following similar control flows then their diff would (in many cases) give a good idea about why the bug exist.
This can be made into a very engaging tool with many features build on top of this.
Tell us a little more about the environment. dtrace, for example, will do a marvelous job of this in Solaris or Leopard. gprof is another possibility.
A bumpo version of this could be done with yes(1), or expect(1).
If you want to get fancy, GDB can be scripted with Python in some versions.
What you are describing sounds a bit like gdb's "tracepoint debugging".
See gdb's internal help "help tracepoint". You can also see a whitepaper
here: http://sourceware.org/gdb/talks/esc-west-1999/
Unfortunately, this functionality is not currently implemented for
native debugging, but I believe that CodeSourcery is doing some work
on it.
Check this out, unlike Coverity, Fenris is free and widly used..
How to print the next N executed lines automatically in GDB?