MPI functions appear to execute out of order compared to printf() [duplicate]

Using MPI, a message appears to have been received before it has been sent
I have a very strange problem with some MPI code, in which statements appear to be executed in the wrong order. Specifically, the MPI call appears to execute before the printf, even though it comes after it in the code.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv)
{
    int numProcs, rank, data;
    MPI_Status status;
    // Initialize the MPI library
    MPI_Init(&argc, &argv);
    // Get entity identification
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // Do something different in each rank
    if (rank == 0) {
        // Get the data from rank 1
        // with tag 0
        printf("rank = %d\tGet the data from rank 1 with tag 0\n", rank);
        MPI_Recv(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {
        // Send the data to rank 0
        // with tag 0
        printf("rank = %d\tSend the data to rank 0 with tag 0\n", rank);
        MPI_Send(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    printf("rank %d finishing\n", rank);
    // Clean up the MPI library
    MPI_Finalize();
    return 0;
}
This is the output that is being generated:
$ mpirun -n 2 ./a.out
rank = 0 Get the data from rank 1 with tag 0
rank 0 finishing
rank = 1 Send the data to rank 0 with tag 0
rank 1 finishing
It seems that rank 0 does the printf, then gets the data from rank 1, and then finishes … and only then does rank 1 do its printf? But since rank 1 has to do the printf before it actually sends the data to rank 0, how can rank 0 already have received the data and finished?

Screen output gets buffered by the OS, and then has to make its way through ssh tunnels to the process that started the MPI run. As a result, screen output can arrive in all sorts of orders. There is basically no way to get neatly ordered screen output, other than sending all text to process zero and printing from there in the correct order.


How to Optimally Shift Large Arrays by n Indices

I am creating my own version of a music visualizer that responds to the frequency of music; a common project. I am using 2 strips of Neopixels, each with 300 LEDs making a total of 600 LEDs.
I have written functions, shown below, that create the desired effect of having a pulse of light travel down the strips independently. However, when running in real time with music, the update rate is too slow to create a nice pulse; it looks choppy.
I believe the problem is the number of operations that must be performed each time the function is called. For each call, a 300-value array per strip must be shifted by 5 indices and 5 new values added.
Here is an illustration of how the function currently works:
-Arbitrary numbers are used to fill the array
-A shift of 2 indices shown
-X represents an index with no value assigned
-N represents the new value added by the function
Initial array: [1][3][7][2][9]
Shifted array: [X][X][1][3][7]
New array: [N][N][1][3][7]
Here is my code. The function definitions are below loop(). I am using random() to trigger a pulse for testing purposes; no other functions were included, for brevity.
#include <FastLED.h>
// ========================= Define setup parameters =========================
#define NUM_LEDS1 300 // Number of LEDS in strip 1
#define NUM_LEDS2 300 // Number of LEDS in strip 2
#define STRIP1_PIN 6 // Pin number for strip 1
#define STRIP2_PIN 10 // Pin number for strip 2
#define s1Band 1 // String 1 band index
#define s2Band 5 // String 2 band index
#define numUpdate 5 // Number of LEDs that will be used for a single pulse
// Colors for strip 1: Band 2 (Index 1)
#define s1R 255
#define s1G 0
#define s1B 0
// Colors for strip 2: Band 6 (Index 5)
#define s2R 0
#define s2G 0
#define s2B 255
// Create the arrays of LEDs
CRGB strip1[NUM_LEDS1];
CRGB strip2[NUM_LEDS2];
void setup() {
  FastLED.addLeds<NEOPIXEL, STRIP1_PIN>(strip1, NUM_LEDS1);
  FastLED.addLeds<NEOPIXEL, STRIP2_PIN>(strip2, NUM_LEDS2);
}
void loop() {
  int num = random(0, 31);
  // Pulse strip based on random number for testing
  if (num == 5) {
    pulseDownStrip1();
  } else {
    pulseBlack1();
  }
  FastLED.show();
}
// ======================= FUNCTION DECLARATIONS =======================
// Pulse a set of colored LEDs down the strip
void pulseDownStrip1() {
  // Move all current LED states by n number of leds to be updated
  // (stop at numUpdate so that strip1[i - numUpdate] never reads before the array)
  for (int i = NUM_LEDS1 - 1; i >= numUpdate; i--) {
    strip1[i] = strip1[i - numUpdate];
  }
  // Add new LED values to the pulse
  for (int j = 0; j < numUpdate; j++) {
    strip1[j].setRGB(s1R, s1G, s1B);
  }
}
// Pulse a set of black LEDs down the strip
void pulseBlack1(){
  // Move all current LED states by n number of leds to be updated
  for (int i = NUM_LEDS1 - 1; i >= numUpdate; i--) {
    strip1[i] = strip1[i - numUpdate];
  }
  // Add new LED values to the pulse
  for (int j = 0; j < numUpdate; j++) {
    strip1[j].setRGB(0, 0, 0);
  }
}
I am looking for any suggestions regarding optimizing this operation. Through my research, copying the desired values to a new array rather than shifting the existing array seems to be a faster operation.
If you have any advice on optimizing this process, or alternate methods to produce the same animation, I would appreciate the help.
The secret is to not shift it. Shift where you start reading it instead. Keep track of a separate variable that keeps the start position and alter your reading through the array to start there, roll back over to zero when it gets to the array length, and stop one short of where it starts.
Google the term "circular buffer". Look at the Arduino HardwareSerial class for a decent implementation example.
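To make the idea concrete, here is a minimal sketch in plain C. It assumes a 5-element array of bytes rather than 300 CRGB values, and ring_t, ring_get and ring_push_front are names made up for this illustration; they are not part of FastLED or the Arduino core. Pushing a value moves the start position back by one and overwrites the slot that used to hold the last element, so nothing else is copied.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 5  /* illustrative size; an LED strip would use its own length */

typedef struct {
    uint8_t data[RING_LEN]; /* physical storage, never shifted */
    size_t head;            /* physical index of logical element 0 */
} ring_t;

/* Read the element at logical position i (0 = start of the strip). */
static uint8_t ring_get(const ring_t *r, size_t i)
{
    return r->data[(r->head + i) % RING_LEN];
}

/* Insert a new value at the logical front. Every other element appears
 * one position further along, and the old last element is dropped. */
static void ring_push_front(ring_t *r, uint8_t value)
{
    r->head = (r->head + RING_LEN - 1) % RING_LEN; /* step back, wrapping at 0 */
    r->data[r->head] = value;
}
```

Starting from [1][3][7][2][9] and pushing two new values gives the logical order [N][N][1][3][7], the same result as the shift in the question, with two writes instead of a full copy. One caveat: as far as I know, FastLED sends the registered array in physical order, so the logical order would still have to be copied into the output array (or the output otherwise remapped) before calling show(); whether that ends up faster than the shift is something to measure.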

sending characters from parent to child process and returning char count to parent in C

So for an assignment I have for my Computer Systems class, I need to type characters in the command line when the program runs.
These characters (such as abcd ef) would be stored in argv[].
The parent sends these characters one at a time through a pipe to the child process which then counts the characters and ignores spaces. After all the characters are sent, the child then returns the number of characters that it counted for the parent to report.
When I try to run the program as it is right now, it tells me the value of readIn is 4, the child processed 0 characters and charCounter is 2.
I feel like I'm so close but I'm missing something important :/ The char array a in the parent process was an attempt to hardcode the input to see if it worked, but I am still unsuccessful. Any help would be greatly appreciated, thank you!
// Characters from command line arguments are sent to child process
// from parent process one at a time through pipe.
// Child process counts number of characters sent through pipe.
// Child process returns number of characters counted to parent process.
// Parent process prints number of characters counted by child process.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h> // for fork()
#include <sys/types.h> // for pid_t
#include <sys/wait.h> // for waitpid()
int main(int argc, char **argv)
{
    int fd[2];
    pid_t pid;
    int status;
    int charCounter = 0;
    int nChar = 0;
    char readbuffer[80];
    char readIn = 'a';
    //char a[] = {'a', 'b', 'c', 'd'};
    pipe(fd);
    pid = fork();
    if (pid < 0) {
        printf("fork error %d\n", pid);
        return -1;
    }
    else if (pid == 0) {
        // code that runs in the child process
        while(readIn != 0)
        {
            readIn = read(fd[0], readbuffer, sizeof(readbuffer));
            printf("The value of readIn is %d\n", readIn);
            if(readIn != ' ')
            {
                charCounter++;
            }
        }
        //write(fd[1], charCounter, sizeof(charCounter));
        printf("The value of charCounter is %d\n", charCounter);
        return charCounter;
    }
    else {
        // code that runs in the parent process
        write(fd[1], &argv, sizeof(argv));
        //write(fd[1], &a, sizeof(a));
        //nChar = read(fd[0], readbuffer, sizeof(readbuffer));
        nChar = charCounter;
        printf("CS201 - Assignment 3 - Andy Grill\n");
        printf("The child processed %d characters\n\n", nChar);
        if (waitpid(pid, &status, 0) > 0)
        {
            if (WIFEXITED(status))
            {
            }
            else if (WIFSIGNALED(status))
            {
            }
        }
    }
    return 0;
}
You're misusing pipes.
A pipe is a unidirectional communication channel. Either you use it to send data from a parent process to a child process, or to send data from a child process to the parent. You can't do both - even if you kept the pipe's read and write channels open on both processes, each process would never know when it was its turn to read from the pipe (e.g. you could end up reading something in the child that was supposed to be read by the parent).
The code to send the characters from parent to child seems mostly correct (more details below), but you need to redesign child to parent communication. Now, you have two options to send the results from child to parent:
Use another pipe. You set up an additional pipe before forking for child-to-parent communication. This complicates the design and the code, because now you have 4 file descriptors to manage from 2 different pipes, and you need to be careful where you close each file descriptor to make sure processes don't hang. It is also probably a bit overkill because the child is only sending a number to the parent.
Return the result from the child as the exit value. This is what you're doing right now, and it's a good choice. However, you fail to retrieve that information in the parent. The child's termination status tells you the number of characters processed. You can fetch this value with waitpid(2), which you already do, but then you never look at status (which contains the result you're looking for).
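For comparison, the first option (a second pipe for the result) might look roughly like the sketch below. It is only an illustration under a few assumptions: count_via_child is a made-up helper name, the text is passed as a single string instead of coming from argv, and error handling is reduced to returning -1.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: sends text to a child through one pipe and reads the
 * child's count of non-space characters back through a second pipe.
 * Returns the count, or -1 on error. */
int count_via_child(const char *text)
{
    int to_child[2], to_parent[2];
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: reads from to_child, writes the result to to_parent. */
        char buf[80];
        ssize_t n;
        int count = 0;
        close(to_child[1]);  /* otherwise read() would never see EOF */
        close(to_parent[0]);
        while ((n = read(to_child[0], buf, sizeof(buf))) > 0) {
            for (ssize_t i = 0; i < n; i++) {
                if (buf[i] != ' ')
                    count++;
            }
        }
        if (n < 0 || write(to_parent[1], &count, sizeof(count)) != (ssize_t)sizeof(count))
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);
    }

    /* Parent: writes to to_child, reads the result from to_parent. */
    int count = -1;
    int status;
    size_t len = strlen(text);
    close(to_child[0]);
    close(to_parent[1]);
    ssize_t written = write(to_child[1], text, len);
    close(to_child[1]);      /* signals EOF to the child */
    ssize_t got = read(to_parent[0], &count, sizeof(count));
    close(to_parent[0]);
    waitpid(pid, &status, 0);
    if (written != (ssize_t)len || got != (ssize_t)sizeof(count))
        return -1;
    return count;
}
```

Note how each process closes the two ends it does not use right after the fork; forgetting the child's close of to_child[1] is the classic way to make the child's read loop hang.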
Remember that a child process has its own address space. It makes no sense to try to read charCounter in the parent because the parent never modified it. The child process gets its own copy of charCounter, so any modifications are seen by the child only. Your code seems to assume otherwise.
To make this more obvious, I would suggest moving the declarations of variables to the corresponding process code. Only fd and pid need to be copied in both processes, the other variables are specific to the task of each process. So you can move the declarations of status and nChar to the parent process specific code, and you can move charCounter, readbuffer and readIn to the child. This will make it very obvious that the variables are completely independent on each process.
Now, some more specific remarks:
pipe(2) can return an error. You ignore the return value, and you shouldn't. At the very least, you should print an error message and terminate if pipe(2) failed for some reason. I also noticed you report errors in fork(2) with printf("fork error %d\n", pid);. This is not the correct way to do it: fork(2), like most syscalls, returns -1 on error and sets the errno global variable to indicate the cause. So that printf() will always print fork error -1 no matter what the cause was, which is not helpful. It also prints the error message to stdout, and error messages should go to stderr instead, so that they stay separate from normal output. So I suggest using perror(3) instead, or manually printing the error to stderr with fprintf(3). perror(3) has the added benefit of appending the error description to the text you feed it, so it's usually a good choice.
if (pipe(fd) < 0) {
    perror("pipe(2) error");
    exit(EXIT_FAILURE);
}
Other functions that you use throughout the code may also fail, and again, you are ignoring the (possible) error returns. close(2) can fail, as well as read(2). Handle the errors, they are there for a reason.
The way you use readIn is wrong. readIn is the result of read(2), which returns the number of characters read (and it should be an int). The code uses readIn as if it were the next character read. The characters read are stored in readbuffer, and readIn will tell you how many characters are on that buffer. So you use readIn to loop through the buffer contents and count the characters. Something like this:
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
while (readIn > 0) {
    int i;
    for (i = 0; i < readIn; i++) {
        if (readbuffer[i] != ' ') {
            charCounter++;
        }
    }
    readIn = read(fd[0], readbuffer, sizeof(readbuffer));
}
Now, about the parent process:
You are not writing the characters into the pipe. This is meaningless:
write(fd[1], &argv, sizeof(argv));
&argv is of type char ***, and sizeof(argv) is the same as sizeof(char **), because argv is a char **. Array dimensions are not kept when passed into a function.
You need to manually loop through argv and write each entry into the pipe, like so:
int i;
for (i = 1; i < argc; i++) {
    size_t to_write = strlen(argv[i]);
    ssize_t written = write(fd[1], argv[i], to_write);
    if (written != to_write) {
        if (written < 0)
            perror("write(2) error");
        else
            fprintf(stderr, "Short write detected on argv[%d]: %zd/%zu\n", i, written, to_write);
    }
}
Note that argv[0] is the name of the program, that's why i starts at 1. If you want to count argv[0] too, just change it to start at 0.
Finally, as I said before, you need to use the termination status fetched by waitpid(2) to get the actual count returned by the child. So you can only print the result after waitpid(2) returned and after making sure the child terminated gracefully. Also, to fetch the actual exit code you need to use the WEXITSTATUS macro (which is only safe to use if WIFEXITED returns true).
So here's the full program with all of these issues addressed:
// Characters from command line arguments are sent to child process
// from parent process one at a time through pipe.
// Child process counts number of characters sent through pipe.
// Child process returns number of characters counted to parent process.
// Parent process prints number of characters counted by child process.
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strlen()
#include <unistd.h> // for fork()
#include <sys/types.h> // for pid_t
#include <sys/wait.h> // for waitpid()
int main(int argc, char **argv)
{
    int fd[2];
    pid_t pid;

    if (pipe(fd) < 0) {
        perror("pipe(2) error");
        exit(EXIT_FAILURE);
    }

    pid = fork();
    if (pid < 0) {
        perror("fork(2) error");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        int readIn;
        int charCounter = 0;
        char readbuffer[80];

        if (close(fd[1]) < 0) {
            perror("close(2) failed on pipe's write channel");
            /* We use abort() here so that the child terminates with SIGABRT
             * and the parent knows that the exit code is not meaningful
             */
            abort();
        }

        readIn = read(fd[0], readbuffer, sizeof(readbuffer));
        while (readIn > 0) {
            int i;
            for (i = 0; i < readIn; i++) {
                if (readbuffer[i] != ' ') {
                    charCounter++;
                }
            }
            readIn = read(fd[0], readbuffer, sizeof(readbuffer));
        }

        if (readIn < 0) {
            perror("read(2) error");
        }

        printf("The value of charCounter is %d\n", charCounter);
        return charCounter;
    } else {
        int status;

        if (close(fd[0]) < 0) {
            perror("close(2) failed on pipe's read channel");
            exit(EXIT_FAILURE);
        }

        int i;
        for (i = 1; i < argc; i++) {
            size_t to_write = strlen(argv[i]);
            ssize_t written = write(fd[1], argv[i], to_write);
            if (written != to_write) {
                if (written < 0) {
                    perror("write(2) error");
                } else {
                    fprintf(stderr, "Short write detected on argv[%d]: %zd/%zu\n", i, written, to_write);
                }
            }
        }

        if (close(fd[1]) < 0) {
            perror("close(2) failed on pipe's write channel on parent");
            exit(EXIT_FAILURE);
        }

        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid(2) error");
            exit(EXIT_FAILURE);
        }

        if (WIFEXITED(status)) {
            printf("CS201 - Assignment 3 - Andy Grill\n");
            printf("The child processed %d characters\n\n", WEXITSTATUS(status));
        } else if (WIFSIGNALED(status)) {
            fprintf(stderr, "Child terminated abnormally with signal %d\n", WTERMSIG(status));
        } else {
            fprintf(stderr, "Unknown child termination status\n");
        }
    }

    return 0;
}
Some final notes:
The shell splits arguments by spaces, so if you start the program as ./a.out this is a test, the code will not see a single space. This is irrelevant, because spaces are supposed to be ignored anyway, but if you want to test that the code really ignores spaces, you need to quote the parameters so that the shell does not process them, as in ./a.out "this is a test" "hello world" "lalala".
Only the rightmost (least significant) 8 bits of a program's exit code are used, so WEXITSTATUS will never return more than 255. If the child reads more than 255 characters, the value will wrap around, so you effectively have a character counter modulo 256. If this is a problem, then you need to go with the other approach and set up a 2nd pipe for child-to-parent communication and write the result there (and have the parent read it). You can confirm this on man 2 waitpid:
WEXITSTATUS(status)
returns the exit status of the child. This consists of the least
significant 8 bits of the status argument that the child
specified in a call to exit(3) or _exit(2) or as the argument for a return
statement in main(). This macro should be employed only if
WIFEXITED returned true.
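If you want to see the wrap-around for yourself, a small experiment along these lines should show it. This is just a sketch; exit_status_roundtrip is a made-up helper name, and it returns -1 for any failure.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` and returns
 * what the parent sees through WEXITSTATUS, or -1 on failure. */
int exit_status_roundtrip(int code)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate immediately with code */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))     /* WEXITSTATUS is only valid in this case */
        return -1;
    return WEXITSTATUS(status);
}
```

A child that exits with 7 is reported as 7, but one that exits with 300 is reported as 44 (300 modulo 256), which is exactly the truncation described above.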

Read RC PWM signal using ATMega2560 in Atmel AVR studio

I am trying to read several PWM signals from an RC receiver into an ATMega 2560. I am having trouble understanding how the ICRn pin functions as it appears to be used for all three compare registers.
The RC PWM signal has a period of 20ms with a HIGH pulse of 2ms being a valid upper value and 1ms being a valid lower value. So the value will sweep from 1000us to 2000us. The period should begin at the rising edge of the pulse.
I have prescaled the 16MHz clock by 8 to get a 2MHz timer and thus should be able to measure the signal to 0.5us accuracy, which is sufficient for my requirements.
Please note that I am having no problems with PWM output; this question is specifically about PWM input.
My code thus far is attached below. I know that I will have to use ICR3 and an ISR to measure the PWM values but I am unsure as to the best procedure for doing this. I also do not know how to check if the value measured is from PE3, PE4, or PE5. Is this code right and how do I get the value that I am looking for?
Any help would be greatly appreciated.
// Set pins as inputs
DDRE |= ( 0 << PE3 ) | ( 0 << PE4 ) | ( 0 << PE5 );
// Configure Timers for CTC mode
TCCR3A |= ( 1 << WGM31 ) | ( 1 << WGM30 ); // Set on compare match
TCCR3B |= ( 1 << WGM33 ) | ( 1 << WGM32 ) | ( 1 << CS31); // Set on compare match, prescale_clk/8
TCCR3B |= ( 1 << ICES5 ); // Use rising edge as trigger
// 16 bit register - set TOP value
OCR3A = 40000 - 1;
OCR3B = 40000 - 1;
OCR3C = 40000 - 1;
TIMSK3 |= ( 1 << ICIE3 );
I had forgotten to post my solution a few months ago so here it is...
I used a PPM receiver in the end, but this code can easily be edited to read a simple PWM signal.
In my header file I made a structure for a 6 channel receiver that I was using for my project. This can be changed as required for receivers with more or less channels.
#ifndef _PPM_H_
#define _PPM_H_
// Libraries included
#include <stdint.h>
#include <avr/interrupt.h>
struct orangeRX_ppm {
    uint16_t ch[6];
};
volatile unsigned char ch_index;
struct orangeRX_ppm ppm;
/* Functions */
void ppm_input_init(void); // Initialise the PPM Input to CTC mode
ISR( TIMER5_CAPT_vect ); // Use ISR to handle CTC interrupt and decode PPM
#endif /* _PPM_H_ */
I then had the following in my .c file.
// Libraries included
#include <avr/io.h>
#include <stdint.h>
#include "ppm.h"

/*
 * ---
 * ICP5 Pin48 on Arduino Mega
 */
void ppm_input_init(void)
{
    DDRL &= ~( 1 << PL1 ); // set ICP5 as an input (clear the direction bit)
    TCCR5A = 0x00; // none
    TCCR5B = ( 1 << ICES5 ) | ( 1 << CS51); // use rising edge as trigger, prescale_clk/8
    TIMSK5 = ( 1 << ICIE5 ); // allow input capture interrupts
    // Clear timer 5
    TCNT5H = 0x00;
    TCNT5L = 0x00;
}

// Interrupt service routine for reading PPM values from the radio receiver.
ISR( TIMER5_CAPT_vect )
{
    // Count duration of the high pulse
    uint16_t high_cnt;
    high_cnt = (unsigned int)ICR5L;
    high_cnt += (unsigned int)ICR5H * 256;
    /* If the duration is greater than 5000 counts then this is the end of the PPM signal
     * and the next signal being addressed will be Ch0
     */
    if ( high_cnt < 5000 )
    {
        // Added for security of the array
        if ( ch_index > 5 )
        {
            ch_index = 5;
        }
        ppm.ch[ch_index] = high_cnt; // Write channel value to array
        ch_index++; // increment channel index
    }
    else
    {
        ch_index = 0; // reset channel index
    }
    // Reset counter
    TCNT5H = 0;
    TCNT5L = 0;
    TIFR5 = ( 1 << ICF5 ); // clear input capture flag
}
This code will trigger an ISR every time ICP5 goes from low to high. In this ISR the 16-bit ICR5 register (ICR5H<<8 | ICR5L) holds the number of prescaled clock pulses that have elapsed since the last change from low to high. This count is typically less than 2000 us. I have said that if the count is greater than 2500 us (5000 counts) then the input is invalid and the next input should be ppm.ch[0].
I have attached an image of PPM as seen on my oscilloscope.
This method of reading PPM is quite efficient as we do not need to keep polling pins to check their logic level.
Don't forget to enable interrupts using the sei() command. Otherwise the ISR will never run.
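If you want to check the decoding logic on a PC before flashing it, the channel bookkeeping can be pulled out of the ISR into a plain function. The sketch below is only an illustration: ppm_frame, ppm_feed and ppm_counts_to_us are made-up names, the 2 MHz timer (0.5 us per count) is assumed as above, and unlike the ISR it ignores extra pulses beyond the sixth channel instead of overwriting the last one.

```c
#include <stdint.h>

#define PPM_CHANNELS    6
#define PPM_SYNC_COUNTS 5000u  /* 2500 us at 2 MHz: treated as the frame gap */

struct ppm_frame {
    uint16_t ch[PPM_CHANNELS]; /* last captured interval per channel, in counts */
    uint8_t index;             /* channel the next interval belongs to */
};

/* Feed one captured interval (in timer counts).
 * Returns 1 if it was a sync gap (next interval is channel 0), else 0. */
int ppm_feed(struct ppm_frame *f, uint16_t counts)
{
    if (counts >= PPM_SYNC_COUNTS) {
        f->index = 0;
        return 1;
    }
    if (f->index < PPM_CHANNELS) {  /* guard the array bounds */
        f->ch[f->index] = counts;
        f->index++;
    }
    return 0;
}

/* Convert timer counts to microseconds for a 2 MHz timer. */
uint16_t ppm_counts_to_us(uint16_t counts)
{
    return counts / 2;
}
```

On the AVR, the ISR would then just read ICR5 and pass the value to ppm_feed(); on a PC you can feed it recorded or invented intervals and check the resulting channel values.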
Let's say you want to do the following (I'm not saying this will allow you to accurately measure the PWM signals but it might serve as example on how to set the registers)
Three timers running, which reset every 20 ms. This can be done by setting them in CTC mode for OCRnA: wgm3..0 = 0b0100.
//timer 1
TCCR1A = 0;
TCCR1B = (1<<CS11) | (1<<WGM12);
OCR1A = 40000 - 1;
//timer 3 (there's no ICP2)
TCCR3A = 0;
TCCR3B = (1<<CS31) | (1<<WGM32);
OCR3A = 40000 - 1;
//timer 4
TCCR4A = 0;
TCCR4B = (1<<CS41) | (1<<WGM42);
OCR4A = 40000 - 1;
Now connect each of the three PWM signals to its own ICPn pin (where n = timer). Check the datasheet for the locations of the different ICPn pins (I'm pretty sure they are not PE3, 4, 5).
Assume the PWM signals start high at t=0 and go low after their high-time for the remainder of the period. You want to measure the high-time, so we trigger an interrupt for each signal when a falling edge occurs on its ICPn pin.
Setting bit ICESn in the TCCRnB register to 0 selects the falling edge (this is already done in the previous code block).
To trigger the interrupts, set the corresponding interrupt enable bits:
TIMSK1 |= (1<<ICIE1);
TIMSK3 |= (1<<ICIE3);
TIMSK4 |= (1<<ICIE4);
Now each time an interrupt is triggered for ICn you can grab the ICRn register to see the time (in clockperiods/8) at which the falling edge occurred.
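In practice the pulses from a receiver will not line up with the timer's t=0, so a single capture is not enough. One way around that, sketched here under the assumption that you have captured both the rising-edge and the falling-edge timer values (for example by switching ICESn between captures, which is not shown), is to take the difference and account for the timer wrapping at its TOP value. pulse_width_counts is a made-up helper name.

```c
#include <stdint.h>

/* rise, fall: timer values captured at the rising and falling edge.
 * period: number of counts per timer cycle (TOP + 1, i.e. 40000 here).
 * Returns the high-time in timer counts, assuming the pulse is shorter
 * than one timer period. */
uint16_t pulse_width_counts(uint16_t rise, uint16_t fall, uint16_t period)
{
    if (fall >= rise)
        return (uint16_t)(fall - rise);
    /* The timer wrapped between the two edges. */
    return (uint16_t)((uint32_t)fall + period - rise);
}
```

With the clk/8 prescaler on a 16 MHz clock, each count is 0.5 us, so a result of 3000 counts corresponds to a 1500 us pulse.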

MPI message received in different communicator - erroneous program or MPI implementation bug?

This is a follow-up to this previous question of mine, for which the conclusion was that the program was erroneous, and therefore the expected behavior was undefined.
What I'm trying to create here is a simple error-handling mechanism, for which I use that Irecv request for the empty message as an "abort handle", attaching it to my normal MPI_Wait call (and turning it into MPI_WaitAny), in order to allow me to unblock process 1 in case an error occurs on process 0 and it can no longer reach the point where it's supposed to post the matching MPI_Recv.
What's happening is that, due to internal message buffering, the MPI_Isend may succeed right away, without the other process being able to post the matching MPI_Recv. So there's no way of canceling it anymore.
I was hoping that once all processes call MPI_Comm_free I can just forget about that message once and for all, but, as it turns out, that's not the case. Instead, it's being delivered to the MPI_Recv in the following communicator.
So my questions are:
Is this also an erroneous program, or is it a bug in the MPI implementation (Intel MPI 4.0.3)?
If I turn my MPI_Isend calls into MPI_Issend, the program works as expected - can I at least in that case rest assured that the program is correct?
Am I reinventing the wheel here? Is there a simpler way to achieve this?
Again, any feedback is much appreciated!
#include "stdio.h"
#include "unistd.h"
#include "mpi.h"
#include "time.h"
#include "stdlib.h"
int main(int argc, char* argv[]) {
int rank, size;
MPI_Group group;
MPI_Comm my_comm;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &group);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 1) {
MPI_Request req[2];
int msg = 123, which;
MPI_Isend(&msg, 1, MPI_INT, 0, 0, my_comm, &req[0]);
MPI_Irecv(NULL, 0, MPI_INT, 0, 0, my_comm, &req[1]);
MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);
if (which == 0) {
printf("rank 1: send succeed; cancelling abort handle\n");
} else {
printf("rank 1: send aborted; cancelling send request\n");
} else {
MPI_Request req;
int msg, r = rand() % 2;
if (r) {
printf("rank 0: receiving message\n");
MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
} else {
printf("rank 0: sending abort message\n");
MPI_Isend(NULL, 0, MPI_INT, 1, 0, my_comm, &req);
if (!r) {
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 0) {
MPI_Request req;
MPI_Status status;
int msg, cancelled;
MPI_Irecv(&msg, 1, MPI_INT, 1, 0, my_comm, &req);
MPI_Wait(&req, &status);
MPI_Test_cancelled(&status, &cancelled);
if (cancelled) {
printf("rank 0: receive cancelled\n");
} else {
printf("rank 0: OLD MESSAGE RECEIVED!!!\n");
if (rank == 0) printf("freeing communicator %d\n", my_comm);
return 0;
created communicator -2080374784
rank 0: sending abort message
rank 1: send succeed; cancelling abort handle
freeing communicator -2080374784
created communicator -2080374784
freeing communicator -2080374784
As mentioned in one of the above comments by @kraffenetti, this is an erroneous program because the sent messages are not being matched by receives. Even though the messages are cancelled, they still need to have a matching receive on the remote side, because the cancel might not succeed for messages that were already sent before the cancel could be completed (which is the case here).
This question started a discussion on an MPICH ticket, which you can find here and which has more details.
I tried to build your code using Open MPI and it did not work. mpicc complained about status.cancelled:
error: ‘MPI_Status’ has no member named ‘cancelled’
I suppose this is a feature of Intel MPI. What happens if you switch to:
int flag;
MPI_Test_cancelled(&status, &flag);
if (flag) {
This gives the expected output using Open MPI (and it makes your code less implementation-dependent). Is that also the case with Intel MPI?
We need an expert to tell us what status.cancelled is in Intel MPI, because I don't know anything about it!
Edit: I tested my answer many times and found that the output was random, sometimes correct, sometimes not. Sorry for that... It is as if something in status was not set. Part of the answer may be in the documentation of MPI_Wait():
"The MPI_ERROR field of the status return is only set if the return from the MPI routine is MPI_ERR_IN_STATUS. That error class is only returned by the routines that take an array of status arguments (MPI_Testall, MPI_Testsome, MPI_Waitall, and MPI_Waitsome). In all other cases, the value of the MPI_ERROR field in the status is unchanged. See section 3.2.5 in the MPI-1.1 specification for the exact text." If MPI_Test_cancelled() makes use of MPI_ERROR, things might go wrong.
So here is the trick: use MPI_Waitall(1, &req, &status)! The output is correct at last!

How to get a grandparents/ancestors process ID?

I would like to know - if possible - how to get the pid of a process' grandparent (or further).
To be more specific, I want for a process to print its depth in a process tree.
For example, when starting with the following:
#include <sys/types.h>
#include <unistd.h>

int main() {
    int creator_id = (int) getpid();
    pid_t pid1 = fork();
    pid_t pid2 = fork();
    pid_t pid3 = fork();
    //print depth in process tree of each process
    return 0;
}
According to my theory, the tree will look like this:
/ | \
/ | \
0 0 0
/ \ |
0 0 0
So my first idea was to somehow see how often I have to go up until I find the creator's pid.
As a little sidenote:
I also wondered if it was possible to make the printing from bottom up, meaning that all processes in the deepest level would print first.
how to get the pid of a process' grandparent (or further).
This depends on which operating system you are using. Since you use fork() to create new processes in your example, I suppose you are using some Unix-like system.
If you are using Linux and know the pid of a process, you can get its parent's pid from /proc/[pid]/stat; it is the fourth field in that file. By following this parent-child chain, you can find all of a process's ancestors.
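One detail to watch when parsing that file: the second field is the command name in parentheses, and it may itself contain spaces or parentheses, so splitting on whitespace is not reliable. A minimal sketch that works on a line already read from /proc/[pid]/stat, skipping to the last ')' first (parse_ppid_from_stat is a made-up helper name):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: extracts the parent pid (4th field) from one line
 * of /proc/[pid]/stat. Returns -1 if the line cannot be parsed. */
pid_t parse_ppid_from_stat(const char *line)
{
    /* The command name ends at the LAST ')' on the line. */
    const char *close_paren = strrchr(line, ')');
    char state;
    int ppid;

    if (close_paren == NULL)
        return -1;
    /* After the name come the state character and then the parent pid. */
    if (sscanf(close_paren + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}
```

For a line starting with "4242 (my (odd) name) S 1337", this returns 1337 even though the name contains both spaces and a closing parenthesis.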
Following @Lee Duhem's hint, I made the following function that returns the nth ancestor of the current process (the 2nd ancestor is the grandparent).
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Get the process ID of the calling process's nth ancestor. */
pid_t getapid(int n) {
    pid_t pid = getpid();
    while(n>0 && pid){ // process with pid 0 has no parent
        // strlen("/proc/") == 6
        // max [pid] for 64 bits is 4194304 then strlen("[pid]") <= 7
        // strlen("/stat") == 5
        // then strlen("/proc/[pid]/stat") <= 6 + 7 + 5
        char proc_stat_path[6+7+5+1];
        sprintf(proc_stat_path, "/proc/%d/stat", pid);
        // open "/proc/<pid>/stat"
        FILE *fh = fopen(proc_stat_path, "r");
        if (fh == NULL) {
            fprintf(stderr, "Failed opening %s: ", proc_stat_path);
            perror("");
            return -1;
        }
        // seek to the last ')'
        int c;
        long pos = 0;
        while ((c = fgetc(fh)) != EOF) {
            if (c == ')')
                pos = ftell(fh);
        }
        fseek(fh, pos, SEEK_SET);
        // get parent
        fscanf(fh, " %*c %d", &pid);
        // close "/proc/<pid>/stat"
        fclose(fh);
        // decrement n
        n--;
    }
    if (n > 0)
        return -1;
    return pid;
}