waitpid(pid,status,0) status not reading correctly - process

I've got a problem that is confusing me. I'm just trying to print the status received from a terminated process, but it isn't working the way I thought it would. Here is the code:
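#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
void handler1(int sig); /* defined below; Signal() is a wrapper not shown here */
int main(int argc, char *argv[])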
{
printf("this process (%d)\n",(int)(getpid()));
int *status;
pid_t pid;
Signal(SIGINT, handler1);
if ((pid = fork())==0){
while(1)
;
}
kill(pid,SIGINT);
while(pid>0){
pid = waitpid(pid,status,0);
printf("status: %d\n", WEXITSTATUS(status));
printf("waitpid return: %d\n",(int)pid);
}
return 0;
}
void handler1(int sig){
printf("process (%d) has received a sigint\n",(int)(getpid()));
exit('d');
}
it has output
this process (8811)
process (8812) has received a sigint
status: 0
waitpid return: 8812
status: 0
waitpid return: -1
When I use WIFEXITED(status), it returns true. So shouldn't WEXITSTATUS return what I passed to exit() and not 0?
I did not include the Signal function here.

Here's your problem:
int *status;
...
pid = waitpid(pid,status,0);
printf("status: %d\n", WEXITSTATUS(status));
You're not allocating space for the status. You're only declaring a pointer, and it is uninitialized: it may happen to hold 0 (NULL), but nothing guarantees that. You then pass it to waitpid(), which ignores a NULL status pointer (if the pointer happens to hold some other garbage address, waitpid() writes through it and the behavior is undefined). Finally, you pass that pointer to the WEXITSTATUS() macro, which expects an int, so the value it extracts is meaningless; here it comes out as zero. And you get "status: 0".
What you want to do is declare your status as a regular int, and then pass its pointer to waitpid():
int status;
...
pid = waitpid(pid,&status,0);
printf("status: %d\n", WEXITSTATUS(status));
This will print "status: 100", as you expect, since 'd' is 100 in ASCII.

Related

Winsock2, BitCoin Select() returns data to read, Recv() returns 0 bytes

I connected to a Bitcoin node via WinSock2 and sent a proper "getaddr" message. The server responds: select() reports that the reply data is ready to read, but when I call recv(), 0 bytes are read.
My code works fine against a test server on localhost. An incomplete "getaddr" message (less than 24 bytes) gets no reply from the Bitcoin node; only a complete message does, but I can't read that reply with recv(). After recv() returns 0 bytes, select() still reports that there is data to read.
My code is split into a DLL, which uses Winsock2, and the main() function.
Here are key fragments:
struct CMessageHeader
{
uint32_t magic;
char command[12];
uint32_t payload;
uint32_t checksum;
};
CSocket *sock = new CSocket();
int actual; /* Actually read/written bytes */
sock->connect("109.173.41.43", 8333);
CMessageHeader msg = { 0xf9beb4d9, "getaddr\0\0\0\0", 0, 0x5df6e0e2 }, rcv = { 0 };
actual = sock->send((const char *)&msg, sizeof(msg));
actual = sock->select(2, 0); /* Select read with 2 seconds waiting time */
actual = sock->receive((char *)&rcv, sizeof(rcv));
The key fragment of DLL code:
int CSocket::receive(char *buf, int len)
{
int actual;
if ((actual = ::recv(sock, buf, len, 0)) == SOCKET_ERROR) {
std::ostringstream s;
s << "Nie mozna odebrac " << len << " bajtow.";
throw(CError(s));
}
return(actual);
}
If select() reports that the socket is readable and recv() then returns 0, the peer has gracefully closed the connection on its end (i.e., it sent you a FIN), so you need to close your socket.
On a side note, recv() can return fewer bytes than requested, so your receive() function should call recv() in a loop until all of the expected bytes have actually been received, or an error occurs (same with send(), too).

Creating threads with pthread_create() doesn't work on my Linux

I have this piece of C/C++ code:
void * myThreadFun(void *vargp)
{
int start = atoi((char*)vargp) % nFracK;
printf("Thread start = %d, dQ = %d\n", start, dQ);
pthread_mutex_lock(&nItermutex);
nIter++;
pthread_mutex_unlock(&nItermutex);
}
void Opt() {
pthread_t thread[200];
char start[100];
for(int i = 0; i < 10; i++) {
sprintf(start, "%d", i);
int ret = pthread_create (&thread[i], NULL, myThreadFun, (void*) start);
printf("ret = %d on thread %d\n", ret, i);
}
for(int i = 0; i < 10; i++)
pthread_join(thread[i], NULL);
}
It should create 10 threads, but I don't understand why it creates only n < 10 threads instead.
The ret value is always 0 (all 10 times).
Your program contains at least one data race, therefore its behavior is undefined.
The provided source is also incomplete, so it's impossible to be sure that I can test the same thing you are testing. Nevertheless, I performed the minimum augmentation needed for g++ to compile it without warnings, and tested that:
#include <cstdlib>
#include <cstdio>
#include <pthread.h>
pthread_mutex_t nItermutex = PTHREAD_MUTEX_INITIALIZER;
const int nFracK = 100;
const int dQ = 4;
int nIter = 0;
void * myThreadFun(void *vargp)
{
int start = atoi((char*)vargp) % nFracK;
printf("Thread start = %d, dQ = %d\n", start, dQ);
pthread_mutex_lock(&nItermutex);
nIter++;
pthread_mutex_unlock(&nItermutex);
return NULL;
}
void Opt() {
pthread_t thread[200];
char start[100];
for(int i = 0; i < 10; i++) {
sprintf(start, "%d", i);
int ret = pthread_create (&thread[i], NULL, myThreadFun, (void*) start);
printf("ret = %d on thread %d\n", ret, i);
}
for(int i = 0; i < 10; i++)
pthread_join(thread[i], NULL);
}
int main(void) {
Opt();
return 0;
}
The fact that its behavior is undefined notwithstanding, when I run this program on my Linux machine, it invariably prints exactly ten "Thread start" lines, albeit not all with distinct numbers. The most plausible conclusion is that the program indeed does start ten (additional) threads, which is consistent with the fact that the output also seems to indicate that each call to pthread_create() indicates success by returning 0. I therefore reject your assertion that fewer than ten threads are actually started.
Presumably, the follow-up question would be why the program does not print the expected output, and here we return to the data race and the accompanying undefined behavior. The main thread writes a text representation of the iteration variable i into the local array start of function Opt, and passes a pointer to that same array to each call to pthread_create(). When it then cycles back to do it again, there is a race between the newly created thread trying to read back the data and the main thread overwriting the array's contents with new data. I suppose that your idea was to avoid passing &i, but this is neither better nor fundamentally different.
You have several options for avoiding a data race in such a situation, prominent among them being:
initialize each thread indirectly from a different object, for example:
int start[10];
for(int i = 0; i < 10; i++) {
start[i] = i;
int ret = pthread_create(&thread[i], NULL, myThreadFun, &start[i]);
}
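Note that each thread is passed a pointer to a different array element, which the main thread does not subsequently modify. The thread function then reads the value directly, e.g. int start = *(int *)vargp % nFracK;, instead of calling atoi().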
initialize each thread directly from the value passed to it. This is not always a viable alternative, but it is possible in this case:
for(int i = 0; i < 10; i++) {
int ret = pthread_create(&thread[i], NULL, myThreadFun,
reinterpret_cast<void *>(static_cast<std::intptr_t>(i)));
}
accompanied by corresponding code in the thread function:
int start = reinterpret_cast<std::intptr_t>(vargp) % nFracK;
This is a fairly common idiom, though it is more often used in pthreads' native language, C, where it's less verbose. In C++, std::intptr_t is declared in <cstdint>.
Use a mutex, semaphore, or other synchronization object to prevent the main thread from modifying the array before the child has read it. (Left as an exercise.)
Any of those options can be used to write a program that produces the expected output, with each thread responsible for printing one line. Supposing, of course, that the expectations of the output do not include that the relative order of the threads' outputs will be the same as the relative order in which they were started. If you want that, then only the option of synchronizing the parent and child threads will achieve it.

sending characters from parent to child process and returning char count to parent in C

So for an assignment in my Computer Systems class, I need to type characters on the command line when the program runs.
These characters (such as abcd ef) would be stored in argv[].
The parent sends these characters one at a time through a pipe to the child process which then counts the characters and ignores spaces. After all the characters are sent, the child then returns the number of characters that it counted for the parent to report.
When I try to run the program as it is right now, it tells me the value of readIn is 4, the child processed 0 characters and charCounter is 2.
I feel like I'm so close, but I'm missing something important :/ The commented-out char array a (and the matching write() in the parent process) was an attempt to hard-code the input to see if it worked, but I am still unsuccessful. Any help would be greatly appreciated, thank you!
// Characters from command line arguments are sent to child process
// from parent process one at a time through pipe.
//
// Child process counts number of characters sent through pipe.
//
// Child process returns number of characters counted to parent process.
//
// Parent process prints number of characters counted by child process.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h> // for fork()
#include <sys/types.h> // for pid_t
#include <sys/wait.h> // for waitpid()
int main(int argc, char **argv)
{
int fd[2];
pid_t pid;
int status;
int charCounter = 0;
int nChar = 0;
char readbuffer[80];
char readIn = 'a';
//char a[] = {'a', 'b', 'c', 'd'};
pipe(fd);
pid = fork();
if (pid < 0) {
printf("fork error %d\n", pid);
return -1;
}
else if (pid == 0) {
// code that runs in the child process
close(fd[1]);
while(readIn != 0)
{
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
printf("The value of readIn is %d\n", readIn);
if(readIn != ' ')
{
charCounter++;
}
}
close(fd[0]);
//open(fd[1]);
//write(fd[1], charCounter, sizeof(charCounter));
printf("The value of charCounter is %d\n", charCounter);
return charCounter;
}
else
{
// code that runs in the parent process
close(fd[0]);
write(fd[1], &argv, sizeof(argv));
//write(fd[1], &a, sizeof(a));
close(fd[1]);
//open(fd[0]);
//nChar = read(fd[0], readbuffer, sizeof(readbuffer));
nChar = charCounter;
printf("CS201 - Assignment 3 - Andy Grill\n");
printf("The child processed %d characters\n\n", nChar);
if (waitpid(pid, &status, 0) > 0)
{
if (WIFEXITED(status))
{
}
else if (WIFSIGNALED(status))
{
}
}
return 0;
}
}
You're misusing pipes.
A pipe is a unidirectional communication channel. Either you use it to send data from a parent process to a child process, or to send data from a child process to the parent. You can't do both - even if you kept the pipe's read and write channels open on both processes, each process would never know when it was its turn to read from the pipe (e.g. you could end up reading something in the child that was supposed to be read by the parent).
The code to send the characters from parent to child seems mostly correct (more details below), but you need to redesign child to parent communication. Now, you have two options to send the results from child to parent:
Use another pipe. You set up an additional pipe before forking for child-to-parent communication. This complicates the design and the code, because now you have 4 file descriptors to manage from 2 different pipes, and you need to be careful where you close each file descriptor to make sure processes don't hang. It is also probably a bit overkill because the child is only sending a number to the parent.
Return the result from the child as its exit value. This is what you're doing right now, and it's a good choice. However, you never retrieve that information in the parent. The child's termination status carries the number of characters processed; you already fetch it with waitpid(2), but you never look at status, which contains the result you're looking for.
Remember that a child process has its own address space. It makes no sense to try to read charCounter in the parent because the parent never modified it. The child process gets its own copy of charCounter, so any modifications are seen by the child only. Your code seems to assume otherwise.
To make this more obvious, I would suggest moving the variable declarations into the corresponding process code. Only fd and pid need to exist in both processes; the other variables are specific to the task of each process. So you can move the declarations of status and nChar into the parent-specific code, and charCounter, readbuffer and readIn into the child. This makes it very obvious that the variables are completely independent in each process.
Now, some more specific remarks:
pipe(2) can fail. You ignore its return value, and you shouldn't: at the very least, print an error message and terminate if pipe(2) fails for some reason. I also noticed you report fork(2) errors with printf("fork error %d\n", pid);. This is not the right way to do it: fork(2) and most other system calls return -1 on error and set the global errno variable to indicate the cause, so that printf() will always print fork error -1, no matter what the cause was. That's not helpful. It also prints the error message to stdout, and for a number of reasons error messages should go to stderr instead. So I suggest using perror(3), or printing the error to stderr manually with fprintf(3). perror(3) has the added benefit of appending a description of the current errno value to the text you give it, so it's usually a good choice.
Example:
if (pipe(fd) < 0) {
perror("pipe(2) error");
exit(EXIT_FAILURE);
}
Other functions that you use throughout the code may also fail, and again, you are ignoring the possible error returns. close(2) can fail, as can read(2) and write(2). Handle the errors; they are there for a reason.
The way you use readIn is wrong. readIn holds the result of read(2), which is the number of bytes read (read(2) returns a ssize_t, so readIn should be a ssize_t, or at least an int, never a char). The code uses readIn as if it were the next character read. The characters read are stored in readbuffer, and readIn tells you how many of them are in that buffer. So you use readIn to loop through the buffer contents and count the characters. Something like this:
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
while (readIn > 0) {
int i;
for (i = 0; i < readIn; i++) {
if (readbuffer[i] != ' ') {
charCounter++;
}
}
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
}
Now, about the parent process:
You are not writing the characters into the pipe. This is meaningless:
write(fd[1], &argv, sizeof(argv));
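&argv is of type char ***, and sizeof(argv) is the same as sizeof(char **), because argv is a char **. Array dimensions are not kept when an array is passed into a function. So this call writes the bytes of a pointer into the pipe, not the characters of the arguments.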
You need to manually loop through argv and write each entry into the pipe, like so:
int i;
for (i = 1; i < argc; i++) {
size_t to_write = strlen(argv[i]);
ssize_t written = write(fd[1], argv[i], to_write);
if (written < 0 || (size_t)written != to_write) {
if (written < 0)
perror("write(2) error");
else
fprintf(stderr, "Short write detected on argv[%d]: %zd/%zu\n", i, written, to_write);
}
}
Note that argv[0] is the name of the program; that's why i starts at 1. If you want to count argv[0] too, just change it to start at 0. (strlen() requires #include <string.h>.)
Finally, as I said before, you need to use the termination status fetched by waitpid(2) to get the actual count returned by the child. So you can only print the result after waitpid(2) returned and after making sure the child terminated gracefully. Also, to fetch the actual exit code you need to use the WEXITSTATUS macro (which is only safe to use if WIFEXITED returns true).
So here's the full program with all of these issues addressed:
// Characters from command line arguments are sent to child process
// from parent process one at a time through pipe.
//
// Child process counts number of characters sent through pipe.
//
// Child process returns number of characters counted to parent process.
//
// Parent process prints number of characters counted by child process.
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // for strlen()
#include <unistd.h> // for fork()
#include <sys/types.h> // for pid_t
#include <sys/wait.h> // for waitpid()
int main(int argc, char **argv)
{
int fd[2];
pid_t pid;
if (pipe(fd) < 0) {
perror("pipe(2) error");
exit(EXIT_FAILURE);
}
pid = fork();
if (pid < 0) {
perror("fork(2) error");
exit(EXIT_FAILURE);
}
if (pid == 0) {
ssize_t readIn;
int charCounter = 0;
char readbuffer[80];
if (close(fd[1]) < 0) {
perror("close(2) failed on pipe's write channel");
/* We use abort() here so that the child terminates with SIGABRT
* and the parent knows that the exit code is not meaningful
*/
abort();
}
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
while (readIn > 0) {
int i;
for (i = 0; i < readIn; i++) {
if (readbuffer[i] != ' ') {
charCounter++;
}
}
readIn = read(fd[0], readbuffer, sizeof(readbuffer));
}
if (readIn < 0) {
perror("read(2) error");
}
printf("The value of charCounter is %d\n", charCounter);
return charCounter;
} else {
int status;
if (close(fd[0]) < 0) {
perror("close(2) failed on pipe's read channel");
exit(EXIT_FAILURE);
}
int i;
for (i = 1; i < argc; i++) {
size_t to_write = strlen(argv[i]);
ssize_t written = write(fd[1], argv[i], to_write);
if (written < 0 || (size_t)written != to_write) {
if (written < 0) {
perror("write(2) error");
} else {
fprintf(stderr, "Short write detected on argv[%d]: %zd/%zu\n", i, written, to_write);
}
}
}
if (close(fd[1]) < 0) {
perror("close(2) failed on pipe's write channel on parent");
exit(EXIT_FAILURE);
}
if (waitpid(pid, &status, 0) < 0) {
perror("waitpid(2) error");
exit(EXIT_FAILURE);
}
if (WIFEXITED(status)) {
printf("CS201 - Assignment 3 - Andy Grill\n");
printf("The child processed %d characters\n\n", WEXITSTATUS(status));
} else if (WIFSIGNALED(status)) {
fprintf(stderr, "Child terminated abnormally with signal %d\n", WTERMSIG(status));
} else {
fprintf(stderr, "Unknown child termination status\n");
}
return 0;
}
}
Some final notes:
The shell splits arguments on spaces, so if you start the program as ./a.out this is a test, the code will not see a single space. This is harmless, because spaces are supposed to be ignored anyway, but if you want to test that the code really ignores spaces, you need to quote the parameters so that the shell does not split them, as in ./a.out "this is a test" "hello world" "lalala".
Only the rightmost (least significant) 8 bits of a program's exit code are used, so WEXITSTATUS will never return more than 255. If the child reads more than 255 characters, the value will wrap around, so you effectively have a character counter modulo 256. If this is a problem, then you need to go with the other approach and set up a 2nd pipe for child-to-parent communication and write the result there (and have the parent read it). You can confirm this on man 2 waitpid:
WEXITSTATUS(status)
returns the exit status of the child. This consists of the least
significant 8 bits of the status argument that the child
specified in a call to exit(3) or _exit(2) or as the argument for a return
statement in main(). This macro should be employed only if
WIFEXITED returned true.

Strange behavior with my C string reverse function

I'm just an amateur programmer...
While reading Kochan's "Programming in Objective-C" (now in its 6th edition) for the second time, more than two years after the first, I reached the pointer chapter and tried to revive the old days when I started programming in C...
So I tried to write a C string reverse function using char pointers...
In the end I got the desired result, but I also got some very strange behavior that I cannot explain with my limited programming experience...
First the code:
This is a .m file,
#import <Foundation/Foundation.h>
#import "*pathToFolder*/NSPrint.m"
int main(int argc, char const *argv[])
{
@autoreleasepool
{
char * reverseString(char * str);
char *ch;
if (argc < 2)
{
NSPrint(@"No word typed in the command line!");
return 1;
}
NSPrint(@"Reversing arguments:");
for (int i = 1; argv[i]; i++)
{
ch = reverseString(argv[i]);
printf("%s\n", ch);
//NSPrint(@"%s - %s", argv[i], ch);
}
}
return 0;
}
char * reverseString(char * str)
{
int size = 0;
for ( ; *(str + size) != '\0'; size++) ;
//printf("Size: %i\n", size);
char result[size + 1];
int i = 0;
for (size-- ; size >= 0; size--, i++)
{
result[i] = *(str + size);
//printf("%c, %c\n", result[i], *(str + size));
}
result[i] = '\0';
//printf("result location: %lu\n", result);
//printf("%s\n", result);
return result;
}
Second some notes:
This code is compiled on a MacBook Pro running Mac OS X Mavericks, with Clang (clang -fobjc-arc $file_name -o $file_name_base).
NSPrint is just a wrapper around printf that prints an NSString constructed with stringWithFormat:arguments:.
And third, the strange behavior:
If I uncomment all of those commented-out printf calls, everything works just fine, i.e., all printf calls print what they should, including the last printf inside the main function.
If I uncomment one, and only one, randomly chosen commented-out printf call, again everything works just fine, and I get the correct printf results, including the last printf inside the main function.
If I leave all those printf calls commented out, the last printf inside the main block prints ONLY BLANK LINES, one blank line for each argument passed...
Worse, if I use that NSPrint function inside main instead of printf, I get the desired result!
Can anyone bring some light here please :)
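You're returning a local array, which goes out of scope when the function exits. Dereferencing that memory causes undefined behavior. In practice, the array lives in reverseString's stack frame, and later calls such as printf() may reuse that stack space and overwrite it, which is why the output changes depending on which other calls you add or remove.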
You are returning a pointer to a local variable of the function that was called. When that function returns, the memory for the local variable becomes invalid, and the pointer returned is rubbish.

MPI message received in different communicator - erroneous program or MPI implementation bug?

This is a follow-up to this previous question of mine, for which the conclusion was that the program was erroneous, and therefore the expected behavior was undefined.
What I'm trying to create here is a simple error-handling mechanism, for which I use that Irecv request for the empty message as an "abort handle", attaching it to my normal MPI_Wait call (and turning it into MPI_WaitAny), in order to allow me to unblock process 1 in case an error occurs on process 0 and it can no longer reach the point where it's supposed to post the matching MPI_Recv.
What's happening is that, due to internal message buffering, the MPI_Isend may succeed right away, without the other process being able to post the matching MPI_Recv. So there's no way of canceling it anymore.
I was hoping that once all processes call MPI_Comm_free I can just forget about that message once and for all, but, as it turns out, that's not the case. Instead, it's being delivered to the MPI_Recv in the following communicator.
So my questions are:
Is this also an erroneous program, or is it a bug in the MPI implementation (Intel MPI 4.0.3)?
If I turn my MPI_Isend calls into MPI_Issend, the program works as expected - can I at least in that case rest assured that the program is correct?
Am I reinventing the wheel here? Is there a simpler way to achieve this?
Again, any feedback is much appreciated!
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>
#include <time.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {
int rank, size;
MPI_Group group;
MPI_Comm my_comm;
srand(time(NULL));
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_group(MPI_COMM_WORLD, &group);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 1) {
MPI_Request req[2];
int msg = 123, which;
MPI_Isend(&msg, 1, MPI_INT, 0, 0, my_comm, &req[0]);
MPI_Irecv(NULL, 0, MPI_INT, 0, 0, my_comm, &req[1]);
MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);
MPI_Barrier(my_comm);
if (which == 0) {
printf("rank 1: send succeed; cancelling abort handle\n");
MPI_Cancel(&req[1]);
MPI_Wait(&req[1], MPI_STATUS_IGNORE);
} else {
printf("rank 1: send aborted; cancelling send request\n");
MPI_Cancel(&req[0]);
MPI_Wait(&req[0], MPI_STATUS_IGNORE);
}
} else {
MPI_Request req;
int msg, r = rand() % 2;
if (r) {
printf("rank 0: receiving message\n");
MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
} else {
printf("rank 0: sending abort message\n");
MPI_Isend(NULL, 0, MPI_INT, 1, 0, my_comm, &req);
}
MPI_Barrier(my_comm);
if (!r) {
MPI_Cancel(&req);
MPI_Wait(&req, MPI_STATUS_IGNORE);
}
}
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
sleep(2);
MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
if (rank == 0) printf("created communicator %d\n", my_comm);
if (rank == 0) {
MPI_Request req;
MPI_Status status;
int msg, cancelled;
MPI_Irecv(&msg, 1, MPI_INT, 1, 0, my_comm, &req);
sleep(1);
MPI_Cancel(&req);
MPI_Wait(&req, &status);
MPI_Test_cancelled(&status, &cancelled);
if (cancelled) {
printf("rank 0: receive cancelled\n");
} else {
printf("rank 0: OLD MESSAGE RECEIVED!!!\n");
}
}
if (rank == 0) printf("freeing communicator %d\n", my_comm);
MPI_Comm_free(&my_comm);
MPI_Finalize();
return 0;
}
outputs:
created communicator -2080374784
rank 0: sending abort message
rank 1: send succeed; cancelling abort handle
freeing communicator -2080374784
created communicator -2080374784
rank 0: STRAY MESSAGE RECEIVED!!!
freeing communicator -2080374784
As mentioned in one of the comments above by @kraffenetti, this is an erroneous program because the sent messages are not matched by receives. Even though the messages are cancelled, they still need a matching receive on the remote side, because cancelling a send may not succeed if the message was already sent before the cancel could complete (which is the case here).
This question started a discussion on an MPICH ticket, which you can find here; it has more details.
I tried to build your code using Open MPI and it did not work. mpicc complained about status.cancelled:
error: ‘MPI_Status’ has no member named ‘cancelled’
I suppose this is a feature of Intel MPI. What happens if you switch to:
...
int flag;
MPI_Test_cancelled(&status, &flag);
if (flag) {
...
This gives the expected output with Open MPI (and it makes your code less implementation-dependent). Is that also the case with Intel MPI?
We need an expert to tell us what status.cancelled is in Intel MPI, because I don't know anything about it!
Edit: I tested my answer many times and found that the output was random: sometimes correct, sometimes not. Sorry about that... It is as if something in status was not set. Part of the answer may be in the MPI_Wait() documentation, http://www.mpich.org/static/docs/v3.1/www3/MPI_Wait.html:
" The MPI_ERROR field of the status return is only set if the return from the MPI routine is MPI_ERR_IN_STATUS. That error class is only returned by the routines that take an array of status arguments (MPI_Testall, MPI_Testsome, MPI_Waitall, and MPI_Waitsome). In all other cases, the value of the MPI_ERROR field in the status is unchanged. See section 3.2.5 in the MPI-1.1 specification for the exact text. " If MPI_Test_cancelled() makes use of the MPI_ERROR, things might get bad.
So here is the trick: use MPI_Waitall(1, &req, &status)! The output is correct at last!