The OpenMPI 4.0.5 build on our cluster has CUDA support, but I want to benchmark pnetcdf, which does not need CUDA. I want to do a number of test runs that I can start on about a quarter of a node, and my tests won't use the GPUs, so I wanted to ask whether there is a way to suppress the MPI check for CUDA devices. When I simply obtain a SLURM allocation without GPUs, I get lots of errors from that alone.
These errors come from hwloc and can be suppressed with HWLOC_HIDE_ERRORS=1, but I'd like to know if there is a more specific method.
Steps to reproduce:
frontend$ salloc -n 16 -t 8:00:00 -A k20200
node$ exec bash -l
node$ module load gcc openmpi
node$ mpicc -o /tmp/hello ~/usr/src/helloworld_mpi.c
node$ srun -n 1 /tmp/hello
CUDA: Failed to get number of devices with cudaGetDeviceCount(): no CUDA-capable device is detected
Hello world!, I'm rank 0 of 1!
node$ HWLOC_HIDE_ERRORS=1 srun -n 1 /tmp/hello
Hello world!, I'm rank 0 of 1!
node$ logout
The example code used above is the following, but any program that does not use CUDA would serve equally well:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define xmpi(rc) \
do { \
int err = (rc); \
if (err != MPI_SUCCESS) { \
char msg[MPI_MAX_ERROR_STRING + 1]; \
int msg_len; \
\
if (MPI_Error_string(err, msg, &msg_len) \
== MPI_SUCCESS){ \
msg[msg_len] = '\0'; \
\
fprintf(stderr, \
"Problem in MPI call: %d = %s\n", \
err, msg); \
MPI_Abort(MPI_COMM_WORLD, 1); \
} \
} \
} while (0)
int
main(int argc, char *argv[])
{
xmpi(MPI_Init(&argc, &argv));
int rank, size;
xmpi(MPI_Comm_rank(MPI_COMM_WORLD, &rank));
xmpi(MPI_Comm_size(MPI_COMM_WORLD, &size));
printf("Hello world!, I'm rank %d of %d!\n", rank, size);
xmpi(MPI_Finalize());
return EXIT_SUCCESS;
}
valgrind-3.15.0
I have a weird issue with valgrind when I use switches --trace-children, --trace-children-skip and --log-file along with popen().
My code:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
FILE *fp = NULL;
fp = popen("ls -l", "r");
pclose(fp);
fp = popen("ls -l", "r");
pclose(fp);
fp = popen("ls -l", "r");
pclose(fp);
fp = popen("ls -l", "r");
pclose(fp);
fp = popen("ls -l", "r");
pclose(fp);
return EXIT_SUCCESS;
}
Command 1: valgrind --leak-check=full --track-origins=yes --trace-children=yes --trace-children-skip=*/sh,*/ls ./test
When I run this command I get only one process output in STDOUT. That's what I expect because I'm specifying with the --trace-children-skip to ignore what I'm doing with popen().
Command 2: valgrind --leak-check=full --track-origins=yes --trace-children=yes --trace-children-skip=*/sh,*/ls --log-file=logs/%p ./test
When I run this command I get 6 log files. One for the main process and one for each of the popen() calls. valgrind reports the command as ./test. This is not what I'd expect. I'd expect the same as above, only one process log.
Running without --trace-children-skip I'd get 11 files; one for the main process and two for each of the popen() calls (sh and ls as the commands). This is what I'd expect since I'm not skipping anything and popen() calls sh which then calls ls.
I'm not sure what the deal is here. --trace-children-skip with --log-file is working in that it's not showing the sh and ls logs, but it's still creating a process for each popen() call which doesn't happen if I don't use --log-file. Am I missing something?
I know this is a trivial question, but I am having difficulties running the m5ops in gem5.
Let's take for example the m5-exit.c file that gem5 provides in its test programs: how would I compile it and link it against the file m5op_x86.S?
Currently this is the way I am compiling and linking it:
gcc m5-exit.c -I ~/Desktop/gem5_86/gem5/include -o test ~/Desktop/gem5_86/gem5/util/m5/m5op_x86.S
The error I get:
/tmp/ccXsGX3d.o: relocation R_X86_64_16 against undefined symbol `M5OP_ARM' can not be used when making a PIE object; recompile with -fPIC
The directory I am in is:
gem5/tests/test-progs/m5-exit/src
The code for m5-exit.c is from the gem5 directory found here
This is a copy of: How to use m5 in gem5-20 which was deleted on my other answer, since my previous DRY link-only answer was removed followed by an unsuccessful (although correct, but not enough users who care) dupe close attempt.
On gem5 046645a4db646ec30cc36b0f5433114e8777dc44 I can do:
scons -C util/m5 build/x86/out/m5
gcc -static -I include -o main.out main.c util/m5/build/x86/out/libm5.a
with:
main.c
#include <gem5/m5ops.h>
int main(void) {
m5_exit(0);
}
Or for ARM:
sudo apt install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
scons -C util/m5 build/aarch64/out/m5
aarch64-linux-gnu-gcc -static -I include -o main.out main.c \
util/m5/build/aarch64/out/libm5.a
But in practice, I often just don't have the patience for this business, so I just misbehave and add raw assembly directly as shown here muahahaha e.g.:
#if defined(__x86_64__)
#define LKMC_M5OPS_CHECKPOINT __asm__ __volatile__ (".word 0x040F; .word 0x0043;" : : "D" (0), "S" (0) :)
#define LKMC_M5OPS_DUMPSTATS __asm__ __volatile__ (".word 0x040F; .word 0x0041;" : : "D" (0), "S" (0) :)
#elif defined(__aarch64__)
#define LKMC_M5OPS_CHECKPOINT __asm__ __volatile__ ("mov x0, 0; mov x1, 0; .inst 0xFF000110 | (0x43 << 16);" : : : "x0", "x1")
#define LKMC_M5OPS_DUMPSTATS __asm__ __volatile__ ("mov x0, 0; mov x1, 0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1")
#endif
More general m5op information can also be found at: What are pseudo-instructions for in gem5?
Tested on Ubuntu 20.04.
There are two programs, one in Fortran and one in C++, that do the same simple array manipulation; the C++ one is tried with several different array declarations. Please let me know how to improve the C++ code to get efficiency similar to the Fortran code. The run times in seconds are summarized below.
The Fortran program:
! fort.f90
PARAMETER ( N=1000000, M=10000 )
REAL*8, ALLOCATABLE :: D(:)
ALLOCATE(D(N))
A=1.0
DO J=1,M
DO I=1,N
D(I)=A+I+J
ENDDO
ENDDO
END
The C++ program:
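// main.cpp
#include <cstddef>        // size_t
#include <valarray>       // for the valarray variant
#include <vector>         // for the vector variant
#include <blitz/array.h>  // for the Blitz C++ variant
using namespace std;
using namespace blitz;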
int main(int argc, char* argv[]){
int N=1000000, M=10000;
double* D=new double[N];
//Array<double,1> D(N); // for Blitz C++
//vector<double>D(N);
//valarray<double>D(N);
const double a=1.0;
size_t i,j;
for (j=0; j<M; j++)
for (i=0; i<N; i++)
D[i]=a+i+j;
return 0;
}
g++ main.cpp -o main && time main
g++ main.cpp -o main -Ofast && time main
f95 fort.f90 -o fort && time fort
f95 fort.f90 -o fort -Ofast && time fort
Here are the timings:
1) double* D=new double[N]; g++: 58,1s, g++ -Ofast: 16,413s
2) valarray<double> D(N); about the same
3) vector<double> D(N); about the same
4) Blitz C++ Array<double,1> D(N); g++: 3m19,017s, g++ -Ofast: 15,142s
5) ALLOCATE(D(N)); f95: 42,092s, f95 -Ofast: 0,002s
I have written my own debug assertion macro.
#define ASSERT_EQUALS(a,b) \
do { \
if ((a)!=(b)) \
{ \
printf(". ASSERT_EQUALS (%s:%d) %d!=%d\n",__FUNCTION__,__LINE__,a,b); \
} \
} while (0)
However, it's only compatible with integer types. Is there a way I can change this so I can support float/double types as well?
Thanks.
Maybe you should just print them as floats.
#define ASSERT_EQUALS(a, b) \
do { \
if ((a)!=(b)) { \
printf(". ASSERT_EQUALS (%s:%d) %f!=%f\n",__FUNCTION__,__LINE__,(float)(a),(float)(b)); \
} \
} while (0)
It looks bad with integers, for example 1 will show up as 1.000000, but it will work for both types.
Print them as strings; this works for all types, and it prints the actual text of the two expressions rather than their values:
#define ASSERT_EQUALS(a,b) \
do { \
if ((a)!=(b)) \
{ \
printf(". ASSERT_EQUALS (%s:%d) %s!=%s\n",__FUNCTION__,__LINE__, #a,#b); \
} \
} while (0)
Use a variadic macro and vprintf. Look at the assert.h in your C library.
I have now whittled this down to a minimal test case. So far I have determined that the issue is related to the pseudo-terminals that come with piping ssh. Adding '-t -t' to the ssh call improved things, in that it now takes a second call to fgets() to cause the issue. I suspect that the stderr output of the ssh command somehow plays into the issue; for now I have redirected stderr to stdout in the ssh command being executed. I do wonder whether the "tcgetattr: Invalid argument" error is part of the problem, but I am not sure how to get rid of it. It seems to come from the -t -t being present. I believe the -t -t is a move in the right direction, but perhaps I have to set up the pseudo-terminal for stderr somehow for the test to work properly?
The Makefile:
test:
gcc -g -DBUILD_MACHINE='"$(shell hostname)"' -c -o test.o test.c
gcc -g -o test test.o
.PHONY: clean
clean:
rm -rf test.o test
The test.c source file:
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int
main(int argc, char *argv[])
{
const unsigned int bufSize = 32;
char buf1[bufSize];
char buf2[bufSize];
int ssh = argv[1][0] == 'y';
const char *cmd = ssh ? "ssh -t -t " BUILD_MACHINE " \"ls\" 2>&1" : "ls";
FILE *fPtr = popen(cmd, "r");
if (fPtr == NULL) {
fprintf(stderr,"Unable to spawn command.\n");
perror("popen(3)");
exit(1);
}
printf("Command: %s\n", cmd);
if (feof(fPtr) == 0 && fgets(buf2, bufSize, fPtr) != NULL) {
printf("First result: %s\n", buf2);
if (feof(fPtr) == 0 && fgets(buf2, bufSize, fPtr) != NULL) {
printf("Second result: %s\n", buf2);
int nRead = read(fileno(stdin), buf1, bufSize);
if (nRead == 0) {
printf("???? popen() of ssh consumed the beginning of stdin ????\n");
} else if (nRead > 0) {
if (strncmp("The quick brown fox jumped", buf1, 26) != 0) {
printf("??? Failed ???\n");
} else {
printf("!!!!!!! Without ssh popen() did not consume stdin !!!!!!!\n");
}
}
}
}
}
This shows it running the passing way:
> echo "The quick brown fox jumped" | ./test n
Command: ls
First result: ARCH.linux_26_i86
Second result: Makefile
!!!!!!! Without ssh popen() did not consume stdin !!!!!!!
This shows it running the failing way:
> echo "The quick brown fox jumped" | ./test y
Command: ssh -t -t hostname "ls" 2>&1
First result: tcgetattr: Invalid argument
Second result: %backup%~ gmon.out
???? popen() of ssh consumed the beginning of stdin ????
Okay, I have got this working finally. The secret was to supply /dev/null as the input to my ssh command, changing the test case above as follows:
const char *cmd
= ssh ? "ssh -t -t " BUILD_MACHINE " \"ls\" 2>&1 < /dev/null" : "ls";
However, while the code works correctly, I get a nasty message which apparently I can ignore for my purposes (although I'd like to make the message go away):
tcgetattr: Inappropriate ioctl for device