How can nvcc compile with multi gen-code? - gpu

Can nvcc be used like this, with multiple -gencode options?
nvcc vec_add.cu -o vec_add -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_70,code=sm_70
Does this command line mean that the final target "vec_add" supports platforms with compute capability 2.0, 3.5, and 7.0?
But maybe there is a conflict: for example, compute capabilities below 5.2 don't support half precision, like the function
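As a minimal sketch of the option syntax: each -gencode pair asks nvcc to embed machine code for one architecture in the same binary, and the pairs are separated by spaces only. The helper name gencode_flags is hypothetical and not part of nvcc. Which architectures are accepted depends on the installed toolkit version, so the list below is only an illustration.

```shell
#!/bin/sh
# gencode_flags CC...: print one "-gencode arch=compute_XX,code=sm_XX" pair
# per compute capability given (hypothetical helper, not an nvcc feature).
gencode_flags() {
    flags=""
    for cc in "$@"; do
        flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Usage (requires a toolkit that still supports every listed architecture):
#   nvcc vec_add.cu -o vec_add $(gencode_flags 35 70)
```

Device code that relies on a feature missing from one of the listed architectures can usually be guarded with the __CUDA_ARCH__ macro, so that each per-architecture compilation only sees code it supports.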

Related

Compile errors running the ot-br-posix ./script/setup on RPi4

I'm trying to run the ./script/setup, but get compile errors:
Note that only 65 steps are listed below because I restarted the setup script; the initial run had closer to 465 steps.
[1/65] Building CXX object src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o
FAILED: src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o
/usr/bin/c++ -DHAVE_LIBSYSTEMD=1 -DOTBR_ENABLE_BACKBONE_ROUTER=1 -DOTBR_ENABLE_BORDER_AGENT=1 -DOTBR_ENABLE_BORDER_ROUTING=1 -DOTBR_ENABLE_BORDER_ROUTING_COUNTERS=1 -DOTBR_ENABLE_DBUS_SERVER=1 -DOTBR_ENABLE_DNSSD_DISCOVERY_PROXY=1 -DOTBR_ENABLE_NAT64=1 -DOTBR_ENABLE_NOTIFY_UPSTART=1 -DOTBR_ENABLE_REST_SERVER=1 -DOTBR_ENABLE_SRP_ADVERTISING_PROXY=1 -DOTBR_ENABLE_SRP_SERVER_AUTO_ENABLE_MODE=1 -DOTBR_ENABLE_VENDOR_INFRA_LINK_SELECT=0 -DOTBR_MESHCOP_SERVICE_INSTANCE_NAME="\"OpenThread BorderRouter\"" -DOTBR_PACKAGE_NAME=\"OpenThread_BorderRouter\" -DOTBR_PACKAGE_VERSION=\"0.3.0-0cdef3c\" -DOTBR_PRODUCT_NAME=\"BorderRouter\" -DOTBR_SYSLOG_FACILITY_ID=LOG_USER -DOTBR_VENDOR_NAME=\"OpenThread\" -I../../include -I../../src -Ithird_party/openthread/repo/etc/cmake -I../../third_party/openthread/repo/etc/cmake -I../../third_party/openthread/repo/include -I../../third_party/openthread/repo/src/posix/platform/include -I../../third_party/openthread/repo/src -Wall -Wextra -Werror -Wfatal-errors -Wuninitialized -Wno-missing-braces -std=c++11 -MD -MT src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o -MF src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o.d -o src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o -c ../../src/common/mainloop.cpp
In file included from /usr/include/c++/8/list:63,
from ../../src/common/mainloop_manager.hpp:41,
from ../../src/common/mainloop.cpp:30:
/usr/include/c++/8/bits/stl_list.h:811:19: error: expected ‘)’ before ‘&’ token
list(_InputIterat&... __args)`
compilation terminated due to -Wfatal-errors.
I receive a lot more errors, but they follow the same pattern as above.
I have followed the guide from openthread.io to set up an OpenThread Border Router.
The execution of the bootstrap script ran smoothly.
Additional information:
Git local repository path: ~/src/openthread/ot-br-posix
Command for executing the setup script:
pi@raspberrypi:~/src/openthread/ot-br-posix$> INFRA_IF_NAME=eth0 ./script/setup
RPi OS: Recommended image from the guide Raspberry Pi OS lite
Libgcc versions:
libgcc-8-dev/oldstable,now 8.3.0-6+rpi1 armhf [installed,automatic]
libgcc1/oldstable,now 1:8.3.0-6+rpi1 armhf [installed]
Cmake versions:
cmake-data/oldstable,now 3.16.3-3~bpo10+1 all [installed,automatic]
cmake/oldstable,now 3.16.3-3~bpo10+1 armhf [installed]

nvcc fatal : Unknown option '--threads'

I have installed CUDA-11.3 and NVIDIA Driver Version 465, CMAKE version 3.16.3.
I was trying to compile the samples included in the toolkit to verify the installation, but I am getting the following error.
make[1]: Entering directory '/home/user/NVIDIA_CUDA-11.3_Samples/0_Simple/simpleSeparateCompilation'
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc -m64 -dc --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o simpleDeviceLibrary.o -c simpleDeviceLibrary.cu
nvcc fatal : Unknown option '--threads'
make[1]: *** [Makefile:321: simpleDeviceLibrary.o] Error 1
make[1]: Leaving directory '/home/user/NVIDIA_CUDA-11.3_Samples/0_Simple/simpleSeparateCompilation'
make: *** [Makefile:51: 0_Simple/simpleSeparateCompilation/Makefile.ph_build] Error 2
The --threads option (which controls the number of threads which nvcc will attempt to spawn during compilation) was only added in nvcc 11.2.
The OP was unwittingly using nvcc 11.1 to try and compile samples from the CUDA 11.3 toolkit using the supplied Makefiles, which include this option. This resulted in an unrecognized option error and build failure.
If you have this problem, double check that your compiler is new enough to accept this compiler option and that search paths/symlinks/modulefiles etc. are really updated to point to that compiler version.
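The check described above could be sketched as follows. The helper names version_ge and nvcc_release are hypothetical, and the required version of 11.3 is an assumption (a toolkit at least as new as the samples being built). The sketch relies on `sort -V` from GNU coreutils and on the "release X.Y" text that `nvcc --version` prints.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B (hypothetical helper).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# nvcc_release: read `nvcc --version` output on stdin and print "X.Y".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

need="11.3"   # assumption: at least as new as the samples being built

# Check both the nvcc found on PATH and the path used by the samples' Makefile.
for candidate in nvcc /usr/local/cuda/bin/nvcc; do
    if command -v "$candidate" >/dev/null 2>&1; then
        found=$("$candidate" --version | nvcc_release)
        echo "$(command -v "$candidate"): release $found"
        version_ge "$found" "$need" || echo "  older than $need: check PATH and symlinks"
    else
        echo "$candidate: not found"
    fi
done
```

If the two candidates report different releases, a stale symlink or PATH entry is the likely cause.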

sse2 instruction set not enabled

CC=g++
CFLAGS=-O3 -c -Wall
DFLAGS=-g -Wall
LDFLAGS= -lz -lm -lpthread
KSWSOURCE=ksw.c
ALGNSOURCES=main.cpp aligner.cpp graph.cpp readfl.cpp hash.cpp form.cpp btree.cpp conLSH.cpp
INDSOURCES=whash.cpp genhash.cpp formh.cpp conLSH.cpp
INDOBJECTS=$(INDSOURCES:.cpp=.o) $(KSWSOURCE:.c=.o)
ALGNOBJECTS=$(ALGNSOURCES:.cpp=.o) $(KSWSOURCE:.c=.o)
INDEXER=conLSH-indexer
ALIGNER=conLSH-aligner
all: $(INDSOURCES) $(ALGNSOURCES) $(KSWSOURCE) $(ALIGNER) $(INDEXER)
$(ALIGNER): $(ALGNOBJECTS)
	$(CC) $(ALGNOBJECTS) -o $@ $(LDFLAGS)
$(INDEXER): $(INDOBJECTS)
	$(CC) $(INDOBJECTS) readfl.o -o $@ $(LDFLAGS)
debug:
	$(CC) $(DFLAGS) $(ALGNSOURCES) $(KSWSOURCE) $(LDFLAGS)
.cpp.o:
	$(CC) $(CFLAGS) $< -o $@
.c.o:
	$(CC) $(CFLAGS) $< -o $@
clean:
	rm -rf *.o $(ALIGNER) $(INDEXER) a.out
I have the above makefile but I am getting an error
/usr/lib/gcc/i686-linux-gnu/4.8/include/emmintrin.h:31:3: error: #error "SSE2 instruction set not enabled"
# error "SSE2 instruction set not enabled"
From what I understand and googled this is a flag for parallel computation.
I tried from other posts with the same problem to either include:
CXXFLAGS=-03 -c Wall -mfpmath=sse
OR:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse -msse2 -msse3")
but without any success. Can you help?
I am not sure setting CXXFLAGS is the right fix, because many (probably cascading) errors are shown in ksw.c, such as:
ksw.c:49:2: error: ‘__m128i’ does not name a type
__m128i *qp, *H0, *H1, *E, *Hmax;
-msse2 is the specific option, so passing that to GCC will work, if you get your build scripts set up to actually do that. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#x86-Options
Or better, use -march=native to enable everything your CPU has, if you're building for local use, not for distributing a binary that might have to work on an old-but-not-ancient CPU. (Of course, if you care about performance, it's weird to be building for 32-bit mode. SSE2 is baseline for x86-64. Unless your CPU is too old to support SSE2, e.g. a Pentium III. Or for example, there are embedded x86 CPUs without SSE, like AMD Geode. In that case, a binary built (successfully) with -msse2 will probably crash with an illegal instruction on such a CPU.)
-mfpmath=sse just tells GCC to use SSE for scalar FP math assuming that SSE is available; unrelated to telling GCC to assume the target CPU does support SSE2. It can be good to use it as well for performance, but it's not going to matter in getting your code to compile.
And yes, SSE1/2 intrinsic types like __m128i will only get defined when SSE is enabled, so error: ‘__m128i’ does not name a type is a clear sign that -msse2 wasn't enabled.
If using autoconf or something, maybe use this:
./configure CPPFLAGS="-O3 -march=native -fno-math-errno"
If you have .c files as well as .cpp, set CFLAGS as well as CPPFLAGS. More options like -flto can be helpful for optimization (cross-file inlining at link time), if you get those added to your LD options. As well as any other optimization options like -ffast-math if you want to use it. Or at least -fno-trapping-math helps some, and GCC already did optimizations that violated the semantics trapping-math was supposed to provide. See this Q&A re: -fno-trapping-math -fno-math-errno being safe to use basically everywhere, even in code that depends on strict FP like Kahan summation.
This worked for me also:
./configure CPPFLAGS="-march=native"
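For a plain Makefile like the one in the question, which has no configure script, a rough sketch is to override CFLAGS on the make command line, since both the .c.o and .cpp.o rules use $(CFLAGS). The helper cpu_has_sse2 is hypothetical and assumes a Linux-style cpuinfo file; it only checks that the build machine's CPU reports SSE2 before enabling it.

```shell
#!/bin/sh
# cpu_has_sse2 [FILE]: succeed if the first "flags" line of a Linux cpuinfo
# file lists sse2. FILE defaults to /proc/cpuinfo (hypothetical helper).
cpu_has_sse2() {
    grep -m 1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw sse2
}

# Usage with the Makefile from the question; a variable given on the make
# command line overrides the CFLAGS assignment inside the Makefile:
#   cpu_has_sse2 && make CFLAGS="-O3 -c -Wall -msse2"
```

Editing the CFLAGS line in the Makefile to include -msse2 would have the same effect and persists across builds.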

How to build a TensorFlow op with Bazel with additional include directories

I got tensorflow binaries (already compiled)
I have added to tensorflow source:
tensorflow\core\user_ops\icp_op_kernel.cc - contains:
https://github.com/tensorflow/models/blob/master/research/vid2depth/ops/icp_op_kernel.cc
tensorflow\core\user_ops\BUILD - contains:
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
tf_custom_op_library(
name = "icp_op_kernel.so",
srcs = ["icp_op_kernel.cc"],
)
I am trying to build with:
bazel build --config opt //tensorflow/core/user_ops:icp_op_kernel.so
And I get:
tensorflow/core/user_ops/icp_op_kernel.cc(16): fatal error C1083: Cannot open include file: 'pcl/point_types.h': No such file or directory
This is because Bazel doesn't know where the PCL include files are.
I have installed pcl and the include directory is in:
C:\Program Files\PCL 1.6.0\include\pcl-1.6
How do I tell bazel to also include this directory?
I will probably also need to add C:\Program Files\PCL 1.6.0\lib to the link step. How do I do that?
You don't need Bazel to build ops if it fails.
I have implemented custom ops for both CPU and GPU, basically by following the two TensorFlow tutorials.
For CPU ops, follow Tensorflow tutorial on Build the op library:
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi.
For GPU ops, check the current official GPU ops building instructions on Tensorflow adding GPU op support
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
cuda_op_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]}
As the tutorial says, if your CUDA libraries are not installed in /usr/local/lib64, you'll need to specify the path explicitly in the second (g++) command above. For example, add -L /usr/local/cuda-8.0/lib64/ if your CUDA is installed in /usr/local/cuda-8.0.
Also note that on some Linux setups, additional options are needed for the nvcc compile step. Add -D_MWAITXINTRIN_H_INCLUDED to the nvcc command line to avoid errors from mwaitxintrin.h.

Compile openjdk 7 on arm ubuntu

I am trying to compile openjdk 7 on my arm ubuntu:
make all ALLOW_DOWNLOADS=true DISABLE_HOTSPOT_OS_VERSION_CHECK=ok
Then I received this error:
g++ -DLINUX -D_GNU_SOURCE -DIA32 -I/home/darklord/Develop/jdk7/hotspot/src/share/vm/prims -I/home/darklord/Develop/jdk7/hotspot/src/share/vm -I/home/darklord/Develop/jdk7/hotspot/src/cpu/x86/vm -I/home/darklord/Develop/jdk7/hotspot/src/os_cpu/linux_x86/vm -I/home/darklord/Develop/jdk7/hotspot/src/os/linux/vm -I/home/darklord/Develop/jdk7/hotspot/src/os/posix/vm -I/home/darklord/Develop/jdk7/hotspot/src/share/vm/adlc -I../generated -DASSERT -DTARGET_OS_FAMILY_linux -DTARGET_ARCH_x86 -DTARGET_ARCH_MODEL_x86_32 -DTARGET_OS_ARCH_linux_x86 -DTARGET_OS_ARCH_MODEL_linux_x86_32 -DTARGET_COMPILER_gcc -DCOMPILER2 -DCOMPILER1 -fno-rtti -fno-exceptions -D_REENTRANT -fcheck-new -fvisibility=hidden -m32 -march=i586 -pipe -Werror -g -c -o ../generated/adfiles/adlparse.o /home/darklord/Develop/jdk7/hotspot/src/share/vm/adlc/adlparse.cpp
g++: error: unrecognized argument in option '-march=i586'
It seems it is trying to compile using an x86 configuration. How can I make the build pass on an ARM machine?
You have to specify the proper architecture option for g++. Reference here.
-march=name
This specifies the name of the target ARM architecture. GCC uses this name to determine what kind of instructions it can emit when generating assembly code. This option can be used in conjunction with or instead of the -mcpu= option. Permissible names are: armv2, armv2a, armv3, armv3m, armv4, armv4t, armv5, armv5t, armv5te, armv6, armv6j, iwmmxt, ep9312.
Make sure you refer to the documentation for your version of GCC.
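The failing command in the question carries -DIA32, -m32 and -march=i586, which are all x86 settings. Before choosing an ARM -march value, it may help to confirm what the machine and compiler report. This is a rough sketch; arch_family is a hypothetical helper, and its mapping covers only the common `uname -m` strings.

```shell
#!/bin/sh
# arch_family NAME: map a `uname -m` string to a coarse CPU family
# (hypothetical helper; only common names are covered).
arch_family() {
    case "$1" in
        arm*|aarch64) echo arm ;;
        i?86|x86_64)  echo x86 ;;
        *)            echo unknown ;;
    esac
}

host=$(uname -m)
echo "host machine: $host (family: $(arch_family "$host"))"

# gcc -dumpmachine prints the target triplet the compiler was built for.
if command -v gcc >/dev/null 2>&1; then
    echo "gcc target: $(gcc -dumpmachine)"
fi
```

If the family is arm while the build still passes x86 options, the build configuration rather than the compiler is selecting the wrong target.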