How to compile GPU offloading code with icc

I compiled GPU offloading code with GCC 11 on a CentOS machine and executed it on an NVIDIA GPU with the nvptx-none offload target.
I have icc 18 on my machine. How can I compile my GCC offloading code with icc, targeting NVIDIA GPUs?
What are the compiler options to do it?
Thanks in advance

icc 18 (the classic Intel compiler) does not support OpenMP offload to NVIDIA GPUs; you can try icx (the Intel oneAPI C/C++ compiler) or other compilers instead:
Link to the documentation: https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html#top_GUID-B965754E-4380-4B04-9377-8170E5030762

Related

A program that supports different types of GPU

If I want to develop a program that supports different types of GPU, what should I do? I know that OpenCL can do this, but how does OpenCL do it?
OpenCL is the best choice for vendor-independent GPU programming.
The way it works is runtime compilation: you compile C++ to an executable, and this executable contains the OpenCL C source code as a text string. When you run the executable on the CPU, it compiles the OpenCL C code specifically for the installed GPU. The OpenCL compiler is provided as part of the graphics driver. This way, you can copy the executable to another computer, where the contained OpenCL C code can be compiled for a different GPU, all without having to recompile the executable.
This is similar to how Java works.
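A sketch of the host side of that pattern in C (error handling mostly omitted, and an installed OpenCL driver is assumed, so treat this as illustrative rather than a complete program):

```c
#include <stdio.h>
#include <CL/cl.h>

/* The OpenCL C kernel ships inside the executable as a plain string;
 * the GPU driver compiles it at run time for whatever GPU is present. */
static const char *kernel_src =
    "__kernel void add(__global const float *a,\n"
    "                  __global const float *b,\n"
    "                  __global float *c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL platform found\n");
        return 1;
    }
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    /* Runtime compilation: the driver's compiler builds the source
     * string for the locally installed GPU. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src,
                                                NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel add = clCreateKernel(prog, "add", NULL);

    /* From here: create buffers, set kernel args, enqueue the kernel,
     * and read results back (omitted in this sketch). */
    clReleaseKernel(add);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```

The key point is that `clCreateProgramWithSource` takes the kernel as a string and `clBuildProgram` invokes the driver's compiler, so the same executable works on GPUs from different vendors.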

__cxa_atexit could not be located in the dynamic link library autodock_gpu_128wi.exe

I am building this GitHub repository for GPU acceleration in AutoDock from source. I have the NVIDIA development toolkit and can run the make command without issues (after modifying the makefile to specify locations for the CUDA dependencies). However, when I try to run the .exe file that it creates, it gives me this error: __cxa_atexit could not be located in the dynamic link library autodock_gpu_128wi.exe. What could be causing this issue? I think it should be compatible.
Machine:
OS: Windows 10
CPU: i7 9750H
GPU: GTX 1650
__cxa_atexit is a C++ ABI (cxxabi) function provided by glibc.
You need to check whether the linker is passed -lc; add the -Wl,-v argument to the compiler to see what the linker is doing.
If the linker cannot find libc for some reason, specify the path to glibc's libc.so.6 explicitly, or just reinstall glibc.

MXNET CMake reports "GPU support is disabled"

I am trying to build MXNet from source on a GPU-enabled machine. I have an NVIDIA P100 GPU. When I build MXNet, CMake reports that GPU support is disabled.
The snippet from the CMake log is given below:
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- GPU support is disabled
I want to build it with GPU support. What can I do?
When building MXNet from source, the message "GPU support is disabled" in the CMake output is a bit confusing. This specific message has nothing to do with CUDA. It is actually generated if you try to enable Intel MKL-DNN support (-DUSE_MKLDNN=1) and the Intel OpenCL runtime is not installed on the system (see the CMake file in which it is printed). CMake looks for the Intel OpenCL runtime and prints this message if it is not found. I believe the OpenCL runtime enables GPU support for Intel's GPUs (Intel HD Graphics, Intel Iris, and Intel Iris Pro).
Don't worry about this message if you do not intend to use Intel graphics support; it won't affect CUDA support for MXNet. If you want to suppress it, disable MKL-DNN support (-DUSE_MKLDNN=0). If you do care about Intel graphics, install the Intel OpenCL runtime.
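As an illustration, a hypothetical configure line that keeps CUDA enabled and silences the message by turning MKL-DNN off (exact flag names can vary between MXNet versions, so check the project's own CMake options):

```shell
# From a build directory inside the MXNet source tree
cmake -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_MKLDNN=0 ..
```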

I'd like to control the way TensorFlow Lite uses the GPU; what can I study for that?

First, let me explain what I have to do.
My development environment is Tizen OS. You may be unfamiliar with it; anyway, this OS is based on the Linux kernel with a Red Hat lineage and targets mobile, TV, etc. My target device consists of an Exynos 5422 and an ARM Mali-T628.
My main task is to implement a GPU library so that TensorFlow Lite's operations can use it.
I built and installed TensorFlow Lite as an RPM package.
I have googled many times about TensorFlow and GPUs, and only found information about CUDA, which is useless for my case (Tizen and a Mali GPU).
I think Linux must have GPU instructions or a library, like it does for the CPU, but I can't find them.
Can you suggest search keywords or documents?
You can go to NVIDIA's CUDA Toolkit page, where you can find the documentation and training options.
There is also the CUDA Programming Guide, which I myself find very useful and helpful for CUDA.
I believe one or two of those may help you.
CUDA is for NVIDIA GPUs. Mali is ARM's, not NVIDIA's, so you CANNOT use CUDA on your given hardware. Besides, if you want CUDA, you'd better drop TensorFlow Lite and use TensorFlow.
If you want to use CUDA, get hardware with a supported NVIDIA GPU (e.g., an x64 machine with an NVIDIA GPU). Note that you can use Tensorflow-GPU and CUDA/cuDNN on Tizen with x64 + NVIDIA GPU; you just need to be careful about the NVIDIA GPU kernel driver version and userspace driver version. Because NVIDIA's GPU userspace driver and CUDA/cuDNN are statically built, their Linux drivers are compatible with Tizen. (I've tested tensorflow-gpu and CUDA/cuDNN on Tizen with NVIDIA driver version 111..., probably in winter 2017.)
If you want to use Tizen/TensorFlow Lite on the given hardware, forget CUDA.

Distributed compilation for Tensorflow

I am trying to use distcc to speed up the TensorFlow compilation. My distcc installations (both on host and client) work correctly with traditional source packages that use Makefiles for compilation (make -j), but apparently Bazel does not use distcc for compiling the TensorFlow sources.
Note that the target platform is a Raspberry Pi 3 running Raspbian Jessie (Debian-based), so it is important to offload the compilation to a more powerful machine.
Do you have any experience or suggestions?
Yes, we are using Bazel with distcc. You have to write a CROSSTOOL file and then create a wrapper script for gcc that prepends distcc and forwards the arguments to the real gcc.
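A minimal sketch of such a wrapper script (the script name and the path to the real gcc are assumptions; you would point the CROSSTOOL's compiler entry at this wrapper):

```shell
# Create a wrapper that routes every compile through distcc.
cat > gcc-distcc.sh <<'EOF'
#!/bin/sh
# Forward all arguments to the real gcc, prefixed with distcc.
exec distcc /usr/bin/gcc "$@"
EOF
chmod +x gcc-distcc.sh
```

Bazel then invokes the wrapper as its C/C++ compiler, and distcc distributes the individual compile jobs to the configured hosts.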