I am interested in the RDMA support in TensorFlow 1.15, which lets workers and parameter servers communicate directly without going through the CPU. I do not have InfiniBand verbs devices, but I can build TensorFlow from source with verbs support:
bazel build --config=opt --config=cuda --config=verbs //tensorflow/tools/pip_package:build_pip_package
after sudo yum install libibverbs-devel on CentOS 7. However, after pip-installing the built package via
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg && pip install /tmp/tensorflow_pkg/tensorflow-1.15.0-cp36-cp36m-linux_x86_64.whl,
my training failed with the following error:
F tensorflow/contrib/verbs/rdma.cc:127] Check failed: dev_list No InfiniBand device found
This is expected, since I do not have InfiniBand hardware on my machine. But do I really need InfiniBand if my job runs not cross-machine, but on a single machine? I just want to test whether RDMA can significantly speed up parameter server-based training. Thanks.
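The check that fails here can be reproduced outside TensorFlow: the verbs runtime simply enumerates InfiniBand devices, and an empty list triggers the CHECK in rdma.cc. A minimal sketch, assuming the standard Linux sysfs layout that libibverbs itself reads (`list_ib_devices` is a made-up helper name):

```python
import os

def list_ib_devices(sysfs_root="/sys/class/infiniband"):
    """Return the names of verbs-capable devices visible via sysfs.

    This roughly mirrors what ibv_get_device_list() enumerates; an empty
    list corresponds to the 'No InfiniBand device found' check failure.
    """
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(os.listdir(sysfs_root))

devices = list_ib_devices()
print(devices or "no verbs devices (the CHECK in rdma.cc would fail)")
```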
But do I really need Infiniband if my job is run not cross-machine, but on a single machine?
No, and it seems you misunderstand what RDMA actually is. RDMA (exposed for GPUs as "GPUDirect RDMA") is a way for third-party devices like network interfaces and storage adapters to write directly to a GPU's memory across the PCI bus. It is intended to improve multi-node performance in things like compute clusters. It has nothing to do with multi-GPU operation within a single node ("peer-to-peer"), where GPUs attached to the same node can directly access one another's memory without a trip through the host CPU.
Related
I have an unknown Linux machine and I need to check which GPU it uses (more specifically, whether it uses an AMD GPU). To learn about the CPU I have used cat /proc/cpuinfo. Is there something similar for GPUs?
If clinfo is available, it will give you a list of OpenCL-capable compute devices, including GPUs. You are out of luck if the GPUs do not support OpenCL or the drivers are not installed. There is no generic way to get a list of all kinds of GPU devices. On some platforms you can at least get a list of discrete GPUs from lspci output, but you will miss integrated and non-PCI GPUs that way.
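The lspci approach can be scripted: filter its output down to the device classes lspci uses for graphics hardware and look at the vendor string. A sketch (`find_gpus` and the sample line are illustrative, not a real machine's output):

```python
def find_gpus(lspci_output):
    """Pick GPU-ish lines out of `lspci` output and flag the vendor.

    'VGA compatible controller', '3D controller' and 'Display controller'
    are the class strings lspci prints for graphics devices.
    """
    gpu_classes = ("VGA compatible controller", "3D controller",
                   "Display controller")
    gpus = []
    for line in lspci_output.splitlines():
        if any(cls in line for cls in gpu_classes):
            vendor = "AMD" if ("AMD" in line or "ATI" in line) else "other"
            gpus.append((vendor, line.strip()))
    return gpus

# Hypothetical sample line, for illustration only:
sample = ("03:00.0 VGA compatible controller: "
          "Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere")
print(find_gpus(sample))
```

In practice you would feed it the real output, e.g. `subprocess.run(["lspci"], capture_output=True, text=True).stdout`.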
If you already have an X11 server running on that box, you can always run glxinfo on it. That cannot be done in a headless way, though.
Is there a way to run RAPIDS without a GPU? I usually develop on a small local machine without a GPU, then push my code to a powerful remote server for real use. Things like TensorFlow allow switching between the CPU and GPU depending on if they're available. Can an equivalent thing be done with RAPIDS? Even if it's slow, being able to test things on a machine without a GPU would be extremely helpful.
There isn't a way to use RAPIDS without a GPU, and part of the reason for that is we're following the APIs the community has adopted in CPU packages across Pandas, Numpy, SKLearn, NetworkX, etc. This way it should be as easy as swapping an import statement to get something working on the CPU vs the GPU.
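The import-swap the answer describes is usually written as a try/except around the GPU library. A minimal sketch of that pattern (`pick_backend` is a made-up helper; cudf/pandas is the usual RAPIDS pair):

```python
import importlib

def pick_backend(preferred="cudf", fallback="pandas"):
    """Try the GPU library first, then the CPU equivalent.

    Returns (module_name, module), or (None, None) if neither imports.
    """
    for name in (preferred, fallback):
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None

name, mod = pick_backend()
print("using backend:", name)
```

Because cudf and pandas follow the same API for common operations, code written against `mod` can then run on either machine.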
I want to run TensorFlow on a very standard machine setup (64-bit Windows) and have read that TensorFlow performs better when built from source, since it is then optimised for your system. Why does pip not select the optimal build for your system when installing TensorFlow?
Also, if you did install via pip, is there a way to tell whether the optimal build has been installed, or is the only way of knowing simply remembering how you installed it?
Google has taken the position that it is not reasonable to build TF for every possible instruction set out there. They only release generic builds for Linux, Mac, and Windows. You must build from source if you want all the optimizations for your particular machine's instruction set.
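As for telling after the fact: a generic wheel typically logs a startup warning along the lines of "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA", so if you see that, you are not running an optimal build. A sketch of the underlying comparison, parsing the flags line of /proc/cpuinfo (`supported_features` is a made-up helper):

```python
def supported_features(cpuinfo_text, wanted=("avx", "avx2", "fma")):
    """Return which of the 'wanted' instruction-set extensions appear
    in the flags line of a /proc/cpuinfo dump. A generic pip wheel may
    not be compiled to use all of them."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break
    return [f for f in wanted if f in flags]

# On Linux you would feed it the real file:
# supported_features(open("/proc/cpuinfo").read())
```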
Is it possible to run Tensorboard on a machine without CUDA support?
I'm working at a computation center (via ssh) which has two major clusters:
CPU-Cluster which is a general workhorse without CUDA support (no dedicated GPU)
GPU-Cluster with dedicated GPUs e.g. for running neural networks with tensorflow-gpu.
Access to the GPU cluster is limited to training etc., so I can't afford to run Tensorboard on a machine with CUDA support. Instead, I'd like to run Tensorboard on the CPU cluster.
With the TF bundled Tensorboard I get import errors due to missing CUDA support.
It seems reasonable that the official Tensorboard should have a CPU-only mode. Is this true?
I've also found an unofficial standalone Tensorboard version (github.com/dmlc/tensorboard); does it work without CUDA support?
Solved my problem: just install tensorflow instead of tensorflow-gpu.
This didn't work for me for a while because my virtual environment (conda) hadn't properly removed tensorflow-gpu.
Tensorboard is not limited by whether a machine has a GPU or not.
And as far as I know, what Tensorboard does is parse event pb files and display them on the web. There is no computing involved, so it doesn't need a GPU.
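That matches how event files are written: they are serialized Event protobufs wrapped in TFRecord framing, which Tensorboard just reads back and renders. A rough sketch of that framing (the CRC fields are zeroed/skipped here for brevity; real files use masked CRC32C checksums):

```python
import io
import struct

def write_record(f, payload):
    """Write one record in the TFRecord framing event files use:
    8-byte LE length, 4-byte length CRC, payload, 4-byte payload CRC.
    (CRCs are written as zeros in this sketch.)"""
    f.write(struct.pack("<Q", len(payload)))
    f.write(struct.pack("<I", 0))
    f.write(payload)
    f.write(struct.pack("<I", 0))

def read_records(f):
    """Yield payloads back, skipping the CRC fields."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            return
        (length,) = struct.unpack("<Q", header)
        f.read(4)                 # skip length CRC
        payload = f.read(length)
        f.read(4)                 # skip payload CRC
        yield payload

buf = io.BytesIO()
write_record(buf, b"fake serialized Event proto")
buf.seek(0)
print(list(read_records(buf)))   # -> [b'fake serialized Event proto']
```

Nothing in this loop touches a GPU, which is why a CPU-only tensorflow install is enough for Tensorboard.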
I have a CPU with an integrated GPU. I also have an external GPU that I have been using for ML. What I want is to use the integrated GPU only for display and dedicate the external GPU to NN training (in order to free some memory).
In the BIOS I have set the external GPU as the primary GPU, but also set both to be active, so they are both working. After I boot the system I can plug the monitor into either one and both work.
The problem is that when I plug the monitor into the motherboard (integrated GPU), Theano stops using the external GPU:
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
Is there a way to explicitly point Theano to the external GPU? Here is the relevant part of my .theanorc:
[global]
floatX = float32
device = gpu
I have a similar system to yours. For Linux, installing bumblebee worked.
sudo apt-get install bumblebee-nvidia
(adapt to your distro's package manager)
Then launch python via:
optirun python
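Separately from bumblebee, you can also try pinning Theano to a specific device instead of the generic gpu; with the old theano.sandbox.cuda backend (the one in the error message) devices are numbered gpu0, gpu1, and so on. Which number maps to the external card is something you would have to check on your machine, e.g. with nvidia-smi. A sketch of the .theanorc change, assuming the external card is device 0:

```ini
[global]
floatX = float32
device = gpu0
```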