Distributed compilation for Tensorflow

I am trying to use distcc to speed up the Tensorflow compilation. My distcc installations (on both host and client) work correctly with traditional source packages that use Makefiles (make -j), but bazel apparently does not use distcc when compiling the Tensorflow sources.
Note that the target platform is a Raspberry Pi 3 running Raspbian Jessie (Debian-based), so it is important to offload the compilation to a more powerful machine.
Do you have any experience or suggestions?

Yes, we are using bazel with distcc. You have to write a CROSSTOOL file and then create a wrapper script for gcc that prepends distcc and forwards the arguments to the real gcc.
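A minimal sketch of such a wrapper, written in Python for illustration (a one-line shell script would do the same job). It assumes distcc is on the PATH and that the real compiler lives at /usr/bin/gcc; both are assumptions to adjust for your setup. It builds the command "distcc <real gcc> <original arguments>" and replaces itself with it.

```python
#!/usr/bin/env python3
"""Hypothetical gcc wrapper that sends every compile through distcc.

Assumptions (not from the answer above): distcc is on PATH and the real
compiler lives at /usr/bin/gcc. Adjust both for your toolchain.
"""
import os
import sys

DISTCC = "distcc"          # assumed to be on PATH
REAL_GCC = "/usr/bin/gcc"  # hypothetical absolute path to the real compiler


def build_command(args):
    """Return argv for the exec: distcc, the real gcc, then the original arguments."""
    return [DISTCC, REAL_GCC, *args]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only forward when there are compiler arguments (Bazel always passes some).
    cmd = build_command(sys.argv[1:])
    # Replace this process so Bazel sees distcc's output and exit status directly.
    os.execvp(cmd[0], cmd)
```

The gcc tool_path in the CROSSTOOL file would then point at this (executable) script. Keeping REAL_GCC an absolute path avoids the wrapper calling itself if it is installed under the name gcc. If the Pi is the distcc client, the helper machines also need a compiler of that name that produces ARM code.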

Related

Can the booted kernel version differ from the version against which a module has been built?

I am trying to build and install kernel modules for a network card, from source. The module sources seem very picky in terms of which kernel version they can compile against.
I have managed to build the modules against the LTS kernel headers for my distribution, Arch Linux, which at the moment are linux-lts-headers 5.10.37-1. Does this mean that I need to actually install and boot this exact same kernel version, to use the modules? Or do the modules have some tolerance between the booted version and the version they were compiled against?
I realise this depends on exactly what I'm building, but I'm interested in common practice: do's and don'ts. On a rolling-release distro, for instance, it would be a lot of work to rebuild the module with every minor mainline kernel update, such as the current linux-headers 5.12.3 -> linux-headers 5.12.4. Pointers appreciated.
In general, yes: you have to build a kernel module against the headers of the kernel you are running, which is why you rarely find prebuilt kernel modules distributed anywhere. Common practice is to always have the right kernel headers in your /usr/src.
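To see which kernel a built module targets, you can read its version magic string with `modinfo -F vermagic`; the first field is the kernel release, which can be compared with `uname -r` (Python's platform.release()). A minimal sketch, assuming modinfo is installed; comparing only the release field is a simplification of what the kernel checks when loading a module.

```python
import platform
import subprocess


def module_vermagic(module):
    """Return the vermagic string of a module name or .ko path via modinfo."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def built_for_release(vermagic):
    """The first whitespace-separated field of vermagic is the kernel release."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def matches_running_kernel(vermagic, release=None):
    """True if the module was built for `release` (default: the running kernel)."""
    if release is None:
        release = platform.release()  # same value as `uname -r`
    return built_for_release(vermagic) == release
```

For example, matches_running_kernel(module_vermagic("./mydriver.ko")) (a hypothetical module path) tells you whether the freshly built module was compiled for the kernel you have booted.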

Does PyInstaller include CUDA

I am working on a Python script (Python 3.7.3) that uses tensorflow-gpu (1.14.0), and I used PyInstaller 3.5 to convert it to an executable. I am using CUDA 10.0 and cuDNN 7.6.1, and my graphics card is an NVIDIA GeForce GTX 960M. I recently uninstalled CUDA to test whether the executable still runs, and surprisingly it still runs on the GPU, whereas GPU execution no longer works when I run the Python script directly.
My question is, can this executable be run on systems without the CUDA toolkit but with a CUDA-capable graphics card?
According to this documentation PyInstaller will make and store a private copy of all of the dependent external libraries which Python code relies on when building a single file executable.
Therefore it is safe to assume that your executable runs irrespective of the installation status of the CUDA toolkit because it has a full private copy of the necessary CUDA libraries internally which it uses when the executable is run.
According to GitHub issues in the official repository (here and here, for example), CUDA libraries are usually loaded dynamically at run time rather than at link time, so they are typically not included in the final exe file (or folder); as a result, the exe won't work on a machine without CUDA installed. The solution (see also the linked issues) is to put the DLLs needed to run the exe in its dist folder (if it was generated without the --onefile option) or to install the CUDA runtime on the target machine.
The behaviour you're seeing may be due to your specific TF version loading the libraries differently from what is described above, but it is not the expected behaviour nowadays.
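One way to follow the second answer's suggestion is to bundle the DLLs explicitly with PyInstaller's --add-binary option, which takes "SRC<sep>DEST" with os.pathsep as the separator in PyInstaller 3.x. A minimal sketch that builds such a command line; the DLL names and the CUDA directory are assumptions for a Windows CUDA 10.0 / cuDNN 7 setup, so check which libraries your TensorFlow version actually loads.

```python
import os

# Hypothetical DLL list for a Windows CUDA 10.0 / cuDNN 7 setup; check which
# libraries your TensorFlow build actually loads before relying on it.
CUDA_DLLS = ["cudart64_100.dll", "cublas64_100.dll", "cudnn64_7.dll"]


def add_binary_args(paths, dest=".", sep=os.pathsep):
    """Build PyInstaller --add-binary options copying each file into `dest`.

    PyInstaller 3.x takes "SRC<sep>DEST", where sep is os.pathsep
    (';' on Windows, ':' elsewhere).
    """
    args = []
    for path in paths:
        args += ["--add-binary", f"{path}{sep}{dest}"]
    return args


def pyinstaller_command(script, cuda_bin, dlls=CUDA_DLLS, sep=os.pathsep):
    """Full pyinstaller argv bundling the DLLs found in `cuda_bin`."""
    paths = [os.path.join(cuda_bin, name) for name in dlls]
    return ["pyinstaller", script, *add_binary_args(paths, sep=sep)]
```

The resulting list can be passed to subprocess.run(..., check=True), e.g. with cuda_bin set to the toolkit's bin directory (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin by default, assuming cuDNN was copied there). The target machine still needs an NVIDIA driver; only the toolkit libraries are bundled.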

build tensorflow from source to use SSE3 and SSE4

Whenever I use tensorflow, it displays the message "The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations" and 2 more similar messages.
So I decided to build tensorflow from source to get rid of these messages. I'm using python 3.5 on Debian, and followed the instructions at https://www.tensorflow.org/install/install_sources (CPU only, no GPU).
When configuring the build, it asked whether the build should target the machine it was running on; I selected that, and it added -march=native to the compiler options.
Everything seemed to work, but when I ran python3 to test the build, it still gave the messages about "The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available ..." etc. How do I make the build use the hardware it's running on?
There have been similar questions, and most of the answers to them are wrong. They say it's necessary to specify options like "--copt=-msse4.1 --copt=-msse4.2" in the build; it isn't. With the default option "-march=native", the GNU compiler will use SSE4.1 and SSE4.2 instructions if they are available.
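To check this on your own machine, you can ask GCC which instruction-set macros -march=native defines: `gcc -march=native -dM -E -` prints the predefined macros, and __SSE4_1__ / __SSE4_2__ appear when those extensions are enabled. A minimal sketch, assuming gcc is on the PATH; it only shows what the compiler would enable, not which flags a particular TensorFlow build was compiled with.

```python
import subprocess

# Predefined macros GCC sets when the target supports these instruction sets.
ISA_MACROS = ("__SSE3__", "__SSSE3__", "__SSE4_1__", "__SSE4_2__",
              "__AVX__", "__AVX2__", "__FMA__")


def enabled_isa_macros(predefined):
    """Return the ISA macros defined in `gcc -dM -E` output."""
    found = set()
    for line in predefined.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define" and parts[1] in ISA_MACROS:
            found.add(parts[1])
    return found


def native_isa_macros(compiler="gcc"):
    """Ask the compiler which ISA macros -march=native turns on for this CPU."""
    result = subprocess.run(
        [compiler, "-march=native", "-dM", "-E", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return enabled_isa_macros(result.stdout)
```

Printing sorted(native_isa_macros()) on an SSE4-capable machine should list __SSE4_1__ and __SSE4_2__ without any extra --copt flags.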
The real problem is that if you build tensorflow from source after installing the default build with pip, pip won't replace the old build with the new one. Everything will seem to work, but your old build remains in place in a directory under ~/.local.
The solution is simply to uninstall the old tensorflow with pip ('pip uninstall tensorflow' or 'pip3 uninstall tensorflow'), and then rebuild from source. If you have already done a build, and wondered why nothing seemed to change, you needn't repeat the build but can just execute the last couple of steps (https://www.tensorflow.org/install/install_sources), namely bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg , followed by the pip install.
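Before uninstalling and reinstalling, it can help to confirm which copy Python actually imports. A minimal sketch using only the standard library: importlib.util.find_spec locates a module without importing it, and site.getusersitepackages() returns the per-user site-packages directory under ~/.local that the answer refers to. If the reported file lies there, that old copy is the one in use.

```python
import importlib.util
import os
import site


def module_location(name):
    """Path of the file Python would import for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


def is_in_user_site(path, user_site=None):
    """True if `path` lives under the per-user site-packages (e.g. under ~/.local)."""
    if path is None:
        return False
    user_site = os.path.abspath(user_site or site.getusersitepackages())
    path = os.path.abspath(path)
    return os.path.commonpath([path, user_site]) == user_site
```

For example, is_in_user_site(module_location("tensorflow")) returning True suggests the pip-installed build under ~/.local is still the one being loaded.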

Disable SSE4.1 when compiling TensorFlow

I followed the instructions on TF's website and installed TensorFlow from source. I did not change any configuration; everything is at its default value.
When I run my program (which works fine with the precompiled TensorFlow 0.12 wheel), it gives me the following error:
F tensorflow/core/platform/cpu_feature_guard.cc:86] The TensorFlow library was compiled to use SSE4.1 instructions, but these aren't available on your machine.
By default TensorFlow enables SSE4 support; is there a way to disable it? Thanks for any input.
This line in tensorflow/tensorflow.bzl is responsible for enabling SSE 4.1 instructions in all x86 builds. If you delete that line, the resulting build should work on your machine.
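Before editing tensorflow.bzl, you can confirm that the CPU really lacks SSE4.1. On x86 Linux, /proc/cpuinfo lists CPU features on its flags line, where SSE4.1 appears as sse4_1 (and SSE3 as pni). A minimal sketch, assuming an x86 Linux system; the file does not exist on other platforms.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse41(cpuinfo_text):
    """True if the CPU advertises SSE4.1 (listed as 'sse4_1')."""
    return "sse4_1" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the raw cpuinfo text (Linux only)."""
    with open(path) as f:
        return f.read()
```

If has_sse41(read_cpuinfo()) is False, a build with SSE4.1 enabled will fail with the cpu_feature_guard error above.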

Bazel builds cause issues when I install TensorFlow using pip

The documentation mentions that it is better to install from source and then build a pip package. Why is this recommended over a direct pip install of the wheel file provided on the downloads page (here)?
I tried the direct pip install and then ran some scripts in the inception folder. This results in errors with bazel not finding some of the dependencies. I'm guessing this is related to not building tensorflow from source, but I can't figure out why. Any pointers? Thanks!
Installing from pip is supported; can you provide more details on your OS and the specific errors you saw?
The main reason to build from source is simply performance.
Building and installing from source
The default TensorFlow binaries target the broadest range of hardware to make TensorFlow accessible to everyone. If using CPUs for training or inference, it is recommended to compile TensorFlow with all of the optimizations available for the CPU in use. Speedups for training and inference on CPU are documented below in Comparing compiler optimizations.
To install the most optimized version of TensorFlow, build and install from source. If there is a need to build TensorFlow on a platform that has different hardware than the target, then cross-compile with the highest optimizations for the target platform. The following command is an example of using bazel to compile for a specific platform
ref: https://www.tensorflow.org/performance/performance_guide