Building TensorFlow from source on Ubuntu 14.04 LTS: gcc: internal compiler error: Killed (program cc1plus) - tensorflow

I have successfully built TensorFlow from source under Debian, but at present cannot get it to build on a fresh virtual machine running Ubuntu 14.04 LTS. IIRC, for Debian I first tried g++/gcc 5.2, had to downgrade to g++/gcc 4.9, and it worked. Following the instructions in Installing from sources, installing g++ gives version 4.8, and the build fails with:
gcc: internal compiler error: Killed (program cc1plus)
I have not tried 4.9 yet.
I checked the info on the last Jenkins build but could not find anything listed about the tools and their versions, so I even opened an issue: Build tools and versions listed in Jenkins build log.
What version(s) of g++/gcc are known to work?
What version of g++/gcc do the build machines use?
EDIT
Found this: TensorFlow.org Continuous Integration

The problem is not the g++/gcc version but the number of CPU cores Bazel uses to build TensorFlow. The message "gcc: internal compiler error: Killed (program cc1plus)" typically means the kernel's out-of-memory killer terminated the compiler because too many parallel compile jobs exhausted the available memory.
Running multiple builds on VMware Workstation 7.1 with a fresh install of Ubuntu 14.04 LTS, one CPU core, 2 GB of RAM, a 2 GB swap partition, and a 2 GB swap file, the builds run the fastest. This may not be the best setup, but it is the best one I have found so far that consistently works. If I allow 4 cores via VMware and build with Bazel, it fails. If I limit the resources with the Bazel option --local_resources, the results are:
--local_resources 2048,2.0,1.0   builds successfully: Elapsed time: 11683.908s, Critical Path: 11459.26s
--local_resources 4096,2.0,1.0   builds successfully: Elapsed time: 39765.257s, Critical Path: 39578.52s
--local_resources 4096,1.0,1.0   builds successfully: Elapsed time: 6562.744s, Critical Path: 6443.80s
--local_resources 6144,1.0,1.0   builds successfully: Elapsed time: 2810.509s, Critical Path: 2654.90s
In summary, more memory and fewer CPU cores work best in my environment.
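For reference, here is a sketch of how the flag fits into the full command line; the configure step and the pip-package target are the standard ones from the TensorFlow build instructions and are shown only as an assumption about what is being built:
./configure
# Cap Bazel at roughly 6 GB of RAM, 1 CPU core and full I/O capacity
# (--local_resources takes available RAM in MB, number of CPU cores, and I/O capacity)
bazel build -c opt --local_resources 6144,1.0,1.0 //tensorflow/tools/pip_package:build_pip_package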
TL;DR:
While keeping an eye on the build process I noticed that certain source files take a long time to compile and appear to throttle the overall build. It is as if they compete with other source files for a resource that Bazel does not know about, so Bazel lets the competing files compile at the same time. The more files competing for that unknown resource, the slower the build.

Related

What is an automated option for using x86 npm packages on arm?

Any time I run npm install on an ARM machine (Ubuntu) I am rolling the dice: sometimes a package has no ARM binary, the code does not compile, and I have to replace the package or go spelunking to find out why it did not compile.
Is there a way to use x86 packages on an ARM machine without configuring each one individually?
Ideally, if the package manager could not find an ARM build and compilation failed, it would fall back to running the x86 binary under an emulator.
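For context, the kind of transparent fallback described above is usually wired up outside of npm with user-mode QEMU and binfmt_misc; the package names below are the usual Debian/Ubuntu ones and are an assumption, not something taken from this question:
# Register QEMU user-mode emulation so x86-64 ELF binaries run transparently on ARM
sudo apt-get install qemu-user-static binfmt-support
# After this, an x86-64 binary shipped by an npm package can still execute
# (through emulation, so noticeably slower), without per-package configuration.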

WSL2- $nvidia-smi command not running

I have Ubuntu 18.04 LTS installed inside WSL2 and I was able to use the GPU. I can run
nvidia-smi from the Windows terminal.
However, I get no output when I run nvidia-smi inside WSL2.
The fix is now available in the NVIDIA docs:
cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
chmod ogu+x /usr/bin/nvidia-smi
Source: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations
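To confirm the workaround took effect, a quick check from inside the WSL2 shell (an illustrative session; the exact output depends on your driver):
which nvidia-smi    # should now resolve to /usr/bin/nvidia-smi
nvidia-smi          # should print the driver version, CUDA version and the GPU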
From the known limitations in the NVIDIA documentation:
NVIDIA Management Library (NVML) APIs are not supported. Consequently, nvidia-smi may not be functional in WSL 2.
However, you should be able to run what is described at https://docs.nvidia.com/cuda/wsl-user-guide/index.html#unique_1238660826
EDIT: since this answer was written, nvidia-smi has been supported, starting with driver 465.42.
I am running it without problems on 470.57.02.
When I was installing CUDA 11.7.1 in WSL2, the same error was raised, telling me:
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
I fixed it by updating Windows to build 19044 (21H2).

__cxa_atexit could not be located in the dynamic link library autodock_gpu_128wi.exe

I am building this GitHub repository for GPU acceleration in AutoDock from source. I have the NVIDIA development toolkit and can run the make command without issues (after modifying the makefile to specify the locations of the CUDA dependencies). However, when I try to run the .exe file that it creates, it gives me this error: __cxa_atexit could not be located in the dynamic link library autodock_gpu_128wi.exe. What could be causing this issue? I think it should be compatible.
Machine:
OS: Windows 10
CPU: i7 9750H
GPU: GTX 1650
__cxa_atexit is part of the C++ ABI support (cxxabi) provided by glibc.
You need to check whether the linker is being passed -lc, which you can see by adding the -Wl,-v argument to the compiler.
If libc is not found by the linker for some reason, you need to specify the path to glibc's libc.so.6 explicitly, or just reinstall glibc.
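A minimal way to inspect what the link step actually does, assuming a GCC-style toolchain like the one the repository's makefile invokes (the probe file name is made up for illustration):
# Print the full link command, including which libraries (-l flags) are passed to the linker
echo 'int main() { return 0; }' > probe.cpp
g++ -v probe.cpp -o probe 2>&1 | grep collect2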

Errors when trying to build label_image neural net with bazel

Environment info
Operating System: El Capitan, 10.11.1
I'm doing this tutorial: https://petewarden.com/2016/09/27/tensorflow-for-mobile-poets/
Trying to classify images using TensorFlow in an iOS app.
When I try to build my net using bazel:
bazel build tensorflow/examples/label_image:label_image
I get these errors:
https://gist.github.com/galharth/36b8f6eeb12f847ab120b2642083a732
From the related GitHub issue https://github.com/tensorflow/tensorflow/issues/6487 I think we narrowed it down to a lack of resources on the virtual machine. Bazel tends to get flaky with only 2GB of RAM allocated to it.
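If giving the virtual machine more memory is not an option, throttling Bazel is the other lever, along the same lines as the --local_resources results above; the numbers here are only illustrative:
# Build single-threaded and tell Bazel to assume ~2 GB RAM and one core
bazel build --jobs=1 --local_resources 2048,1.0,1.0 tensorflow/examples/label_image:label_image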

Tensorflow: RC 0.10 3X Slower than 0.9

I am compiling the current master version from source. If I compile using CUDA 7.5 and CUDNN 4.0 I get the following compilation error:
ERROR: /home/rob/tensorflow/tensorflow/contrib/rnn/BUILD:45:1: undeclared inclusion(s) in rule '//tensorflow/contrib/rnn:python/ops/_lstm_ops_gpu':
this rule is missing dependency declarations for the following files included by 'tensorflow/contrib/rnn/kernels/lstm_ops_gpu.cu.cc':
'/usr/local/cuda-7.5/include/cuda_runtime.h'
'/usr/local/cuda-7.5/include/host_config.h'
'/usr/local/cuda-7.5/include/builtin_types.h'
[etc...]
If I compile with CUDNN 5.1, everything compiles and runs but the execution time is roughly 3X longer for a training script I am currently running compared to the same using the 0.9.0 release installed via pip.
I also tried the pip version of 0.10.0rc0 (GPU) and saw the same 3X slowdown vs. version 0.9.0.
I am using Ubuntu 14.04, Python 3.4, and a Tesla K40c GPU. Bazel is version 0.3.1.
What is the cause of the 3X slowdown in 0.10.0rc0, and is there any way to regain the prior performance?
Secondarily, how could I eliminate the build errors when using CUDNN 4?
The relative slowness of 0.10.0rc0 is a confirmed bug that is being addressed. More information and status can be found in this thread.
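On the secondary question, a workaround commonly reported for this kind of "undeclared inclusion(s)" error at the time (an assumption here, not something confirmed in this thread) was to clean Bazel's cache after switching CUDA/cuDNN versions and, if the error persists, to list the CUDA include directory in the GPU crosstool definition:
bazel clean --expunge
# If the error persists, add the CUDA include path to the GPU crosstool
# (directive name as used by Bazel CROSSTOOL files of that era; verify against your checkout):
#   cxx_builtin_include_directory: "/usr/local/cuda-7.5/include"
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package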