C++ compilation of rule '//tensorflow/core/kernels:sparse_reduce_op' failed (Exit 1) - tensorflow

I'm trying to build tensorflow from source; it runs for several hours, then the build fails with the following error:
....................................................................................................................................................................................................
ERROR: /home/emadboctor/tensorflow/tensorflow/core/kernels/BUILD:5398:1: C++ compilation of rule '//tensorflow/core/kernels:sparse_reduce_op' failed (Exit 1)
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:104:0,
from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
from ./tensorflow/core/framework/numeric_types.h:20,
from ./tensorflow/core/framework/allocator.h:26,
from ./tensorflow/core/framework/op_kernel.h:24,
from tensorflow/core/kernels/sparse_reduce_op.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes&& ...) [with _Functor = Eigen::internal::TensorExecutor<Expression, Eigen::ThreadPoolDevice, Vectorizable, Tiling>::run(const Expression&, const Eigen::ThreadPoolDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >; bool Vectorizable = true; Eigen::internal::TiledEvaluation Tiling = (Eigen::internal::TiledEvaluation)0u]::<lambda(Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0u>::StorageIndex, Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0u>::StorageIndex)>; _ArgTypes = {long int, long int}]':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h:806:9: internal compiler error: in emit_move_insn, at expr.c:3547
values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/emadboctor/tensorflow/tensorflow/tools/pip_package/BUILD:65:1 C++ compilation of rule '//tensorflow/core/kernels:sparse_reduce_op' failed (Exit 1)
INFO: Elapsed time: 8153.055s, Critical Path: 926.03s
INFO: 8566 processes: 8566 local.
FAILED: Build did NOT complete successfully
System information (GCP instance):
Distributor ID: Debian
Description: Debian GNU/Linux 9.12 (stretch)
Release: 9.12
Codename: stretch
gcc version: gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
g++ version: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
tensorflow version: 2.1
bazel version: 3.1.0
Steps that led to the problem:
apt update && apt install -y \
build-essential \
libc-ares-dev \
libjpeg-dev \
openjdk-8-jdk \
gcc \
g++ \
python3-pip
pip3 install six numpy wheel setuptools mock && \
pip3 install keras_applications --no-deps && \
pip3 install keras_preprocessing --no-deps
sudo apt-get update
sudo apt-get -y upgrade
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
sudo tar -xvf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects/Proj1
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go get github.com/bazelbuild/bazelisk
mkdir -p ~/bin
ln -s $(go env GOPATH)/bin/bazelisk ~/bin/bazel
export PATH=$HOME/bin:$PATH
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure
Configuration:
emadboctor@reg-build-vm:~/tensorflow$ ./configure
You have bazel 3.0.0 installed.
Please specify the location of python. [Default is /usr/bin/python3]: /usr/bin/python3
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
/usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: y
Clang will be downloaded and used to compile tensorflow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: -config=mkl
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
emadboctor@reg-build-vm:~/tensorflow$
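One detail worth flagging in this transcript: -config=mkl was typed at the optimization-flags prompt, which by its own text expects compiler flags such as the default -march=native -Wno-sign-compare. Judging by the preconfigured configs listed just above, MKL is normally requested on the build line instead — a hedged sketch, not a confirmed fix:

# Assumption: accept the default optimization flags in ./configure, then
# enable MKL via the documented build config on the command line
bazel build -c opt --config=mkl //tensorflow/tools/pip_package:build_pip_package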
Then
bazel build -c opt \
--define=grpc_no_ares=true \
--linkopt="-lrt" \
--linkopt="-lm" \
--host_linkopt="-lrt" \
--host_linkopt="-lm" \
--action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" \
--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both \
--copt=-w \
--jobs=26 \
//tensorflow/tools/pip_package:build_pip_package
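An internal compiler error means gcc itself crashed rather than rejecting the source, so the usual remedies are a newer compiler or less aggressive flags. As a hedged experiment (an assumption, not a confirmed fix), drop -mfpmath=both — the least common flag in this command, and one gcc's manual describes as experimental — and retry:

# Sketch: the same build minus the -mfpmath=both copt (assumption: that flag
# is what trips gcc 6.3's emit_move_insn); upgrading gcc/g++ is another option
bazel build -c opt \
  --define=grpc_no_ares=true \
  --linkopt="-lrt" --linkopt="-lm" \
  --host_linkopt="-lrt" --host_linkopt="-lm" \
  --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" \
  --copt=-mavx --copt=-mavx2 --copt=-mfma \
  --copt=-w \
  --jobs=26 \
  //tensorflow/tools/pip_package:build_pip_package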

Related

Location of base directory of OpenMPI for cmake

I am trying to compile Trilinos with MPI capabilities, but in the cmake command I also need to specify the MPI base directory:
cmake \
-DTPL_ENABLE_MPI=ON \
-DMPI_BASE_DIR:FILEPATH="" \
-DTrilinos_ENABLE_PyTrilinos:BOOL=ON \
-DTrilinos_ENABLE_ALL_PACKAGES=ON \
-DTrilinos_ENABLE_TESTS:BOOL=ON \
-DBUILD_SHARED_LIBS:BOOL=ON \
-DCMAKE_INSTALL_PREFIX:STRING="$HOME/trilinos-install" \
$SOURCE_DIR
However, I am unable to find any base directory even though MPI is installed on my machine. When I run commands like mpirun --version, I get:
mpirun (Open MPI) 2.1.1
or ompi_info:
Package: Open MPI buildd@lcy01-amd64-009 Distribution
Open MPI: 2.1.1
Open MPI repo revision: v2.1.0-100-ga2fdb5b
Open MPI release date: May 10, 2017
Open RTE: 2.1.1
...
I am running Ubuntu 18.04 LTS on WSL if that is useful info.
The command "which mpich" can give you the default directory of mpi installation.
Another way is to use mpic++ as the compiler in the CMake.
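A minimal sketch of both suggestions, assuming Open MPI's compiler wrappers (mpicc/mpic++) are on the PATH:

# Derive the base directory from the wrapper's location (on Ubuntu this
# typically resolves to /usr, with the MPI headers under /usr/include)
MPI_BASE_DIR=$(dirname "$(dirname "$(which mpicc)")")
cmake -DTPL_ENABLE_MPI=ON -DMPI_BASE_DIR:FILEPATH="$MPI_BASE_DIR" $SOURCE_DIR
# Alternatively, hand CMake the MPI wrappers as the compilers
cmake -DTPL_ENABLE_MPI=ON \
      -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpic++ \
      $SOURCE_DIR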

How to build a tensorflow op with bazel with additional include directories

I got tensorflow binaries (already compiled)
I have added to tensorflow source:
tensorflow\core\user_ops\icp_op_kernel.cc - contains:
https://github.com/tensorflow/models/blob/master/research/vid2depth/ops/icp_op_kernel.cc
tensorflow\core\user_ops\BUILD - contains:
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
tf_custom_op_library(
    name = "icp_op_kernel.so",
    srcs = ["icp_op_kernel.cc"],
)
I am trying to build with:
bazel build --config opt //tensorflow/core/user_ops:icp_op_kernel.so
And I get:
tensorflow/core/user_ops/icp_op_kernel.cc(16): fatal error C1083: Cannot open include file: 'pcl/point_types.h': No such file or directory
This happens because Bazel doesn't know where the PCL include files are.
I have installed pcl and the include directory is in:
C:\Program Files\PCL 1.6.0\include\pcl-1.6
How do I tell bazel to also include this directory?
Also, I will probably need to add C:\Program Files\PCL 1.6.0\lib to the link step. How do I do that?
You don't need Bazel to build ops if the Bazel build fails.
I have implemented customized ops both in CPU and GPU, and basically follow the two Tensorflow tutorials.
For CPU ops, follow Tensorflow tutorial on Build the op library:
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
Note on gcc version >= 5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. If you compile your op library with gcc >= 5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older ABI.
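For example, the same build line with the ABI flag appended (only needed when compiling with gcc >= 5 against the official pip wheels):

g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC \
  ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2 -D_GLIBCXX_USE_CXX11_ABI=0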
For GPU ops, check the current official GPU ops building instructions on Tensorflow adding GPU op support
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
cuda_op_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]}
As the tutorial says, note that if your CUDA libraries are not installed in /usr/local/lib64, you'll need to specify the path explicitly in the second (g++) command above. For example, add -L /usr/local/cuda-8.0/lib64/ if your CUDA is installed in /usr/local/cuda-8.0.
Also note that in some Linux settings, additional options to the nvcc compile step are needed. Add -D_MWAITXINTRIN_H_INCLUDED to the nvcc command line to avoid errors from mwaitxintrin.h.
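Putting those two notes together, a hedged variant of the GPU-op commands with both adjustments applied (the CUDA 8.0 path is only an example; match it to your install):

nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
  ${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC \
  -D_MWAITXINTRIN_H_INCLUDED
g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
  cuda_op_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]} \
  -L /usr/local/cuda-8.0/lib64/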

How to build tensorflow benchmark_model for android arm64_v8a?

I'm using the following command in the Tensorflow 1.8 folder
bazel build -c opt --cxxopt='--std=c++11' \
//tensorflow/tools/benchmark:benchmark_model \
--crosstool_top=//external:android/crosstool \
--host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
--cpu=arm64-v8a --verbose_failures
It's giving me the error:
ERROR: No default_toolchain found for cpu 'arm64-v8a'. Valid cpus are: [
k8,
local,
armeabi-v7a,
x64_windows,
x64_windows_msvc,
x64_windows_msys,
s390x,
ios_x86_64,
]
INFO: Elapsed time: 0.315s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Ubuntu 16.04, Bazel 0.14.1, Tensorflow 1.8
This is because the Android NDK hasn't been configured in the root WORKSPACE file. Download the Android NDK and add the following rule to the WORKSPACE:
android_ndk_repository(
    name = "androidndk",
    path = "<PATH_TO_NDK>",
)
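A filled-in example for reference (the path is hypothetical and api_level is optional; set both to match your SDK/NDK layout):

android_ndk_repository(
    name = "androidndk",
    path = "/home/me/Android/Sdk/ndk-bundle",  # hypothetical location
    api_level = 21,  # optional: the Android API level to build against
)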
I figured it out. Using bazel 0.10.1, SDK API level 27, NDK 15, Build tools 27.0.3, tensorflow 1.8.
First run
./configure
Then
bazel build --config=monolithic --cxxopt=--std=c++11 //tensorflow/tools/benchmark:benchmark_model --config=android_arm64 --cpu=arm64-v8a --fat_apk_cpu=arm64-v8a
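Once the binary builds, the usual next step is to push it to a device over adb and point it at a frozen graph; a sketch (the graph path is illustrative, not from this answer):

adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/tensorflow_inception_graph.pb
# add whatever input/output layer flags your particular graph requires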

Tensorflow compilation error with latest tensorflow source on MacOS

I am trying to build the tensorflow source on my Mac OS X Yosemite (10.10.5). After I run this command
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
I get this error
C++ compilation of rule '//tensorflow/core:candidate_sampling_ops_op_lib' failed: cc_wrapper.sh failed: error executing command external/local_config_cc/cc_wrapper.sh -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -fcolor-diagnostics -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG ... (remaining 95 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
tensorflow/core/ops/candidate_sampling_ops.cc:392:7: error: return type 'tensorflow::Status' must match previous return type 'const ::tensorflow::Status' when lambda expression has unspecified explicit return type
return Status::OK();
^
tensorflow/core/ops/candidate_sampling_ops.cc:376:17: error: no viable conversion from 'tensorflow::(lambda at tensorflow/core/ops/candidate_sampling_ops.cc:376:17)' to 'tensorflow::Status (*)(shape_inference::InferenceContext *)'
.SetShapeFn([](InferenceContext* c) {
What may I be doing wrong?
(outdated but still relevant for this version of TF)
The latest version of tensorflow is NOT compilable/working for Mac OS X.
Here is my script to get tensorflow working on Mac OS X Sierra: tensorflow 1.0 on mac-osx sierra i7 no gpu. I'm still working on getting SSE and such to compile correctly, and on a later version of tensorflow - but whatever. Tensorflow is not friendly with Macs - but DL4J is!
UPDATE:
You shouldn't need to update from Yosemite. I was able to get r1.3 to compile with SSE and AVX! So the 'latest release' at the time of writing has known issues - r1.3 is the latest stable build. I've included the script for a proper build below; see http://www.josephmiguel.com/building-tensorflow-1-3-from-source-on-mac-osx-sierra-macbook-pro-i7-with-sse-and-avx/ for all the details on the matter.
one time install
install anaconda3 pkg # manually download this and install the package
conda update conda
conda create -n dl python=3.6 anaconda
source activate dl
cd /
brew install bazel
pip install six numpy wheel
pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/protobuf-3.1.0-cp35-none-macosx_10_11_x86_64.whl
sudo -i
cd /
rm -rf tensorflow # if rerunning the script
cd /
git clone https://github.com/tensorflow/tensorflow
Step 1
cd /tensorflow
git checkout r1.3 -f
cd /
chmod -R 777 tensorflow
cd /tensorflow
./configure # accept all default settings
Step 2
// https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions
bazel build --config=opt --copt=-mavx --copt=-mavx2 --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
Step 3
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-1.0.1-cp36-cp36m-macosx_10_7_x86_64.whl
Step 4
cd ~
ipython
Step 5
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
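If the install worked, the expected output of Step 5 (a bytes literal under the Python 3.6 environment created above) is:

b'Hello, TensorFlow!'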
Step 6
pip uninstall tensorflow

TensorFlow on Nvidia TX1

Anyone gotten tensorflow working on the Nvidia Tegra X1?
I've found a few sources indicating it's possible on the TK1 or with significant hacking/errors on the TX1 but no definitive recipe yet.
http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html
https://github.com/tensorflow/tensorflow/issues/851
I am using a JetPack 2.3 install but haven't gotten it working yet - any tips most appreciated.
Got TensorFlow R0.9 working on TX1 with Bazel 0.2.1, CUDA 8.0, CUDNN 5.1, L4T 24.2, and a fresh JetPack 2.3 install. I've tested it with basic MLP, Conv, and LSTM nets using BN, Sigmoid, ReLU, etc., with no errors yet. I removed sparse_matmul_op, though otherwise I believe the compilation is fully operational. Many of these steps come directly from MaxCuda's excellent blog, so huge thanks to them for providing.
I plan to continue hammering on R0.10/R0.11 (the gRPC binary is blocking Bazel 0.3.0 right now), but until then I figured I'd post the R0.9 formula. As below:
First get java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Install some other deps
sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev maven swig
Need to build protobuf 3.0.0-beta-2 jar yourself
git clone https://github.com/google/protobuf.git
cd protobuf
# autogen.sh downloads broken gmock.zip in d5fb408d
git checkout master
./autogen.sh
git checkout d5fb408d
./configure --prefix=/usr
make -j 4
sudo make install
cd java
mvn package
Get bazel. We want version 0.2.1; it doesn't require the gRPC binary, unlike 0.3.0, which I can't build yet (maybe soon!)
git clone https://github.com/bazelbuild/bazel.git
cd bazel
git checkout 0.2.1
cp /usr/bin/protoc third_party/protobuf/protoc-linux-arm32.exe
cp ../protobuf/java/target/protobuf-java-3.0.0-beta-2.jar third_party/protobuf/protobuf-java-3.0.0-beta-1.jar
Need to edit a bazel file to recognize aarch64 as ARM
--- a/src/main/java/com/google/devtools/build/lib/util/CPU.java
+++ b/src/main/java/com/google/devtools/build/lib/util/CPU.java
@@ -25,7 +25,7 @@ import java.util.Set;
public enum CPU {
X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
- ARM("arm", ImmutableSet.of("arm", "armv7l")),
+ ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
UNKNOWN("unknown", ImmutableSet.<String>of());
Now compile
./compile.sh
And install
sudo cp output/bazel /usr/local/bin
Get tensorflow R0.9. Higher than R0.9 requires Bazel 0.3.0 which I haven't figured out how to build yet due to gRPC issues.
git clone -b r0.9 https://github.com/tensorflow/tensorflow.git
Build once. It will fail, but now you have the bazel .cache dir, where you can place updated config.guess & config.sub files that will figure out what architecture you're running:
./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
cd ~
wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
# below are commands I ran, yours will vary depending on .cache details. `find` is your friend
cp config.guess ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.guess
cp config.sub ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.sub
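Because the hash in the .cache path is machine-specific, a quick way to locate the copies to overwrite (a sketch using the find hint above):

find ~/.cache/bazel -name config.guess
find ~/.cache/bazel -name config.sub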
sparse_matmul_op had a couple of errors; I took the cowardly route and removed it from the build:
--- a/tensorflow/core/kernels/BUILD
+++ b/tensorflow/core/kernels/BUILD
@@ -985,7 +985,7 @@ tf_kernel_libraries(
"reduction_ops",
"segment_reduction_ops",
"sequence_ops",
- "sparse_matmul_op",
+ #DC "sparse_matmul_op",
],
deps = [
":bounds_check",
--- a/tensorflow/python/BUILD
+++ b/tensorflow/python/BUILD
@@ -1110,7 +1110,7 @@ medium_kernel_test_list = glob([
"kernel_tests/seq2seq_test.py",
"kernel_tests/slice_op_test.py",
"kernel_tests/sparse_ops_test.py",
- "kernel_tests/sparse_matmul_op_test.py",
+ #DC "kernel_tests/sparse_matmul_op_test.py",
"kernel_tests/sparse_tensor_dense_matmul_op_test.py",
])
TX1 can't do fancy constructors in cwise_op_gpu_select.cu.cc
--- a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
+++ b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
@@ -43,8 +43,14 @@ struct BatchSelectFunctor<GPUDevice, T> {
const int all_but_batch = then_flat_outer_dims.dimension(1);
#if !defined(EIGEN_HAS_INDEX_LIST)
- Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
- Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+ //DC Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
+ Eigen::array<int, 2> broadcast_dims;
+ broadcast_dims[0] = 1;
+ broadcast_dims[1] = all_but_batch;
+ //DC Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+ Eigen::Tensor<int, 2>::Dimensions reshape_dims;
+ reshape_dims[0] = batch;
+ reshape_dims[1] = 1;
#else
Eigen::IndexList<Eigen::type2index<1>, int> broadcast_dims;
broadcast_dims.set(1, all_but_batch);
Same in sparse_tensor_dense_matmul_op_gpu.cu.cc
--- a/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
@@ -104,9 +104,17 @@ struct SparseTensorDenseMatMulFunctor<GPUDevice, T, ADJ_A, ADJ_B> {
int n = (ADJ_B) ? b.dimension(0) : b.dimension(1);
#if !defined(EIGEN_HAS_INDEX_LIST)
- Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
- Eigen::array<int, 2> n_by_1{{ n, 1 }};
- Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+ //DC Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
+ Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz;
+ matrix_1_by_nnz[0] = 1;
+ matrix_1_by_nnz[1] = nnz;
+ //DC Eigen::array<int, 2> n_by_1{{ n, 1 }};
+ Eigen::array<int, 2> n_by_1;
+ n_by_1[0] = n;
+ n_by_1[1] = 1;
+ //DC Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+ Eigen::array<int, 1> reduce_on_rows;
+ reduce_on_rows[0] = 0;
#else
Eigen::IndexList<Eigen::type2index<1>, int> matrix_1_by_nnz;
matrix_1_by_nnz.set(1, nnz);
Running with CUDA 8.0 requires new macros for FP16. Many thanks to Kashif/Mrry for pointing out the fix!
--- a/tensorflow/stream_executor/cuda/cuda_blas.cc
+++ b/tensorflow/stream_executor/cuda/cuda_blas.cc
@@ -25,6 +25,12 @@ limitations under the License.
#define EIGEN_HAS_CUDA_FP16
#endif
+#if CUDA_VERSION >= 8000
+#define SE_CUDA_DATA_HALF CUDA_R_16F
+#else
+#define SE_CUDA_DATA_HALF CUBLAS_DATA_HALF
+#endif
+
#include "tensorflow/stream_executor/cuda/cuda_blas.h"
#include <dlfcn.h>
@@ -1680,10 +1686,10 @@ bool CUDABlas::DoBlasGemm(
return DoBlasInternal(
dynload::cublasSgemmEx, stream, true /* = pointer_mode_host */,
CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k, &alpha,
- CUDAMemory(a), CUBLAS_DATA_HALF, lda,
- CUDAMemory(b), CUBLAS_DATA_HALF, ldb,
+ CUDAMemory(a), SE_CUDA_DATA_HALF, lda,
+ CUDAMemory(b), SE_CUDA_DATA_HALF, ldb,
&beta,
- CUDAMemoryMutable(c), CUBLAS_DATA_HALF, ldc);
+ CUDAMemoryMutable(c), SE_CUDA_DATA_HALF, ldc);
#else
LOG(ERROR) << "fp16 sgemm is not implemented in this cuBLAS version "
<< "(need at least CUDA 7.5)";
And lastly, ARM has no NUMA nodes, so this needs to be added or you will get an immediate crash on starting tf.Session():
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -888,6 +888,9 @@ CudaContext* CUDAExecutor::cuda_context() { return context_; }
// For anything more complicated/prod-focused than this, you'll likely want to
// turn to gsys' topology modeling.
static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+ // DC - make this clever later. ARM has no NUMA node, just return 0
+ LOG(INFO) << "ARM has no NUMA node, hardcoding to return zero";
+ return 0;
#if defined(__APPLE__)
LOG(INFO) << "OS X does not support NUMA - returning NUMA node zero";
return 0;
After these changes, build and install! Hope this is useful to some folks.
Follow Dwight's answer, but also create a swap file of at least 6 GB.
Following Dwight Crow's answer, but with an 8 GB swap file and the following command, I successfully built TensorFlow 0.9 on the Jetson TX1 from a fresh install of JetPack 2.3:
bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
I used the default settings for TensorFlow's ./configure script except to enable GPU support.
My build took at least 6 hours. It'll be faster if you use an SSD instead of a USB drive.
Creating a swap file
# Create a swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s
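Once the build completes, the swap file can be released and deleted; a small cleanup sketch, assuming the swapfile from the script above:

# Stop swapping to the file and reclaim the disk space
sudo swapoff swapfile
rm swapfile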
I used this USB drive to store my swap file.
The most memory I saw my system use was 7.7 GB (3.8 GB on Mem and 3.9 GB on Swap). The most swap memory I saw used at once was 4.4 GB. I used free -h to view memory usage.
Creating the pip package and installing
Adapted from the TensorFlow docs:
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
Acknowledgements
Thanks to Dwight Crow (guide), elirex (bazel option values and free -h), tylerfox (swap file idea and local_resources option), everyone that helped them, and everyone in the GitHub issue thread.
The swap file script was adapted from JetsonHacks' gist.
Errors I received when not using a swap file
To help search engines find this answer.
Error: unexpected EOF from Bazel server.
gcc: internal compiler error: Killed (program cc1plus)