I'm trying to build tensorflow from source, it keeps running for several hours then the build fails and I get the following error:
....................................................................................................................................................................................................
ERROR: /home/emadboctor/tensorflow/tensorflow/core/kernels/BUILD:5398:1: C++ compilation of rule '//tensorflow/core/kernels:sparse_reduce_op' failed (Exit 1)
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:104:0,
from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
from ./tensorflow/core/framework/numeric_types.h:20,
from ./tensorflow/core/framework/allocator.h:26,
from ./tensorflow/core/framework/op_kernel.h:24,
from tensorflow/core/kernels/sparse_reduce_op.cc:20:
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&, _ArgTypes&& ...) [with _Functor = Eigen::internal::TensorExecutor<Expression, Eigen::ThreadPoolDevice, Vectorizable, Tiling>::run(const Expression&, const Eigen::ThreadPoolDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >; bool Vectorizable = true; Eigen::internal::TiledEvaluation Tiling = (Eigen::internal::TiledEvaluation)0u]::<lambda(Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0u>::StorageIndex, Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::TensorFixedSize<std::complex<float>, Eigen::Sizes<>, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::DimensionList<long int, 1ul>, const Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 1, 1, long int>, 0, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)0u>::StorageIndex)>; _ArgTypes = {long int, long int}]':
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h:806:9: internal compiler error: in emit_move_insn, at expr.c:3547
values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/emadboctor/tensorflow/tensorflow/tools/pip_package/BUILD:65:1 C++ compilation of rule '//tensorflow/core/kernels:sparse_reduce_op' failed (Exit 1)
INFO: Elapsed time: 8153.055s, Critical Path: 926.03s
INFO: 8566 processes: 8566 local.
FAILED: Build did NOT complete successfully
System information(GCP instance):
Distributor ID: Debian
Description: Debian GNU/Linux 9.12 (stretch)
Release: 9.12
Codename: stretch
gcc version: gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
g++ version: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
tensorflow version: 2.1
bazel version: 3.1.0
Steps that led to the problem:
apt update && apt install -y \
build-essential \
libc-ares-dev \
libjpeg-dev \
openjdk-8-jdk \
gcc \
g++ \
python3-pip \
pip3 install six numpy wheel setuptools mock && \
pip3 install keras_applications --no-deps && \
pip3 install keras_preprocessing --no-deps
sudo apt-get update
sudo apt-get -y upgrade
wget https://dl.google.com/go/go1.13.3.linux-amd64.tar.gz
sudo tar -xvf go1.13.3.linux-amd64.tar.gz
sudo mv go /usr/local
export GOROOT=/usr/local/go
export GOPATH=$HOME/Projects/Proj1
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
go get github.com/bazelbuild/bazelisk
mkdir -p ~/bin
ln -s $(go env GOPATH)/bin/bazelisk ~/bin/bazel
export PATH=$HOME/bin:$PATH
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure
Configuration:
emadboctor#reg-build-vm:~/tensorflow$ ./configure
You have bazel 3.0.0 installed.
Please specify the location of python. [Default is /usr/bin/python3]: /usr/bin/python3
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
/usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: y
Clang will be downloaded and used to compile tensorflow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: -config=mkl
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=ngraph # Build with Intel nGraph support.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
emadboctor#reg-build-vm:~/tensorflow$
Then
bazel build -c opt \
--define=grpc_no_ares=true \
--linkopt="-lrt" \
--linkopt="-lm" \
--host_linkopt="-lrt" \
--host_linkopt="-lm" \
--action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" \
--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both \
--copt=-w \
--jobs=26 \
//tensorflow/tools/pip_package:build_pip_package
I got tensorflow binaries (already compiled)
I have added to tensorflow source:
tensorflow\core\user_ops\icp_op_kernel.cc - contains:
https://github.com/tensorflow/models/blob/master/research/vid2depth/ops/icp_op_kernel.cc
tensorflow\core\user_ops\BUILD - contains:
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
tf_custom_op_library(
name = "icp_op_kernel.so",
srcs = ["icp_op_kernel.cc"],
)
I am trying to build with:
bazel build --config opt //tensorflow/core/user_ops:icp_op_kernel.so
And I get:
tensorflow/core/user_ops/icp_op_kernel.cc(16): fatal error C1083: Cannot open include file: 'pcl/point_types.h': No such file or directory
Because bazel don't know where the pcl include files are.
I have installed pcl and the include directory is in:
C:\Program Files\PCL 1.6.0\include\pcl-1.6
How do I tell bazel to also include this directory?
Also I will probably need to add C:\Program Files\PCL 1.6.0\lib to the link, How do I do that?
You don't need bazel for building ops if it fails.
I have implemented customized ops both in CPU and GPU, and basically follow the two Tensorflow tutorials.
For CPU ops, follow Tensorflow tutorial on Build the op library:
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC ${TF_CFLAGS[#]} ${TF_LFLAGS[#]} -O2
Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi.
For GPU ops, check the current official GPU ops building instructions on Tensorflow adding GPU op support
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
${TF_CFLAGS[#]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
cuda_op_kernel.cu.o ${TF_CFLAGS[#]} -fPIC -lcudart ${TF_LFLAGS[#]}
As it says, Note that if your CUDA libraries are not installed in /usr/local/lib64, you'll need to specify the path explicitly in the second (g++) command above. For example, add -L /usr/local/cuda-8.0/lib64/ if your CUDA is installed in /usr/local/cuda-8.0.
Also, Note in some linux settings, additional options to nvcc compiling step are needed. Add -D_MWAITXINTRIN_H_INCLUDED to the nvcc command line to avoid errors from mwaitxintrin.h.
I'm trying to convert my *.pb tensorflow model to coreML. I'm getting stuck on identifying my output node of my model.
In order to obtain my output node, I've attempted to build and run summarize_graph on my *.pb file, but running into issues. How do I build and run summarize_graph after downloading the source?
I've run the following command:
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
and I get the following error:
INFO: Analysed 0 targets (0 packages loaded). INFO: Found 0 targets...
INFO: Elapsed time: 0.389s, Critical Path: 0.01s INFO: Build completed
successfully, 1 total action
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph: No such
file or directory
After issuing the bazel command, a blank bazel-bin directory appears in the location I executed the command.
Note, summarize_graph didn't exist in my tensorflow installation. So I downloaded the source from github tensorflow/tools/graph_transforms and copied it into my tensorflow/tools/graph_transforms directory.
the directory contains the following:
BUILD README.md
init.py
init.pyc add_default_attributes.cc add_default_attributes_test.cc backports.cc backports_test.cc compare_graphs.cc
fake_quantize_training.cc fake_quantize_training_test.cc file_utils.cc
file_utils.h file_utils_test.cc flatten_atrous.cc
flatten_atrous_test.cc fold_batch_norms.cc fold_batch_norms_test.cc
fold_constants_lib.cc fold_constants_lib.h fold_constants_test.cc
fold_old_batch_norms.cc fold_old_batch_norms_test.cc
freeze_requantization_ranges.cc freeze_requantization_ranges_test.cc
fuse_convolutions.cc fuse_convolutions_test.cc insert_logging.cc
insert_logging_test.cc obfuscate_names.cc obfuscate_names_test.cc out
python quantize_nodes.cc quantize_nodes_test.cc quantize_weights.cc
quantize_weights_test.cc remove_attribute.cc remove_attribute_test.cc
remove_device.cc remove_device_test.cc remove_ema.cc
remove_ema_test.cc remove_nodes.cc remove_nodes_test.cc
rename_attribute.cc rename_attribute_test.cc rename_op.cc
rename_op_test.cc round_weights.cc round_weights_test.cc set_device.cc
set_device_test.cc sort_by_execution_order.cc
sort_by_execution_order_test.cc sparsify_gather.cc
sparsify_gather_test.cc strip_unused_nodes.cc
strip_unused_nodes_test.cc summarize_graph_main.cc transform_graph.cc
transform_graph.h transform_graph_main.cc transform_graph_test.cc
transform_utils.cc transform_utils.h transform_utils_test.cc
I'm on a macbook pro
Thanks!
In case anyone is running into the similar problem, I solved it.
Navigate to the root of the tensorflow source directory
cmd> ./configure
cmd> bazel build tensorflow/tools/graph_transforms:summarize_graph
(you may get an error about xcode; if so, run the following)
cmd> xcode-select -s /Applications/Xcode.app/Contents/Developer
cmd> bazel clean --expunge
cmd> bazel build tensorflow/tools/graph_transforms:summarize_graph
CentOS 7 walkthrough:
yum install epel-release
yum update
yum install patch
curl https://copr.fedorainfracloud.org/coprs/vbatts/bazel/repo/epel-7/vbatts-bazel-epel-7.repo -o /etc/yum.repos.d/vbatts-bazel-epel-7.repo
yum install bazel
curl -L -O https://github.com/tensorflow/tensorflow/archive/v1.8.0.tar.gz
cd tensorflow-1.8.0
./configure # interactive!
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph
I'm building TensorFlow with Bazel using bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer as instructed to by the TensorFlow 'installing from sources' instructions.
I get the following error:
ERROR: /home/ubuntu/tensorflow/tensorflow/stream_executor/BUILD:5:1: C++ compilation of rule '//tensorflow/stream_executor:stream_e
xecutor' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool
_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-fr
ee-nonheap-object ... (remaining 87 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exite
d with status 1.
tensorflow/stream_executor/cuda/cuda_dnn.cc: In function 'cudnnConvolutionFwdAlgo_t perftools::gputools::cuda::{anonymous}::ToConvF
orwardAlgo(perftools::gputools::dnn::AlgorithmType)':
tensorflow/stream_executor/cuda/cuda_dnn.cc:269:10: error: 'CUDNN_CONVOLUTION_FWD_ALGO_FFT' was not declared in this scope
case CUDNN_CONVOLUTION_FWD_ALGO_FFT:
...
Stack: EC2 g2.8xlarge machine running Ubuntu 14.04.2. Bazel version 0.1.5 (installed w/ bazel-0.1.5-jdk7-installer-linux-x86_64.sh).
I've tried Bazel 0.1.4 and 0.2.3 and I get the same error.
I had the same issue building tensorflow in Ubuntu 16.04.
First of all ensure that you are using gcc version <= 4.8
In my case I had to install it doing:
For gcc
sudo apt-get install gcc-4.8
sudo update-alternatives --remove-all gcc
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 10
For g++
sudo apt-get install g++-4.8
sudo update-alternatives --remove-all g++
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 10
Once having the right version of gcc and g++, I had to edit the CROSSTOOL file as follows:
gedit tensorflow_sources_folder/third_party/gpus/crosstool/CROSSTOOL
Search every ocurrence of this specific line:
tool_path { name: "gcc" path: "clang/bin/crosstool_wrapper_driver_is_not_gcc" }
And insert the following line exactly above it:
cxx_flag: "-D_FORCE_INLINES"
So the result must be:
cxx_flag: "-D_FORCE_INLINES"
tool_path { name: "gcc" path: "clang/bin/crosstool_wrapper_driver_is_not_gcc" }
I'm trying to install TensorFlow serving on OSX El Capitan using Docker but keep running into an error. Here is the tutorial I'm following:
https://tensorflow.github.io/serving/docker.html
Here is the command causing the error:
bazel test tensorflow_serving/...
Here's the error I'm getting:
for (int i = 0; i < suffix.size(); ++i) {
^
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/tf/tensorflow/core/kernels/BUILD:212:1: C++ compilation of rule '#tf//tensorflow/core/kernels:mirror_pad_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -iquote external/tf -iquote ... (remaining 65 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
Solved! Looks like the issues was with running out of memory in the VM.
Here's how I fixed it:
1) When creating the machine, make sure it has more memory (mine was only 1GB). Here is how you create a docker machine with 4GB:
docker-machine create -d virtualbox --virtualbox-memory 4096 default
2) When running the bazel command pass in a parameter limiting the amount of memory to use. Here I'm running the command using only 2GB:
bazel build -c opt --copt=-mavx --verbose_failures --local_resources 2048,2.0,1.0 -j 1 //tensorflow_serving/example:mnist_export
Where the original command was:
bazel build //tensorflow_serving/example:mnist_export