I'm currently following the instructions here to build TensorFlow from source using Bazel.
After setting up the configuration and attempting to build it, I get this error:
Cuda Configuration Error: Error reading C:/Program Files/NVIDIA GPU
Computing Toolkit/CUDA/v9.0/include/cudnn.h: java.io.IOException:
ERROR: src/main/native/windows/processes-jni.cc(239):
CreateProcessW("grep" --color=never -A1 -E "#define CUDNN_MAJOR"
"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0/include/cudnn.h"): Das System kann die angegebene Datei nicht finden.
The last phrase translates to "The system cannot find the file specified."
But I'm 100% sure I installed cuDNN 7.1.2 correctly, merging the downloaded bin/include/lib folders with the pre-existing CUDA folders. If I copy and paste the path from the error message, cudnn.h is right there! I also ran everything as administrator, etc.
This problem shows up a few times on Google, where it is linked to a misconfigured Theano setup, but I'm not using Theano.
Why doesn't Bazel find the file when it's definitely there?
EDIT: I would also be very thankful if anyone has a link to a pre-built version of TensorFlow for CUDA compute capability 3.0!
Thank you in advance!
The error indicates that grep.exe can't be found. I tried grep for Windows and MSYS2; both will do.
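For example, a minimal sketch from an MSYS2 bash shell, assuming MSYS2 is installed under C:\msys64 (adjust the path if you use grep for Windows or a different install location):
# run this in the same shell you build from, so Bazel's subprocesses can find grep.exe
export PATH="/c/msys64/usr/bin:$PATH"   # MSYS2 directory that contains grep.exe
grep --version                          # sanity check: grep now resolves
# then re-run the bazel build from this shell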
I am following the official instructions from http://caffe.berkeleyvision.org/install_osx.html
A couple of things are unclear. The instructions say:
"CUDA: Install via the NVIDIA package that includes both CUDA and the bundled driver."
I do not have an NVIDIA GPU (I am on a 2016 MacBook Pro, which has an AMD GPU instead) and plan to use Caffe with CPU only.
a) Do I have to install CUDA in order to install Caffe?
b) Inside Makefile.config there is uncommented code that asks me to set the CUDA directory:
# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
This seems to suggest that I should have CUDA, but it was my impression that I can't have CUDA without an NVIDIA GPU. (My ultimate goal is to install and work with OpenPose from Spyder.)
Why do I need the CUDA lines uncommented when I am specifying that the build will be CPU only?
# CPU-only switch (uncomment to build without GPU support).
CPU_ONLY := 1
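For reference, this is how the relevant switches currently look in my Makefile.config (checked from the terminal; the grep lines are only for inspection):
grep -n "^CPU_ONLY" Makefile.config   # CPU_ONLY := 1
grep -n "CUDA_DIR" Makefile.config    # CUDA_DIR := /usr/local/cuda (left as shipped)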
So far I have left the CUDA lines uncommented as they were and proceeded with the make all Caffe compilation from the terminal. I am encountering the following error and am not sure how to solve it. Has anyone managed to get past this?
LD -o .build_release/lib/libcaffe.so.1.0.0
ld: framework not found vecLib
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [.build_release/lib/libcaffe.so.1.0.0] Error 1
I have tried uninstalling and re-installing the Xcode command line tools, but that did not help. Now I am trying to find the file from which the search for vecLib.framework is made, so that I can edit that file and set the vecLib path correctly. I have found it under the Library/Developer section. Does anyone know which file is used to search for the vecLib path?
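In case it helps, this is roughly how I am searching for the framework from the terminal (the SDK path comes from xcrun, so the exact location may differ between full Xcode and the standalone Command Line Tools):
xcrun --show-sdk-path                                   # SDK the toolchain is using
find "$(xcrun --show-sdk-path)/System/Library/Frameworks" -name vecLib.framework 2>/dev/null
# vecLib normally lives inside Accelerate.framework within that SDK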
Next problem:
LD -o .build_release/lib/libcaffe.so.1.0.0
ld: library not found for -lboost_python3
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [.build_release/lib/libcaffe.so.1.0.0] Error 1
I uncommented this line in Makefile.config, as per the instructions I was following:
PYTHON_LIBRARIES := boost_python3 python3.7m
I am not sure how to resolve the missing -lboost_python3.
I found the boost_python3 directory and can see it, but I don't know how or where I should use it.
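Is something along these lines the right direction? The paths below are just from my Homebrew install and the library name seems to be versioned (e.g. libboost_python37), so these are guesses rather than a known-good configuration:
ls /usr/local/opt/boost-python3/lib        # e.g. libboost_python37.dylib on my machine
# my guess at the matching Makefile.config edits:
#   PYTHON_LIBRARIES := boost_python37 python3.7m
#   LIBRARY_DIRS += $(PYTHON_LIB) /usr/local/opt/boost-python3/lib
make clean && make all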
The objective of my experiment is to build TensorFlow on the Jetson TK1, an ARM-based embedded board. Since pre-built TensorFlow binaries for the ARM architecture are not provided by the official releases, I was forced to build it from source.
To build TensorFlow, we need Bazel, which itself has to be built from source. This is where I got stuck: I am not able to build Bazel at all.
I have referred to various blogs and GitHub projects and tried to follow the instructions that everyone said worked for them:
1) Tensorflow on Raspberry-pi
2) Jetson Hacks building Tensorflow from source
3) Official Documentation
Steps Followed:
$ sudo apt-get install build-essential openjdk-8-jdk python zip
$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
$ unzip -d bazel bazel-0.4.5-dist.zip
$ cd bazel
$ sudo ./compile.sh
Error Log:
ERROR: /build/bazel/src/main/protobuf/BUILD:25:2: Java compilation in rule '//src/main/protobuf:extra_actions_base_java_proto' failed: Worker process sent response with exit code: 1.
java.lang.InternalError: Cannot find requested resource bundle for locale en_US
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:128)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:147)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:140)
at com.sun.tools.javac.util.Log.localize(Log.java:673)
at com.sun.tools.javac.util.Log.printLines(Log.java:485)
at com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:156)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:93)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:87)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:104)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$1.invokeJavac(SimpleJavaLibraryBuilder.java:163)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:52)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:166)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:178)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:90)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:44)
Caused by: java.util.MissingResourceException: Can't find bundle for base name com.google.errorprone.errors, locale en_US
at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:854)
at com.sun.tools.javac.util.JavacMessages.lambda$add$0(JavacMessages.java:106)
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:125)
... 15 more
Target //src:bazel failed to build
INFO: Elapsed time: 291.995s, Critical Path: 258.92s
ERROR: Could not build Bazel
To make sure the error is independent of the architecture, I tried to build Bazel on an x86_64 PC; even there I get the same error. I have seen people open similar issues in the Bazel GitHub group, but none were solved.
Version 0.4.5 is very old. We just released 0.12.0; could you try that one?
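Something like the following, mirroring your steps above but with the newer release (the download URL just follows the same pattern as the 0.4.5 one, and the bootstrapped binary should end up in output/bazel):
$ wget https://github.com/bazelbuild/bazel/releases/download/0.12.0/bazel-0.12.0-dist.zip
$ unzip -d bazel-0.12.0 bazel-0.12.0-dist.zip
$ cd bazel-0.12.0
$ ./compile.sh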
I am trying to build an MKL-accelerated version of TensorFlow using Bazel 0.5.1, gcc 6.2, binutils 2.28, and Anaconda2 Python on Scientific Linux 7.2.
Apparently the system /lib64/libstdc++.so.6 is too old, so I am trying to use a gcc installed in another directory. PATH and LD_LIBRARY_PATH are modified to prepend the corresponding paths (using modules). However, while Bazel has no trouble correctly picking up the executables for gcc, ld, and python, it still tries to load the old system /lib64/libstdc++.so.6. How can I force it to use the one from gcc 6.2? Why doesn't it pick it up from LD_LIBRARY_PATH?
According to Google, many people are having trouble with this, but I could not find a solution that works for me. I had no trouble building TensorFlow under Ubuntu 16.04, which has a sufficiently new gcc in the standard location.
I do:
1) ./configure
The only non-default options I choose are to use MKL and to download MKL.
2) bazel build --config=mkl --copt="-DEIGEN_USE_VML" -s -c opt //tensorflow/tools/pip_package:build_pip_package
.....
example/example_parser_configuration.proto tensorflow/core/protobuf/control_flow.proto tensorflow/core/protobuf/meta_graph.proto tensorflow/core/protobuf/named_tensor.proto tensorflow/core/protobuf/saved_model.proto tensorflow/core/protobuf/tensorflow_server.proto tensorflow/core/util/event.proto tensorflow/core/util/test_log.proto)
ERROR: /scratch/midway2/ivy2/TF_intel/tensorflow/tensorflow/tools/tfprof/BUILD:42:1: null failed: protoc failed: error executing command bazel-out/host/bin/external/protobuf/protoc '--python_out=bazel-out/local-opt/genfiles/' -I. -I. -Iexternal/protobuf/python -Ibazel-out/local-opt/genfiles/external/protobuf/python ... (remaining 5 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by bazel-out/host/bin/external/protobuf/protoc)
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by bazel-out/host/bin/external/protobuf/protoc)
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by bazel-out/host/bin/external/protobuf/protoc)
.....
Thank you,
Igor
Sorry for the slow reply. Bazel by design ignores LD_LIBRARY_PATH when running actions. It doesn't have to ignore it during C++ toolchain detection, but at the moment it does :/ To help you move forward, I would try adding --sysroot= as a linkopt or using Bazel's grte_top flag. Depending on where your libstdc++.so lives, you might need to disable the sandbox. The principled solution would be to write a custom CROSSTOOL that specifies builtin_sysroot or grte_top, but that is not an easy task.
Let me know if I lost you somewhere in that paragraph :)
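A rough sketch of the --sysroot route, purely illustrative: the sysroot path has to be whatever prefix actually contains your gcc 6.2 lib64/libstdc++.so.6, and --spawn_strategy=standalone is the "disable the sandbox" part mentioned above:
bazel build --config=mkl --copt="-DEIGEN_USE_VML" -c opt \
    --linkopt="--sysroot=/opt/gcc/6.2.0" \
    --spawn_strategy=standalone \
    //tensorflow/tools/pip_package:build_pip_package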
I was trying to use SyntaxNet and have finished most of the process. I upgraded the Bazel version to 0.4.3 in case of errors (Ubuntu 16.04, Anaconda Python 2.7).
However, I am having trouble with the ./configure part. I am reading the official instructions on the TensorFlow GitHub:
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
./configure
cd ..
bazel test syntaxnet/... util/utf8/...
# On Mac, run the following:
bazel test --linkopt=-headerpad_max_install_names \
syntaxnet/... util/utf8/...
The following logs will help you understand what's going on with my machine. Thanks for the advice.
Please specify the location of python. [Default is /home/ryan/anaconda2/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/home/ryan
/home/ryan/pynaoqi-python2.7
/home/ryan/anaconda2/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/ryan]
/home/ryan/anaconda2/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.0
Please specify the location where cuDNN 5.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to cuDNN toolkit. Neither of the following two files can be found:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.0
/usr/local/cuda-8.0/libcudnn.so.5.0
.5.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.5
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=120
INFO: Reading options for 'clean' from /home/ryan/git_ryan/models/syntaxnet/tensorflow/tools/bazel.rc:
Inherited 'build' options: --force_python=py2 --host_force_python=py2 --python2_path=/home/ryan/anaconda2/bin/python --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define PYTHON_BIN_PATH=/home/ryan/anaconda2/bin/python --spawn_strategy=standalone --genrule_strategy=standalone
INFO: Reading options for 'clean' from /etc/bazel.bazelrc:
Inherited 'build' options: --action_env=PATH --action_env=LD_LIBRARY_PATH --action_env=TMPDIR --test_env=PATH --test_env=LD_LIBRARY_PATH
Unrecognized option: --action_env=PATH
ERROR: /home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl:568:26: Traceback (most recent call last):
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 562
rule(attrs = {"srcs": attr.label_list..."), <3 more arguments>)}, <2 more arguments>)
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 568, in rule
attr.label_list(cfg = "data", allow_files = True)
expected ConfigurationTransition or NoneType for 'cfg' while calling label_list but got string instead: data.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'tensorflow/tensorflow.bzl' has errors.
Configuration finished
I think the version of your Bazel is too high for SyntaxNet. Can you try bazel-0.3.1, please?
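For example (the installer name follows Bazel's usual release naming, so adjust it if your platform differs):
wget https://github.com/bazelbuild/bazel/releases/download/0.3.1/bazel-0.3.1-installer-linux-x86_64.sh
chmod +x bazel-0.3.1-installer-linux-x86_64.sh
./bazel-0.3.1-installer-linux-x86_64.sh --user   # installs into ~/bin
export PATH="$HOME/bin:$PATH"
bazel version                                    # should now report 0.3.1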
I'm getting this weird config error that I cannot decipher. It doesn't seem like anyone else has encountered it. CUDA is correctly configured. What is this 'repository_rule' and 'external' package thing?
(tensorflow)weiwe@weiwe:~/tensorflow$ ./configure
Please specify the location of python. [Default is /home/weiwe/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Traceback (most recent call last):
File "<stdin>", line 18, in <module>
TypeError: can only concatenate list (not "str") to list
Found possible Python library paths:
Please input the desired Python library path to use. Default is []
/home/weiwe/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.4
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.........
ERROR: /home/weiwe/tensorflow/third_party/gpus/cuda_configure.bzl:415:18: function 'repository_rule' does not exist.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
Configuration finished
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda \
> //tensorflow/tools/pip_package:build_pip_package
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.380s
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.109s
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel clean
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.242s
There was a recent patch to the CUDA build layer. Once you get this error:
Error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
you have to re-run ./configure to re-create that target.
This can become tiresome; one way to skip the interactive configuration process is to provide the variable settings on the command line:
PYTHON_BIN_PATH=$HOME/anaconda2/bin/python CUDA_TOOLKIT_PATH="/usr/local/cuda" CUDNN_INSTALL_PATH="/usr/local/cuda" TF_NEED_CUDA=1 TF_CUDA_COMPUTE_CAPABILITIES="6.1" TF_CUDNN_VERSION="5" TF_CUDA_VERSION="8.0" TF_CUDA_VERSION_TOOLKIT=8.0 ./configure
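Putting it together, one full pass might look like this; the variable values are just the ones from the command above, so substitute your own compute capability and paths:
PYTHON_BIN_PATH=$HOME/anaconda2/bin/python \
CUDA_TOOLKIT_PATH="/usr/local/cuda" CUDNN_INSTALL_PATH="/usr/local/cuda" \
TF_NEED_CUDA=1 TF_CUDA_VERSION="8.0" TF_CUDNN_VERSION="5" \
TF_CUDA_COMPUTE_CAPABILITIES="6.1" ./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package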
Hope it helps