I was trying to using syntaxnet and I have finished most of processes. Upgrade bazel version to 0.43 in case of errors (Ubuntu 16.04 Ver, Anaconda python 2.7).
However, I am having a troubles with ./configure part. I am reading the official instruction via tensorflow github.
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
**./configure**
cd ..
bazel test syntaxnet/... util/utf8/...
# On Mac, run the following:
bazel test --linkopt=-headerpad_max_install_names \
syntaxnet/... util/utf8/...
Following logs will help you to understand what’s going on my machine. Thanks for the advice
Please specify the location of python. [Default is /home/ryan/anaconda2/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/home/ryan
/home/ryan/pynaoqi-python2.7
/home/ryan/anaconda2/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/ryan]
/home/ryan/anaconda2/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.0
Please specify the location where cuDNN 5.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to cuDNN toolkit. Neither of the following two files can be found:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.0
/usr/local/cuda-8.0/libcudnn.so.5.0
.5.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.5
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=120
INFO: Reading options for 'clean' from /home/ryan/git_ryan/models/syntaxnet/tensorflow/tools/bazel.rc:
Inherited 'build' options: --force_python=py2 --host_force_python=py2 --python2_path=/home/ryan/anaconda2/bin/python --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define PYTHON_BIN_PATH=/home/ryan/anaconda2/bin/python --spawn_strategy=standalone --genrule_strategy=standalone
**INFO: Reading options for 'clean' from /etc/bazel.bazelrc:
Inherited 'build' options: --action_env=PATH --action_env=LD_LIBRARY_PATH --action_env=TMPDIR --test_env=PATH --test_env=LD_LIBRARY_PATH
Unrecognized option: --action_env=PATH
ERROR: /home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl:568:26: Traceback (most recent call last):
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 562
rule(attrs = {"srcs": attr.label_list..."), <3 more arguments>)}, <2 more arguments>)
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 568, in rule
attr.label_list(cfg = "data", allow_files = True)
expected ConfigurationTransition or NoneType for 'cfg' while calling label_list but got string instead: data.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'tensorflow/tensorflow.bzl' has errors.
Configuration finished**
I think the version of your bazel is too high for Syntaxnet. you can try bazel-0.3.1 please.
Related
I got the following error when trying to train my tensorflow model on sagemaker ml.p2.xlarge instance. I use tensorflow==2.3.0. I wonder whether this is because of the tensorflow version incompatibility with cuda. sagemaker ml.p2.xlarge seems to use cuda 10.0
GPU error:
2020-08-31 08:46:46.429756: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.170819: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.764874: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
This question is probably old, but it falls back on an open issue found at the beginning of choosing which versions of frameworks to use.
The problem does not depend on the type of instance that you specified (which has NVidia GPU).
From the official documentation "Available Deep Learning Containers Images", to date 20/10/2022, precompiled versions higher than 2.2 do not seem to be usable:
Framework
Job Type
Horovod Options
CPU/GPU
Python Version Options
Example URL
TensorFlow 2.2 (Cuda 10.2)
training
Yes
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu102-ubuntu18.04
TensorFlow 2.2
inference
No
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.2.0-gpu-py37-cu102-ubuntu18.04
Within the dockerfile that is used to use the container is the instruction to install the libraries that your custom version is missing:
RUN apt-get update && apt-get install -y --no-install-recommends --allow-unauthenticated \
python3-dev \
python3-pip \
python3-setuptools \
ca-certificates \
cuda-command-line-tools-10-1 \
cuda-cudart-dev-10-1 \
cuda-cufft-dev-10-1 \
cuda-curand-dev-10-1 \
cuda-cusolver-dev-10-1 \
cuda-cusparse-dev-10-1 \
curl \
libcudnn7=7.6.2.24-1+cuda10.1 \
# TensorFlow doesn't require libnccl anymore but Open MPI still depends on it
libnccl2=2.4.7-1+cuda10.1 \
libgomp1 \
libnccl-dev=2.4.7-1+cuda10.1 \
....
Then you can install the required libraries from your custom version directly with a requirements.txt file or run the install command directly in the training script.
If there are no special project requirements, I recommend using the precompiled versions of sagemaker. Otherwise, build a docker image from scratch instead of installing libraries this way..
The objective of my experiment is to build tensorflow on Jetson TK1 arm based embedded board. Since pre-builts of tensorflow for arm architecture are not given by the official releases, I was forced to the option of building it from source.
To build tensorflow, we need Bazel which should be also build from source. Now I got stuck here, not able to build bazel at all.
I have referred various blogs and github projects and tried to follow the instructions everyone said it worked for them.
1) Tensorflow on Raspberry-pi
2) Jetson Hacks building Tensorflow from source
3) Official Documentation
Steps Followed:
$ sudo apt-get install build-essential openjdk-8-jdk python zip
$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
$ unzip -d bazel bazel-0.4.5-dist.zip
$ cd bazel
$ sudo ./compile.sh
Error Log:
ERROR: /build/bazel/src/main/protobuf/BUILD:25:2: Java compilation in rule '//src/main/protobuf:extra_actions_base_java_proto' failed: Worker process sent response with exit code: 1.
java.lang.InternalError: Cannot find requested resource bundle for locale en_US
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:128)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:147)
at com.sun.tools.javac.util.JavacMessages.getLocalizedString(JavacMessages.java:140)
at com.sun.tools.javac.util.Log.localize(Log.java:673)
at com.sun.tools.javac.util.Log.printLines(Log.java:485)
at com.sun.tools.javac.api.JavacTaskImpl.handleExceptions(JavacTaskImpl.java:156)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:93)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:87)
at com.google.devtools.build.buildjar.javac.BlazeJavacMain.compile(BlazeJavacMain.java:104)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder$1.invokeJavac(SimpleJavaLibraryBuilder.java:163)
at com.google.devtools.build.buildjar.ReducedClasspathJavaLibraryBuilder.compileSources(ReducedClasspathJavaLibraryBuilder.java:52)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.compileJavaLibrary(SimpleJavaLibraryBuilder.java:166)
at com.google.devtools.build.buildjar.SimpleJavaLibraryBuilder.run(SimpleJavaLibraryBuilder.java:178)
at com.google.devtools.build.buildjar.BazelJavaBuilder.processRequest(BazelJavaBuilder.java:90)
at com.google.devtools.build.buildjar.BazelJavaBuilder.runPersistentWorker(BazelJavaBuilder.java:67)
at com.google.devtools.build.buildjar.BazelJavaBuilder.main(BazelJavaBuilder.java:44)
Caused by: java.util.MissingResourceException: Can't find bundle for base name com.google.errorprone.errors, locale en_US
at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:854)
at com.sun.tools.javac.util.JavacMessages.lambda$add$0(JavacMessages.java:106)
at com.sun.tools.javac.util.JavacMessages.getBundles(JavacMessages.java:125)
... 15 more
Target //src:bazel failed to build
INFO: Elapsed time: 291.995s, Critical Path: 258.92s
ERROR: Could not build Bazel
To make sure the error is independent of the architecture, I have tried to build Bazel in x86_64 PC. Even there I am getting the same error. I have seen people created the similar issue in bazel github group, none solved.
Version 0.4.5 is very old. We just released 0.12.0, could you try that one?
I'm currently following the instructions here
to build tensorflow from source using bazel.
After setting up the configuration, and attempting to build it, I get this error:
Cuda Configuration Error: Error reading C:/Program Files/NVIDIA GPU
Computing To olkit/CUDA/v9.0/include/cudnn.h: java.io.IOException:
ERROR: src/main/native/win dows/processes-jni.cc(239):
CreateProcessW("grep" --color=never -A1 -E "#define CUDNN_MAJOR"
"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0/include/cu
dnn.h"): Das System kann die angegebene Datei nicht finden.
The last phrase translates to "no such file or directory".
But I'm 100% sure I installed cuDNN 7.1.2 correctly, merging the downloaded bin/include/lib folder's with the preexisting CUDA folders. If I copy/paste the path from the error message, there it is cudnn.h! I also run everything in administator etc..
This problem shows up a few times on google, linking it to a mis-configured theano setup, but I'm not using theano right now.
Why doesn't bazel find the file, when it's definitely there?
EDIT: I would also be very thankful if anyone has a link a pre-built version of tensorflow for CUDA compute capability 3.0!
Thank you in advance!
The error indicated that grep.exe can't be found. I tried grep for Windows and msys2. Both will do.
I am trying to build MKL-accelerated version of TensorFlow using bazel 0.5.1, gcc 6.2, binutils 2.28, Anaconda2 python on Scientific Linux 7.2.
Apparently the system /lib64/libstdc++.so.6 is too old, so I am trying to use gcc installed in another directory. PATH, LD_LIBRARY_PATH are modified to prepend the corresponding paths (using modules). However, while bazel has no trouble picking up correctly executables for gcc, ld, python, it still tries to load old system /lib64/libstdc++.so.6. How to force it to use the one from gcc 6.2? Why does not it pick it up from LD_LIBRARY_PATH?
According to google many people are having trouble with this but I could not find a solution that would work for me. I had no trouble building TensorFlow under Ubuntu 16.04 that has sufficiently new gcc in the standard location.
I do:
1) ./configure
The only non-default options I choose is use MKL and download MKL
2) bazel build --config=mkl --copt="-DEIGEN_USE_VML" -s -c opt //tensorflow/tools/pip_package:build_pip_package
.....
example/example_parser_configuration.proto tensorflow/core/protobuf/control_flow.proto tensorflow/core/protobuf/meta_graph.proto tensorflow/core/protobuf/named_tensor.proto tensorflow/core/protobuf/saved_model.proto tensorflow/core/protobuf/tensorflow_server.proto tensorflow/core/util/event.proto tensorflow/core/util/test_log.proto)
ERROR: /scratch/midway2/ivy2/TF_intel/tensorflow/tensorflow/tools/tfprof/BUILD:42:1: null failed: protoc failed: error executing command bazel-out/host/bin/external/protobuf/protoc '--python_out=bazel-out/local-opt/genfiles/' -I. -I. -Iexternal/protobuf/python -Ibazel-out/local-opt/genfiles/external/protobuf/python ... (remaining 5 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: version GLIBCXX_3.4.20' not found (required by bazel-out/host/bin/external/protobuf/protoc)
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: versionCXXABI_1.3.8' not found (required by bazel-out/host/bin/external/protobuf/protoc)
bazel-out/host/bin/external/protobuf/protoc: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by bazel-out/host/bin/external/protobuf/protoc)
.....
Thank you,
Igor
sorry for the slow reply. Bazel by design ignores LD_LIBRARY_PATH when running actions. It doesn't have to ignore them during C++ toolchain detection, but at the moment, it does :/ To help you forward, I would try adding --sysroot= as linkopt or using bazel grte_top flag. Depending on where your libstdc++.so lives, you might need to disable sandbox. The principled solution would be to write a custom CROSSTOOL that specifies builtin_sysroot or grte_top. But that is not an easy task.
Let me know if I lost you somewhere in that paragraph :)
I'm getting this weird config error I cannot decipher. It doesn't seem any other ppl has encountered this. CUDA is correctly config'ed. What is this 'repository_rule' and 'external' package thing?
(tensorflow)weiwe#weiwe:~/tensorflow$ ./configure
Please specify the location of python. [Default is /home/weiwe/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Traceback (most recent call last):
File "<stdin>", line 18, in <module>
TypeError: can only concatenate list (not "str") to list
Found possible Python library paths:
Please input the desired Python library path to use. Default is []
/home/weiwe/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.4
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.........
ERROR: /home/weiwe/tensorflow/third_party/gpus/cuda_configure.bzl:415:18: function 'repository_rule' does not exist.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
Configuration finished
(tensorflow)weiwe#weiwe:~/tensorflow$ bazel build -c opt --config=cuda \
> //tensorflow/tools/pip_package:build_pip_package
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.380s
(tensorflow)weiwe#weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.109s
(tensorflow)weiwe#weiwe:~/tensorflow$ bazel clean
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
(tensorflow)weiwe#weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.242s
There was a patch recently to the cuda build layer. Once you get that error
Error loading package 'external': Extension file '#local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
You have to re-run ./configure again to re-create that target.
This can become tiresome and a way to skip through the configuration process is to provide the variable settings from command line
PYTHON_BIN_PATH=$HOME/anaconda2/bin/python CUDA_TOOLKIT_PATH="/usr/local/cuda" CUDNN_INSTALL_PATH="/usr/local/cuda" TF_NEED_CUDA=1 TF_CUDA_COMPUTE_CAPABILITIES="6.1" TF_CUDNN_VERSION="5" TF_CUDA_VERSION="8.0" TF_CUDA_VERSION_TOOLKIT=8.0 ./configure
Hope it helps