How to use bazel to build tensorflow as external dependencies to support gpu inference? - tensorflow

I want to use tensorflow's c++ api to inference in my project, when I build it for cpu, it's ok. But when I build it for support gpu (bazel build --config=cuda ...), it has some error.
My WORKSPACE:
http_archive(
name = "org_tensorflow",
strip_prefix = "tensorflow-1.15.0",
sha256 = "a5d49c00a175a61da7431a9b289747d62339be9cf37600330ad63b611f7f5dc9",
urls = ["https://github.com/tensorflow/tensorflow/archive/v1.15.0.tar.gz"]
)
load("#org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
tf_workspace(tf_repo_name = "org_tensorflow")
# Uncomment the following two lines has the same error.
# load("#org_tensorflow//third_party/gpus:cuda_configure.bzl", "cuda_configure")
# cuda_configure(name="local_config_cuda")
My .bazelrc:
...
# config for using cuda
build:cuda --define=using_cuda=true --define=using_cuda_nvcc=true
build:cuda --action_env TF_NEED_CUDA=1
build:cuda --crosstool_top=#local_config_cuda//crosstool:toolchain
...
My BUILD:
cc_library(
name = "tf_utils",
srcs = ["tf_utils.cpp"],
hdrs = ["tf_utils.h"],
deps = ["#org_tensorflow//tensorflow/cc/saved_model:loader"
"#org_tensorflow//tensorflow/core:tensorflow"]
)
cc_binarary(
name = "test",
srcs = ["test.cpp"],
deps = ["tf_utils"]
)
When I run bazel build --config=cuda //:test
The error message is:
ERROR: external/com_google_protobuf/BUILD:405:1: C++ compilation of rule '#com_google_protobuf//:protoc' failed (Exit 1) crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/com_google_protobuf/_objs/protoc/main.d ... (remaining 46 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
src/main/tools/linux-sandbox-pid1.cc:413: "execvp(external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc, 0x6dfe10)": No such file or directory
Target //:test failed to build
Use --verbose_failures to see the command lines of failed build steps.
The error message shows no such file or directory, but it does exist.
# ls -la external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
-rwxrwxr-x 1 root root 9129 Dec 1 14:14 external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc*
The tensorflow version is 1.15, bazel version is 3.1.0.
Does anyone know how to configure bazel correctly so that tensorflow can support gpu as an external dependency? Thanks very much.

Related

How to build tensorflow benchmark tool

Following https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/benchmark
bazel build -c opt --crosstool_top=//external:android/crosstool --cpu=armeabi-v7a --host_crosstool_top=#bazel_tools//tools/cpp:toolchain --config monolithic tensorflow/tools/benchmark:benchmark_model
I get
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/Users/user/external_projects/tensorflow/tools/bazel.rc
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=176
ERROR: Config value monolithic is not defined in any .rc file
How to fix it?
bazel version
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/Users/user/external_projects/tensorflow/tools/bazel.rc
Build label: 0.23.1
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Mar 4 10:40:32 2019 (1551696032)
Build timestamp: 1551696032
Build timestamp as int: 1551696032
Update:
For fresh tensorflow master I get:
INFO: Analysed target //tensorflow/tools/benchmark:benchmark_model (71 packages loaded, 4664 targets configured).
INFO: Found 1 target...
ERROR: /private/var/tmp/_bazel_user/144c1461f36cde95de1693452c235294/external/com_google_absl/absl/types/BUILD.bazel:178:1: C++ compilation of rule '#com_google_absl//absl/types:bad_optional_access' failed (Exit 1)
clang: error: unknown argument: '-m<platform_for_version_min>-version-min=10.14'
Target //tensorflow/tools/benchmark:benchmark_model failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 98.302s, Critical Path: 18.67s
INFO: 399 processes: 399 local.
FAILED: Build did NOT complete successfully
Update 2:
On Ubuntu 16 and fresh tensorflow master:
bazel version
INFO: Invocation ID: 34e40dab-96b2-45ef-b549-dab45a2738bc
Build label: 0.22.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Mon Jan 28 12:58:08 2019 (1548680288)
Build timestamp: 1548680288
Build timestamp as int: 1548680288
Output:
WARNING: /data/user_data/external_projects/tensorflow/tensorflow/core/BUILD:1794:12: in srcs attribute of cc_library rule //tensorflow/core:android_tensorflow_lib_lite: please do not import '//tensorflow/core/distributed_runtime:server_lib.h' directly. You should either move the file to this package or depend on an appropriate rule there
INFO: Analysed target //tensorflow/tools/benchmark:benchmark_model (72 packages loaded, 4809 targets configured).
INFO: Found 1 target...
INFO: From Compiling external/snappy/snappy-sinksource.cc [for host]:
cc1plus: warning: command line option '-Wno-implicit-function-declaration' is valid for C/ObjC but not for C++
INFO: From Compiling external/snappy/snappy-stubs-internal.cc [for host]:
cc1plus: warning: command line option '-Wno-implicit-function-declaration' is valid for C/ObjC but not for C++
INFO: From Compiling external/snappy/snappy.cc [for host]:
cc1plus: warning: command line option '-Wno-implicit-function-declaration' is valid for C/ObjC but not for C++
ERROR: /home/user/.cache/bazel/_bazel_user/b4774fbdb8542988b4e302c9e073f145/external/com_google_absl/absl/container/BUILD.bazel:529:1: C++ compilation of rule '#com_google_absl//absl/container:raw_hash_set' failed (Exit 1)
Target //tensorflow/tools/benchmark:benchmark_model failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 59.522s, Critical Path: 11.24s
INFO: 456 processes: 456 local.
FAILED: Build did NOT complete successfully
Which version of the NDK are you using? I used the following command line and it worked:
$ bazel build -c opt \
--crosstool_top=//external:android/crosstool \
--cpu=armeabi-v7a \
--host_crosstool_top=#bazel_tools//tools/cpp:toolchain \
--config monolithic \
--cxxopt=-std=c++11 \
tensorflow/tools/benchmark:benchmark_model
This is with Bazel 0.23.2 and NDK r17c on https://github.com/tensorflow/tensorflow/commit/7bd86377dedaf459d22b68363a0a2d4580180379.

How to build tensorflow benchmark_model for android arm64_v8a?

I'm using the following command in the Tensorflow 1.8 folder
bazel build -c opt --cxxopt='--std=c++11' \
//tensorflow/tools/benchmark:benchmark_model \
--crosstool_top=//external:android/crosstool \
--host_crosstool_top=#bazel_tools//tools/cpp:toolchain \
--cpu=arm64-v8a --verbose_failures
It's giving me the error:
ERROR: No default_toolchain found for cpu 'arm64-v8a'. Valid cpus are: [
k8,
local,
armeabi-v7a,
x64_windows,
x64_windows_msvc,
x64_windows_msys,
s390x,
ios_x86_64,
]
INFO: Elapsed time: 0.315s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Ubuntu 16.04, Bazel 0.14.1, Tensorflow 1.8
This is because the Android NDK hasn't been configured in the root WORKSPACE file. Download the Android NDK and add the following line to the WORKSPACE:
android_ndk_repository(
name="androidndk",
path="<PATH_TO_NDK>",
)
I figured it out. Using bazel 0.10.1, SDK API level 27, NDK 15, Build tools 27.0.3, tensorflow 1.8.
First run
./configure
Then
bazel build --config=monolithic --cxxopt=--std=c++11 //tensorflow/tools/benchmark:benchmark_model --config=android_arm64 --cpu=arm64-v8a --fat_apk_cpu=arm64-v8a

error bazel build in tensorflow

At first, I would like to use bazel to help me run tensorflow with SSE and avx so I tried this within work space:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
but it gives me a new error like following, I wonder what is wrong and what should I do? Thanks for help.
WARNING: Config values are not defined in any .rc file: cuda
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package 'tensorflow/tools/pip_package': BUILD file not found on package path
WARNING: Target pattern parsing failed.
INFO: Analysed 0 targets (2 packages loaded).
INFO: Found 0 targets...
ERROR: command succeeded, but there were errors parsing the target pattern
INFO: Elapsed time: 2.727s, Critical Path: 0.02s
FAILED: Build did NOT complete successfully
You probably have an outdated bazel. I am not sure but you can try to use --config=opt instead of -c opt for initial versions.
You have to run ./configure. That will create a .bazelrc and .tf_configure.bazel file in your Tensorflow workspace.
The --config=cuda Bazel flag refers to entries in those two files (they are both text files). The entries typically look like this: build:cuda --some_bazel_flag.
It was answered here

Compiling Tensorflow with Bazel

I tried compiling tensorflow 1.3 from the HEAD of the master branch using the following line of shell command after running ./configure
sudo bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 --config=cuda -k --verbose_failures //tensorflow/tools/pip_package:build_pip_package
I get the following error in the end.
At global scope:cc1plus: warning: unrecognized command line option '-Wno-self-assign'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 3834.785s, Critical Path: 196.95s FAILED: Build did NOT complete successfully
These were the warnings it gave initially.
WARNING: /home/pranav/tensorflow_install/tensorflow/tensorflow/core/BUILD:1634:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /home/pranav/tensorflow_install/tensorflow/tensorflow/tensorflow.bzl:911:30
WARNING: /home/pranav/tensorflow_install/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
WARNING: /home/pranav/tensorflow_install/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
INFO: Analysed target //tensorflow/tools/pip_package:build_pip_package (208 packages loaded).
Then loads of INFO. I'm not sure if it is of any help.
Bazel Version:
Build label: 0.5.4
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Aug 25 10:00:00 2017 (1503655200)
Build timestamp: 1503655200
Build timestamp as int: 1503655200
I read in some answer to run the following code,
$ bazel query --output=build 'somepath("//tensorflow/core:version_info_gen", "//tensorflow/tools/git:gen/spec.json")'
And it gave me this.maybe this will be of help.
# /home/pranav/tensorflow_install/tensorflow/tensorflow/core/BUILD:1546:1
genrule(
name = "version_info_gen",
generator_name = "version_info_gen",
generator_function = "tf_version_info_genrule",
generator_location = "tensorflow/core/BUILD:1546",
srcs = ["//tensorflow/tools/git:gen/spec.json", "//tensorflow/tools/git:gen/head", "//tensorflow/tools/git:gen/branch_ref"],
tools = ["//tensorflow/tools/git:gen_git_source.py"],
outs = ["//tensorflow/core:util/version_info.cc"],
cmd = "$(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$#\"",
local = True,
)
Also, "the bazel command i wrote" > log.txt doesn't fill the text file with the terminal outputs.
If you guys want more information to help me. Suggest me a way to copy the terminal output to a text file so that i can upload it on github and give you the link.
I also used --explain to write all explanations to a file . I can upload that also if you want.
I also tried --local_resources 2048,.5,1.0 to reduce my memory allocation in case of memory issues. Still doesn't work.
Thanks a lot in advance.

bazel - Error building TensorFlow with local LLVM repository

Essentially, I want to run TensorFlow with a custom LLVM repository and not the llvm-mirror that bazel pulls from.
I made the following changes:
Changed the temp_workaround_http_archive rule in //tensorflow/workspace.bzl to:
native.local_repository (
name = "llvm",
path = "/git/llvm/",
)
In /git/llvm I added the file WORKSPACE containing:
workspace( name = "llvm" )
However, I know that an llvm.build file is required, but since I am new to bazel, I am not sure where it should be located.
I am getting the following error log:
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
ERROR: /git/tensorflow/tensorflow/tools/pip_package/BUILD:81:1: no such package '#llvm//': BUILD file not found on package path and referenced by '//tensorflow/tools/pip_package:licenses'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
INFO: Elapsed time: 0.219s
I installed TensorFlow from source. Here is the version info:
$ git rev-parse HEAD
4c3bb1aeb7bb46bea35036433742a720f39ce348
$ bazel version
Build label: 0.4.5
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 16 12:19:38 2017 (1489666778)
Build timestamp: 1489666778
Build timestamp as int: 1489666778
Thanks in advance for the help!
Found the fix. Quite simple actually.
The local_repository rule in bazel is for external bazel repositories only. To use a non-bazel external repository, we need to use new_local_repository which takes build_file as an argument.
You can use the http server function of python to build a local file server, like:
python3 -m http.server
Then edit the file "tensorflow/workspace.bzl"
tf_http_archive(
name = "llvm",
urls = [
"https://mirror.bazel.build/**/195a164675af86f390f9816e53291013d1b551d7.tar.gz",
"http://localhost:8000/195a164675af86f390f9816e53291013d1b551d7.tar.gz",
"https://github.com/**/195a164675af86f390f9816e53291013d1b551d7.tar.gz",
],
sha256 = "57a8333f8e6095d49f1e597ca18e591aba8a89d417f4b58bceffc5fe1ffcc02b",
strip_prefix = "llvm-195a164675af86f390f9816e53291013d1b551d7",
build_file = str(Label("//third_party/llvm:llvm.BUILD")),
)
Add one local file path in the middle line of urls, and then rebuild it again.