Correct way to configure interdependent projects (e.g. tensorflow) in bazel build system so proto imports work as is? - tensorflow

As the title suggests, I'm running into an issue where proto import statements do not seem to be relative to the correct path. For concreteness, consider the directory structure in a dir (let's call it ~/base):
>> tree -L 1
├── models
├── my-lib
│   ├── nlp
│   │   ├── BUILD
│   │   └── nlp_parser.cc
│   └── WORKSPACE
├── serving
└── tensorflow
For those not familiar, models (as in https://github.com/tensorflow/models/) has tensorflow (https://github.com/tensorflow/tensorflow) as a git submodule, as does serving. Because of this, coupled with the fact that the tensorflow git submodules were on different commits and sometimes incompatible, I have removed the git submodules from the projects and symlinked them to the tensorflow repo in the topmost directory, so that I manage only one tensorflow repo instead of three. That is, I have done the following:
`cd models/syntaxnet; rm -rf tensorflow; ln -s ../../tensorflow/ .; cd -`
`cd serving; rm -rf tensorflow tf_models; ln -s ../tensorflow/ .; ln -s ../models .`
Now I want to build a target within my-lib that depends on serving, tensorflow, and models. I added these as local repositories in my WORKSPACE as follows (cat my-lib/WORKSPACE):
workspace(name = "myworkspace")

local_repository(
    name = "org_tensorflow",
    path = __workspace_dir__ + "/../tensorflow",
)

local_repository(
    name = "syntaxnet",
    path = __workspace_dir__ + "/../models/syntaxnet",
)

local_repository(
    name = "tf_serving",
    path = __workspace_dir__ + "/../serving",
)

load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
tf_workspace("~/base/tensorflow", "@org_tensorflow")

# ===== gRPC dependencies =====
bind(
    name = "libssl",
    actual = "@boringssl_git//:ssl",
)

bind(
    name = "zlib",
    actual = "@zlib_archive//:zlib",
)
Here is my BUILD file (cat my-lib/nlp/BUILD):
load("@tf_serving//tensorflow_serving:serving.bzl", "serving_proto_library")

cc_binary(
    name = "nlp_parser",
    srcs = ["nlp_parser.cc"],
    linkopts = ["-lm"],
    deps = [
        "@org_tensorflow//tensorflow/core:core_cpu",
        "@org_tensorflow//tensorflow/core:framework",
        "@org_tensorflow//tensorflow/core:lib",
        "@org_tensorflow//tensorflow/core:protos_all_cc",
        "@org_tensorflow//tensorflow/core:tensorflow",
        "@syntaxnet//syntaxnet:parser_ops_cc",
        "@syntaxnet//syntaxnet:sentence_proto",
        "@tf_serving//tensorflow_serving/servables/tensorflow:session_bundle_config_proto",
        "@tf_serving//tensorflow_serving/servables/tensorflow:session_bundle_factory",
        "@org_tensorflow//tensorflow/contrib/session_bundle",
        "@org_tensorflow//tensorflow/contrib/session_bundle:signature",
    ],
)
Lastly, here is the output of the build (cd my-lib; bazel build nlp/nlp_parser --verbose_failures):
INFO: Found 1 target...
ERROR: /home/blah/blah/external/org_tensorflow/tensorflow/core/debug/BUILD:33:1: null failed: linux-sandbox failed: error executing command
(cd /home/blah/blah/execroot/my-lib && \
exec env - \
/home/blah/blah/execroot/my-lib/_bin/linux-sandbox @/home/blah/blah/execroot/my-lib/bazel-sandbox/c65fa6b6-9b7d-4710-b19c-4d42a3e6a667-31.params -- bazel-out/host/bin/external/protobuf/protoc '--cpp_out=bazel-out/local-fastbuild/genfiles/external/org_tensorflow' '--plugin=protoc-gen-grpc=bazel-out/host/bin/external/grpc/grpc_cpp_plugin' '--grpc_out=bazel-out/local-fastbuild/genfiles/external/org_tensorflow' -Iexternal/org_tensorflow -Ibazel-out/local-fastbuild/genfiles/external/org_tensorflow -Iexternal/protobuf/src -Ibazel-out/local-fastbuild/genfiles/external/protobuf/src external/org_tensorflow/tensorflow/core/debug/debug_service.proto).
bazel-out/local-fastbuild/genfiles/external/protobuf/src: warning: directory does not exist.
tensorflow/core/util/event.proto: File not found.
tensorflow/core/debug/debug_service.proto: Import "tensorflow/core/util/event.proto" was not found or had errors.
tensorflow/core/debug/debug_service.proto:38:25: "Event" is not defined.
Target //nlp:nlp_parser failed to build
INFO: Elapsed time: 0.776s, Critical Path: 0.42s
What is the correct way to add the modules as local_repository in WORKSPACE so that the proto imports work?

I was having a similar problem after trying to build a project of mine that depends on tensorflow on Ubuntu, after getting it building on OS X. What ended up working for me was disabling sandboxing with `--spawn_strategy=standalone`.

Related

setup.py sdist creates an archive without the package

I have a project called Alexandria that I want to upload on PyPi as a package. To do so, I have a top folder called alexandria-python in which I put the package and all the elements required to create a package archive with setup.py. The folder alexandria-python has the following structure:
|- setup.py
|- README.md
|- alexandria (root folder for the package)
   |- __init__.py
   |- many sub-packages
Then, following many tutorials to create an uploadable archive, I open a terminal, cd to alexandria-python, and use the command:
python setup.py sdist
This creates additional folders, so the structure of alexandria-python is now:
|- setup.py
|- README.md
|- alexandria (root folder for the package)
   |- __init__.py
   |- many sub-packages
|- alexandria.egg-info
|- dist
Everything looks fine, and from my understanding the package should now be archived in the dist folder. But when I open the dist folder and extract the alexandria-0.0.2.tar.gz archive that has been created, it does not contain the alexandria package. Everything else seems to be there, except the most important element: the package itself.
Then, when I upload the project to test-PyPi and pip install it, any attempt to import a module from the toolbox results in a ModuleNotFoundError. How is it that my package does not get added to the archive? Am I doing something very silly?
Note: in case it can help, this is the structure of my setup.py file:
from setuptools import setup

# set up the package
setup(
    name = "alexandria",
    license = "Other/Proprietary License",
    version = "0.0.2",
    author = "Romain Legrand",
    author_email = "alexandria.toolbox@gmail.com",
    description = "a software for Bayesian time-series econometrics applications",
    python_requires = ">=3.6",
    keywords = ["python", "Bayesian", "time-series", "econometrics"])
Your setup.py has neither py_modules nor packages; it must have one of them. In your case alexandria is a package, so either:
setup(
    …
    packages = ['alexandria'],
    …
)
or:
from setuptools import find_packages, setup
…
packages = find_packages('.')
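To see what setuptools would actually pick up, you can call find_packages() directly. A minimal sketch, recreating a layout like the alexandria one in a temporary directory (the sub-package names here are made up for illustration):

```python
import os
import tempfile
from setuptools import find_packages

# Build a throwaway project layout to show what find_packages() detects.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "alexandria", "models"))

# A directory only counts as a package if it contains __init__.py.
open(os.path.join(root, "alexandria", "__init__.py"), "w").close()
open(os.path.join(root, "alexandria", "models", "__init__.py"), "w").close()

# A directory without __init__.py is silently skipped.
os.makedirs(os.path.join(root, "scripts"))

print(sorted(find_packages(root)))  # → ['alexandria', 'alexandria.models']
```

If the output is an empty list, setup() will build an sdist with no package in it, which matches the symptom above.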

cc_import for debug and release versions?

My toolset:
Windows 10 x64 (1909)
Bazel 3.1.0
Visual Studio 2019 (16.6)
Powershell
I need to use a prebuild third-party C++ DLL. The third-party lib looks like this:
<directory> third-party-lib
├── <directory> bin
| ├── <file> third_party_lib.dll
| └── <file> third_party_libd.dll
├── <directory> lib
| ├── <file> third_party_lib.lib
| └── <file> third_party_libd.lib
└── <directory> includes
└── <file> third_party_lib.h
So there are two versions: a release and a debug version. Filenames ending with "d" indicate the debug version.
To consume this library I am using a cc_import target:
cc_import(
    name = "third-party-lib",
    interface_library = "lib/third_party_lib.lib",
    shared_library = "bin/third_party_lib.dll",
)
My build target depends on the third-party-lib. Building in release (opt) mode works without any problems:
bazel build //:MyBuildTarget
But if I try to do a debug build I run into linker problems:
bazel build --compilation_mode=dbg //:MyBuildTarget
Is there any possibility to specify debug and release DLLs in a cc_import rule? Or is there any other rule that I can use for this purpose?
You can use select() to switch between the library variants. Note that both the import library and the DLL have to switch, and the paths need their lib/ and bin/ prefixes:
cc_import(
    name = "third-party-lib",
    interface_library = select({
        ":debug_build": "lib/third_party_libd.lib",
        "//conditions:default": "lib/third_party_lib.lib",
    }),
    shared_library = select({
        ":debug_build": "bin/third_party_libd.dll",
        "//conditions:default": "bin/third_party_lib.dll",
    }),
)

config_setting(
    name = "debug_build",
    values = {
        "compilation_mode": "dbg",
    },
)

When running with Bazel, where should I save .pb graphs for Tensorflow?

Directory structure:
~/tensorflow/tensorflow/cc/dnnops/
├── BUILD
├── graph.pb
└── main.cc
The failing code line from main.cc: status = ReadBinaryProto(tf::Env::Default(), "graph.pb", &graph_def);. Full code can be found here.
Bazel output:
WARNING: /home//.cache/bazel/_bazel_rd1/4ab077b6e1a2076b6ea9f23b417088a6/external/protobuf_archive/WORKSPACE:1: Workspace name in /home//.cache/bazel/bazel/4ab077b6e1a2076b6ea9f23b417088a6/external/protobuf_archive/WORKSPACE (@com_google_protobuf) does not match the name given in the repository's definition (@protobuf_archive); this will cause a build error in future versions
INFO: Analysed target //tensorflow/cc/dnnops:dnnops (0 packages loaded).
INFO: Found 1 target...
Target //tensorflow/cc/dnnops:dnnops up-to-date:
bazel-bin/tensorflow/cc/dnnops/dnnops
INFO: Elapsed time: 5.378s, Critical Path: 5.04s
INFO: 2 processes, local.
INFO: Build completed successfully, 3 total actions
INFO: Running command line: bazel-bin/tensorflow/cc/dnnops/dnnops
Not found: graph.pb; No such file or directory
ERROR: Non-zero return code '1' from command: Process exited with status 1
What I tried
Copy the same graph file inside bazel-bin/tensorflow/cc/dnnops. Still fails with same output from Bazel.
Question
How should I expose the graph file location to Tensorflow/Bazel?
Silly oversight... I needed to enter the full path. I changed this:
status = ReadBinaryProto(tf::Env::Default(), "graph.pb", &graph_def);
To this:
status = ReadBinaryProto(tf::Env::Default(), "/home/<user>/path/to/graph.pb", &graph_def);
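The underlying issue is that a binary launched by Bazel does not run from the source directory, so a bare relative path like "graph.pb" resolves against the wrong working directory. The general fix, sketched here in Python for brevity (the directory names are made up; the same idea applies to the C++ ReadBinaryProto call), is to anchor the data-file path to the program's own location instead of the current working directory:

```python
import os
import tempfile

def resolve_near_binary(relative_name, binary_path):
    """Resolve a data file relative to the program's own directory, not the CWD."""
    return os.path.join(os.path.dirname(os.path.abspath(binary_path)), relative_name)

# Simulate: the binary lives in one directory while the process runs elsewhere.
bindir = tempfile.mkdtemp()
open(os.path.join(bindir, "graph.pb"), "wb").close()  # pretend data file
fake_binary = os.path.join(bindir, "dnnops")

path = resolve_near_binary("graph.pb", fake_binary)
print(os.path.exists(path))  # → True
```

Hard-coding an absolute path, as in the accepted fix above, works too, but ties the binary to one machine; for Bazel specifically, declaring graph.pb in the target's data attribute and reading it from the runfiles tree is the more portable option.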

Run additional tests by using a feature flag to "cargo test"

I have some tests that I would like to ignore when using cargo test and only run when explicitly passed a feature flag. I know this can be done by using #[ignore] and cargo test -- --ignored, but I'd like to have multiple sets of ignored tests for other reasons.
I have tried this:
#[test]
#[cfg_attr(not(feature = "online_tests"), ignore)]
fn get_github_sample() {}
This is ignored when I run cargo test as desired, but I can't get it to run.
I have tried multiple ways of running Cargo but the tests continue to be ignored:
cargo test --features "online_tests"
cargo test --all-features
I then added the feature definition into my Cargo.toml as per this page, but they continue to be ignored.
I am using workspaces in Cargo. I tried adding the feature definition in both Cargo.toml files with no difference.
Without a workspace
Cargo.toml
[package]
name = "feature-tests"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]
[features]
network = []
filesystem = []
[dependencies]
src/lib.rs
#[test]
#[cfg_attr(not(feature = "network"), ignore)]
fn network() {
panic!("Touched the network");
}
#[test]
#[cfg_attr(not(feature = "filesystem"), ignore)]
fn filesystem() {
panic!("Touched the filesystem");
}
Output
$ cargo test
running 2 tests
test filesystem ... ignored
test network ... ignored
$ cargo test --features network
running 2 tests
test filesystem ... ignored
test network ... FAILED
$ cargo test --features filesystem
running 2 tests
test network ... ignored
test filesystem ... FAILED
(some output removed to better show effects)
With a workspace
Layout
.
├── Cargo.toml
├── feature-tests
│   ├── Cargo.toml
│   ├── src
│   │   └── lib.rs
├── src
│   └── lib.rs
feature-tests contains the files from the first section above.
Cargo.toml
[package]
name = "workspace"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]
[features]
filesystem = ["feature-tests/filesystem"]
network = ["feature-tests/network"]
[workspace]
[dependencies]
feature-tests = { path = "feature-tests" }
Output
$ cargo test --all
running 2 tests
test filesystem ... ignored
test network ... ignored
$ cargo test --all --features=network
running 2 tests
test filesystem ... ignored
test network ... FAILED
(some output removed to better show effects)
With a virtual workspace
Virtual workspaces do not support specifying features (Cargo issue #4942). You will need to run the tests from within the sub project or specify the path to the appropriate Cargo.toml
Layout
.
├── Cargo.toml
└── feature-tests
├── Cargo.toml
└── src
└── lib.rs
feature-tests contains the files from the first section above.
Cargo.toml
[workspace]
members = ["feature-tests"]
Output
$ cargo test --all --manifest-path feature-tests/Cargo.toml --features=network
running 2 tests
test filesystem ... ignored
test network ... FAILED
$ cargo test --all --manifest-path feature-tests/Cargo.toml
running 2 tests
test filesystem ... ignored
test network ... ignored
(some output removed to better show effects)

How to include the .so of custom ops in the pip wheel and organize the sources of custom ops?

Following the documentation, I put my_op.cc and my_op.cu.cc under tensorflow/core/user_ops, and created tensorflow/core/user_ops/BUILD which contains
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")

tf_custom_op_library(
    name = "my_op.so",
    srcs = ["my_op.cc"],
    gpu_srcs = ["my_op.cu.cc"],
)
Then I run the following commands under the root of tensorflow:
bazel build -c opt //tensorflow/core/user_ops:all
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
After building and installing the pip wheel, I want to use my_op in the project my_project.
I think I should create something like my_project/tf_op/__init__.py and my_project/tf_op/my_op.py, which calls tf.load_op_library like the example cuda_op.py. However, my_op.so is not included in the installed pip wheel. How can I generate the input (the path of my_op.so) for tf.load_op_library?
Is there any better way to organize my_op.cc, my_op.cu.cc, and my_op.py within my_project?
You can preserve the directory structure of your project and write setup.py so that it also includes the .so files. You can add other non-Python files of your project the same way.
Example directory structure:
my_package
|- setup.py
|- my_project
   |- __init__.py
You can install the my_project package while in the my_package directory using:
pip install . --user
(omit the --user argument if you install packages with root access)
from setuptools import setup, find_packages

setup(name='my_project',
      version='1.0',
      description='Project Details',
      packages=find_packages(),
      include_package_data=True,
      package_data={
          '': ['*.so', '*.txt', '*.csv'],
      },
      zip_safe=False)
Don't forget to add __init__.py in every folder containing Python modules you want to import.
Reference: https://docs.python.org/2/distutils/setupscript.html#installing-package-data