Bazel build fails with "Executing genrule #six_archive//:copy_six failed" error while building syntaxnet - tensorflow

I'm trying to follow the instructions at syntaxnet's github page to build syntaxnet parser models.
My system is a Debian Wheezy. Shouldn't be very different from Ubuntu 14.04 LTS or 15.05. I have compiled bazel 0.2.2 (as opposed to 0.2.2b) from source and it appears to work correctly.
Whenever I launch the bazel test syntaxnet/... util/utf8/... command, no tests are executed (all skipped) with some quite cryptic error messages. Here's an example:
root#host:~/tensorflow_syntaxnet/models/syntaxnet# ../../bazel/output/bazel test syntaxnet/... util/utf8/...
Extracting Bazel installation...
.............
INFO: Found 65 targets and 12 test targets...
ERROR: /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/external/six_archive/BUILD:1:1: Executing genrule #six_archive//:copy_six failed: namespace-sandbox failed: error executing command /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox ... (remaining 5 argument(s) skipped).
unshare failed with EINVAL even after 101 tries, giving up.
INFO: Elapsed time: 95.469s, Critical Path: 22.46s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
I'm using Oracle Java 8 JDK as recommended, and my compiler is:
~/tensorflow_syntaxnet/models/syntaxnet# gcc --version
gcc (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Tried looking into the namespace-sandbox binary that's mentioned in the error message, but before I dive deep into this, I thought I'd ask here.
~/tensorflow_syntaxnet/models/syntaxnet# ls -l /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox
lrwxrwxrwx 1 root root 108 May 13 14:52 /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox -> /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
~/tensorflow_syntaxnet/models/syntaxnet# readlink /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox
/root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
Command seems to work fine though:
~/tensorflow_syntaxnet/models/syntaxnet# file $(readlink /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox)
/root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[md5/uuid]=0xecfd97b6a6b9a193b045be13654bd55b, not stripped
~/tensorflow_syntaxnet/models/syntaxnet# /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
No command specified.
Usage: /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox [-S sandbox-root] -- command arg1
provided: /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
Mandatory arguments:
-S <sandbox-root> directory which will become the root of the sandbox
-- command to run inside sandbox, followed by arguments
Optional arguments:
-W <working-dir> working directory
-T <timeout> timeout after which the child process will be terminated with SIGTERM
-t <timeout> in case timeout occurs, how long to wait before killing the child with SIGKILL
-d <dir> create an empty directory in the sandbox
-M/-m <source/target> system directory to mount inside the sandbox
Multiple directories can be specified and each of them will be mounted readonly.
The -M option specifies which directory to mount, the -m option specifies where to
mount it in the sandbox.
-n if set, a new network namespace will be created
-r if set, make the uid/gid be root, otherwise use nobody
-D if set, debug info will be printed
-l <file> redirect stdout to a file
-L <file> redirect stderr to a file
#FILE read newline-separated arguments from FILE
Any idea?
UPDATE: I have done exactly the same steps on a Ubuntu 14.04 LTS (my small workstation, as opposed to the production server running Debian) and everything works well there, with all tests passing. I wonder what's the difference.

Apparently some permission errors happens when setting up the sandbox. A quick workaround is to deactivate the sandbox by using --genrule_strategy=standalone --spawn_strategy=standalone (note that the second one is already specified in the TensorFlow rc file).
You can set those flag in your ~/.bazelrc:
echo "build --genrule_strategy=standalone --spawn_strategy=standalone" >>~/.bazelrc

Related

Drake build from source stubgen failed

I'm building drake from source, specifically this branch of Russ' fork:
https://github.com/RussTedrake/drake
everything worked without issue until the last command make -j
where I get the following output:
[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/chewchiashaoyuan/Documents/Software/drake/bindings/pydrake/BUILD.bazel:780:22: GenerateMypyStubs bindings/pydrake/pydrake/__init__.pyi failed: (Exit 1): stubgen failed: error executing command bazel-out/darwin-opt/bin/bindings/pydrake/stubgen --quiet '--package=pydrake' '--output=bazel-out/darwin-opt/bin/bindings/pydrake'
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
Matplotlib created a temporary config/cache directory at /var/folders/sl/37m0k__51_3_5c5j02w201r40000gn/T/matplotlib-8mr5qkfg because the default path (/Users/chewchiashaoyuan/.matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Critical error during semantic analysis: /usr/local/lib/python3.10/site-packages/pydrake/symbolic.pyi:203: error: invalid syntax
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /Users/chewchiashaoyuan/Documents/Software/drake/BUILD.bazel:63:8 Middleman _middlemen/install-runfiles failed: (Exit 1): stubgen failed: error executing command bazel-out/darwin-opt/bin/bindings/pydrake/stubgen --quiet '--package=pydrake' '--output=bazel-out/darwin-opt/bin/bindings/pydrake'
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
INFO: Elapsed time: 9.617s, Critical Path: 9.33s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
make[2]: *** [drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [all] Error 2
My operating system is:
macOS Monterey Version 12.6
The full make -j -d output is here
I referenced https://drake.mit.edu/from_source.html, https://drake.mit.edu/bazel.html#snopt and https://github.com/RobotLocomotion/drake/issues/12175
I did the following:
git clone https://github.com/RussTedrake/drake
cd drake
git checkout kin_traj_opt2
./setup/mac/install_prereqs.sh
cd ..
mkdir drake-build
cd drake-build
cmake -DWITH_ROBOTLOCOMOTION_SNOPT=ON ../drake
make -j
Fixes attempted:
tried deleting the whole drake-build directory and doing the whole process from scratch, got the same errors
One filename from the error message stands out: /usr/local/lib/python3.10/site-packages/pydrake/symbolic.pyi. It looks like you have sudo pip install drake installed into a system-wide directory. That is likely interfering with the from-source build of Drake and will need to be removed.

"IOError: Can't find a path to system files" for x86 full system on Mac

I'm trying to set up gem5 x86 full system on Mac OS Mojave (10.14)
First I did a git clone to get the gem5 sources, which are located at ~/gem5.
Then I ran scons build/x86/gem5.fast to build the whole thing. I had to change some of the -Werror flags to get it to compile, but it seems to work.
To test it, I ran build/x86/gem5.fast configs/example/se.py -c tests/test-progs/hello/bin/x86/linux/hello and got the following output:
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Jan 18 2020 15:28:50
gem5 started Jan 18 2020 17:48:35
gem5 executing on My-MacBook-Pro-208.local, pid 89984
command line: build/x86/gem5.fast configs/example/se.py -c tests/test-progs/hello/bin/x86/linux/hello
Global frequency set at 1000000000000 ticks per second
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
0: system.remote_gdb: listening for remote gdb on port 7000
**** REAL SIMULATION ****
info: Entering event queue # 0. Starting simulation...
Hello world!
Exiting # tick 5941500 because exiting with last active thread context
I wanted to configure full system simulation, so I went to the "Full-System Stuff" section at http://gem5.org/Download and downloaded the Full System Files. I extracted the tar into ~/gem5/x86-system.
So now there's ~/gem5/x86-system/binaries which contains x86_64-vmlinux-2.6.22.9 and ~/gem5/x86-system/disks which contains linux-x86.img
In ~/.bash_profile I added export M5_PATH="/Users/me/gem5/x86-system".
However, when I run scons build/x86/tests/fast/quick, almost all of the tests fail. A lot of them have a failure like this:
...
File "/Users/me/gem5/configs/common/SysPaths.py", line 62, in __call__
raise IOError("Can't find a path to system files.")
IOError: Can't find a path to system files.
I also tried to run build/x86/gem5.fast configs/example/fs.py but I get the following error:
...
File "/Users/me/gem5/configs/common/SysPaths.py", line 71, in __call__
raise IOError("Can't find file '%s' on path." % filename)
IOError: Can't find file 'x86root.img' on path.
I'm not sure what part of configuration I'm missing. The docs and google searches aren't giving any working solutions...

Installing Tensorflow GPU/CUDA dependencies on a machine with no internet access

I have 2 machines -
dccten1a with no internet access where I need to install Tensorflow with GPU support
dccten1b with internet access so that I can download packages and transfer to dccten1a
In the final step of installing Tensorflow, when running the bazel build command to produce a whl file, I get an error which says that it can't find a file in a folder it is looking in, and also cannot download, obviously, as 1a doesn't have internet access.
bazel build --config=opt --config=cuda /home/tensorflow/Documents/tf_dependencies/tensorflow-master/tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
ERROR: error loading package '': Encountered error while reading extension file 'closure/defs.bzl': no such package '#io_bazel_rules_closure//closure': Error downloading [http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_closure/archive/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz, https://github.com/bazelbuild/rules_closure/archive/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz] to /home/xyzuser/.cache/bazel/_bazel_xyzuser/cb1e63cb5e61cab49a9fd2f5ba92d003/external/io_bazel_rules_closure/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz: All mirrors are down: [Unknown host: github.com, Unknown host: mirror.bazel.build]
I checked in the system, and there is no such directory as shown in the error message (i.e., /home/xyzuser/.cache/bazel/_bazel_xyzuser/cb1e63cb5e61cab49a9fd2f5ba92d003/external/io_bazel_rules_closure/). So, I created it, searched and found the requisite (?) file online, downloaded the file in the machine with internet, transferred it to the target machine, moved the file to the just created directory, and tried running the command again:
(tensorflow#dccten1a):
mkdir -p /home/tensorflow/.cache/bazel/_bazel_tensorflow/cb1e63cb5e61cab49a9fd2f5ba92d003/external/io_bazel_rules_closure
(tensorflow#dccten1b):
http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_closure/archive/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz
sudo scp -r /home/tensorflow/Downloads/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz tensorflow#160.88.114.17:/home/tensorflow/Documents/tf_dependencies
(tensorflow#dccten1a):
mv /home/tensorflow/Documents/tf_dependencies/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz /home/tensorflow/.cache/bazel/_bazel_tensorflow/cb1e63cb5e61cab49a9fd2f5ba92d003/external/io_bazel_rules_closure
Then I run the bazel build command again, but the same error persists.
Use --experimental_repository_cache to download the dependencies on the machine with internet access, transfer the cache to the machine without internet access, and use --experimental_repository_cache to refer to the same cache.
e.g.
1) On the machine with internet access, run
tensorflow#dccten1b $ bazel build --experimental_repository_cache=/path/to/some/folder --config=opt --config=cuda /home/tensorflow/Documents/tf_dependencies/tensorflow-master/tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0""
2) Copy the cache at /path/to/some/folder to the machine without internet access using a SD card or flash drive.
3) On the machine without internet access, run the same command again and setting the flag to the cache's location.
tensorflow#dccten1a $ bazel build --experimental_repository_cache=/path/to/some/folder --config=opt --config=cuda /home/tensorflow/Documents/tf_dependencies/tensorflow-master/tensorflow/tools/pip_package:build_pip_package --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0""

TensorFlow Serving Error

I have installed Tensorflow Serving as outlined on the install page at https://tensorflow.github.io/serving/setup. However, when I follow the build instruction on the page I get the following error:
$ bazel build tensorflow_serving/...
ERROR: /home/**PATH**/external/org_tensorflow/third_party/py/python_configure.bzl:183:20: unexpected keyword 'environ' in call to repository_rule(implementation: function, *, attrs: dict or NoneType = None, local: bool = False).
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'third_party/py/python_configure.bzl' has errors.
INFO: Elapsed time: 0.623s
I am running on Ubuntu and TensorFlow 1.0.1 build. I am using Python 2.7 and have set up a virtualenv.
I can successfully build the bazel hello example and also am able to complete the gRPC quick start found at http://www.grpc.io/docs/quickstart/python.html.
Any suggestions?
-Dave
The trouble was an old copy of bazel. To determine your version
$ bazel version
Build label: 0.4.5
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 16 12:19:38 2017 (1489666778)
Build timestamp: 1489666778
Build timestamp as int: 1489666778
In my case it required a manual removal of the old version
rm -fr ~/.bazel ~/.bazelrc
Next, I chose the install using the installer for ubuntu.
$ ./bazel-0.4.5-installer-linux-x86_64.sh
Bazel installer
---------------
Bazel is bundled with software licensed under the GPLv2 with Classpath exception.
You can find the sources next to the installer on our release page:
https://github.com/bazelbuild/bazel/releases
# Release 0.4.5 (2017-03-16)
There was still another trick to getting it to work.
$cd ..
$ bazel test tensorflow_serving/...
Python Configuration Error: 'PYTHON_BIN_PATH' environment variable is not set
This error is also related to versioning, but in this case it was an issue with serving. The solution was to revert to an earlier version and update the submodule from git (I had previously cloned the repository). From the serving directory:
$ git checkout 0.5.1
M tensorflow
M tf_models
Note: checking out '0.5.1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 51bb356... Merge pull request #325 from kirilg/0.5.1
(tensorflow) $ git submodule update
Submodule path 'tensorflow': checked out '07bb8ea2379bd459832b23951fb20ec47f3fdbd4'
Submodule path 'tf_models': checked out '2fd3dcf3f31707820126a4d9ce595e6a1547385d'
(tensorflow) $ bazel test tensorflow_serving/...
Serving now reports success:
INFO: Found 199 targets and 57 test targets...
[1,299 / 4,037] Still waiting for 200 jobs to complete:
Running (standalone):

Problems adding DKMS support to kernel module

I'm trying to add DKMS support in a kernel module i'm working on.
I have placed the kernel module source with a static lib to be linked against in the following directory:
/usr/src/dpx/1.0
With the following files:
dkms.conf
Makefile
dpxmtt.c
lib.a
dkms.conf file is like this:
MAKE="make"
CLEAN="make clean"
BUILT_MODULE_NAME=dpx
BUILT_MODULE_LOCATION=src/
DEST_MODULE_LOCATION=/kernel/drivers/input/touchscreen
PACKAGE_NAME=dpxm
PACKAGE_VERSION=1.0
REMAKE_INITRD=yes
And the makefile is like this:
EXTRA_CFLAGS+=-DLINUX_DRIVER -mhard-float
obj-m += dpx.o
dpx-objs:= dpxmtt.o ../source/lib.a
all:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
The ../source/lib.a is an hack since when the makefile is invoked by the dkms building system it was saying that it couldn't be found in directory (the build directory), but since it was being copied to the source directory, i'm referencing it relatively.
When I call
sudo dkms build -m dpx -v 1.0
The result is almost perfect:
santos#NS-PC:~$ sudo dkms build -m dpx -v 1.0
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
make KERNELRELEASE=3.0.0-14-generic....
ERROR (dkms apport): binary package for dpx: 1.0 not found
Error! Build of dpx.ko failed for: 3.0.0-14-generic (i686)
Consult the make.log in the build directory
/var/lib/dkms/dpx/1.0/build/ for more information.
nsantos#NS-PC:~$
And the content of the log file is:
DKMS make.log for dpx-1.0 for kernel 3.0.0-14-generic (i686)
Thu Jan 19 11:07:54 WET 2012
make -C /lib/modules/3.0.0-14-generic/build M=/var/lib/dkms/dpx/1.0/build modules
make[1]: Entering directory `/usr/src/linux-headers-3.0.0-14-generic'
CC [M] /var/lib/dkms/dpx/1.0/build/dpxmtt.o
LD [M] /var/lib/dkms/dpx/1.0/build/dpx.o
Building modules, stage 2.
MODPOST 1 modules
CC /var/lib/dkms/dpx/1.0/build/dpx.mod.o
LD [M] /var/lib/dkms/dpx/1.0/build/dpx.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.0.0-14-generic'
The module was built correctly but it ends with the error:
ERROR (dkms apport): binary package for dpx: 1.0 not found
Error! Build of dpx.ko failed for: 3.0.0-14-generic (i686)
And I don't know what it means. Does anybody know?
Using:
$(shell uname -r)
in the Makefile it might be also wrong! The "shell uname -r" refers to the currently running kernel, but the main reason to use the dkms it's because it offers an automated method to recompile the kernel modules that reside outside of the kernel tree for every newly installed kernel. What i mean is that the Makefile might refers to a different kernel which the dkms is building the module for.
Use:
${kernelver} instead.
I had a similar problem. I think your BUILT_MODULE_LOCATION is set incorrectly to the src directory. It should be set in your example to the current directory, or you can just omit this variable and dkms would default to the current directory.