tutorials_example_trainer fails in debug mode (-c dbg) - tensorflow

The build for tutorials_example_trainer works fine in release mode (-c opt), but fails in debug mode (-c dbg).
Did anyone encounter this? It seems to be a bug.
The command I run:
bazel build -c dbg --config=cuda //tensorflow/cc:tutorials_example_trainer --verbose_failures
The build fails with the following message:
/usr/include/c++/4.8/mutex(125) (col. 5): error: calling a host
function("std::mutex_base::__mutex_base [subobject]") from a
__device function("std::mutex::mutex") is not allowed
< some warnings>
1 error detected in the compilation of
"/tmp/tmpxft_00005e78_00000000-10_cwise_op_gpu_log.cu.compute_52.cpp1.ii".
ERROR: /home/uriv/git/tensorflow/tensorflow/tensorflow/core/BUILD:248:1: output 'tensorflow/core/_objs/gpu_kernels/tensorflow/core/kernels/cwise_op_gpu_log.cu.pic.o' was not created.
ERROR: /home/uriv/git/tensorflow/tensorflow/tensorflow/core/BUILD:248:1: not all outputs were created.
Thanks.

You can work around the problem by editing
tensorflow/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h
and commenting out the following 2 lines of code:
static tensorflow::mutex m_devicePropInitMutex(tensorflow::LINKER_INITIALIZED);
and
tensorflow::mutex_lock l(m_devicePropInitMutex);
I'll push a proper fix to the tensorflow repository shortly.

Related

Drake build from source stubgen failed

I'm building drake from source, specifically this branch of Russ' fork:
https://github.com/RussTedrake/drake
everything worked without issue until the last command make -j
where I get the following output:
[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/chewchiashaoyuan/Documents/Software/drake/bindings/pydrake/BUILD.bazel:780:22: GenerateMypyStubs bindings/pydrake/pydrake/__init__.pyi failed: (Exit 1): stubgen failed: error executing command bazel-out/darwin-opt/bin/bindings/pydrake/stubgen --quiet '--package=pydrake' '--output=bazel-out/darwin-opt/bin/bindings/pydrake'
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
Matplotlib created a temporary config/cache directory at /var/folders/sl/37m0k__51_3_5c5j02w201r40000gn/T/matplotlib-8mr5qkfg because the default path (/Users/chewchiashaoyuan/.matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Critical error during semantic analysis: /usr/local/lib/python3.10/site-packages/pydrake/symbolic.pyi:203: error: invalid syntax
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /Users/chewchiashaoyuan/Documents/Software/drake/BUILD.bazel:63:8 Middleman _middlemen/install-runfiles failed: (Exit 1): stubgen failed: error executing command bazel-out/darwin-opt/bin/bindings/pydrake/stubgen --quiet '--package=pydrake' '--output=bazel-out/darwin-opt/bin/bindings/pydrake'
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
INFO: Elapsed time: 9.617s, Critical Path: 9.33s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
make[2]: *** [drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [all] Error 2
My operating system is:
macOS Monterey Version 12.6
The full make -j -d output is here
I referenced https://drake.mit.edu/from_source.html, https://drake.mit.edu/bazel.html#snopt and https://github.com/RobotLocomotion/drake/issues/12175
I did the following:
git clone https://github.com/RussTedrake/drake
cd drake
git checkout kin_traj_opt2
./setup/mac/install_prereqs.sh
cd ..
mkdir drake-build
cd drake-build
cmake -DWITH_ROBOTLOCOMOTION_SNOPT=ON ../drake
make -j
Fixes attempted:
I tried deleting the whole drake-build directory and repeating the whole process from scratch, but got the same errors.
One filename from the error message stands out: /usr/local/lib/python3.10/site-packages/pydrake/symbolic.pyi. It looks like a copy of Drake was installed into a system-wide directory (for example with sudo pip install drake). That is likely interfering with the from-source build of Drake, so the system-wide copy will need to be removed.
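To confirm the diagnosis above before removing anything, you can ask the interpreter where (if anywhere) it finds pydrake. This is only a minimal sketch: the helper name find_installed_package is made up for illustration, and it assumes you run it with the same Python interpreter that the build picks up.

```python
import importlib.util


def find_installed_package(name):
    """Return where a top-level package would be imported from, or None.

    Uses only the standard library; nothing is actually imported.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages expose their directory here; plain modules only have origin.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


location = find_installed_package("pydrake")
if location is None:
    print("pydrake is not importable from this interpreter")
else:
    # A path under site-packages suggests a system-wide install.
    print("pydrake found at:", location)
```

If the printed path is under a site-packages directory rather than your build tree, that is the copy the answer is referring to.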

Compile TensorFlow v1.8.0 cuda/include/cublas_v2.h: No such file or directory

When I compile TensorFlow v1.8, I get the following error:
ERROR: /work/tensorflow/tensorflow/stream_executor/BUILD:52:1: C++ compilation of rule '//tensorflow/stream_executor:cuda_platform' failed (Exit 1)
tensorflow/stream_executor/cuda/cuda_blas.cc:16:36: fatal error: cuda/include/cublas_v2.h: No such file or directory
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 119.973s, Critical Path: 43.00s
INFO: 3322 processes, local.
FAILED: Build did NOT complete successfully
I found that /usr/local/cuda/include does not contain this file. How can I get it?
Running bazel clean --expunge and then compiling again seems to have fixed it for me.
This was while compiling TensorFlow 1.12.3 with Bazel 0.15.2.
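If the clean rebuild does not help, it may be worth checking where the header actually lives on your machine. The sketch below is just an illustration: find_header is a hypothetical helper, and the two directories in the example call are assumptions about common locations, not something the build is known to consult.

```python
import os


def find_header(header, search_dirs):
    """Return the full paths in search_dirs where `header` exists as a file."""
    hits = []
    for directory in search_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


# Assumed candidate directories; adjust them to your CUDA installation.
candidates = ["/usr/local/cuda/include", "/usr/include"]
found = find_header("cublas_v2.h", candidates)
if found:
    print("cublas_v2.h found at:", ", ".join(found))
else:
    print("cublas_v2.h not found in:", ", ".join(candidates))
```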

Error running tensorflow test_streaming_accuracy.cc

To run test_streaming_accuracy.cc, I ran the following command:
bazel run tensorflow/examples/speech_commands:test_streaming_accuracy --graph=/home/sweta/AudioRecognition/speech_commands_train/my_frozen_graph.pb --labels=/home/sweta/AudioRecognition/speech_commands_train/conv_labels.txt --wav=/home/sweta/AudioRecognition/speech_dataset/streaming_test_labels.wav --ground_truth=/home/sweta/AudioRecognition/speech_dataset/streaming_test_labels.txt --verbose
After executing this, I am getting the following error:
ERROR: Unrecognized option: --graph=/home/sweta/AudioRecognition/speech_commands_train/my_frozen_graph.pb
Does anyone have any idea how to fix this?
In order to pass arguments to the binary under bazel run, you'll need to include an additional -- before your args, or else Bazel will parse those as arguments for itself.
e.g. bazel run //my/binary:target --verbose_failures -- --arg_for_binary_target=42
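To make the convention concrete, here is a small sketch of how a launcher can split its command line at the first bare --. It only illustrates the general idea; it is not Bazel's actual argument parser, and split_at_double_dash is a name invented for this example.

```python
def split_at_double_dash(argv):
    """Split argv at the first bare "--".

    Arguments before the separator belong to the launcher; arguments
    after it are passed through untouched to the launched program.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    # No separator: the launcher treats every argument as its own.
    return list(argv), []


launcher_args, binary_args = split_at_double_dash(
    ["run", "//my/binary:target", "--verbose_failures",
     "--", "--arg_for_binary_target=42"]
)
print(launcher_args)  # ['run', '//my/binary:target', '--verbose_failures']
print(binary_args)    # ['--arg_for_binary_target=42']
```

Without the separator, an option such as --graph=... stays on the launcher's side of the split, which matches the "Unrecognized option" error in the question.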

Running seq2seq model error

I am trying to run the code in this tutorial.
When I try to run this command:
sudo bazel run -c opt tensorflow/models/rnn/translate/translate.py -- -- data_dir ../data/translate/
I get the following error:
...................
ERROR: Cannot run target //tensorflow/models/rnn/translate:translate.py: Not executable.
INFO: Elapsed time: 1.537s
ERROR: Build failed. Not running target.
Any ideas how to resolve?
It seems there are a lot of mistakes in the TensorFlow tutorial.
I was able to run it by removing the .py, and adding an extra -- before the options like:
bazel run -c opt tensorflow/models/rnn/translate/translate -- --data_dir /home/minsoo/tensorflowrnn/data
The directory part should be changed to match your system.
I ran it by going to the directory and running:
python translate.py

Error when try to compile Chromium

I tried to use the command ninja -C out/Debug chrome to compile Chromium.
However, the error message says:
ninja error loading 'build.ninja': the system cannot find the file specified
ninja Entering directory 'out/Debug'
What is the problem?
Thanks.
The out directory and its contents (including build.ninja) are created by running
python build\gyp_chromium
or
gclient runhooks
Executing either command from within /src should allow your compile to proceed.
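A quick way to tell whether the build files have been generated yet is to check for build.ninja in the output directory before calling ninja. This is a minimal sketch; has_ninja_manifest is a hypothetical helper, and it assumes you run it from the src directory.

```python
import os


def has_ninja_manifest(build_dir):
    """Return True if build_dir contains a build.ninja file."""
    return os.path.isfile(os.path.join(build_dir, "build.ninja"))


build_dir = os.path.join("out", "Debug")
if has_ninja_manifest(build_dir):
    print("ready to run: ninja -C", build_dir, "chrome")
else:
    print("no build.ninja in", build_dir, "- generate the build files first")
```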
On a Windows machine:
When I was running gn gen out/Default it also gave me an error:
ERROR at //build/config/win/visual_studio_version.gni:27:7: Script returned non-zero exit code.
exec_script("../../vs_toolchain.py", [ "get_toolchain_dir" ], "scope")
^----------
Current dir: D:/Chromium/src/out/Goma/
Command: C:/Python27/python.exe -- D:/Chromium/src/build/vs_toolchain.py get_toolchain_dir
Returned 1 and printed out:
Please follow the instructions at https://chromium.googlesource.com/chromium/src/+/master/docs/windows_build_instructions.md
I did the following steps and it worked for me.
Set this variable. Reference (not sure about its purpose yet)
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
Run the command gn gen out/Default
Run the build command again
autoninja -C out/Default chrome
It is also recommended to run gclient sync from the out/Default directory.
After the switch to "gn" you could try:
gn gen out/Debug