tensorboard: profile option not showing - tensorflow

I'm trying to set up TensorBoard to show profile information so I can debug a slow model.
I have followed the steps at https://www.tensorflow.org/guide/profiler.
Running pip freeze | grep tensorboard shows:
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.8.0
tensorboard-plugin-wit==1.8.1
Running /sbin/ldconfig -N -v $(sed 's/:/ /g' <<< $LD_LIBRARY_PATH) | grep libcupti shows:
/sbin/ldconfig.real: Path `/usr/local/cuda-11.6/targets/x86_64-linux/lib' given more than once
/sbin/ldconfig.real: Path `/usr/local/cuda-11.4/targets/x86_64-linux/lib' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib' given more than once
libcupti.so.11.6 -> libcupti.so.2022.1.1
libcupti.so.11.4 -> libcupti.so.2021.2.0
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
libcupti.so.10.1 -> libcupti.so.10.1.208
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.31.so is the dynamic linker, ignoring
The instructions page is not clear about what output should be shown, but I assumed the references to libcupti.so indicate success.
TensorBoard was started with tensorboard --logdir logs/, which outputs:
2022-04-08 17:42:03.872466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-04-08 17:42:03.947825: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-04-08 17:42:03.948334: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:922] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
NOTE: Using experimental fast data loading logic. To disable, pass
"--load_fast=false" and report issues on GitHub. More details:
https://github.com/tensorflow/tensorboard/issues/4784
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
However, the TensorBoard GUI does not display a Profile tab.
What am I missing?

I was missing the code to add the TensorBoard callback:
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.path.join(self.run_folder.value, 'logs'),
    histogram_freq=1,
    profile_batch='100,120'  # profile batches 100 through 120
)
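For completeness, here is a minimal sketch of how that callback gets passed to training. The model, data, and log directory are placeholders (self.run_folder.value comes from my own code), so adjust them to your setup:

import os
import numpy as np
import tensorflow as tf

log_dir = os.path.join('runs', 'logs')  # stand-in for self.run_folder.value
tb_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    profile_batch='100,120'  # profile batches 100 through 120
)

# Placeholder model and data, just to show where the callback goes.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer='adam', loss='mse')
x = np.random.rand(4096, 8).astype('float32')
y = np.random.rand(4096, 1).astype('float32')
model.fit(x, y, epochs=2, batch_size=32, callbacks=[tb_callback])

After a run like this the Profile tab should appear in TensorBoard, since the profiler output is written under log_dir/plugins/profile/.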

Related

extracting .7z file using google colab

I have this file with a .7z extension and I am trying to extract its contents in Google Colab.
I have tried the Linux command below:
!7z img_celeba.7z
but unfortunately it gives the error below:
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.30GHz (306F0),ASM,AES-NI)
Command Line Error:
Unsupported command:
img_celeba.7z
Then I tried to install p7zip as below:
!apt-get install p7zip-full
!p7zip -d file_name.7z
I also got this error:
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.30GHz (306F0),ASM,AES-NI)
Scanning the drive for archives:
1 file, 734003200 bytes (700 MiB)
Extracting archive: img_celeba.7z
ERROR: img_celeba.7z
img_celeba.7z
Open ERROR: Can not open the file as [7z] archive
ERRORS:
Unexpected end of archive
Can't open as archive: 1
Files: 0
Size: 0
Compressed: 0
What should I do?
Thanks in advance.
I am going to close the question with the instructions below.
First of all, install p7zip using this command:
!apt-get install p7zip-full
Then try to extract the file with the command below:
!p7zip -d file_name.7z
Make sure that you pass the parent .7z archive.
It worked for me; it seems I was trying to extract a single part of the zipped folder.
After some research it turned out that 7-Zip had split the data into more than one volume, so all you need to do is point at the path of the .7z archive itself, not at one of its parts.
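As a rough sketch, the same steps can be run from a Colab cell in Python (the archive path is a placeholder, and 7z x is used here instead of p7zip -d so the original archive is kept rather than deleted):

import subprocess

archive = 'img_celeba.7z'  # point at the parent .7z archive, not at a split part such as .7z.001

# Install the 7-Zip tools, then list the archive contents as a quick integrity check.
subprocess.run(['apt-get', 'install', '-y', 'p7zip-full'], check=True)
subprocess.run(['7z', 'l', archive], check=True)

# Extract; "Unexpected end of archive" here usually means the file is only one part of a split archive.
subprocess.run(['7z', 'x', archive], check=True)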

"IOError: Can't find a path to system files" for x86 full system on Mac

I'm trying to set up gem5 x86 full-system simulation on macOS Mojave (10.14).
First I did a git clone to get the gem5 sources, which are located at ~/gem5.
Then I ran scons build/x86/gem5.fast to build the whole thing. I had to change some of the -Werror flags to get it to compile, but it seems to work.
To test it, I ran build/x86/gem5.fast configs/example/se.py -c tests/test-progs/hello/bin/x86/linux/hello and got the following output:
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 compiled Jan 18 2020 15:28:50
gem5 started Jan 18 2020 17:48:35
gem5 executing on My-MacBook-Pro-208.local, pid 89984
command line: build/x86/gem5.fast configs/example/se.py -c tests/test-progs/hello/bin/x86/linux/hello
Global frequency set at 1000000000000 ticks per second
warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
0: system.remote_gdb: listening for remote gdb on port 7000
**** REAL SIMULATION ****
info: Entering event queue # 0. Starting simulation...
Hello world!
Exiting # tick 5941500 because exiting with last active thread context
I wanted to configure full system simulation, so I went to the "Full-System Stuff" section at http://gem5.org/Download and downloaded the Full System Files. I extracted the tar into ~/gem5/x86-system.
So now there's ~/gem5/x86-system/binaries which contains x86_64-vmlinux-2.6.22.9 and ~/gem5/x86-system/disks which contains linux-x86.img
In ~/.bash_profile I added export M5_PATH="/Users/me/gem5/x86-system".
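For reference, a quick way to double-check that layout from Python (the file names below are just the ones listed above; nothing here is specific to gem5 internals):

import os

m5_path = os.environ.get('M5_PATH', '')
print('M5_PATH =', repr(m5_path))

# The two files described above, relative to M5_PATH.
for rel in ('binaries/x86_64-vmlinux-2.6.22.9', 'disks/linux-x86.img'):
    full = os.path.join(m5_path, rel)
    print(full, '->', 'found' if os.path.exists(full) else 'MISSING')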
However, when I run scons build/x86/tests/fast/quick, almost all of the tests fail. A lot of them have a failure like this:
...
File "/Users/me/gem5/configs/common/SysPaths.py", line 62, in __call__
raise IOError("Can't find a path to system files.")
IOError: Can't find a path to system files.
I also tried to run build/x86/gem5.fast configs/example/fs.py but I get the following error:
...
File "/Users/me/gem5/configs/common/SysPaths.py", line 71, in __call__
raise IOError("Can't find file '%s' on path." % filename)
IOError: Can't find file 'x86root.img' on path.
I'm not sure what part of the configuration I'm missing. The docs and Google searches aren't turning up any working solutions.

syntaxnet ./configure error

I am trying to use SyntaxNet and have finished most of the process. I upgraded the Bazel version to 0.43 in case of errors (Ubuntu 16.04, Anaconda Python 2.7).
However, I am having trouble with the ./configure part. I am following the official instructions on the TensorFlow GitHub:
git clone --recursive https://github.com/tensorflow/models.git
cd models/syntaxnet/tensorflow
./configure
cd ..
bazel test syntaxnet/... util/utf8/...
# On Mac, run the following:
bazel test --linkopt=-headerpad_max_install_names \
syntaxnet/... util/utf8/...
The following logs should help you understand what's going on with my machine. Thanks for the advice.
Please specify the location of python. [Default is /home/ryan/anaconda2/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] n
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
/home/ryan
/home/ryan/pynaoqi-python2.7
/home/ryan/anaconda2/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/home/ryan]
/home/ryan/anaconda2/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.0
Please specify the location where cuDNN 5.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Invalid path to cuDNN toolkit. Neither of the following two files can be found:
/usr/local/cuda-8.0/lib64/libcudnn.so.5.0
/usr/local/cuda-8.0/libcudnn.so.5.0
.5.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.5
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=120
INFO: Reading options for 'clean' from /home/ryan/git_ryan/models/syntaxnet/tensorflow/tools/bazel.rc:
Inherited 'build' options: --force_python=py2 --host_force_python=py2 --python2_path=/home/ryan/anaconda2/bin/python --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define PYTHON_BIN_PATH=/home/ryan/anaconda2/bin/python --spawn_strategy=standalone --genrule_strategy=standalone
INFO: Reading options for 'clean' from /etc/bazel.bazelrc:
Inherited 'build' options: --action_env=PATH --action_env=LD_LIBRARY_PATH --action_env=TMPDIR --test_env=PATH --test_env=LD_LIBRARY_PATH
Unrecognized option: --action_env=PATH
ERROR: /home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl:568:26: Traceback (most recent call last):
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 562
rule(attrs = {"srcs": attr.label_list..."), <3 more arguments>)}, <2 more arguments>)
File "/home/ryan/git_ryan/models/syntaxnet/tensorflow/tensorflow/tensorflow.bzl", line 568, in rule
attr.label_list(cfg = "data", allow_files = True)
expected ConfigurationTransition or NoneType for 'cfg' while calling label_list but got string instead: data.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package '': Extension file 'tensorflow/tensorflow.bzl' has errors.
Configuration finished
I think your Bazel version is too high for SyntaxNet. You can try bazel 0.3.1.

TF installation failure on ubuntu 14.04

I'm getting this weird configuration error that I cannot decipher, and it doesn't seem anyone else has encountered it. CUDA is correctly configured. What are this 'repository_rule' function and the 'external' package?
(tensorflow)weiwe@weiwe:~/tensorflow$ ./configure
Please specify the location of python. [Default is /home/weiwe/tensorflow/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Traceback (most recent call last):
File "<stdin>", line 18, in <module>
TypeError: can only concatenate list (not "str") to list
Found possible Python library paths:
Please input the desired Python library path to use. Default is []
/home/weiwe/tensorflow/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.4
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.........
ERROR: /home/weiwe/tensorflow/third_party/gpus/cuda_configure.bzl:415:18: function 'repository_rule' does not exist.
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
Configuration finished
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda \
> //tensorflow/tools/pip_package:build_pip_package
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.380s
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.109s
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel clean
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
(tensorflow)weiwe@weiwe:~/tensorflow$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: com.google.devtools.build.lib.packages.BuildFileContainsErrorsException: error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
INFO: Elapsed time: 0.242s
There was a recent patch to the CUDA build layer. Once you get the error
Error loading package 'external': Extension file '@local_config_cuda//cuda:build_defs.bzl' may not be loaded from a WORKSPACE file since the extension file is located in an external repository.
you have to re-run ./configure to re-create that target.
This can become tiresome; a way to skip through the configuration process is to provide the variable settings on the command line:
PYTHON_BIN_PATH=$HOME/anaconda2/bin/python CUDA_TOOLKIT_PATH="/usr/local/cuda" CUDNN_INSTALL_PATH="/usr/local/cuda" TF_NEED_CUDA=1 TF_CUDA_COMPUTE_CAPABILITIES="6.1" TF_CUDNN_VERSION="5" TF_CUDA_VERSION="8.0" TF_CUDA_VERSION_TOOLKIT=8.0 ./configure
Hope it helps.

Bazel build fails with "Executing genrule @six_archive//:copy_six failed" error while building syntaxnet

I'm trying to follow the instructions at syntaxnet's github page to build syntaxnet parser models.
My system is Debian Wheezy; it shouldn't be very different from Ubuntu 14.04 LTS or 15.05. I have compiled Bazel 0.2.2 (as opposed to 0.2.2b) from source and it appears to work correctly.
Whenever I launch the bazel test syntaxnet/... util/utf8/... command, no tests are executed (all skipped) with some quite cryptic error messages. Here's an example:
root@host:~/tensorflow_syntaxnet/models/syntaxnet# ../../bazel/output/bazel test syntaxnet/... util/utf8/...
Extracting Bazel installation...
.............
INFO: Found 65 targets and 12 test targets...
ERROR: /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/external/six_archive/BUILD:1:1: Executing genrule @six_archive//:copy_six failed: namespace-sandbox failed: error executing command /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox ... (remaining 5 argument(s) skipped).
unshare failed with EINVAL even after 101 tries, giving up.
INFO: Elapsed time: 95.469s, Critical Path: 22.46s
//syntaxnet:arc_standard_transitions_test NO STATUS
//syntaxnet:beam_reader_ops_test NO STATUS
//syntaxnet:graph_builder_test NO STATUS
//syntaxnet:lexicon_builder_test NO STATUS
//syntaxnet:parser_features_test NO STATUS
//syntaxnet:parser_trainer_test NO STATUS
//syntaxnet:reader_ops_test NO STATUS
//syntaxnet:sentence_features_test NO STATUS
//syntaxnet:shared_store_test NO STATUS
//syntaxnet:tagger_transitions_test NO STATUS
//syntaxnet:text_formats_test NO STATUS
//util/utf8:unicodetext_unittest NO STATUS
Executed 0 out of 12 tests: 12 were skipped.
I'm using Oracle Java 8 JDK as recommended, and my compiler is:
~/tensorflow_syntaxnet/models/syntaxnet# gcc --version
gcc (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I tried looking into the namespace-sandbox binary that's mentioned in the error message, but before I dive deep into this, I thought I'd ask here.
~/tensorflow_syntaxnet/models/syntaxnet# ls -l /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox
lrwxrwxrwx 1 root root 108 May 13 14:52 /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox -> /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
~/tensorflow_syntaxnet/models/syntaxnet# readlink /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox
/root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
Command seems to work fine though:
~/tensorflow_syntaxnet/models/syntaxnet# file $(readlink /root/.cache/bazel/_bazel_root/74c6bab7a21f28ad02405b720243d086/syntaxnet/_bin/namespace-sandbox)
/root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[md5/uuid]=0xecfd97b6a6b9a193b045be13654bd55b, not stripped
~/tensorflow_syntaxnet/models/syntaxnet# /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
No command specified.
Usage: /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox [-S sandbox-root] -- command arg1
provided: /root/.cache/bazel/_bazel_root/install/ca381eaad1c931167a6355cb8a2b98cf/_embedded_binaries/namespace-sandbox
Mandatory arguments:
-S <sandbox-root> directory which will become the root of the sandbox
-- command to run inside sandbox, followed by arguments
Optional arguments:
-W <working-dir> working directory
-T <timeout> timeout after which the child process will be terminated with SIGTERM
-t <timeout> in case timeout occurs, how long to wait before killing the child with SIGKILL
-d <dir> create an empty directory in the sandbox
-M/-m <source/target> system directory to mount inside the sandbox
Multiple directories can be specified and each of them will be mounted readonly.
The -M option specifies which directory to mount, the -m option specifies where to
mount it in the sandbox.
-n if set, a new network namespace will be created
-r if set, make the uid/gid be root, otherwise use nobody
-D if set, debug info will be printed
-l <file> redirect stdout to a file
-L <file> redirect stderr to a file
@FILE read newline-separated arguments from FILE
Any idea?
UPDATE: I have done exactly the same steps on Ubuntu 14.04 LTS (my small workstation, as opposed to the production server running Debian) and everything works well there, with all tests passing. I wonder what the difference is.
Apparently some permission errors happen when setting up the sandbox. A quick workaround is to deactivate the sandbox by using --genrule_strategy=standalone --spawn_strategy=standalone (note that the second one is already specified in the TensorFlow rc file).
You can set those flags in your ~/.bazelrc:
echo "build --genrule_strategy=standalone --spawn_strategy=standalone" >>~/.bazelrc