TensorFlow CPU build: dependency download issue - tensorflow

I am trying to build the TensorFlow CPU version on CentOS 6.5, but I am stuck:
bazel build -c opt //tensorflow/cc:tutorials_example_trainer
........
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Downloading from http://www.ijg.org/files/jpegsrc.v9a.tar.gz: 0B
It tries to download the jpeg/eigen/png tarballs but cannot, because my machine has no direct internet connectivity.
I can download all of these dependencies myself and put them somewhere inside the TensorFlow source tree so that the build procedure detects them automatically.
Could you please suggest the path to that directory (relative to the TensorFlow source root)? Or is there a file that needs modification?
I tried placing them under $TENSOR_SRC_ROOT/tensorflow/contrib/cmake/external, but that did not help.
Eagerly awaiting your replies,

Thanks to the Bazel development team: setting HTTP_PROXY and HTTPS_PROXY in my environment resolved the issue.
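For anyone hitting the same wall, this is the shape of the setup that resolves it; the proxy host and port below are placeholders, not values from the original post:

export HTTP_PROXY=http://proxy.example.com:3128    # hypothetical proxy host/port
export HTTPS_PROXY=http://proxy.example.com:3128
bazel build -c opt //tensorflow/cc:tutorials_example_trainer

Bazel's downloader should honor these variables, so the jpeg/eigen/png fetches go through the proxy instead of timing out.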

Related

Unable to use Intel oneAPI DPCT for migration of my application: Error Code -5

I'm attempting to follow the instructions from this site: https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-dpcpp-compatibility-tool/top.html. I receive an error when I use the dpct command, stating that the path for CUDA header files is incorrect. To add the CUDA path I must first install the CUDA toolkit, which I cannot do without sudo access, and I don't have sudo privileges on that server. Can someone please help me here?
dpct exited with code: -5 (Error: Path for CUDA header files is invalid or not available. Use --cuda-include-path to specify the correct path to the header files)
You can install the CUDA toolkit in your home directory; per the instructions on NVIDIA's official webpage, a home-directory installation doesn't require sudo.
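A rough sketch of such an install using NVIDIA's runfile installer; the runfile name, version, and target path here are assumptions, while --silent, --toolkit, and --toolkitpath are documented runfile options:

sh cuda_11.4.0_470.42.01_linux.run --silent --toolkit --toolkitpath=$HOME/cuda-11.4   # no sudo needed for a user-writable path
export PATH=$HOME/cuda-11.4/bin:$PATH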
Once you have installed the toolkit, use the command below to migrate a single source file:
dpct --cuda-include-path=/path/to/cuda/include sample.cu
Thanks,
Santosh

nv-nsight-cu-cli caused Tensorflow to fail

I've downloaded the newest Nsight Compute profiling tool and I want to use it to benchmark TensorFlow applications. The code I'm using is here. It runs perfectly fine when I execute it directly, and benchmarking it with nvprof ./mnist.py causes no problems at all. However, when I try to run it with the command sudo ./nv-nsight-cu-cli [path to the file], I get the following error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
I suspect that nv-nsight-cu-cli somehow doesn't see the environment variables at all. Is there any fix or workaround?
You need to search for differences in both environments:
env variables
LD_LIBRARY_PATH
/etc/ld.so.conf
/etc/ld.so.conf.d/*
cuBLAS
Is installation complete/not broken?
Is it installed at the same location on both machines?
Versions
...
You can start with locate libcublas.so on both machines to see if there's a difference. Alternatively, you can strace -f -e open the program to check where it tries to load libcublas.so from.
Your error has (for now) nothing to do with GPUs: libcublas.so.9.0 simply cannot be found. Find it, find out why TensorFlow cannot find it, and your problem will be solved.
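As a concrete starting point, here is a hedged sketch of that comparison. Note that sudo typically strips LD_* variables even with -E, which is why the variable is passed explicitly on the last line; [path to the file] is the same placeholder as in the question:

env | sort > user.env          # environment of your normal shell
sudo env | sort > root.env     # environment sudo actually passes on
diff user.env root.env         # spot a missing LD_LIBRARY_PATH, PATH change, etc.
sudo env LD_LIBRARY_PATH="$LD_LIBRARY_PATH" ./nv-nsight-cu-cli [path to the file]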
It appears that GP100 is not supported by the tool at this moment.
The answer is found here:
Nsight Compute only supports Pascal (other than GP100) and later GPUs.

Bazel build behind proxy

I would like to follow the TensorFlow example to build generate_streaming_test_wav in order to generate test WAV files. My Bazel version is 0.16.1.
The problem is that when I use the command bazel run tensorflow/examples/speech_commands:generate_streaming_test_wav, the following error message shows up:
xxx#xxx:~/kws/tensorflow-0911$ bazel run tensorflow/examples/speech_commands:generate_streaming_test_wav
Starting local Bazel server and connecting to it...
ERROR: error loading package '': Encountered error while reading extension file 'closure/defs.bzl': no such package '@io_bazel_rules_closure//closure': Error downloading [https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz, https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz] to /home/janet/.cache/bazel/_bazel_janet/2d14dc1ff5782da202e00efcc3cd86bc/external/io_bazel_rules_closure/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz: All mirrors are down: []
INFO: Elapsed time: 57.573s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
However, I can use wget to download those two packages:
https://mirror.bazel.build/github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz
https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz
I think my network should be fine, so I have no idea why Bazel can't download those files.
Any ideas or suggestions would be much appreciated!
If you know your proxy server, you should be able to set:
export HTTPS_PROXY=http://me:mypassword@myproxyserver.domain.com:myport
export HTTP_PROXY=http://me:mypassword@myproxyserver.domain.com:myport
and run the bazel build again.
If you don't know the proxy server used by wget, check /etc/wgetrc or ~/.wgetrc.
I've seen conflicting statements about whether HTTPS_PROXY and HTTP_PROXY should be uppercase or lowercase, so you might try setting both. (Some have used unset to remove the lowercase settings. See: https://github.com/bazelbuild/bazel/issues/587#issuecomment-412531604)
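For example, a belt-and-braces setup covering both spellings (credentials, host, and port are placeholders, as above):

export HTTP_PROXY=http://me:mypassword@myproxyserver.domain.com:myport
export HTTPS_PROXY=$HTTP_PROXY
export http_proxy=$HTTP_PROXY
export https_proxy=$HTTP_PROXY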
Create a folder, say dist. For whatever URL Bazel is not able to download, run wget inside that folder (wget normally works with most proxies). Then run
bazel build ...... --distdir=dist
and Bazel will take the packages from dist and the compilation will start. A sketch of the whole flow follows below.
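A hedged sketch of that flow for this particular failure; the archive URL is taken from the error message above, and --distdir only exists in newer Bazel releases (check bazel help build first):

mkdir dist
wget -P dist https://github.com/bazelbuild/rules_closure/archive/dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz
bazel run tensorflow/examples/speech_commands:generate_streaming_test_wav --distdir=dist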
This might be a bug in Bazel's repository rules. If you'd be so kind as to file a bug, that'd be great!
As a workaround, extract the downloaded archive somewhere and replace the io_bazel_rules_closure rule in the WORKSPACE file with a local_repository rule pointing to the directory where you extracted the archive.
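A minimal sketch of that workaround, assuming the archive was extracted under ~/deps; the path and the extracted directory name are assumptions (GitHub commit tarballs usually extract to repo-<commit>):

mkdir -p ~/deps
tar -xzf dbb96841cc0a5fb2664c37822803b06dab20c7d1.tar.gz -C ~/deps
# Then, in the tensorflow WORKSPACE file, replace the io_bazel_rules_closure
# archive rule with a local_repository entry along these lines:
#   local_repository(
#       name = "io_bazel_rules_closure",
#       path = "/home/janet/deps/rules_closure-dbb96841cc0a5fb2664c37822803b06dab20c7d1",
#   )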

troubles caused by tensorflow image's LD_LIBRARY_PATH

I've installed DC/OS v1.8.4. The destination node has GPU resources and the NVIDIA driver has also been installed. I tried to deploy TensorFlow in a Mesos container, but it failed; there is only one error message in Mesos's stderr:
mesos-containerizer: error while loading shared libraries: libmesos-1.0.1.so: cannot open shared object file: No such file or directory
But I can deploy other services successfully, such as nginx and WordPress (also in Mesos containers).
The problem may be caused by the TensorFlow image: its parent image, CUDA, resets LD_LIBRARY_PATH:
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
In OpenDCOS, before the mesos-agent starts up, it sets the executor's environment variable LD_LIBRARY_PATH to "/opt/mesosphere/lib" so that the executor can locate the necessary .so files; but in the case above, LD_LIBRARY_PATH is reset by the TensorFlow image, so the executor fails to start!
Does anyone know how OpenDCOS handles this problem? Should these public CUDA images be modified?
GPUs are only officially supported in DC/OS 1.9+
For (unsupported) instructions on getting GPUs to work in 1.8, please see my answer to this question on the DC/OS mailing list:
https://groups.google.com/a/dcos.io/d/msg/users/HEgcUfRRqzk/inIBmapMCQAJ
Additionally, there is also a known issue with setting LD_LIBRARY_PATH in your container image on pre-1.9 clusters (though it usually manifests as a missing libssl.so library).
In your case, the CUDA container is setting LD_LIBRARY_PATH, which overrides the LD_LIBRARY_PATH setting that DC/OS relies on to find its library files. This is obviously a bug in DC/OS and has since been fixed in 1.9. The best (unsupported) workaround is to run
sudo ldconfig /opt/mesosphere/lib
on all of your nodes to put /opt/mesosphere/lib into the default library path. You will have to redo this on every reboot; alternatively, add /opt/mesosphere/lib to a file under /etc/ld.so.conf.d/ to make it durable (maybe /etc/ld.so.conf.d/dcos.conf?), as sketched below.
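A hedged sketch of that durable variant; the file name dcos.conf is just the suggestion above, not a file DC/OS ships:

echo '/opt/mesosphere/lib' | sudo tee /etc/ld.so.conf.d/dcos.conf
sudo ldconfig    # rebuild the linker cache so the new path takes effect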
The JIRA ticket tracking the underlying issue can be found here:
https://issues.apache.org/jira/browse/MESOS-7027

poclbm not reporting hashes to deepbit or slush

I run poclbm on my system, but for some reason neither deepbit nor slush "sees" the work being performed. My system reports about 200 megahashes per second being done. I tried mining with my CPU using the same settings, and then both deepbit and slush recognized that work was being performed.
These are the errors I am getting out of the respective mining hardware (every minute or so):
poclbm error: pit.deepbit.net:8332 22/02/2013 21:50:59, Verification failed, check hardware! (0:0:Cypress, d47b7ba0)
cgminer error: [2013-02-22 22:18:51] GPU0: invalid nonce - HW error
I am using Ubuntu 12.10 (Quantal Quetzal) with the 12.10 version of poclbm and an ATI 5800 series video card. The video drivers are installed and work as far as I can tell. When I run aticonfig --odgc --adapter=all, the GPU does seem to be utilized by poclbm (around 70% utilization or so).
I found the solution through an IRC channel (Freenode, channel #cgminer). Basically, at least on the version of Ubuntu that I have (12.10), the 2.8 version of the SDK does NOT work properly with cgminer or poclbm. I was instructed to download the 2.4 version of the SDK instead:
http://developer.amd.com/Downloads/AMD-APP-SDK-v2.4-lnx32.tgz
http://developer.amd.com/Downloads/AMD-APP-SDK-v2.4-lnx64.tgz
Some distributions require the 2.7 version, so I'll put those links here too:
http://developer.amd.com/Downloads/AMD-APP-SDK-v2.7-lnx32.tgz
http://developer.amd.com/Downloads/AMD-APP-SDK-v2.7-lnx64.tgz
I compiled it. There is no "make install" target in this Makefile, apparently, so you have to copy the files to your lib directory manually, as recapped below:
for 32-bit: $ cp -pv lib/x86/* /usr/lib/
for 64-bit: $ cp -pv lib/x86_64/* /usr/lib/
Also copy the include files: $ rsync -avl include/CL/ /usr/include/CL/
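A hedged recap of the whole 64-bit sequence; the extracted directory name is an assumption based on the tarball name, and copying into /usr/lib and /usr/include needs root:

tar -xzf AMD-APP-SDK-v2.4-lnx64.tgz
cd AMD-APP-SDK-v2.4-lnx64
make                                  # no 'make install' target, so copy by hand
sudo cp -pv lib/x86_64/* /usr/lib/
sudo rsync -avl include/CL/ /usr/include/CL/
sudo ldconfig                         # refresh the linker cache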
With the libraries installed in the appropriate directories, I recompiled cgminer and then it worked. I also tried it with poclbm and it worked with that too.
Hm, I experienced the same error with poclbm and cgminer. Then I found https://bitcointalk.org/index.php?topic=139406.msg1502120#msg1502120 ... I tried phoenix and all is OK now. Hope it helps. Sorry for my bad English.