Deep Reinforcement Learning Hands on, chapter 7. Can't get tensorflow to work - tensorflow

Doing a course in Machine Learning and can't get Tensorboard to work. I have saved runs from running a DQN and I write:
tensorboard -logdir runs
With the folliwng result:
2019-12-28 18:32:04.265065: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TensorBoard 1.7.0 at http://david-linux:6006 (Press CTRL+C to quit)
So I click the link and get:
No dashboards are active for the current data set.
Probable causes:
You haven’t written any data to your event files
TensorBoard can’t find your event files.
I also get this result after having the code running for a while:
"W1228 18:34:34.186506 Thread-2 application.py:272] path /[[_dataImageSrc]] not found, sending 404
W1228 18:34:34.205581 Thread-2 application.py:272] path /[[_imageURL]] not found, sending 404"
Running this on Linux using Anaconda Python version 3.6 because that is what the course book uses. Have no idea what the above errors means, quite new to coding in general and reinforment learning in particular.

It could be caused if the browser isn't updated. You could also try installing the latest version of Tensorboard:
pip uninstall tensorflow-tensorboard
pip install tensorboard
Also try using different browsers.

Can you just try going to http://localhost:6006 instead? It looks like your hostname is not one that actually resolves in DNS.

Related

Stopping and starting a deep learning google cloud VM instance causes tensorflow to stop recognizing GPU

I am using the pre-built deep learning VM instances offered by google cloud, with an Nvidia tesla K80 GPU attached. I choose to have Tensorflow 2.5 and CUDA 11.0 automatically installed. When I start the instance, everything works great - I can run:
Import tensorflow as tf
tf.config.list_physical_devices()
And my function returns the CPU, accelerated CPU, and the GPU. Similarly, if I run tf.test.is_gpu_available(), the function returns True.
However, if I log out, stop the instance, and then restart the instance, running the same exact code only sees the CPU and tf.test.is_gpu_available() results in False. I get an error that looks like the driver initialization is failing:
E tensorflow/stream_executor/cuda/cuda_driver.cc:355] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
Running nvidia-smi shows that the computer still sees the GPU, but my tensorflow can’t see it.
Does anyone know what could be causing this? I don’t want to have to reinstall everything when I’m restarting the instance.
Some people (sadly not me) are able to resolve this by setting the following at the beginning of their script/main:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
I had to reinstall CUDA drivers and from then on it worked even after restarting the instance. You can configure your system settings on NVIDIAs website and it will provide you the commands you need to follow to install cuda. It also asks you if you want to uninstall the previous cuda version (yes!).This is luckily also very fast.
I fixed the same issue with the commands below, taken from https://issuetracker.google.com/issues/191612865?pli=1
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
Option-1:
Upgrade a Notebooks instance's environment. Refer the link to upgrade.
Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Option-2:
Connect to the notebook VM via SSH and run the commands link.
After execution of the commands, the cuda version will update to 11.3 and the nvidia driver version to 465.19.01.
Restart the notebook VM.
Note: Issue has been solved in gpu images. New notebooks will be created with image version M74. About new image version is not yet updated in google-public-issue-tracker but you can find the new image version M74 in console.

Tensorflow does not generate GPU tracing information

I started a new machine learning project.
In according to this document (https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras)
TF with Tensorboard appears to support GPU profiling. So, i used the same code in my Jupyter Notebook for testing.
The sample code generates profiling resulting. However, there is no GPU tracing information in resulting file. (only CPU)
This is my main problem.
I am using two RTX 2080 TI graphic cards.
And also, they were working when running the code.
The sample code does not use MirroredStrategy. So, i could see the one of them was running.
At first, i thought Tensorboard was the problem. But,i realized soon that TF does not generate the GPU tracing information.
The image above is the resulting file (local.trace). There was no GPU data.
It is my system specification.
OS ubuntu 18.04
jupyter-client 5.3.4
jupyter-core 4.6.1
jupyter-tensorboard 0.1.10
tensorflow-gpu 2.0.0
tensorflow-estimator 2.0.1
tensorflow-metadata 0.15.1
tensorboard 2.0.2
nVidia 410.104
CUDA 10.0
anaconda 4.7.12 (with python 3.6)
It looks irrelevant, but there was a warning message like the image below.
I have tested this on other PC and got the same resulting. It could be the GPU profiling is only supporting on Google Colab. (I am still confusing) Recently, I have searched it on google to fix the problem. I could not get still the answer.
Is there someone who is using GPU profiling on your own System instead of Google Colab?
Please give me piece of advices.
I figured out what caused the problem.
It was related with CUPTI(CUDA Profiling Tools Interface)
In contrast to Jupyter Notebook, there was a warning message when the code is running on Ubunto shell.
CUPTI error: CUPTI could not be loaded or symbol could not be found.
TF could not find CUPTI libraries. This is the main reason of the problem.
After adding the path to LD_LABRARY_PATH as below link, the problem is fixed!
https://stackoverflow.com/a/58752904/5553618

Does tensorflow support Python 3.6.4 on Windows?

I'm running a Windows computer with just a CPU (no GPU). When I run pip install tensorflow -vvv in order to see what pip is doing, it lists a lot of links, but for all of them, it says "Skipping link ... it is not compatible with this Python."
Does tensorflow support Python 3.6.4 on Windows? If so, what binary URL should I use to install it?
(I previously installed with this version due to reading this, but ran into this error without the DLL load failed message, so I'm wondering if there's a better version I should use.)
Also, I'm aware that Tensorflow says they support Python 3.x, but right now it hasn't been working for me.
You have probably installed Python 32bits, you need the 64bits version

Tensorboard Interactive Debugger Not Connecting

I am running the latest version of tensorflow (1.7.0) in Windows. It added a new feature for debugging in tensorboard. However, I am not able to get it to connect. I was previously using the LocalCLIDebugWrapperSession debugger. However, I wanted to play with the new features in TensorBoardDebugWrapperSession.
When I run tensorboard using the command "tensorboard --logdir=dir --debugger_port 6064" it tells me "Creating InteractiveDebuggerPlugin at port 6064".
However, when I run my model in tensorflow (after swapping out the LocalCLIDebugWrapperSession for the TensorBoardDebugWrapperSession) it never seems to connect to the interactive debugger. I don't get an error in the output for either tensorboard or tensorflow. Nothing happens and on the tensorboard debugger page it just shows: "Debugger is waiting for Session.run() connections..."
It seems that tensorflow does not have support for grpc connection on windows. A workaround is to use windows subsystem. Note that there is no direct access to system GPU from the subsystem. So it cannot be use with tensorflow-gpu version
start_epoch: grpc:// debug URL scheme is not implemented on Windows yet.
https://github.com/tensorflow/tensorflow/issues/17933

Running Tensorboard without CUDA support

Is it possible to run Tensorboard on a machine without CUDA support?
I'm working at a computation center (via ssh) which has two major clusters:
CPU-Cluster which is a general workhorse without CUDA support (no dedicated GPU)
GPU-Cluster with dedicated GPUs e.g. for running neural networks with tensorflow-gpu.
The access to the GPU-cluster is limited to Training etc. such that I can't afford to run Tensorboard on a machine with CUDA-support. Instead, I'd like to run Tensorboard on the CPU-Cluster.
With the TF bundled Tensorboard I get import errors due to missing CUDA support.
It seems reasonable that the official Tensorboard should have a mode for running with CPU-only. Is this true?
I've also found an inofficial standalone Tensorboard version (github.com/dmlc/tensorboard), does this work without CUDA-support?
Solved my problem: just install tensorflow instead of tensorflow-gpu.
Didn't work for me for a while due to my virtual environment (conda), which didn't properly remove tensorflow-gpu.
Tensorboard is not limited by whether a machine has GPU or not.
And as far as I know, what Tensorboard do is parsing events pb files and display them on web. There is not computing, so it doesn't need GPU.