tensorboard: cannot open subgraph - tensorflow

Recently I wrote a custom operator (and its gradient) in Python, following this post:
Tensorflow: Custom operation used in two networks simultaneously produces nan
TensorFlow runs with no errors and the predictions reach the expected accuracy. However, when I visualize the graph with TensorBoard, I cannot open the subgraph to see its structure, although its gradient subgraph can be opened and inspected. Does anyone have any idea what causes this?
Fig.1: subgraph fc1 cannot be opened but gradient/fc1 can be opened.

I can open the fc1 metanode on TensorBoard 0.1.8.
What version of TensorBoard are you using? You can find the version by running
python -c 'from tensorboard import version; print(version.VERSION)'
After that, could you try upgrading tensorboard via
pip install --upgrade tensorflow-tensorboard
and let me know if the issue persists?

Related

Google Colab incompatibility with TensorBoard

In a Google Colab notebook, once I execute the commands:
%load_ext tensorboard
%tensorboard --logdir <data_directory>
I am unable to save the model to the cloud, either automatically or manually.
Does anyone have an idea why this might be happening?
In case it helps someone else: executing this command in a separate cell at the beginning
%load_ext tensorboard
and this one in a separate cell at the end of the program
%tensorboard --logdir <folder_name>
worked for me, and it lets me save my work even though the TensorBoard extension is loaded.

pickle.load cannot open up a (Stylegan2 network) pickle model on my local machine, but can on the cloud

Stylegan2 uses network pickle files to store ML models. I trained one model with transfer learning, and I am able to open it on cloud servers. I have been generating images from this model without problems using the following setup:
Google Colab: Python 3.6.9, CUDA 10.1, tensorflow-gpu 1.15, CuDNN 7.6.5
However, I cannot open the network pickle file on my local machine, even though I have tried to replicate the cloud setup as closely as I can. (I have the right GPU hardware, drivers, etc.)
Local (Windows 10) Python 3.6.9, CUDA 10.1, tensorflow-gpu 1.15, CuDNN 7.6.5
Loading requires the 'dnnlib' library to be in the PYTHONPATH and a tf.Session() to be initialized.
I get an assertion error when loading the pickle.
**Assertion error**: `assert state["version"] in [2,3]`
I find this error very odd because the network pickle works on the cloud, so it was saved properly. Additionally, my local setup can open other network pickles (i.e. ones downloaded from the internet through GET requests), which makes me think that I have set up my PYTHONPATH and initialized a tf.Session correctly. These are the prerequisites listed in the Stylegan repo:
"You can import the networks in your own Python code using pickle.load(). For this to work, you need to include the dnnlib source directory in PYTHONPATH and create a default TensorFlow session by calling dnnlib.tflib.init_tf()"
I'm not sure why I cannot open up this pickle in one environment, but can in another. Does anyone have any suggestions as to where I might start looking?
Actually, I figured it out by printing the version that was triggering the assertion. The version printed was '4'. I realized that this matched the pickle (HIGHEST_PROTOCOL) and that what I needed was the newest pull of the Stylegan2 repository, which includes pickle format_version 4 in its allowed versions.
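For anyone debugging the same thing, here is a minimal sketch of the "print the offending version" step. It does not use dnnlib or a real network pickle: check_version is a hypothetical helper and the state dict is a stand-in, assumed only to carry a "version" key like the one in the assertion above.

```python
import pickle

ALLOWED_VERSIONS = (2, 3)  # the values in the assertion quoted in the question


def check_version(state, allowed=ALLOWED_VERSIONS):
    """Hypothetical helper: same check as the assertion, but it reports the bad value."""
    version = state["version"]
    if version not in allowed:
        raise AssertionError(
            "unsupported network pickle version %r, expected one of %r"
            % (version, allowed)
        )
    return version


# Stand-in for the state stored inside a network pickle (not a real Stylegan2 file).
state = pickle.loads(pickle.dumps({"version": 4}))

try:
    check_version(state)
except AssertionError as err:
    # The message now names the version, which points at a code/file mismatch.
    print(err)
```

A bare assert only says that the check failed. Putting the value in the message shows right away whether the file is newer than the code that reads it.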

How to open TensorBoard from datalab when proxy port number indicated

I'm using a Python notebook in Google Cloud Datalab to retrain a neural network.
From the notebook I call retrain.py with
!python -m retrain --bottleneck_dir=../tf_files/bottlenecks --how_many_training_steps=500 --model_dir=../tf_files/models/ --summaries_dir=../tf_files/training_summaries/'mobilenet_1.0_224' --output_graph=../tf_files/retrained_graph.pb --output_labels=../tf_files/retrained_labels.txt --architecture='mobilenet_1.0_224' --image_dir=../tf_files/flower_photos
Within retrain.py I import TensorBoard with
from google.datalab.ml import TensorBoard as tb
followed by the main function, which does a bunch of things for the training process and then runs:
init = tf.global_variables_initializer()
sess.run(init)
tb.start('./tmp/retrain_logs')
When I execute retrain.py, the neural network is trained and TensorBoard is started (as stated in the output of my notebook, copied below):
TensorBoard 1.8.0 at http://3439c553be9b:59199 (Press CTRL+C to quit)
{'text/html':TensorBoard was started successfully with pid 7707. Click here to access it.}
I tried to access TensorBoard in two ways:
Clicking on the link provided (http://3439c553be9b:59199). A page opens in my web browser, but it is empty.
Using gCloud Shell to connect with 'datalab connect --port=59199 .' This brings me to my files on gCloud, but not to TensorBoard.
Can someone tell me how to access the TensorBoard please?
Thank you,
Julia
Just clicking on the link should work.
Could you check to see if you have a firewall rule that is preventing this?
I think the first string, "TensorBoard 1.8.0 at http://3439c553be9b:59199 (Press CTRL+C to quit)", was output by the new TensorBoard version. That direct link won't work.
In the second string, "TensorBoard was started successfully with pid 7707. Click here to access it.", the word "here" should be a hyperlink. See the code at https://github.com/googledatalab/pydatalab/blob/master/google/datalab/ml/_tensorboard.py#L73. That link should work. Do you get that link?

TensorFlow 1.0.1 SavedModelBuilder

I'm currently exploring how to deploy models on Google ML Engine. At first, I developed a model using TensorFlow 1.1.0, as it was the latest version available (at the time of asking). However, it turned out that the highest supported version of TensorFlow on GCP is 1.0.1.
The problem is that with TensorFlow 1.1.0, SavedModelBuilder correctly saved the model as a SavedModel, with its variables under the variables/ directory. When I switched to TensorFlow 1.0.1, it behaved differently: the SavedModel file was created, but no files were created under variables/, so the model cannot be rebuilt from the SavedModel file alone.
Is this a known bug? Or is there something I should do to make SavedModelBuilder on TensorFlow 1.0.1 behave the way it does on TensorFlow 1.1.0?
Thank you.
EDIT, more detail:
Actually, there are no explicit tf.Variables in my model. However, there are several tf.contrib.lookup.MutableDenseHashTables, and they are exported correctly in TensorFlow 1.1.0 but not in TensorFlow 1.0.1 (where no variables are exported at all).
It looks like the ability to save and load models in TensorFlow without variables was introduced in this commit which is only available in 1.1.0.
As a workaround, you could create a dummy (unused) variable in your model.
Edit:
Based on the OP's update, it sounds like there is a MutableDenseHashTable that isn't being saved out.
You can run TensorFlow 1.1 on CloudML Engine, but it requires manually adding it as an additional package.
First, download the TensorFlow 1.1 wheel. Then specify it as an additional package to your training job, e.g.,
gcloud ml-engine jobs submit training my_job \
--module-name trainer.task \
--staging-bucket gs://my-bucket \
--package-path /my/code/path/trainer \
--packages tensorflow-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl

Tensorflow Tensorboard on Windows shows a blank page

I'm using TensorFlow on Windows, but when I try to launch TensorBoard by opening http://localhost:6006, the browser shows a blank page.
I have added the line
writer = tf.train.SummaryWriter('mypath/my_graph', sess.graph)
to my TensorFlow model and launched TensorBoard with
tensorboard --logdir="mypath/my_graph"
Here is the console output:
Following mrry's suggestion I have updated to 0.12.0rc1, and now the TensorBoard page is shown. Unfortunately I still cannot see any graph, and the left panel for uploading a graph file manually, which I can see in some screenshots of the official guide, is also missing.
I also tried to use
writer = tf.summary.FileWriter('mypath/my_graph', sess.graph)
following the deprecation hint.
EDIT
I have found the problem. If I launch tensorboard --logdir="mypath/my_graph", TensorBoard is unable to load the path and always looks for the graph files in the default user path C:\Users\andrew\mygraph\ if I run the console as a user, or in C:\Windows\System32 if I run the console as administrator. This is a bug and should be fixed.
The 0.12.0rc0 (Release Candidate 0) release of TensorFlow on Windows contains a broken version of TensorBoard. We recently made a new release (0.12.0rc1, Release Candidate 1) that contains a fix for TensorBoard on Windows. You can upgrade by following the instructions for installing the latest release on Windows, or simply typing pip install --upgrade tensorflow at the command prompt.
On Ubuntu we can use:
tensorboard --logdir=/home/user/graph/
On Windows we have to change the command prompt to the directory in which the graph file is placed and then use:
tensorboard --logdir=\home\user\graph\
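The two folders reported in the question's edit are the usual starting directories of a user and an administrator command prompt. That is consistent with a relative --logdir being resolved against the current working directory, although this is an assumption and not something confirmed in the thread. The sketch below is one way to check where a logdir actually points before launching TensorBoard. resolve_logdir is a hypothetical helper and not part of TensorBoard; it relies only on the standard library and on event files being named events.out.tfevents.*.

```python
import os


def resolve_logdir(logdir):
    """Return the absolute path a logdir resolves to and the event files found there."""
    # Relative paths are resolved against os.getcwd(), i.e. wherever the console started.
    absolute = os.path.abspath(logdir)
    event_files = []
    if os.path.isdir(absolute):
        event_files = sorted(
            name for name in os.listdir(absolute)
            if name.startswith("events.out.tfevents.")
        )
    return absolute, event_files


if __name__ == "__main__":
    path, events = resolve_logdir("mypath/my_graph")
    print("logdir resolves to:", path)
    print("event files found:", events)
```

If the list of event files is empty, TensorBoard has nothing to display from that folder. Passing the printed absolute path to --logdir removes the dependence on the directory the console was started in.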