Manually writing a tensorboard input - tensorflow

Is there a low-level API to write custom things into the tensorboard input directory?
For instance, this would enable writing summaries into the tensorboard directory without writing them from a tensorflow session, but from a custom executable.
As far as I can see, all the tensorboard inputs end up in a single append-only file whose structure is not declared ahead of time (i.e. how many items to expect, what their types are, etc).
Each summary proto is written sequentially to this file through this class: https://github.com/tensorflow/tensorflow/blob/49c20c5814dd80f81ced493d362d374be9ab0b3e/tensorflow/core/lib/io/record_writer.cc
Has anyone ever attempted to create tensorboard input manually?
Is the format explicitly documented, or do I have to reverse-engineer it?
thanks!

The library tensorboardX provides this functionality. It was written by a pytorch user who wanted to use tensorboard, but it doesn't depend on pytorch in any way.
You can install it with pip install tensorboardx.
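If you would rather write the records yourself from a custom executable, the framing implemented in record_writer.cc is small: each record is a little-endian uint64 payload length, a masked CRC32C of those eight length bytes, the payload (for tensorboard, a serialized Event proto), and a masked CRC32C of the payload. A dependency-free sketch (the bitwise CRC32C below is slow but illustrates the format; treat the details as my reading of the source, not official documentation):

```python
import struct

def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli), reflected, polynomial 0x82F63B78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def masked_crc(data: bytes) -> int:
    # TFRecord rotates and offsets the CRC to guard against CRCs of CRCs.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_record(f, payload: bytes) -> None:
    # Frame: length (uint64 LE), masked CRC of length, payload, masked CRC of payload.
    header = struct.pack("<Q", len(payload))
    f.write(header)
    f.write(struct.pack("<I", masked_crc(header)))
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))
```

To produce something tensorboard actually displays, the payload would have to be a serialized `Event` protobuf, which is where a library like tensorboardX saves you the trouble.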

Related

Using custom StyleGAN2-ada network in GANSpace (.pkl to .pt conversion)

I trained a network using Nvdia's StyleGAN2-ada pytorch implementation. I now have a .pkl file. I would like to use the GANSpace code on my network. However, to use GANSpace with a custom model, you need to be able to give it a checkpoint to your model that should be uploaded somewhere (they suggest Google Drive)(checkpoint required in code here). I am not entirely sure how this works or why it works like this, but either way it seems I need a .pt file of my network, not a .pkl file, which is what I currently have.
I tried following this tutorial. It seems the GANSpace code actually provides a file (models/stylegan2/convert_weight.py) that can do this conversion. However, the file convert_weight.py that was supposed to be there has been replaced by a link to a whole other repo. If I try to run the convert_weight.py file as below, it gives me the following error
python content/stylegan2-pytorch/convert_weight.py --repo="content/stylegan2-pytorch/" "content/fruits2_output/00000-fruits2-auto1/network-snapshot-025000.pkl"
ModuleNotFoundError: No module named 'dnnlib'
This makes sense because there is no such dnnlib module. If I instead change it to look for the dnnlib module somewhere that does have it (here) like this
python content/stylegan2-pytorch/convert_weight.py --repo="content/stylegan2/" "content/fruits2_output/00000-fruits2-auto1/network-snapshot-025000.pkl"
it previously gave me an error saying TensorFlow had not been installed (which in all fairness it hadn't because I am using PyTorch), much like this error reported here. I then installed TensorFlow, but then it gives me this error.
ModuleNotFoundError: No module named 'torch_utils'
again, the same as in the previous issue reported on GitHub. After installing torch_utils I get the same error as SamTransformer (ModuleNotFoundError: No module named 'torch_utils.persistence'). The response was "convert_weight.py does not supports stylegan2-ada-pytorch".
There is a lot I am not sure about, like why I need to convert a .pkl file to .pt in the first place. A lot of the stuff seems to talk about converting Tensorflow models to Pytorch ones, but mine was done in Pytorch originally, so why do I need to convert it? I just need a way to upload my own network to use in GANSpace - I don't really mind how, so any suggestions would be much appreciated.
Long story short, the conversion script provided was to convert weights from the official Tensorflow implementation of StyleGAN2 into Pytorch. As you mentioned, you already have a model in Pytorch, so it's reasonable for the conversion script to not work.
Instead of StyleGAN2 you used StyleGAN2-Ada, which isn't mentioned in the GANspace repo; most probably it didn't exist yet when the GANspace repo was created. As far as I know, StyleGAN2-Ada uses the same architecture as StyleGAN2, so as long as you manually convert your pkl file into the required pt format, you should be able to continue setup.
Looking at the source code for converting to Pytorch, GANspace requires the pt file to be a dict with keys: ['g', 'g_ema', 'd', 'latent_avg']. StyleGAN2-Ada saves a pkl containing a dict with the following keys: ['G', 'G_ema', 'D', 'augment_pipe']. You might be able to get things to work by loading the contents of your pkl file and resaving them in pt using these keys.
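A minimal sketch of that re-saving step (the key names come from the two repos as described above; `latent_avg` being taken from `G_ema.mapping.w_avg` is my assumption about where StyleGAN2-Ada keeps the average latent, so verify it against your snapshot):

```python
def remap_stylegan2_ada_keys(snapshot: dict, latent_avg=None) -> dict:
    # GANspace's converter expects ['g', 'g_ema', 'd', 'latent_avg'];
    # a StyleGAN2-Ada snapshot stores ['G', 'G_ema', 'D', 'augment_pipe'].
    return {
        "g": snapshot["G"],
        "g_ema": snapshot["G_ema"],
        "d": snapshot["D"],
        "latent_avg": latent_avg,
    }

# Usage (needs torch, plus the Ada repo's dnnlib/torch_utils importable
# so the pickle can be deserialized):
# with open("network-snapshot-025000.pkl", "rb") as f:
#     snapshot = pickle.load(f)
# state = remap_stylegan2_ada_keys(
#     snapshot, latent_avg=snapshot["G_ema"].mapping.w_avg)  # w_avg: assumption
# torch.save(state, "network-snapshot-025000.pt")
```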

Programmatically inspect Tensorflow SavedModel

Given a large number of untrusted TensorFlow 2 models, in the SavedModel format, is there any way to inspect them in an automated way (ideally with an API, as opposed to manually inspecting them in TensorBoard), in order to check for certain characteristics, such as size of the inputs and output, type and order of the operators, or similar?
I checked the tf.saved_model module, but could not find any indication on how this can be achieved.
If bash can be used instead of python do the following:
git clone git@github.com:tensorflow/tensorflow.git
cd tensorflow
bazel build tensorflow/python/tools:saved_model_cli
python tensorflow/python/tools/saved_model_cli.py show --dir /path/to/model --all
It gives you the input and output types and shapes.
You can find more information here
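If you do want an API call from Python rather than a shell tool, a sketch using the TF2 loading API (assuming each model was exported with at least one signature, typically serving_default):

```python
import tensorflow as tf

def describe_saved_model(export_dir: str) -> None:
    # Load the SavedModel and print the shape/dtype of each signature's
    # inputs and outputs, without opening TensorBoard.
    loaded = tf.saved_model.load(export_dir)
    for name, fn in loaded.signatures.items():
        # structured_input_signature is (args, kwargs); exported signatures
        # normally take keyword tensors only.
        _, kwargs = fn.structured_input_signature
        print(name)
        print("  inputs: ", {k: (v.shape, v.dtype) for k, v in kwargs.items()})
        print("  outputs:", {k: (v.shape, v.dtype)
                             for k, v in fn.structured_outputs.items()})
```

Note that for untrusted models this only inspects the graph metadata; actually calling the signature functions would execute the model.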

Tensorflow 2.0 : Variables in Saved model

Are the variables files saved by the saved_model API in protocol buffer (pb) format? If not, is there a way to load them without using the TensorFlow APIs (restore/load)?
There's a pure Python API which doesn't use TensorFlow operations if that's helpful: list variables and load a single variable. For a SavedModel you can point those at the variables/ subdirectory.
There's also TensorBundle which is the implementation in C++.
If neither of those are helpful the answer is probably "no". Theoretically it could be spun off into a separate package; if you're interested in doing that feel free to reach out.
You can use tf.keras.models.load_model to load a model from saved_model and what you get is a tf.keras.Model object.
I am not sure if this is verified, but it seems that pointing list_variables and load_variable at the variables subdirectory of a SavedModel does not work; you hit an assertion about a missing "checkpoint" file.
A workaround is to create a checkpoint file with one line pointing to the data file name:
model_checkpoint_path: "variables"
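In my reading, the missing-"checkpoint" assertion fires when you pass the directory; passing the variables/variables file prefix directly seems to avoid the workaround file entirely (a sketch; verify against your own export):

```python
import os
import tensorflow as tf

def list_saved_model_variables(export_dir: str):
    # SavedModel stores its weights in checkpoint format under variables/;
    # point at the "variables/variables" file prefix, not the directory,
    # to avoid the missing-"checkpoint" assertion.
    prefix = os.path.join(export_dir, "variables", "variables")
    return tf.train.list_variables(prefix)  # [(name, shape), ...]

# To read one tensor as a numpy array (prefix/name as above):
# value = tf.train.load_variable(prefix, name)
```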

Is it possible to make tensorflow graph summary?

I'm aware of Tensorboard and how awesome it is, but I think that simple console output with the current graph summary is better (and faster) for prototyping purposes.
I also know that I can generate a tensorboard graph after simply running a session with the last network node, as shown here.
What I'm looking for is something similar to model.summary() from Keras.
In other words: how do I iterate over the tensorflow graph and print out only the custom high-level layers, with their shapes and dtypes, in the same order these layers were generated?
It's certainly possible. If you are using the tf.keras wrapper to build your model, you can easily visualize the graph, even before model.compile() is executed.
It's a Keras built-in function called plot_model().
This method depends on the graphviz and pydot libraries.
For pydot installation: pip install pydot
For graphviz you have to follow the steps on this page, and you will probably have to restart the machine afterwards because the installer creates system environment variables.
For a tutorial on how to use this method, follow this link
To plot your model with shapes and dtypes before training you could use:
tf.keras.utils.plot_model(model, show_shapes=True, expand_nested=True, show_dtype=True)
where "model" is your built model; the output is a plotted diagram of the model's layers.
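If you specifically want console output like model.summary() rather than an image, a simple sketch for a tf.keras model is to loop over model.layers, which are stored in the order they were connected (this covers Keras layers only, not arbitrary graph ops):

```python
import tensorflow as tf

def print_layer_summary(model: tf.keras.Model) -> None:
    # One line per layer: name, output shape, dtype, in construction order.
    for layer in model.layers:
        print(f"{layer.name:<20} {str(layer.output.shape):<16} {layer.dtype}")
```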

Can Tensorflow read from HDFS on Mac?

I'm trying to coerce Tensorflow on OS/X to read from HDFS. The documentation
https://www.tensorflow.org/deploy/hadoop
does not clearly specify whether this is possible, and the code refers only to "posix" operating systems. The error I'm seeing when trying to use the HDFS is the following:
UnimplementedError (see above for traceback): File system scheme hdfs not implemented
[[Node: ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReaderV2, input_producer)]]
Here's what I've done up to this point:
brew installed Hadoop 2.7.2
separately compiled Hadoop 2.7.2 for the native libraries. Hadoop is installed on /usr/local/Cellar/hadoop/2.7.2/libexec on my system, and the native libraries (libhdfs.dylib) are in ~/Source/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.2/lib/native.
Edited the code at https://github.com/tensorflow/tensorflow/blob/v1.0.0/tensorflow/core/platform/hadoop/hadoop_file_system.cc#L113-L119 to read from libhdfs.dylib rather than libhdfs.so, recompiled, and reinstalled Tensorflow. (I have to admit this is pretty boneheaded, and I have no idea if it's all that's required to make this code work on Mac.)
Here is the code to reproduce.
test.sh:
set -x
export JAVA_HOME=$($(dirname $(which java | xargs readlink))/java_home)
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.2/libexec
. $HADOOP_HOME/libexec/hadoop-config.sh
export HADOOP_HDFS_HOME=$(echo ~/Source/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.2)
export CLASSPATH=$($HADOOP_HDFS_HOME/bin/hdfs classpath --glob)
# Virtual environment with Tensorflow and necessary dependencies
. venv/bin/activate
python ./test.py
test.py:
import tensorflow as tf
_, example_bytes = tf.TFRecordReader().read(
    tf.train.string_input_producer(
        [
            "hdfs://localhost:9000/user/foo/feature_output/part-r-00000",
            "hdfs://localhost:9000/user/foo/feature_output/part-r-00001",
            "hdfs://localhost:9000/user/foo/feature_output/part-r-00002",
            "hdfs://localhost:9000/user/foo/feature_output/part-r-00003",
        ]
    )
)
with tf.Session().as_default() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    print(len(sess.run(example_bytes)))
The code path I'm seeing in the Tensorflow source seems to indicate to me that I'd receive a different error than the one above if the issue were really mac-specific, since some kind of handler is registered for the "hdfs" scheme regardless: https://github.com/tensorflow/tensorflow/blob/v1.0.0/tensorflow/core/platform/hadoop/hadoop_file_system.cc#L474 . Has anyone else succeeded in coercing Tensorflow to work with Mac? If it isn't supported, is there an easy place to patch it?
I'm also open to suggestions as to what might be a better approach. The high-level goal is to efficiently train a model in parallel, using shared parameter servers, considering that each worker will only read a subset of the data. This is readily accomplished using the local filesystem, but it's less clear how to scale beyond that. Even if I do succeed in making the code above work, the result could suffer from problems with data locality.
This thread https://github.com/tensorflow/tensorflow/issues/2218 suggests using pyspark.RDD.toLocalIterator to iterate over the data set with a placeholder in the graph. Aside from my concern about forcing each worker to iterate through the full dataset, I don't see a way to coerce Tensorflow's builtin Estimator class to accept a custom feed function along with a specified input_fn, and a custom input_fn appears necessary in order to take advantage of models like LinearClassifier (https://www.tensorflow.org/tutorials/linear) that are capable of learning from sparse, weighted features.
Any thoughts?
Did you enable HDFS support in ./configure when building? That's the error you would get if HDFS is disabled.
I think you made the correct change to make it work. Feel free to send a pull request to look for .dylib on macOS.
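For reference, a non-interactive way to rebuild with HDFS support enabled (a sketch for a TF 1.x source tree; TF_NEED_HDFS is how ./configure records the answer to its HDFS prompt, so double-check the variable name against your checkout):

```shell
# Answer the HDFS question up front, then configure and rebuild the pip package.
export TF_NEED_HDFS=1
./configure
bazel build //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install --upgrade /tmp/tensorflow_pkg/tensorflow-*.whl
```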