How do I track validation loss in TensorBoard? [duplicate]

I am training a model in TensorFlow. Periodically during training, I evaluate the model on a validation set. I'd like to write a summary of the training procedure so that TensorBoard displays a plot of the validation-set loss, letting me watch it go down with more training iterations (or jump back up if I start to overfit).
I already have a global iteration variable as part of my summary. I'm thinking of creating a scalar-summary validation_loss variable in the model graph that isn't connected to anything, but to which I periodically assign a value from my training loop.
Is this a good strategy? Is there a more idiomatic way to do this in TensorFlow?
(The specific project I'm working on is the TensorFlow RNN Language Model, which is a generalization of the RNN tutorial in the TensorFlow documentation.)

As I understand it, the idiomatic solution is to merge all summaries (in case loss is not your only summary) and then create a separate tf.train.SummaryWriter for your training and validation sets. Then call add_summary on the validation SummaryWriter at each (periodic) evaluation step.
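A minimal sketch of that setup, using the TF 0.x-era names from this question (tf.scalar_summary, tf.merge_all_summaries, tf.train.SummaryWriter; later versions renamed these to tf.summary.scalar, tf.summary.merge_all, and tf.summary.FileWriter). The loss tensor, train_op, validation_feed, and eval_interval are stand-ins for your own graph:

    import tensorflow as tf

    # Assumed to already exist in your graph: a scalar `loss` and a `train_op`.
    tf.scalar_summary("loss", loss)
    merged = tf.merge_all_summaries()

    # One writer per log directory; TensorBoard overlays them as separate runs.
    train_writer = tf.train.SummaryWriter("logs/train", sess.graph)
    valid_writer = tf.train.SummaryWriter("logs/validation")

    for step in range(num_steps):
        _, summary = sess.run([train_op, merged])
        train_writer.add_summary(summary, step)
        if step % eval_interval == 0:
            summary = sess.run(merged, feed_dict=validation_feed)
            valid_writer.add_summary(summary, step)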


Initializing the weights of a layer from the output of another layer in tensorflow/keras [closed]

I'm trying to implement the paper "Learning to Segment Everything", and I need to set the weights of a layer in the segmentation network using the output of a weight-transfer function.
The output of the last layer in the weight transfer, fetched using layer.output in Keras, is of type 'tensorflow.python.framework.ops.Tensor', while the weights should be initialized as a numpy array. Any idea how I can set the weights?
From what I got from the paper, the weights should be connected to the output of this transfer layer; call it X. So what you want isn't to create weights and then initialize them with this output X using tf.assign or any other method, since that will not be differentiable. What you want is to connect the output X directly so that it acts as the weights in the other graph.
The problem is that you can't do this through Keras layers or even tf.layers, because those high-level APIs don't give you that control: as soon as you create a layer in tf.layers or Keras, it creates its own weights, and you don't want that; you want to use this output X as the weights rather than creating new ones. So what you can do is re-implement whatever layer you need yourself and use X directly as the weights in that layer, which lets the gradient flow back through X.
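A sketch of that re-implementation for a dense layer, assuming X is the transfer function's output tensor with shape [in_dim, out_dim] (the function name here is hypothetical):

    import tensorflow as tf

    def dense_with_external_weights(inputs, weights, bias=None):
        # `weights` is an arbitrary tensor (e.g. X), not a freshly created
        # Variable, so gradients flow back through the weight-transfer network.
        outputs = tf.matmul(inputs, weights)
        if bias is not None:
            outputs = tf.nn.bias_add(outputs, bias)
        return outputs

    # logits = dense_with_external_weights(features, X)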
Weights are typically stored in Variables. The tf.assign operation can be used to assign values (represented as Tensors) to variables. You can see some basic examples of using tf.assign in the session tests, where it appears as state_ops.assign().
Just be aware that, like other TensorFlow operations, it does not update the value of the variable immediately (unless you are using eager execution). It returns a tensor that, when evaluated (e.g. via session.run()), will update the variable.
From your question, I suspect that you might not be 100% clear about TensorFlow's computation model. The Tensor type is a symbolic representation of a value that will be produced only when the computation is actually run (via session.run()). You can't really talk about "converting a Tensor to a numpy array", because you can't convert the "result of operation foo" to concrete floats without running the computation. tf.assign works in this symbolic space: when using it, you are saying, "whatever the value of this tensor (the output of some layer) turns out to be when I run the computation, assign it to this variable".
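A minimal example of that deferred behaviour (graph mode, TF 1.x names):

    import tensorflow as tf

    v = tf.Variable(0.0)
    assign_op = tf.assign(v, 42.0)  # builds an op; nothing is assigned yet

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(v))          # 0.0 -- the assign has not run
        print(sess.run(assign_op))  # 42.0 -- running the op updates v
        print(sess.run(v))          # 42.0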

Questions about the TensorFlow Get Started tutorial

So I was reading the TensorFlow Get Started tutorial and found it very hard to follow. A lot of explanation is missing about what each function does and why it is necessary (or not).
1. In the tf.estimator section, what are the "x_eval" and "y_eval" arrays supposed to be? The x_train and y_train arrays give the desired output (the corresponding y coordinate) for a given x coordinate. But the x_eval and y_eval values are "incorrect": for x=5, y should be -4, not -4.1. Where do those values come from? What do x_eval and y_eval mean? Are they necessary? How were those values chosen?
2. What is the difference between "input_fn" (what does "fn" even mean?) and "train_input_fn"? I see that the only difference is that one has
num_epochs=None, shuffle=True
while the other has
num_epochs=1000, shuffle=False
but I don't understand what "input_fn" or "train_input_fn" are or do, what the difference between the two is, or whether both are necessary.
3. In the
estimator.train(input_fn=input_fn, steps=1000)
piece of code, I don't understand the difference between "steps" and "num_epochs". What does each one mean? Can you have num_epochs=1000 and steps=1000 too?
4. The final question is: how do I get W and b? In the previous way of doing it (without tf.estimator), they explicitly found that W=-1 and b=1. If I were building a more complex neural network, involving biases and weights, I would want to recover the actual values of the weights and biases. That's the whole point of why I'm using TensorFlow: to find the weights! So how do I recover them in the tf.estimator example?
These are just some of the questions that bugged me while reading the "getStarted" tutorial. I personally think it leaves a lot to be desired, since it's very unclear what each thing does and at best you can guess.
I agree with you that tf.estimator is not introduced very well in this "getting started" tutorial. I also think that some machine learning background would help with understanding what happens in it.
As for the answers to your questions:
In machine learning, we usually minimize the loss of the model on the training set and then evaluate the performance of the model on the evaluation set. This is because it is easy to overfit the training set and get 100% accuracy on it; using a separate validation set makes it impossible to cheat in this way.
Here (x_train, y_train) corresponds to the training set, where the global minimum is obtained for W=-1, b=1.
The validation set (x_eval, y_eval) doesn't have to perfectly follow the distribution of the training set. Although we can get a loss of 0 on the training set, we obtain a small loss on the validation set because we don't have exactly y_eval = -x_eval + 1.
input_fn means "input function". This is to indicate that the object input_fn is a function.
In tf.estimator, you need to provide an input function if you want to train the estimator (estimator.train()) or evaluate it (estimator.evaluate()).
Usually you want different transformations for training or evaluation, so you have two functions train_input_fn and eval_input_fn (the input_fn in the tutorial is almost equivalent to train_input_fn and is just confusing).
For instance, during training we want to train for multiple epochs (i.e. multiple passes over the dataset). For evaluation, we only need one pass over the validation data to compute the metrics we need.
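For concreteness, here is a hedged sketch of the two functions built with tf.estimator.inputs.numpy_input_fn; the data values are illustrative, in the tutorial's spirit (note the deliberately-off eval target for x=5, as you observed):

    import numpy as np
    import tensorflow as tf

    x_train = np.array([1., 2., 3., 4.])
    y_train = np.array([0., -1., -2., -3.])
    x_eval = np.array([2., 5., 8., 1.])
    y_eval = np.array([-1.01, -4.1, -7., 0.])

    # Training input: repeat the data indefinitely (num_epochs=None), shuffled.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)

    # Evaluation input: a single ordered pass over the held-out data.
    eval_input_fn = tf.estimator.inputs.numpy_input_fn(
        {"x": x_eval}, y_eval, batch_size=4, num_epochs=1, shuffle=False)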
The number of epochs is the number of times we repeat the entire dataset. For instance if we train for 10 epochs, the model will see each input 10 times.
When we train a machine learning model, we usually use mini-batches of data. For instance, if we have 1,000 images, we can train on batches of 100 images, i.e. 10 batches per epoch. Training for 10 epochs therefore means training on 100 batches of data.
Once the estimator is trained, you can access the list of variables through estimator.get_variable_names() and the value of a variable through estimator.get_variable_value().
Usually we never need to do that, since we can, for instance, use the trained estimator to predict on new examples using estimator.predict().
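A sketch of inspecting the variables, assuming `estimator` is the trained LinearRegressor from the tutorial (the exact variable names depend on the model):

    # List every variable the estimator created, with its trained value.
    for name in estimator.get_variable_names():
        print(name, estimator.get_variable_value(name))

    # For the tutorial's linear model this includes entries for the weight W
    # and the bias b (names like 'linear/linear_model/x/weights' are typical).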
If you feel that the Getting Started guide is confusing, you can always submit a GitHub issue to tell the TensorFlow team and explain your point.

TensorFlow input pipeline

I have an input pipeline where samples are generated on the fly. I use Keras with a custom ImageDataGenerator and a corresponding Iterator to get samples in memory.
Under the assumption that Keras in my setup is using feed_dict (and that assumption is itself a question to me), I am thinking of speeding things up by switching to raw TensorFlow + Dataset.from_generator().
Here I see that the suggested solution for input pipelines that generate data on the fly in the most recent TensorFlow is to use Dataset.from_generator().
Questions:
Does Keras with the TensorFlow backend use the feed_dict method?
If I switch to raw TensorFlow + Dataset.from_generator(my_sample_generator), will that cut the feed_dict memory-copy overhead and buy me performance?
During the predict (evaluation) phase, apart from batch_x and batch_y, I also have an opaque index vector in my generator output. That vector corresponds to the sample ids in batch_x. Does that mean I'm stuck with the feed_dict approach for the predict phase, because I need that extra batch_z output from the iterator?
The new tf.contrib.data.Dataset.from_generator() can potentially speed up your input pipeline by overlapping the data preparation with training. However, you will tend to get the best performance by switching over to TensorFlow ops in your input pipeline wherever possible.
To answer your specific questions:
The Keras TensorFlow backend uses tf.placeholder() to represent compiled function inputs, and feed_dict to pass arguments to a function.
With the recent optimizations to tf.py_func() and feed_dict copy overhead, I suspect the amount of time spent in memcpy() will be the same. However, you can more easily use Dataset.from_generator() with Dataset.prefetch() to overlap the training on one batch with preprocessing on the next batch.
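A hedged sketch of that pattern, assuming my_sample_generator yields (image, label) numpy pairs (in later releases tf.contrib.data.Dataset graduated to tf.data.Dataset):

    import tensorflow as tf

    dataset = tf.contrib.data.Dataset.from_generator(
        my_sample_generator,
        output_types=(tf.float32, tf.int64))
    dataset = dataset.batch(32)
    dataset = dataset.prefetch(1)  # prepare the next batch while training on this one

    iterator = dataset.make_one_shot_iterator()
    batch_x, batch_y = iterator.get_next()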
It sounds like you can define a separate iterator for the prediction phase. The tf.estimator.Estimator class does something similar by instantiating different "input functions" with different signatures for training and evaluation, then building a separate graph for each role.
Alternatively, you could add a dummy output to your training iterator (for the batch_z values) and switch between training and evaluation iterators using a "feedable iterator".
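A sketch of the feedable-iterator approach, using the tf.data names from later TF 1.x releases and assuming train_dataset and eval_dataset produce the same structure:

    import tensorflow as tf

    handle = tf.placeholder(tf.string, shape=[])
    iterator = tf.data.Iterator.from_string_handle(
        handle, train_dataset.output_types, train_dataset.output_shapes)
    next_batch = iterator.get_next()

    train_iterator = train_dataset.make_one_shot_iterator()
    eval_iterator = eval_dataset.make_initializable_iterator()

    with tf.Session() as sess:
        train_handle = sess.run(train_iterator.string_handle())
        eval_handle = sess.run(eval_iterator.string_handle())
        # Choose which pipeline feeds the model on each run call.
        sess.run(next_batch, feed_dict={handle: train_handle})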

Using a TensorFlow input pipeline with a GAN

I want to build a conditional GAN with TensorFlow and use an input pipeline for loading my dataset. The problem is that in each iteration I want to use the same data batch for training both the generative and discriminative models, but because their training ops are fetched in different session runs, they will receive different batches of data. Is there any solution for that, or should I use feed_dict?
One way to use the same data is to use a tf.group on the generator and discriminator train ops so they are trained jointly, and set use_locking=True on your optimizers to prevent pathological race conditions. Note that there will still be some stochasticity, since the TensorFlow runtime won't guarantee that either the generator or the discriminator is consistently trained first.
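A minimal sketch of that grouping, assuming d_loss, g_loss, and the corresponding variable lists already exist in your graph:

    import tensorflow as tf

    d_optimizer = tf.train.AdamOptimizer(1e-4, use_locking=True)
    g_optimizer = tf.train.AdamOptimizer(1e-4, use_locking=True)

    d_train_op = d_optimizer.minimize(d_loss, var_list=d_vars)
    g_train_op = g_optimizer.minimize(g_loss, var_list=g_vars)

    # A single sess.run(joint_train_op) pulls one batch from the input
    # pipeline and applies both updates to it.
    joint_train_op = tf.group(d_train_op, g_train_op)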
This idea is already implemented in TensorFlow's TFGAN library in get_joint_train_hooks, although it uses hooks instead of grouping the training ops (the "joint" refers to the fact that the discriminator and generator are trained jointly, rather than sequentially).

When using TensorBoard, how to summarize a loss that is computed over several minibatches?

I would like to use TensorBoard to visualize the evolution of the loss over a validation sample. But the validation set is too large to compute in one minibatch. Therefore, to compute my validation loss, I have to call session.run several times over minibatches covering the validation set, and then sum the loss (in Python) of each minibatch to obtain the full validation loss.
My problem is that tf.scalar_summary seems to need to be attached to a TensorFlow node, but I would need to somehow "attach" it to the sum of a node's values over several runs of session.run.
Is there a way to do that? Maybe by directly summarizing the Python float that contains the sum of the minibatch losses? But I have not seen in the docs a way to "summarize" for TensorBoard a Python value that lives outside the computation. The example in the "How-To" section of the docs only deals with losses that can be computed in a single call to session.run.
You could add a Variable that is updated on each sess.run() call and have the summary track the value of that Variable.
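A sketch of that idea with the TF 0.x-era names used in the question; validation_loss_value is the Python float you summed over the minibatches, and writer is your SummaryWriter:

    import tensorflow as tf

    validation_loss = tf.Variable(0.0, trainable=False)
    loss_input = tf.placeholder(tf.float32, shape=[])
    update_loss = validation_loss.assign(loss_input)
    loss_summary = tf.scalar_summary("validation_loss", validation_loss)

    # After summing the minibatch losses in Python:
    sess.run(update_loss, feed_dict={loss_input: validation_loss_value})
    writer.add_summary(sess.run(loss_summary), global_step)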