Access accuracy and cross-entropy information in tensorboard - tensorflow

I am using object_detection from the models/research TensorFlow repository. I managed to train a model successfully, but the accuracy and cross-entropy information is missing when I monitor the progress of my training with TensorBoard.
Do I need to calculate the accuracy and add it to TensorBoard myself, or is it already there and I am doing something wrong? If I have to add it, would trainer.py be the right place to do so?

These metrics are already written as summaries during training, so there is no need to go down this road. When you open TensorBoard via tensorboard --logdir tf_files/training_summaries & (in a terminal), where tf_files/training_summaries is the path to where your trained model is, TensorBoard will display so-called summaries: scalar variables (accuracy and cross-entropy), histograms, and images. You are also free to recalculate these if you wish, but beyond sanity-testing there is little point.
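If you do want to recompute these quantities yourself as a sanity check, the math is simple. Below is a minimal pure-Python sketch of accuracy and cross-entropy over a batch of softmax outputs; the variable names are illustrative, not taken from the object_detection code:

```python
import math

def accuracy(probs, labels):
    """Fraction of examples whose argmax prediction matches the label."""
    correct = sum(1 for p, y in zip(probs, labels)
                  if max(range(len(p)), key=p.__getitem__) == y)
    return correct / len(labels)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class (eps avoids log(0))."""
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)

batch_probs = [[0.7, 0.2, 0.1],   # predicted class 0
               [0.1, 0.8, 0.1],   # predicted class 1
               [0.3, 0.3, 0.4]]   # predicted class 2
batch_labels = [0, 1, 0]          # the last prediction is wrong

print(accuracy(batch_probs, batch_labels))       # 2 of 3 correct
print(cross_entropy(batch_probs, batch_labels))
```

These are the same formulas the training summaries are computed from, just without the graph machinery.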

Related

Tensorflow object detection API only evaluates the latest checkpoint

I've trained an SSD MobileNet v2 320x320 model for around 4k steps, which produced quite a few checkpoints that are saved in my training folder. The issue I am experiencing now is that evaluation only covers the latest checkpoint, but I'd like to evaluate all of them at once.
Ideally I would like to see the results in TensorBoard, which shows the validation accuracy (mAP) of the different checkpoints as a graph - which it does already, but just for the one checkpoint.
I have tried to run my evaluation code to generate a graph for my mAP, but it shows my mAP as a single dot.
Each checkpoint is a snapshot of your model at a particular point during training. The points you see on the TensorBoard mAP graph are the same as the dots produced when you run the evaluation once on a single checkpoint, because the checkpoints are not actually different models but your model at different times during training. So the graph for the last model is what you need.
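To see which training steps your saved checkpoints correspond to, you can parse the global step out of the checkpoint filenames. A small sketch, assuming the TF Object Detection API's default model.ckpt-<step> naming (adjust the pattern if your files are named differently):

```python
import re

def checkpoint_steps(filenames):
    """Extract the global step from checkpoint filenames like 'model.ckpt-4000.index'."""
    steps = set()
    for name in filenames:
        m = re.search(r"ckpt-(\d+)", name)
        if m:
            steps.add(int(m.group(1)))
    return sorted(steps)

# Typical contents of a training folder (each checkpoint spans several files)
files = ["model.ckpt-1000.index", "model.ckpt-1000.data-00000-of-00001",
         "model.ckpt-2000.index", "model.ckpt-4000.index", "checkpoint"]
print(checkpoint_steps(files))  # [1000, 2000, 4000]
```

Each of those steps is one point on the mAP curve, which is why evaluating a single checkpoint yields a single dot.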

More elegant way of displaying distributions and activations in TensorBoard

Keras's TensorBoard callback has write_images and histogram_freq arguments that allow the weights and activations to be saved to TensorBoard for visualization during training.
The issue is, this saves the information for every layer and makes Tensorboard very messy, especially if I have logged other images to Tensorboard. This can be seen in the images below:
A lot of this logged information is redundant. Is there any way to make the weight distribution and activation visualizations more organized? Is there any way to only visualize certain layers?
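One way to get selective logging is to skip the built-in histogram_freq behavior and write histograms yourself for only the layers you care about. A hedged sketch, where the glob-filtering helper is plain Python and the callback body assumes TF 2.x's tf.summary API (a real version would subclass tf.keras.callbacks.Callback; TF is imported lazily here so the snippet stays importable without it):

```python
import fnmatch

def layers_to_log(layer_names, patterns):
    """Keep only the layer names matching any of the glob patterns."""
    return [n for n in layer_names
            if any(fnmatch.fnmatch(n, p) for p in patterns)]

class SelectiveHistograms:
    """Sketch: log weight histograms only for selected layers.

    In practice you would subclass tf.keras.callbacks.Callback and hook
    on_epoch_end; this stripped-down version just shows the filtering idea.
    """
    def __init__(self, log_dir, patterns):
        self.log_dir, self.patterns = log_dir, patterns

    def log_epoch(self, model, epoch):
        import tensorflow as tf  # assumption: TF 2.x is installed
        wanted = set(layers_to_log([l.name for l in model.layers], self.patterns))
        writer = tf.summary.create_file_writer(self.log_dir)
        with writer.as_default():
            for layer in model.layers:
                if layer.name in wanted:
                    for w in layer.weights:
                        tf.summary.histogram(w.name, w, step=epoch)

print(layers_to_log(["conv2d", "conv2d_1", "batch_normalization", "dense", "dense_1"],
                    ["conv2d*", "dense"]))  # ['conv2d', 'conv2d_1', 'dense']
```

Because only the matching layers are written, the histogram tab stays readable even for deep models.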

Tensorboard: Why is there a zigzag pattern at gradient plots?

Here is a picture of the gradient of a conv2d layer (the kernel). It has a zigzag pattern which I would like to understand. What I understand is that the gradient changes from mini-batch to mini-batch. But why does it increase after each epoch?
I am using the Keras Adam optimizer with default settings. I don't think that is the reason. Dropout and batch norm should also not be the reason. I am using image augmentation, but that does not change its behavior from batch to batch.
Does anybody have an idea?
I've seen this before with keras metrics.
In that case the problem was that the metrics maintain a running average across each epoch, and it's that "average so far" that they report to TensorBoard.
How are these gradients getting to TensorBoard? Are you passing them to a tf.keras.metrics.Mean? If so, you probably want to call reset_states on it, perhaps in a custom callback's on_batch_end.
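The "average so far" behavior is easy to reproduce without TensorFlow. A pure-Python stand-in for a streaming-mean metric (mirroring what tf.keras.metrics.Mean does) shows how per-batch values smooth out within an epoch and then jump at each reset, which is exactly the sawtooth seen in the plots:

```python
class RunningMean:
    """Minimal stand-in for a streaming-mean metric (cf. tf.keras.metrics.Mean)."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def update_state(self, value):
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count if self.count else 0.0

    def reset_states(self):
        self.total, self.count = 0.0, 0

# Two epochs of a quantity that is large on the first batch (10.0) and
# constant afterwards (1.0): within each epoch the running mean decays,
# then reset_states() at the epoch boundary makes it jump back up.
metric = RunningMean()
logged = []
for epoch in range(2):
    metric.reset_states()  # removing this line merges the epochs into one decay
    for batch_value in [10.0, 1.0, 1.0, 1.0]:
        metric.update_state(batch_value)
        logged.append(round(metric.result(), 2))
print(logged)  # [10.0, 5.5, 4.0, 3.25, 10.0, 5.5, 4.0, 3.25]
```

If you want the raw per-batch gradient magnitude instead of the epoch average, log the value directly rather than feeding it through a Mean, or reset the metric every batch.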

How to visualize my training history in pytorch?

How do you visualize the training history of your PyTorch model, the way you can in Keras?
I have a trained PyTorch model and I want to see a graph of its training.
Can I do this using only matplotlib? If so, can someone point me to resources to follow?
You have to save the loss while training; a trained model won't have a history of its loss, so you would need to train again.
Save the loss while training, then plot it against the epochs using matplotlib. In your training function, wherever the loss is calculated, save it to a file and visualize it later.
Also, you can use tensorboardX if you want to visualize in realtime.
This is a tutorial for tensorboardX: http://www.erogol.com/use-tensorboard-pytorch/
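The save-then-plot workflow above can be sketched with only the standard library plus matplotlib (the fake_loss value stands in for whatever your training loop actually computes; matplotlib is imported lazily so headless training machines can still run the recording part):

```python
import csv

def record_loss(history, epoch, loss):
    """Append one (epoch, loss) pair; call this inside your training loop."""
    history.append((epoch, loss))

def save_history(history, path="loss_history.csv"):
    """Persist the history so it can be plotted later, after training."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "loss"])
        writer.writerows(history)

def plot_history(history):
    import matplotlib.pyplot as plt  # lazy import: only needed when plotting
    epochs, losses = zip(*history)
    plt.plot(epochs, losses, label="training loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

history = []
for epoch in range(3):
    fake_loss = 1.0 / (epoch + 1)  # stand-in for your computed loss
    record_loss(history, epoch, fake_loss)
save_history(history)
```

In a real PyTorch loop you would call record_loss(history, epoch, loss.item()) after each epoch (or batch), then load the CSV and call plot_history once training finishes.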

How to run inference on inception v3 trained models?

I've successfully trained the Inception v3 model from scratch on 200 custom classes. Now I have ckpt files in my output dir. How do I use those models to run inference?
Preferably, load the model on the GPU and pass images whenever I want while the model persists on the GPU. Using TensorFlow Serving is not an option for me.
Note: I've tried to freeze these models but failed to set the output nodes correctly while freezing. I used ImagenetV3/Predictions/Softmax but couldn't use it with feed_dict, as I couldn't get the required tensors from the frozen model.
The documentation on the TF site and repo is sparse on this inference part.
It sounds like you're on the right track. You don't really do anything different at inference time than at training time, except that you don't ask it to compute the optimizer, and by not doing so, no weights are ever updated.
The save and restore guide in tensorflow documentation explains how to restore a model from checkpoint:
https://www.tensorflow.org/programmers_guide/saved_model
You have two options when restoring a model: either you build the ops again from code (usually a build_graph() method) and then load the variables in from the checkpoint, which is the method I use most commonly, or you can load the graph definition and variables from the checkpoint, provided the graph definition was saved with it.
Once you've loaded the graph you'll create a session and ask the graph to compute just the output. The tensor ImagenetV3/Predictions/Softmax looks right to me (I'm not immediately familiar with the particular model you're working with). You will need to pass in the appropriate inputs, your images, and possibly whatever parameters the graph requires, sometimes an is_train boolean is needed, and other such details.
Since you aren't asking tensorflow to compute the optimizer operation no weights will be updated. There's really no difference between training and inference other than what operations you request the graph to compute.
Tensorflow will use the GPU by default just as it did with training, so all of that is pretty much handled behind the scenes for you.
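Putting the above together, here is a hedged sketch of the restore-and-run pattern using the TF 1.x graph API that matches the question's checkpoint workflow. The input placeholder name ("input") is an assumption you must replace with your graph's actual name, and TF is imported lazily inside the function:

```python
def tensor_name(op_name, output_index=0):
    """Graph tensors are addressed as '<op_name>:<output_index>'."""
    return f"{op_name}:{output_index}"

def run_inference(checkpoint_prefix, images):
    # Assumes a TF 1.x-style meta graph saved next to the checkpoint,
    # e.g. checkpoint_prefix = "output_dir/model.ckpt-100000".
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(checkpoint_prefix + ".meta")
        saver.restore(sess, checkpoint_prefix)
        graph = tf.get_default_graph()
        inputs = graph.get_tensor_by_name(tensor_name("input"))  # assumed name
        probs = graph.get_tensor_by_name(
            tensor_name("ImagenetV3/Predictions/Softmax"))
        # Only the softmax output is requested, so the optimizer op never
        # runs and no weights are updated.
        return sess.run(probs, feed_dict={inputs: images})
```

To keep the model resident on the GPU between requests, create the session once, restore into it, and reuse it across calls instead of opening a new with-block per image.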