More elegant way of displaying distributions and activations in TensorBoard - tensorflow

Keras's TensorBoard callback has write_images and histogram_freq arguments that allow the weights and activations to be saved to TensorBoard and visualized during training.
The issue is that this saves the information for every layer and makes TensorBoard very messy, especially if I have logged other images to TensorBoard. This can be seen in the images below:
A lot of this logged information is redundant. Is there any way to make the weight distribution and activation visualizations more organized? Is there any way to only visualize certain layers?
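One possible way to restrict logging to certain layers is a small custom callback used in place of histogram_freq. The following is only a sketch under assumptions: the class name SelectedLayerHistograms and its arguments are hypothetical and not part of Keras; it relies on tf.summary.create_file_writer and tf.summary.histogram, and it logs weights only, not activations.

```python
import tensorflow as tf


class SelectedLayerHistograms(tf.keras.callbacks.Callback):
    """Hypothetical callback: log weight histograms for the named layers only."""

    def __init__(self, log_dir, layer_names):
        super().__init__()
        self.layer_names = set(layer_names)
        self.writer = tf.summary.create_file_writer(log_dir)

    def on_epoch_end(self, epoch, logs=None):
        with self.writer.as_default():
            for layer in self.model.layers:
                if layer.name not in self.layer_names:
                    continue  # skip every layer that was not requested
                for index, weight in enumerate(layer.weights):
                    # The tag groups all weights of one layer under its name.
                    tf.summary.histogram(
                        f"{layer.name}/weight_{index}", weight.numpy(), step=epoch
                    )
        self.writer.flush()
```

It would be passed to model.fit through the callbacks argument, for example callbacks=[SelectedLayerHistograms("logs/selected", ["conv1", "dense_out"])], where the directory and the layer names are placeholders for your own.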

Related

Change the spatial input dimension during training

I am training a yolov4 (fully convolutional) in tensorflow 2.3.0.
I would like to change the spatial input shape of the network during training, to further adjust the weights to different scales.
Is this possible?
EDIT:
I know of the existence of darknet, but it lacks some very specific augmentations that I use and have implemented in my repo, which is why I am asking explicitly about TensorFlow.
To be more precise about what I want to do:
I want to train for several batches at Y1xX1xC then change the input size to Y2xX2xC and train again for several batches and so on.
It is not possible. In the past people trained several networks for different scales but the current state-of-the-art approach is feature pyramids.
https://arxiv.org/pdf/1612.03144.pdf
Another strong candidate is dilated convolution, which can learn long-range dependencies between pixels at varying distances. You can concatenate the outputs of several dilated convolutions, and the model will then learn which distance matters in which case.
https://towardsdatascience.com/review-dilated-convolution-semantic-segmentation-9d5a5bd768f5
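The concatenation of dilated convolutions described above could look roughly like this in Keras. This is a minimal sketch, not taken from the linked article: the function name dilated_block, the filter count, and the dilation rates are assumptions chosen for illustration.

```python
import tensorflow as tf


def dilated_block(inputs, filters=16, rates=(1, 2, 4)):
    """Apply parallel 3x3 convolutions with different dilation rates and concatenate them."""
    branches = [
        tf.keras.layers.Conv2D(
            filters, 3, padding="same", dilation_rate=rate, activation="relu"
        )(inputs)
        for rate in rates
    ]
    # Channel-wise concatenation lets later layers weigh each receptive field.
    return tf.keras.layers.Concatenate(axis=-1)(branches)


# Spatial dimensions are left as None, so the block accepts any image size.
inputs = tf.keras.Input(shape=(None, None, 3))
model = tf.keras.Model(inputs, dilated_block(inputs))
```

With padding="same" the spatial size is preserved, so the output has len(rates) * filters channels at the input resolution.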
It's important to mention which TensorFlow repository you're using. You can definitely achieve this. The idea is to keep the spatial input dimension fixed within a single batch.
But an even better approach is to use the darknet repository from AlexeyAB: https://github.com/AlexeyAB/darknet
Just set random=1 in https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg [line 1149]. It will train your network with randomly varying spatial dimensions.
One thing you can do is start your training in the AlexeyAB repo with random=1 set, then take the trained weights file to TensorFlow for fine-tuning.
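For the TensorFlow route, the idea of one fixed input size per batch, changed every few batches, can be sketched as a simple schedule. This is an illustration only: the function name multiscale_schedule, the size list, and the switching interval are assumptions, not something taken from a specific YOLOv4 repository.

```python
import random


def multiscale_schedule(num_batches, sizes, batches_per_size, seed=0):
    """Yield one (height, width) per batch, switching every `batches_per_size` batches."""
    rng = random.Random(seed)
    current = None
    for step in range(num_batches):
        if step % batches_per_size == 0:
            current = rng.choice(sizes)  # pick the size for the next group of batches
        yield current


# Hypothetical sizes (multiples of 32, as is common for YOLO-style networks).
sizes = [(320, 320), (416, 416), (608, 608)]
schedule = list(multiscale_schedule(num_batches=25, sizes=sizes, batches_per_size=10))
```

In a training loop, each batch of images would be resized to the scheduled size (for example with tf.image.resize) before the training step. That assumes the model was built with unspecified spatial dimensions, and any box coordinates stored in absolute pixels would need the same rescaling.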

Why does the loss explode during training from scratch? - Tensorflow Object Detection Models

First of all, I want to state that I am familiar with the benefits of transfer learning. Moreover, I am able to train a pretrained model from the model zoo on my dataset. But for research purposes I want to train my model from scratch, without transfer learning.
I want to adapt the Faster R-CNN ResNet-101 implementation from TensorFlow's Object Detection API to my dataset. If I use one of the pretrained models, the training goes as expected and the loss always stays in a 'normal' range (never above about 6). But if I do not use transfer learning, the loss very frequently jumps to extremely high values (about 80,000,000), although between those spikes it is in the normal range. In addition, I do not see any predictions of the network on images in TensorBoard. It seems like the network does not make any predictions at all. The only thing I change is to comment out these two lines in the model.config file:
# fine_tune_checkpoint: 'path'
# from_detection_checkpoint: true
I tried a lot of things to find the reason: I changed the optimizer, changed the learning rate, used gradient clipping, changed the initializer, and trained on different machines, but nothing helps. I also inspected my label_map as well as my record file. To make sure those files are correct, I repeated the steps above with the Pascal VOC dataset, using the record-creation script and the label map from the API, but even with this unmodified code from the Object Detection API the loss explodes (TensorFlow Object Detection API's own inputs).

Quantization aware training examples?

I want to do quantization-aware training with a basic convolutional neural network that I define directly in TensorFlow (I don't want to use other APIs such as Keras). The only resource that I am aware of is the README here:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
However, it's not clear exactly where the different quantization commands should go in the overall process of training and then freezing the graph for actual inference.
Therefore I am wondering if there is any code example out there that shows how to define, train, and freeze a simple convolutional neural network with quantization aware training in tensorflow?
It seems that others have had the same question as well, see for instance here.
Thanks!

Access accuracy and cross-entropy information in tensorboard

I am using object_detection from the models/research TensorFlow repository. I managed to train a model successfully, but the accuracy and cross-entropy information is missing when I monitor the progress of my training with TensorBoard.
Do I need to calculate the accuracy and add it to tb myself or is it already there and I am doing something wrong? In case I have to add it, would trainer.py be the right place to do so?
TensorBoard provides these metrics for you, so there is no need to go down this route. When you open TensorBoard via tensorboard --logdir tf_files/training_summaries & (in a terminal), where tf_files/training_summaries is the path to where your trained model is, TensorBoard will show the so-called summaries: scalars (accuracy and cross-entropy), histograms, and images. You are free to recalculate these yourself if you wish, but other than for testing there would be no point.

How to run inference on inception v3 trained models?

I've successfully trained the inception v3 model on custom 200 classes from scratch. Now I have ckpt files in my output dir. How to use those models to run inference?
Preferably, load the model on GPU and pass images whenever I want while the model persists on GPU. Using TensorFlow serving is not an option for me.
Note: I've tried to freeze these models but failed to set the output_nodes correctly while freezing. I used ImagenetV3/Predictions/Softmax but couldn't use it with feed_dict, as I couldn't get the required tensors from the frozen model.
The documentation on the TF site and repo for this inference part is poor.
It sounds like you're on the right track. You don't really do anything different at inference time than at training time, except that you don't ask the graph to compute the optimizer operation, and because of that no weights are ever updated.
The save and restore guide in tensorflow documentation explains how to restore a model from checkpoint:
https://www.tensorflow.org/programmers_guide/saved_model
You have two options when restoring a model. Either you build the ops again from code (usually a build_graph() method) and then load the variables from the checkpoint, which is the method I use most often, or you load both the graph definition and the variables from the checkpoint, if the graph definition was saved with it.
Once you've loaded the graph, you create a session and ask the graph to compute just the output. The tensor ImagenetV3/Predictions/Softmax looks right to me (I'm not immediately familiar with the particular model you're working with). You will need to pass in the appropriate inputs: your images, and possibly whatever other parameters the graph requires. Sometimes an is_train boolean is needed, for example.
Since you aren't asking tensorflow to compute the optimizer operation no weights will be updated. There's really no difference between training and inference other than what operations you request the graph to compute.
Tensorflow will use the GPU by default just as it did with training, so all of that is pretty much handled behind the scenes for you.
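The first option (rebuild the ops from code, then load the variables from the checkpoint) can be sketched as follows. This is a toy illustration under assumptions, not the Inception v3 code: build_graph, the placeholder name, and the single-matmul model are hypothetical stand-ins, and the sketch uses the tf.compat.v1 API so that it also runs on TensorFlow 2 installations.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1


def build_graph():
    """Hypothetical stand-in for the real model-building code."""
    images = tf1.placeholder(tf.float32, [None, 4], name="images")
    weights = tf1.get_variable("weights", [4, 3])
    probs = tf.nn.softmax(tf.matmul(images, weights), name="Softmax")
    return images, probs


ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Stand-in for training: initialize the variables and write a checkpoint.
train_graph = tf.Graph()
with train_graph.as_default():
    build_graph()
    with tf1.Session() as train_sess:
        train_sess.run(tf1.global_variables_initializer())
        tf1.train.Saver().save(train_sess, ckpt_path)

# Inference: rebuild the same ops, then restore the variables into a session.
infer_graph = tf.Graph()
with infer_graph.as_default():
    images, probs = build_graph()
    sess = tf1.Session(graph=infer_graph)  # kept open for repeated requests
    tf1.train.Saver().restore(sess, ckpt_path)

# Only the output tensor is requested, so no optimizer op runs and no weights change.
batch = np.random.rand(2, 4).astype(np.float32)
predictions = sess.run(probs, feed_dict={images: batch})
```

Because the session stays open, further calls to sess.run(probs, feed_dict=...) reuse the already loaded variables; call sess.close() when you are done.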