Is there linter for model(inputs) of PyTorch like model.predict(inputs) of TensorFlow? - tensorflow

My goal is to do object detection. However, YOLOv7 and (hack to create bounding box with feature map) tutorial is using PyTorch.
The problem is: model(inputs) do not have typings.
The code L148-L150
out = model(inputs)
probs, class_preds = torch.max(out[0], dim=-1)
feature_maps = out[1].to("cpu")
The forced me to debug the helper.py file to understand what [0] and out[1] are. Currently, I assume that out[0] as the softmax probability and out[1] as the feature maps.

I think the answer is no, in general it is non-trivial to automatically infer the semantic meaning of the outputs of a neural network; this is a product of the semantic meaning of the inputs and the model structure itself. You could reference the Yolo model architecture provided in model.py (though as an aside you should not link to external code but rather provide relevant code in your question itself) and investigate the structure of the outputs, then reference the structure of the labeled inputs (as the model by definition is learning to replicate the structure of the labels.)
That being said in your case the output is quite obviously per-class probabilities and class indexes as shown in line 149:
probs, class_preds = torch.max(out[0], dim=-1)
as the outputs from torch.max per pytorch documentation are (maximum value, maximum index).

Related

Keras - sample_weights vs. sample_weight in model.fit?

I am using Keras with a tensorflow backend to train some CNNs for semantic segmentation of biomedical images. I am trying to weight every pixel in my input images during training and believe I am doing so with the data generator I am passing to model.fit.
However, I am a little confused about the meaning of 'sample_weights' vs. 'sample_weight' in the documentation for model.fit.
'sample_weights' is the third optional output from your dataset or image generator - i.e. the output of the generator can either be the tuple (inputs, targets) or the tuple (inputs, targets, sample_weights). I believe this lets me create a mask that weights my samples pixel-by-pixels, but this isn't super clear from the documentation.
'sample_weight' is a separate field that seems to be pretty clearly defined as a weight you can give to every sample. If I understand, this would allow me to give more or less weight to particular images in my training set.
Do I have this right? Thanks.

TensorFlow / Keras Model as function of its weights (and inputs) / Making a tf model pure functional, non-stateful

As the title suggests I'm looking for a way to copy an arbitrary tensorflow (/keras) model, such that I can run the same computational graph from the model, but have the weights (or different weights or a tensor copy of them) as part of the input of the function, something like the following, where an implementation (or idea how to implement) 'smart_copy_function' is what is missing:
model = some_model() #a TF/Keras Model
model_from_weights = smart_copy_function(model)
x = tf.some_model_input # imagine some random input to the model here, e.g. an mnist image
y_model = model(x) #normal way to call model
y_model_from_weights = model_from_weights(model.trainable_weights, x) #same call to copied model
y_model == y_model_from_weights #should be True
I believe this should be doable in a somewhat easy way, as the respective computational graph does exist in TF anyway already.
'This sounds stupid, why would you do this?': I want to build an analog to the PyTorch MetaLearning Framework higher for TensorFlow, since gradients through calls of TF optimizer 'apply_gradients' and variable 'assign' are not supported. To achieve such gradients through parameter gradient updates the above seems to be the way to go. Such gradients through parameter updates are in turn very important for Meta-Learning Research, with papers like 'Model-Agnostic Meta Learning' and 'Teaching with Commentaries' being somewhat famous examples / use cases.

Tensorflow Hub Image Modules: Clarity on Preprocessing and Output values

Many thanks for support!
I currently use TF Slim - and TF Hub seems like a very useful addition for transfer learning. However the following things are not clear from the documentation:
1. Is preprocessing done implicitly? Is this based on "trainable=True/False" parameter in constructor of module?
module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1", trainable=True)
When I use Tf-slim I use the preprocess method:
inception_preprocessing.preprocess_image(image, img_height, img_width, is_training)
2.How to get access to AuxLogits for an inception model? Seems to be missing:
import tensorflow_hub as hub
import tensorflow as tf
img = tf.random_uniform([10,299,299,3])
module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1", trainable=True)
outputs = module(dict(images=img), signature="image_feature_vector", as_dict=True)
The output is
dict_keys(['InceptionV3/Mixed_6b', 'InceptionV3/MaxPool_5a_3x3', 'InceptionV3/Mixed_6c', 'InceptionV3/Mixed_6d', 'InceptionV3/Mixed_6e', 'InceptionV3/Mixed_7a', 'InceptionV3/Mixed_7b', 'InceptionV3/Conv2d_2a_3x3', 'InceptionV3/Mixed_7c', 'InceptionV3/Conv2d_4a_3x3', 'InceptionV3/Conv2d_1a_3x3', 'InceptionV3/global_pool', 'InceptionV3/MaxPool_3a_3x3', 'InceptionV3/Conv2d_2b_3x3', 'InceptionV3/Conv2d_3b_1x1', 'default', 'InceptionV3/Mixed_5b', 'InceptionV3/Mixed_5c', 'InceptionV3/Mixed_5d', 'InceptionV3/Mixed_6a'])
These are excellent questions; let me try to give good answers also for readers less familiar with TF-Slim.
1. Preprocessing is not done by the module, because it is a lot about your data, and not so much about the CNN architecture within the module. The module only handles transforming input values from the canonical [0,1] range into whatever the pre-trained CNN within the module expects.
Lengthy rationale: Preprocessing of images for CNN training usually consists of decoding the input JPEG (or whatever), selecting a (reasonably large) random crop from it, random photometric and geometric transformations (distort colors, flip left/right, etc.), and resizing to the common image size for a batch of training inputs. The TensorFlow Hub modules that implement https://tensorflow.org/hub/common_signatures/images leave all of that to your code around the module.
The primary reason is that the suitable random transformations depend a lot on your training task, but not on the architecture or trained state weights of the module. For example, color distortions will help if you classify cars vs dogs, but probably not for ripe vs unripe bananas, and so on.
Also, a batch of images that have been decoded but not yet cropped/resized are hard to represent as a single tensor (unless you make it a 1-D tensor of encoded strings, but that brings other problems, such as breaking backprop into module inputs for advanced uses).
Bottom line: The Python code using the module needs to do image preprocessing (except scaling values), for example, as in https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py
The slim preprocessing methods conflate the dataset-specific random transformations (tuned for Imagenet!) with the re-scaling to the architecture's value range (which the Hub module does for you). That means they are not directly applicable here.
2. Indeed, auxiliary heads are missing from the initial set of modules published under tfhub.dev/google/..., but I expect them to work fine for re-training anyways.
More details: Not all architectures have auxiliary heads, and even the original Inception paper says their effect was "relatively minor" [Szegedy&al. 2015; ยง5]. Using an image feature vector module for a custom classification task would burden the module consumer code with checking for aux features and, if found, putting aux logits and a loss term on top.
This complication did not seem to pull its weight, but more experiments might refute that assessment. (Please share in a GitHub issue if you know of any.)
For now, the only way to put an aux head onto https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1 is to copy&paste some lines from https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v3.py (search "Auxiliary head logits") and apply that to the "Inception_V3/Mixed_6e" output that you saw.
3. You didn't ask, but: For training, the module's documentation recommends to pass hub.Module(..., tags={"train"}), or else batch norm operates in inference mode (and dropout, if the module had any).
Hope this explains how and why things are.
Arno (from the TensorFlow Hub developers)

Feeding a single image into model trained with inception v3

I've searched around the internet for a few days and cannot seem to find an example of someone feeding a single image into a graph created using inception. Please let me know if I have grossly overlooked something obvious. To but the problem in context, I've
1) Trained a model and produced the relevant checkpoint files
model.ckpt-10000.data-00000-of-00001
model.ckpt-10000.index
model.ckpt-10000.meta
2) I then load the model
tf.reset_default_graph()
sess = tf.Session()
saver = tf.train.import_meta_graph(checkpoint_path + "/model.ckpt-10000.meta", clear_devices=True)
#<tensorflow.python.training.saver.Saver object at 0x11eea89e8>
sess.run(saver.restore(sess, checkpoint_path + "/model.ckpt-10000"))
3) This works correctly, so I load the default graph,
graph = tf.get_default_graph()
Here is where I am lost. As seen by this example, we must identify the layers of the graph by name to pass our image data into -- http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/.
So, what are the names of these layers? I suppose they something like "DecodeJpeg" and "/tower1/preditions/logits", but those are no better than guesses.
Thank you for your help.
The standard way of mapping between operations before and after save/restore is by adding them to collections. Search for tf.add_to_collection and tf.get_collection in https://www.tensorflow.org/api_guides/python/meta_graph. These examples save training_op and logits, but you can save your input placeholders as well.
If you cannot re-save the meta graph def and it does not have any collections, looking at node names and types (inputs are typically placeholder ops) might be the best you can do.

Semantic Segmentation with Encoder-Decoder CNNs

Appologizes for misuse of technical terms.
I am working on a project of semantic segmentation via CNNs ; trying to implement an architecture of type Encoder-Decoder, therefore output is the same size as the input.
How do you design the labels ?
What loss function should one apply ? Especially in the situation of heavy class inbalance (but the ratio between the classes is variable from image to image).
The problem deals with two classes (objects of interest and background). I am using Keras with tensorflow backend.
So far, I am going with designing expected outputs to be the same dimensions as the input images, applying pixel-wise labeling. Final layer of model has either softmax activation (for 2 classes), or sigmoid activation ( to express probability that the pixels belong to the objects class). I am having trouble with designing a suitable objective function for such a task, of type:
function(y_pred,y_true),
in agreement with Keras.
Please,try to be specific with the dimensions of tensors involved (input/output of the model). Any thoughts and suggestions are much appreciated. Thank you !
Actually when you use a TensorFlow backend you could simply apply a predefined Keras objectives in a following manner:
output = Convolution2D(number_of_classes, # 1 for binary case
filter_height,
filter_width,
activation = "softmax")(input_to_output) # or "sigmoid" for binary
...
model.compile(loss = "categorical_crossentropy", ...) # or "binary_crossentropy" for binary
And then feed either a one-hot encoded feature map or matrix of shape (image_height, image_width) with integer encoded classes (remember than in this case you should use sparse_categorical_crossentropy as a loss).
To deal with a class inbalance (I guess it's beacuse of a backgroud class) I strongly recommend you to read carefully answers to this Stack Overflow question.
I suggest starting with a base architecture used in practice like this one in nerve-segmentation: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation. Here a dice_loss is used as a loss function. This works very well for a two class problem as has been shown in literature: https://arxiv.org/pdf/1608.04117.pdf.
Another loss function that has been widely used is cross entropy for such a problem. For problems like yours most commonly long and short skip connections are deployed to stabilize training as denoted in the paper above.
Two ways :
You could try 'flattening':
model.add(Reshape(NUM_CLASSES,HEIGHT*WIDTH)) #shape : HEIGHT x WIDTH x NUM_CLASSES
model.add(Permute(2,1)) # now itll be NUM_CLASSES x HEIGHT x WIDTH
#Use some activation here- model.activation()
#You can use Global averaging or Softmax
One hot encoding every pixel:
In this case your final layer should Upsample/Unpool/Deconvolve to HEIGHT x WIDTH x CLASSES. So your output is essentially of the shape: (HEIGHT,WIDTH,NUM_CLASSES).