Changing a trained network to keep only a subset of its output - tensorflow

Suppose I have a trained TensorFlow classification network for 20 classes as in PASCAL VOC 2007: aeroplane, bicycle, ..., car, cat, ..., person, ..., tvmonitor.
Now, I would like to have a sub-network for only a subset of the classes, e.g., 3 classes: car, cat, person.
Then, I can use this network for testing or for re-training/fine-tuning on a new dataset, only for the 3 classes.
It should be possible to extract this sub-network from the original network, since only the last layer changes: we just need to discard the neurons/weights belonging to the unwanted classes.
My question: Is there an easy way to do this in TensorFlow?
It will be great if you can point to some sample code or similar solution.
I have googled, but have not come across any mention of this.
The symmetric problem, expanding the number of classes without discarding the original weights, can potentially be useful for some people, but my current focus is the one above.

If you only want to keep a few of the outputs, you can simply extract the corresponding slices of the last layer's weights.
For example, let's assume the last layer is fully connected. Its weights are a tensor of size num_previous x num_output.
You want to keep only a few of these outputs, say outputs 1, 22, and 42. You can get the weights of your new fully connected layer as:
outputs_to_keep = [1, 22, 42]
new_W = tf.transpose(tf.gather(tf.transpose(old_W), outputs_to_keep))
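If you also want the bias and a reduced classification head, here is a minimal hedged sketch (the names old_W, old_b, and features are assumptions for the original final layer's weights, its bias, and the activations feeding into it):
import tensorflow as tf

# Assumed shapes: old_W is (num_previous, num_output), old_b is (num_output,),
# and features is the output of the layer below the classification head.
outputs_to_keep = [1, 22, 42]

new_W = tf.gather(old_W, outputs_to_keep, axis=1)  # keep only the chosen columns
new_b = tf.gather(old_b, outputs_to_keep)          # and the matching bias entries

# Reduced head: logits for just the 3 kept classes.
new_logits = tf.matmul(features, new_W) + new_b
Gathering along axis=1 is equivalent to the transpose/gather/transpose pattern above; it just selects the output columns directly.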

It is possible to extract a pretrained subnet as you said. It is called transfer learning. There are different ways to do it, here you have one:
Find the layer you want to start from. You can use TensorBoard to find it and then use graph.get_tensor_by_name(). Usually you keep the convolutional layers and discard the fully connected ones.
Connect your new layers (normally fully connected ones) to the previous layer.
Freeze the variables (weights) of the pretrained layers using trainable=False. Alternatively, you can instruct the optimizer to update only the weights of the new layers.
Train your model with the new classes.
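A hedged, TF1-style sketch of these steps (the tensor name 'flatten/Reshape:0' and the scope 'new_head' are assumptions; substitute the names from your own graph):
import tensorflow as tf

graph = tf.get_default_graph()
# Output of the last pretrained layer you want to keep (name is an assumption).
features = graph.get_tensor_by_name('flatten/Reshape:0')

# New 3-class head (e.g., car, cat, person) on top of the pretrained features.
with tf.variable_scope('new_head'):
    logits = tf.layers.dense(features, 3, name='fc')

labels = tf.placeholder(tf.int64, [None])
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

# Only optimize the new variables; the pretrained weights stay frozen.
new_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='new_head')
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=new_vars)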

Related

Multiple BERT binary classifications on a single graph to save on inference time

I have five classes and I want to compare four of them against one and the same class. This isn't a One vs Rest classifier, as for each output I want to score them against one base class.
The four outputs should be: base class vs classA, base class vs classB, etc.
I could do this by having multiple binary classification tasks, but that's wasting computation time if the first layers are BERT preprocessing + pretrained BERT layers, and the only differences between the four classifiers are the last few layers of BERT (finetuned ones) and the Dense layer.
So why not merge the graphs for more performance?
My inputs are four different datasets, each annotated with true/false for each class.
As I understand it, I can re-use most of the pipeline (BERT preprocessing and the first layers of BERT), as those have shared weights. I should then be able to train the last few layers of BERT and the Dense layer on top differently depending on the branch of the classifier (maybe using something like keras.switch?).
I have tried many alternative options, including multi-class and multi-label classifiers with actual and generated (e.g., machine-annotated) labels in the case of multiple input labels, and different activation and loss functions, but none of the results were acceptable to me (none were as good as the four separate models).
Is there a solution for merging the four different models for more performance, or am I stuck with using 4x binary classifiers?
When you train a DNN for a specific task, it will (in the vast majority of cases) be better than a more general model that handles several tasks simultaneously. That said, in my experience a properly trained general model produces results very similar to the original binary ones. Anyway, here are a couple of suggestions for training strategies (assuming your training datasets for each task are completely different):
Weak supervision approach
Train your binary classifiers and use them to label the other datasets (i.e., label datasets [1, 3, 4] with the binary classifier trained on dataset 2). Then train your joint model as a multilabel task using all of the newly labeled datasets (don't forget to shuffle the samples before feeding them to the trainer ;) ). You will need to experiment with whether to apply a threshold and set labels to 0/1, or to use the raw scores of the binary classifiers.
Create a custom loss function that does not penalize the model when no information is provided for a certain class. So when you introduce a sample from (say) dataset 2, the loss is calculated only for the 2nd class; a sketch of such a masked loss follows below.
Of course, you can apply both approaches simultaneously. For example, if you know that a binary classifier produces polarized scores (most results are near 0 or 1), you can use weak labels and automatically label your data with those scores. Then, during the second stage, weight each sample's loss contribution by x' = 4(x - 0.5)^2, where x is the weak-label score (note that the model outputs logits, so you will need to apply a sigmoid first). This way you increase the contribution of the samples the binary classifier is confident about, and reduce that of the less certain ones.
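A minimal sketch of such a masked loss in TF/Keras, under the assumption that a label of -1 means "no annotation for this task":
import tensorflow as tf

def masked_binary_crossentropy(y_true, y_pred):
    # y_true: (batch, num_tasks) with entries in {0, 1}, or -1 for "unknown".
    # y_pred: (batch, num_tasks) sigmoid probabilities.
    y_true = tf.cast(y_true, y_pred.dtype)
    mask = tf.cast(tf.not_equal(y_true, -1.0), y_pred.dtype)
    y_true = tf.clip_by_value(y_true, 0.0, 1.0)      # -1 becomes 0; masked out below
    eps = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    bce = -(y_true * tf.math.log(y_pred) + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    # Average only over the labeled entries.
    return tf.reduce_sum(bce * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

# model.compile(optimizer='adam', loss=masked_binary_crossentropy)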
As for unfreezing the last layers of BERT, unfreezing the upper 3-6 layers is usually enough. Unfreezing more layers improves results very little while increasing time and memory requirements.

Training TensorFlow on only one object

Following the corresponding TensorFlow documentation, I trained a model on 3 objects and it can recognize them. But when I show it other objects (not among the 3), it does not behave correctly.
I want to train on only one object (for example, a cup) and recognize only this object. Is it possible to do this with TensorFlow?
Your question doesn't provide enough detail, but my guess is that you trained the network with a softmax activation and a categorical (or sparse categorical) cross-entropy loss. If that guess is right, such a network always assigns the prediction to one of the three classes, regardless of the actual data, i.e., there is no "none of the above" option.
In order to train a network to recognize only one class of objects, use a single output unit with a sigmoid activation and a binary cross-entropy loss for the specific object. Provide a dataset that includes examples both with and without this object.
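A minimal Keras sketch of that setup (the architecture and input size are placeholders, not a recommendation):
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # single "is it a cup?" output
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
# The training data must contain both positive (cup) and negative (not-cup)
# images, labeled 1 and 0 respectively.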

Limiting probability percentage of irrelevant image in CNN

I am training a CNN model with five classes using the Keras library. Using the model.predict function, I get prediction percentages for the classes. My problem is that for an image which doesn't belong to these classes and is completely irrelevant, the model still distributes prediction percentages across the five classes.
How do I prevent this? How do I identify such an image as irrelevant?
I assume you are using a softmax activation on your last layer to generate the probabilities for each class. By definition, the sum of the outputs from the softmax activation must add up to 1. Therefore, it is impossible for the neural net to say that the image does not belong to any of your classes, with your current setup.
There are two potential ways you could address this:
Add another class that represents "other" or "unknown" objects (so you have 6 classes).
Add another output to your neural net (or train a completely independent neural net) that does binary classification on whether or not the image is in one of the 5 classes. That way, if your secondary output says that the image is not in the 5 classes, you can ignore the softmax output (a sketch of this follows below).
In both cases, you will need to augment your dataset with images that do not fall in your 5 classes.
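A hedged sketch of the second suggestion, i.e. a shared backbone with the usual 5-way softmax plus a binary "belongs to one of the 5 classes" output (layer sizes and names are assumptions):
import tensorflow as tf

inputs = tf.keras.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(32, 3, activation='relu')(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
class_probs = tf.keras.layers.Dense(5, activation='softmax', name='classes')(x)
in_domain = tf.keras.layers.Dense(1, activation='sigmoid', name='in_domain')(x)

model = tf.keras.Model(inputs, [class_probs, in_domain])
model.compile(optimizer='adam',
              loss={'classes': 'sparse_categorical_crossentropy',
                    'in_domain': 'binary_crossentropy'})
# For the added "irrelevant" images only the in_domain head has a meaningful
# label, so mask out their class loss (e.g., via per-output sample weights).
# At inference time, trust class_probs only when in_domain is high.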

Avoiding weight sharing among certain layers in BucketingModule in mxnet?

I am using BucketingModule for training multiple small models/bots together. Here, the bucket key is bot_id. However, each bot has a separate set of target labels/classes (and hence a different softmax layer size).
Is there any way to train such a model in mxnet, where I want to share the weights for all the layers but one (softmax) among all the bots?
How would I initialize such a model using sym_gen method?
If in the sym_gen method, for the Softmax layer I specify the num_hidden=size_dict[bot] i.e.,
pred = mx.sym.FullyConnected(data=pred, num_hidden=len(size_dict[bot]), name='pred')
pred = mx.sym.SoftmaxOutput(data=pred, label=label, name='softmax')
I get the error:
Inferred shape does not match shared_exec.arg_array's shape
which makes sense as each bot has different number of target classes.
This issue was posted and resolved here: https://github.com/apache/incubator-mxnet/issues/9042
You can make sym_gen(default_bucket_key) return a "master network" that contains all of these FC layers of different shapes, and have sym_gen(other_keys) return a subset of the master network with one particular FC. Note that for the master network you probably need to use mx.sym.Group to group all the outputs together so that only one symbol is returned.
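A hedged mxnet sketch of that idea (the shared body, the layer sizes, and the size_dict contents are assumptions; the key points are that shared layers reuse weights by name, each bot gets its own uniquely named FC/softmax, and the default bucket groups all heads with mx.sym.Group):
import mxnet as mx

size_dict = {'bot_a': 10, 'bot_b': 25, 'bot_c': 7}  # classes per bot (assumed)
MASTER_KEY = 'master'

def sym_gen(bucket_key):
    data = mx.sym.Variable('data')
    label = mx.sym.Variable('softmax_label')
    # Shared body: identical names across buckets, so the weights are shared.
    hidden = mx.sym.FullyConnected(data=data, num_hidden=128, name='shared_fc')
    hidden = mx.sym.Activation(data=hidden, act_type='relu', name='shared_relu')

    def head(bot):
        # Per-bot output layer: unique name, so its weights are NOT shared.
        fc = mx.sym.FullyConnected(data=hidden, num_hidden=size_dict[bot],
                                   name='pred_%s' % bot)
        return mx.sym.SoftmaxOutput(data=fc, label=label, name='softmax_%s' % bot)

    if bucket_key == MASTER_KEY:
        # Master network holds every head so all parameters are allocated up front.
        sym = mx.sym.Group([head(bot) for bot in sorted(size_dict)])
    else:
        sym = head(bucket_key)
    return sym, ('data',), ('softmax_label',)

mod = mx.mod.BucketingModule(sym_gen=sym_gen, default_bucket_key=MASTER_KEY,
                             context=mx.cpu())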

pruning tensorflow connections and weights (using cifar10 cnn)

I'm using tensorflow to run a cnn for image classification.
I use the TensorFlow cifar10 CNN implementation (tensorflow cifar10).
I want to decrease the number of connections, meaning I want to prune the low-weight connections.
How can I create a new graph (subgraph) without some of the neurons?
As far as I have found, TensorFlow does not allow you to lock/freeze a particular kernel of a particular layer. The only way I've found to do this is to use the tf.assign() function, as shown in
How to freeze/lock weights of one Tensorflow variable (e.g., one CNN kernel of one layer)
It's fairly cave-man, but I've seen no other solution that works. Essentially, you have to .assign() the values every so often as you iterate through the data. Since this approach is so inelegant and brute-force, it's very slow. I do the .assign() every 100 batches.
Someone please post a better solution, and soon!
The cifar10 model you point to (and, for that matter, most models written in TensorFlow) does not model the weights (and hence the connections) of individual neurons directly in the computation graph. For instance, for fully connected layers, all the connections between two layers, say with M neurons in the layer below and N neurons in the layer above, are modeled by one M x N weight matrix. If you want to completely remove a neuron and all of its outgoing connections from the layer below, you can simply slice out an (M-1) x N matrix by removing the relevant row, and multiply it with the corresponding M-1 activations of the remaining neurons.
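For example, a hedged sketch of removing one neuron k from the lower layer (W, activations, and k are assumptions standing in for the layer's weight matrix, the lower layer's outputs, and the index to prune):
import tensorflow as tf

# W: (M, N) weight matrix; activations: (batch, M) outputs of the lower layer.
m = int(W.shape[0])
keep = [i for i in range(m) if i != k]                     # drop neuron k

W_pruned = tf.gather(W, keep, axis=0)                      # (M-1, N)
activations_pruned = tf.gather(activations, keep, axis=1)  # (batch, M-1)
layer_out = tf.matmul(activations_pruned, W_pruned)        # still N outputs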
Another way is to add a mask to control the connections.
The first step involves adding mask and threshold variables to the layers that need to undergo pruning. The variable mask is the same shape as the layer's weight tensor and determines which of the weights participate in the forward execution of the graph.
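In code, the idea looks roughly like this (a minimal sketch; `weights` and `threshold` are assumptions, and the model_pruning library linked below handles this for you):
import tensorflow as tf

# A non-trainable 0/1 mask with the same shape as the weights.
mask = tf.Variable(tf.ones_like(weights), trainable=False)
masked_weights = weights * mask   # pruned connections contribute nothing

# To prune: zero the mask entries whose weight magnitude is below the threshold.
update_mask = mask.assign(tf.cast(tf.abs(weights) >= threshold, weights.dtype))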
There is a pruning implementation under tensorflow/contrib/model_pruning. Hopefully this helps you prune the model quickly.
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
I think Google has an updated answer here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning
Removing pruning nodes from the trained graph:
$ bazel build -c opt contrib/model_pruning:strip_pruning_vars
$ bazel-bin/contrib/model_pruning/strip_pruning_vars --checkpoint_path=/tmp/cifar10_train --output_node_names=softmax_linear/softmax_linear_2 --filename=cifar_pruned.pb
I suppose that cifar_pruned.pb will be smaller, since the pruned (or zero-masked) variables are removed.