Lower validation accuracy on ImageNet when evaluating Keras pre-trained models - tensorflow

I want to work with Keras models pre-trained on ImageNet. The models and information about their performance are here.
I downloaded ILSVRC 2012 (ImageNet) dataset and evaluated ResNet50 on the validation dataset. The top-1 accuracy should be 0.749 but I get 0.68. The top-5 accuracy should be 0.921, mine is 0.884. I also tried VGG16 and MobileNet with similar discrepancies.
I preprocess the images using built-in preprocess_input function (e.g. tensorflow.keras.applications.resnet50.preprocess_input()).
My guess is that the dataset is different. How can I make sure that the validation dataset that I use for evaluation is the same as the one that was used by the authors? Could there be any other reason why I get different results?

Related

Partially restore weights in TF2

I trained a resnet model whose final layer has 3 outputs (multiclass classification). I want to use these model weights to pretrain a regression model which has the exact same architecture except the last layer, which has 1 output.
This seems like a very basic use case, but I do not see how to do this. Restoring a checkpoint gives an error since the architectures are not the same (mismatched shape). All other solutions I have found are either for TF1 (eg https://innerpeace-wu.github.io/2017/12/13/Tensorflow-Restore-partial-weights/) or using Keras .h5 restore.
How can I do this in TF2?

Freezing BERT layers after importing via TF-hub and training them?

I will describe my intention here. I want to import BERT pretrained model via tf-hub function hub.module(bert_url, trainable = True) and utilize it for text classification task. I plan to use a large corpus to fine-tune weights of BERT as well as a few dense layers whose inputs are the BERT outputs. I would then like to freeze layers of BERT and train only the dense layers following BERT. How can I do this efficiently?
You mention Hub's TF1 API hub.Module, so I suppose you are writing TF1 code and using the TF1-compatible Hub assets google/bert/..., such as https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1
Are you going to have separate run of your program for the two phases of training? If so, maybe you can just drop trainable=True from the hub.Module call in the second run. This doesn't affect variable names, so you can restore the training result from the first run, including BERT's adjusted weights. (To be clear: the pre-trained weights shipped with the hub.Module are only used for initialization at the very start of training; restoring a checkpoint overrides them.)

Add custom classes to pre-trained data-set

I use the already trained(pre-trained) data-set for object detection using yolo+tensorflow.
My inference results are great but now I want to "add" a new class to pre-trained data-set.
There are 80 classes in pre-trained data-set how can I add my custom classes and made it 81 or 82 in total?
Inference git-hub "https://github.com/thtrieu/darkflow".
In case of transfer learning, pre-trained weights on famous datasets like 'Imagenet', 'fashion-mnist' etc are used. These datasets have defined number of classes and labels which may or may not be same as our dataset. The best practice is to add layers above the output layer of the pre-trained model output. For example in keras:
from tensorflow.keras.applications import mobilenet
from tensorflow.keras.layers import Dense, Flatten
output = mobilenet(include_top=False)
flatten = Flatten()(output)
predictions = Dense(number_of_classes, activation='softmax')(layer)
In this case you need to train(or better call it fine tune) the model using your dataset. The mobilenet network will use pretrained weights and the last layer will be only trained as per your dataset with the your defined number of classes.
You may also use:
from tensorflow.keras.applications import mobilenet
preds = mobilenet(include_top=Flase, classes=number_of_classes, weights='imagenet')
for more information you can refer: keras-applications
and these blog1, blog2
If you have already trained your model for 80 classes and need to add another class, then it would be better to re-train the model starting from previously saved checkpoints.(The network architecture should be designed for the total number of classes since the beginning cause at the output layer you will have neurons equal to the number of classes, if that is not the case you cannot add other class to the data as network has not been designed for it.) This will make use of initial training done on previous classes. The data that you are using for re-training, should now contain all the classes (including all the previous class and the new classes that you want to add). It's similar to initializing the weights from last trained checkpoint(on 80 classes) and then again train using more data (including all the classes 80 + more that you want to add) allowing back propagation through all the layers.

TensorFlow Graph to Keras Model?

Is it possible to define a graph in native TensorFlow and then convert this graph to a Keras model?
My intention is simply combining (for me) the best of the two worlds.
I really like the Keras model API for prototyping and new experiments, i.e. using the awesome multi_gpu_model(model, gpus=4) for training with multiple GPUs, saving/loading weights or whole models with oneliners, all the convenience functions like .fit(), .predict(), and others.
However, I prefer to define my model in native TensorFlow. Context managers in TF are awesome and, in my opinion, it is much easier to implement stuff like GANs with them:
with tf.variable_scope("Generator"):
# define some layers
with tf.variable_scope("Discriminator"):
# define some layers
# model losses
G_train_op = ...AdamOptimizer(...)
.minimize(gloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Generator")
D_train_op = ...AdamOptimizer(...)
.minimize(dloss,
var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="Discriminator")
Another bonus is structuring the graph this way. In TensorBoard debugging complicated native Keras models are hell since they are not structured at all. With heavy use of variable scopes in native TF you can "disentangle" the graph and look at a very structured version of a complicated model for debugging.
By utilizing this I can directly setup custom loss function and do not have to freeze anything in every training iteration since TF will only update the weights in the correct scope, which is (at least in my opinion) far easier than the Keras solution to loop over all the existing layers and set .trainable = False.
TL;DR:
Long story short: I like the direct access to everything in TF, but most of the time a simple Keras model is sufficient for training, inference, ... later on. The model API is much easier and more convenient in Keras.
Hence, I would prefer to set up a graph in native TF and convert it to Keras for training, evaluation, and so on. Is there any way to do this?
I don't think it is possible to create a generic automated converter for any TF graph, that will come up with a meaningful set of layers, with proper namings etc. Just because graphs are more flexible than a sequence of Keras layers.
However, you can wrap your model with the Lambda layer. Build your model inside a function, wrap it with Lambda and you have it in Keras:
def model_fn(x):
layer_1 = tf.layers.dense(x, 100)
layer_2 = tf.layers.dense(layer_1, 100)
out_layer = tf.layers.dense(layer_2, num_classes)
return out_layer
model.add(Lambda(model_fn))
That is what sometimes happens when you use multi_gpu_model: You come up with three layers: Input, model, and Output.
Keras Apologetics
However, integration between TensorFlow and Keras can be much more tighter and meaningful. See this tutorial for use cases.
For instance, variable scopes can be used pretty much like in TensorFlow:
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
with tf.name_scope('block1'):
y = LSTM(32, name='mylstm')(x)
The same for manual device placement:
with tf.device('/gpu:0'):
x = tf.placeholder(tf.float32, shape=(None, 20, 64))
y = LSTM(32)(x) # all ops / variables in the LSTM layer will live on GPU:0
Custom losses are discussed here: Keras: clean implementation for multiple outputs and custom loss functions?
This is how my model defined in Keras looks in Tensorboard:
So, Keras is indeed only a simplified frontend to TensorFlow so you can mix them quite flexibly. I would recommend you to inspect source code of Keras model zoo for clever solutions and patterns that allows you to build complex models using clean API of Keras.
You can insert TensorFlow code directly into your Keras model or training pipeline! Since mid-2017, Keras has fully adopted and integrated into TensorFlow. This article goes into more detail.
This means that your TensorFlow model is already a Keras model and vice versa. You can develop in Keras and switch to TensorFlow whenever you need to. TensorFlow code will work with Keras APIs, including Keras APIs for training, inference and saving your model.

Tensorflow embeddings

I know what embeddings are and how they are trained. Precisely, while referring to the tensorflow's documentation, I came across two different articles. I wish to know what exactly is the difference between them.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they have explicitly trained embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later on save the learnt embeddings as a numpy object and use the
tf.nn.embedding_lookup() function while training an LSTM network.
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable(“word_embeddings”,
[vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training embeddings sections. My doubt is: does the gather function train the embeddings automatically? I am not sure since this op ran very fast on my pc.
Generally: What is the right way to convert words into vectors (link1 or link2) in tensorflow for training a seq2seq model? Also, how to train the embeddings for a seq2seq dataset, since the data is in the form of separate sequences for my task unlike (a continuous sequence of words refer: link 1 dataset)
Alright! anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through the process of exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or Any other encoder-decoder models, we use the second approach where the embedding matrix gets tuned appropriately while the model gets trained.