tf.Estimator.predict() issue when using a Tensorflow Hub module as the basis of a custom tf.Estimator - tensorflow

I am trying to create a custom tensorflow tf.Estimator. In the model_fn passed to the tf.Estimator, I am importing the Inception_V3 module from Tensorflow Hub.
Problem: After fine-tuning the model (using tf.Estimator.train), the results obtained using tf.Estimator.predict are not as good as expected based on tf.Estimator.evaluate (This is for a regression problem.)
I am new to Tensorflow and Tensorflow Hub, so I could be making lots of rookie mistakes.
When I run tf.Estimator.evaluate() on my validation data, the reported loss is in the same ball park as the loss after tf.Estimator.train() was used to train the model. The problem comes in when I try to use tf.Estimator.predict() on the same validation data.
tf.Estimator.predict() returns predictions which I then use to calculate the same loss metric (mean_squared_error) which is computed by tf.Estimator.evaluate(). I am using the same set of data to feed to the predict function as the evaluate function. But I do not get the same result for the mean_squared_error -- not remotely close! (The mse I calculate from predict is much worse.)
Here is what I have done (edited out some details)...
Define a model_fn with Tensorflow Hub module. Then call the tf.Estimator functions to train, evaluate and predict.
def my_model_fun(features, labels, mode, params):
# Load InceptionV3 Module from Tensorflow Hub
iv3_module =hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",trainable=True, tags={'train'})
# Gather the variables for fine-tuning
var_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,scope='CustomeLayer')
var_list.extend(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,scope='module/InceptionV3/Mixed_5b'))
predictions = {"the_prediction" : final_output}
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
# Define loss, optimizer, and evaluation metrics
loss = tf.losses.mean_squared_error(labels=labels, predictions=final_output)
optimizer =tf.train.AdadeltaOptimizer(learning_rate=learn_rate).minimize(loss,
var_list=var_list, global_step=tf.train.get_global_step())
rms_error = tf.metrics.root_mean_squared_error(labels=labels,predictions=predictions["the_prediction"])
eval_metric_ops = {"rms_error": rms_error}
if mode == tf.estimator.ModeKeys.TRAIN:
return tf.estimator.EstimatorSpec(mode=mode, loss=loss,train_op=optimizer)
if mode == tf.estimator.ModeKeys.EVAL:
tf.summary.scalar('rms_error', rms_error)
return tf.estimator.EstimatorSpec(mode=mode, loss=loss,eval_metric_ops=eval_metric_ops)
iv3_estimator = tf.estimator.Estimator(model_fn=iv3_model_fn)
iv3_estimator.train(input_fn=train_input_fn, steps=TRAIN_STEPS)
iv3_estimator.evaluate(input_fn=val_input_fn)
ii =0
for ans in iv3_estimator.predict(input_fn=test_input_fn):
sqErr = np.square(label[ii] - ans['the_prediction'][0])
totalSqErr += sqErr
ii += 1
mse = totalSqErr/ii
I expect that the mse loss reported by tf.Estimator.evaluate() should be the same as the when I calculate mse from the known labels and the output of tf.Estimator.predict()
Do I need to import the Tensorflow Hub model differently when I use predict? (use trainable=False in the call to hub.Module()?
Are the weights obtained from training being used when tf.Estimator.evaluate() runs, but not when tf.Estimator.predict()- runs?
other?

There's a few things that seem to be missing from the code snippet. How is final_output computed from iv3_module? Also, mean squared error is an unusual choice of loss function for a classification problem; the common approach is to pass image features from the module into a a linear output layer with scores for each class ("logits") and a "softmax cross-entropy loss". For an explanation of these terms, you can review online tutorials like https://developers.google.com/machine-learning/crash-course/ (all the way to multi-class neural nets).
Regarding TF-Hub technicalities:
The variables of a Hub module are automatically added to the GLOBAL_VARIABLES and TRAINABLE_VARIABLES collections (if trainable=True, as you already do). No manual extension of those collections should be needed.
hub.Module(..., tags=...) should be set to {"train"} for mode==TRAIN and set to None or the empty set otherwise.
In general, it's useful to get a solution working end-to-end for your problem without fine-tuning as a baseline, and then add fine-tuning.

Related

Fine tuning embedding weights within my Tensorflow hub model for an unsupervised learning problem

Tensorflow Version: 1.15
I'm currently using the Universal Sentence Encoder embeddings for pairwise similarity. I'd like to fine-tune the Universal Sentence to improve embeddings quality and I've gotten to this point:
module = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2", trainable=True)
variables_names = [v.name for v in tf.trainable_variables()]
with tf.Session() as sess111:
init = tf.global_variables_initializer()
sess111.run(init)
values = sess111.run(variables_names)
#for k, v in zip(variables_names, values):
# print ("Variable: ", k)
print(values[0])
module_embeds = module(sentences)
values = sess111.run(variables_names)
print(values[0])
My first thought was to pass sentences through the USE module thinking it would update the trainable variables within a tf session which wasn't the case. So at this point, I have access to each of the trainable variables but I'm not sure how to proceed. Reviewing this tensorflow hub issue, they mention the following strategy:
Define a loss, and add an optimizer for that loss, then running the optimizer will update the trained weights of the embed module.
I'm entirely sure what the best way to do this would be for my use case. I've seen this notebook which retrains a classifier but I can't grasp how we end up extracting tuned weights that can be used to generate new embeddings.
Any help or guidance would be much appreciated.

Greedy Initialization in Keras

I'm currently coding up a feed-forward network in Tensorflow and I want to make a custom initializer which initializes each layer using a (externally defined) function of the point with the highest MSE.
Pseudo-code:
Given any input data being passed to my current layer:
- Identify datapoint *X* with highest MSE from target
- Initialize at *f(X)*
Sorry if I posted no pseudocode but I have absolutely no idea how to go about it in Keras (Python not R).
Since you want your Weights to be updated with respect to Maximized Loss rather than Minimized Loss (from the comment), it can be achieved by passing -loss, as shown in the code below:
Tensorflow Version 2.x:
loss = tf.keras.losses.MSE()
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss = -loss)
Tensorflow Version 1.x:
loss = tf.reduce_mean(tf.keras.losses.MSE(y_true, y_pred))
trainm = tf.train.GradientDescentOptimizer(0.01).minimize(-loss)
Hope this helps.

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the keras api in tensorflow (I am using the keras api provided by TF 1.x), and am having issue writing a custom loss function, to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it sucesfully with tensorflow estimators.
My current implementations builds a tf Dataset from a large database of tf Records, where the features is a dictionary containing the images and arrays of matching points. Before I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use that is to calculate the loss within the model, by means of Lambda layers. (When the loss is independent from the true data, for instance, and the model doesn't really have an output to be compared)
In a functional API model:
def loss_calc(x):
loss_input_1, loss_input_2 = x #arbirtray inputs, you choose
#according to what you gave to the Lambda layer
#here you use some external data that doesn't relate to the samples
externalData = K.constant(external_numpy_data)
#calculate the loss
return the loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
return y_pred #where y_pred is the loss itself, the output of the model above
model.compile(loss = dummy_loss, ....)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it, is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exaclty the same way I did in the first answer. And pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.

Implementing stochastic forward passes in part of a neural network in Keras?

my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
# Example code.
# Arguments
# model: the training model
regression = model.outputs[0]
classification = model.outputs[1]
predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
outputs = # TODO: do some processing on avg_predictions
return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand, you're trying to average the weight updates for a single sample while Dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the Dropout is different for each sample in a batch. Since, backpropagation averages the weight updates anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch-size of one. You could update a global counter inside your custom loss function and return non-zero loss only when you've averaged them the way you want it. I don't know if this would work, it's just an idea.

Tensorflow example seems broken, function returns different graph depending on mode. How can this work?

Reading through the tensorflow tutorials, specifically https://www.tensorflow.org/tutorials/layers, there is a function that does something to the effect of:
def some_model(mode):
# some stuff here
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
# some other stuff here
if mode == tf.estimator.ModeKeys.TRAIN:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
loss=loss,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
# return something else dependent on condition
The above does not make sense to me in that if you build the model with mode set to predict, does it not initialize the weights from scratch/some initializer meaning that the results at this step are purely arbitrary?
if so why do this?
surely you should return a model suitable for training that can predict as well as train?
What am I missing?
The model function only to provide a graph tailored to the specific usage required (i.e., the mode you pass).
The idea behind this function is that you plug in your graph only what is necessary for the specific mode:
Prediction: You only need your model (from input to predictions output).
Evaluation: On top of what provided by prediction, you need some kind of accuracy measures to evaluate on.
Training: On top of what is provided by Prediction, you need to compute the loss, attach an optimizer and get a train operation.
In all of this, the part of the graph you actually learn on (i.e., your network's weights) is always the same, because it is always contained between your input and output tensors. The variables in that graph section are loaded from a checkpoint file, if available, or reinitialized otherwise.
The other nodes attached to the graph are what actually changes between different operation modes, but none of those nodes is necessary outside their specific phase (e.g., once you're done with training, you can dispose of loss computation, optimizer and all the boilerplate needed around it).