Unable to save model with tensorflow 2.0.0 beta1 - tensorflow

I have tried all the options described in the documentation but none of them allowed me to save my model in tensorflow 2.0.0 beta1. I've also tried to upgrade to the (also unstable) TF2-RC but that ruined even the code I had working in beta so I quickly rolled back for now to beta.
See a minimal reproduction code below.
What I have tried:
model.save("mymodel.h5")
NotImplementedError: Saving the model to HDF5 format requires the
model to be a Functional model or a Sequential model. It does not work
for subclassed models, because such models are defined via the body of
a Python method, which isn't safely serializable. Consider saving to
the Tensorflow SavedModel format (by setting save_format="tf") or
using save_weights.
model.save("mymodel", format='tf')
ValueError: Model <main.CVAE object at 0x7f1cac2e7c50> cannot be
saved because the input shapes have not been set. Usually, input
shapes are automatically determined from calling .fit() or .predict().
To manually set the shapes, call model._set_inputs(inputs).
3.
model._set_input(input_sample)
model.save("mymodel", format='tf')
AssertionError: tf.saved_model.save is not supported inside a traced
#tf.function. Move the call to the outer eagerly-executed context.
And this is where I am stuck now because it gives me no reasonable hint whatsoever. That's because I am NOT calling the save() function from a #tf.function, I'm already calling it from the outermost scope possible. In fact, I have no #tf.function at all in this minimal reproduction script below and still getting the same error.
So I really have no idea how to save my model, I've tried every options and they all throw errors and provide no hints.
The minimal reproduction example below works fine if you set save_model=False and it reproduces the error when save_model=True.
It may seem unnecessary in this simplified auto-encoder code example to use a subclassed model but I have lots of custom functions added to it in my original VAE code that I need it for.
Code:
import tensorflow as tf
save_model = True
learning_rate = 1e-4
BATCH_SIZE = 100
TEST_BATCH_SIZE = 10
color_channels = 1
imsize = 28
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images[:5000, ::]
test_images = train_images[:1000, ::]
train_images = train_images.reshape(-1, imsize, imsize, 1).astype('float32')
test_images = test_images.reshape(-1, imsize, imsize, 1).astype('float32')
train_images /= 255.
test_images /= 255.
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).batch(TEST_BATCH_SIZE)
class AE(tf.keras.Model):
def __init__(self):
super(AE, self).__init__()
self.network = tf.keras.Sequential([
tf.keras.layers.InputLayer(input_shape=(imsize, imsize, color_channels)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(imsize**2 * color_channels),
tf.keras.layers.Reshape(target_shape=(imsize, imsize, color_channels)),
])
def decode(self, input):
logits = self.network(input)
return logits
optimizer = tf.keras.optimizers.Adam(learning_rate)
model = AE()
def compute_loss(data):
logits = model.decode(data)
loss = tf.reduce_mean(tf.losses.mean_squared_error(logits, data))
return loss
def train_step(data):
with tf.GradientTape() as tape:
loss = compute_loss(data)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
return loss, 0
def test_step(data):
loss = compute_loss(data)
return loss
input_shape_set = False
epoch = 0
epochs = 20
for epoch in range(epochs):
for train_x in train_dataset:
train_step(train_x)
if epoch % 1 == 0:
loss = 0.0
num_batches = 0
for test_x in test_dataset:
loss += test_step(test_x)
num_batches += 1
loss /= num_batches
print("Epoch: {}, Loss: {}".format(epoch, loss))
if save_model:
print("Saving model...")
if not input_shape_set:
# Note: Why set input shape manually and why here:
# 1. If I do not set input shape manually: ValueError: Model <main.CVAE object at 0x7f1cac2e7c50> cannot be saved because the input shapes have not been set. Usually, input shapes are automatically determined from calling .fit() or .predict(). To manually set the shapes, call model._set_inputs(inputs).
# 2. If I set input shape manually BEFORE the first actual train step, I get: RuntimeError: Attempting to capture an EagerTensor without building a function.
model._set_inputs(train_dataset.__iter__().next())
input_shape_set = True
# Note: Why choose tf format: model.save('MNIST/Models/model.h5') will return NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.
model.save('MNIST/Models/model', save_format='tf')

I have tried the same minimal reproduction example in tensorflow-gpu 2.0.0-rc0 and the error was more revealing than what the beta version gave me. The error in RC says:
NotImplementedError: When subclassing the Model class, you should
implement a call method.
This got me read through https://www.tensorflow.org/beta/guide/keras/custom_layers_and_models where I found examples of how to do subclassing in TF2 in a way that allows saving. I was able to resolve the error and have the model saved by replacing my 'decode' method by 'call' in the above example (although this will be more complicated with my actual code where I had various methods defined for the class). This solved the error both in beta and in rc. Strangely, the training (or the saving) got also much faster in rc.

You should change two things:
Change the decode method to call, as you pointed out
As your model is of type Sequential, and not built inside the class, you want to call the save method on the self.network attribute of the model, i.e.,
model.network.save('mymodel.h5')
alternatively, to keep things more standard, you can implement this method inside the AE class, as follows:
def save(self, save_dir):
self.network.save(save_dir)
Cheers mate

Related

MLFLOW on Databricks - Cannot log a Keras model as a mlflow.pyfunc model. Get TypeError: cannot pickle 'weakref' object

Hi all: this is one of my first posts on Stackoverflow - so apologies in advance if i'm not conforming to certain standards!
I'm having trouble saving my Keras model as a mlflow.pyfunc model as it's giving me a "cannot pickle a 'weakref' object when I try to log it.
So why am i saving my Keras model as a pyfunc model object in the first place? This is because I want to override the default predict method and output something custom. I also want to do some pre-processing steps on the X_test or new data by encoding it with a tf.keras.StringLookup and then invert it back to get the original categorical variable class. For this reason, I was advised by Databricks that the mlflow.pyfunc flavor is the best way to go for these types of use-cases
The Keras model works just fine and i'm able to log it using mlflow.keras.log_model. But it fails when i try to wrap it inside a cutomer "KerasWrapper" class.
Here are some snippets of my code. For the purpose of debugging, the current predict method in the custom class is just the default. I simplified it to help debug, but obviously I haven't been able to resolve it.
I would be extremely grateful for any help. Thanks in advance!
ALL CODE ON AZURE DATABRICKS
Custom mlflow.pyfunc class
class KerasWrapper(mlflow.pyfunc.PythonModel):
def __init__(self, keras_model, labelEncoder, labelDecoder, n):
self.keras_model = keras_model
self.labelEncoder = labelEncoder
self.labelDecoder = labelDecoder
self.topn = n
def load_context(self, context):
self.keras_model = mlflow.keras.load_model(model_uri=context.artifacts[self.keras_model], compile=False)
def predict(self, context, input_data):
scores = self.keras_model.predict(input_data)
return scores
My Keras Deep Learning Model (this works fine by the way)
def build_model(vocab_size, steps, drop_embed, n_dim, encoder, modelType):
model = None
i = Input(shape=(None,), dtype="int64")
#embedding layer
e = Embedding(vocab_size, 16)(i)
s = SpatialDropout1D(drop_embed)(e)
x = Conv1D(256, steps, activation='relu')(s)
x = GlobalMaxPooling1D()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.2)(x)
#output layer
x = Dense(vocab_size, activation='softmax')(x)
model = Model(i, x)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
return model
MLFLOW Section
with mlflow.start_run(run_name=runName):
mlflow.tensorflow.autolog()
#Build the model, compile and train on the training set
#signature: build_model(vocab_size, steps, drop_embed, n_dim, encoder, modelType):
keras_model = build_model((vocab_size + 1), timeSteps, drop_embed, embedding_dimensions, encoder, modelType)
keras_model.fit(X_train_encoded, y_train_encoded, epochs=epochs, verbose=1, batch_size=32, use_multiprocessing = True,
validation_data=(X_test_encoded, y_test_encoded))
# Log the model parameters used for this run.
mlflow.log_param("numofActionsinWorkflow", numofActionsinWf)
mlflow.log_param("timeSteps", timeSteps)
#wrap it up in a pyfunc model
wrappedModel = KerasWrapper(keras_model, encoder, decoder, bestActionCount)
# Create a model signature using the tensor input to store in the MLflow model registry
signature = infer_signature(X_test_encoded, wrappedModel.predict(None, X_test_encoded))
# Let's check out how it looks
print(signature)
# Create an input example to store in the MLflow model registry
input_example = np.expand_dims(X_train[17], axis=0)
# The necessary dependencies are added to a conda.yaml file which is logged along with the model.
model_env = mlflow.pyfunc.get_default_conda_env()
# Record specific additional dependencies required by the serving model
model_env['dependencies'][-1]['pip'] += [
f'tensorflow=={tf.__version__}',
f'mlflow=={mlflow.__version__}',
f'sklearn=={sklearn.__version__}',
f'cloudpickle=={cloudpickle.__version__}',
]
#log the model to experiment
#mlflow.keras.log_model(keras_model, artifact_path = runName, signature=signature, input_example=input_example, conda_env = model_env)
wrapped_model_path = runName
if (os.path.exists(wrapped_model_path)):
shutil.rmtree(wrapped_model_path)
#Log model as pyfunc model
mlflow.pyfunc.log_model(runName, python_model=wrappedModel, signature=signature, input_example=input_example, conda_env = model_env)
#return the run ID for model registration
run_id = mlflow.active_run().info.run_id
mlflow.end_run()
Here is the error that i receive

Quantization aware training worse than post-quantization

I want to use quantization aware training to quantize my model to int8. Unfortunately, I cant simply quantize the entire model, since my first layer is a batch normalization (after InputLayer), so I need to use a custom quantizationConfig for that layer. My problem is that I have a accuracy drop of around 4%, while using post-quantization the drop is only 2%. Is there anything wrong in the following code? If not, do you have any ideas why QAT is that worse in this case?
class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
def get_weights_and_quantizers(self, layer):
return [(layer.weights[i], LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)) for i in range(2)]
def get_activations_and_quantizers(self, layer):
return []
def set_quantize_weights(self, layer, quantize_weights):
layer.weights[0] = quantize_weights[0]
layer.weights[1] = quantize_weights[1]
def set_quantize_activations(self, layer, quantize_activations):
pass
def get_output_quantizers(self, layer):
return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]
def get_config(self):
return {}
def apply_quantization_to_dense(self,layer):
if layer != self.base_model.layers[1]:
return tfmot.quantization.keras.quantize_annotate_layer(layer)
else:
return tfmot.quantization.keras.quantize_annotate_layer(
layer,
quantize_config=DefaultDenseQuantizeConfig())
annotated_model = tf.keras.models.clone_model(
self.base_model,
clone_function=self.apply_quantization_to_dense,
)
q_aware_model = quantize_annotate_model(annotated_model)
with quantize_scope(
{'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig,}):
q_aware_model = tfmot.quantization.keras.quantize_apply(q_aware_model)
q_aware_model.compile(optimizer=self.optimizer, loss=self.loss, loss_weights=self.loss_weights)
q_aware_model.fit(self.train_dataset,
steps_per_epoch= epochs,
epochs=self.quantization_dict["quantization_epochs"],
callbacks=callbacks, )
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
It is difficult to say anything conclusive without having the actual model.
I would suggest to start checking that the quantization configuration is applied as you expect on your model. That includes looking at the per-layer output when calling summary() on the annotated model. If needed, it can be inspected further with tools such as Netron.

Tensorflow - Keras: Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset

While training a model in keras / tensorflow:
The code snippet:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
I got the below error / warning:
Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
2020-12-16 17:12:20.885741: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
2020-12-16 17:12:20.905570: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3593105000 Hz
Epoch 1/40
Any help is appreciated.
The error message here has newly arrived in tensorflow 2.4.0. While the error hints at a solution, it presupposes that your data is an object of the type tf.data.Dataset. There was previously no strict requirement to have your input data in this form (e.g. numpy arrays were fine), except now it seems to be a requirement with the distribute strategies (e.g tf.distribute.MirroredStrategy()). In any event, there does not appear to be a way to avoid tensorflow's latest console-vomit without wrapping your data in a Dataset object..
So supposing your current code looks something like this:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
model = ... # awesome model definition
train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)
batch_size = 32
model.fit(train_x, train_y, batch_size=batch_size, validation_data=(val_x, val_y))
It needs to be changed to look like this:
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
with strategy.scope():
model = ... # awesome model definition
train_x, train_y = np.array(...), np.array(...)
val_x, val_y = np.array(...), np.array(...)
# Wrap data in Dataset objects.
train_data = tf.data.Dataset.from_tensor_slices((train_x, train_y))
val_data = tf.data.Dataset.from_tensor_slices((val_x, val_y))
# The batch size must now be set on the Dataset objects.
batch_size = 32
train_data = train_data.batch(batch_size)
val_data = val_data.batch(batch_size)
# Disable AutoShard.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_data = train_data.with_options(options)
val_data = val_data.with_options(options)
model.fit(train_data, validation_data=val_data)
Note that if you don't set the batch size on the Dataset object, you'll get a cryptic error like this:
File "/usr/lib/python3.8/site-packages/tensorflow/python/data/experimental/ops/distribute.py", line 496, in get_static_batch_dim
return output_shape.dims[0].value
IndexError: list index out of range

Adding custom metric Keras Subclassing API

I'm following the section "Losses and Metrics Based on Model Internals" on chapter 12 of "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition - Aurélien Geron", in which he shows how to add custom losses and metrics that do not depend on labels and predictions.
To illustrate this, we add a custom "reconstruction loss" by adding a layer on top of the upper hidden layer which should reproduce the input. The loss is the mean squared difference betweeen the reconstruction loss and the inputs.
He shows the code for adding the custom loss, which works nicely, but even following his description I cannot make add the metric, since it raises `ValueError". He says:
Similarly, you can add a custom metric based on model internals by
computing it in any way you want, as long as the result is the output of a
metric object. For example, you can create a keras.metrics.Mean object
in the constructor, then call it in the call() method, passing it the
recon_loss, and finally add it to the model by calling the model’s
add_metric() method.
This is the code(I have added #MINE for the lines I have added myself)
import tensorflow as tf
from tensorflow import keras
class ReconstructingRegressor(keras.models.Model):
def __init__(self, output_dim, **kwargs):
super().__init__(**kwargs)
self.hidden = [keras.layers.Dense(30, activation="selu",
kernel_initializer="lecun_normal")
for _ in range(5)]
self.out = keras.layers.Dense(output_dim)
self.reconstruction_mean = keras.metrics.Mean(name="reconstruction_error") #MINE
def build(self, batch_input_shape):
n_inputs = batch_input_shape[-1]
self.reconstruct = keras.layers.Dense(n_inputs)
super().build(batch_input_shape)
def call(self, inputs, training=None):
Z = inputs
for layer in self.hidden:
Z = layer(Z)
reconstruction = self.reconstruct(Z)
recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
self.add_loss(0.05 * recon_loss)
if training: #MINE
result = self.reconstruction_mean(recon_loss) #MINE
else: #MINE
result = 0. #MINE, I have also tried different things here,
#but the help showed a similar sample to this.
self.add_metric(result, name="foo") #MINE
return self.out(Z)
Then compiling and fitting the model:
training_set_size=10
X_dummy = np.random.randn(training_set_size, 8)
y_dummy = np.random.randn(training_set_size, 1)
model = ReconstructingRegressor(1)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_dummy, y_dummy, epochs=2)
Which throws:
ValueError: in converted code:
<ipython-input-296-878bdeb30546>:26 call *
self.add_metric(result, name="foo") #MINE
C:\Users\Kique\Anaconda3\envs\piz3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py:1147 add_metric
self._symbolic_add_metric(value, aggregation, name)
C:\Users\Kique\Anaconda3\envs\piz3\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py:1867 _symbolic_add_metric
'We do not support adding an aggregated metric result tensor that '
ValueError: We do not support adding an aggregated metric result tensor that is not the output of a `tf.keras.metrics.Metric` metric instance. Without having access to the metric instance we cannot reset the state of a metric after every epoch during training. You can create a `tf.keras.metrics.Metric` instance and pass the result here or pass an un-aggregated result with `aggregation` parameter set as `mean`. For example: `self.add_metric(tf.reduce_sum(inputs), name='mean_activation', aggregation='mean')`
Having read that, I tried similar things to solve that issue but it just led to different errors. How can I solve this? What is the "correct" way to do this?
I'm using conda on Windows, with tensorflow-gpu 2.1.0 installed.
The problem is just right here:
def call(self, inputs, training=None):
Z = inputs
for layer in self.hidden:
Z = layer(Z)
reconstruction = self.reconstruct(Z)
recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
self.add_loss(0.05 * recon_loss)
if training:
result = self.reconstruction_mean(recon_loss)
else:
result = 0.#<---Here!
self.add_metric(result, name="foo")
return self.out(Z)
The error says that add_metric only gets a metric derived from tf.keras.metrics.Metric but 0 is a scalar, not a metric type.
My proposed solution is to simply do that:
def call(self, inputs, training=None):
Z = inputs
for layer in self.hidden:
Z = layer(Z)
reconstruction = self.reconstruct(Z)
recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
self.add_loss(0.05 * recon_loss)
if training:
result = self.reconstruction_mean(recon_loss)
self.add_metric(result, name="foo")
return self.out(Z)
This way, your mean reconstruction_error will be shown only in training time.
Since you work with eager mode, you should create your layer with dynamic=True as below:
model = ReconstructingRegressor(1,dynamic=True)
model.compile(loss="mse", optimizer="nadam")
history = model.fit(X_dummy, y_dummy, epochs=2, batch_size=10)
P.S - pay attention, that when calling model.fit or model.evaluate you should also make sure that the batch size divides your train set (since this is a stateful network). So, call those function like this: model.fit(X_dummy, y_dummy, epochs=2, batch_size=10) or model.evaluate(X_dummy,y_dummy, batch_size=10).
Good Luck!

using Estimator interface for inference with pre-trained tensorflow object detection model

I'm trying to load a pre-trained tensorflow object detection model from the Tensorflow Object Detection repo as a tf.estimator.Estimator and use it to make predictions.
I'm able to load the model and run inference using Estimator.predict(), however the output is garbage. Other methods of loading the model, e.g. as a Predictor, and running inference work fine.
Any help properly loading a model as an Estimator calling predict() would be much appreciated. My current code:
Load and prepare image
def load_image_into_numpy_array(image):
(im_width, im_height) = image.size
return np.array(list(image.getdata())).reshape((im_height, im_width, 3)).astype(np.uint8)
image_url = 'https://i.imgur.com/rRHusZq.jpg'
# Load image
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
# Format original image size
im_size_orig = np.array(list(image.size) + [1])
im_size_orig = np.expand_dims(im_size_orig, axis=0)
im_size_orig = np.int32(im_size_orig)
# Resize image
image = image.resize((np.array(image.size) / 4).astype(int))
# Format image
image_np = load_image_into_numpy_array(image)
image_np_expanded = np.expand_dims(image_np, axis=0)
image_np_expanded = np.float32(image_np_expanded)
# Stick into feature dict
x = {'image': image_np_expanded, 'true_image_shape': im_size_orig}
# Stick into input function
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
x=x,
y=None,
shuffle=False,
batch_size=128,
queue_capacity=1000,
num_epochs=1,
num_threads=1,
)
Side note:
train_and_eval_dict also seems to contain an input_fn for prediction
train_and_eval_dict['predict_input_fn']
However this actually returns a tf.estimator.export.ServingInputReceiver, which I'm not sure what to do with. This could potentially be the source of my problems as there's a fair bit of pre-processing involved before the model actually sees the image.
Load model as Estimator
Model downloaded from TF Model Zoo here, code to load model adapted from here.
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/'
pipeline_config_path = os.path.join(model_dir, 'pipeline.config')
config = tf.estimator.RunConfig(model_dir=model_dir)
train_and_eval_dict = model_lib.create_estimator_and_inputs(
run_config=config,
hparams=model_hparams.create_hparams(None),
pipeline_config_path=pipeline_config_path,
train_steps=None,
sample_1_of_n_eval_examples=1,
sample_1_of_n_eval_on_train_examples=(5))
estimator = train_and_eval_dict['estimator']
Run inference
output_dict1 = estimator.predict(predict_input_fn)
This prints out some log messages, one of which is:
INFO:tensorflow:Restoring parameters from ./pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt
So it seems like pre-trained weights are getting loaded. However results look like:
Load same model as a Predictor
from tensorflow.contrib import predictor
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28'
saved_model_dir = os.path.join(model_dir, 'saved_model')
predict_fn = predictor.from_saved_model(saved_model_dir)
Run inference
output_dict2 = predict_fn({'inputs': image_np_expanded})
Results look good:
When you load the model as an estimator and from a checkpoint file, here is the restore function associated with ssd models. From ssd_meta_arch.py
def restore_map(self,
fine_tune_checkpoint_type='detection',
load_all_detection_checkpoint_vars=False):
"""Returns a map of variables to load from a foreign checkpoint.
See parent class for details.
Args:
fine_tune_checkpoint_type: whether to restore from a full detection
checkpoint (with compatible variable names) or to restore from a
classification checkpoint for initialization prior to training.
Valid values: `detection`, `classification`. Default 'detection'.
load_all_detection_checkpoint_vars: whether to load all variables (when
`fine_tune_checkpoint_type='detection'`). If False, only variables
within the appropriate scopes are included. Default False.
Returns:
A dict mapping variable names (to load from a checkpoint) to variables in
the model graph.
Raises:
ValueError: if fine_tune_checkpoint_type is neither `classification`
nor `detection`.
"""
if fine_tune_checkpoint_type not in ['detection', 'classification']:
raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
fine_tune_checkpoint_type))
if fine_tune_checkpoint_type == 'classification':
return self._feature_extractor.restore_from_classification_checkpoint_fn(
self._extract_features_scope)
if fine_tune_checkpoint_type == 'detection':
variables_to_restore = {}
for variable in tf.global_variables():
var_name = variable.op.name
if load_all_detection_checkpoint_vars:
variables_to_restore[var_name] = variable
else:
if var_name.startswith(self._extract_features_scope):
variables_to_restore[var_name] = variable
return variables_to_restore
As you can see even if the config file sets from_detection_checkpoint: True, only the variables in the feature extractor scope will be restored. To restore all the variables, you will have to set
load_all_detection_checkpoint_vars: True
in the config file.
So, the above situation is quite clear. When load the model as an Estimator, only the variables from feature extractor scope will be restored, and the predictors's scope weights are not restored, the estimator would obviously give random predictions.
When load the model as a predictor, all weights are loaded thus the predictions are reasonable.