Using Beholder plugin with tf.estimator.Estimator


The Beholder plugin for TensorBoard allows visualisation of all trainable variables (with sensible restrictions for massively deep networks).
My problem is that I am running my training using the tf.estimator.Estimator class and it appears that the Beholder plugin does not play nicely with the Estimator API.
My code looks like this:
# tf.data input pipeline setup
def dataset_input_fn(train=True):
    filenames = ... # training files
    if not train:
        filenames = ... # test files
    dataset = tf.data.TFRecordDataset(filenames, "GZIP")
    # ... and so on until ...
    iterator = batched_dataset.make_one_shot_iterator()
    return iterator.get_next()
def train_input_fn():
    return dataset_input_fn(train=True)
def test_input_fn():
    return dataset_input_fn(train=False)
# model function
def cnn(features, labels, mode, params):
    # build model
    # Provide an estimator spec for `ModeKeys.PREDICT`.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            predictions={"sentiment": y_pred_cls})
    eval_metric_ops = {
        "accuracy": accuracy_op,
        "precision": precision_op,
        "recall": recall_op
    }
    normal_summary_hook = tf.train.SummarySaverHook(
        100,
        summary_op=summary_op)
    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=cost_op,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops,
        training_hooks=[normal_summary_hook]
    )
classifier = tf.estimator.Estimator(model_fn=cnn,
                                    params=...,
                                    model_dir=...)
classifier.train(input_fn=train_input_fn, steps=1000)
ev = classifier.evaluate(input_fn=test_input_fn, steps=1000)
tf.logging.info("Loss: {}".format(ev["loss"]))
tf.logging.info("Precision: {}".format(ev["precision"]))
tf.logging.info("Recall: {}".format(ev["recall"]))
tf.logging.info("Accuracy: {}".format(ev["accuracy"]))
I can't figure out where to add the beholder hook in this setup.
If I add it in the cnn function as a training hook:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=dnn.cost,
        train_op=dnn.train_op,
        eval_metric_ops=eval_metric_ops,
        training_hooks=[normal_summary_hook, beholder_hook]
    )
then I get an InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype uint8 and shape [?,?,?].
If I try to use a tf.train.MonitoredTrainingSession to set up the classifier, then training proceeds as normal but nothing is logged to the Beholder plugin. Looking at stdout, I see two sessions being created one after the other, so it appears that a tf.estimator.Estimator spins up its own session after terminating any existing sessions.
Does anyone have any ideas?

Edited post:
This is a problem with old tensorflow versions. Fortunately, the issue is fixed in tensorflow version 1.9! The code below uses Beholder with tf.estimator.Estimator. It produced the same error as you mention with an older version, but everything works perfectly in version 1.9!
from capser_7_model_fn import *
from tensorflow.python import debug as tf_debug
from tensorflow.python.training import basic_session_run_hooks
from tensorboard.plugins.beholder import Beholder
from tensorboard.plugins.beholder import BeholderHook
import logging
# create estimator for model (the model is described in capser_7_model_fn)
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR)
# train model
logging.getLogger().setLevel(logging.INFO) # to show info about training progress in the terminal
beholder = Beholder(LOGDIR)
beholder_hook = BeholderHook(LOGDIR)
capser.train(input_fn=train_input_fn, steps=n_steps, hooks=[beholder_hook])
Another point is that I need to specify exactly the same LOGDIR for the summary writer, the tensorboard command line call and the BeholderHook. Before, in order to compare different runs of my model, I wrote summaries for different runs in LOGDIR/run_1, then LOGDIR/run_2, etc., i.e.:
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR+'/run_n')
and I used
tensorboard --logdir=LOGDIR
to launch tensorboard and I used
beholder_hook = BeholderHook(LOGDIR)
to write beholder data. In that case, beholder did not find the data it needed. What I needed to do was to specify exactly the same directory for everything, i.e., in the code:
capser = tf.estimator.Estimator(model_fn=model_fn, params={'model_batch_size': batch_size}, model_dir=LOGDIR+'/run_n')
beholder_hook = BeholderHook(LOGDIR+'/run_n')
And to launch tensorboard in the terminal (with LOGDIR replaced by the actual path):
tensorboard --logdir=LOGDIR/run_n
Hope that helps.
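The path consistency described in this answer can be sketched with a small helper. The names `run_dir` and `tensorboard_command`, and the base directory "logs", are hypothetical and exist only for illustration; the point is that the directory is computed once and the same string is reused for `model_dir`, `BeholderHook` and the `--logdir` flag.

```python
import os

def run_dir(logdir, run_name):
    """Return the one directory shared by the estimator, the hook and TensorBoard."""
    return os.path.join(logdir, run_name)

def tensorboard_command(path):
    """Build the shell command that points TensorBoard at that same directory."""
    return "tensorboard --logdir={}".format(path)

LOGDIR = "logs"  # assumed base directory, not taken from the original post
shared_dir = run_dir(LOGDIR, "run_1")

# In the training script the same value would be passed in both places:
#   tf.estimator.Estimator(..., model_dir=shared_dir)
#   BeholderHook(shared_dir)
print(tensorboard_command(shared_dir))
```

Keeping the path in one variable avoids the mismatch described above, where the estimator wrote to a run subdirectory while the hook and TensorBoard pointed at the parent.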

Related

How to argparse() in Google Colab? (TensorFlowOnSpark application)

In a Google Colab notebook, I'm developing a project in which I try to scale my Keras Sequential model up to a PySpark environment.
At first I developed and tested a CNN model that classifies real faces and comic faces from a Kaggle dataset (2 folders with 20,000 .jpg files). The zip file can be downloaded here: "! kaggle datasets download -d defileroff/comic-faces-paired-synthetic-v2"
Secondly, I converted the CNN model into a tf.estimator and followed all the steps from the guide (https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion-Guide) in order to run my estimator in PySpark.
The estimator works correctly until I try to pass it to a TFParallel.run(**kargs) command, which requires an argparse() call beforehand.
The error I receive is:
"usage: ipykernel_launcher.py [-h] [--cluster_size CLUSTER_SIZE]
[--num_ps NUM_PS] [--tensorboard]
ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-7aa3f316-ee26-49e8-9d72-7814d9a48255.json
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2"
It looks like there is a problem with the argparse() function.
Very briefly, the code is available here (https://colab.research.google.com/github/cosimo-schiavoni/Massive_Data_Project/blob/main/Downloads_TFOS_ERROR.ipynb) and the structure is:
#Import Libraries
...
#def inference function: the function takes all the code indented, including the TFParallel() command.
def inference():
if __name__ == '__main__':
#Start Spark session and context
spark = SparkSession.builder.appName("Pyspark on Google Colab")
sc = SparkContext(conf=SparkConf().setAppName("TFOS"))
conf = SparkConf().setMaster("local[*]").setAppName("Colab")
executors = sc._conf.get("spark.executor.instances")
#Define parameters to parse
num_executors = int(executors) if executors is not None else 1
num_ps=1
#Define Parser
parser = argparse.ArgumentParser()
parser.add_argument("--cluster_size", help="number of nodes in the cluster (for Spark Standalone)", type=int, default=num_executors)
parser.add_argument("--num_ps", help="number of parameter servers", type=int, default=num_ps)
args = parser.parse_args()
#define the CNN Keras sequential model and compile it.
cnn = tf.keras.models.Sequential()
...
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
#Convert the CNN model in a tf.estimator.train_and_evaluate
from keras.preprocessing.image import ImageDataGenerator
# create generator object
datagen = ImageDataGenerator(
rescale=1./255,
validation_split=0.2)
#define train test input function
#tf.function
def train_input_fn():
val_it = datagen.flow_from_directory(
...)
return features, labels
#define validation test input function
#tf.function
def eval_input_fn():
val_it = datagen.flow_from_directory(
...)
return features, labels
#define the estimator
import tempfile
model_dir = tempfile.mkdtemp()
keras_estimator = tf.keras.estimator.model_to_estimator(
keras_model=cnn, model_dir=model_dir)
#Train and evaluate the estimator
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(keras_estimator, train_spec, eval_spec)
#define parallel run of estimator in Spark environment
#TFCluster.run(sc,main_fun,args,args.cluster_size,args.num_ps,TFCluster.InputMode.TENSORFLOW)
TFParallel.run(sc, inference, args, args.cluster_size, use_barrier=False)
#call inference function and activate the code
inference()
Can anybody help me with this issue?
Moreover, I have doubts about the configuration of the Spark session: is it correctly configured?
Is there a way to know whether I have a cluster or just a single device?
Can I find out the number of active workers?
Thank you in advance.
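A possible workaround for the argparse error, offered as a sketch rather than as a confirmed fix for this notebook: the kernel in Jupyter and Colab is launched with an extra `-f <connection file>` argument, which `parse_args()` rejects with the "unrecognized arguments" message shown above. `parse_known_args()` returns unrecognised arguments separately instead of exiting. The `build_parser` helper and the example connection-file path below are assumptions made for illustration.

```python
import argparse

def build_parser(default_cluster_size=1, default_num_ps=1):
    """Parser with the same options that appear in the usage message above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster_size", type=int, default=default_cluster_size,
                        help="number of nodes in the cluster")
    parser.add_argument("--num_ps", type=int, default=default_num_ps,
                        help="number of parameter servers")
    parser.add_argument("--tensorboard", action="store_true")
    return parser

# Simulated notebook argv: the kernel adds "-f <connection file>" on its own.
notebook_argv = ["-f", "/root/.local/share/jupyter/runtime/kernel-example.json"]

# parse_known_args returns (namespace, leftovers) instead of raising SystemExit.
args, unknown = build_parser().parse_known_args(notebook_argv)
print(args.cluster_size, args.num_ps, unknown)
```

In a notebook cell, calling `parser.parse_known_args()` with no argument reads `sys.argv` and tolerates the kernel's extra flag in the same way.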

tensorflow estimator passes train data through some weird normalization

Problem Description
I'm using tensorflow Estimator API, and have encountered a weird phenomenon.
I'm passing the exact same input_fn to both training and evaluation, and for some reason the images which are provided to the network are not identical.
They seem similar, but after taking a closer look, it seems that evaluation images are ok, but train images are somewhat distorted.
After loading them both, I noticed that for some reason the training images go through some kind of ReLU. I confirmed it with this code, which operates on mat_eval and mat_train, the tensors that input_fn provides in evaluation and train mode:
special_relu = lambda mat: ((mat - 0.5) / 0.5) * ((mat - 0.5) / 0.5 > 0)
np.allclose(mat_train, special_relu(mat_eval))
>>> True
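To make the check above concrete, here is a small NumPy sketch on a made-up array (the values of `mat_eval` below are illustrative, not the real images). It shows that the expression first rescales the range [0, 1] to [-1, 1] and then zeroes the negative part, which is a ReLU applied after the rescale.

```python
import numpy as np

def special_relu(mat):
    """Rescale [0, 1] to [-1, 1], then keep only the positive part."""
    scaled = (mat - 0.5) / 0.5
    return scaled * (scaled > 0)

# Illustrative pixel values in [0, 1]; not taken from the original data.
mat_eval = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
mat_train = special_relu(mat_eval)

# The same result written as an explicit ReLU on the rescaled values.
relu_of_rescaled = np.maximum((mat_eval - 0.5) / 0.5, 0.0)
print(np.allclose(mat_train, relu_of_rescaled))
```

Every input value at or below 0.5 maps to zero, which matches the "somewhat distorted" look of the training images described above.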
What I thought and tried
My initial thought was that it is some form of BatchNormalization. But BatchNormalization is supposed to happen within the network, and not as some preprocess, shouldn't it?
What I recorded (using tf.summary.image) was the features['image'] object, passed to my model_fn. And if I understand correctly, the features object is passed to model_fn by the input_fn called by the Estimator object.
Regardless, I tried removing the parts of the code that are supposed to call BatchNormalization. This had no effect. Of course, I might not have done that correctly, but as I said, I don't really think it is BatchNormalization.
Code
from datetime import datetime
from pathlib import Path
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.platform import tf_logging as logging
from dcnn import modeling
from dcnn.dv_constants import BATCH_SIZE, BATCHES_PER_EPOCH
from dcnn.variant_io import get_input_fn, num_variants_in_ds
logging.set_verbosity(logging.INFO)
new_checkpoint_name = lambda: f'./train_dir/' \
                              f'{datetime.now().strftime("%d-%m %H:%M:%S")}'
if __name__ == '__main__':
    model_name = 'small_inception'
    start_from_checkpoint = ''
    # start_from_checkpoint = '/home/yonatan/Desktop/yonas_code/dcnn/train_dir' \
    #                         '/2111132905/model.ckpt-256'
    model_dir = str(Path(start_from_checkpoint).parent) if \
        start_from_checkpoint else new_checkpoint_name()
    test = False
    train = True
    predict = False
    epochs = 1
    train_dataset_name = 'same_example'
    val_dataset_name = 'same_example'
    test_dataset_name = 'same_example'
    predict_dataset_name = 'same_example'
    model = modeling.get_model(model_name=model_name)
    estimator = model.make_estimator( \
        batch_size=BATCH_SIZE,
        model_dir=model_dir,
        params=dict(batches_per_epoch=BATCHES_PER_EPOCH),
        use_tpu=False,
        master='',
        # The target of the TensorFlow standard server to use. Can be the empty string to run locally using an inprocess server.
        start_from_checkpoint=start_from_checkpoint)
    if train:
        train_input_fn = get_input_fn(train_dataset_name, repeat=True)
        val_input_fn = get_input_fn(val_dataset_name, repeat=False)
        steps = (epochs * num_variants_in_ds(train_dataset_name)) / \
                BATCH_SIZE
        train_spec = tf.estimator.TrainSpec(input_fn=val_input_fn,
                                            max_steps=steps)
        eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
                                          throttle_secs=1)
        metrics = tf.estimator.train_and_evaluate(estimator, train_spec,
                                                  eval_spec)
        print(metrics)
I have plenty of more code to share, but I tried to be concise. If anyone has any idea why this behavior happens, or needs more information, let me know.

Unable to save model with tensorflow 2.0.0 beta1

I have tried all the options described in the documentation but none of them allowed me to save my model in tensorflow 2.0.0 beta1. I've also tried to upgrade to the (also unstable) TF2-RC but that ruined even the code I had working in beta so I quickly rolled back for now to beta.
See a minimal reproduction code below.
What I have tried:
1.
model.save("mymodel.h5")
NotImplementedError: Saving the model to HDF5 format requires the
model to be a Functional model or a Sequential model. It does not work
for subclassed models, because such models are defined via the body of
a Python method, which isn't safely serializable. Consider saving to
the Tensorflow SavedModel format (by setting save_format="tf") or
using save_weights.
2.
model.save("mymodel", save_format='tf')
ValueError: Model <__main__.CVAE object at 0x7f1cac2e7c50> cannot be
saved because the input shapes have not been set. Usually, input
shapes are automatically determined from calling .fit() or .predict().
To manually set the shapes, call model._set_inputs(inputs).
3.
model._set_inputs(input_sample)
model.save("mymodel", save_format='tf')
AssertionError: tf.saved_model.save is not supported inside a traced
@tf.function. Move the call to the outer eagerly-executed context.
And this is where I am stuck now, because the message gives me no useful hint. I am NOT calling the save() function from a @tf.function; I'm already calling it from the outermost scope possible. In fact, I have no @tf.function at all in the minimal reproduction script below and I still get the same error.
So I really have no idea how to save my model. I've tried every option and they all throw errors and provide no hints.
The minimal reproduction example below works fine if you set save_model=False and it reproduces the error when save_model=True.
It may seem unnecessary in this simplified auto-encoder code example to use a subclassed model but I have lots of custom functions added to it in my original VAE code that I need it for.
Code:
import tensorflow as tf
save_model = True
learning_rate = 1e-4
BATCH_SIZE = 100
TEST_BATCH_SIZE = 10
color_channels = 1
imsize = 28
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images[:5000, ::]
test_images = train_images[:1000, ::]
train_images = train_images.reshape(-1, imsize, imsize, 1).astype('float32')
test_images = test_images.reshape(-1, imsize, imsize, 1).astype('float32')
train_images /= 255.
test_images /= 255.
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).batch(TEST_BATCH_SIZE)
class AE(tf.keras.Model):
    def __init__(self):
        super(AE, self).__init__()
        self.network = tf.keras.Sequential([
            tf.keras.layers.InputLayer(input_shape=(imsize, imsize, color_channels)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(50),
            tf.keras.layers.Dense(imsize**2 * color_channels),
            tf.keras.layers.Reshape(target_shape=(imsize, imsize, color_channels)),
        ])
    def decode(self, input):
        logits = self.network(input)
        return logits
optimizer = tf.keras.optimizers.Adam(learning_rate)
model = AE()
def compute_loss(data):
    logits = model.decode(data)
    loss = tf.reduce_mean(tf.losses.mean_squared_error(logits, data))
    return loss
def train_step(data):
    with tf.GradientTape() as tape:
        loss = compute_loss(data)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss, 0
def test_step(data):
    loss = compute_loss(data)
    return loss
input_shape_set = False
epoch = 0
epochs = 20
for epoch in range(epochs):
    for train_x in train_dataset:
        train_step(train_x)
    if epoch % 1 == 0:
        loss = 0.0
        num_batches = 0
        for test_x in test_dataset:
            loss += test_step(test_x)
            num_batches += 1
        loss /= num_batches
        print("Epoch: {}, Loss: {}".format(epoch, loss))
        if save_model:
            print("Saving model...")
            if not input_shape_set:
                # Note: Why set input shape manually and why here:
                # 1. If I do not set input shape manually: ValueError: Model <__main__.CVAE object at 0x7f1cac2e7c50> cannot be saved because the input shapes have not been set. Usually, input shapes are automatically determined from calling .fit() or .predict(). To manually set the shapes, call model._set_inputs(inputs).
                # 2. If I set input shape manually BEFORE the first actual train step, I get: RuntimeError: Attempting to capture an EagerTensor without building a function.
                model._set_inputs(train_dataset.__iter__().next())
                input_shape_set = True
            # Note: Why choose tf format: model.save('MNIST/Models/model.h5') will return NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.
            model.save('MNIST/Models/model', save_format='tf')

I have tried the same minimal reproduction example in tensorflow-gpu 2.0.0-rc0 and the error was more revealing than what the beta version gave me. The error in RC says:
NotImplementedError: When subclassing the Model class, you should
implement a call method.
This led me to read through https://www.tensorflow.org/beta/guide/keras/custom_layers_and_models, where I found examples of how to do subclassing in TF2 in a way that allows saving. I was able to resolve the error and save the model by renaming my 'decode' method to 'call' in the above example (although this will be more complicated in my actual code, where I have various methods defined for the class). This solved the error in both beta and rc. Strangely, the training (or the saving) also got much faster in rc.

You should change two things:
1. Change the decode method to call, as you pointed out.
2. As your model is of type Sequential, and not built inside the class, you want to call the save method on the self.network attribute of the model, i.e.,
model.network.save('mymodel.h5')
Alternatively, to keep things more standard, you can implement this method inside the AE class, as follows:
    def save(self, save_dir):
        self.network.save(save_dir)
Cheers mate

using Estimator interface for inference with pre-trained tensorflow object detection model

I'm trying to load a pre-trained tensorflow object detection model from the Tensorflow Object Detection repo as a tf.estimator.Estimator and use it to make predictions.
I'm able to load the model and run inference using Estimator.predict(), however the output is garbage. Other methods of loading the model, e.g. as a Predictor, and running inference work fine.
Any help properly loading a model as an Estimator calling predict() would be much appreciated. My current code:
Load and prepare image
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(list(image.getdata())).reshape((im_height, im_width, 3)).astype(np.uint8)
image_url = 'https://i.imgur.com/rRHusZq.jpg'
# Load image
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
# Format original image size
im_size_orig = np.array(list(image.size) + [1])
im_size_orig = np.expand_dims(im_size_orig, axis=0)
im_size_orig = np.int32(im_size_orig)
# Resize image
image = image.resize((np.array(image.size) / 4).astype(int))
# Format image
image_np = load_image_into_numpy_array(image)
image_np_expanded = np.expand_dims(image_np, axis=0)
image_np_expanded = np.float32(image_np_expanded)
# Stick into feature dict
x = {'image': image_np_expanded, 'true_image_shape': im_size_orig}
# Stick into input function
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=x,
    y=None,
    shuffle=False,
    batch_size=128,
    queue_capacity=1000,
    num_epochs=1,
    num_threads=1,
)
Side note:
train_and_eval_dict also seems to contain an input_fn for prediction
train_and_eval_dict['predict_input_fn']
However this actually returns a tf.estimator.export.ServingInputReceiver, which I'm not sure what to do with. This could potentially be the source of my problems as there's a fair bit of pre-processing involved before the model actually sees the image.
Load model as Estimator
Model downloaded from TF Model Zoo here, code to load model adapted from here.
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/'
pipeline_config_path = os.path.join(model_dir, 'pipeline.config')
config = tf.estimator.RunConfig(model_dir=model_dir)
train_and_eval_dict = model_lib.create_estimator_and_inputs(
    run_config=config,
    hparams=model_hparams.create_hparams(None),
    pipeline_config_path=pipeline_config_path,
    train_steps=None,
    sample_1_of_n_eval_examples=1,
    sample_1_of_n_eval_on_train_examples=(5))
estimator = train_and_eval_dict['estimator']
Run inference
output_dict1 = estimator.predict(predict_input_fn)
This prints out some log messages, one of which is:
INFO:tensorflow:Restoring parameters from ./pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt
So it seems like pre-trained weights are getting loaded. However results look like:
Load same model as a Predictor
from tensorflow.contrib import predictor
model_dir = './pretrained_models/tensorflow/ssd_mobilenet_v1_coco_2018_01_28'
saved_model_dir = os.path.join(model_dir, 'saved_model')
predict_fn = predictor.from_saved_model(saved_model_dir)
Run inference
output_dict2 = predict_fn({'inputs': image_np_expanded})
Results look good:

When you load the model as an estimator and from a checkpoint file, here is the restore function associated with ssd models. From ssd_meta_arch.py
def restore_map(self,
                fine_tune_checkpoint_type='detection',
                load_all_detection_checkpoint_vars=False):
    """Returns a map of variables to load from a foreign checkpoint.
    See parent class for details.
    Args:
      fine_tune_checkpoint_type: whether to restore from a full detection
        checkpoint (with compatible variable names) or to restore from a
        classification checkpoint for initialization prior to training.
        Valid values: `detection`, `classification`. Default 'detection'.
      load_all_detection_checkpoint_vars: whether to load all variables (when
        `fine_tune_checkpoint_type='detection'`). If False, only variables
        within the appropriate scopes are included. Default False.
    Returns:
      A dict mapping variable names (to load from a checkpoint) to variables in
      the model graph.
    Raises:
      ValueError: if fine_tune_checkpoint_type is neither `classification`
        nor `detection`.
    """
    if fine_tune_checkpoint_type not in ['detection', 'classification']:
        raise ValueError('Not supported fine_tune_checkpoint_type: {}'.format(
            fine_tune_checkpoint_type))
    if fine_tune_checkpoint_type == 'classification':
        return self._feature_extractor.restore_from_classification_checkpoint_fn(
            self._extract_features_scope)
    if fine_tune_checkpoint_type == 'detection':
        variables_to_restore = {}
        for variable in tf.global_variables():
            var_name = variable.op.name
            if load_all_detection_checkpoint_vars:
                variables_to_restore[var_name] = variable
            else:
                if var_name.startswith(self._extract_features_scope):
                    variables_to_restore[var_name] = variable
        return variables_to_restore
As you can see even if the config file sets from_detection_checkpoint: True, only the variables in the feature extractor scope will be restored. To restore all the variables, you will have to set
load_all_detection_checkpoint_vars: True
in the config file.
So the situation is clear. When the model is loaded as an Estimator, only the variables in the feature extractor scope are restored; the weights in the predictors' scopes are not, so the estimator gives essentially random predictions.
When the model is loaded as a Predictor, all weights are loaded, so the predictions are reasonable.
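The filtering done by the 'detection' branch of restore_map can be sketched in plain Python. This is a simplified imitation, not the Object Detection API code: the function `variables_to_restore` and the variable names in `graph_vars` are hypothetical and chosen only to show which names survive the prefix check.

```python
def variables_to_restore(var_names, feature_scope, load_all):
    """Keep every name, or only those inside the feature-extractor scope."""
    return [name for name in var_names
            if load_all or name.startswith(feature_scope)]

# Made-up variable names standing in for tf.global_variables().
graph_vars = [
    "FeatureExtractor/MobilenetV1/Conv2d_0/weights",
    "BoxPredictor_0/ClassPredictor/weights",
    "BoxPredictor_0/BoxEncodingPredictor/weights",
]

# Default behaviour: predictor weights are left at their initial values.
partial = variables_to_restore(graph_vars, "FeatureExtractor", load_all=False)

# With the load-all flag set, every variable is restored.
everything = variables_to_restore(graph_vars, "FeatureExtractor", load_all=True)

print(partial)
print(len(everything))
```

With the flag off, only the feature-extractor entry is returned, which mirrors why the box and class predictors end up with untrained weights.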

Trouble restoring checkpointed TensorFlow net

I have built an autoencoder to "convert" the activations from VGG19.relu4_1 into pixels. I use the new convenience functions in tensorflow.contrib.layers (as in TF 0.10rc0). The code has a similar layout to TensorFlow's CIFAR10 tutorial, with a train.py that does the training and checkpoints the model to disk and an eval.py that polls for new checkpoint files and runs inference on them.
My problem is that the evaluation is never as good as the training, neither in terms of the value of the loss function nor when I look at the output images (even when running on the same images as the training does). This makes me think it has something to do with the restore process.
When I look at the output from the training in TensorBoard it looks good (eventually) so I don't think there is anything wrong with my net per se.
My net looks like this:
import tensorflow.contrib.layers as contrib
bn_params = {
    "is_training": is_training,
    "center": True,
    "scale": True
}
tensor = contrib.convolution2d_transpose(vgg_output, 64*4, 4,
                                         stride=2,
                                         normalizer_fn=contrib.batch_norm,
                                         normalizer_params=bn_params,
                                         scope="deconv1")
tensor = contrib.convolution2d_transpose(tensor, 64*2, 4,
                                         stride=2,
                                         normalizer_fn=contrib.batch_norm,
                                         normalizer_params=bn_params,
                                         scope="deconv2")
.
.
.
And in train.py I do this to save the checkpoint:
variable_averages = tf.train.ExponentialMovingAverage(mynet.MOVING_AVERAGE_DECAY)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
    train_op = tf.no_op(name='train')
while training:
    # train (with batch normalization's is_training = True)
    if time_to_checkpoint:
        saver.save(sess, checkpoint_path, global_step=step)
In eval.py I do this:
# run code that creates the net
variable_averages = tf.train.ExponentialMovingAverage(
    mynet.MOVING_AVERAGE_DECAY)
saver = tf.train.Saver(variable_averages.variables_to_restore())
while polling:
    # sleep and check for new checkpoint files
    with tf.Session() as sess:
        init = tf.initialize_all_variables()
        init_local = tf.initialize_local_variables()
        sess.run([init, init_local])
        saver.restore(sess, checkpoint_path)
        # run inference (with batch normalization's is_training = False)
The blue is the training loss, and the orange is the eval loss.

The problem was that I used the tf.train.AdamOptimizer() directly. During the optimization it didn't call the operations defined in contrib.batch_norm to calculate the running mean/variance of the input so the mean/variance was always 0.0/1.0.
The solution is to make the training op depend on the ops in the GraphKeys.UPDATE_OPS collection. There is already a function in the contrib module that does this (optimize_loss()).
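The effect described in this answer can be imitated with a short NumPy simulation. This is not TensorFlow's batch_norm implementation; the helper `batch_norm`, the decay value, the epsilon and the random data are all assumptions made for illustration. It compares inference with moving statistics that were never updated (mean 0, variance 1) against inference after the update step has run.

```python
import numpy as np

def batch_norm(x, mean, var, eps=1e-3):
    """Normalise x with the given per-feature mean and variance."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(256, 4))  # illustrative activations

# Training mode: normalise with the statistics of the current batch.
train_out = batch_norm(batch, batch.mean(axis=0), batch.var(axis=0))

# Moving statistics start at mean 0 and variance 1.
moving_mean, moving_var = np.zeros(4), np.ones(4)

# Inference when the update ops never ran: the initial values are used.
stale_out = batch_norm(batch, moving_mean, moving_var)

# What running the update ops on every training step would do.
decay = 0.9
for _ in range(200):
    moving_mean = decay * moving_mean + (1 - decay) * batch.mean(axis=0)
    moving_var = decay * moving_var + (1 - decay) * batch.var(axis=0)
fresh_out = batch_norm(batch, moving_mean, moving_var)

print(np.abs(train_out - stale_out).max())  # large gap: eval differs from train
print(np.abs(train_out - fresh_out).max())  # close to zero
```

The first gap is the kind of train/eval mismatch reported in the question; the second shows that the mismatch disappears once the moving averages are actually updated during training.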
