TensorFlow Estimator.predict() fails

I am recreating the DnCNN (a Gaussian denoiser), which does image-to-image prediction with a series of convolutional layers. It trains perfectly fine, but when I try to do list(model.predict(...)),
I get the error:
ValueError: labels must not be None.
I actually put all of the spec arguments of my EstimatorSpec in explicitly, as they are lazily evaluated depending on the method (train/eval/predict) that is called on the Estimator.
def DnCNN_model_fn(features, labels, mode):
    # some convolutions here
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=conv_last + input_layer,
        loss=tf.losses.mean_squared_error(
            labels=labels,
            predictions=conv_last + input_layer),
        train_op=tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-08).minimize(
            loss=tf.losses.mean_squared_error(
                labels=labels,
                predictions=conv_last + input_layer),
            global_step=tf.train.get_global_step()),
        eval_metric_ops={
            "accuracy": tf.metrics.mean_absolute_error(
                labels=labels,
                predictions=conv_last + input_layer)}
    )
Putting it into an estimator:
d = datetime.datetime.now()

DnCNN = tf.estimator.Estimator(
    model_fn=DnCNN_model_fn,
    model_dir=root + 'model/' +
              "DnCNN_{}_{}_{}_{}".format(d.month, d.day, d.hour, d.minute),
    config=tf.estimator.RunConfig(save_summary_steps=2,
                                  log_step_count_steps=10)
)
After training the model, I do the predictions as follows:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=test_data[0:2, :, :, :],
    y=None,
    batch_size=1,
    num_epochs=1,
    shuffle=False)
predicted = DnCNN.predict(input_fn=test_input_fn)
list(predicted) # this is where the error occurs
The traceback says that tf.losses.mean_squared_error is causing this:
Traceback (most recent call last):
File "<input>", line 16, in <module>
File "...\venv2\lib\site-packages\tensorflow\python\estimator\estimator.py", line 551, in predict
features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
File "...\venv2\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "<input>", line 95, in DnCNN_model_fn
File "...\venv2\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py", line 663, in mean_squared_error
raise ValueError("labels must not be None.")
ValueError: labels must not be None.

From the existing question estimator.predict raises "ValueError: None values not supported":
"In your model_fn, you define the loss in every mode (train / eval / predict). This means that even in predict mode, the labels will be used and need to be provided.
When you are in predict mode, you actually just need to return the predictions so you can return early from the function:"
def model_fn(features, labels, mode):
    # ...
    y = ...
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=y)
    # ...

I am not entirely sure what the exact error was, but I managed to get my model predicting.
What I changed (apart from adding the batch norm UPDATE_OPS, which did not solve my issue) was short-circuiting (i.e. returning early and separately) the tf.estimator.EstimatorSpec in the case of tf.estimator.ModeKeys.PREDICT:
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=conv_last + input_layer
    )
Apparently there seems to be something wrong with the doc statement (or I did not understand it correctly) found at tf.estimator.EstimatorSpec:
model_fn can populate all arguments independent of mode. In this case, some arguments will be ignored by an Estimator. E.g. train_op will be ignored in eval and infer modes.
BTW: given that mode is predict, at some point the labels are automatically replaced by None in any case.
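For reference, here is a minimal sketch of how the whole model_fn can be restructured so that nothing touching labels is ever built in predict mode (the convolutional body is elided, as in the original snippet; this also builds the MSE once instead of twice):
def DnCNN_model_fn(features, labels, mode):
    # ... convolutions producing conv_last from input_layer ...
    predictions = conv_last + input_layer

    # Return early: in PREDICT mode, labels is None, so no op that
    # consumes labels may be constructed past this point.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
    train_op = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-08).minimize(
        loss=loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=loss,
        train_op=train_op,
        eval_metric_ops={
            "accuracy": tf.metrics.mean_absolute_error(
                labels=labels, predictions=predictions)})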

Related

as_list() is not defined on an unknown TensorShape, raised on y_t_rank = len(y_t.shape.as_list()), and related to metrics

TF 2.3.0.dev20200620
I got this error during .fit(...) for a model with a sigmoid binary output. I used tf.data.Dataset as the input pipeline.
The strange thing is it depends on the metric:
Doesn't work:
model.compile(
    optimizer=tf.keras.optimizers.Adam(lr=1e-4, decay=1e-6),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy']
)
Works:
model.compile(
    optimizer=tf.keras.optimizers.Adam(lr=1e-4, decay=1e-6),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy()]
)
But as I understand it, 'accuracy' should be fine. In fact, using tf.keras.preprocessing.image_dataset_from_directory instead of my own custom tf.data.Dataset setup (which can be provided if needed) gives no such error. That is the setup from the tutorial https://keras.io/examples/vision/image_classification_from_scratch.
The trace is pasted below. Notice it differs from the two older questions above: somehow it involves the metrics.
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:796 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2526 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2886 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:789 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:759 train_step
self.compiled_metrics.update_state(y, y_pred, sample_weight)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:388 update_state
self.build(y_pred, y_true)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:319 build
self._metrics, y_true, y_pred)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1139 map_structure_up_to
**kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1235 map_structure_with_tuple_paths_up_to
*flat_value_lists)]
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1234 <listcomp>
results = [func(*args, **kwargs) for args in zip(flat_path_list,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1137 <lambda>
lambda _, *values: func(*values), # Discards the path arg.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:419 _get_metric_objects
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:419 <listcomp>
return [self._get_metric_object(m, y_t, y_p) for m in metrics]
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:440 _get_metric_object
y_t_rank = len(y_t.shape.as_list())
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_shape.py:1190 as_list
raise ValueError("as_list() is not defined on an unknown TensorShape.")
ValueError: as_list() is not defined on an unknown TensorShape.
Had exactly the same problem when using the 'accuracy' metric.
I followed the example from https://github.com/tensorflow/tensorflow/issues/32912#issuecomment-550363802:
def _fixup_shape(images, labels, weights):
    images.set_shape([None, None, None, 3])
    labels.set_shape([None, 19])  # I have 19 classes
    weights.set_shape([None])
    return images, labels, weights

dataset = dataset.map(_fixup_shape)
which helped me solve the problem.
But in my case, instead of using one map function to both load and set_shape inside it, as kawingkelvin did above, I needed to use two map functions because of some errors in the TF code.
The final solution for me was to use the following order:
dataset.batch.map(get_data).map(fix_shape).prefetch
NOTE: batch can be done either before or after map(get_data), depending on how your get_data function is written; map(fix_shape) must come after it. A sketch of the ordering follows.
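As an illustration of that ordering (get_data and fix_shape stand in for your own loading and shape-fixing functions, and the file-listing source is a hypothetical example):
filenames = tf.data.Dataset.list_files("train/*.tfrecord")  # hypothetical source

dataset = (filenames
           .batch(batch_size)
           .map(get_data)    # decode/load; static shape info may be lost here
           .map(fix_shape)   # restore static shapes via set_shape, as above
           .prefetch(tf.data.experimental.AUTOTUNE))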
I was able to fix this in a way that keeps the metric 'accuracy' (rather than switching to BinaryAccuracy). However, I do not quite understand why this is needed for 'accuracy' but not for closely related metrics (e.g. BinaryAccuracy).
Two things:
Construct the ds such that the batch label has shape (batch_size, 1), not (batch_size,). Following the keras.io tutorial mentioned, it should have been OK with the latter. This change aims to get rid of the "unknown" in the TensorShape.
add this to the ds pipeline:
label.set_shape([1])
def process_path(file_path):
    label = get_label(file_path)
    img = tf.io.read_file(file_path)
    img = tf.image.decode_jpeg(img, channels=3)
    label.set_shape([1])
    return img, label

ds = ds.map(process_path, num_parallel_calls=AUTO).shuffle(1024).repeat().batch(batch_size).prefetch(buffer_size=AUTO)
This is the state before .batch(...), so a single sample should have shape (1,), and therefore shape (batch_size, 1) after batching.
After doing so, the error didn't happen, and I used the exact same metrics 'accuracy' as in
https://keras.io/examples/vision/image_classification_from_scratch
Hope this helps anyone who gets hit by this. I have to admit I don't truly understand why it didn't work in the first place; it still seems like a TF bug to me.

Tensorflow 2.1.0 - An op outside of the function building code is being passed a "Graph" tensor

I am trying to implement a recent paper. Part of this implementation involves moving from tf 1.14 to tf 2.1.0. The code was working with tf 1.14 but is no longer working.
NOTE: If I disable eager execution tf.compat.v1.disable_eager_execution() then the code works as expected.
Is this the solution? I've made plenty of models before in TF 2.x and never had to disable eager execution to achieve normal functionality.
I have distilled the problem to a very short gist that shows what's happening.
Links & Code First Followed By Detailed Error Message
Link to Gist -- https://gist.github.com/darien-schettler/fd5b25626e9eb5b1330cce670bf9cc17
Code
# version 2.1.0
import tensorflow as tf
# version 1.18.1
import numpy as np


# ######## DEFINE CUSTOM FUNCTION FOR TF LAMBDA LAYER ######## #
def resize_like(input_tensor, ref_tensor):
    """Resize an image tensor to the same size/shape as a reference image tensor.

    Args:
        input_tensor: (image tensor) Input image tensor that will be resized
        ref_tensor:   (image tensor) Reference image tensor that we want to resize the input tensor to.

    Returns:
        reshaped tensor
    """
    reshaped_tensor = tf.image.resize(images=input_tensor,
                                      size=tf.shape(ref_tensor)[1:3],
                                      method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                      preserve_aspect_ratio=False,
                                      antialias=False,
                                      name=None)
    return reshaped_tensor
# ############################################################# #


# ############ DEFINE MODEL USING TF.KERAS FN API ############ #
# INPUTS
model_input_1 = tf.keras.layers.Input(shape=(160, 160, 3))
model_input_2 = tf.keras.layers.Input(shape=(160, 160, 3))

# OUTPUTS
model_output_1 = tf.keras.layers.Conv2D(filters=64,
                                        kernel_size=(1, 1),
                                        use_bias=False,
                                        kernel_initializer='he_normal',
                                        name='conv_name_base')(model_input_1)

model_output_2 = tf.keras.layers.Lambda(function=resize_like,
                                        arguments={'ref_tensor': model_output_1})(model_input_2)

# MODEL
model = tf.keras.models.Model(inputs=[model_input_1, model_input_2],
                              outputs=model_output_2,
                              name="test_model")
# ############################################################# #


# ######### TRY TO UTILIZE PREDICT WITH DUMMY INPUT ########## #
dummy_input = [np.ones((1, 160, 160, 3)), np.zeros((1, 160, 160, 3))]

model.predict(x=dummy_input)  # >>>> ERROR OCCURS HERE <<<<
# ############################################################# #
Full Error
>>> model.predict(x=dummy_input) # >>>>ERROR OCCURS HERE<<<<
Traceback (most recent call last):
File "/Users/<username>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 61, in quick_execute
num_outputs)
TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
@tf.function
def has_init_scope():
my_constant = tf.constant(1.)
with tf.init_scope():
added = my_constant * 2
The graph tensor has name: conv_name_base_1/Identity:0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1013, in predict
use_multiprocessing=use_multiprocessing)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 498, in predict
workers=workers, use_multiprocessing=use_multiprocessing, **kwargs)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 475, in _model_iteration
total_epochs=1)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
batch_outs = execution_function(iterator)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
distributed_function(input_fn))
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
result = self._call(*args, **kwds)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 638, in _call
return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds) # pylint: disable=protected-access
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
self.captured_inputs)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/Users/<user-name>/.virtualenvs/<venv-name>/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 75, in quick_execute
"tensors, but found {}".format(keras_symbolic_tensors))
tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'conv_name_base_1/Identity:0' shape=(None, 160, 160, 64) dtype=float32>]
One potential solution I thought of would be to replace the Lambda layer with a custom layer... this seems to fix the issue as well. I'm not sure what the best practices surrounding this are, though. Code below.
# version 2.1.0
import tensorflow as tf
# version 1.18.1
import numpy as np


# ######## DEFINE CUSTOM LAYER DIRECTLY BY SUBCLASSING ######## #
class ResizeLike(tf.keras.layers.Layer):
    """tf.keras layer to resize a tensor to the reference tensor shape.

    Attributes:
        keras.layers.Layer: Base layer class.
            This is the class from which all layers inherit.
            - A layer is a class implementing common neural networks
              operations, such as convolution, batch norm, etc.
            - These operations require managing weights,
              losses, updates, and inter-layer connectivity.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, inputs, **kwargs):
        """TODO: docstring

        Args:
            inputs (TODO): TODO
            **kwargs: TODO

        Returns:
            TODO
        """
        input_tensor, ref_tensor = inputs
        return self.resize_like(input_tensor, ref_tensor)

    def resize_like(self, input_tensor, ref_tensor):
        """Resize an image tensor to the same size/shape as a reference image tensor.

        Args:
            input_tensor: (image tensor) Input image tensor that will be resized
            ref_tensor:   (image tensor) Reference image tensor that we want to resize the input tensor to.

        Returns:
            reshaped tensor
        """
        reshaped_tensor = tf.image.resize(images=input_tensor,
                                          size=tf.shape(ref_tensor)[1:3],
                                          method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                          preserve_aspect_ratio=False,
                                          antialias=False)
        return reshaped_tensor
# ############################################################# #


# ############ DEFINE MODEL USING TF.KERAS FN API ############ #
# INPUTS
model_input_1 = tf.keras.layers.Input(shape=(160, 160, 3))
model_input_2 = tf.keras.layers.Input(shape=(160, 160, 3))

# OUTPUTS
model_output_1 = tf.keras.layers.Conv2D(filters=64,
                                        kernel_size=(1, 1),
                                        use_bias=False,
                                        kernel_initializer='he_normal',
                                        name='conv_name_base')(model_input_1)

model_output_2 = ResizeLike(name="resize_layer")([model_input_2, model_output_1])

# MODEL
model = tf.keras.models.Model(inputs=[model_input_1, model_input_2],
                              outputs=model_output_2,
                              name="test_model")
# ############################################################# #


# ######### TRY TO UTILIZE PREDICT WITH DUMMY INPUT ########## #
dummy_input = [np.ones((1, 160, 160, 3)), np.zeros((1, 160, 160, 3))]

model.predict(x=dummy_input)  # >>>> NO ERROR NOW <<<<
# ############################################################# #
Thoughts??
Thanks in advance!!
Let me know if you would like me to provide anything else.
You can try the following steps:
Change resize_like as follows:
def resize_like(inputs):
    input_tensor, ref_tensor = inputs
    reshaped_tensor = tf.image.resize(images=input_tensor,
                                      size=tf.shape(ref_tensor)[1:3],
                                      method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                      preserve_aspect_ratio=False,
                                      antialias=False,
                                      name=None)
    return reshaped_tensor
Then, in the Lambda layer:
model_output_2 = tf.keras.layers.Lambda(function=resize_like)([model_input_2, model_output_1])
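Putting the two changes together, the model definition from the question becomes the sketch below. Only the Lambda call changes: passing model_output_1 as a real layer input lets Keras track the dependency in the graph instead of capturing a symbolic tensor through arguments=.
model_input_1 = tf.keras.layers.Input(shape=(160, 160, 3))
model_input_2 = tf.keras.layers.Input(shape=(160, 160, 3))

model_output_1 = tf.keras.layers.Conv2D(filters=64,
                                        kernel_size=(1, 1),
                                        use_bias=False,
                                        kernel_initializer='he_normal',
                                        name='conv_name_base')(model_input_1)

# Both tensors enter the Lambda layer as inputs, so neither is a
# leaked "Graph" tensor at predict time.
model_output_2 = tf.keras.layers.Lambda(function=resize_like)(
    [model_input_2, model_output_1])

model = tf.keras.models.Model(inputs=[model_input_1, model_input_2],
                              outputs=model_output_2,
                              name="test_model")

dummy_input = [np.ones((1, 160, 160, 3)), np.zeros((1, 160, 160, 3))]
model.predict(x=dummy_input)  # no _SymbolicException now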

Confusion Matrix with Tensorflow

I am using the fine-tuned AlexNet architecture written by @kratzert on my own dataset, which works properly (I got the code from here: https://github.com/kratzert/finetune_alexnet_with_tensorflow), and I want to figure out how to build a confusion matrix from his code. I have tried to use tf.confusion_matrix(labels, predictions, num_classes) to build the confusion matrix, but I can't. I am confused about what the values for labels and predictions should be; I mean, I know what they should be, but each time I feed these values I get an error. Can anyone help me with this, or have a look at the code (link above) and guide me?
I added these two lines in finetune.py, exactly after calculating accuracy, to convert the labels and the predictions to class numbers:
with tf.name_scope("accuracy"):
    correct_pred = tf.equal(tf.argmax(score, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    true_class = tf.argmax(y, 1)           # <-- added
    predicted_class = tf.argmax(score, 1)  # <-- added
and I have added tf.confusion_matrix() inside my session, at the very bottom, before saving the checkpoint of the model:
for _ in range(val_batches_per_epoch):
    img_batch, label_batch = sess.run(next_batch)
    acc, cost = sess.run([accuracy, loss], feed_dict={x: img_batch,
                                                      y: label_batch,
                                                      keep_prob: 1.})
    test_acc += acc
    test_count += 1

test_acc /= test_count
print("{} Validation Accuracy = {:.4f} -- Validation Loss = {:.4f}".format(datetime.now(), test_acc, cost))

print("{} Saving checkpoint of model...".format(datetime.now()))
print(sess.run(tf.confusion_matrix(true_class, predicted_class, num_classes)))  # <-- added

# save checkpoint of the model
checkpoint_name = os.path.join(checkpoint_path,
                               'model_epoch'+str(epoch+1)+'.ckpt')
save_path = saver.save(sess, checkpoint_name)

print("{} Model checkpoint saved at {}".format(datetime.now(),
                                               checkpoint_name))
I have tried other places as well, but each time I get an error:
Caused by op 'Placeholder_1', defined at:
File "/home/armin/Desktop/Alexnet_DataPipeline/finetune.py", line 85, in <module>
y = tf.placeholder(tf.float32, [batch_size, num_classes])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1777, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 4521, in placeholder
"Placeholder", dtype=dtype, shape=shape, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1' with dtype float and shape [128,3]
Any help will be appreciated, thanks.
It's a fairly long piece of code you're referring to, and you did not specify where you put your confusion matrix line.
Just by experience, the most frequent problem with confusion matrices is that tf.confusion_matrix() requires both the labels and the predictions as the number of the class, not as one-hot vectors. In other words, the label and the prediction should be in the form of the number 5 instead of [ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0 ].
In the code you refer to, y is in the one-hot format. The output of the network, score, is a vector giving the probability of each class. That is also not the required format. You need to do something like
true_class = tf.argmax( y, 1 )
predicted_class = tf.argmax( score, 1 )
and use those with the confusion matrix like
tf.confusion_matrix( true_class, predicted_class, num_classes )
(Basically, if you take a look at line 123 of finetune.py, that has both of those elements for determining accuracy, but they are not saved in separate tensors.)
If you want to keep a running total of confusion matrices of all batches, you just have to add them up - since each cell of the matrix counts the number of examples falling into that category, an element-wise addition creates the confusion matrix for the whole set:
cm_running_total = None

cm_numpy_array = sess.run(tf.confusion_matrix(true_class, predicted_class, num_classes),
                          feed_dict={x: img_batch, y: label_batch, keep_prob: 1.})

if cm_running_total is None:
    cm_running_total = cm_numpy_array
else:
    cm_running_total += cm_numpy_array
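In context, the accumulation goes inside the validation loop from the question; here is a sketch, with the confusion-matrix op built once, outside the loop, so the graph does not grow on every batch:
confusion_op = tf.confusion_matrix(true_class, predicted_class, num_classes)

cm_running_total = None
for _ in range(val_batches_per_epoch):
    img_batch, label_batch = sess.run(next_batch)
    cm_numpy_array = sess.run(confusion_op, feed_dict={x: img_batch,
                                                       y: label_batch,
                                                       keep_prob: 1.})
    if cm_running_total is None:
        cm_running_total = cm_numpy_array
    else:
        cm_running_total += cm_numpy_array

print(cm_running_total)  # confusion matrix over the whole validation set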

Custom loss function: perform a model.predict on the data in y_pred

I am training a network to denoise images; for this I am using the CIFAR10 dataset. I am trying to write a custom loss function so that the loss is mse / classification_accuracy.
Given that my network receives as input 32x32 (noisy) images and predicts 32x32 (denoised) images, I am assuming that y_pred and y_true are arrays of 32x32 images. Thus my custom loss function looks like this:
from keras import backend as K

def custom_loss():
    def joint_optimized_loss(y_true, y_pred):
        mse = K.mean(K.square(y_pred - y_true), axis=-1)
        preds = classif_model.predict(y_pred)
        correctPreds = 0
        totPreds = 0
        for pred in preds:
            predictedClass = pred.index(max(pred))
            totPreds += 1
            if predictedClass == currentClass:
                correctPreds += 1
        classifAccuracy = correctPreds / totPreds
        loss = mse / classifAccuracy
        return loss
    return joint_optimized_loss


myModel.compile(optimizer='adadelta', loss=custom_loss())
classif_model is a pre-trained model that classifies CIFAR10 images into one of the 10 classes. It receives an array of 32x32 images.
However, when I run my code, I get the following error:
Traceback (most recent call last):
File "myCode.py", line 94, in
myModel.compile(optimizer='adadelta', loss=custom_loss())
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 850, in compile
sample_weight, mask)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 450, in weighted
score_array = fn(y_true, y_pred)
File "myCode.py", line 57, in joint_optimized_loss
preds = classif_model.predict(y_pred)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/models.py",
line 913, in predict
return self.model.predict(x, batch_size=batch_size, verbose=verbose)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 1713, in predict
verbose=verbose, steps=steps)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 1260, in _predict_loop
batches = _make_batches(num_samples, batch_size)
File "/home/rvidalma/anaconda2/envs/tensorUpdated/lib/python2.7/site-packages/keras/engine/training.py",
line 374, in _make_batches
num_batches = int(np.ceil(size / float(batch_size)))
AttributeError: 'Dimension' object has no attribute 'ceil'
I think this has something to do with the fact that y_true and y_pred are both symbolic tensors that hold no data before training, so classif_model.predict fails, as it is expecting a concrete array. However, I am not sure how to fix this...
I tried getting the value of y_pred instead, using K.get_value(y_pred), but that gives me the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape
[-1,32,32,3] has negative dimensions [[Node: input_1 =
Placeholderdtype=DT_FLOAT, shape=[?,32,32,3],
_device="/job:localhost/replica:0/task:0/cpu:0"]]
You cannot use accuracy as a loss function, as it is not differentiable. This is why upper bounds on accuracy like the cross-entropy are used instead.
Additionally, the way you implemented accuracy is also non-symbolic; you should have used only functions in keras.backend to implement the loss for it to work properly.
I had almost the same problem, and I tried this and it worked for me.
Instead of:
preds = classif_model.predict(y_pred)
try:
preds = classif_model(y_pred)
I am not sure about the reason, but it is because model.predict(y) needs a batch_size, and while compiling we don't have any, so we cannot use model.predict(y).
Please correct me if this is wrong.
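For illustration, here is a fully symbolic sketch that combines both suggestions: call the classifier directly on y_pred and build the accuracy term only from keras.backend ops. Since true accuracy is not differentiable, this sketch substitutes the classifier's mean probability for the true class as a differentiable stand-in; true_class_one_hot is a hypothetical tensor of one-hot CIFAR10 labels supplied through the closure, not something from the original answers.
from keras import backend as K

def custom_loss(classif_model, true_class_one_hot):
    def joint_optimized_loss(y_true, y_pred):
        mse = K.mean(K.square(y_pred - y_true), axis=-1)
        # Symbolic call: builds graph ops instead of running a batch now.
        class_probs = classif_model(y_pred)
        # Mean probability assigned to the true class (differentiable,
        # unlike the hard 0/1 accuracy in the question).
        true_class_prob = K.mean(K.sum(class_probs * true_class_one_hot, axis=-1))
        return mse / (true_class_prob + K.epsilon())
    return joint_optimized_loss

myModel.compile(optimizer='adadelta',
                loss=custom_loss(classif_model, true_class_one_hot))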

Running distributed Tensorflow with InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float

I have implemented a variational autoencoder with TensorFlow on a single machine. Now I am trying to run it on my cluster with the distributed mechanism provided by TensorFlow. But the following problem has had me stuck for several days:
Traceback (most recent call last):
File "/home/yama/mfs/ZhuSuan/examples/vae.py", line 265, in <module>
print('>> Test log likelihood = {}'.format(np.mean(test_lls)))
File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 942, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 768, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 322, in join
six.reraise(*self._exc_info_to_raise)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 267, in stop_on_exception
yield
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 411, in run
self.run_loop()
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 972, in run_loop
self._sv.global_step])
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:worker/replica:0/task:0/gpu:0"]()]]
[[Node: model_1/fully_connected_10/Relu_G88 = _Recv[client_terminated=false, recv_device="/job:worker/replica:0/task:0/cpu:0", send_device="/job:worker/replica:0/task:0/gpu:0", send_device_incarnation=3964479821165574552, tensor_name="edge_694_model_1/fully_connected_10/Relu", tensor_type=DT_FLOAT, _device="/job:worker/replica:0/task:0/cpu:0"]()]]
Caused by op u'Placeholder', defined at:
File "/home/yama/mfs/ZhuSuan/examples/vae.py", line 201, in <module>
x = tf.placeholder(tf.float32, shape=(None, x_train.shape[1]))
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 895, in placeholder
name=name)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1238, in _placeholder
name=name)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 704, in apply_op
op_def=op_def)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/mfs/yama/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
Here is my code; I paste just the main function for simplicity:
if __name__ == "__main__":
    tf.set_random_seed(1234)

    # Load MNIST
    data_path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                             'data', 'mnist.pkl.gz')
    x_train, t_train, x_valid, t_valid, x_test, t_test = \
        dataset.load_mnist_realval(data_path)
    x_train = np.vstack([x_train, x_valid])
    np.random.seed(1234)
    x_test = np.random.binomial(1, x_test, size=x_test.shape).astype('float32')

    # Define hyper-parameters
    n_z = 40

    # Define training/evaluation parameters
    lb_samples = 1
    ll_samples = 5000
    epoches = 10
    batch_size = 100
    test_batch_size = 100
    iters = x_train.shape[0] // batch_size
    test_iters = x_test.shape[0] // test_batch_size
    test_freq = 10

    ps_hosts = FLAGS.ps_hosts.split(",")
    worker_hosts = FLAGS.worker_hosts.split(",")

    # Create a cluster from the parameter server and worker hosts.
    clusterSpec = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})

    print("Create and start a server for the local task.")
    # Create and start a server for the local task.
    server = tf.train.Server(clusterSpec,
                             job_name=FLAGS.job_name,
                             task_index=FLAGS.task_index)

    print("Start ps and worker server")
    if FLAGS.job_name == "ps":
        server.join()
    elif FLAGS.job_name == "worker":
        # set distributed device
        with tf.device(tf.train.replica_device_setter(
                worker_device="/job:worker/task:%d" % FLAGS.task_index,
                cluster=clusterSpec)):
            print("Build the training computation graph")
            # Build the training computation graph
            x = tf.placeholder(tf.float32, shape=(None, x_train.shape[1]))
            optimizer = tf.train.AdamOptimizer(learning_rate=0.001, epsilon=1e-4)
            with tf.variable_scope("model") as scope:
                with pt.defaults_scope(phase=pt.Phase.train):
                    train_model = M1(n_z, x_train.shape[1])
                    train_vz_mean, train_vz_logstd = q_net(x, n_z)
                    train_variational = ReparameterizedNormal(
                        train_vz_mean, train_vz_logstd)
                    grads, lower_bound = advi(
                        train_model, x, train_variational, lb_samples, optimizer)
                    infer = optimizer.apply_gradients(grads)

            print("Build the evaluation computation graph")
            # Build the evaluation computation graph
            with tf.variable_scope("model", reuse=True) as scope:
                with pt.defaults_scope(phase=pt.Phase.test):
                    eval_model = M1(n_z, x_train.shape[1])
                    eval_vz_mean, eval_vz_logstd = q_net(x, n_z)
                    eval_variational = ReparameterizedNormal(
                        eval_vz_mean, eval_vz_logstd)
                    eval_lower_bound = is_loglikelihood(
                        eval_model, x, eval_variational, lb_samples)
                    eval_log_likelihood = is_loglikelihood(
                        eval_model, x, eval_variational, ll_samples)

            global_step = tf.Variable(0)
            saver = tf.train.Saver()
            summary_op = tf.merge_all_summaries()
            init_op = tf.initialize_all_variables()

        # Create a "supervisor", which oversees the training process.
        sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                                 logdir=LogDir,
                                 init_op=init_op,
                                 summary_op=summary_op,
                                 saver=saver,
                                 global_step=global_step,
                                 save_model_secs=600)

        # Run the inference
        with sv.managed_session(server.target) as sess:
            epoch = 0
            while not sv.should_stop() and epoch < epoches:
                # for epoch in range(1, epoches + 1):
                np.random.shuffle(x_train)
                lbs = []
                for t in range(iters):
                    x_batch = x_train[t * batch_size:(t + 1) * batch_size]
                    x_batch = np.random.binomial(
                        n=1, p=x_batch, size=x_batch.shape).astype('float32')
                    _, lb = sess.run([infer, lower_bound], feed_dict={x: x_batch})
                    lbs.append(lb)
                if epoch % test_freq == 0:
                    test_lbs = []
                    test_lls = []
                    for t in range(test_iters):
                        test_x_batch = x_test[
                            t * test_batch_size: (t + 1) * test_batch_size]
                        test_lb, test_ll = sess.run(
                            [eval_lower_bound, eval_log_likelihood],
                            feed_dict={x: test_x_batch}
                        )
                        test_lbs.append(test_lb)
                        test_lls.append(test_ll)
                    print('>> Test lower bound = {}'.format(np.mean(test_lbs)))
                    print('>> Test log likelihood = {}'.format(np.mean(test_lls)))
        sv.stop()
I have tried to correct my code for several days, but all my efforts have failed. Looking forward to your help!
The most likely cause of this exception is that one of the operations that the tf.train.Supervisor runs in the background depends on the tf.placeholder() tensor x, but doesn't have enough information to feed a value for it.
The most likely culprit is summary_op = tf.merge_all_summaries(), because library code often summarizes values that depend on the training data. To prevent the supervisor from collecting summaries in the background, pass summary_op=None to the tf.train.Supervisor constructor:
# Create a "supervisor", which oversees the training process.
sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0),
                         logdir=LogDir,
                         init_op=init_op,
                         summary_op=None,
                         saver=saver,
                         global_step=global_step,
                         save_model_secs=600)
After doing this, you will need to make alternative arrangements to collect summaries. The easiest way to do this is to pass summary_op to sess.run() periodically, then pass the result to sv.summary_computed().
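A sketch of what that might look like in the training loop from the question (summary_freq is a hypothetical interval, not from the original code):
for t in range(iters):
    x_batch = ...  # as in the question
    if t % summary_freq == 0:
        # Run the summary op with an explicit feed, then hand the result
        # to the Supervisor so it lands in the event files.
        _, lb, summary = sess.run([infer, lower_bound, summary_op],
                                  feed_dict={x: x_batch})
        sv.summary_computed(sess, summary)
    else:
        _, lb = sess.run([infer, lower_bound], feed_dict={x: x_batch})
    lbs.append(lb)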
Came across a similar thing. The chief was going down with the aforementioned error message. However, since I was using the MonitoredTrainingSession rather than a self-made Supervisor, I was able to solve the problem by disabling the default summary. To disable, you have to provide
save_summaries_secs=None,
save_summaries_steps=None,
to the constructor of the MonitoredTrainingSession. Afterwards, everything went smoothly!
Code on Github
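For reference, a sketch of that constructor call, adapting the names from the question (MonitoredTrainingSession replaces the Supervisor in newer TF 1.x code):
with tf.train.MonitoredTrainingSession(
        master=server.target,
        is_chief=(FLAGS.task_index == 0),
        checkpoint_dir=LogDir,
        save_summaries_secs=None,    # disable the background summary hook
        save_summaries_steps=None) as sess:
    # ... training loop with explicit feed_dict, as before ...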
I had the same exact problem. Following mrry's suggestion I was able to work this out by:
Disabling summary logging in the supervisor by setting summary_op=None (as mrry suggested)
Creating my own summary_op and passing it to sess.run() along with the rest of the ops to be evaluated, holding on to the resulting summary; let's say it's called 'my_summary'.
Creating my own summary writer and calling it with 'my_summary', e.g.: summary_writer.add_summary(summary, epoch_count)
To clarify, I did not use mrry's suggestion to do sess.run(summary_op) and sv.summary_computed(), but instead ran the summary_op along with the other operations and then wrote out the summary myself. You might also want to condition the summary writing on being the chief.
So basically, you need to bypass the Supervisor's summary writing services completely. This seems like a surprising limitation/bug of Supervisor, since it isn't exactly uncommon to want to log things that depend on the input (which lives in a placeholder). For example, in my network (an autoencoder) the cost depends on the input.