Error feeding a placeholder - tensorflow

I'm having issues feeding one of my placeholders (keep_prob). The error says that I have to feed a float value, but I'm already doing it. I've been trying to solve it, but I can't figure out a solution. My code is here:
Error while running a convolutional network using my own data in Tensorflow
And my error is:
File "<ipython-input-81-fd184c90091e>", line 4, in <module>
keep_prob = tf.placeholder(tf.float32)
File "c:\python36\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1530, in placeholder
return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
File "c:\python36\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1954, in _placeholder
name=name)
File "c:\python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "c:\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "c:\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_17' with dtype float
[[Node: Placeholder_17 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Thank you.

I suspect the error does not come from the line you think it does. I can't run your code, but from reading it, my guess is the following:
You are feeding the keep_prob value here, which is OK for training:
train_step.run(feed_dict={x: image_batch_eval, y_: label_batch_eval, keep_prob: 0.5})
But you are not feeding the keep_prob value here, which you also have to do:
print('Precisión %g' % accuracy.eval(feed_dict={x: image_test_batch_eval, y_: label_test_batch_eval}))
If you look at your code, the accuracy operation ultimately depends on this operation, which needs the placeholder:
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
So if you are evaluating, for example, the validation or test accuracy, feed keep_prob: 1.0 so that dropout is disabled; otherwise feed whatever value you use for training.
Give it a try?
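One way to make the train/eval distinction harder to forget is to build the feed dictionary in a single place. The sketch below is only illustrative: make_feed_dict is a hypothetical helper (it is not part of TensorFlow or of the question's code), and plain strings stand in for the x, y_ and keep_prob placeholders so the snippet runs without TensorFlow.

```python
def make_feed_dict(x, y_, keep_prob, images, labels, training, train_keep_prob=0.5):
    """Build a feed_dict that always includes the dropout keep probability."""
    return {
        x: images,
        y_: labels,
        # Dropout is active only while training; evaluation keeps every unit.
        keep_prob: train_keep_prob if training else 1.0,
    }


# Strings stand in for the real placeholders (assumption for illustration only).
train_feed = make_feed_dict("x", "y_", "keep_prob", [[0.1, 0.2]], [[0, 1]], training=True)
eval_feed = make_feed_dict("x", "y_", "keep_prob", [[0.1, 0.2]], [[0, 1]], training=False)
print(train_feed["keep_prob"], eval_feed["keep_prob"])
```

With the real graph, the same dictionary would be passed as accuracy.eval(feed_dict=eval_feed), so the keep_prob entry can no longer be left out by accident.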

Resource exhausted: OOM when allocating tensor with shape[845246,300]

I am working with a sequence-to-sequence language model, and after changing the code to pass custom word embedding weights to the Embedding layer, I am receiving an OOM error when I try to train on the GPU.
Here is the relevant code:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

def create_model(word_map, X_train, Y_train, vocab_size, max_length):
    # define model
    model = Sequential()
    # get custom embedding weights as matrix
    embedding_matrix = get_weights_matrix_from_word_map(word_map)
    model.add(Embedding(len(word_map)+1, 300, weights=[embedding_matrix], input_length=max_length-1))
    model.add(LSTM(50))
    model.add(Dense(vocab_size, activation='softmax'))
    print(model.summary())
    # compile network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, Y_train, epochs=100, verbose=2)
    return model
And here is the full error log from the server:
File "/home2/slp24/thesis/UpdatedLanguageModel_7_31.py", line 335, in create_model_2
model.fit(X_train, Y_train, batch_size=32, epochs=1, verbose=2) ## prev X, y
File "/opt/python-3.4.1/lib/python3.4/site-packages/keras/models.py", line 963, in fit
validation_steps=validation_steps)
File "/opt/python-3.4.1/lib/python3.4/site-packages/keras/engine/training.py", line 1682, in fit
self._make_train_function()
File "/opt/python-3.4.1/lib/python3.4/site-packages/keras/engine/training.py", line 990, in _make_train_function
loss=self.total_loss)
File "/opt/python-3.4.1/lib/python3.4/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/opt/python-3.4.1/lib/python3.4/site-packages/keras/optimizers.py", line 466, in get_updates
m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/ops/math_ops.py", line 898, in binary_op_wrapper
y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 932, in convert_to_tensor
as_ref=False)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1022, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/ops/gradients_impl.py", line 100, in _IndexedSlicesToTensor
value.values, value.indices, value.dense_shape[0], name=name)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5186, in unsorted_segment_sum
num_segments=num_segments, name=name)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/opt/python-3.4.1/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[845246,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: training/Adam/mul_2/y = UnsortedSegmentSum[T=DT_FLOAT, Tindices=DT_INT32, Tnumsegments=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/embedding_1/Gather_grad/Reshape, training/Adam/gradients/embedding_1/Gather_grad/Reshape_1/_101, training/Adam/mul_2/strided_slice)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Edit:
So far I have tried:
Adding batching, starting with batch_size=32.
I am currently working to decrease the number of output classes from 845,286. I think something went wrong when I calculated the custom embedding matrix, specifically when I was "connecting" the vocabulary token indices assigned during preprocessing to the y_categorical values assigned by Keras that the model uses...
Any help or guidance is greatly appreciated! I have searched many similar issues but have not been able to apply those fixes to my code so far. Thank you.
You're exceeding the memory size of your GPU.
You can:
Train/Predict with smaller batches
Or, if even batch_size=1 is too much, you need a model with fewer parameters.
Hint: the first dimension of that tensor (845246) is really big. Is that the correct length?
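To get a feel for why that shape is a problem, here is a rough back-of-the-envelope sketch. tensor_bytes is a hypothetical helper, and the count of four copies (weights, a dense gradient, and Adam's two moment estimates) is an assumption based on how Adam is usually described and on the traceback, which shows the embedding gradient being converted to a dense tensor; the real footprint depends on the framework version.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Shape taken from the error message: [845246, 300], type float.
one_copy = tensor_bytes((845246, 300))

# Assumption: weights + dense gradient + Adam's two moment estimates = 4 copies.
copies = 4

print("one copy:    %.2f GiB" % (one_copy / 2**30))
print("four copies: %.2f GiB" % (copies * one_copy / 2**30))
```

A single copy is already close to 1 GiB, which does not depend on the batch size at all. That is why shrinking the vocabulary (the first dimension) is likely to help more here than shrinking the batches.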
I had the same problem on a Google Colab GPU.
The error appeared with a batch size of 64; after I reduced the batch size to 32, it worked properly.

Tensorflow batch training OutOfRangeError

Saving variables
Variables saved in 0.88 seconds
Saving metagraph
Metagraph saved in 35.81 seconds
Saving variables
Variables saved in 0.95 seconds
Saving metagraph
Metagraph saved in 33.20 seconds
Traceback (most recent call last):
Caused by op u'batch', defined at:
File "ava_train.py", line 155, in <module>
image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, allow_smaller_final_batch=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 872, in batch
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 665, in _batch
dequeued = queue.dequeue_up_to(batch_size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 510, in dequeue_up_to
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1402, in _queue_dequeue_up_to_v2
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 100, current size 0)
[[Node: batch = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
My code is here:
import os
import tensorflow as tf
from tensorflow.python.framework import ops, dtypes

with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)
    # process same as cifar10.distorted_inputs
    log_dir = '../log'
    model_dir = '../model'
    max_num_epoch = 80
    if not os.path.exists(log_dir):
        os.makedirs(log_dir)
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    num_train_example = len(os.listdir('../images/'))
    # Reads paths of images together with their labels
    image_list, label_list = read_labeled_image_list('../raw.txt')
    images = ops.convert_to_tensor(image_list, dtype=dtypes.string)
    labels = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
    # Makes an input queue
    # input_queue = tf.train.slice_input_producer([images, labels], num_epochs=max_num_epoch, shuffle=True)
    input_queue = tf.train.slice_input_producer([images, labels], shuffle=True)
    image, label = read_images_from_disk(input_queue)
    image_size = 240
    keep_probability = 0.8
    weight_decay = 5e-5
    image = preprocess(image, image_size, image_size, None)
    batch_size = 100
    epoch_size = 1000
    embedding_size = 128
    # Optional Image and Label Batching
    image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, allow_smaller_final_batch=True)
This is the output from training an image classification model on 200,000 images. I set allow_smaller_final_batch=True in tf.train.batch. After some epochs, the OutOfRangeError occurred.
I don't know the reason. Thanks for the help.
Since you get an OutOfRangeError, it could be that you are training for more epochs than max_num_epoch, in which case the queue fed by slice_input_producer is closed once it has produced that many epochs, and the next dequeue throws this exception.
One possible workaround is to remove num_epochs=max_num_epoch from your slice_input_producer call, since this lets it keep producing after the maximum number of epochs has been reached.
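If you do want to keep num_epochs, it helps to know how many batches the queue can deliver before it runs dry. The following is a small sketch of that arithmetic; batches_before_exhaustion is a hypothetical helper, and it assumes every example is dequeued exactly once per epoch, which ignores any details of multi-threaded batching.

```python
import math


def batches_before_exhaustion(num_examples, num_epochs, batch_size,
                              allow_smaller_final_batch=True):
    """Number of batches an epoch-limited input queue can deliver."""
    total = num_examples * num_epochs
    if allow_smaller_final_batch:
        # The last, partially filled batch is still delivered.
        return math.ceil(total / batch_size)
    # Otherwise the leftover examples never form a full batch.
    return total // batch_size


# Numbers from the question: 200,000 images, 80 epochs, batches of 100.
print(batches_before_exhaustion(200000, 80, 100))
```

Running the training op more times than this number is what would trigger the "closed and has insufficient elements" message, so the training loop should either stop at that count or catch tf.errors.OutOfRangeError.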
I battled with this particular error for days before I finally found the cause. In my case, the error appeared because the data file was corrupted somewhere. Try running this code on different training and test data.

How to use tensorflow tf.metrics.mean_iou?

I am trying to use TensorFlow's built-in mean_iou function to compute the IoU score for semantic segmentation.
My code is:
#y_mask.shape == [batch_size, h * w, n_classes]
#mask_.shape == [batch_size, h * w, n_classes]
iou = tf.metrics.mean_iou(tf.argmax(y_mask,2), tf.argmax(mask_,2), n_classes)
However I am getting the following error trace:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value mean_iou/total_confusion_matrix
[[Node: mean_iou/AssignAdd = AssignAdd[T=DT_DOUBLE, _class=["loc:@mean_iou/total_confusion_matrix"], use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/total_confusion_matrix, mean_iou/confusion_matrix/SparseTensorDenseAdd)]]
Caused by op u'mean_iou/AssignAdd', defined at:
File "sample_tf_ynet.py", line 207, in <module>
trainSeg()
File "sample_tf_ynet.py", line 166, in trainSeg
iou, cm_op = tf.metrics.mean_iou(tf.argmax(y_mask,2), tf.argmax(mask_,2), n_classes)
File "/home/meetshah1995/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 782, in mean_iou
update_op = state_ops.assign_add(total_cm, current_cm)
File "/home/meetshah1995/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 75, in assign_add
use_locking=use_locking, name=name)
File "/home/meetshah1995/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/meetshah1995/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/meetshah1995/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value mean_iou/total_confusion_matrix
[[Node: mean_iou/AssignAdd = AssignAdd[T=DT_DOUBLE, _class=["loc:@mean_iou/total_confusion_matrix"], use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/total_confusion_matrix, mean_iou/confusion_matrix/SparseTensorDenseAdd)]]
Please guide me on the correct usage of this for semantic segmentation.
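I solved it by also initializing the local variables; tf.metrics.mean_iou keeps its running confusion matrix in a local variable, which tf.global_variables_initializer() does not cover:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())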
Simplest form I could come up with (3 classes):
# y_pred and y_true are np.arrays of shape [1, size, channels]
with tf.Session() as sess:
    ypredT = tf.constant(np.argmax(y_pred, axis=-1))
    ytrueT = tf.constant(np.argmax(y_true, axis=-1))
    iou,conf_mat = tf.metrics.mean_iou(ytrueT, ypredT, num_classes=3)
    sess.run(tf.local_variables_initializer())
    sess.run([conf_mat])
    miou = sess.run([iou])
    print(miou)
prints:
[0.6127908]
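To sanity-check a value like this outside the graph, the same quantity can be computed directly from a confusion matrix. The sketch below uses NumPy only; mean_iou_numpy is a hypothetical helper, and it assumes the convention that classes appearing in neither the labels nor the predictions are left out of the average.

```python
import numpy as np


def mean_iou_numpy(y_true, y_pred, num_classes):
    """Mean intersection-over-union from two integer label arrays of equal shape."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.bincount(num_classes * y_true + y_pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # Skip classes that appear in neither array to avoid dividing by zero.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is 0.5.
print(mean_iou_numpy([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

For one-hot masks shaped like the ones in the question, you would first take np.argmax over the last axis, exactly as the TensorFlow snippet does.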

FailedPreconditionError while trying to use RMSPropOptimizer on tensorflow

I am trying to use the RMSPropOptimizer for minimizing loss. Here's the part of the code that is relevant:
import tensorflow as tf
#build large convnet...
#...
opt = tf.train.RMSPropOptimizer(learning_rate=0.0025, decay=0.95)
#do stuff to get targets and loss...
#...
grads_and_vars = opt.compute_gradients(loss)
capped_grads_and_vars = [(tf.clip_by_value(g, -1, 1), v) for g, v in grads_and_vars]
opt_op = opt.apply_gradients(capped_grads_and_vars)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
while True:
    sess.run(opt_op)
The problem is that as soon as I run this, I get the following error:
W tensorflow/core/common_runtime/executor.cc:1091] 0x10a0bba40 Compute status: Failed precondition: Attempting to use uninitialized value train/output/bias/RMSProp
[[Node: RMSProp/update_train/output/bias/ApplyRMSProp = ApplyRMSProp[T=DT_FLOAT, use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](train/output/bias, train/output/bias/RMSProp, train/output/bias/RMSProp_1, RMSProp/learning_rate, RMSProp/decay, RMSProp/momentum, RMSProp/epsilon, clip_by_value_9)]]
[[Node: _send_MergeSummary/MergeSummary_0 = _Send[T=DT_STRING, client_terminated=true, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=-6901001318975381332, tensor_name="MergeSummary/MergeSummary:0", _device="/job:localhost/replica:0/task:0/cpu:0"](MergeSummary/MergeSummary)]]
Traceback (most recent call last):
File "dqn.py", line 213, in <module>
result = sess.run(opt_op)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 385, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 461, in _do_run
e.code)
tensorflow.python.framework.errors.FailedPreconditionError: Attempting to use uninitialized value train/output/bias/RMSProp
[[Node: RMSProp/update_train/output/bias/ApplyRMSProp = ApplyRMSProp[T=DT_FLOAT, use_locking=false, _device="/job:localhost/replica:0/task:0/cpu:0"](train/output/bias, train/output/bias/RMSProp, train/output/bias/RMSProp_1, RMSProp/learning_rate, RMSProp/decay, RMSProp/momentum, RMSProp/epsilon, clip_by_value_9)]]
Caused by op u'RMSProp/update_train/output/bias/ApplyRMSProp', defined at:
File "dqn.py", line 159, in qLearnMinibatch
opt_op = self.opt.apply_gradients(capped_grads_and_vars)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 288, in apply_gradients
update_ops.append(self._apply_dense(grad, var))
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/rmsprop.py", line 103, in _apply_dense
grad, use_locking=self._use_locking).op
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/gen_training_ops.py", line 171, in apply_rms_prop
grad=grad, use_locking=use_locking, name=name)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 659, in apply_op
op_def=op_def)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1904, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/home/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1083, in __init__
self._traceback = _extract_stack()
Note that I don't get this error if I use the usual GradientDescentOptimizer. I am initializing my variables, as you can see above, but I don't know what 'train/output/bias/RMSProp' is, because I don't create any such variable. I only have 'train/output/bias/', which does get initialized above.
Thanks!
For anyone who runs into similar trouble later, I found this post helpful:
Tensorflow: Using Adam optimizer
Basically, I was running
sess.run(tf.initialize_all_variables())
before I had defined my loss minimization op:
loss = tf.square(targets)
#create the gradient descent op
grads_and_vars = opt.compute_gradients(loss)
capped_grads_and_vars = [(tf.clip_by_value(g, -self.clip_delta, self.clip_delta), v) for g, v in grads_and_vars] #gradient capping
self.opt_op = self.opt.apply_gradients(capped_grads_and_vars)
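This needs to be done before running the initialization op, because apply_gradients creates the optimizer's own slot variables (such as train/output/bias/RMSProp), and the initializer only covers variables that already exist when it is created.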

Tensorflow complaining about placeholder after model restore

I am having a problem restoring models in TensorFlow. I have a script that generates several models from a set of training files. Each model is created in its own variable scope, using
tf.variable_scope(myPrefix).
After training, I am able to restore the models using
tf.train.Saver(model_vars).restore(sess, model)
with model_vars computed as
all_vars = tf.all_variables()
model_vars=[k for k in all_vars if k.name.startswith(myPrefix)]
While the models do seem to load, running them produces a placeholder error (see below; 85314_tr_10 is my prefix).
I am pretty sure I am not skipping any placeholders. The model has just two (x and y), and both are fed in the eval call I make:
predictions = sess.run(pred, feed_dict={x: test_data, y:test_labels})
Here is the error trace:
W tensorflow/core/common_runtime/executor.cc:1076] 0x2ea0e60 Compute status: Invalid argument: You must feed a value for placeholder tensor '85314_tr_10/Placeholder' with dtype float
[[Node: 85314_tr_10/Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
File "../prediction/views.py", line 507, in <module>
predict("","85314","True","2015-11-12T09:08:00Z","2015-11-12T10:08:00Z")
File "../prediction/views.py", line 472, in predict
predictions= prediction.eval(feed_dict={x: test_data,y:test_labels}, session=sess)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 460, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2910, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 368, in run
results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 444, in _do_run
e.code)
Any help is very much appreciated!