Could not read from TensorArray index 0. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written (TensorFlow)

I am trying to reproduce the multi-GPU version of the code in https://github.com/FlyEgle/keras-yolo3 (train_height_point.py), with a small change to the ResNet model architecture; everything else is the same.
Direct link: https://github.com/FlyEgle/keras-yolo3/blob/master/train_height_point.py
The error seems to be in the yolo_loss function. I have tried modifying the while_loop and other tricks mentioned in these Stack Overflow questions and GitHub issue:
- Gradients error using TensorArray Tensorflow
- TensorArray TensorArray_1_0: Could not read from TensorArray index 0 because it has not yet been written to
- https://github.com/tensorflow/tensorflow/issues/3663
When I run the code, I get the following error in the first epoch:
Train on 62880 samples, val on 6976 samples, with batch size 1.
Epoch 1/400
2019-06-28 18:39:30.247036: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251868: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_1_4: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251942: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_2_5: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:31.368047: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "train.py", line 517, in <module>
_main()
File "train.py", line 177, in _main
callbacks=[logging, lr_schedule, checkpoint]
File "/opt/conda/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
[[{{node replica_0/model_3/yolo_loss/TensorArrayStack/TensorArrayGatherV3}}]]
[[{{node loss/add_20}}]]

According to the error message above, you need to pass a fully defined element_shape argument when you create the forward TensorArray. For example, use element_shape=(10, 10, 10) instead of None or a partially known shape such as element_shape=(None, 10, 10). It seems that no dimension is allowed to be unknown.
I also have this problem and am still looking for a better way to solve it.
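A minimal sketch of that suggestion, written for TensorFlow 2 and run eagerly here. The error itself comes from graph-mode code such as a Keras 2.x loss, so this shows the API usage rather than reproducing the failure. It assumes a hypothetical fixed 13x13 grid with 3 anchors per cell. In the question the element shape is [?,?,3] because the grid height and width are unknown, so this only applies if the input size is fixed. `per_image_masks` and `scores` are illustrative names, not code from keras-yolo3.

```python
import tensorflow as tf

# Hypothetical, fully known per-image shape. The error reports [?, ?, 3];
# here we assume a fixed 13x13 grid with 3 anchors per cell.
GRID_H, GRID_W, NUM_ANCHORS = 13, 13, 3
ELEMENT_SHAPE = tf.TensorShape([GRID_H, GRID_W, NUM_ANCHORS])


def per_image_masks(scores, threshold=0.5):
    """Builds one 0/1 mask per image with tf.while_loop and a TensorArray.

    `scores` is a hypothetical [batch, GRID_H, GRID_W, NUM_ANCHORS] tensor
    standing in for per-image IoU values.
    """
    batch_size = tf.shape(scores)[0]
    masks = tf.TensorArray(
        tf.float32,
        size=1,
        dynamic_size=True,
        element_shape=ELEMENT_SHAPE,  # fully defined: no None dimensions
    )

    def body(b, masks):
        # The comparison is not differentiable, so no gradient is written
        # back into this TensorArray (the situation the error describes).
        mask = tf.cast(scores[b] < threshold, tf.float32)
        return b + 1, masks.write(b, mask)

    _, masks = tf.while_loop(
        lambda b, _: b < batch_size, body, [tf.constant(0), masks])
    return masks.stack()


scores = tf.random.uniform([4, GRID_H, GRID_W, NUM_ANCHORS], seed=0)
with tf.GradientTape() as tape:
    tape.watch(scores)
    masks = per_image_masks(scores)
    # Use the mask as a constant weight, the way an "ignore mask" is used.
    loss = tf.reduce_sum(masks * scores)
grad = tape.gradient(loss, scores)
print(masks.shape)  # (4, 13, 13, 3)
```

Because the mask is built from a comparison, no gradient flows back through the TensorArray, and the gradient of the loss with respect to `scores` is just the mask. According to the error message, a fully defined element_shape lets graph-mode TensorFlow return an all-zeros gradient for such a TensorArray instead of failing.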

Related

Invalid Argument Error in TensorFlow Object Detection Training

I am training a TensorFlow object detection model with the TensorFlow Object Detection API. I have trained many models in the past using exactly the same steps, but this model keeps giving me the error message below:
InvalidArgumentError: image_size must contain 3 elements[4]
I searched for the error and found
InvalidArgumentError: image_size must contain 3 elements[4] #3349
which shows the same error and suggests checking that all images are RGB. I used the code provided in that thread to check all the images and found about 15 that were not RGB. I removed those images and their corresponding XML files, regenerated the CSV and TFRecord files, and restarted the training, but I received the error message again. I then tried starting the training from scratch instead of resuming from the last checkpoint, and I still received the error. The error does not happen on a regular basis; sometimes the model runs for several thousand steps before failing. I have also tried removing the random crop parameter from the pipeline.config file, which had no effect.
Any help is appreciated.
Error Message:
INFO:tensorflow:global_step/sec: 2.03361
INFO:tensorflow:global step 4039: loss = 6.2836 (0.512 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
INFO:tensorflow:Recording summary at step 4039.
INFO:tensorflow:global step 4040: loss = 4.6984 (0.880 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "/floyd/object_detection/legacy/train.py", line 184, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, inrun
_sys.exit(main(argv))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/floyd/object_detection/legacy/train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/floyd/object_detection/legacy/trainer.py", line 415, in train
saver=saver)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 833, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1244,in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409,in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
Thanks in advance.
So it was the RGB image problem after all. I had checked the images, removed the non-RGB ones and recreated the records. However, the model was still pointing to the old records, and because the paths were very similar, I did not notice.
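The kind of check that caught the problem can be sketched as below. It assumes the images have already been decoded into NumPy arrays; the decoding step and the `find_non_rgb` helper name are hypothetical, not code from the #3349 thread. The check reports anything that is not exactly height x width x 3, such as RGBA or grayscale images. As the follow-up shows, after removing such images you must also make sure the training config points at the regenerated TFRecord files.

```python
import numpy as np


def find_non_rgb(images):
    """Returns the names of images whose arrays are not H x W x 3.

    `images` maps a file name to an already-decoded NumPy array.
    """
    bad = []
    for name, array in images.items():
        # RGBA (H x W x 4), grayscale (H x W or H x W x 1) and any array
        # with an unexpected rank are all flagged.
        if array.ndim != 3 or array.shape[2] != 3:
            bad.append(name)
    return bad


images = {
    "img_001.jpg": np.zeros((480, 640, 3), dtype=np.uint8),
    "img_002.png": np.zeros((480, 640, 4), dtype=np.uint8),  # RGBA
    "img_003.png": np.zeros((480, 640), dtype=np.uint8),     # grayscale
}
print(find_non_rgb(images))  # ['img_002.png', 'img_003.png']
```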

TF object detection API - Compute evaluation measures failed

I successfully trained a model on my own dataset, exported the inference graph and ran inference on my test dataset.
I now have:
- the detections as a TFRecord file, specified in the input config
- an eval_config file with the metrics set specified
When I try to compute the measures as described in the new object detection inference and evaluation measure computation tutorial, with
python object_detection/metrics/offline_eval_map_corloc.py --eval_dir=/media/sf_shared --eval_config_path=/media/sf_shared/eval_config.pbtxt --input_config_path=/media/sf_shared/input_config.pbtxt
It returns this AttributeError:
INFO:tensorflow:Processing file: /media/sf_shared/detections.record
INFO:tensorflow:Processed 0 images...
Traceback (most recent call last):
File "object_detection/metrics/offline_eval_map_corloc.py", line 173, in <module>
tf.app.run(main)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/metrics/offline_eval_map_corloc.py", line 166, in main
metrics = read_data_and_evaluate(input_config, eval_config)
File "object_detection/metrics/offline_eval_map_corloc.py", line 124, in read_data_and_evaluate
decoded_dict)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/models/research/object_detection/utils/object_detection_evaluation.py", line 174, in add_single_ground_truth_image_info
(groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
AttributeError: 'NoneType' object has no attribute 'size'
Any hints?
I fixed it (temporarily) as follows:
if (standard_fields.InputDataFields.groundtruth_difficult in groundtruth_dict
        and groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult] is not None
        and (groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult].size
             or not groundtruth_classes.size)):
    groundtruth_difficult = groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
In place of the existing lines (195-198) in
object_detection/utils/object_detection_evaluation.py.
The error occurs because, even when no difficult flag is passed, the code still checks the size of that object, which is None in that case.
This fails if you omitted that field from your TFRecords.
Perhaps this was the developers' intent, but the documentation certainly leaves a lot to be desired.
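The same guard can be written as a small standalone function. This makes the order of the checks explicit: test for a missing or None value before touching `.size`. This is a sketch, not the Object Detection API's code. The plain string key `"groundtruth_difficult"` is an assumption standing in for standard_fields.InputDataFields.groundtruth_difficult, and `select_groundtruth_difficult` is a hypothetical name.

```python
import numpy as np

# Assumed key; in the Object Detection API this is
# standard_fields.InputDataFields.groundtruth_difficult.
DIFFICULT_KEY = "groundtruth_difficult"


def select_groundtruth_difficult(groundtruth_dict, groundtruth_classes):
    """Returns the 'difficult' flags if they are usable, otherwise None."""
    difficult = groundtruth_dict.get(DIFFICULT_KEY)
    # Check for None first: in the traceback above the value is None,
    # and None has no .size attribute.
    if difficult is not None and (difficult.size or not groundtruth_classes.size):
        return difficult
    return None


classes = np.array([1, 2])
print(select_groundtruth_difficult({DIFFICULT_KEY: None}, classes))  # None
```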

TensorFlow: a working tf.while_loop fails as part of a Dataset API input pipeline

My problem is an image keypoint recognition task on images of snails. I have found that although there are many prewritten image augmentation functions for classification tasks (such as Keras' ImageDataGenerator), there are none that I can find suitable for this problem, which requires changes to the output keypoints to match the random transformations of the image. Hence I am writing my own to be mapped onto the dataset as it is read from TFRecord.
The logic I am using involves a while loop that keeps generating random transformations (rotation, shift, zoom, etc.) and applying them to the real keypoints until it finds a set of transformations under which the keypoints fit into the image. This avoids transformations that leave part of the snail outside the image. It would then apply those same transformations to the image and return them.
My problem is that, while I have successfully got this augmentation function to work on a single test set of keypoints, when I use the same function as part of my input pipeline, it does not work, throwing the following error: 'Merge can not have more than one valid input' (full trace included at end). I have not been able to find an explanation anywhere.
# Defining cond argument to while loop. 'ph' are placeholders to match numbers of arguments for tf.while_loop
def not_fit_in_image(landmarks, ph2, ph3, ph4, ph5, ph6):
    # tf logical operators to find if landmarks fit in image
    return landmarks_not_fit_in_image

def augmentation_function(image, original_landmarks):
    def body(ph1, ph2, ph3, ph4, ph5, ph6):
        shift = tf.random_uniform([1, 2], -shift_max, shift_max, tf.float32)
        landmarks = original_landmarks + shift
        # More random transformations generated and applied
        return landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear

    # placeholders to match number of arguments
    ph_a = tf.constant(0, dtype=tf.float32)
    landmarks, rotation, shift, zoom, y_over_x_proportion_change, shear = tf.while_loop(not_fit_in_image, body, [original_landmarks, ph_a, ph_b, ph_a, ph_a, ph_a])
    # In future, would now apply these same transformations to image.
    return image, landmarks

# Setting up input data pipeline using Dataset API
train = tf.data.TFRecordDataset(train_data_tfrecords).map(parse_function)
train = train.map(augmentation_function)  # Using the above augmentation function
train = train.repeat().shuffle(buffer_size).batch(batch_size)
# ... Set up handle, iterator, init ops ... all works ...
with tf.Session() as sess:
    train_handle = sess.run(train_iterator.string_handle())
    sess.run(train_init_op)
    train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
The following error occurs:
2017-11-10 13:08:14.449612: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Internal: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
Traceback (most recent call last):
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
status, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/hanne/Documents/Tensorflow Projects/Snails/random_rotations_working_while_loop_experiments.py", line 143, in <module>
train_images, train_landmarks = sess.run(next_batch, feed_dict={handle: train_handle})
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Users\hanne\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Merge can not have more than one valid input.
[[Node: while/Merge_5 = Merge[N=2, T=DT_FLOAT](while/Enter_5, while/NextIteration_5)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,384,384], [?,15,2]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorFromStringHandle)]]
This is my first time asking a question on Stack Overflow, so any comments on how to write better questions are also very welcome! I have stripped the code above down as much as I can for brevity, so it is minimal but NOT complete or verifiable; let me know if I should include more code.
EDIT
I was able to figure out what was wrong! tf.while_loop acts like a Python while loop: it checks the condition before each run of 'body', including THE VERY FIRST RUN. The 'loop_vars' argument supplies the variables for this first check. I had passed placeholder values of the wrong format in 'loop_vars', which caused the error above. A good way around this, which worked for me, is to pass the result of one initial run of 'body' as 'loop_vars', since that is guaranteed to have the right form.
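A minimal sketch of that workaround in TensorFlow 2 style, reduced to a single random shift. Here `loop_vars` is seeded with the output of one real call to `body`, so its structure, dtypes and shapes match what `body` returns. The 384-pixel image size matches the [?,384,384] shapes in the trace. `SHIFT_MAX`, `augment_landmarks` and the keypoint values are assumptions for illustration, not the asker's code.

```python
import tensorflow as tf

IMAGE_SIZE = 384.0  # assumed square images, as in the [?,384,384] shapes in the trace
SHIFT_MAX = 50.0    # hypothetical maximum shift in pixels


def not_fit_in_image(landmarks, shift):
    """cond: keep looping while any keypoint lies outside the image."""
    outside = tf.logical_or(landmarks < 0.0, landmarks >= IMAGE_SIZE)
    return tf.reduce_any(outside)


def augment_landmarks(original_landmarks):
    def body(landmarks, shift):
        # Previous values are ignored; each iteration draws a fresh shift.
        shift = tf.random.uniform([1, 2], -SHIFT_MAX, SHIFT_MAX, tf.float32)
        return original_landmarks + shift, shift

    # Seed loop_vars with the output of one real call to body, so they have
    # exactly the structure, dtypes and shapes that body returns.
    initial_vars = body(original_landmarks, tf.zeros([1, 2]))
    return tf.while_loop(not_fit_in_image, body, initial_vars)


landmarks = tf.fill([15, 2], 192.0)  # 15 keypoints at the image centre
new_landmarks, shift = augment_landmarks(landmarks)
```

Because the condition is checked on `initial_vars` first, a first draw that already fits is returned without entering the loop body again.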

TensorFlow gradient: unsupported operand type

I got the following error:
anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
trainstep = tf.train.AdamOptimizer(0.0001).minimize(lossobj)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 196, in minimize
grad_loss=grad_loss)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 253, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py", line 469, in gradients
in_grads = _AsList(grad_fn(op, *out_grads))
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ExtractImagePatchesGrad
rows_out = int(ceil(rows_in / stride_r))
TypeError: unsupported operand type(s) for /: 'NoneType' and 'long'
It looks like the gather op is wrong.
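I see that this is an old issue, but I have found a quick workaround for some cases of it. The traceback shows that the gradient of ExtractImagePatches computes ceil(rows_in / stride_r) from the static input shape, so an unknown (None) dimension breaks that division. Chances are, you are feeding your input through a placeholder and one of the dimensions of the placeholder shape is "None". If you give that dimension a concrete value (such as your batch size, if it is the batch dimension), it will no longer be an unknown shape.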
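A minimal sketch of the workaround using the v1 compatibility API in TensorFlow 2, where tf.image.extract_patches is the current name of the ExtractImagePatches op from the traceback. The input dimensions are hypothetical. The point is only that the placeholder's static shape is fully defined before the gradient is built. Newer TensorFlow releases may no longer need this, so treat it as a workaround for older versions.

```python
import tensorflow as tf

# Hypothetical input dimensions; the point is that none of them is None.
BATCH_SIZE, HEIGHT, WIDTH, CHANNELS = 8, 32, 32, 3

graph = tf.Graph()
with graph.as_default():
    # shape=[None, HEIGHT, WIDTH, CHANNELS] would leave the batch dimension
    # unknown; fixing every dimension gives a fully defined static shape.
    images = tf.compat.v1.placeholder(
        tf.float32, shape=[BATCH_SIZE, HEIGHT, WIDTH, CHANNELS])
    patches = tf.image.extract_patches(
        images, sizes=[1, 4, 4, 1], strides=[1, 4, 4, 1],
        rates=[1, 1, 1, 1], padding="VALID")
    loss = tf.reduce_sum(patches)
    # Building this gradient is where the original traceback failed.
    (image_grad,) = tf.compat.v1.gradients(loss, [images])

print(images.shape.is_fully_defined())  # True
print(patches.shape)  # (8, 8, 8, 48)
```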

Error while merging summaries for TensorBoard

I am trying to generate the graph for the MNIST beginner tutorial but am getting the following error. For some reason, the merged_summary_op object is None.
Traceback (most recent call last):
File "mnist1.py", line 48, in <module>
summary_str = sess.run(merged_summary_op)
File "/home/vagrant/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 307, in run
% (subfetch, fetch, type(subfetch), e.message))
TypeError: Fetch argument None of None has invalid type <type 'NoneType'>, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
I think I am missing a step here. I launched the session first and then ran the statement:
merged_summary_op = tf.merge_all_summaries()
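I had the same error.
In my case, adding at least one tf.scalar_summary() before calling tf.merge_all_summaries() solved the problem. merge_all_summaries() returns None when no summary ops have been created in the graph, which is why the fetch argument in the error is None.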
For example,
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
tf.scalar_summary("cross_entropy", cross_entropy)
merged_summary_op = tf.merge_all_summaries()
I hope this snippet helps you.
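In TensorFlow 1.0, tf.scalar_summary and tf.merge_all_summaries were renamed to tf.summary.scalar and tf.summary.merge_all. In TensorFlow 2 they are available as tf.compat.v1.summary.scalar and tf.compat.v1.summary.merge_all. A minimal sketch, assuming TF2 with the v1 compatibility API and reusing the answer's cross-entropy with made-up constant values. It shows that merge_all() returns None until a summary op exists. `build_merged_summary` is a hypothetical helper.

```python
import tensorflow as tf


def build_merged_summary(add_scalar_summary):
    """Returns (merged_summary_op, serialized_summary) or (None, None)."""
    graph = tf.Graph()
    with graph.as_default():
        # Made-up prediction and label, reusing the answer's cross-entropy.
        y = tf.constant([[0.2, 0.8]])
        y_ = tf.constant([[0.0, 1.0]])
        cross_entropy = -tf.reduce_sum(y_ * tf.math.log(y))
        if add_scalar_summary:
            tf.compat.v1.summary.scalar("cross_entropy", cross_entropy)
        # merge_all() returns None when no summary ops were created.
        merged_summary_op = tf.compat.v1.summary.merge_all()
        if merged_summary_op is None:
            return None, None
        with tf.compat.v1.Session(graph=graph) as sess:
            return merged_summary_op, sess.run(merged_summary_op)


print(build_merged_summary(False))  # (None, None)
```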