tensorflow gradient: unsupported operand type

I got the following error:
anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
trainstep = tf.train.AdamOptimizer(0.0001).minimize(lossobj)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 196, in minimize
grad_loss=grad_loss)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 253, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients.py", line 469, in gradients
in_grads = _AsList(grad_fn(op, *out_grads))
File "anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ExtractImagePatchesGrad
rows_out = int(ceil(rows_in / stride_r))
TypeError: unsupported operand type(s) for /: 'NoneType' and 'long'
It looks like something is wrong with the gather ops.

I see that this is an old issue, but I have found a quick work-around for some cases of this. Chances are, you are feeding your input using a placeholder and one of the dimensions of the placeholder shape is "None". If you set that dimension to your batch size, it will no longer be an unknown shape.
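For example, a minimal sketch of that fix, assuming the input is an image placeholder feeding tf.extract_image_patches (the op whose gradient fails in the traceback above); the shapes and names here are made up:
import tensorflow as tf

batch_size = 32  # hypothetical fixed batch size

# Instead of shape=[None, 64, 64, 3], give every dimension a concrete value
# so the gradient of extract_image_patches never divides a None dimension.
images = tf.placeholder(tf.float32, shape=[batch_size, 64, 64, 3])

patches = tf.extract_image_patches(images,
                                   ksizes=[1, 3, 3, 1],
                                   strides=[1, 2, 2, 1],
                                   rates=[1, 1, 1, 1],
                                   padding='SAME')
loss_obj = tf.reduce_mean(patches)
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_obj)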

Could not read from TensorArray index 0. Possibly you are working with a resizeable TensorArray and stop_gradients isn't allowing gradients to be written

I am trying to reproduce the multi-GPU version of the code from https://github.com/FlyEgle/keras-yolo3, with a small change to the ResNet part of the model architecture (the rest is the same), as given in train_height_point.py.
direct link: https://github.com/FlyEgle/keras-yolo3/blob/master/train_height_point.py
The error seems to be in the yolo_loss function.
I have tried modifying the while_loop and the other tricks mentioned in other Stack Overflow answers:
Gradients error using TensorArray Tensorflow
TensorArray TensorArray_1_0: Could not read from TensorArray index 0 because it has not yet been written to
https://github.com/tensorflow/tensorflow/issues/3663
When I run the code, I get the following error on the 1st epoch:
Train on 62880 samples, val on 6976 samples, with batch size 1.
Epoch 1/400
2019-06-28 18:39:30.247036: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251868: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_1_4: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251942: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_2_5: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:31.368047: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "train.py", line 517, in <module>
_main()
File "train.py", line 177, in _main
callbacks=[logging, lr_schedule, checkpoint]
File "/opt/conda/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
[[{{node replica_0/model_3/yolo_loss/TensorArrayStack/TensorArrayGatherV3}}]]
[[{{node loss/add_20}}]]
According to the stack trace above, you need to pass the element_shape parameter fully defined, e.g. element_shape=(10, 10, 10), instead of leaving it None or passing element_shape=(None, 10, 10). It seems an unknown dimension is not allowed.
I also have this problem and am trying to find a better way to solve it.
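For reference, a minimal sketch of a TensorArray with a fully defined element shape; the (13, 13, 3) shape here is made up and should match whatever each element written to the array in yolo_loss actually has:
import tensorflow as tf

# With element_shape fully defined, the backward pass can return a proper
# all-zeros tensor for unread indices instead of raising the error above.
ta = tf.TensorArray(dtype=tf.float32,
                    size=1,
                    dynamic_size=True,
                    element_shape=(13, 13, 3))
ta = ta.write(0, tf.zeros((13, 13, 3)))
first_element = ta.read(0)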

Invalid Argument Error Tensorflow Object Detection Training

I am training a TensorFlow object detection model following the TensorFlow Object Detection API. I have trained many models in the past using the exact same steps. This model, however, keeps giving me the error message below. The error message references
InvalidArgumentError: image_size must contain 3 elements[4]
I searched the error and found
InvalidArgumentError: image_size must contain 3 elements[4] #3349
which shows the same error and gives the solution of checking that all images are RGB. I used the code provided in that thread to check all images and found about 15 that were not RGB. I removed those images and the corresponding xml files, recompiled the csv files and the tfrecord files, and restarted the training. I received the error message again. I then tried to start the training over without resuming from the last checkpoint and still received the error. The error does not happen on a regular basis; sometimes the model will go several thousand steps before a failure. I have also tried removing the random crop parameter from the pipeline.config file, which had no effect.
Any help is appreciated.
Error Message:
INFO:tensorflow:global_step/sec: 2.03361
INFO:tensorflow:global step 4039: loss = 6.2836 (0.512 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
INFO:tensorflow:Recording summary at step 4039.
INFO:tensorflow:global step 4040: loss = 4.6984 (0.880 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "/floyd/object_detection/legacy/train.py", line 184, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, inrun
_sys.exit(main(argv))
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "/floyd/object_detection/legacy/train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/floyd/object_detection/legacy/trainer.py", line 415, in train
saver=saver)
File "/usr/local/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 833, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1244,in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409,in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: image_size must contain 3 elements[4]
[[Node: cond_2/RandomCropImage/sample_distorted_bounding_box/SampleDistortedBoundingBoxV2 = SampleDistortedBoundingBoxV2[T=DT_INT32, area_range=[0.1, 1], aspect_ratio_range=[0.5, 2],max_attempts=100, seed=0, seed2=0, use_image_if_no_bounding_boxes=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/RandomCropImage/Shape, cond_2/RandomCropImage/ExpandDims, cond_2/RandomCropImage/PruneNonOverlappingBoxes/Const)]]
Thanks in advance.
So it was the RGB image problem after all. I had checked the images, removed the non-RGB ones, and recreated the records, but the model was still pointing to the old records because the paths were very similar and I did not notice.
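For anyone hitting the same thing, a quick sketch of the kind of check that catches non-RGB images before the csv and tfrecord files are regenerated; the directory and extension are made up, adjust them to your dataset:
import glob
import os
from PIL import Image

image_dir = 'images/train'  # hypothetical path
for path in glob.glob(os.path.join(image_dir, '*.jpg')):
    img = Image.open(path)
    if img.mode != 'RGB':
        # remove this image and its xml, then rebuild the csv and tfrecord files
        print(path, img.mode)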

'MemoryError' when padding sequences using tensorflow

I am trying to train my model on an AWS 'g2.2xlarge' instance but am getting a 'MemoryError' when trying to add padding to my sequences.
content_array = keras.preprocessing.sequence.pad_sequences(
    content_array, maxlen=max_sequence_length, padding='post')
Getting this error:
Traceback (most recent call last):
File "trainer.py", line 185, in <module>
train()
File "trainer.py", line 52, in train
padding='post')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/preprocessing/sequence.py", line 94, in pad_sequences
x = (np.ones((num_samples, maxlen) + sample_shape) * value).astype(dtype)
MemoryError
Any idea why? I haven't even started training the model.
I was calculating the maximum sequence length incorrectly, which led to a huge number. After correcting it I am no longer having any issues.
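A minimal sketch of computing maxlen from the data itself (the toy sequences are made up); pad_sequences allocates a dense (num_samples, maxlen) array up front, so an inflated maxlen exhausts memory before training even starts:
from tensorflow import keras

content_array = [[1, 2, 3], [4, 5], [6]]  # toy data
max_sequence_length = max(len(seq) for seq in content_array)

content_array = keras.preprocessing.sequence.pad_sequences(
    content_array, maxlen=max_sequence_length, padding='post')
print(content_array.shape)  # (3, 3)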

TF object detection API - Compute evaluation measures failed

I successfully trained a model on my own dataset, exported the inference graph and did the inference on my test dataset.
I now have:
- the detections as a tfrecord file, specified in the input config
- an eval_config file with the specified metrics set
When I try to compute the measures as in the new object detector inference and evaluation measure computation tutorial with
python object_detection/metrics/offline_eval_map_corloc.py --eval_dir=/media/sf_shared --eval_config_path=/media/sf_shared/eval_config.pbtxt --input_config_path=/media/sf_shared/input_config.pbtxt
It returns this AttributeError:
INFO:tensorflow:Processing file: /media/sf_shared/detections.record
INFO:tensorflow:Processed 0 images...
Traceback (most recent call last):
File "object_detection/metrics/offline_eval_map_corloc.py", line 173, in <module>
tf.app.run(main)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/metrics/offline_eval_map_corloc.py", line 166, in main
metrics = read_data_and_evaluate(input_config, eval_config)
File "object_detection/metrics/offline_eval_map_corloc.py", line 124, in read_data_and_evaluate
decoded_dict)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/models/research/object_detection/utils/object_detection_evaluation.py", line 174, in add_single_ground_truth_image_info
(groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
AttributeError: 'NoneType' object has no attribute 'size'
Any hints?
I fixed it (temporarily) as follows:
if (standard_fields.InputDataFields.groundtruth_difficult in groundtruth_dict.keys()
        and groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult] is not None):
    if (groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult].size
            or not groundtruth_classes.size):
        groundtruth_difficult = groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
In place of the existing lines (195-198) in object_detection/utils/object_detection_evaluation.py.
The error is caused by the fact that the size of the difficult flag is checked even when no such flag was passed. This fails if you skipped that field in your tfrecords.
Perhaps this was the developers' intent, but the clarity of the documentation certainly leaves a lot to be desired.
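Alternatively, the check never fires if the difficult flag is actually present in the records. A hypothetical sketch of including it (all zeros) when building the groundtruth tf.Examples; the feature key and box count are assumptions based on the standard Object Detection API record format:
import tensorflow as tf

def int64_list_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

num_boxes = 3  # made-up number of boxes in this image
feature = {
    'image/object/difficult': int64_list_feature([0] * num_boxes),
    # ... the other image/object/* features go here as usual
}
example = tf.train.Example(features=tf.train.Features(feature=feature))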

error while merging summaries for tensorboard

I am trying to generate the graph for the MNIST beginner tutorial but am getting the following error. For some reason, the merged_summary_op object is None.
Traceback (most recent call last):
File "mnist1.py", line 48, in <module>
summary_str = sess.run(merged_summary_op)
File "/home/vagrant/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 307, in run
% (subfetch, fetch, type(subfetch), e.message))
TypeError: Fetch argument None of None has invalid type <type 'NoneType'>, must be a string or Tensor. (Can not convert a NoneType into a Tensor or Operation.)
I think I am missing a step here. I launched the session first and then ran this statement:
merged_summary_op = tf.merge_all_summaries()
I had the same error.
In my case, adding at least one tf.scalar_summary() before calling tf.merge_all_summaries() solved the problem.
For example,
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
tf.scalar_summary("cross_entropy", cross_entropy)
merged_summary_op = tf.merge_all_summaries()
I hope this snippet helps you.
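For completeness, a minimal self-contained sketch of that pattern with the old (pre-1.0) summary API this question uses; the scalar and log directory are made up, and on very old versions the writer may want sess.graph_def instead of sess.graph:
import tensorflow as tf

loss = tf.constant(1.23)  # made-up scalar to summarize
tf.scalar_summary("loss", loss)

# merge_all_summaries() returns None when no summaries exist in the graph,
# which is exactly what produces the "Fetch argument None" error above.
merged_summary_op = tf.merge_all_summaries()

with tf.Session() as sess:
    writer = tf.train.SummaryWriter("/tmp/mnist_logs", sess.graph)
    summary_str = sess.run(merged_summary_op)
    writer.add_summary(summary_str, 0)
    writer.flush()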