I am using the Tensorflow seq2seq tutorial to play with machine translation. Say I have trained the model for some time and determine that I want to supplement the original vocab with new words to enhance the quality of the model. Is there a way to pause training, add words to the vocabulary, and then resume training from the most recent checkpoint? I attempted to do so but when I began training again I got this error:
Traceback (most recent call last):
File "execute.py", line 405, in <module>
train()
File "execute.py", line 127, in train
model = create_model(sess, False)
File "execute.py", line 108, in create_model
model.saver.restore(session, ckpt.model_checkpoint_path)
File "/home/jrthom18/.local/lib/python2.7/site- packages/tensorflow/python/training/saver.py", line 1388, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [384633] rhs shape= [384617]
[[Node: save/Assign_82 = Assign[T=DT_FLOAT, _class=["loc:#proj_b"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](proj_b, save/RestoreV2_82)]]
Caused by op u'save/Assign_82', defined at:
File "execute.py", line 405, in <module>
train()
File "execute.py", line 127, in train
model = create_model(sess, False)
File "execute.py", line 99, in create_model
model = seq2seq_model.Seq2SeqModel( gConfig['enc_vocab_size'], gConfig['dec_vocab_size'], _buckets, gConfig['layer_size'], gConfig['num_layers'], gConfig['max_gradient_norm'], gConfig['batch_size'], gConfig['learning_rate'], gConfig['learning_rate_decay_factor'], forward_only=forward_only)
File "/home/jrthom18/data/3x256_bs32/easy_seq2seq/seq2seq_model.py", line 166, in __init__
self.saver = tf.train.Saver(tf.global_variables(), keep_checkpoint_every_n_hours=2.0)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
self.build()
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1030, in build
restore_sequentially=self._restore_sequentially)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 624, in build
restore_sequentially, reshape)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 373, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 130, in restore
self.op.get_shape().is_fully_defined())
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/jrthom18/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [384633] rhs shape= [384617]
[[Node: save/Assign_82 = Assign[T=DT_FLOAT, _class=["loc:#proj_b"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](proj_b, save/RestoreV2_82)]]
Obviously the new vocab is larger and so the tensor sizes do not match. Is there some way around this?
You can not update your vocab once its set but you can always use shared wordpiece model. It will help you to directly copy out of vocab words form source to target output.
Related
I am trying to run the tensorflow object detection api. I have made a dataset with 1 class using labelImg and then converting the xmls to tfrecord files. Some information:
os: Ubuntu 16.04
gpu: nvidia geforce 1080Ti & 1060
tensorflow version: 1.3.0
training model: faster_rcnn_resnet101_coco (although I have tried others)
Classes: 1
I then run train.py and training begins. When I get ~step 355 I get the error:
INFO:tensorflow:global step 363: loss = 1.4006 (0.294 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 332, in train
saver=saver)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 767, in train
sv.stop(threads, close_summary_writer=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception
yield
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 494, in run
self.run_loop()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop
self._sv.global_step])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1
[[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]]
[[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1', defined at:
File "object_detection/train.py", line 163, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 159, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/ucfadng/tensorflow/models/research/object_detection/trainer.py", line 295, in train
global_summaries.add(tf.summary.histogram(model_var.op.name, model_var))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 192, in histogram
tag=tag, values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 129, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Nan in summary histogram for: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1
[[Node: SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma_1/tag, SecondStageFeatureExtractor/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read)]]
[[Node: Loss/RPNLoss/map/TensorArray_2/_1353 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_5239_Loss/RPNLoss/map/TensorArray_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
My label map is:
item {
id: 1
name: 'rail'
}
The pipeline used was downloaded from: https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_resnet101_coco.config
I have had a look at the TFRecord files and look as expected which leads be to think this is something to do with the initial dataset. But what specifically would cause this error?
My dataset consists of 250 clear images containing railway tracks. I have labelled sections of the tracks so each image has ~20/30 labelled objects.
So far the main thing I have tried is lowering learning rates and changing batch size but these did not solve the problem.
Any help in solving this would be very appreciated.
Cheers
I have now updated to version: 1.4.0 and this is no longer a problem. I did not change anything else. Perhaps this was a bug with version: 1.3.0. I will leave this up just in case anyone has the same issue.
This doesn't seem to be an error with tensorflow itself, i believe it has something to do with your system running out of memory.
Setting train to false in batch_norm (within your pipeline.config file) appears to overcome this problem.
It should look something like this:
batch_norm {
decay: 0.999
center: true
scale: true
epsilon: 0.001
train: false
}
Hope this helped.
Traceback (most recent call last):
File "C:/Users/ljz/PycharmProjects/TFConvLSTM/LSTMnetwork.py", line 136, in <module>
sess.run(train_step, feed_dict={x: X_train[start:end], y_: y_train[start:end]})
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 767, in run
run_metadata_ptr)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1500,18] vs. [22500,18]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_Placeholder_1_0, Log)]]
Caused by op 'mul', defined at:
File "C:/Users/ljz/PycharmProjects/TFConvLSTM/LSTMnetwork.py", line 113, in <module>
cross_entropy=tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_conv),reduction_indices=[1]))
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1105, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1625, in _mul
result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
op_def=op_def)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\ljz\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1264, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [1500,18] vs. [22500,18]
[[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_Placeholder_1_0, Log)]]
Caused by op 'mul', defined at: File "C:/Users/ljz/PycharmProjects/TFConvLSTM/LSTMnetwork.py", line 113, in cross_entropy=tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_conv),reduction_indices=[1]))
and
InvalidArgumentError (see above for traceback): Incompatible shapes: [1500,18] vs. [22500,18] [[Node: mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_Placeholder_1_0, Log)]]
=> Make sure y and y_conv have identical or broadcastable shape.
Problematic code:
cross_entropy=tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_conv),reduction_indices=[1]))
As you are trying to multiply two tensors y_ and y_conv, they must have the same shape, i.e. either both [1500,18] or both [22500,18]
In your case y_ should be input labels and y_conv should be calculated logits, and 18 should be the number of output neurons. Therefore I guess your batch size of input data (x) and label(y_) are not the same. Check it and make them the same.
I got an error when I changed new images to train the im2txt model. Don't know why.
Build the model.
bazel build -c opt im2txt/...
bazel-bin/im2txt/train
--input_file_pattern="${MY_DATA_DIR}/train-?????-of-00256"
--inception_checkpoint_file="${INCEPTION_CHECKPOINT}"
--train_dir="${MODEL_DIR}/train"
--train_inception=false
--number_of_steps=10000
It went to error when running below sentence
sequence_length = tf.reduce_sum(self.input_mask, 1)
lstm_outputs, _ = tf.nn.dynamic_rnn(cell=lstm_cell,
inputs=self.seq_embeddings,
sequence_length=sequence_length,
initial_state=initial_state,
dtype=tf.float32,
scope=lstm_scope)
The detail info is below
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global step 1: loss = 9.5415 (37.21 sec/step)
INFO:tensorflow:global step 2: loss = 6.6332 (12.90 sec/step)
INFO:tensorflow:global step 3: loss = 3.1327 (13.01 sec/step)
INFO:tensorflow:global step 4: loss = 6.2893 (12.04 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.UnimplementedError'>, TensorArray has size zero, but element shape is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#lstm/lstm/TensorArray_1"], dtype=DT_FLOAT, element_shape=, _device="/job:localhost/replica:0/task:0/cpu:0"](OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, lstm/lstm/TensorArrayUnstack/range, OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
Caused by op u'OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3', defined at:
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 155, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 135, in main
learning_rate_decay_fn=learning_rate_decay_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 226, in optimize_loss
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 345, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 482, in gradients
in_grads = grad_fn(op, *out_grads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_grad.py", line 186, in _TensorArrayScatterGrad
grad = g.gather(indices)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 328, in gather
element_shape=element_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2226, in _tensor_array_gather_v3
element_shape=element_shape, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()
...which was originally created as op u'lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3', defined at:
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 155, in
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 89, in main
model.build()
File "/data/projects/content_creator/image2text/im2txt/im2txt/show_and_tell_model.py", line 437, in build
self.build_model()
File "/data/projects/content_creator/image2text/im2txt/im2txt/show_and_tell_model.py", line 356, in build_model
scope=lstm_scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 546, in dynamic_rnn
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 664, in dynamic_rnn_loop
for ta, input in zip(input_ta, flat_input))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 664, in
for ta, input in zip(input_ta, flat_input))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 380, in unstack
indices=math_ops.range(0, num_elements), value=value, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 408, in scatter
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2492, in _tensor_array_scatter_v3
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#lstm/lstm/TensorArray_1"], dtype=DT_FLOAT, element_shape=, _device="/job:localhost/replica:0/task:0/cpu:0"](OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, lstm/lstm/TensorArrayUnstack/range, OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
Traceback (most recent call last):
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 155, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 152, in main
saver=saver)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 793, in train
train_step_kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 530, in train_step
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#lstm/lstm/TensorArray_1"], dtype=DT_FLOAT, element_shape=, _device="/job:localhost/replica:0/task:0/cpu:0"](OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, lstm/lstm/TensorArrayUnstack/range, OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
Caused by op u'OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3', defined at:
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 155, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 135, in main
learning_rate_decay_fn=learning_rate_decay_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/optimizers.py", line 226, in optimize_loss
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 345, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 482, in gradients
in_grads = grad_fn(op, *out_grads)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_grad.py", line 186, in _TensorArrayScatterGrad
grad = g.gather(indices)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 328, in gather
element_shape=element_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2226, in _tensor_array_gather_v3
element_shape=element_shape, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()
...which was originally created as op u'lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3', defined at:
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 155, in
tf.app.run()
[elided 0 identical lines from previous traceback]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/data/projects/content_creator/image2text/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 89, in main
model.build()
File "/data/projects/content_creator/image2text/im2txt/im2txt/show_and_tell_model.py", line 437, in build
self.build_model()
File "/data/projects/content_creator/image2text/im2txt/im2txt/show_and_tell_model.py", line 356, in build_model
scope=lstm_scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 546, in dynamic_rnn
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 664, in dynamic_rnn_loop
for ta, input in zip(input_ta, flat_input))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 664, in
for ta, input in zip(input_ta, flat_input))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 380, in unstack
indices=math_ops.range(0, num_elements), value=value, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 408, in scatter
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2492, in _tensor_array_scatter_v3
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#lstm/lstm/TensorArray_1"], dtype=DT_FLOAT, element_shape=, _device="/job:localhost/replica:0/task:0/cpu:0"](OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, lstm/lstm/TensorArrayUnstack/range, OptimizeLoss/gradients/lstm/lstm/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
I met a runtime error during training when I tried to applied tensorflow built-in CTC loss function (https://www.tensorflow.org/versions/r0.10/api_docs/python/nn/conectionist_temporal_classification__ctc_) to SynthText dataset. http://www.robots.ox.ac.uk/~vgg/data/scenetext/
It said " Not enough time for target transition sequence (required: 4, available: 0)".
Here is some info for the environment: tensorflow version '0.12.0-rc0'.
I am able to apply tensorflow built-in CTC to Synth90K Dataset with great performance.
It seems like the SynthText dataset is not compilable with tensorflow built-in CTC but Synth90K Dataset could.
Please find the error message as reference
step 980, loss = 50.17 (92.1 examples/sec; 0.695 sec/batch)
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Not enough time for target transition sequence (required: 4, available: 0), skipping data instance in batch: 28
Traceback (most recent call last):
File "multi-gpu-train.py", line 305, in
tf.app.run()
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "multi-gpu-train.py", line 301, in main
train()
File "multi-gpu-train.py", line 270, in train
_, loss_value = sess.run([train_op, loss])
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Not enough time for target transition sequence (required: 4, available: 0), skipping data instance in batch: 28
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/transpose_2/_555, tower_0/Where, tower_0/sub_2/_557, tower_0/Sum_1/_559)]]
Caused by op u'tower_0/CTCLoss', defined at:
File "multi-gpu-train.py", line 305, in
tf.app.run()
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "multi-gpu-train.py", line 301, in main
train()
File "multi-gpu-train.py", line 179, in train
loss,logits_op,images,labels = tower_loss(scope)
File "multi-gpu-train.py", line 79, in tower_loss
_ = network2.loss(logits,images, labels)
File "/home/ubuntu/experiments/network2_dev/network2.py", line 61, in loss
out = tf.nn.ctc_loss(logit, to_sparse(y), seq_len, time_major=False)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/ctc_ops.py", line 145, in ctc_loss
ctc_merge_repeated=ctc_merge_repeated)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 164, in _ctc_loss
name=name)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ubuntu/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in init
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Not enough time for target transition sequence (required: 4, available: 0), skipping data instance in batch: 28
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/transpose_2/_555, tower_0/Where, tower_0/sub_2/_557, tower_0/Sum_1/_559)]]
I am trying to reverse my inputs with array_ops.reverse_sequence() before sending it to dynamic_rnn(), the inference graph can be build with no problem, but when building the training graph, I got the following error:
Traceback (most recent call last):
File "bin/trainer.py", line 158, in <module>
kmer_len=args.kmer_len)
File "/home/ubuntu/GIT/IvyMike/ivymike/base_model.py", line 193, in run_training
train_op = model.training(loss, learning_rate)
File "/home/ubuntu/GIT/IvyMike/ivymike/base_model.py", line 100, in training
train_op = optimizer.minimize(loss, global_step=global_step)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 190, in minimize
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 241, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients.py", line 481, in gradients
in_grads = _AsList(grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_grad.py", line 307, in _ReverseSequenceGrad
seq_lengths=seq_lengths),
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1143, in reverse_sequence
batch_dim=batch_dim, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2119, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1586, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1257, in _ReverseSequenceShape
(batch_dim, input_shape.ndims))
TypeError: %d format: a number is required, not NoneType
Any idea what went wrong?
This has been fixed at the mater in TensorFlow