Error in 'ValidationMonitor' when passing 'metrics' parameter - tensorflow

I'm using the following code to log accuracy as the validation measure (TensorFlow 0.10):
validation_metrics = {"accuracy": tf.contrib.metrics.streaming_accuracy}
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
input_fn=input_fn_eval,
every_n_steps=FLAGS.eval_every,
# metrics=validation_metrics,
early_stopping_rounds=500,
early_stopping_metric="loss",
early_stopping_metric_minimize=True)
After running, in 'every_n_steps', I see the following lines in the output:
INFO:tensorflow:Validation (step 1000): loss = 1.04875, global_step = 900
The problem is that when 'metrics=validation_metrics' parameter uncomment in the above code, I get the following error in the validation phase:
INFO:tensorflow:Error reported to Coordinator: <type 'exceptions.TypeError'>, Input 'y' of 'Equal' Op has type int64 that does not match type float32 of argument 'x'.
E tensorflow/core/client/tensor_c_api.cc:485] Enqueue operation was cancelled
[[Node: read_batch_features_train/file_name_queue/file_name_queue_EnqueueMany = QueueEnqueueMany[Tcomponents=[DT_STRING], _class=["loc:#read_batch_features_train/file_name_queue"], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](read_batch_features_train/file_name_queue, read_batch_features_train/file_name_queue/RandomShuffle)]]
E tensorflow/core/client/tensor_c_api.cc:485] Enqueue operation was cancelled
[[Node: read_batch_features_train/random_shuffle_queue_EnqueueMany = QueueEnqueueMany[Tcomponents=[DT_STRING, DT_STRING], _class=["loc:#read_batch_features_train/random_shuffle_queue"], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](read_batch_features_train/random_shuffle_queue, read_batch_features_train/read/ReaderReadUpTo, read_batch_features_train/read/ReaderReadUpTo:1)]]
Traceback (most recent call last):
File "udc_train.py", line 74, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "udc_train.py", line 70, in main
estimator.fit(input_fn=input_fn_train, steps=None, monitors=[validation_monitor])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 240, in fit
max_steps=max_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 578, in _train_model
max_steps=max_steps)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/graph_actions.py", line 280, in _supervised_train
None)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/supervised_session.py", line 270, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/recoverable_session.py", line 54, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/coordinated_session.py", line 70, in run
self._coord.join(self._coordinated_threads_to_join)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 357, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/coordinated_session.py", line 66, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/monitored_session.py", line 107, in run
induce_stop = monitor.step_end(monitors_step, monitor_outputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 396, in step_end
return self.every_n_step_end(step, output)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 687, in every_n_step_end
steps=self.eval_steps, metrics=self.metrics, name=self.name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 356, in evaluate
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 630, in _evaluate_model
eval_dict = self._get_eval_ops(features, targets, metrics)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 877, in _get_eval_ops
result[name] = metric(predictions, targets)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 432, in streaming_accuracy
is_correct = math_ops.to_float(math_ops.equal(predictions, labels))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 708, in equal
result = _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 468, in apply_op
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Equal' Op has type int64 that does not match type float32 of argument 'x'.

This looks like a problem with your input_fn and your estimator, which are returning different types for the label.

Related

Error when training model Faster-RCNN: NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array

I am trying out a Faster-RCNN tutorial on colab:
https://colab.research.google.com/drive/1U3fkRu6-hwjk7wWIpg-iylL2u5T9t7rr#scrollTo=uQCnYPVDrsgx
But in the training part, I have received this error
The code is:
!python3 /content/models/research/object_detection/model_main.py
--pipeline_config_path={pipeline_fname}
--model_dir={model_dir}
--alsologtostderr
--num_train_steps={num_steps}
--num_eval_steps={num_eval_steps}
The output:
Using TensorFlow backend.
Cause: module 'gast' has no attribute 'Index'
Traceback (most recent call last):
File "/content/models/research/object_detection/model_main.py", line 109, in
tf.app.run()
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/content/models/research/object_detection/model_main.py", line 105, in main
tf_estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
input_fn, ModeKeys.TRAIN))
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/tensorflow-1.15.2/python3.7/tensorflow_estimator/python/estimator/estimator.py", line 1116, in _call_input_fn
return input_fn(**kwargs)
File "/content/models/research/object_detection/inputs.py", line 765, in _train_input_fn
params=params)
File "/content/models/research/object_detection/inputs.py", line 908, in train_input
reduce_to_frame_fn=reduce_to_frame_fn)
File "/content/models/research/object_detection/builders/dataset_builder.py", line 252, in build
input_reader_config)
File "/content/models/research/object_detection/builders/dataset_builder.py", line 237, in dataset_map_fn
fn_to_map, num_parallel_calls=num_parallel_calls)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 1950, in map_with_legacy_function
use_legacy_function=True))
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in init
use_legacy_function=use_legacy_function)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2689, in init
self._function.add_to_graph(ops.get_default_graph())
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 545, in add_to_graph
self._create_definition_if_needed()
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 377, in _create_definition_if_needed
self._create_definition_if_needed_impl()
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 408, in _create_definition_if_needed_impl
capture_resource_var_by_value=self._capture_resource_var_by_value)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/function.py", line 944, in func_graph_from_py_func
outputs = func(*func_graph.inputs)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2681, in wrapper_fn
ret = _wrapper_helper(*args)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in converted code:
/content/models/research/object_detection/data_decoders/tf_example_decoder.py:580 decode
default_groundtruth_weights)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:507 new_func
return func(*args, **kwargs)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1235 cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/control_flow_ops.py:1061 BuildCondBranch
original_result = fn()
/content/models/research/object_detection/data_decoders/tf_example_decoder.py:573 default_groundtruth_weights
dtype=tf.float32)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2560 ones
output = _constant_if_small(one, shape, dtype, name)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/array_ops.py:2295 _constant_if_small
if np.prod(shape) < 1000:
<array_function internals>:6 prod
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3052 prod
keepdims=keepdims, initial=initial, where=where)
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:86 _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/ops.py:736 array
" array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array.
I have tried
pip install numpy==1.19.5
But it does not work and received another error
Traceback (most recent call last):
File "/content/models/research/object_detection/model_main.py", line 26, in
from object_detection import model_lib
File "/content/models/research/object_detection/model_lib.py", line 30, in
from object_detection import eval_util
File "/content/models/research/object_detection/eval_util.py", line 35, in
from object_detection.metrics import coco_evaluation
File "/content/models/research/object_detection/metrics/coco_evaluation.py", line 25, in
from object_detection.metrics import coco_tools
File "/content/models/research/object_detection/metrics/coco_tools.py", line 51, in
from pycocotools import coco
File "/usr/local/lib/python3.7/dist-packages/pycocotools/coco.py", line 52, in
from . import mask as maskUtils
File "/usr/local/lib/python3.7/dist-packages/pycocotools/mask.py", line 3, in
import pycocotools._mask as _mask
File "pycocotools/_mask.pyx", line 1, in init pycocotools._mask
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

tensorflow v1 GradientTape: AttributeError: 'NoneType' object has no attribute 'eval'

I want to compute the gradient of the distance between the NSynth WaveNet encoding of two sine waves.
This is tensorflow v1.
I am working with code based upon https://github.com/magenta/magenta/blob/master/magenta/models/nsynth/wavenet/fastgen.py
A minimal example of my bug is in this colab notebook: https://colab.research.google.com/drive/1oTEU8QAaOs0K1A0KHrAdt7kA7MkadNDr?usp=sharing
Here is the code:
# Commented out IPython magic to ensure Python compatibility.
# %tensorflow_version 1.x
!pip3 install -q magenta
!wget -c http://download.magenta.tensorflow.org/models/nsynth/wavenet-ckpt.tar && tar xvf wavenet-ckpt.tar
checkpoint_path = './wavenet-ckpt/model.ckpt-200000'
import math
from magenta.models.nsynth.wavenet import fastgen
import tensorflow as tf
session_config = tf.ConfigProto(allow_soft_placement=True)
session_config.gpu_options.allow_growth = True
sess = tf.Session(config=session_config)
pi = 3.1415926535897
SR = 16000
sample_length = 64000
DURATION_SECONDS = sample_length / SR
def sine(hz):
time = tf.linspace(0.0, DURATION_SECONDS, sample_length)
return tf.constant(0.5) * tf.cos(2.0 * pi * time * hz)
net = fastgen.load_nsynth(batch_size=2, sample_length=sample_length)
saver = tf.train.Saver()
saver.restore(sess, checkpoint_path)
"""We have two sine waves at 440 and 660 Hz. We use the encoder to generate two (125, 16) encodings:"""
twosines = tf.stack([sine(440), sine(660)]).eval(session=sess)
print(sess.run(net["encoding"], feed_dict={net["X"]: twosines}).shape)
"""Compute the distance between the two sine waves"""
distencode = tf.reduce_mean(tf.abs(net["encoding"][0] - net["encoding"][1]))
print(sess.run(distencode, feed_dict={net["X"]: twosines}))
"""I don't know why the following code doesn't work, but if I did I could solve the real task....
"""
net["X"] = twosines
distencode.eval(session=sess)
"""Here is the code that I need to work. I want to compute the gradient of the distance between the NSynth encoding of two sine waves:"""
fp = tf.constant(660.0)
newsines = tf.stack([sine(440), sine(fp)])
with tf.GradientTape() as g:
g.watch(fp)
dd_dfp = g.gradient(distencode, fp)
print(dd_dfp.eval(session=sess))
The last block, which I want to evaluate, gets the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-b5b8cdd00b24> in <module>()
4 g.watch(fp)
5 dd_dfp = g.gradient(distencode, fp)
----> 6 print(dd_dfp.eval(session=sess))
AttributeError: 'NoneType' object has no attribute 'eval'
I believe I need to define the operations to be executed within this block. However, I am using a pretrained model that I am just computing the distance over, so I am not sure how to define execution in that block.
The second-to-last block, which would help me fix the last block, gives the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-c3411dcbfa2c> in <module>()
3 with tf.GradientTape() as g:
4 g.watch(fp)
----> 5 dd_dfp = g.gradient(distencode, g)
6 print(dd_dfp.eval(session=sess))
/tensorflow-1.15.2/python3.6/tensorflow_core/python/eager/backprop.py in gradient(self, target, sources, output_gradients, unconnected_gradients)
997 flat_sources = [_handle_or_self(x) for x in flat_sources]
998 for t in flat_sources_raw:
--> 999 if not t.dtype.is_floating:
1000 logging.vlog(
1001 logging.WARN, "The dtype of the source tensor must be "
AttributeError: 'GradientTape' object has no attribute 'dtype'
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
1364 try:
-> 1365 return fn(*args)
1366 except errors.OpError as e:
8 frames
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [2,64000]
[[{{node Placeholder}}]]
[[Mean/_759]]
(1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [2,64000]
[[{{node Placeholder}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py in _do_call(self, fn, *args)
1382 '\nsession_config.graph_options.rewrite_options.'
1383 'disable_meta_optimizer = True')
-> 1384 raise type(e)(node_def, op, message)
1385
1386 def _extend_graph(self):
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [2,64000]
[[node Placeholder (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
[[Mean/_759]]
(1) Invalid argument: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [2,64000]
[[node Placeholder (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'Placeholder':
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 664, in launch_instance
app.start()
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 499, in start
self.io_loop.start()
File "/usr/local/lib/python3.6/dist-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
self._run_once()
File "/usr/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
handle._run()
File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 548, in <lambda>
self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 462, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 492, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 444, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-5120c8282e75>", line 1, in <module>
net = fastgen.load_nsynth(batch_size=2, sample_length=sample_length)
File "/tensorflow-1.15.2/python3.6/magenta/models/nsynth/wavenet/fastgen.py", line 64, in load_nsynth
x = tf.placeholder(tf.float32, shape=[batch_size, sample_length])
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/array_ops.py", line 2619, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/gen_array_ops.py", line 6669, in placeholder
"Placeholder", dtype=dtype, shape=shape, name=name)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
Thank you.

no kernel image is available for execution on the device

I training maskrcnn ,use tf-1.2 can train, but I use tf-1.5 it not training
The error is as follows:
Caused by op u'pyramid_1/AssignGTBoxes/Where_6', defined at:
File "/home/zhouzd2/letrain/applications/letrain.py", line 349, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "/home/zhouzd2/letrain/applications/letrain.py", line 346, in main
LeTrain().model_train(user_mode)
File "/home/zhouzd2/letrain/platform/base_train.py", line 1228, in model_train
cluster=self.cluster_spec)
File "/home/zhouzd2/letrain/platform/deployment/model_deploy.py", line 226, in create_clones
outputs, feed_ops,verify_model_loss = model_fn(*args, **kwargs)
File "/home/zhouzd2/letrain/platform/base_train.py", line 1195, in clone_fn
model_loss, end_points, feed_ops = network_fn(data_direct, data_batch, int_network_fn)
File "/home/zhouzd2/letrain/applications/letrain.py", line 214, in get_loss
FLAGS.batch_size)
File "/home/zhouzd2/letrain/applications/fmrcnn/get_fmrcnn_loss.py", line 23, in model_fn
loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
File "/home/zhouzd2/letrain/applications/fmrcnn/libs/nets/pyramid_network.py", line 580, in build
is_training=is_training, gt_boxes=gt_boxes)
File "/home/zhouzd2/letrain/applications/fmrcnn/libs/nets/pyramid_network.py", line 263, in build_heads
assign_boxes(rois, [rois, batch_inds], [2, 3, 4, 5])
File "/home/zhouzd2/letrain/applications/fmrcnn/libs/layers/wrapper.py", line 173, in assign_boxes
inds = tf.where(tf.equal(assigned_layers, l))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2538, in where
return gen_array_ops.where(condition=condition, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 6087, in where
"Where", input=condition, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: no kernel image is available for execution on the device
[[Node: pyramid_1/AssignGTBoxes/Where_6 = Where[T=DT_BOOL, _device="/job:worker/replica:0/task:0/device:GPU:0"](pyramid_1/AssignGTBoxes/Equal_6_S9493)]]
[[Node: pyramid_1/AssignGTBoxes/Reshape_8_G1028 = _Recv[client_terminated=false, recv_device="/job:worker/replica:0/task:0/device:CPU:0", send_device="/job:worker/replica:0/task:0/device:GPU:0", send_device_incarnation=5407481677180697062, tensor_name="edge_1349_pyramid_1/AssignGTBoxes/Reshape_8", tensor_type=DT_INT64, _device="/job:worker/replica:0/task:0/device:CPU:0"]()]]
No problem when loading calculation graphs, error is reported in sess.run()。
Does anyone know how to solve this problem? Or does anyone know what function can replace tf.where?
Thank you!
If you are using Visual Studio:
Right click on the project > Properies > Cuda C/C++ > Device
and add the following to Code Generation field
compute_30,sm_30;compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;

Do the Tensorflow Silm assume TF version 1.4 for NASNet?

I try to train NASNet-A_Mobile_224 for two class classification by using train_image_classifier.py from slim with nasnet_mobile, However I get error
TypeError: separable_convolution2d() got an unexpected keyword argument 'data_format'
I suspect that the new NASNet requires TF version 1.4. Can somebody confirm this? I'm using Tensorflow 1.3.
More extensive error is given below:
Traceback (most recent call last):
File "train_image_classifier.py", line 574, in <module>
tf.app.run()
File "/home/sami/virenv/tensorflow_vanilla/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train_image_classifier.py", line 474, in main
clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
File "/home/sami/projects/Tools/models/research/slim/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "train_image_classifier.py", line 457, in clone_fn
logits, end_points = network_fn(images)
File "/home/sami/projects/Tools/models/research/slim/nets/nets_factory.py", line 135, in network_fn
return func(images, num_classes, is_training=is_training, **kwargs)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet.py", line 371, in build_nasnet_mobile
final_endpoint=final_endpoint)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet.py", line 450, in _build_nasnet_base
net, cell_outputs = stem()
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet.py", line 445, in <lambda>
stem = lambda: _imagenet_stem(images, hparams, stem_cell)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet.py", line 264, in _imagenet_stem
cell_num=cell_num)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet_utils.py", line 326, in __call__
stride, original_input_left)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet_utils.py", line 352, in _apply_conv_operation
net = _stacked_separable_conv(net, stride, operation, filter_size)
File "/home/sami/projects/Tools/models/research/slim/nets/nasnet/nasnet_utils.py", line 183, in _stacked_separable_conv
stride=stride)
File "/home/sami/virenv/tensorflow_vanilla/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
TypeError: separable_convolution2d() got an unexpected keyword argument 'data_format'
YES,it must be tensorflow 1.4.0

Unimplemented Error: TensorArray has size zero

I am getting this weird error when trying to train a sequence to sequence model in tensorflow. The sequence to sequence model is a video captioning system. I have encoded the frames of the videos in sequence features of the SequenceExampleProto. After I prefetch the features containing the list of jpeg encoded strings, I decode them using the following function:
video = tf.map_fn(lambda x: tf.image.decode_jpeg(x, channels=3), encoded_video, dtype=tf.uint8)
The model compiles but during training time, I'm getting the following error which is caused by this code. The error says that the TensorArray is zero, whereas here the TensorArray should not be zero. Any help is appreciated:
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape [?,?,3] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: input_fn/decode/map/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#input_fn/decode/map/TensorArray_1"], dtype=DT_UINT8, element_shape=[?,?,3], _device="/job:localhost/replica:0/task:0/cpu:0"](input_fn/decode/map/TensorArray_1, input_fn/decode/map/TensorArrayStack/range, input_fn/decode/map/while/Exit_1/_479)]]
Caused by op u'input_fn/decode/map/TensorArrayStack/TensorArrayGatherV3', defined at:
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/ubuntu/ASLNet/seq2seq/bin/train.py", line 277, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/ASLNet/seq2seq/bin/train.py", line 272, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 111, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
return task()
File "seq2seq/contrib/experiment.py", line 104, in continuous_train_and_eval
monitors=self._train_monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 281, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 430, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 925, in _train_model
features, labels = input_fn()
File "seq2seq/training/utils.py", line 274, in input_fn
frame_format="jpeg")
File "seq2seq/training/utils.py", line 365, in process_video
video = tf.map_fn(lambda x: tf.image.decode_jpeg(x, channels=3), encoded_video, dtype=tf.uint8)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/functional_ops.py", line 390, in map_fn
results_flat = [r.stack() for r in r_a]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 301, in stack
return self.gather(math_ops.range(0, self.size()), name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/tensor_array_ops.py", line 328, in gather
element_shape=element_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2244, in _tensor_array_gather_v3
element_shape=element_shape, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape [?,?,3] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: input_fn/decode/map/TensorArrayStack/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#input_fn/decode/map/TensorArray_1"], dtype=DT_UINT8, element_shape=[?,?,3], _device="/job:localhost/replica:0/task:0/cpu:0"](input_fn/decode/map/TensorArray_1, input_fn/decode/map/TensorArrayStack/range, input_fn/decode/map/while/Exit_1/_479)]]
Fixed. I followed the suggestion from tensorflow map_fn TensorArray has inconsistent shapes and implemented the following:
with tf.name_scope("decode", values=[encoded_video]):
input_jpeg_strings = tf.TensorArray(tf.string, video_length)
input_jpeg_strings = input_jpeg_strings.unstack(encoded_video)
init_array = tf.TensorArray(tf.float32, size=video_length)
def cond(i, ta):
return tf.less(i, video_length)
def body(i, ta):
image = input_jpeg_strings.read(i)
image = tf.image.decode_jpeg(image, 3, name='decode_image')
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
assert (resize_height > 0) == (resize_width > 0)
image = tf.image.resize_images(image, size=[resize_height, resize_width], method=tf.image.ResizeMethod.BILINEAR)
return i + 1, ta.write(i, image)
_, input_image = tf.while_loop(cond, body, [0, init_array])