I'm training a multilabel classifier with 14 classes using tf.keras and Horovod. ROC AUC is the metric used to evaluate the classifier. I want to use scikit-learn's roc_auc_score, as mentioned here: How to compute Receiving Operating Characteristic (ROC) and AUC in keras?. If I pass the tensors as they are to the following function:
def sci_auc_roc(y_true, y_pred):
    return tf.py_func(roc_auc_score(y_true, y_pred), tf.double)
I get an error that looks like this:
/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/keras_applications/resnet50.py:265: UserWarning: The output shape of `ResNet50(include_top=False)` has been changed since Keras 2.2.0.
warnings.warn('The output shape of `ResNet50(include_top=False)` '
Traceback (most recent call last):
File "official_resnet_tf_1.12.0_auc.py", line 531, in <module>
main()
File "official_resnet_tf_1.12.0_auc.py", line 420, in main
model = chexnet_model(FLAGS)
File "official_resnet_tf_1.12.0_auc.py", line 375, in chexnet_model
metrics=[tf_auc_roc,sci_auc_roc])
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/base.py", line 474, in _method_wrapper
method(self, *args, **kwargs)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 648, in compile
sample_weights=self.sample_weights)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 313, in _handle_metrics
output, output_mask))
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 270, in _handle_per_output_metrics
y_true, y_pred, weights=weights, mask=mask)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 598, in weighted
score_array = fn(y_true, y_pred)
File "official_resnet_tf_1.12.0_auc.py", line 327, in sci_auc_roc
return tf.py_func(roc_auc_score(y_true, y_pred), tf.double)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/sklearn/metrics/ranking.py", line 349, in roc_auc_score
y_type = type_of_target(y_true)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/sklearn/utils/multiclass.py", line 243, in type_of_target
'got %r' % y)
ValueError: Expected array-like (array or non-string sequence), got <tf.Tensor 'dense_target:0' shape=(?, ?) dtype=float32>
I then tried converting the tf tensors into numpy arrays and feeding those to roc_auc_score, like so:
def sci_auc_roc(y_true, y_pred):
    with tf.Session() as sess:
        y_true, y_pred = sess.run([y_true, y_pred])
        return tf.py_func(roc_auc_score(y_true, y_pred), tf.double)
I get the following error:
warnings.warn('The output shape of `ResNet50(include_top=False)` '
Traceback (most recent call last):
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,256,256,3]
[[{{node input_1}} = Placeholder[dtype=DT_FLOAT, shape=[?,256,256,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node dense_target/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2237_dense_target", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "official_resnet_tf_1.12.0_auc.py", line 531, in <module>
main()
File "official_resnet_tf_1.12.0_auc.py", line 420, in main
model = chexnet_model(FLAGS)
File "official_resnet_tf_1.12.0_auc.py", line 375, in chexnet_model
metrics=[tf_auc_roc,sci_auc_roc])
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/training/checkpointable/base.py", line 474, in _method_wrapper
method(self, *args, **kwargs)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 648, in compile
sample_weights=self.sample_weights)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 313, in _handle_metrics
output, output_mask))
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 270, in _handle_per_output_metrics
y_true, y_pred, weights=weights, mask=mask)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 598, in weighted
score_array = fn(y_true, y_pred)
File "official_resnet_tf_1.12.0_auc.py", line 324, in sci_auc_roc
y_true, y_pred = sess.run([y_true, y_pred])
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,256,256,3]
[[node input_1 (defined at /mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/keras_applications/resnet50.py:214) = Placeholder[dtype=DT_FLOAT, shape=[?,256,256,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node dense_target/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2237_dense_target", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'input_1', defined at:
File "official_resnet_tf_1.12.0_auc.py", line 531, in <module>
main()
File "official_resnet_tf_1.12.0_auc.py", line 420, in main
model = chexnet_model(FLAGS)
File "official_resnet_tf_1.12.0_auc.py", line 339, in chexnet_model
input_shape=(FLAGS.image_size, FLAGS.image_size, 3))
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/applications/__init__.py", line 70, in wrapper
return base_fun(*args, **kwargs)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/applications/resnet50.py", line 32, in ResNet50
return resnet50.ResNet50(*args, **kwargs)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/keras_applications/resnet50.py", line 214, in ResNet50
img_input = layers.Input(shape=input_shape)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_layer.py", line 229, in Input
input_tensor=tensor)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_layer.py", line 112, in __init__
name=self.name)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1747, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5206, in placeholder
"Placeholder", dtype=dtype, shape=shape, name=name)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1' with dtype float and shape [?,256,256,3]
[[node input_1 (defined at /mnt/lustrefs/rakvee/miniconda3/envs/docker_pip2/lib/python3.6/site-packages/keras_applications/resnet50.py:214) = Placeholder[dtype=DT_FLOAT, shape=[?,256,256,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node dense_target/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2237_dense_target", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[52342,1],0]
Exit code: 1
--------------------------------------------------------------------------
I've also tried TensorFlow's tf.metrics.auc (https://www.tensorflow.org/api_docs/python/tf/metrics/auc) like so:
def tf_auc_roc(y_true, y_pred):
    auc = tf.metrics.auc(y_true, y_pred)[1]
    K.get_session().run(tf.local_variables_initializer())
    return auc
It works just fine. However, it gives me a single number for the ROC AUC. What does that number represent: the average ROC AUC over all 14 classes, the maximum over the classes, or something else?
1216/1216 [==============================] - 413s 340ms/step - loss: 0.1513 - tf_auc_roc: 0.7944 - val_loss: 0.2212 - val_tf_auc_roc: 0.8074
Epoch 2/15
582/1216 [=============>................] - ETA: 3:16 - loss: 0.1459 - tf_auc_roc: 0.8053
1) How do I fix the error with roc_auc_score?
2) What does that single number represent?
I think that the result of a metric should be a single tensor value that represents the average of the results, as described here in the Keras documentation (which I find better than TensorFlow's documentation).
You could instead use a custom callback to get the result you want. Most probably you would want to write the result to disk in on_epoch_end.
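To make the difference between "one number" and "one number per class" concrete, here is a minimal sketch in plain Python, not the scikit-learn or TensorFlow implementation. binary_auc and per_class_auc are hypothetical helper names, and the tiny two-class arrays are made up for illustration. It computes a ROC AUC per label column, their unweighted mean, and the AUC obtained when all columns are pooled into one binary problem. As far as I can tell, tf.metrics.auc does not treat the label columns separately, so its single value should be closer in spirit to the pooled number than to a per-class average, but please verify that against your TensorFlow version.

```python
def binary_auc(labels, scores):
    """ROC AUC for one label: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(labels, scores) if t]
    neg = [s for t, s in zip(labels, scores) if not t]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_auc(y_true, y_pred):
    """One ROC AUC per label column of a (samples x classes) multilabel problem."""
    n_classes = len(y_true[0])
    return [
        binary_auc([row[k] for row in y_true], [row[k] for row in y_pred])
        for k in range(n_classes)
    ]


# Made-up example: 4 samples, 2 labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.2, 0.7]]

per_class = per_class_auc(y_true, y_pred)
macro = sum(per_class) / len(per_class)
# Pool every (label, score) pair into a single binary problem.
pooled = binary_auc(
    [t for row in y_true for t in row],
    [s for row in y_pred for s in row],
)
print(per_class, macro, pooled)
```

Inside a callback, the same helpers could be called in on_epoch_end on the validation labels and on self.model.predict(...) for the validation inputs (converted with .tolist() if you want plain lists). The pairwise loop is quadratic in the number of samples, so for a large validation set roc_auc_score(y_true, y_pred, average=None) would be the more practical choice.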
I have a Seq2Seq model and would like to print the value of the encoder's output matrix at every iteration.
For example, if the encoder output has dimension (?, 20), training runs for 5 epochs, and each epoch has 10 iterations,
I would like to see 10 matrices of dimension (?, 20) per epoch.
I have followed several links, such as the one here, but the matrix values are still not printed.
With this code, as mentioned in the link above:
import keras.backend as K
k_value = K.print_tensor(encoded)
print(k_value)
I got:
Tensor("Print:0", shape=(?, 20), dtype=float32)
Is there any straightforward way of showing the tensor value of each layer in Keras?
Update 1
Trying k_value = K.eval(encoded) raises this error:
Traceback (most recent call last):
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input' with dtype float and shape [?,45,50]
[[Node: input = Placeholder[dtype=DT_FLOAT, shape=[?,45,50], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[Node: encoder_lstm/add_16/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_460_encoder_lstm/add_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sgnbx/Downloads/projects/LSTM_autoencoder/justfun.py", line 121, in <module>
k_value = K.eval(encoded)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 671, in eval
return to_dense(x).eval(session=get_session())
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 680, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 4951, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input' with dtype float and shape [?,45,50]
[[Node: input = Placeholder[dtype=DT_FLOAT, shape=[?,45,50], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[Node: encoder_lstm/add_16/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_460_encoder_lstm/add_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'input', defined at:
File "/home/sgnbx/Downloads/projects/LSTM_autoencoder/justfun.py", line 113, in <module>
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/keras/engine/input_layer.py", line 177, in Input
input_tensor=tensor)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/keras/engine/input_layer.py", line 86, in __init__
name=self.name)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 515, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1735, in placeholder
return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4925, in placeholder
"Placeholder", dtype=dtype, shape=shape, name=name)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input' with dtype float and shape [?,45,50]
[[Node: input = Placeholder[dtype=DT_FLOAT, shape=[?,45,50], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[Node: encoder_lstm/add_16/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_460_encoder_lstm/add_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7fd900525c50>>
Traceback (most recent call last):
File "/home/sgnbx/anaconda3/envs/py3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 686, in __del__
TypeError: 'NoneType' object is not callable
Process finished with exit code 1
A very simple way to print a tensor (this only works when the tensor does not depend on a placeholder that still has to be fed):
from keras import backend as K
k_value = K.eval(tensor)
print(k_value)
UPDATE 1
Create a callback to print at the end of each epoch:
from keras import backend as K
from keras.callbacks import Callback

class callback(Callback):
    def __init__(self, model, X_train, layer_index=-1):
        super().__init__()
        self.model = model
        self.x = X_train
        self.layer_index = layer_index  # index of the layer whose output is printed

    def on_epoch_end(self, epoch, logs=None):
        inp = self.model.input  # input placeholder
        outputs = self.model.layers[self.layer_index].output  # output of the chosen layer
        # Backend function mapping the input (plus learning phase) to that output
        functors = K.function([inp, K.learning_phase()], [outputs])
        layer_outs = functors([self.x, 1.])  # 1. = training mode, 0. = test mode
        print('\r OUTPUT TENSOR : %s' % layer_outs)

    # Other hooks such as on_batch_end(self, batch, logs=None) can be overridden the same way.
Pass the callback to your fit() call like this:
callbacks=[callback(model=model, X_train=X_train)]
Inspired by "Keras, How to get the output of each layer?"
Hope this finally helps!
I am trying to train an SSD Lite + MobileNetV2 model using model_main.py in tensorflow/models/research/object_detection, but I'm getting the following error:
Assign requires shapes of both tensors to match. lhs shape= [1,1,256,256] rhs shape= [1,1,1280,256] [[Node: save/Assign_348 = Assign[T=DT_FLOAT, _class=["loc:@FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights, save/RestoreV2:348)]]
The full log is given here:
python E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py --alsologtostderr --pipeline_config_path=experiments/training_/ssdlite_mobilenet_v2_coco.config --model_dir=experiments/training_/ --num_train_steps=50000 --NUM_EVAL_STEPS=2000
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W1123 09:11:18.686478 7432 tf_logging.py:125] Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
W1123 09:11:18.687448 7432 tf_logging.py:125] Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x000001E641D31268>) includes params argument, but params are not passed to Estimator.
W1123 09:11:18.688472 7432 tf_logging.py:125] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x000001E641D31268>) includes params argument, but params are not passed to Estimator.
2018-11-23 09:11:23.084879: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-11-23 09:11:23.365711: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2018-11-23 09:11:23.372771: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1471] Adding visible gpu devices: 0
2018-11-23 09:11:24.001841: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-23 09:11:24.004967: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:958] 0
2018-11-23 09:11:24.007058: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: N
2018-11-23 09:11:24.009175: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4741 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1322, in _do_call
return fn(*args)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,256] rhs shape= [1,1,1280,256]
[[Node: save/Assign_348 = Assign[T=DT_FLOAT, _class=["loc:@FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights, save/RestoreV2:348)]]
[[Node: save/RestoreV2/_599 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_728_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py", line 109, in <module>
tf.app.run()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py", line 105, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 447, in train_and_evaluate
return executor.run()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 531, in run
return self.run_local()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 681, in run_local
eval_result, export_results = evaluator.evaluate_and_export()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 886, in evaluate_and_export
hooks=self._eval_spec.hooks)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\estimator.py", line 460, in evaluate
output_dir=self.eval_dir(name))
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1386, in _evaluate_run
config=self._session_config)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\evaluation.py", line 209, in _evaluate_once
session_creator=session_creator, hooks=hooks) as session:
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 826, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 549, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1012, in __init__
_WrappedSession.__init__(self, self._create_session())
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1017, in _create_session
return self._sess_creator.create_session()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 706, in create_session
self.tf_sess = self._session_creator.create_session()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 477, in create_session
init_fn=self._scaffold.init_fn)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\session_manager.py", line 281, in prepare_session
config=config)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\session_manager.py", line 195, in _restore_checkpoint
saver.restore(sess, checkpoint_filename_with_path)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 1752, in restore
{self.saver_def.filename_tensor_name: save_path})
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1316, in _do_run
run_metadata)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\client\session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,256] rhs shape= [1,1,1280,256]
[[Node: save/Assign_348 = Assign[T=DT_FLOAT, _class=["loc:@FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights, save/RestoreV2:348)]]
[[Node: save/RestoreV2/_599 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_728_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'save/Assign_348', defined at:
File "E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py", line 109, in <module>
tf.app.run()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "E:\Documents\Projects\tensorflow\models\research\object_detection\model_main.py", line 105, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 447, in train_and_evaluate
return executor.run()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 531, in run
return self.run_local()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 681, in run_local
eval_result, export_results = evaluator.evaluate_and_export()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\training.py", line 886, in evaluate_and_export
hooks=self._eval_spec.hooks)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\estimator.py", line 460, in evaluate
output_dir=self.eval_dir(name))
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1386, in _evaluate_run
config=self._session_config)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\evaluation.py", line 209, in _evaluate_once
session_creator=session_creator, hooks=hooks) as session:
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 826, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 549, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1012, in __init__
_WrappedSession.__init__(self, self._create_session())
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1017, in _create_session
return self._sess_creator.create_session()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 706, in create_session
self.tf_sess = self._session_creator.create_session()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 468, in create_session
self._scaffold.finalize()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\monitored_session.py", line 212, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 856, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 1284, in __init__
self.build()
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 1296, in build
self._build(self._filename, build_save=True, build_restore=True)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 1333, in _build
build_save=build_save, build_restore=build_restore)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 775, in _build_internal
restore_sequentially, reshape)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 453, in _AddShardedRestoreOps
name="restore_shard"))
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 422, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\training\saver.py", line 113, in restore
self.op.get_shape().is_fully_defined())
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\ops\state_ops.py", line 219, in assign
validate_shape=validate_shape)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 63, in assign
use_locking=use_locking, name=name)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\framework\ops.py", line 3414, in create_op
op_def=op_def)
File "D:\ProgramData\Anaconda3\envs\tfod\lib\site-packages\tensorflow\python\framework\ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,256,256] rhs shape= [1,1,1280,256]
[[Node: save/Assign_348 = Assign[T=DT_FLOAT, _class=["loc:@FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights, save/RestoreV2:348)]]
[[Node: save/RestoreV2/_599 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_728_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
My config can be found here
Also note that I originally posted this on Tensorflow/models/issues, but the team suggested that I post it here on SO.
The error disappeared after I upgraded to TF 1.12 and cuDNN 7.3 for CUDA 9.
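When an "Assign requires shapes of both tensors to match" error appears, it can help to list every variable whose shape differs between the graph (the lhs of the Assign) and the checkpoint (the rhs). The following is only a minimal sketch: find_shape_mismatches is a hypothetical helper, and the two dictionaries are filled by hand from the error message above. In TF 1.x they could presumably be built from tf.global_variables() and tf.train.list_variables(checkpoint_path), but that part is an assumption and is not shown.

```python
def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return {name: (graph_shape, checkpoint_shape)} for variables whose shapes differ.

    Hypothetical helper for illustration; both arguments map variable names to shapes.
    """
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Variables absent from the checkpoint are skipped; only real conflicts are reported.
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


# Shapes copied from the error message: lhs is the graph variable,
# rhs is the tensor restored from the checkpoint.
name = "FeatureExtractor/MobilenetV2/layer_19_1_Conv2d_2_1x1_256/weights"
graph_shapes = {name: [1, 1, 256, 256]}
checkpoint_shapes = {name: [1, 1, 1280, 256]}

mismatches = find_shape_mismatches(graph_shapes, checkpoint_shapes)
for var_name, (in_graph, in_ckpt) in mismatches.items():
    print(var_name, "graph:", in_graph, "checkpoint:", in_ckpt)
```

A mismatch in the input-channel dimension (256 vs 1280 here) suggests that the graph and the checkpoint were built from different feature-extractor layouts.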
This is the command I used; customise it according to the parameters you used in training.
python deeplab/export_model.py --checkpoint_path=/code/models/research/deeplab/weights_input_level_17/model.ckpt-22000 --export_path=/code/models/research/deeplab/frozen_weights_level_17/frozen_inference_graph.pb --model_variant="xception_65" --atrous_rates=6 --atrous_rates=12 --atrous_rates=18 --output_stride=16 --crop_size=2048 --crop_size=2048 --num_classes=3
I trained my model to segment bridges and followed the PASCAL dataset format. Ideally I would have only one class, but there are two default classes, so the classes are 1. background, 2. ignore, and 3. bridge, which brings the total (--num_classes) to 3.
Please use this configuration to export your deeplabv3plus model. It worked for me.
Your error is caused by the dimensions of your image; make them equal to the ones you used for training.
I am trying to use multiple GPUs with Keras. When I run the example program given in the comments of training_utils.py, I get a ResourceExhaustedError. nvidia-smi shows that barely one of the four GPUs is doing any work. Using a single GPU works fine for other programs.
TensorFlow 1.3.0
Keras 2.0.8
Ubuntu 16.04
CUDA/cuDNN 8.0/6.0
Question: Does anyone have any idea what's going on here?
Console output:
(...)
2017-10-26 14:39:02.086838: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ***************************************************************************************************x
2017-10-26 14:39:02.086857: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[128,55,55,256]
Traceback (most recent call last):
File "test.py", line 27, in <module>
parallel_model.fit(x, y, epochs=20, batch_size=256)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/engine/training.py", line 1631, in fit
validation_steps=validation_steps)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/engine/training.py", line 1213, in _fit_loop
outs = f(ins_batch)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2331, in __call__
**self.session_kwargs)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[128,55,55,256]
[[Node: replica_1/xception/block3_sepconv2/separable_conv2d = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](replica_1/xception/block3_sepconv2/separable_conv2d/depthwise, block3_sepconv2/pointwise_kernel/read/_2103)]]
[[Node: training/RMSprop/gradients/replica_0/xception/block10_sepconv2/separable_conv2d_grad/Conv2DBackpropFilter/_4511 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_25380_training/RMSprop/gradients/replica_0/xception/block10_sepconv2/separable_conv2d_grad/Conv2DBackpropFilter", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'replica_1/xception/block3_sepconv2/separable_conv2d', defined at:
File "test.py", line 19, in <module>
parallel_model = multi_gpu_model(model, gpus=2)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/utils/training_utils.py", line 143, in multi_gpu_model
outputs = model(inputs)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 603, in __call__
output = self.call(inputs, **kwargs)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2061, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2212, in run_internal_graph
output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 1221, in call
dilation_rate=self.dilation_rate)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3279, in separable_conv2d
data_format=tf_data_format)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.py", line 497, in separable_conv2d
name=name)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 397, in conv2d
data_format=data_format, name=name)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/kyb/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[128,55,55,256]
[[Node: replica_1/xception/block3_sepconv2/separable_conv2d = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:1"](replica_1/xception/block3_sepconv2/separable_conv2d/depthwise, block3_sepconv2/pointwise_kernel/read/_2103)]]
[[Node: training/RMSprop/gradients/replica_0/xception/block10_sepconv2/separable_conv2d_grad/Conv2DBackpropFilter/_4511 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_25380_training/RMSprop/gradients/replica_0/xception/block10_sepconv2/separable_conv2d_grad/Conv2DBackpropFilter", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
EDIT (Added example code):
import numpy as np
import tensorflow as tf
from keras.applications import Xception
from keras.utils import multi_gpu_model

num_samples = 1000
height = 224
width = 224
num_classes = 100

# Instantiate the base model on the CPU so its weights are hosted in CPU memory.
with tf.device('/cpu:0'):
    model = Xception(weights=None,
                     input_shape=(height, width, 3),
                     classes=num_classes)

# Replicate the model on 4 GPUs; each batch is split into sub-batches, one per GPU.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# Generate dummy data.
x = np.random.random((num_samples, height, width, 3))
y = np.random.random((num_samples, num_classes))

# batch_size is the global batch size across all GPUs.
parallel_model.fit(x, y, epochs=20, batch_size=128)
When you hit an OOM/ResourceExhaustedError on the GPU, reducing the batch size is the first thing to try.
The batch size that fits depends on how much memory each GPU has, so different GPUs may need different batch sizes.
I recently faced a similar problem and ran a lot of different experiments to work around it.
Here is the link to the question (also some tricks are included).
Note, however, that training may get slower as you reduce the batch size.
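To make the batch-size advice concrete, here is a rough back-of-the-envelope sketch, not a measurement. It assumes that multi_gpu_model splits each batch evenly across the GPUs and that activations are float32 (4 bytes per element); per_replica_batch and tensor_bytes are hypothetical helpers. With batch_size=256 and gpus=2, as in the traceback, each replica gets 128 samples, which matches the leading dimension of the tensor in the OOM message.

```python
def per_replica_batch(global_batch_size, num_gpus):
    """Sub-batch size each replica receives, assuming an even split."""
    return global_batch_size // num_gpus


def tensor_bytes(shape, bytes_per_element=4):
    """Memory for one dense tensor; 4 bytes per element assumes float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Values from the traceback: fit(..., batch_size=256) with multi_gpu_model(model, gpus=2).
sub_batch = per_replica_batch(256, 2)

# Activation shape reported in the OOM message: [128, 55, 55, 256].
activation_mib = tensor_bytes([sub_batch, 55, 55, 256]) / 2**20

# Halving the batch size halves the memory for this activation.
halved_mib = tensor_bytes([sub_batch // 2, 55, 55, 256]) / 2**20

print(sub_batch, activation_mib, halved_mib)
```

This covers a single activation only. A full training step keeps many such tensors alive for the backward pass, so the real footprint is much larger, but it scales with the batch size in the same way.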
I get the unexpected error "You must feed a value for placeholder tensor 'input_1' with dtype float" when training the discriminator of a GAN.
Here is the error:
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: You must feed a value for placeholder tensor 'input_1' with dtype float
[[Node: input_1 = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: You must feed a value for placeholder tensor 'input_1' with dtype float
[[Node: input_1 = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "new_model.py", line 204, in <module>
main()
File "new_model.py", line 201, in main
train(nb_epoch=10, BATCH_SIZE=5)
File "new_model.py", line 176, in train
d_loss = discriminator.train_on_batch(image_to_dis, label_to_dis)
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 766, in train_on_batch
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1320, in train_on_batch
outputs = self.train_function(ins)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1943, in __call__
feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'input_1' with dtype float
[[Node: input_1 = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: moments_4/sufficient_statistics/Shape/_217 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1267_moments_4/sufficient_statistics/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'input_1', defined at:
File "new_model.py", line 204, in <module>
main()
File "new_model.py", line 201, in main
train(nb_epoch=10, BATCH_SIZE=5)
File "new_model.py", line 134, in train
transformer0 = transform_model()
File "new_model.py", line 22, in transform_model
inputs = Input(shape=( 128, 128, 3))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1198, in Input
input_tensor=tensor)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1116, in __init__
name=self.name)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 321, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1587, in placeholder
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2043, in _placeholder
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'input_1' with dtype float
[[Node: input_1 = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: moments_4/sufficient_statistics/Shape/_217 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_1267_moments_4/sufficient_statistics/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
It seems the error happens at
d_loss = discriminator.train_on_batch(image_to_dis, label_to_dis)
I'm sure image_to_dis and label_to_dis fit the input of the discriminator.
However, the error message here
Caused by op u'input_1', defined at:
File "new_model.py", line 204, in <module>
main()
File "new_model.py", line 201, in main
train(nb_epoch=10, BATCH_SIZE=5)
File "new_model.py", line 134, in train
transformer0 = transform_model()
File "new_model.py", line 22, in transform_model
inputs = Input(shape=( 128, 128, 3))
says the error is caused by the input tensor of 'transformer' (the generator in this GAN).
My code contains something like 'transformer_with_discriminator = discriminator(transformer)', but the discriminator is compiled without the transformer, so I think training the discriminator should have nothing to do with the input of 'transformer0'.
The whole script is a little long, so here is a link to my model:
https://github.com/wkcw/keras-face-attribute/blob/master/model%26train.py
image_to_dis.dtype and label_to_dis.dtype are both float32, and I have also tried converting label_to_dis to int.
I really have no idea what is going on.
The error comes from the BatchNormalization layers. See https://stackoverflow.com/a/42470757/7137636 for how to fix this issue.
If you need more info, ask in the comments :)