ValueError: An operation has `None` for gradient. while implementing custom loss function in Keras - tensorflow

I'm trying to implement the following custom loss function from this SO post; however, I've had to make some minor changes to suit my model. For some context, I'm using multi labels with 5 classes (below is an example of how they're encoded).
0 => [1, 0, 0, 0, 0]
1 => [1, 1, 0, 0, 0]
2 => [1, 1, 1, 0, 0]
3 => [1, 1, 1, 1, 0]
4 => [1, 1, 1, 1, 1]
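The encoding above is cumulative: class k is written as k + 1 ones followed by zeros, so a label can be decoded by counting its ones and subtracting one. That is what the K.sum(..., axis=1) minus 1 step in the loss function relies on. A minimal framework-free sketch (encode_ordinal and decode_ordinal are illustrative names, not Keras functions):

```python
def encode_ordinal(label, num_classes=5):
    """Encode class `label` as label + 1 ones followed by zeros."""
    return [1 if i <= label else 0 for i in range(num_classes)]


def decode_ordinal(vector):
    """Recover the class index by counting the ones and subtracting one."""
    return sum(vector) - 1


for k in range(5):
    print(k, "=>", encode_ordinal(k))
```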
My custom loss function
def _cohen_kappa(y_true, y_pred, num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    kappa, update_op = tf.contrib.metrics.cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    kappa = K.cast(kappa, 'float32')
    with tf.control_dependencies([update_op]):
        kappa = tf.identity(kappa)
    return kappa
def cohen_kappa_loss(num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    def cohen_kappa(y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(y_pred + 0.5, 'int32')
        y_true = tf.subtract(K.sum(y_true, axis=1), tf.constant(1))
        y_pred = tf.subtract(K.sum(y_pred, axis=1), tf.constant(1))
        return -_cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    return cohen_kappa
This is how I'm attempting to use my loss function:
model_cohen_kappa = cohen_kappa_loss(num_classes=5)
optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
Unfortunately, I get the following error, which is confusing because my loss function doesn't contain K.argmax, K.round, or K.eval, the operations the error message lists as non-differentiable. Is there another non-differentiable operation in my custom loss function that I'm overlooking?
Traceback (most recent call last):
File "", line 106, in <module>
File "", line 101, in main
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\", line 1418, in fit_generator
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\", line 40, in fit_generator
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\", line 509, in _make_train_function
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\", line 184, in get_updates
grads = self.get_gradients(loss, params)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\", line 91, in get_gradients
raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
While I suspect K.cast is non-differentiable, removing the below snippet from my loss function results in the following error:
kappa = K.cast(kappa, 'float32')
Traceback (most recent call last):
File "", line 106, in <module>
File "", line 91, in main
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\", line 342, in compile
sample_weight, mask)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\", line 421, in weighted
score_array *= weights
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\", line 1180, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\", line 6879, in mul
"Mul", x=x, y=y, name=name)
File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\framework\", line 563, in _apply_op_helper
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'.
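One likely culprit, offered as a hypothesis rather than a confirmed diagnosis: the loss rounds predictions with K.cast(y_pred + 0.5, 'int32'), and an integer cast is a step function, so it carries no useful gradient even though the error message does not name it. The kappa metric is likewise computed from integer class labels. The sketch below uses plain Python and a finite difference (no TensorFlow) to show that the rounded value does not respond to a small change in the prediction, while a smooth function does. hard_round and numeric_grad are hypothetical helper names.

```python
def hard_round(p):
    """Mimic K.cast(p + 0.5, 'int32') for a non-negative p: a step function."""
    return int(p + 0.5)


def numeric_grad(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


p = 0.3
print(numeric_grad(hard_round, p))       # 0.0: the step is flat around p
print(numeric_grad(lambda v: v * v, p))  # close to 2 * p = 0.6
```

If this is the cause, the loss would need a differentiable surrogate that works on the raw float predictions instead of the rounded integer labels.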


keras tape.gradient error: Input to reshape is a tensor with 1012 values, but the requested shape has 20240 [Op:Reshape]

I use tape.gradient(g_loss, aa_mutator.trainable_variables) to calculate the gradients of a model called aa_mutator, and I get this error:
File "/home/tialan/Data/gan/code/", line 297, in <module>
grads_g = tape.gradient(g_loss, aa_mutator.trainable_variables)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/", line 1086, in gradient
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/", line 77, in imperative_grad
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/eager/", line 162, in _gradient_function
return grad_fn(mock_op, *out_grads)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/", line 782, in _ReshapeGrad
_IndexedSlicesToTensorNoWarning(grad), array_ops.shape(op.inputs[0])),
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/util/", line 201, in wrapper
return target(*args, **kwargs)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/", line 195, in reshape
result = gen_array_ops.reshape(tensor, shape, name)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/ops/", line 8368, in reshape
_ops.raise_from_not_ok_status(e, name)
File "/home/tialan/tf/lib/python3.7/site-packages/tensorflow/python/framework/", line 6862, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 1012 values, but the requested shape has 20240 [Op:Reshape]
Within the aa_mutator model I built a custom Keras layer:
class mutate_func_layer(layers.Layer):
    def __init__(self):
        super(mutate_func_layer, self).__init__()
    def call(self, inputs):
        Mut_pos_layer_out, input_pre, Mutation_3 = inputs
        where = where_func(Mut_pos_layer_out)
        return mutate_func(Mutation_3, where, input_pre)
with mutate_func defined as:
def mutate_func(x, where, input_pre):  # called as (Mutation_3, where, input_pre); x = Mutation_3
    aa_aft = gather_nd_func(x, where)
    aa_aft = K.argmax(aa_aft, axis=-1)
    aa_aft = tf.reshape(aa_aft, [-1])
    aa_aft = tf.cast(aa_aft, dtype=tf.float32)
    aa_seq_out = tf.tensor_scatter_nd_update(input_pre, [where], [aa_aft])
    def grad(upstream):
        return upstream*1, upstream*1, upstream*1
The shapes of the tensors in mutate_func are printed as:
(1, 1012, 20)
(675, 20)
(1, 1012)
The model is able to predict given the input; the error only appears during fitting, at the tape.gradient stage. Is the error caused by the custom layer? Thanks for any help or suggestions.
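One reading of the numbers in the error, offered as a hypothesis: 1012 is the element count of a (1, 1012) tensor, and 20240 = 1 * 1012 * 20 is the element count of a (1, 1012, 20) tensor. If mutate_func is wrapped with tf.custom_gradient (an assumption, since the decorator is not shown), each value returned by grad should have the shape of the corresponding input, yet grad returns upstream, which is shaped like the output, for all three inputs. The sketch below only does the shape bookkeeping in plain Python. The mapping of the printed shapes to Mutation_3 and input_pre, and the helper name mismatched_grads, are assumptions; the shape of where is not given, so it is left out.

```python
from math import prod

# Assumed mapping of the printed shapes to two of the inputs of mutate_func.
input_shapes = {
    "Mutation_3": (1, 1012, 20),
    "input_pre": (1, 1012),
}
output_shape = (1, 1012)  # aa_seq_out is input_pre with some entries replaced


def mismatched_grads(input_shapes, returned_shape):
    """Map each input whose size differs from its returned gradient to (got, expected)."""
    got = prod(returned_shape)
    return {
        name: (got, prod(shape))
        for name, shape in input_shapes.items()
        if prod(shape) != got
    }


# grad() returns `upstream * 1` for every input, so every gradient has output_shape.
for name, (got, expected) in mismatched_grads(input_shapes, output_shape).items():
    print(f"{name}: gradient has {got} values, input has {expected}")
```

Under these assumptions, only the gradient returned for Mutation_3 has the wrong size, and the two counts match those in the error message.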

How can I integrate Optuna with Deepspeech training?

I'm trying to integrate Optuna with DeepSpeech in order to optimise some of its hyperparameters. I'm sticking to learning rate for now, just to get a feel for how Optuna works, but I've hit a roadblock and need some help.
I have a function hps_train which is what does the training step. It takes the Optuna trial object as the argument and returns the dev loss, which is what I want to use Optuna to minimise. This is the exact same function as train() in training/deepspeech_training/, but with a few modifications:
def hps_train(trial):
    #.Same as train() in
    if FLAGS.horovod:
        # Effective batch size in synchronous distributed training is scaled by the number of workers. An increase in learning rate compensates for the increased batch size.
        optimizer = hps_create_optimizer(learning_rate_var * hvd.size())
        optimizer = hvd.DistributedOptimizer(optimizer)
    optimizer, learning_rate_var = hps_create_optimizer(trial)
    reduce_learning_rate_op = learning_rate_var.assign(
        tf.multiply(learning_rate_var, FLAGS.plateau_reduction)
    )
    #.Same as train()
    with tfv1.Session(config=Config.session_config) as session:
        #.Same as train()
        final_dev_loss = dev_losses[-1]
    log_debug("Session closed.")
    return final_dev_loss
I also have some helper functions:
def hps_create_optimizer(trial):
    learning_rate = trial.suggest_float("adam_lr", 1e-5, 1e-1, log=True)
    with tf.variable_scope("learning_rate", reuse=tf.AUTO_REUSE):
        learning_rate_var = tfv1.get_variable(
            "learning_rate", initializer=learning_rate, trainable=False
        )
    optimizer = tfv1.train.AdamOptimizer(
        learning_rate=learning_rate_var, beta1=0.9, beta2=0.999, epsilon=1e-08
    )
    return optimizer, learning_rate_var
def new_trial_callback(study, trial):
    chkpt_path = setup_dirs(study.study_name, trial.number + 1)
    FLAGS.checkpoint_dir = chkpt_path
    FLAGS.save_checkpoint_dir = chkpt_path
    FLAGS.load_checkpoint_dir = chkpt_path
def objective(trial):
    if FLAGS.train_files:
        val_loss = hps_train(trial)
    return float(val_loss)
def objective_tf(trial):
    with tfv1.Graph().as_default():
        return objective(trial)
Putting it all together:
def main(_):
    lr_study = optuna.create_study(study_name="lr_study", direction='minimize')
    chkpt_dir = setup_dirs(lr_study.study_name, 0)
    FLAGS.checkpoint_dir = chkpt_dir
    FLAGS.save_checkpoint_dir = chkpt_dir
    FLAGS.load_checkpoint_dir = chkpt_dir
    lr_study.optimize(objective_tf, n_trials=25, callbacks=[new_trial_callback])
When I run this code, the first run completes normally. However, when it tries to start the second one, I get an error:
$ python training/ --train_files ~/datasets/cv-corpus-1/en/clips/train.csv --dev_files ~/datasets/cv-corpus-1/en/clips/dev.csv --test_files ~/datasets/cv-corpus-1/en/clips/test.csv --train_batch_size 64 --test_batch_size 64 --dev_batch_size 64 --n_hidden 512 --epochs 1 --train_cudnn --use_allow_growth --checkpoint_dir checkpoints
[I 2021-08-30 15:06:16,637] A new study created in memory with name: lr_study
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:17 | Steps: 187 | Loss: 252.374135
Epoch 0 | Validation | Elapsed Time: 0:00:12 | Steps: 109 | Loss: 255.176724 | Dataset: /home/user/datasets/cv-corpus-1/en/clips/dev.csv
I Saved new best validating model with loss 255.176724 to: checkpoints/optuna_trials/lr_study/0/best_dev-187
I FINISHED optimization in 0:00:30.553797
[I 2021-08-30 15:06:50,101] Trial 0 finished with value: 255.1767243551552 and parameters: {'adam_lr': 0.006636434104761772}. Best is trial 0 with value: 255.1767243551552.
[W 2021-08-30 15:06:50,229] Trial 1 failed because of the following error: ValueError('in converted code:\n relative to /usr/local/lib/python3.6/dist-packages/tensorflow_core:\n\n contrib/cudnn_rnn/python/layers/ call\n training)\n contrib/cudnn_rnn/python/layers/ _forward\n seed=self._seed)\n contrib/cudnn_rnn/python/ops/ _cudnn_rnn\n outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)\n python/ops/ cudnn_rnnv3\n time_major=time_major, name=name)\n python/framework/ _apply_op_helper\n g = ops._get_graph_from_inputs(_Flatten(keywords.values()))\n python/framework/ _get_graph_from_inputs\n _assert_same_graph(original_graph_element, graph_element)\n python/framework/ _assert_same_graph\n (item, original_item))\n\n ValueError: Tensor("cudnn_lstm/opaque_kernel:0", dtype=float32_ref, device=/device:GPU:0) must be from the same graph as Tensor("tower_0/Reshape_2:0", shape=(?, ?, 512), dtype=float32, device=/device:GPU:0).\n',)
Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 213, in _run_trial
value_or_values = func(trial)
File "training/", line 671, in objective_tf
return objective(trial)
File "training/", line 660, in objective
val_loss = hps_train(trial)
File "training/", line 332, in hps_train
iterator, optimizer, dropout_rates
File "/home/user/DeepSpeech/training/deepspeech_training/", line 317, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 244, in calculate_mean_edit_distance_and_loss
logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 195, in create_model
output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 133, in rnn_impl_cudnn_rnn
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/", line 237, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in converted code:
relative to /usr/local/lib/python3.6/dist-packages/tensorflow_core:
contrib/cudnn_rnn/python/layers/ call
contrib/cudnn_rnn/python/layers/ _forward
contrib/cudnn_rnn/python/ops/ _cudnn_rnn
outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
python/ops/ cudnn_rnnv3
time_major=time_major, name=name)
python/framework/ _apply_op_helper
g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
python/framework/ _get_graph_from_inputs
_assert_same_graph(original_graph_element, graph_element)
python/framework/ _assert_same_graph
(item, original_item))
ValueError: Tensor("cudnn_lstm/opaque_kernel:0", dtype=float32_ref, device=/device:GPU:0) must be from the same graph as Tensor("tower_0/Reshape_2:0", shape=(?, ?, 512), dtype=float32, device=/device:GPU:0).
Traceback (most recent call last):
File "training/", line 691, in <module>
File "/usr/local/lib/python3.6/dist-packages/absl/", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/", line 251, in _run_main
File "training/", line 684, in main
lr_study.optimize(objective_tf, n_trials=25, callbacks=[new_trial_callback])
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 409, in optimize
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 76, in _optimize
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 163, in _optimize_sequential
trial = _run_trial(study, func, catch)
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 264, in _run_trial
raise func_err
File "/home/user/.local/lib/python3.6/site-packages/optuna/study/", line 213, in _run_trial
value_or_values = func(trial)
File "training/", line 671, in objective_tf
return objective(trial)
File "training/", line 660, in objective
val_loss = hps_train(trial)
File "training/", line 332, in hps_train
iterator, optimizer, dropout_rates
File "/home/user/DeepSpeech/training/deepspeech_training/", line 317, in get_tower_results
avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 244, in calculate_mean_edit_distance_and_loss
logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 195, in create_model
output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
File "/home/user/DeepSpeech/training/deepspeech_training/", line 133, in rnn_impl_cudnn_rnn
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/", line 548, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/", line 854, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/", line 237, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in converted code:
relative to /usr/local/lib/python3.6/dist-packages/tensorflow_core:
contrib/cudnn_rnn/python/layers/ call
contrib/cudnn_rnn/python/layers/ _forward
contrib/cudnn_rnn/python/ops/ _cudnn_rnn
outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
python/ops/ cudnn_rnnv3
time_major=time_major, name=name)
python/framework/ _apply_op_helper
g = ops._get_graph_from_inputs(_Flatten(keywords.values()))
python/framework/ _get_graph_from_inputs
_assert_same_graph(original_graph_element, graph_element)
python/framework/ _assert_same_graph
(item, original_item))
ValueError: Tensor("cudnn_lstm/opaque_kernel:0", dtype=float32_ref, device=/device:GPU:0) must be from the same graph as Tensor("tower_0/Reshape_2:0", shape=(?, ?, 512), dtype=float32, device=/device:GPU:0).
It looks like the ValueError is complaining that some tensor is not from the same graph as another. But I don't understand how this can be, since I start each run within a new Graph context, so every tensor should be associated with this new graph.
Optuna version is 2.9.1 and Tensorflow version is 1.15.4
I'd be grateful for any insights into where I'm going wrong here, or even if this is the recommended way to use Optuna. Thanks very much!
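A possible explanation, stated as an assumption rather than a verified diagnosis: the traceback passes through rnn_impl_cudnn_rnn, and the tensor named in the error is the cudnn_lstm kernel. If that function caches its CudnnLSTM layer between calls (for example in a function attribute), the cached layer's variables stay tied to the graph of the first trial, and the fresh tfv1.Graph() of the second trial cannot use them. The sketch below reproduces that mechanism in plain Python with stand-in objects (FakeGraph, FakeLayer, build_rnn, and run_trial are all hypothetical) and shows that clearing the cache at the start of each trial avoids the mismatch.

```python
class FakeGraph:
    """Stand-in for a TensorFlow graph."""


class FakeLayer:
    """Stand-in for a layer whose variables belong to exactly one graph."""

    def __init__(self, graph):
        self.graph = graph

    def __call__(self, graph):
        if graph is not self.graph:
            raise ValueError("layer variables must be from the same graph as the input")
        return "output"


def build_rnn(graph):
    """Create the layer on first use and reuse it afterwards (a cached singleton)."""
    if build_rnn.cell is None:
        build_rnn.cell = FakeLayer(graph)
    return build_rnn.cell(graph)


build_rnn.cell = None


def run_trial(reset_cache):
    """Build the model in a fresh graph, optionally clearing the cached layer first."""
    if reset_cache:
        build_rnn.cell = None
    return build_rnn(FakeGraph())


run_trial(reset_cache=False)      # first trial works
try:
    run_trial(reset_cache=False)  # second trial reuses the stale layer
except ValueError as err:
    print("second trial failed:", err)

print(run_trial(reset_cache=True))  # clearing the cache lets a new graph work
```

If DeepSpeech does cache the layer this way, resetting that cache at the top of objective_tf would be the analogous fix; check the source of rnn_impl_cudnn_rnn before relying on this.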

Why TensorFlow throws this exception when loading a model that was normalized like this?

All versions were the latest available at the time of this post:
tensorflow-gpu: 2.6.0
Python: 3.9.7
CUDA: 11.4.2
cuDNN: 8.2.4
As the code below shows, a model built with Normalization() and no arguments throws an exception when it is loaded with load_model(). Before saving and reloading, the same model works without any apparent issue, which suggests everything is fine: Normalization() did not complain and took care of the input shape. A model built with Normalization(input_dim=5) does not throw on load, since a known shape is specified. This is surprising; I would expect at least a warning that calling Normalization() without arguments leads to an exception when the model is loaded.
I'm not sure whether this is a bug, so I'm posting here before reporting it on GitHub; maybe I'm missing some setup.
Here's my code:
import numpy as np
import tensorflow as tf
def main():
    train_data = np.array([[1, 2, 3, 4, 5]])
    train_label = np.array([123])
    # Uncomment this to load the model and comment the next model and normalizer related lines.
    #model = tf.keras.models.load_model('AI/test.h5')
    normalizer = tf.keras.layers.experimental.preprocessing.Normalization()
    model = tf.keras.Sequential([normalizer, tf.keras.layers.Dense(units=1)])
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.1), loss='mean_absolute_error')
    model.fit(train_data, train_label, epochs=3000)
    model.save('AI/test.h5')
    unseen_data = np.array([[1, 2, 3, 4, 6]])
    prediction = model.predict(unseen_data)
if __name__ == "__main__":
    main()
It throws the following exception:
Traceback (most recent call last):
File "E:\Backup\Desktop\", line 30, in <module>
File "E:\Backup\Desktop\", line 11, in main
model = tf.keras.models.load_model('AI/test.h5')
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\", line 200, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\", line 180, in load_model_from_hdf5
model = model_config_lib.model_from_config(model_config,
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\saving\", line 52, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\layers\", line 208, in deserialize
return generic_utils.deserialize_keras_object(
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\", line 674, in deserialize_keras_object
deserialized_obj = cls.from_config(
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 434, in from_config
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\training\tracking\", line 530, in _method_wrapper
result = method(self, *args, **kwargs)
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 217, in add
output_tensor = layer(self.outputs[0])
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 976, in __call__
return self._functional_construction_call(inputs, args, kwargs,
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 1114, in _functional_construction_call
outputs = self._keras_tensor_symbolic_call(
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 848, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 886, in _infer_output_signature
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\engine\", line 2659, in _maybe_build # pylint:disable=not-callable
File "C:\Users\censored\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\layers\preprocessing\", line 145, in build
raise ValueError(
ValueError: All `axis` values to be kept must have known shape. Got axis: (-1,), input shape: [None, None], with unknown axis at index: 1
Process finished with exit code 1
It looks like a bug. The relevant Keras source is:
if 'input_dim' in kwargs and 'input_shape' not in kwargs:
    # Backwards compatibility: alias 'input_dim' to 'input_shape'.
    kwargs['input_shape'] = (kwargs['input_dim'],)
if 'input_shape' in kwargs or 'batch_input_shape' in kwargs:
    # In this case we will later create an input layer
    # to insert before the current layer
    if 'batch_input_shape' in kwargs:
        batch_input_shape = tuple(kwargs['batch_input_shape'])
    elif 'input_shape' in kwargs:
        if 'batch_size' in kwargs:
            batch_size = kwargs['batch_size']
        else:
            batch_size = None
        batch_input_shape = (batch_size,) + tuple(kwargs['input_shape'])
    self._batch_input_shape = batch_input_shape
The error occurs because the Normalization layer receives no shape information, which would lead to self._batch_input_shape = (None, None).
When the model is loaded (deserialized), the build function is called, and it requires a known shape on every kept axis:
# Sorted to avoid transposing axes.
self._keep_axis = sorted([d if d >= 0 else d + ndim for d in self.axis])
# All axes to be kept should have known shape.
for d in self._keep_axis:
    if input_shape[d] is None:
        raise ValueError(
            'All `axis` values to be kept must have known shape. Got axis: {}, '
            'input shape: {}, with unknown axis at index: {}'.format(
                self.axis, input_shape, d))
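The check quoted above can be exercised on its own. The sketch below is a standalone re-implementation for illustration only (check_keep_axes is a hypothetical name, not a Keras function): it normalizes negative axes and rejects any kept axis whose size is unknown, which is what happens with the input shape (None, None) from the traceback but not with (None, 5).

```python
def check_keep_axes(axis, input_shape):
    """Return the sorted non-negative axes, or raise if any has unknown size."""
    ndim = len(input_shape)
    keep_axis = sorted(d if d >= 0 else d + ndim for d in axis)
    for d in keep_axis:
        if input_shape[d] is None:
            raise ValueError(
                "All `axis` values to be kept must have known shape. Got axis: {}, "
                "input shape: {}, with unknown axis at index: {}".format(
                    axis, input_shape, d))
    return keep_axis


print(check_keep_axes((-1,), (None, 5)))  # feature dimension is known
try:
    check_keep_axes((-1,), (None, None))  # feature dimension is unknown
except ValueError as err:
    print(err)
```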

InvalidArgumentError : ConcatOp : Dimensions of inputs should match

I'm using TensorFlow 1.7 with dynamic_rnn. It runs fine at first, but at the 32nd step (the step number changes from run to run) the error appears. With a smaller batch size the code seems to run longer, but the error still pops up. I just can't figure out what's wrong.
from mapping import *
def my_input_fn(features, targets, batch_size=20, shuffle=True, num_epochs=None, sequece_lenth=None):
    ds = tf.data.Dataset.from_tensor_slices(
        (features, targets, sequece_lenth))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)
    if shuffle:
        ds = ds.shuffle(10000)
    features, labels, sequence = ds.make_one_shot_iterator().get_next()
    return features, labels, sequence
def lstm_cell(lstm_size=50):
    return tf.contrib.rnn.BasicLSTMCell(lstm_size)
class RnnModel:
    def __init__(self,
        self.batch_size = batch_size
        self.hidden_units = hidden_units
        stacked_lstm = tf.contrib.rnn.MultiRNNCell(
            [lstm_cell(i) for i in self.hidden_units])
        self.initial_state = stacked_lstm.zero_state(batch_size, tf.float32)
        self.model = stacked_lstm
        self.state = self.initial_state
        self.time_steps = time_steps
        self.num_features = num_features
    def loss_mean_squre(self, outputs, targets):
        pos = tf.add(outputs, tf.ones(self.batch_size))
        eve = tf.div(pos, 2)
        error = tf.subtract(eve,
        return tf.reduce_mean(tf.square(error))
    def train(self,
        periods = 10
        step_per_periods = int(num_steps / periods)
        input, target, sequence = input_fn(inputs, targets, self.batch_size, shuffle=True, sequece_lenth=sequenceLenth)
        initial_state = self.model.zero_state(self.batch_size, tf.float32)
        outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)
        loss = self.loss_mean_squre(tf.reshape(outputs, [self.time_steps, self.batch_size])[-1], target)
        optimizer = tf.train.AdamOptimizer(learning_rate=learningRate)
        grads_and_vars = optimizer.compute_gradients(loss, self.model.variables)
        init_op = tf.global_variables_initializer()
        with tf.Session() as sess:
            for i in range(num_steps):
                state2, current_loss = sess.run([state, loss])
                if i % step_per_periods == 0:
                    print("period " + str(int(i / step_per_periods)) + ":" + str(current_loss))
        return self.model, self.state
def processFeature(df):
df = df.drop('class', 1)
features = []
for i in range(len(df["vecs"])):
aa = pd.Series(features).tolist() # tramsform into list
featuresList = []
for i in features:
p1 = []
for k in i:
return featuresList
def processTargets(df):
selected_features = df[
processed_features = selected_features.copy()
return tf.convert_to_tensor(processed_features.astype(float).tolist())
if __name__ == '__main__':
    dividNumber = 30
    # Some code here prepares my data for input.
    # Inputs before the input function have shape [fullLenth, charactorLenth, embeddinglenth].
    model = RnnModel(15, [100, 80, 80, 1], time_steps=dividNumber, num_features=25)
    model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
The error is below:
Traceback (most recent call last):
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1330, in _do_call
return fn(*args)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1315, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1423, in _call_tf_sessionrun
status, run_metadata)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\", line 516, in __exit__
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
[[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/programming/mlwords/", line 198, in <module>
model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
File "D:/programming/mlwords/", line 124, in train
state2, current_loss, nowAccuracy =[state, loss, accuracy])
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 908, in run
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1143, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1324, in _do_run
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\client\", line 1343, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
[[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
Caused by op 'rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat', defined at:
File "D:/programming/mlwords/", line 198, in <module>
model.train(5000, 0.0001, my_input_fn, training_examples, training_targets, sequenceLenth=trainSequenceL)
File "D:/programming/mlwords/", line 95, in train
outputs, state = tf.nn.dynamic_rnn(self.model, input, initial_state=initial_state)#,sequence_length=sequence
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 627, in dynamic_rnn
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 824, in _dynamic_rnn_loop
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 3205, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 2943, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 2880, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 3181, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 795, in _time_step
(output, new_state) = call_cell()
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 781, in <lambda>
call_cell = lambda: cell(input_t, state)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 232, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\", line 714, in __call__
outputs =, *args, **kwargs)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 1283, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 339, in __call__
*args, **kwargs)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\layers\", line 714, in __call__
outputs =, *args, **kwargs)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 620, in call
array_ops.concat([inputs, h], 1), self._kernel)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 1181, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\ops\", line 1101, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\", line 787, in _apply_op_helper
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\", line 3309, in create_op
File "D:\Anaconda3\envs\tensorflow-cpu\lib\site-packages\tensorflow\python\framework\", line 1669, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [20,25] vs. shape[1] = [30,100]
[[Node: rnn/while/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](rnn/while/TensorArrayReadV3, rnn/while/Switch_4:1, rnn/while/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/Const)]]
This is the code I used to check my input:
def checkData(inputs, targets, sequencelence):
    batch_size = 20
    features, target, sequece = my_input_fn(inputs, targets, batch_size=batch_size, shuffle=True, num_epochs=None,
    with tf.Session() as sess:
        for i in range(1000):
            features1, target1, sequece1 = sess.run([features, target, sequece])
            assert len(features1) == batch_size
            for sentence in features1:
                assert len(sentence) == 30
                for word in sentence:
                    assert len(word) == 25
            assert len(target1) == batch_size
            assert len(sequece1) == batch_size
The error comes from the LSTM cell's call method. There the cell runs tf.concat([inputs, h], 1), i.e. it concatenates the next input with the current hidden state before multiplying by the kernel matrix. The error says this is impossible because the batch (0th) dimensions do not match: your input is shaped [20,25] and your hidden state is shaped [30,100].
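To make the shape rule concrete, here is a minimal plain-Python sketch of what concatenating along axis 1 requires. The helper concat_shape is hypothetical (it is not a TensorFlow function); it only mimics the shape check, using the shapes quoted in the error message.

```python
def concat_shape(shapes, axis):
    """Shape that results from concatenating tensors with `shapes` along `axis`.

    Hypothetical helper, not part of TensorFlow: it mirrors the rule that
    every dimension except `axis` must be equal across the inputs.
    """
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("ranks differ: %r vs %r" % (first, shape))
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(
                    "dimension %d should match: %r vs %r" % (dim, first, shape))
    result = list(first)
    result[axis] = sum(shape[axis] for shape in shapes)
    return tuple(result)


# Input and hidden state share the batch dimension: concatenation works.
ok_shape = concat_shape([(30, 25), (30, 100)], axis=1)

# Shapes from the error message: batch dimensions 20 and 30 differ.
try:
    concat_shape([(20, 25), (30, 100)], axis=1)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
```

With matching batch dimensions the result is a [30, 125] tensor, which is what the kernel multiplication expects; with 20 versus 30 rows there is no valid result at all.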
For some reason, on your 32nd iteration (or whenever you see the error), the input is batched to 20 rather than 30. This usually happens at the end of the training data, when the batch size does not evenly divide the total number of training examples. This hypothesis is also consistent with your statement "When i used smaller batch , it seems the code can run longer".
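As a rough illustration of how a short final batch arises, the sketch below computes the batch sizes produced by one pass over a dataset. The function batch_sizes and the dataset size of 950 are assumptions chosen for illustration (950 examples with a batch size of 30 happen to give a 32nd batch of 20); they are not taken from the question's input pipeline.

```python
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """Sizes of the batches produced by one pass over `num_examples` items.

    Hypothetical helper for illustration only.
    """
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        # The leftover examples form one final, smaller batch.
        sizes.append(remainder)
    return sizes


# 950 = 31 * 30 + 20, so the 32nd batch holds only 20 examples.
sizes = batch_sizes(950, 30)

# Dropping the remainder keeps every batch at the full size.
trimmed = batch_sizes(950, 30, drop_remainder=True)
```

If this is the cause, the usual remedies are to drop the final partial batch or to build the initial RNN state from the batch size of the actual input rather than from a fixed constant.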
I had the same issue. When I corrected the image input size to match the input shape, it ran without errors.

"output_shape has incorrect number of elements"

I'm trying to construct a simple one-hot converter. It takes a batch of data vectors as input, and for each data vector, converts it to a one-hot vector. The one-hots have 1s at the original data vectors' argmaxes. (e.g. [[2.3, -4.1, 0.4], [-0.1, -3.1, 2.1]] -> [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
I'm doing this with tf.sparse_to_dense().
import random
import tensorflow as tf
batch_size = 10
data_size = 3
data = []
for i in range(batch_size):
    row = []
    for j in range(data_size):
        # assumed: the loop body is missing from the post; fill with random values
        row.append(random.random())
    data.append(row)
with tf.Graph().as_default(), tf.Session() as sess:
    indices = tf.reshape(tf.range(0, limit=batch_size, delta=1), [1, -1])
    hot_ids = tf.reshape(tf.cast(tf.argmax(data, 1), tf.int32), [1, -1])
    sparse_indices = tf.concat(0, [indices, hot_ids])
    output_shape = tf.pack([batch_size, data_size])
    result = tf.sparse_to_dense(sparse_indices, output_shape, 1.0, 0.0)
The first three printouts happen correctly. The last printout triggers this error:
W tensorflow/core/common_runtime/] 0x7fb0e5903560 Compute status: Invalid argument: output_shape has incorrect number of elements: 2 should be: 10
[[Node: SparseToDense = SparseToDense[T=DT_FLOAT, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](concat, pack, SparseToDense/sparse_values, SparseToDense/default_value)]]
Traceback (most recent call last):
File "one-hot_simple", line 21, in <module>
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/", line 465, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/", line 3097, in _eval_using_default_session
return, feed_dict)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/", line 315, in run
return self._run(None, fetches, feed_dict)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/", line 511, in _run
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/", line 564, in _do_run
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/", line 586, in _do_call
tensorflow.python.framework.errors.InvalidArgumentError: output_shape has incorrect number of elements: 2 should be: 10
[[Node: SparseToDense = SparseToDense[T=DT_FLOAT, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](concat, pack, SparseToDense/sparse_values, SparseToDense/default_value)]]
Caused by op u'SparseToDense', defined at:
File "one-hot_simple", line 16, in <module>
result = tf.sparse_to_dense(sparse_indices, output_shape, 1.0, 0.0)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/", line 358, in sparse_to_dense
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/", line 322, in _sparse_to_dense
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/", line 655, in apply_op
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/", line 2040, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/", line 1087, in __init__
self._traceback = _extract_stack()
I don't understand why output_shape should have 10 elements or why this error is happening... Please help!
The issue seems to be that your sparse_indices matrix is 2 x 10, whereas tf.sparse_to_dense() expects a num_elems x num_dims (i.e. 10 x 2) matrix, with one row per nonzero element. You should change the code that computes this matrix as follows:
indices = tf.reshape(tf.range(0, limit=batch_size, delta=1), [-1, 1])
hot_ids = tf.reshape(tf.cast(tf.argmax(data, 1), tf.int32), [-1, 1])
sparse_indices = tf.concat(1, [indices, hot_ids])
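The layout requirement can be sketched in plain Python. The function sparse_to_dense_sketch below is a hypothetical stand-in, not the TensorFlow op; it assumes a 2-D output and treats each row of the index matrix as one [row, col] coordinate, which is why the transposed layout is rejected.

```python
def sparse_to_dense_sketch(sparse_indices, output_shape, sparse_value, default_value):
    """Plain-Python stand-in for a 2-D sparse-to-dense conversion.

    Hypothetical helper for illustration; not the TensorFlow implementation.
    """
    num_rows, num_cols = output_shape
    dense = [[default_value] * num_cols for _ in range(num_rows)]
    for index in sparse_indices:
        # Each row of sparse_indices must be one complete coordinate.
        if len(index) != len(output_shape):
            raise ValueError(
                "each index needs %d elements, got %d"
                % (len(output_shape), len(index)))
        row, col = index
        dense[row][col] = sparse_value
    return dense


hot_ids = [0, 2, 1, 0]  # argmax of each of four data vectors of size 3

# Correct layout: num_elems x num_dims, one [row, col] pair per element.
good_indices = [[i, hot] for i, hot in enumerate(hot_ids)]
dense = sparse_to_dense_sketch(good_indices, [4, 3], 1.0, 0.0)

# Transposed layout (num_dims x num_elems), as in the question: rejected.
bad_indices = [list(range(4)), hot_ids]
try:
    sparse_to_dense_sketch(bad_indices, [4, 3], 1.0, 0.0)
    layout_error = None
except ValueError as err:
    layout_error = str(err)
```

In the transposed layout each row carries one entry per batch element instead of one entry per dimension, which is the same mismatch the "2 should be: 10" message reports.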
You might also find the recently added tf.one_hot() op useful.
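With that op, the whole conversion would likely reduce to something like tf.one_hot(tf.argmax(data, 1), depth=data_size), assuming a TensorFlow version that includes it. The plain-Python function below, one_hot_argmax (a hypothetical name), sketches the result such a call is expected to produce, using the example from the question.

```python
def one_hot_argmax(batch):
    """Return one one-hot vector per input vector, with the 1.0 at its argmax.

    Hypothetical helper that mirrors the intent of
    tf.one_hot(tf.argmax(data, 1), depth=data_size).
    """
    result = []
    for vector in batch:
        # Index of the largest entry (the first one wins on ties).
        hot = max(range(len(vector)), key=lambda j: vector[j])
        result.append([1.0 if j == hot else 0.0 for j in range(len(vector))])
    return result


example = [[2.3, -4.1, 0.4], [-0.1, -3.1, 2.1]]
one_hots = one_hot_argmax(example)
```

This avoids building the index matrix by hand, so the num_elems x num_dims layout question does not arise.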