NotFoundError: [_Derived_]No gradient defined for op: StatefulPartitionedCall on TensorFlow 1.15.0

I am running a TensorFlow model that uses BERT as the embedding layer. I found a similar question with no answer. Honestly, I do not understand why the error occurs, because the model runs fine on another dataset.
When I call
train_history =
train_input, train_labels,
I get this error:
NotFoundError: [_Derived_]No gradient defined for op: StatefulPartitionedCall
[[{{node Func/_4}}]]
[[PartitionedCall/gradients/StatefulPartitionedCall_grad/PartitionedCall/gradients/StatefulPartitionedCall_grad/SymbolicGradient]] [Op:__inference_distributed_function_28080]
Function call stack:
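import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def build_model(bert_layer, max_len=512):
    input_word_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    input_mask = Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
    segment_ids = Input(shape=(max_len,), dtype=tf.int32, name="segment_ids")
    # The hub layer returns (pooled_output, sequence_output)
    _, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])
    # Embedding of the first ([CLS]) token
    clf_output = sequence_output[:, 0, :]
    # NOTE: the summary below reports 8 units for this layer; with one-hot labels
    # from to_categorical, categorical_crossentropy needs one (softmax) unit per class
    out = Dense(1, activation='sigmoid')(clf_output)
    model = Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=out)
    model.compile(Adam(lr=2e-6), loss='categorical_crossentropy', metrics=['accuracy'])
    return model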
Loading BERT from TensorFlow Hub
module_url = ""
bert_layer = hub.KerasLayer(module_url, trainable=True)
Loading tokenizer and encoding text
vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = tokenization.FullTokenizer(vocab_file, do_lower_case)
train_input = bert_encode(train.text.values, tokenizer, max_len=160)
test_input = bert_encode(test.text.values, tokenizer, max_len=160)
train_labels = train['label']
train_labels = to_categorical(np.asarray(train_labels.factorize()[0]))
>> tuple
>> numpy.ndarray
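The label encoding above turns each label into a one-hot row whose width equals the number of classes, and that width is what the output layer has to match for categorical_crossentropy. A NumPy-only sketch of the same encoding, where `encode_labels` is a hypothetical helper and `np.unique` stands in for pandas `factorize()` plus Keras `to_categorical()`:

```python
import numpy as np

def encode_labels(labels):
    """One-hot encode labels, numbering classes by first appearance.

    Hypothetical NumPy stand-in for pandas factorize() followed by
    Keras to_categorical().
    """
    labels = np.asarray(labels)
    # np.unique returns sorted classes; first_idx says where each one first occurs
    uniques, first_idx, inverse = np.unique(labels, return_index=True, return_inverse=True)
    order = np.argsort(first_idx)            # sorted-class ids in appearance order
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))      # sorted-class id -> appearance id
    codes = rank[inverse.ravel()]
    one_hot = np.eye(len(uniques), dtype=np.float32)[codes]
    return codes, one_hot

codes, one_hot = encode_labels(["spam", "ham", "spam", "eggs"])
print(codes)          # [0 1 0 2]
print(one_hot.shape)  # (4, 3): one column per class
```

If the data has eight classes, as the (None, 8) in the summary suggests, the rows are 8 wide, so the final layer would need 8 softmax units rather than a single sigmoid unit.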
Run model
model = build_model(bert_layer, max_len=160)
Model: "model"
Layer (type) Output Shape Param # Connected to
input_word_ids (InputLayer) [(None, 160)] 0
input_mask (InputLayer) [(None, 160)] 0
segment_ids (InputLayer) [(None, 160)] 0
keras_layer (KerasLayer) [(None, 768), (None, 160, 768)] 177853441 input_word_ids[0][0]
tf_op_layer_strided_slice (Tens [(None, 768)] 0 keras_layer[0][1]
dense (Dense) (None, 8) 6152 tf_op_layer_strided_slice[0][0]
Total params: 177,859,593
Trainable params: 177,859,592
Non-trainable params: 1

It was caused by using an older version of TensorFlow with BERT. I had missed an existing issue that answers the question.


With `with strategy.scope():`, BERT output from TF Hub loses its shape and `encoder_output` is missing

!pip install tensorflow-text==2.7.0
import tensorflow as tf
import tensorflow_text as text  # registers the ops used by the BERT preprocessing model
import tensorflow_hub as hub
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

NUM_CLASS = 2  # assumed from the (None, 128, 2) output in the summaries below

strategy = tf.distribute.MirroredStrategy()
print('Number of GPU: ' + str(strategy.num_replicas_in_sync))  # 1 or 2, shouldn't matter

with strategy.scope():
    bert_preprocess = hub.KerasLayer("")
    bert_encoder = hub.KerasLayer("")

def get_model():
    text_input = Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)
    output_sequence = outputs['sequence_output']
    x = Dense(NUM_CLASS, activation='sigmoid')(output_sequence)
    model = Model(inputs=[text_input], outputs=[x])
    return model

# Without scope
optimizer = Adam()
model = get_model()
# note: from_logits=True expects raw scores, while the Dense layer above applies a sigmoid
model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])
model.summary()  # <- look at the output 1
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model.png')  # <- look at the figure 1

# With scope
with strategy.scope():
    optimizer = Adam()
    model = get_model()
    model.compile(loss=CategoricalCrossentropy(from_logits=True), optimizer=optimizer, metrics=[Accuracy()])
model.summary()  # <- compare with output 1, it has already lost its shape
tf.keras.utils.plot_model(model, show_shapes=True, to_file='model_scoped.png')  # <- compare this figure too, for ease
With scope, BERT loses seq_length, and it becomes None.
Model summary withOUT scope (note the 128 in the last layer's output shape, which is seq_length):
Model: "model_6"
Layer (type) Output Shape Param # Connected to
text (InputLayer) [(None,)] 0 []
keras_layer_2 (KerasLayer) {'input_mask': (None, 128), 0 ['text[0][0]']
(None, 128),
(None, 128)}
keras_layer_3 (KerasLayer) multiple 109482241 ['keras_layer_2[6][0]',
dense_6 (Dense) (None, 128, 2) 1538 ['keras_layer_3[6][14]']
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
Model with scope:
Model: "model_7"
Layer (type) Output Shape Param # Connected to
text (InputLayer) [(None,)] 0 []
keras_layer_2 (KerasLayer) {'input_mask': (None, 128), 0 ['text[0][0]']
(None, 128),
(None, 128)}
keras_layer_3 (KerasLayer) multiple 109482241 ['keras_layer_2[7][0]',
dense_7 (Dense) (None, None, 2) 1538 ['keras_layer_3[7][14]']
Total params: 109,483,779
Trainable params: 1,538
Non-trainable params: 109,482,241
[Figures: plot_model output without scope (model.png) and with scope (model_scoped.png)]
Another notable point: encoder_outputs is also missing when you look at the second KerasLayer (the third layer overall) in both models.
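Both summaries report the same 1,538 parameters for the final Dense layer, even though one output shape is (None, 128, 2) and the other (None, None, 2). That is expected: a Dense layer on a 3-D input only contracts the last axis, so its weights do not depend on seq_length. A NumPy sketch of that arithmetic, assuming a hidden size of 768 (consistent with 768 × 2 + 2 = 1,538); `dense_params` and `apply_dense` are illustrative helpers, not Keras APIs:

```python
import numpy as np

def dense_params(in_features, units):
    """Parameter count of a Dense layer: kernel (in_features x units) plus one bias per unit."""
    return in_features * units + units

def apply_dense(x, kernel, bias):
    """Dense applied to an N-D input: only the last axis is contracted,
    so leading axes such as (batch, seq_length) pass through unchanged."""
    return x @ kernel + bias

hidden, num_class = 768, 2  # assumed BERT-base hidden size; 2 classes as in the summaries
rng = np.random.default_rng(0)
kernel = rng.standard_normal((hidden, num_class))
bias = np.zeros(num_class)

for seq_length in (128, 37):  # the same kernel works for any sequence length
    x = rng.standard_normal((4, seq_length, hidden))
    print(seq_length, apply_dense(x, kernel, bias).shape)
print(dense_params(hidden, num_class))  # 1538
```

Note also that applying Dense to sequence_output gives one prediction per token; if one label per text is intended, the encoder's pooled_output is the usual input for the classifier.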

ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 50), found shape=(None, 1, 512)

I am learning to use bert-base-cased with a classification model. The code for the model is the following:
def mao_func(input_ids, masks, labels):
    # Map each (ids, mask, label) triple to the ({input_name: tensor}, label) format Keras expects
    return {'input_ids': input_ids, 'attention_mask': masks}, labels
dataset =
dataset = dataset.shuffle(100000).batch(BATCH_SIZE)
split = .8
ds_len = len(list(dataset))
train = dataset.take(round(ds_len * split))
val = dataset.skip(round(ds_len * split))
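Two details of this split are worth checking. Because batch() runs before take()/skip(), ds_len counts batches, not samples. And Dataset.shuffle reshuffles on every pass by default (reshuffle_each_iteration=True), so take/skip applied after it can put different elements into train and val on each epoch; passing reshuffle_each_iteration=False, or splitting before shuffling, avoids that. A sketch of the second idea with NumPy index arrays (`split_indices` is a hypothetical helper, not part of tf.data):

```python
import numpy as np

def split_indices(n_samples, train_frac=0.8, seed=42):
    """Shuffle sample indices once and split them into train/val parts.

    Fixing the permutation up front means the two parts never exchange
    samples, unlike a take/skip split on a dataset that reshuffles each epoch.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = round(n_samples * train_frac)
    return perm[:n_train], perm[n_train:]

train_idx, val_idx = split_indices(10)
print(len(train_idx), len(val_idx))  # 8 2
```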
from transformers import TFAutoModel
bert = TFAutoModel.from_pretrained('bert-base-cased')
Model: "tf_bert_model"
Layer (type) Output Shape Param #
bert (TFBertMainLayer) multiple 108310272
Total params: 108,310,272
Trainable params: 108,310,272
Non-trainable params: 0
Then the network building:
input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(50,), name='attention_mask', dtype='int32')
embeddings = bert(input_ids, attention_mask=mask)[0]
X = tf.keras.layers.GlobalMaxPool1D()(embeddings)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.1)(X)
X = tf.keras.layers.Dense(32, activation='relu')(X)
y = tf.keras.layers.Dense(3, activation='softmax',name='outputs')(X)
model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)
model.layers[2].trainable = False
The model.summary() output is:
Layer (type) Output Shape Param # Connected to
input_ids (InputLayer) [(None, 50)] 0 []
attention_mask (InputLayer) [(None, 50)] 0 []
tf_bert_model (TFBertModel) TFBaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=(None, 50, 768), pooler_output=(None, 768), past_key_values=None, hidden_states=None, attentions=None, cross_attentions=None) 108310272 ['input_ids[0][0]', 'attention_mask[0][0]']
global_max_pooling1d (GlobalMa (None, 768) 0 ['tf_bert_model[0][0]']
batch_normalization (BatchNorm (None, 768) 3072 ['global_max_pooling1d[0][0]']
dense (Dense) (None, 128) 98432 ['batch_normalization[0][0]']
dropout_37 (Dropout) (None, 128) 0 ['dense[0][0]']
dense_1 (Dense) (None, 32) 4128 ['dropout_37[0][0]']
outputs (Dense) (None, 3) 99 ['dense_1[0][0]']
Total params: 108,416,003
Trainable params: 104,195
Non-trainable params: 108,311,808
Finally, the model fitting is:
optimizer = tf.keras.optimizers.Adam(0.01)
loss = tf.keras.losses.CategoricalCrossentropy()
acc = tf.keras.metrics.CategoricalAccuracy('accuracy')
model.compile(optimizer,loss=loss, metrics=[acc])
history =
validation_data = val,
It fails with an execution error at line 7 of that cell:
ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 50), found shape=(None, 1, 512)
Can anyone help me understand what I did wrong and why? Thanks :)
Update: here is the Git repository with the code.
It seems that the shape of your training data doesn't match the expected input shape of your input layers.
You can check the structure of your training data with train.element_spec (train is a tf.data.Dataset, so it has no .shape).
Your input layer input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32') expects batches of shape (None, 50), but according to the error your data has shape (None, 1, 512): sequences of length 512, plus an extra axis of size 1.
So to fix this, you could change your input shape (and also remove the extra axis of size 1 from the tokenized arrays before building the dataset):
input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='int32')
If you keep your inputs in a NumPy array train_x of shape (n_samples, seq_len), you can make it more flexible with:
input_ids = tf.keras.layers.Input(shape=(train_x.shape[1],), name='input_ids', dtype='int32')
Also, don't forget that you have to make this change to all of your input layers!
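The (None, 1, 512) in the error suggests that each sample carries an extra axis of size 1. One way this can happen, which is an assumption here since the tokenization code is not shown, is tokenizing texts one at a time so that each call returns a (1, 512) array. A NumPy sketch with placeholder data showing how to drop that axis and read the sequence length from the right axis:

```python
import numpy as np

# Placeholder token ids with the extra axis seen in the error: (n_samples, 1, 512)
n_samples, max_len = 4, 512
input_ids = np.zeros((n_samples, 1, max_len), dtype=np.int32)

# Drop the size-1 axis so each sample is a flat sequence: (n_samples, 512)
flat_ids = input_ids.squeeze(axis=1)
print(flat_ids.shape)  # (4, 512)

# Axis 0 counts samples; axis 1 is the sequence length the Input layer needs
seq_len = flat_ids.shape[1]
print(seq_len)         # 512
```

Alternatively, tokenizing with max_length=50 (with padding and truncation enabled) keeps the original Input(shape=(50,)) layers unchanged.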

Sequential VGG16 model, graph disconnected error

I have a sequential model with a VGG16 at the top:
import tensorflow as tf

def rescale(x):
    # map the 16-bit range [0, 65535] to [0, 1]
    return x / 65535.

base_model = tf.keras.applications.VGG16(
    include_top=True, weights=None, input_tensor=None, input_shape=(224, 224, 1),
    pooling=None, classes=102, classifier_activation='softmax')

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 1)),
    tf.keras.layers.Lambda(rescale),
    tf.keras.layers.experimental.preprocessing.Resizing(224, 224),
    tf.keras.layers.experimental.preprocessing.RandomFlip(mode='horizontal_and_vertical', seed=42),
    base_model,
])
Output of model.summary():
Model: "sequential"
Layer (type) Output Shape Param #
lambda (Lambda) (None, None, None, 1) 0
resizing (Resizing) (None, 224, 224, 1) 0
random_flip (RandomFlip) (None, 224, 224, 1) 0
vgg16 (Functional) (None, 102) 134677286
Total params: 134,677,286
Trainable params: 134,677,286
Non-trainable params: 0
Now I want to create a new model with two outputs:
vgg_model = model.layers[3]
last_conv_layer = vgg_model.get_layer('block5_conv3')
new_model = tf.keras.models.Model(inputs=[model.inputs], outputs=[last_conv_layer.output, model.output])
But I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_1_6:0", shape=(None, 224, 224, 1), dtype=float32) at layer "block1_conv1". The following previous layers were accessed without issue: []
What am I missing here?
Given a fitted model built as in the question (Lambda(rescale), Resizing, RandomFlip, then the VGG16 base model):
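The error occurs because last_conv_layer.output lives in VGG16's internal graph, which starts at VGG16's own input (the input_1 tensor in the error), not at model.inputs. You can wrap your VGG in a Model that returns all the outputs you need, and then rebuild the preprocessing in front of it:
from tensorflow.keras.models import Model
# Sub-model over the VGG16 graph, from its own input to both outputs
new_model = Model(inputs=model.layers[3].input,
                  outputs=[model.layers[3].output,
                           model.layers[3].get_layer('block5_conv3').output])
# Re-apply the preprocessing on a fresh input (RandomFlip is left out; it only acts during training)
inp = tf.keras.Input(shape=(None, None, 1))
x = tf.keras.layers.Lambda(rescale)(inp)
x = tf.keras.layers.experimental.preprocessing.Resizing(224, 224)(x)
outputs = new_model(x)
new_model = Model(inp, outputs)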
The summary of new_model:
Layer (type) Output Shape Param #
input_49 (InputLayer) [(None, None, None, 1)] 0
lambda_25 (Lambda) (None, None, None, 1) 0
resizing_25 (Resizing) (None, 224, 224, 1) 0
functional_47 (Functional) [(None, 102), (None, 14, 14, 512)] 134677286
Total params: 134,677,286
Trainable params: 134,677,286
Non-trainable params: 0
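The 14 in the block5_conv3 output shape follows from VGG16's layout: its 3x3 convolutions use 'same' padding, so they keep height and width, and each of the five blocks ends with a 2x2, stride-2 max pool. block5_conv3 sits after four of those pools, so a 224-pixel input becomes 224 / 16 = 14. A small pure-Python sketch of that arithmetic (the helper name is hypothetical, and it only covers the conv layers of the standard VGG16 layout):

```python
# VGG16 convolutional blocks in order, with their filter counts
VGG16_BLOCK_FILTERS = [("block1", 64), ("block2", 128), ("block3", 256),
                       ("block4", 512), ("block5", 512)]

def vgg16_conv_output_shape(layer_name, input_size=224):
    """(height, width, channels) of a VGG16 conv layer such as 'block5_conv3'.

    The 3x3 convolutions use 'same' padding, so they keep height and width;
    each block ends with a 2x2, stride-2 max pool that halves them.
    """
    block = layer_name.split("_")[0]
    size = input_size
    for name, filters in VGG16_BLOCK_FILTERS:
        if name == block:
            return (size, size, filters)
        size //= 2  # pooling at the end of every earlier block
    raise ValueError(f"not a VGG16 conv layer: {layer_name!r}")

print(vgg16_conv_output_shape("block5_conv3"))  # (14, 14, 512)
```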

keras define a trainable variable for add or matmul

I have some problems using tf.keras to build a model. I want to define a trainable weight tensor with shape (64, 128), similar to tf.get_variable, but I can't get it to work.
I have tried many methods, but I am looking for an easier one.
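import tensorflow as tf

inputs = tf.keras.Input((128,))
weights = tf.Variable(tf.random.normal((64, 128)))
# The variable is only captured by the Lambda closure, so Keras does not track it as a weight
output = tf.keras.layers.Lambda(lambda x: tf.matmul(x, tf.transpose(weights)))(inputs)
model = tf.keras.Model(inputs, output)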
Layer (type) Output Shape Param #
input_10 (InputLayer) (None, 128) 0
lambda_2 (Lambda) (None, 64) 0
Total params: 0
Trainable params: 0
Non-trainable params: 0
The variable is not counted as a trainable weight of the model.
I also know that Dense trains a weight matrix and a bias, but if I only want to add a bias, I can't use Dense.
Instead, I have to use add_weight in a custom layer, for example:
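import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

class Bias(keras.layers.Layer):
    def build(self, input_shape):
        # add_weight registers the variable, so it shows up in trainable_weights
        self.bias = self.add_weight(shape=(64, 128), initializer='zeros', dtype=tf.float32, name='x')
        self.built = True

    def call(self, inputs):
        return inputs + self.bias

inputs = Input(shape=(64, 128))
outputs = Bias()(inputs)
model = Model(inputs=inputs, outputs=outputs)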
Layer (type) Output Shape Param #
input_11 (InputLayer) (None, 64, 128) 0
bias_5 (Bias) (None, 64, 128) 8192
Total params: 8,192
Trainable params: 8,192
Non-trainable params: 0
Is there an easier way to define a trainable variable?

Serializing a Keras model with an embedding layer

I've trained a model with pre-trained word embeddings like this:
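import numpy as np

# Row i holds the pretrained 100-d vector of the word with index i (zeros if there is none)
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in text_tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector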
embedding_layer = Embedding(vocab_size,
With the architecture looking like this:
sequence_input = Input(shape=(50,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
text_cnn = Conv1D(filters=5, kernel_size=5, padding='same', activation='relu')(embedded_sequences)
text_lstm = LSTM(500, return_sequences=True)(embedded_sequences)
char_in = Input(shape=(50, 18, ))
char_cnn = Conv1D(filters=5, kernel_size=5, padding='same', activation='relu')(char_in)
char_cnn = GaussianNoise(0.40)(char_cnn)
char_lstm = LSTM(500, return_sequences=True)(char_in)
merged = concatenate([char_lstm, text_lstm])
merged_d1 = Dense(800, activation='relu')(merged)
merged_d1 = Dropout(0.5)(merged_d1)
text_class = Dense(len(y_unique), activation='softmax')(merged_d1)
model = Model([sequence_input,char_in], text_class)
When I go to convert the model to json, I get this error:
ValueError: can only convert an array of size 1 to a Python scalar
Similarly, if I use the function, it seems to save correctly, but when I load it, I get TypeError: Expected Float32.
My question is: is there something I am missing when trying to serialize this model? Do I need some sort of Lambda layer or something of the sorts?
Any help would be greatly appreciated!
You can use the weights argument in Embedding layer to provide initial weights.
embedding_layer = Embedding(vocab_size,
The weights should remain non-trainable after saving and loading the model:
model.save('1.h5')
m = load_model('1.h5')
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) (None, 50) 0
input_4 (InputLayer) (None, 50, 18) 0
embedding_1 (Embedding) (None, 50, 100) 1000000 input_3[0][0]
lstm_4 (LSTM) (None, 50, 500) 1038000 input_4[0][0]
lstm_3 (LSTM) (None, 50, 500) 1202000 embedding_1[0][0]
concatenate_2 (Concatenate) (None, 50, 1000) 0 lstm_4[0][0]
dense_2 (Dense) (None, 50, 800) 800800 concatenate_2[0][0]
dropout_2 (Dropout) (None, 50, 800) 0 dense_2[0][0]
dense_3 (Dense) (None, 50, 15) 12015 dropout_2[0][0]
Total params: 4,052,815
Trainable params: 3,052,815
Non-trainable params: 1,000,000
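The parameter counts in this summary can be checked by hand, which confirms that the 1,000,000 non-trainable parameters are exactly the frozen embedding matrix. A small sketch of the standard formulas for these Keras layers; VOCAB_SIZE = 10000 is inferred from 1,000,000 / 100 and is not stated in the question:

```python
def embedding_params(vocab_size, dim):
    # one dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # kernel plus one bias per unit
    return input_dim * units + units

VOCAB_SIZE = 10000  # hypothetical: inferred from 1,000,000 / 100
layers = {
    "embedding_1": embedding_params(VOCAB_SIZE, 100),
    "lstm_4": lstm_params(18, 500),      # char input, 18 features per step
    "lstm_3": lstm_params(100, 500),     # word embeddings, 100 dims
    "dense_2": dense_params(1000, 800),  # concatenated LSTM outputs
    "dense_3": dense_params(800, 15),    # 15 output classes
}
total = sum(layers.values())
trainable = total - layers["embedding_1"]  # the embedding is frozen
print(total, trainable)  # 4052815 3052815
```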
I hope you are saving the model after compiling it, like this:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
To save the model, you can do:
from keras.models import load_model
model.save('model.h5')
model = load_model('model.h5')
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
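To load the model:
from keras.models import model_from_json
with open('model.json', 'r') as json_file:
    model_json = json_file.read()
# model_from_json rebuilds the architecture only; the weights have to be loaded separately
model = model_from_json(model_json)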
I tried multiple methods. The problem is that when the model has an embedding layer, pickle doesn't work and cannot save it.
So, when you have layers like these, what you can do is:
## Creating model
then save the file with the .h5 extension and convert it to JSON (the model is converted to model2 here):
from tensorflow.keras.models import load_model
model.save('model.h5')
model = load_model('model.h5')
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
and this to load it:
from tensorflow.keras.models import model_from_json
with open('model.json', 'r') as json_file:
    model_json = json_file.read()
model2 = model_from_json(model_json)