Problem connecting transformer output to CNN input in Keras - tensorflow

I need to build a transformer-based architecture in Tensorflow following the encoder-decoder approach where the encoder is a preexisting Huggingface Distilbert model and the decoder is a CNN.
Inputs: a text containing texts with several phrases in a row. Outputs: codes according to taxonomic criteria. My data file has 7387 pairs text-label in TSV format:
text \t code
This is example text number one. It might contain some other phrases. \t C21
This is example text number two. It might contain some other phrases. \t J45.1
This is example text number three. It might contain some other phrases. \t A27
The remainder of the code is this:
text_file = "data/datafile.tsv"
with open(text_file) as f:
lines = f.read().split("\n")[:-1]
text_and_code_pairs = []
for line in lines:
text, code = line.split("\t")
text_and_code_pairs.append((text, code))
random.shuffle(text_and_code_pairs)
num_val_samples = int(0.10 * len(text_and_code_pairs))
num_train_samples = len(text_and_code_pairs) - 3 * num_val_samples
train_pairs = text_and_code_pairs[:num_train_samples]
val_pairs = text_and_code_pairs[num_train_samples : num_train_samples + num_val_samples]
test_pairs = text_and_code_pairs[num_train_samples + num_val_samples :]
train_texts = [fst for (fst,snd) in train_pairs]
train_labels = [snd for (fst,snd) in train_pairs]
val_texts = [fst for (fst,snd) in val_pairs]
val_labels = [snd for (fst,snd) in val_pairs]
test_texts = [fst for (fst,snd) in test_pairs]
test_labels = [snd for (fst,snd) in test_pairs]
distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
dict(test_encodings),
test_labels
))
model = build_model(distilbert_encoder)
model.fit(train_dataset.batch(64), validation_data=val_dataset, epochs=3, batch_size=64)
model.predict(test_dataset, verbose=1)
Lastly, the build_model function:
def build_model(transformer, max_len=512):
model = tf.keras.models.Sequential()
# Encoder
inputs = layers.Input(shape=(max_len,), dtype=tf.int32)
distilbert = transformer(inputs)
# LAYER - something missing here?
# Decoder
conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
flat = tf.keras.layers.Flatten()(pooling)
fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
print(model.summary())
return model
I managed to narrow down the possible locations of my problem. After changing from sequential to functional Keras API, I get the following error:
Traceback (most recent call last):
File "keras_transformer.py", line 99, in <module>
main()
File "keras_transformer.py", line 94, in main
model = build_model(distilbert_encoder)
File "keras_transformer.py", line 23, in build_model
conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
self._maybe_build(inputs)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 152, in build
input_shape = tensor_shape.TensorShape(input_shape)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in __init__
self._dims = [as_dimension(d) for d in dims_iter]
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in <listcomp>
self._dims = [as_dimension(d) for d in dims_iter]
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 716, in as_dimension
return Dimension(value)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 200, in __init__
None)
File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got 'last_hidden_state'
It seems that the error lies in the connection between the output of the transformer and the input of the convolutional layer. Am I supposed to include another layer between them so as to adapt the output of the transformer? If so, what would be the best option?I'm using tensorflow==2.2.0, transformers==4.5.1 and Python 3.6.9

I think the problem is to call the right tensor for the tensorflow layer after the dilbert instance. Because distilbert = transformer(inputs) returns an instance rather than a tensor like in tensorflow, e.g., pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D). pooling is the output tensor of the MaxPooling1D layer.
I fix your problem by calling the last_hidden_state variable of the distilbert instance (i.e. output of the dilbert model), and this will be your input to the next Conv1D layer.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages
from transformers import TFDistilBertModel, DistilBertModel
import tensorflow as tf
distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
def build_model(transformer, max_len=512):
# model = tf.keras.models.Sequential()
# Encoder
inputs = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
distilbert = transformer(inputs)
# Decoder
###### !!!!!! #########
conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert.last_hidden_state)
###### !!!!!! #########
pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
flat = tf.keras.layers.Flatten()(pooling)
fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
print(model.summary())
return model
model = build_model(distilbert_encoder)
This returns,
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 512)] 0
_________________________________________________________________
tf_distil_bert_model (TFDist TFBaseModelOutput(last_hi 134734080
_________________________________________________________________
conv1d (Conv1D) (None, 503, 5) 38405
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 251, 5) 0
_________________________________________________________________
flatten (Flatten) (None, 1255) 0
_________________________________________________________________
dense (Dense) (None, 1255) 1576280
_________________________________________________________________
dense_1 (Dense) (None, 1255) 1576280
=================================================================
Total params: 137,925,045
Trainable params: 137,925,045
Non-trainable params: 0
Note: I assume you mean tf.keras.layers.Input by layers.Input in your build_model function.

I think you are right. The problem seems to be the input to the Conv1D layer.
According to the documentation the outputs.last_hidden_state has a shape of (batch_size, sequence_length, hidden_size).
Conv1D is expecting an input of shape (batch_size, sequence_length).
Maybe you could resolve the problem by either changing your Conv1D to Conv2D or adding a Conv2D layer in between.

Related

ML error says: "ValueError: Input 0 of layer "sequential_6" is incompatible with the layer: expected shape=(None, 42), found shape=(None, 41)"

I needed to increase the accuracy of a model. So I tried using TabNet.
I'm attaching the train & test data in a google drive link
Link: https://drive.google.com/drive/folders/1ronho26m9uX9_ooBTh0M81ox1a43ika8?usp=sharing
Here is my code.
import tensorflow as tf
import pandas as pd
# Load the train and test data into pandas dataframes
train_df = pd.read_csv("train.csv")
#train_df1 = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
# Split the target variable and the features
train_labels = train_df[[f'F_{i}' for i in range(40)]]
#train_labels=trai
test_labels = train_df.target
# Convert the dataframes to tensors
train_dataset = tf.data.Dataset.from_tensor_slices((train_df.values, train_labels.values))
test_dataset = tf.data.Dataset.from_tensor_slices((test_df.values, test_labels.values))
# Define the model using the TabNet architecture
model = tf.keras.models.Sequential([
tf.keras.layers.Input(shape=(train_df.shape[1],)),
tf.keras.layers.Dense(32, activation="relu"),
tf.keras.layers.Dense(64, activation="relu"),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dense(1)
])
# Compile the model with a mean squared error loss function and the Adam optimizer
model.compile(loss="mean_squared_error", optimizer="adam")
# Train the model on the training data
model.fit(train_dataset.batch(32), epochs=5)
# Make predictions on the test data
predictions = model.predict(test_dataset.batch(32))
#predictions = model.predict(test_dataset)
# Evaluate the model on the test data
mse = tf.keras.losses.mean_squared_error(test_labels, predictions)
print("Mean Squared Error:", mse.numpy().mean())
I don't know what's wrong with it as I'm just a beginner.
Here is the error code:
ValueError Traceback (most recent call last)
<ipython-input-40-87712e1604a9> in <module>
24
25 # Make predictions on the test data
---> 26 predictions = model.predict(test_dataset.batch(32))
27 #predictions = model.predict(test_dataset)
28
1 frames
/usr/local/lib/python3.8/dist-packages/keras/engine/training.py in tf__predict_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1845, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1834, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1823, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1791, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.8/dist-packages/keras/engine/input_spec.py", line 264, in assert_input_compatibility
raise ValueError(f'Input {input_index} of layer "{layer_name}" is '
ValueError: Input 0 of layer "sequential_6" is incompatible with the layer: expected shape=(None, 42), found shape=(None, 41)
I didn't really know what to do. So I'm expecting help from you guys. Would be grateful to whatever tips you guys give.

The last dimension of the inputs to a Dense layer should be defined. Found None. Full input shape received: (None, None)

I am trying to build a binary Prediction model through Keras TensorFlow.
And I am having trouble when I add data augmentation inside.
This is my code.
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_dir = 'C:/train'
test_dir = 'C:/test'
train_data = train_datagen.flow_from_directory(train_dir,target_size=(224,224),class_mode='binary',seed=42)
test_data = test_datagen.flow_from_directory(test_dir,target_size=(224,224),class_mode='binary',seed=42)
tf.random.set_seed(42)
from tensorflow.keras.layers.experimental import preprocessing
data_augmentation = keras.Sequential([
preprocessing.RandomFlip("horizontal"),
preprocessing.RandomZoom(0.2),
preprocessing.RandomRotation(0.2),
preprocessing.RandomHeight(0.2),
preprocessing.RandomWidth(0.2),
], name='data_augmentation')
model_1 = tf.keras.Sequential([
tf.keras.layers.Input(shape=(224,224,3),name='input_layer'),
data_augmentation,
tf.keras.layers.Conv2D(20,3,activation='relu'),
tf.keras.layers.MaxPool2D(pool_size=2),
tf.keras.layers.Conv2D(20,3,activation='relu'),
tf.keras.layers.MaxPool2D(pool_size=2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1,activation='sigmoid')
])
model_1.compile(loss=tf.keras.losses.binary_crossentropy,
optimizer='Adam',
metrics=['accuracy'])
model_1.fit(train_data,epochs=10,validation_data=test_data)
I ready tried this way but error again
inputs = tf.keras.layers.Input(shape=(224,224,3),name='input_layer')
x = data_augmentation(inputs)
x = tf.keras.layers.Conv2D(20,3,activation='relu')(x)
x = tf.keras.layers.Conv2D(20,3,activation='relu')(x)
x = tf.keras.layers.MaxPool2D(pool_size=2)(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(1,activation='sigmoid')(x)
model_1 = tf.keras.Model(inputs,outputs)
and this is error message:
Traceback (most recent call last):
File "C:/Users/pondy/PycharmProjects/pythonProject2/main.py", line 60, in <module>
model_1 = tf.keras.Sequential([
File "C:\Users\pondy\PycharmProjects\pythonProject2\venv\lib\site-packages\tensorflow\python\training\tracking\base.py", line 530, in _method_wrapper
result = method(self, *args, **kwargs)
File "C:\Users\pondy\PycharmProjects\pythonProject2\venv\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\pondy\PycharmProjects\pythonProject2\venv\lib\site-packages\keras\layers\core\dense.py", line 139, in build
raise ValueError('The last dimension of the inputs to a Dense layer '
ValueError: The last dimension of the inputs to a Dense layer should be defined. Found None. Full input shape received: (None, None)
Process finished with exit code 1
if didn't add data_augmentation inside it's not error
thank you for help<3
your code will work if you remove
preprocessing.RandomHeight(0.2),
preprocessing.RandomWidth(0.2),

Input 0 of layer "global_average_pooling1d" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 16)

I just started to learn Tensorflow and got an error. I watched a Neural Network Tutorial of Tech with Tim. I finished this episode and got a problem to the end. I couldn't predict the value. when i try it this error shows up:
Traceback (most recent call last):
File "C:/Users/havil/Documents/GitHub/Python/Machine-Learning-Tutorial/Neural Network Tutorial/Tutorial 2.py", line 56, in <module>
predict = model.predict([test_review])
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\tensorflow\python\framework\func_graph.py", line 1147, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\engine\training.py", line 1801, in predict_function *
return step_function(self, iterator)
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\engine\training.py", line 1790, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\engine\training.py", line 1783, in run_step **
outputs = model.predict_step(data)
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\engine\training.py", line 1751, in predict_step
return self(x, training=False)
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\havil\anaconda3\envs\tf_3.7\lib\site-packages\keras\engine\input_spec.py", line 214, in assert_input_compatibility
raise ValueError(f'Input {input_index} of layer "{layer_name}" '
ValueError: Exception encountered when calling layer "sequential" (type Sequential).
Input 0 of layer "global_average_pooling1d" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 16)
Call arguments received:
• inputs=('tf.Tensor(shape=(None,), dtype=int32)',)
• training=False
• mask=None
My whole code looks like this:
import tensorflow as tf
from tensorflow import keras
import numpy as np
data = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words=10000)
print(train_data[0])
# decode Data
word_index = data.get_word_index()
word_index = {key: (value+3) for key, value in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
train_data = keras.preprocessing.sequence.pad_sequences(train_data, value=word_index["<PAD>"], padding="post", maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(test_data, value=word_index["<PAD>"], padding="post", maxlen=250)
def decode_review(text):
return " ".join([reverse_word_index.get(i, "?") for i in text])
print(decode_review(test_data[0]))
model = keras.Sequential()
model.add(keras.layers.Embedding(10000, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.summary()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
x_val = train_data[:10000]
x_train = train_data[10000:]
y_val = train_labels[:10000]
y_train = train_labels[10000:]
fitModel = model.fit(x_train, y_train, epochs=40, batch_size=512, validation_data=(x_val, y_val), verbose=1)
results = model.evaluate(test_data, test_labels)
print(results)
test_review = test_data[0]
predict = model.predict([test_review])
print("Review: ")
print(decode_review(test_review))
print("Prediction: " + str(predict[0]))
print("Actual: " + str(test_labels[0]))
print(results)
Does anyone have an idea how to fix this error?
I could solve this problem. model.predict doesn´t accept normal Lists. I just replaced it with an Numpy array.
test_review = test_data[0]
prediction = model.predict(np.array([test_review]))
print("Review: " + decode_review(test_review))
print("Prediction: " + str(prediction[0]))
print("Actual: " + str(test_labels[0]))
I have same issue, but i noticed that prediction does not work just for a single input, if you do the same with with all of the test data it works fine
prediction = model.predict(test_data)
for i in range(10):
print("Review", test_data[i])
print("Decoded: ", decode_review(test_data[i]))
print("Actual: ", str(test_labels[i]))
print("Predicted: ", str(prediction[i]))
It probably won't help you but i hope it does because i'm struggling with this all day long and really want to know how to fix it
I had the same issue but I solved it by appending the test sample to an empty list then predicting it from there and it worked

tf.data multi output model has labels with incompatible shapes

I am trying to convert an workbook I did some time ago on Colab (using ImageDataGenerator) to one that uses tf.data.dataset as I now have a multi-gpu set up and am trying to learn how to do faster training. The model trains on the age/ gender/ race dataset from Kaggle but in this instance we're interested in just the sex and age prediction. Sex will either be 0 or 1 and the loss function is binarycrossentropy while age is an integer between 0 and 120 and the loss function is mse at it is regression.
import tensorflow as tf
import os
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 64
#Load datasets from directories
train_gen = tf.data.Dataset.list_files(os.listdir(training_dir), shuffle = False)
valid_gen = tf.data.Dataset.list_files(os.listdir(validation_dir), shuffle = False)
def decode_img(img):
#Convert compressed string into a 3D tensor
img = tf.io.decode_jpeg(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
#Resize the image to the desired size
return tf.image.resize(img, [128,128])
def get_label(file):
gender = get_sex(file) #returns either 0 or 1
age = get_age(file) #returns interger between 0 and about 120
return gender, age
def process_path(file):
file = file.numpy()
file_path = str(bytes.decode(file))
file = file_path.split(' ')[-1].split("\\")[-1]
labels = get_label(file)
# Load data from file as a String
img = tf.io.read_file(file_path)
img = decode_img(img)
img = img / 255.0
return img, labels
def _set_shapes(t1, t2):
t1.set_shape((128,128,3))
t2.set_shape((2,))
return (t1,t2)
train_gen = train_gen.map(lambda x: tf.py_function(process_path, [x], [tf.float32, tf.int32]), num_parallel_calls=AUTOTUNE)
valid_gen = valid_gen.map(lambda x: tf.py_function(process_path, [x], [tf.float32, tf.int32]), num_parallel_calls=AUTOTUNE)
train_gen = train_gen.map(_set_shapes,num_parallel_calls=AUTOTUNE)
valid_gen = valid_gen.map(_set_shapes, num_parallel_calls=AUTOTUNE)
train_gen = train_gen.batch(batch_size)
valid_gen = valid_gen.batch(batch_size)
train_gen
Output: <BatchDataset shapes: ((None, 128, 128, 3), (None, 2)), types: (tf.float32, tf.int32)>
#configure for performance
def config_for_performance(ds):
ds = ds.cache()
ds = ds.prefetch(buffer_size=AUTOTUNE)
return ds
train_gen = config_for_performance(train_gen)
valid_gen = config_for_performance(valid_gen)
The model itself:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Input, Activation, Flatten, BatchNormalization, PReLU
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import BinaryCrossentropy
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
gpus = tf.config.list_logical_devices('GPU')
#print(gpus)
strategy = tf.distribute.MirroredStrategy(gpus,cross_device_ops=tf.distribute.ReductionToOneDevice())
with strategy.scope():
#Define the convolution layers
inp = Input(shape=(128,128,3))
cl1 = Conv2D(32,(3,3), padding='same', kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(inp)
bn1 = BatchNormalization()(cl1)
pr1 = PReLU(alpha_initializer='he_uniform')(bn1)
cl2 = Conv2D(32,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(pr1)
bn2 = BatchNormalization()(cl2)
pr2 = PReLU(alpha_initializer='he_uniform')(bn2)
mp1 = MaxPool2D((2,2))(pr2)
cl3 = Conv2D(64,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(mp1)
bn3 = BatchNormalization()(cl3)
pr3 = PReLU(alpha_initializer='he_uniform')(bn3)
cl4 = Conv2D(64,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(pr3)
bn4 = BatchNormalization()(cl4)
pr4 = PReLU(alpha_initializer='he_uniform')(bn4)
mp2 = MaxPool2D((2,2))(pr4)
cl5 = Conv2D(128,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(mp2)
bn5 = BatchNormalization()(cl5)
pr5 = PReLU(alpha_initializer='he_uniform')(bn5)
mp3 = MaxPool2D((2,2))(pr5)
cl6 = Conv2D(256,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(mp3)
bn6 = BatchNormalization()(cl6)
pr6 = PReLU(alpha_initializer='he_uniform')(bn6)
mp4 = MaxPool2D((2,2))(pr6)
cl7 = Conv2D(512,(3,3), padding='same',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(mp4)
bn7 = BatchNormalization()(cl7)
pr7 = PReLU(alpha_initializer='he_uniform')(bn7)
mp5 = MaxPool2D((2,2))(pr7)
flt = Flatten()(mp5)
#This layer predicts age
agelayer = Dense(128, activation='relu',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(flt)
agelayer = BatchNormalization()(agelayer)
agelayer = Dropout(0.6)(agelayer)
agelayer = Dense(1, activation='relu', name='age_output', kernel_initializer='he_uniform', dtype='float32')(agelayer)
#This layer predicts gender
glayer = Dense(128, activation='relu',kernel_regularizer=l2(0.001), kernel_initializer='he_uniform')(flt)
glayer = BatchNormalization()(glayer)
glayer = Dropout(0.5)(glayer)
glayer = Dense(1, activation='sigmoid', name='gender_output', kernel_initializer='he_uniform', dtype='float32')(glayer)
modelA = Model(inputs=inp, outputs=[glayer,agelayer])
model_folder = 'C:/Users/mm/OneDrive/Documents/Age estimation & gender classification/models'
if not os.path.exists(model_folder):
os.mkdir(model_folder)
#Callback to control learning rate during training. Reduces learning rate by 5% after 3 epochs of no improvement on validation loss
lr_callback = ReduceLROnPlateau(monitor='val_loss', factor=0.95, patience=3,min_lr=0.000005)
#Callback to stop training if after 100 epochs of no improvement it stops and restores the best weights
es_callback = EarlyStopping(monitor='val_loss', patience=100, restore_best_weights=True, min_delta=0.001)
#Compile Model A
modelA.compile(optimizer='Adam', loss={'gender_output': BinaryCrossentropy(), 'age_output': 'mse'}, metrics={'gender_output': 'accuracy', 'age_output':'mae'})
#Training Model A
history = modelA.fit(train_gen, epochs=100, validation_data=valid_gen, callbacks=[es_callback,lr_callback])
The error message:
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
Epoch 1/100
INFO:tensorflow:Error reported to Coordinator: logits and labels must have the same shape ((None, 1) vs (None, 2))
Traceback (most recent call last):
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\ops\nn_impl.py", line 130, in sigmoid_cross_entropy_with_logits
labels.get_shape().assert_is_compatible_with(logits.get_shape())
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\framework\tensor_shape.py", line 1161, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (None, 2) and (None, 1) are incompatible
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\training\coordinator.py", line 297, in stop_on_exception
yield
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\distribute\mirrored_run.py", line 346, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\autograph\impl\api.py", line 692, in wrapper
return converted_call(f, args, kwargs, options=options)
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\autograph\impl\api.py", line 382, in converted_call
return _call_unconverted(f, args, kwargs, options)
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\autograph\impl\api.py", line 463, in _call_unconverted
return f(*args, **kwargs)
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\keras\engine\training.py", line 835, in run_step
outputs = model.train_step(data)
show more (open the raw output data in a text editor) ...
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "C:\Users\mm\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\ops\nn_impl.py", line 132, in sigmoid_cross_entropy_with_logits
raise ValueError("logits and labels must have the same shape (%s vs %s)" %
ValueError: logits and labels must have the same shape ((None, 1) vs (None, 2))
Managed to resove this with a bit of research and trial and error. Main issues are:
The labels are being fed to the model as a tuple instead of being separated. When it is multiple output heads this is necessary:
def process_path(file):
file = file.numpy()
file_path = str(bytes.decode(file))
file = file_path.split("\\")[-1]
gender, age = get_label(file)
# Load data from file as a String
img = tf.io.read_file(file_path)
img = decode_img(img)
img = img / 255.0
return img, gender, age
NB: I made a modification to extracting the labels from the filename as it wasn't getting it right all the time:
file = file_path.split("\\")[-1]
Due to the change on 1, the map functions need an additional dtype for the other label, so it becomes:
train_gen = train_gen.map(lambda x: tf.py_function(process_path, [x], [tf.float32, tf.int32, tf.int32]), num_parallel_calls=AUTOTUNE)
valid_gen = valid_gen.map(lambda x: tf.py_function(process_path, [x], [tf.float32, tf.int32, tf.int32]), num_parallel_calls=AUTOTUNE)
Each label needs to be reshaped:
def _set_shapes(t1, t2, t3):
t1.set_shape((128,128,3))
t2.set_shape((1,))
t3.set_shape((1,))
t2 = tf.reshape(t2, [-1,1])
t3 = tf.reshape(t3, [-1,1])
return (t1,t2,t3)

BERT Sequence Classification output layer using TensorFlow

I'm very new to the Transformers library from huggingface. What I'm trying to do is make a regression of a value using BERT transformers. I have the input tokens ids and the attention masks correctly generated. Now, my problem is with the prediction output. I'm getting the logits for each token present in my input and not only one number representing my output. My configuration in the TFAutoModelForSequenceClassification object is for 1 label. What I'm doing wrong?
def bert_model():
config = BertConfig()
config.num_labels = 1
encoder = TFAutoModelForSequenceClassification.from_pretrained(BASE_MODEL, config=config)
input_ids = tf.keras.Input(shape=(300,), dtype=tf.int32, name='input_ids')
input_attention_mask = tf.keras.Input(shape=(300,), dtype=tf.int32, name='attention_mask')
output = encoder({'input_ids': input_ids, 'attention_mask': input_attention_mask})
# output = tf.keras.layers.Dense(1)
model = tf.keras.Model(inputs=[input_ids, input_attention_mask], outputs=output)
optimizer = tf.keras.optimizers.Adam(lr=1e-5)
model.compile(optimizer=optimizer,
loss=tf.losses.MeanSquaredError(),
metrics=[tf.metrics.RootMeanSquaredError()])
return model
The Summary of my model:
Model: "model_10"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
attention_mask (InputLayer) [(None, 300)] 0
__________________________________________________________________________________________________
input_ids (InputLayer) [(None, 300)] 0
__________________________________________________________________________________________________
tf_bert_for_sequence_classifica TFSequenceClassifier 109483009 attention_mask[0][0]
input_ids[0][0]
==================================================================================================
Total params: 109,483,009
Trainable params: 109,483,009
Non-trainable params: 0
__________________________________________________________________________________________________
The output from my prediction:
data_test = generate_data(df_test.iloc[[1]], labels=False)
test_pred = model.predict(data_test)
len(test_pred['logits'])
237
EDIT: I implement the Dense Layer after the network, but got the same vector of values. How to know the value of the vector in comparison with my final real labeled data.
output = encoder({'input_ids': input_ids, 'attention_mask': input_attention_mask})[0]
X = tf.keras.layers.BatchNormalization()(output)
X = tf.keras.layers.Dense(1, activation='linear')(X)
model = tf.keras.Model(inputs=[input_ids, input_attention_mask], outputs=X)
Some Hints for what I'm doing wrong?
I found the problem inside my generate_data function. The model is correct and it's working correctly. The problem was with the dataset.batch(BATCH_SIZE) function that I do not run again for the test data only for the training data.