Fine-tuning Hugging Face DistilBERT for multi-class classification on a custom dataset yields weird output shape on prediction - TensorFlow 2.0

I'm trying to fine-tune Hugging Face's implementation of DistilBERT for multi-class classification (100 classes) on a custom dataset, following the tutorial at https://huggingface.co/transformers/custom_datasets.html.
I'm fine-tuning in native TensorFlow, i.e. I use the following part of the tutorial for dataset creation:
import tensorflow as tf

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))
And this one for fine-tuning:
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
Everything seems to go fine with fine-tuning, but when I try to predict on the test dataset (2000 examples) with model.predict(test_dataset), the model seems to yield one prediction per token rather than one prediction per sequence...
That is, instead of getting an output of shape (1, 2000, 100), I get an output of shape (1, 1024000, 100), where 1024000 is the number of test examples (2000) * the sequence length (512).
Any hint on what's going on here?
(Sorry if this is naive, I'm very new to tensorflow).

I had exactly the same problem. I do not know why it happens, since the code looks right according to the tutorial.
What worked for me was to create numpy arrays out of train_encodings and pass them directly to the fit method instead of creating the Dataset:
import numpy as np

# The two values of the encodings dict are typically input_ids and attention_mask
x1 = np.array(list(dict(train_encodings).values()))[0]
x2 = np.array(list(dict(train_encodings).values()))[1]
model.fit([x1, x2], train_labels, epochs=20)
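For reference, the same array-based approach can be used at prediction time, which is where the weird shape showed up. This is only a sketch and assumes the tokenizer produced input_ids and attention_mask entries (the usual case):
import numpy as np

test_ids = np.array(test_encodings["input_ids"])
test_mask = np.array(test_encodings["attention_mask"])
# Should give one prediction per sequence rather than one per token
preds = model.predict([test_ids, test_mask])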


Training with Dataset API and numpy array yields completely different results

I have a CNN regression model, and the features come in a (2000, 3000, 1) shape, where 2000 is the total number of samples, each being a (3000, 1) 1D array. The batch size is 8, and 20% of the full dataset is used for validation.
However, zipping the features and labels into a tf.data.Dataset gives completely different scores from feeding the numpy arrays in directly.
The tf.data.Dataset code looks like:
# Load features and labels
features = np.array(features) # shape is (2000, 3000, 1)
labels = np.array(labels) # shape is (2000,)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=2000)
dataset = dataset.batch(8)
train_dataset = dataset.take(200)
val_dataset = dataset.skip(200)
# Training model
model.fit(train_dataset, validation_data=val_dataset,
          batch_size=8, epochs=1000)
The numpy code looks like:
# Load features and labels
features = np.array(features) # exactly the same as previous
labels = np.array(labels) # exactly the same as previous
# Training model
model.fit(x=features, y=labels, shuffle=True, validation_split=0.2,
          batch_size=8, epochs=1000)
Apart from this, the rest of the code is exactly the same, for example:
# Set global random seed
tf.random.set_seed(0)
np.random.seed(0)
# No preprocessing of feature at all
# Load model (exactly the same)
model = load_model()
# Compile model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.mean_absolute_error],
)
The former method via the tf.data.Dataset API yields a mean absolute error (MAE) around 1e-3 on both the training and validation sets, which looks quite suspicious, as the model doesn't have any dropout or regularization to prevent overfitting. On the other hand, feeding the numpy arrays in directly gives a training MAE around 0.1 and a validation MAE around 1.
The low MAE of the tf.data.Dataset method looks very suspicious, but I just couldn't figure out anything wrong with the code. I could also confirm that the number of training batches is 200 and the number of validation batches is 50, so I wasn't using the training set for validation.
I tried varying the global random seed and using different shuffle seeds, which didn't change the results much. Training was done on NVIDIA V100 GPUs, and I tried TensorFlow versions 2.9, 2.10, and 2.11, which didn't make much difference.
The problem lies in the default behaviour of the shuffle method of tf.data.Dataset, more specifically its reshuffle_each_iteration argument, which is True by default. This means that with the following code:
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=2000)
dataset = dataset.batch(8)
train_dataset = dataset.take(200)
val_dataset = dataset.skip(200)
model.fit(train_dataset, validation_data=val_dataset, batch_size=8, epochs=1000)
The dataset would actually be reshuffled after each epoch, even though it might not look like it. As a result, validation data leaks into the training set (in fact there is no real distinction between the two sets, since the order is reshuffled every epoch).
So make sure to set reshuffle_each_iteration to False if you want to shuffle the dataset and then do a train-val split, as in the sketch below.
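A minimal sketch of the corrected pipeline, assuming the same features and labels arrays as above:
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
# Shuffle once and keep the order fixed across epochs,
# so that take/skip always produce the same train/val split
dataset = dataset.shuffle(buffer_size=2000, reshuffle_each_iteration=False)
dataset = dataset.batch(8)

train_dataset = dataset.take(200)  # first 200 batches for training
val_dataset = dataset.skip(200)    # remaining 50 batches for validation

model.fit(train_dataset, validation_data=val_dataset, epochs=1000)
If you still want per-epoch shuffling during training, apply a second shuffle to train_dataset only, after the split.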
UPDATE: TensorFlow has confirmed this issue, and a warning will be added to future docs.
PS: It's a hard lesson for me, as I have been using the model for analysing the results for several months (as a graduating MPhil student).

Training seq2seq model on Google Colab TPU with big dataset - Keras

I'm trying to train a sequence to sequence model for machine translation using Keras on Google Colab TPU.
I have a dataset which I can load in memory, but I have to preprocess it before feeding it to the model. In particular, I need to convert the target words to one-hot vectors, and with many examples I can't keep the entire conversion in memory, so I need to generate batches of data.
I'm using this function as a batch generator:
def generate_batch_bert(X_ids, X_masks, y, batch_size=1024):
    '''Generate a batch of data'''
    while True:
        for j in range(0, len(X_ids), batch_size):
            # batch of encoder and decoder data
            encoder_input_data_ids = X_ids[j:j+batch_size]
            encoder_input_data_masks = X_masks[j:j+batch_size]
            y_decoder = y[j:j+batch_size]
            # decoder target and input for teacher forcing
            decoder_input_data = y_decoder[:,:-1]
            decoder_target_seq = y_decoder[:,1:]
            # batch of decoder target data
            decoder_target_data = to_categorical(decoder_target_seq, vocab_size_fr)
            # keep only batches with the right number of instances for training on TPU
            if encoder_input_data_ids.shape[0] == batch_size:
                yield ([encoder_input_data_ids, encoder_input_data_masks, decoder_input_data], decoder_target_data)
The problem is that whenever I try to run the fit function as follows:
model.fit(x=generate_batch_bert(X_train_ids, X_train_masks, y_train, batch_size=batch_size),
          steps_per_epoch=train_samples//batch_size,
          epochs=epochs,
          callbacks=callbacks,
          validation_data=generate_batch_bert(X_val_ids, X_val_masks, y_val, batch_size=batch_size),
          validation_steps=val_samples//batch_size)
I get the following error:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:445 make_tensor_proto
raise ValueError("None values not supported.")
ValueError: None values not supported.
Not sure what's wrong and how I can solve this problem.
EDIT
I tried loading a smaller amount of data into memory, so that the conversion of the target words to one-hot encoding doesn't crash the kernel, and it actually works. So there is obviously something wrong with how I generate batches.
It's hard to tell what's wrong since you don't provide your model
definition nor any sample data. However, I'm fairly certain that you're
running into the same
TensorFlow bug
that I recently got bitten by.
The workaround is to use the tensorflow.data API which works much
better with TPUs. Like this:
from tensorflow.data import Dataset
import tensorflow as tf

def map_fn(X_id, X_mask, y):
    decoder_target_data = tf.one_hot(y[1:], vocab_size_fr)
    return (X_id, X_mask, y[:-1]), decoder_target_data

...

X_ids = Dataset.from_tensor_slices(X_ids)
X_masks = Dataset.from_tensor_slices(X_masks)
y = Dataset.from_tensor_slices(y)
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024)

model.fit(x=ds, ...)
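Since the original generator only yields full batches to keep the TPU happy, you presumably also want drop_remainder=True on the batch call (my assumption, mirroring the shape check in the generator):
ds = Dataset.zip((X_ids, X_masks, y)).map(map_fn).batch(1024, drop_remainder=True)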

Problem with shapes of experimental Tensorflow dataset

I am trying to store numpy arrays in a TensorFlow dataset. The model fits correctly when using the numpy arrays directly as train and test data, but not when I store those arrays in a single TensorFlow dataset. The problem is with the dimensions of the dataset; something is wrong even though the shapes seem OK at first sight.
After trying multiple things to reshape my TensorFlow dataset, I am still unable to get it working. My code is the following:
train_x.shape
Out[54]: (7200, 40)
train_y.shape
Out[55]: (7200,)
dataset = tf.data.Dataset.from_tensor_slices((x,y))
print(dataset)
Out[56]: <TensorSliceDataset shapes: ((40,), ()), types: (tf.int32, tf.int32)>
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS, batch_size=256)
sparse_softmax_cross_entropy_with_logits
logits.get_shape()))
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (40, 1351)).
I have seen this answer, but I am sure it doesn't apply here; I must use sparse_categorical_crossentropy. I am basing my code on this example, where I want to store the train and test data in a TensorFlow dataset. I also want to store the arrays in a dataset, as I will have to use it later.
You can't use batch_size with model.fit() when using a tf.data.Dataset. Instead use tf.data.Dataset.batch(). You'll have to change your code as follows for it to work.
import numpy as np
import tensorflow as tf
# Some toy data
train_x = np.random.normal(size=(7200, 40))
train_y = np.random.choice([0,1,2], size=(7200))
dataset = tf.data.Dataset.from_tensor_slices((train_x,train_y))
dataset = dataset.batch(256)
#### - Define your model here - ####
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=EPOCHS)
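For completeness, here is a purely illustrative stand-in for the "Define your model here" placeholder that fits the toy data above (not the asker's actual model):
EPOCHS = 5
optimizer = tf.keras.optimizers.Adam()
# Simple classifier: 40 input features -> 3 toy classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(40,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
With something like this in place of the placeholder, the compile and fit calls run without the shape mismatch.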

Shouldn't same neural network weights produce same results?

So I am working with different deep learning frameworks as part of my research and have observed something weird (at least I cannot explain the cause of it).
I trained a fairly simple MLP model (on the MNIST dataset) in TensorFlow, extracted the trained weights, created the same model architecture in PyTorch, and applied the trained weights to the PyTorch model. My expectation was to get the same test accuracy from both the TensorFlow and PyTorch models, but this isn't the case; I get different results.
So my question is: if a model is trained to some optimal value, shouldn't the trained weights produce the same results every time testing is done on the same dataset (regardless of the framework used)?
PyTorch Model:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 24)
        self.fc2 = nn.Linear(24, 10)

    def forward(self, x: Tensor) -> Tensor:
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
Tensorflow Model:
def build_model() -> tf.keras.Model:
    # Build model layers
    model = models.Sequential()
    # Flatten layer
    model.add(layers.Flatten(input_shape=(28,28)))
    # Fully connected layers
    model.add(layers.Dense(24, activation='relu'))
    model.add(layers.Dense(10))
    # compile the model
    model.compile(
        optimizer='sgd',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )
    # return newly built model
    return model
To extract the weights from the TensorFlow model and apply them to the PyTorch model, I use the following functions:
Extract Weights:
def get_weights(model):
    # fetch latest weights
    weights = model.get_weights()
    # transpose weights
    t_weights = []
    for w in weights:
        t_weights.append(np.transpose(w))
    # return
    return t_weights
Apply Weights:
def set_weights(model, weights):
    """Set model weights from a list of NumPy ndarrays."""
    state_dict = OrderedDict(
        {k: torch.Tensor(v) for k, v in zip(model.state_dict().keys(), weights)}
    )
    model.load_state_dict(state_dict, strict=True)
Providing the solution in the answer section for the benefit of the community. From the comments:
If you are using the same weights in the same manner, then the results should be the same, although floating-point rounding error should also be accounted for. It also doesn't matter whether the model is trained at all: you can think of your model architecture as a chain of matrix multiplications with element-wise nonlinearities in between. How big is the difference? Are you comparing model outputs, or metrics computed over the dataset? As a suggestion, initialize the model with some random values in Keras, do a forward pass for a single batch, and compare the outputs, as in the sketch below. (Paraphrased from jdehesa and Taras Sereda.)
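A minimal sketch of that sanity check, reusing build_model, Net, get_weights, and set_weights from the question; only the comparison itself is new, and the expectation of float32-level differences is an assumption:
import numpy as np
import torch

tf_model = build_model()                      # untrained Keras model, random init
pt_model = Net()
set_weights(pt_model, get_weights(tf_model))  # copy the Keras weights across

# Push the same random batch through both frameworks
batch = np.random.rand(8, 28, 28).astype(np.float32)
tf_out = tf_model(batch, training=False).numpy()
with torch.no_grad():
    pt_out = pt_model(torch.from_numpy(batch)).numpy()

# Differences should be on the order of float32 rounding error
print(np.abs(tf_out - pt_out).max())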

Making predictions with a TensorFlow model

I followed the given MNIST tutorials and was able to train a model and evaluate its accuracy. However, the tutorials don't show how to make predictions given a model. I'm not interested in accuracy; I just want to use the model to predict a new example and, in the output, see all the results (labels), each with its assigned score (sorted or not).
In the "Deep MNIST for Experts" example, see this line:
We can now implement our regression model. It only takes one line! We
multiply the vectorized input images x by the weight matrix W, add the
bias b, and compute the softmax probabilities that are assigned to
each class.
y = tf.nn.softmax(tf.matmul(x,W) + b)
Just pull on node y and you'll have what you want.
feed_dict = {x: [your_image]}
classification = sess.run(y, feed_dict)
print(classification)
This applies to just about any model you create - you'll have computed the prediction probabilities as one of the last steps before computing the loss.
As @dga suggested, you need to run your new instance of data through your already trained model.
Here is an example:
Assume you went through the first tutorial and calculated the accuracy of your model (the model is this: y = tf.nn.softmax(tf.matmul(x, W) + b)). Now you grab your model and apply a new data point to it. In the following code I compute the prediction vector, take the position of its maximum value, show the image, and print that predicted position.
from matplotlib import pyplot as plt
from random import randint
num = randint(0, mnist.test.images.shape[0])
img = mnist.test.images[num]
classification = sess.run(tf.argmax(y, 1), feed_dict={x: [img]})
plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary)
plt.show()
print('NN predicted', classification[0])
2.0 Compatible Answer: Suppose you have built a Keras Model as shown below:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Then train and evaluate the model using the code below:
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
After that, if you want to predict the class of a particular image, you can do it using the code below:
predictions_single = model.predict(img)
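Note that model.predict expects a batch, so if img is a single 28x28 array from test_images you will typically need to add a batch dimension first. A small sketch:
import numpy as np

img_batch = np.expand_dims(img, 0)             # shape (1, 28, 28)
predictions_single = model.predict(img_batch)  # shape (1, 10)
print(np.argmax(predictions_single[0]))        # predicted class index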
If you want to predict the classes of a set of images, you can use the code below:
predictions = model.predict(new_images)
where new_images is an array of images.
For more information, refer to this TensorFlow tutorial.
The question is specifically about the Google MNIST tutorial, which defines a predictor but doesn't apply it. Using guidance from Jonathan Hui's TensorFlow Estimator blog post, here is code which exactly fits the Google tutorial and does predictions:
from matplotlib import pyplot as plt
import numpy as np

images = mnist.test.images[0:10]

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": images},
    num_epochs=1,
    shuffle=False)

# predict() returns a lazy generator of per-example dicts
predictions = mnist_classifier.predict(input_fn=predict_input_fn)

for image, p in zip(images, predictions):
    print(np.argmax(p['probabilities']))
    plt.imshow(image.reshape(28, 28), cmap=plt.cm.binary)
    plt.show()