How to log validation loss and accuracy using tf-slim - tensorflow

Is there any way that I can log the validation loss and accuracy to TensorBoard when using tf-slim? When I was using Keras, the following code could do this for me:
model.fit_generator(generator=train_gen(), validation_data=valid_gen(),...)
Then the model will evaluate the validation loss and accuracy after each epoch, which is very convenient. But how can I achieve this using tf-slim? The following uses primitive TensorFlow, which is not what I want:
with tf.Session() as sess:
    for step in range(100000):
        sess.run(train_op, feed_dict={X: X_train, y: y_train})
        # once per epoch
        if step % (batch_size * batches_per_epoch) == 0:
            print(sess.run(train_op, feed_dict={X: X_train, y: y_train}))
Right now, the steps to train a model using tf-slim are:
tf.contrib.slim.learning.train(
    train_op=train_op,
    logdir="logs",
    number_of_steps=10000,
    log_every_n_steps=10,
    save_summaries_secs=1
)
So how to evaluate validation loss and accuracy after each epoch with the above slim training procedure?
Thanks in advance!

The matter is still being discussed on the TensorFlow repo (issue #5987).
The framework lets you easily create an evaluation script to run after, or in parallel with, your training (solution 1 below), but some people are pushing to be able to implement the "classic cycle of batch training + validation" (solution 2).
1. Use slim.evaluation in another script
TF-Slim provides evaluation methods, e.g. slim.evaluation.evaluation_loop(), that you can use in a separate script (which can run in parallel with your training) to periodically load the latest checkpoint of your model and perform evaluation. The TF-Slim page contains a good example of how such a script may look: example.
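For illustration, here is a minimal sketch of what such a standalone evaluation script could look like (load_validation_batch and my_model are hypothetical stand-ins for your own input pipeline and model function):
import tensorflow as tf
slim = tf.contrib.slim

images, labels = load_validation_batch()  # hypothetical input pipeline
predictions = my_model(images)            # hypothetical model function

# streaming metrics are accumulated over num_evals batches
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'eval/accuracy': slim.metrics.streaming_accuracy(
        tf.argmax(predictions, 1), tf.argmax(labels, 1)),
})

# attach summaries so the metric values show up in TensorBoard
for name, value in names_to_values.items():
    tf.summary.scalar(name, value)

slim.evaluation.evaluation_loop(
    master='',
    checkpoint_dir='logs',   # where slim.learning.train writes checkpoints
    logdir='logs/eval',      # a separate dir shows up as a second run in TensorBoard
    num_evals=100,           # number of validation batches per evaluation pass
    eval_op=list(names_to_updates.values()),
    eval_interval_secs=60)   # re-evaluate the newest checkpoint every minute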
2. Provide a custom train_step_fn to slim.learning.train()
A patchy solution the initiator of the discussion came up with makes use of a custom training step function you can provide to slim.learning.train():
"""
Snippet from code by Kevin Malakoff #kmalakoff
https://github.com/tensorflow/tensorflow/issues/5987#issue-192626454
"""
# ...
accuracy_validation = slim.metrics.accuracy(
tf.argmax(predictions_validation, 1),
tf.argmax(labels_validation, 1)) # ... or whatever metrics needed
def train_step_fn(session, *args, **kwargs):
total_loss, should_stop = train_step(session, *args, **kwargs)
if train_step_fn.step % FLAGS.validation_check == 0:
accuracy = session.run(train_step_fn.accuracy_validation)
print('Step %s - Loss: %.2f Accuracy: %.2f%%' % (str(train_step_fn.step).rjust(6, '0'), total_loss, accuracy * 100))
# ...
train_step_fn.step += 1
return [total_loss, should_stop]
train_step_fn.step = 0
train_step_fn.accuracy_validation = accuracy_validation
slim.learning.train(
train_op,
FLAGS.logs_dir,
train_step_fn=train_step_fn,
graph=graph,
number_of_steps=FLAGS.max_steps
)
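Note that the snippet above only prints the validation accuracy. To also get it into TensorBoard, one option (a sketch, not part of the original snippet) is to write a Summary proto from inside train_step_fn:
# sketch: write the validation accuracy as a TensorBoard summary
# instead of only printing it
validation_writer = tf.summary.FileWriter(FLAGS.logs_dir + '/validation')

# ... inside train_step_fn, right after computing `accuracy`:
summary = tf.Summary(value=[tf.Summary.Value(tag='validation/accuracy',
                                             simple_value=float(accuracy))])
validation_writer.add_summary(summary, train_step_fn.step)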

Related

Why is tf.GradientTape.jacobian giving None?

I'm using the IRIS dataset, and am following this official tutorial: Custom training: walkthrough
In the Training loop, I am trying to gather the model outputs and weights at every epoch where epoch % 50 == 0 into the lists m_outputs_mod50 and gather_weights respectively:
# Keep results for plotting
train_loss_results = []
train_accuracy_results = []
m_outputs_mod50 = []
gather_weights = []

num_epochs = 201

for epoch in range(num_epochs):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
    # gather_kernel(model)

    # Training loop - using batches of 32
    for x, y in train_dataset:
        # Optimize the model
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Track progress
        epoch_loss_avg.update_state(loss_value)  # Add current batch loss
        # Compare predicted label to actual label
        # training=True is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        epoch_accuracy.update_state(y, model(x, training=True))

    # End epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())
    # pred_hist.append(model.predict(x))

    if epoch % 50 == 0:
        m_outputs_mod50.append(model(x))
        gather_weights.append(model.weights)
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                                    epoch_loss_avg.result(),
                                                                    epoch_accuracy.result()))
Running the above and then trying to get the jacobian at epoch 0 (using m_outputs_mod50[0] and gather_weights[0]) with
with tf.GradientTape() as tape:
    print(tape.jacobian(target=m_outputs_mod50[0], sources=gather_weights[0]))
I get a list of None as the output.
Why?
You need to understand how the GradientTape operates. For that, you can follow the guide: Introduction to gradients and automatic differentiation. Here is an excerpt:
TensorFlow provides the tf.GradientTape API for automatic
differentiation; that is, computing the gradient of a computation with
respect to some inputs, usually tf.Variables. TensorFlow "records"
relevant operations executed inside the context of a tf.GradientTape
onto a "tape". TensorFlow then uses that tape to compute the gradients
of a "recorded" computation using reverse mode differentiation.
To compute a gradient (or a jacobian), the tape needs to record the operations that are executed in its context. Then, outside its context, once the forward pass has been executed, it's possible to use the tape to compute the gradient/jacobian.
You could use something like this:
if epoch % 50 == 0:
    with tf.GradientTape() as tape:
        out = model(x)
    jacobian = tape.jacobian(out, model.weights)
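In other words, the Nones appear because m_outputs_mod50[0] was produced outside any tape, so the freshly created tape has recorded nothing. If you also want the training gradients at the same checkpointed epochs, a persistent tape lets you call both gradient() and jacobian() on one recording (a sketch; loss_fn stands in for the walkthrough's loss function):
if epoch % 50 == 0:
    with tf.GradientTape(persistent=True) as tape:
        out = model(x)                # forward pass recorded on the tape
        loss_value = loss_fn(y, out)  # hypothetical loss function
    grads = tape.gradient(loss_value, model.trainable_variables)
    jacobian = tape.jacobian(out, model.weights)
    del tape  # release the resources held by a persistent tape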

Code worked fine one week ago, but I keep getting an error since yesterday: fine-tuning BERT model training via PyTorch on Colab

I am new to BERT. Two weeks ago I successfully fine-tuned a BERT model on an NLP classification task, though the outcome was not brilliant. Yesterday, however, when I tried to run the same code and data, an AttributeError kept appearing, which says: 'str' object has no attribute 'dim'. Please note that everything is on Colab and via PyTorch Transformers.
What should I do to fix it?
Here is one thing I tried when installing transformers, but it turned out it did not work:
instead of
!pip install transformers
I tried to use a previous transformers version:
!pip install --target lib --upgrade transformers==3.5.0
Any feedback will be greatly appreciated!
Please see the code and the error message below:
Code:
train definition
# function to train the model
def train():
    model.train()

    total_loss, total_accuracy = 0, 0

    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):

        # progress update after every 200 batches
        if step % 200 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

        # push the batch to gpu
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        # clear previously calculated gradients
        model.zero_grad()

        # get model predictions for the current batch
        preds = model(sent_id, mask)

        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)

        # add on to the total loss
        total_loss = total_loss + loss.item()

        # backward pass to calculate the gradients
        loss.backward()

        # clip the gradients to 1.0. It helps in preventing the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update parameters
        optimizer.step()

        # update learning rate schedule
        # scheduler.step()

        # model predictions are stored on GPU. So, push them to CPU
        preds = preds.detach().cpu().numpy()

        # append the model predictions
        total_preds.append(preds)

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # predictions are in the form of (no. of batches, size of batch, no. of classes).
    # reshape the predictions in form of (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)

    # return the loss and predictions
    return avg_loss, total_preds
training process
# set initial loss to infinite
best_valid_loss = float('inf')

# empty lists to store training and validation loss of each epoch
train_losses = []
valid_losses = []

# for each epoch
for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    # train model
    train_loss, _ = train()

    # evaluate model
    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    # append training and validation loss
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')
Error message:
Epoch 1 / 10
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-41-c5138ddf6b25> in <module>()
12
13 #train model
---> 14 train_loss, _ = train()
15
16 #evaluate model
5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
1686 if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
1687 return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
-> 1688 if input.dim() == 2 and bias is not None:
1689 # fused op is marginally faster
1690 ret = torch.addmm(bias, input, weight.t())
AttributeError: 'str' object has no attribute 'dim'
As far as I remember, there was an old transformers version preinstalled on Colab, something like 2.11.0. Try:
!pip install transformers~=2.11.0
Change the version number until it works.
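As a quick sanity check, you can print the version the Colab runtime actually ended up with (restart the runtime after reinstalling):
import transformers
print(transformers.__version__)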

Keras evaluate the validation data before the epoch ends

I want to train my model with Keras. I'm using a huge dataset where one training epoch has more than 30000 steps. My problem is that I don't want to wait a whole epoch before checking the model's improvement on the validation dataset. Is there any way to make Keras evaluate the validation data every 1000 training steps? I think one option would be to use a callback, but is there any built-in solution in Keras?
if train:
    log('Start training')
    history = model.fit(train_dataset,
                        steps_per_epoch=train_steps,
                        epochs=50,
                        validation_data=val_dataset,
                        validation_steps=val_steps,
                        callbacks=[
                            keras.callbacks.EarlyStopping(
                                monitor='loss',
                                patience=10,
                                restore_best_weights=True,
                            ),
                            keras.callbacks.ModelCheckpoint(
                                filepath='model.h5',
                                monitor='val_loss',
                                save_best_only=True,
                                save_weights_only=True,
                            ),
                            keras.callbacks.ReduceLROnPlateau(
                                monitor='val_loss',
                                factor=0.5,
                                patience=3,
                                min_lr=0.001,
                            ),
                        ])
With the built-in callbacks, you cannot do that. What you need is to implement a custom callback.
import datetime
import tensorflow as tf

class MyCustomCallback(tf.keras.callbacks.Callback):

    def on_train_batch_begin(self, batch, logs=None):
        print('Training: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_train_batch_end(self, batch, logs=None):
        print('Training: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))

    def on_test_batch_begin(self, batch, logs=None):
        print('Evaluating: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_test_batch_end(self, batch, logs=None):
        print('Evaluating: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))
This is taken from the TensorFlow documentation.
You can override the on_train_batch_end() function and, since the batch parameter is an integer, check batch % 1000 == 0 (every 1000 steps, as in your case), then run self.model.predict(val_data) etc. to suit your needs.
Please check my answer here: How to get other metrics in Tensorflow 2.0 (not only accuracy)? to get a good overview of how to override a custom callback function. Please note that in your case it is on_train_batch_end(), not on_epoch_end(), that is important.
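As a sketch of that idea (the class name and arguments are made up for illustration), a callback that evaluates on the validation set every N training batches could look like this:
import tensorflow as tf

class ValidateEveryNBatches(tf.keras.callbacks.Callback):
    """Hypothetical callback: evaluates on a validation dataset
    every `every_n` training batches."""

    def __init__(self, val_dataset, val_steps, every_n=1000):
        super().__init__()
        self.val_dataset = val_dataset
        self.val_steps = val_steps
        self.every_n = every_n

    def on_train_batch_end(self, batch, logs=None):
        if batch > 0 and batch % self.every_n == 0:
            results = self.model.evaluate(self.val_dataset,
                                          steps=self.val_steps,
                                          verbose=0)
            if not isinstance(results, list):
                results = [results]  # evaluate returns a scalar for a single metric
            print('\nBatch {}: {}'.format(
                batch, dict(zip(self.model.metrics_names, results))))
It would then be passed in the callbacks list of model.fit next to the existing ones, e.g. callbacks=[ValidateEveryNBatches(val_dataset, val_steps, every_n=1000), ...].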

can't reproduce model.fit with GradientTape

I've been trying to investigate the reason (e.g. by checking weights, gradients and activations during training) why SGD with a 0.001 learning rate worked in training while Adam fails to do so. (Please see my previous post: Why is my loss (binary cross entropy) converging on ~0.6? (Task: Natural Language Inference))
Note: I'm using the same model from my previous post here as well.
Using tf.keras, I trained the neural network using model.fit():
model.compile(optimizer=SGD(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x=ds,
          epochs=80,
          validation_data=ds_val)
This resulted in the epoch loss graphed below: within the 1st epoch, it reached a train loss of 0.46, ultimately resulting in a train_loss of 0.1241 and val_loss of 0.2849.
I would have used tf.keras.callbacks.TensorBoard(histogram_freq=1) to train the network with both SGD(0.001) and Adam to investigate, but it's throwing an InvalidArgumentError on Variable:0, something I can't decipher. So I tried to write a custom training loop using GradientTape and plotting the values.
Using tf.GradientTape(), I tried to reproduce the results using the exact same model and dataset; however, the epoch loss is decreasing incredibly slowly, reaching a train loss of 0.676 after 15 epochs (see graph below). Is there something wrong with my implementation? (code below)
@tf.function
def compute_grads(train_batch: Dict[str, tf.Tensor], target_batch: tf.Tensor,
                  loss_fn: Loss, model: tf.keras.Model):
    with tf.GradientTape(persistent=False) as tape:
        # forward pass
        outputs = model(train_batch)
        # calculate loss
        loss = loss_fn(y_true=target_batch, y_pred=outputs)
    # calculate gradients for each param
    grads = tape.gradient(loss, model.trainable_variables)
    return grads, loss

BATCH_SIZE = 8
EPOCHS = 15

bce = BinaryCrossentropy()
optimizer = SGD(learning_rate=0.001)

for epoch in tqdm(range(EPOCHS), desc='epoch'):
    # - accumulators
    epoch_loss = 0.0

    for (i, (train_batch, target_dict)) in tqdm(enumerate(ds_train.shuffle(1024).batch(BATCH_SIZE)), desc='step'):
        (grads, loss) = compute_grads(train_batch, target_dict['target'], bce, model)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        epoch_loss += loss

    avg_epoch_loss = epoch_loss / (i + 1)
    tensorboard_scalar(writer, name='epoch_loss', data=avg_epoch_loss, step=epoch)  # custom helper function
    print("Epoch {}: epoch_loss = {}".format(epoch, avg_epoch_loss))
Thanks in advance!
Check whether you have shuffled your dataset: the problem may come from the shuffling done with the tf.data.Dataset method, which only shuffles the dataset one buffer at a time. Keras' Model.fit may have yielded better results because it probably adds another shuffle.
Adding a shuffle with numpy.random.shuffle may improve the training performance. From this reference.
An example of applying it when generating the dataset is:
import numpy as np
from tensorflow import data

numpy_data = np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1), index_data.reshape(-1, 1)])
np.random.shuffle(numpy_data)
indexes = np.array(numpy_data[:, :2], dtype=np.uint32)
labels = np.array(numpy_data[:, 2].reshape(-1, 1), dtype=np.float32)

train_ds = data.Dataset.from_tensor_slices(
    (indexes, labels)
).shuffle(100000).batch(batch_size, drop_remainder=True)
If this does not work, you may need to use Dataset.repeat(epochs_number) and .shuffle(..., reshuffle_each_iteration=True):
train_ds = data.Dataset.from_tensor_slices(
    (np.hstack([index_rows.reshape(-1, 1), index_cols.reshape(-1, 1)]), index_data)
).shuffle(100000, reshuffle_each_iteration=True
).batch(batch_size, drop_remainder=True
).repeat(epochs_number)

for ix, (examples, labels) in train_ds.enumerate():
    train_step(examples, labels)
    current_epoch = ix // (len(index_data) // batch_size)
This workaround is neither beautiful nor natural; for the moment you can use it to shuffle each epoch. It's a known issue that will be fixed; in the future you will be able to use for epoch in range(epochs_number) instead of .repeat().
The solution provided here may also help a lot. You might want to check it out.
If this is not the case, you may want to speed up the TF 2.0 GradientTape. This may be the solution:
TensorFlow 2.0 introduces the concept of functions, which translate eager code into graph code.
The usage is pretty straightforward. The only change needed is that all relevant functions (like compute_loss and apply_gradients) have to be annotated with @tf.function.
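Applied to your loop, this would mean moving the weight update into the traced function as well, roughly like this (a sketch reusing model, bce and optimizer from your code):
@tf.function  # traced into a graph on the first call, then reused
def train_step(train_batch, target_batch):
    with tf.GradientTape() as tape:
        outputs = model(train_batch)
        loss = bce(y_true=target_batch, y_pred=outputs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss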

A Tensorflow training agnostic to Eager and Graph modes

I spend some of my time coding novel (I wish) RNN cells in Tensorflow.
To prototype, I use eager mode (easier to debug).
In order to train, I migrate the code to a graph (runs faster).
I am looking for wrapper code/an example that can run a forward pass and training in a way that is agnostic to the mode I run it in, eager or graph, as much as possible. I have in mind a set of functions/classes into which a particular neural network/optimizer/dataset can be inserted, such that this set of functions/classes could run in both modes with minimal changes between the two. In addition, it would of course be good for it to be compatible with many types of NN/optimizer/data instances.
I am quite sure that many had this idea.
I wonder if something like this is feasible given the current eager/graph integration in TF.
Yes, I have been wondering the same thing. In the TensorFlow documentation you can see:
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.
But this is hard to achieve, mostly because working with graphs means dealing with placeholders, which cannot be used in eager mode. I tried to get rid of placeholders by using object-oriented layers and the Dataset API. This is the closest I could get to fully compatible code:
import tensorflow as tf

m = 128  # num_examples
n = 5    # num_features
epochs = 2
batch_size = 32
steps_per_epoch = m // 32

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([m, n], dtype=tf.float32),
     tf.random_uniform([m, 1], dtype=tf.float32)))
dataset = dataset.repeat(epochs)
dataset = dataset.batch(batch_size)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_dim=n),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])
def train_eagerly(model, dataset):
    optimizer = tf.train.AdamOptimizer()
    iterator = dataset.make_one_shot_iterator()
    print('Training eagerly...')

    for epoch in range(epochs):
        print('Epoch', epoch)
        progbar = tf.keras.utils.Progbar(target=steps_per_epoch, stateful_metrics='loss')
        for step in range(steps_per_epoch):
            with tf.GradientTape() as tape:
                features, labels = iterator.get_next()
                predictions = model(features, training=True)
                loss_value = tf.losses.mean_squared_error(labels, predictions)
            grads = tape.gradient(loss_value, model.variables)
            optimizer.apply_gradients(zip(grads, model.variables))
            progbar.add(1, values=[('loss', loss_value.numpy())])
def train_graph(model, dataset):
    optimizer = tf.train.AdamOptimizer()
    iterator = dataset.make_initializable_iterator()
    print('Training graph...')

    with tf.Session() as sess:
        sess.run(iterator.initializer)
        sess.run(tf.global_variables_initializer())

        for epoch in range(epochs):
            print('Epoch', epoch)
            progbar = tf.keras.utils.Progbar(target=steps_per_epoch, stateful_metrics='loss')
            for step in range(steps_per_epoch):
                with tf.GradientTape() as tape:
                    features, labels = sess.run(iterator.get_next())
                    predictions = model(features, training=True)
                    loss_value = tf.losses.mean_squared_error(labels, predictions)
                grads = tape.gradient(loss_value, model.variables)
                optimizer.apply_gradients(zip(grads, model.variables))
                progbar.add(1, values=[('loss', loss_value.eval())])
As you can see, the main difference is that I use a one_shot_iterator during Eager training (of course, during graph training, I have to run operations within a session).
I tried to do the same using optimizer.minimize instead of applying the gradients myself, but I could not come up with a code that worked both for eager and graph modes.
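If I remember the TF 1.x API correctly (treat this as an assumption), the reason is that minimize() expects different arguments in the two modes:
# graph mode: minimize() takes a loss Tensor and returns a train op
# to be run inside a session
train_op = optimizer.minimize(loss_value)

# eager mode: minimize() instead expects a zero-argument callable that
# recomputes the loss, so the same call site cannot be reused as-is
optimizer.minimize(lambda: tf.losses.mean_squared_error(
    labels, model(features, training=True)))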
Also, I'm sure this becomes much harder to do with not so simple models, like the one you are working with.