I have tried to adapt the instructions in this documentation to use minibatches for a training a GPR model, but nothing I have tried works. I cannot supply the batch iterator to the training_loss_closure method or use a batch iterator for model's data attribute. Is there a way to use minibatches with a non-variational model, like a GPR or SGPR, in gpflow?
You can construct the data tuple to be two tf.Variable objects (if you want to be able to have different-length minibatches, you can pass a shape=None or shape=(None, dim) argument). Something like
X = tf.Variable(np.zeros(0, input_dim), shape=tf.TensorShape(None, input_dim), dtype=gpflow.default_float())
Y = tf.Variable(np.zeros(0, output_dim), shape=tf.TensorShape(None, output_dim), dtype=gpflow.default_float())
model = gpflow.models.GPR((X, Y), kernel)
Then you can write a loss function that takes in the current batch, assigns it to the variables, and then returns the model loss, along the lines of
#tf.function
def loss(data_batch):
model.data[0].assign(data_batch[0])
model.data[1].assign(data_batch[1])
return model.training_loss()
Note: While this is numerically doable, for the non-SVGP models this might not give you the correct answer (the gradient you compute from a batch might not be an unbiased estimate of the full-batch gradient).
Related
I'm porting a bunch of code from tf.estimator.Estimator API to tf.keras using tf.data.Datasets and I'm hoping to stay as close to the provided compile/fit as possible. I'm being frustrated by compile's loss and metrics args.
Essentially, I'd like to use a loss function which uses multiple outputs and labels in a non-additive way, i.e. I want to provide
def custom_loss(all_labels, model_outputs):
"""
Args:
all_labels: all labels in the dataset, as a single tensor, tuple or dict
model_outputs: all outputs of model as a single tensor, tuple or dict
Returns:
single loss tensor to be averaged.
""""
...
I can't provide this to compile because as far as I'm aware it only supports weighted sums of per-output/label losses, and makes assumptions about the shape of each label based on the the corresponding model output. I can't create it separately and use model.add_loss because I never have explicit access to a labels tensor if I want to let model.fit handle dataset iteration. I've considered flattening/concatenating all outputs and labels together, but then I can't monitor multiple metrics.
I can write my own training loop using model.train_on_batch, but that forces me to replicate behaviour already implemented in fit such as dataset iteration, callbacks, validation, distribution strategies etc.
As an example, I'd like to replicate the following estimator.
def model_fn(features, labels, mode):
outputs = get_outputs(features) # dict
loss = custom_loss(labels, outputs)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
eval_metrics_op = {
'a_mean': tf.metrics.mean(outputs['a'])
}
return tf.estimator.EstimatorSpec(
loss=loss, train_op=train_op, mode=mode, eval_metric_ops=eval_metric_ops)
estimator = tf.estimator.Estimator(model_fn=model_fn)
estimator.train(dataset_fn)
Is it possible to know the weight matrix of a fully trained Neural Network with multiple hidden layers. More specifically, Can we check and store these values for every training iteration.
The tf.train.Saver class provides methods to save and restore models. The tf.saved_model.simple_save function is an easy way to build a saved model suitable for serving.
See Official Documentation Here.
On each iteration you are passing a train_op to sess.run asking it to compute that right? Something like this:
sess.run([train_op], feed_dict={...})
You could also ask it to return other values, such as the cost and accuracy tensors using something like this:
_, result_cost, result_accuracy = sess.run([train_op, cost, accuracy], feed_dict={...})
If that all makes sense, then accessing the weight matrix is no more complicated. You just need a reference to the weight matrix tensor (keep it around when you create it or look up the tensor by name):
weight_matrix, _ = sess.run([weight_tensor, train_op], feed_dict={...})
Notice that you can request the value of any tensor (variable, or operation) along with your training. You can also just call sess.run and ask for that particular value:
weight_matrix = sess.run([weight_tensor])
I'm trying to add class weights as a hyperparameter for my model, but to calculate weight I need to read input data, this happens inside input_fn which then passed to estimator.fit(). An output of input_fn are only features, labels which should have same shape num_examples * num_features. My questions - is there any way to propagate data from input_fn to model_fn's hyperparameter map? Or as alternative - maybe there is a wrapper for input_fn dataset which allows to oversample minority/undersample majority along with batching - in this case I would not need any parameter to propagate.
Both features and labels can be dictionary of tensors (not just one tensor). The tensors can be any shape you want though it's common to be num_examples * ...
If you don't use any of the predefined estimators, the easiest way would be to add another feature with what you need to compute the weights, compute the weights in the model then use them (multiply the loss or pass it as a parameter).
You also have access to hyper parameters inside the input_fn so you can compute the weight there and add it as a separate column.
If you use a canned estimator check the documentation. I see most of them support a weight_column_name. In this case just give it the name you used in the features dictionary for the weight values.
Alternatively, if all else fails you can sample the data the way you want before you feed it to tensorflow.
Let's say that I have some code such as:
out = tf.nn.softmax(x) # shape (batch,time,n)
labels = .... # reference labels of type (batch,time)->int
And then I define my loss as the Cross Entropy:
loss = -tf.log(tf.gather_nd(out, labels))
Will TensorFlow automatically replace the loss in the computation graph by this?
loss = sparse_softmax_cross_entropy_with_logits(x, labels)
What type of optimizations can I expect that TensorFlow will apply?
Follow-up question: If TensorFlow doesn't do this optimization, how can I do it manually? Consider that I have a modular framework where I get some out tensor which could possibly be the output of a softmax operation, and I want to calculate Cross Entropy, and I want to use sparse_softmax_cross_entropy_with_logits if possible. How could I accomplish this? Can I do something like the following?
if out.op == "softmax": # how to check this?
x = out.op.sources[0] # how to get this?
loss = sparse_softmax_cross_entropy_with_logits(x, labels)
else:
loss = -tf.log(tf.gather_nd(out, labels))
TensorFlow generally doesn't merge nodes together in the way you're hoping. This is because other code (e.g. fetching outputs when running) may depend on intermediate nodes like the softmax, so removing them behind the user's back would be confusing.
If you do want to do this optimization yourself as part of a higher-level framework, you can analyze the current graphdef, but there's no annotation in TF to tell you what the outputs are, since that can vary at runtime depending on how session.run is called.
Assuming that I want to update a pre-trained word-embedding matrix during training, is there a way to update only a subset of the word embedding matrix?
I have looked into the Tensorflow API page and found this:
# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)
# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1])) for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)
However how do I apply that to the word-embedding matrix. Suppose I do:
word_emb = tf.Variable(0.2 * tf.random_uniform([syn0.shape[0],s['es']], minval=-1.0, maxval=1.0, dtype=tf.float32),name='word_emb',trainable=False)
gather_emb = tf.gather(word_emb,indices) #assuming that I pass some indices as placeholder through feed_dict
opt = tf.train.AdamOptimizer(1e-4)
grad = opt.compute_gradients(loss,gather_emb)
How do I then use opt.apply_gradients and tf.scatter_update to update the original embeddign matrix? (Also, tensorflow throws an error if the second argument of compute_gradient is not a tf.Variable)
TL;DR: The default implementation of opt.minimize(loss), TensorFlow will generate a sparse update for word_emb that modifies only the rows of word_emb that participated in the forward pass.
The gradient of the tf.gather(word_emb, indices) op with respect to word_emb is a tf.IndexedSlices object (see the implementation for more details). This object represents a sparse tensor that is zero everywhere, except for the rows selected by indices. A call to opt.minimize(loss) calls AdamOptimizer._apply_sparse(word_emb_grad, word_emb), which makes a call to tf.scatter_sub(word_emb, ...)* that updates only the rows of word_emb that were selected by indices.
If on the other hand you want to modify the tf.IndexedSlices that is returned by opt.compute_gradients(loss, word_emb), you can perform arbitrary TensorFlow operations on its indices and values properties, and create a new tf.IndexedSlices that can be passed to opt.apply_gradients([(word_emb, ...)]). For example, you could cap the gradients using MyCapper() (as in the example) using the following calls:
grad, = opt.compute_gradients(loss, word_emb)
train_op = opt.apply_gradients(
[tf.IndexedSlices(MyCapper(grad.values), grad.indices)])
Similarly, you could change the set of indices that will be modified by creating a new tf.IndexedSlices with a different indices.
* In general, if you want to update only part of a variable in TensorFlow, you can use the tf.scatter_update(), tf.scatter_add(), or tf.scatter_sub() operators, which respectively set, add to (+=) or subtract from (-=) the value previously stored in a variable.
Since you just want to select the elements to be updated (and not to change the gradients), you can do as follows.
Let indices_to_update be a boolean tensor that indicates the indices you wish to update, and entry_stop_gradients is defined in the link, Then:
gather_emb = entry_stop_gradients(gather_emb, indices_to_update)
(Source)
Actually, I was also struggling with such a problem. In my case, I needed to train a model with w2v embeddings, but not all of the tokens existed in embedding matrix. Thus for those tokens which were not in matrix, I made random initialization. Of course tokens for which embeddings were already trained, shouldn't be updated, thus I've came up with such a solution:
class PartialEmbeddingsUpdate(tf.keras.layers.Layer):
def __init__(self, len_vocab,
weights,
indices_to_update):
super(PartialEmbeddingsUpdate, self).__init__()
self.embeddings = tf.Variable(weights, name='embedding', dtype=tf.float32)
self.bool_mask = tf.equal(tf.expand_dims(tf.range(0,len_vocab),1), tf.expand_dims(indices_to_update,0))
self.bool_mask = tf.reduce_any(self.bool_mask,1)
self.bool_mask_not = tf.logical_not(self.bool_mask)
self.bool_mask_not = tf.expand_dims(tf.cast(self.bool_mask_not, dtype=self.embeddings.dtype),1)
self.bool_mask = tf.expand_dims(tf.cast(self.bool_mask, dtype=self.embeddings.dtype),1)
def call(self, input):
input = tf.cast(input, dtype=tf.int32)
embeddings = tf.stop_gradient(self.bool_mask_not * self.embeddings) + self.bool_mask * self.embeddings
return tf.gather(embeddings,input)
Where len_vocab - is your vocabulary length, weights - matrix of weights (some of which shouldn't be updated) and indices_to_update - indices of those tokens which should be updated. After that I applied this layer instead of tf.keras.layers.Embeddings. Hope it helps everyone, who encountered the same problem.