"Quantize" Tensorflow Graph to float16 - tensorflow

How do you convert a TensorFlow graph from float32 to float16? Currently there are graph optimizations for quantization and for conversion to eight-bit integers.
Trying to load float32 weights into a float16 graph fails with:
DataLossError (see above for traceback): Invalid size in bundle entry: key model/conv5_1/biases; stored size 1536; expected size 768
[[Node: save/RestoreV2_16 = RestoreV2[dtypes=[DT_HALF], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_16/tensor_names, save/RestoreV2_16/shape_and_slices)]]
[[Node: save/RestoreV2_3/_39 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_107_save/RestoreV2_3", tensor_type=DT_HALF, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
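The two sizes in the error are consistent with a pure dtype mismatch: 1536 / 4 = 384 and 768 / 2 = 384, i.e. a 384-element bias vector stored as float32 (4 bytes per value) while the float16 graph expects 2 bytes per value. A small standard-library sketch of that arithmetic (the 384-element length is inferred from the numbers in the error, not stated in the question):

```python
import struct

NUM_BIASES = 384  # inferred: 1536 bytes / 4 bytes per float32 value

biases = [0.0] * NUM_BIASES

# 'f' packs IEEE 754 single precision (4 bytes), 'e' packs half precision (2 bytes).
stored_float32 = struct.pack(f"<{NUM_BIASES}f", *biases)
expected_float16 = struct.pack(f"<{NUM_BIASES}e", *biases)

print(len(stored_float32))    # 1536, the "stored size" in the error
print(len(expected_float16))  # 768, the "expected size" in the error
```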

My solution is definitely not the best or the most straightforward, but since nobody else has posted anything:
I trained the network in full precision and saved it in a checkpoint. Then I built a copy of the network with all desired variables set to dtype tf.float16 and all training nodes removed. Finally, I loaded and cast the variables as follows:
import numpy as np
import tensorflow as tf
previous_variables = [
    var_name for var_name, _
    in tf.contrib.framework.list_variables('path-to-checkpoint-file')]
#print(previous_variables)
sess.run(tf.global_variables_initializer())
restore_map = {}
for variable in tf.global_variables():
    if variable.op.name in previous_variables:
        var = tf.contrib.framework.load_variable(
            'path-to-checkpoint-file', variable.op.name)
        if var.dtype == np.float32:
            # cast float32 checkpoint values to the float16 variable
            tf.add_to_collection('assignOps', variable.assign(
                tf.cast(var, tf.float16)))
        else:
            tf.add_to_collection('assignOps', variable.assign(var))
sess.run(tf.get_collection('assignOps'))
This obviously breaks if there are float32 tensors that you don't want to convert. Luckily I don't have any, since I want to convert all my nodes to float16 precision. If you do, you can filter them out with further conditions in the if statement. I hope this answers your question.
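To make the per-variable logic concrete without a TensorFlow session, here is a standard-library sketch. It is a simplification under stated assumptions: a plain dict stands in for the checkpoint (the real code reads values with tf.contrib.framework.load_variable), and struct's half-precision format 'e' stands in for tf.cast(..., tf.float16). It also shows why the cast is lossy: 0.1 has no exact float16 representation.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def cast_checkpoint(checkpoint, graph_variables):
    """Return the values to assign, casting float32 entries to float16.

    checkpoint: {name: (dtype_string, list_of_values)}, a hypothetical stand-in
    graph_variables: names of the variables in the new float16 graph
    """
    assignments = {}
    for name in graph_variables:
        if name not in checkpoint:
            continue  # not in the checkpoint: keeps its initializer value
        dtype, values = checkpoint[name]
        if dtype == "float32":
            assignments[name] = ("float16", [to_float16(v) for v in values])
        else:
            assignments[name] = (dtype, list(values))  # load unchanged
    return assignments


checkpoint = {
    "model/conv5_1/biases": ("float32", [0.1, 1.0, 65504.0]),
    "global_step": ("int64", [1000]),
}
result = cast_checkpoint(checkpoint, ["model/conv5_1/biases", "global_step", "new/var"])
# 1.0 and 65504.0 (the largest finite float16) survive exactly; 0.1 is rounded.
print(result["model/conv5_1/biases"])
```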

I had this issue too, but I was loading a sub-graph that contained some variables that needed to be loaded or converted and some that did not.
Based on @Jendrik's answer, here is a function that returns the assign operations, given a dictionary that maps the saved variable names to the variables in the new graph:
def assign_and_convert_halfPrecision(restore_dictionary, CHECKPOINT_PATH):
    # Iterate over the dictionary containing the variables to load
    for variable_name_old, variable_new in restore_dictionary.items():
        # Load the variable from the checkpoint
        var = tf.contrib.framework.load_variable(CHECKPOINT_PATH, variable_name_old)
        # Assign to new graph; base_dtype strips the "_ref" suffix
        # that reference variables report (e.g. float16_ref)
        if var.dtype == np.float32 and variable_new.dtype.base_dtype == tf.float16:
            # If the variable is float16 in the new graph, we cast it
            tf.add_to_collection('assignOps', variable_new.assign(tf.cast(var, tf.float16)))
        else:
            # If the variable in the old graph is float16 or the new variable is float32,
            # we load it directly
            tf.add_to_collection('assignOps', variable_new.assign(var))
    # Return the operations
    return tf.get_collection('assignOps')
To use it, just do:
# Create a trivial dictionary (all custom loading can be added here, like change of scope names)
restore_dictionary = dict()
for a in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=''):
    restore_dictionary[a.name[:-2]] = a  # strip the ":0" suffix to get the checkpoint key
# Create the assignment and conversion op
assign_operation = assign_and_convert_halfPrecision(restore_dictionary, CHECKPOINT_PATH)
# Load
sess.run(assign_operation)
The loading can be controlled by modifying the dictionary, avoiding variables that should not be loaded or changing the scope of the variables to load.
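As an illustration of that kind of customization, here is a plain-Python sketch that builds the mapping from new-graph variable names, renames scopes and skips selected variables. The helper name, the scope-rename convention and the example names are all hypothetical; in real code the values would be the tf.Variable objects rather than their names.

```python
def build_restore_dictionary(new_variable_names, scope_renames=None, skip=()):
    """Map checkpoint names -> new-graph variable names.

    new_variable_names: names as reported by variable.name, e.g. "net/conv1/w:0"
    scope_renames: {new_scope_prefix: old_scope_prefix} (hypothetical convention)
    skip: op names that should not be loaded at all
    """
    scope_renames = scope_renames or {}
    restore = {}
    for full_name in new_variable_names:
        op_name = full_name.rsplit(":", 1)[0]  # same effect as name[:-2] for ":0"
        if op_name in skip:
            continue
        old_name = op_name
        for new_prefix, old_prefix in scope_renames.items():
            if op_name.startswith(new_prefix):
                old_name = old_prefix + op_name[len(new_prefix):]
                break
        restore[old_name] = full_name
    return restore


names = ["student/conv1/w:0", "student/conv1/b:0", "student/head/w:0"]
mapping = build_restore_dictionary(
    names, scope_renames={"student/": "teacher/"}, skip={"student/head/w"})
print(mapping)
```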

Related

How to restore a smaller variable to larger variable in TensorFlow?

I have a previously trained model that classifies sentences.
I want to make some variables bigger, restoring the old, smaller variables into them and initializing the remaining part from scratch.
Here is an image of what I want:
When I tried this, this error occurred.
InvalidArgumentError (see above for traceback): Assign requires shapes
of both tensors to match. lhs shape= [13173,32] rhs shape= [13113,32]
[[Node: save_1/Assign = Assign[T=DT_FLOAT,
_class=["loc:#embedding/embedding_W"], use_locking=true, validate_shape=true,
_device="/job:localhost/replica:0/task:0/gpu:0"](embedding/embedding_W,
save_1/RestoreV2/_5)]]
You cannot restore smaller variables into larger variables directly for several reasons; for example, TensorFlow does not know which part of the larger variable the smaller variable should be embedded in.
The proper way to do this is to load the old variable, create the new variable, and then assign the old variable to a slice of the new one.
In code, that would be:
import tensorflow as tf
# the old variable, as it was before (same name and shape as in the checkpoint)
old_variable = tf.Variable(..., name='old_name')
# variable with new shape, the one you want to use now
new_variable = tf.Variable(...)
# a Saver restricted to the old variable, since new_variable is not in the checkpoint
saver = tf.train.Saver([old_variable])
# Initialize the variables and restore the checkpoint
sess.run(tf.global_variables_initializer())
saver.restore(sess, 'old_checkpoint')
# assign op for the places you want to fill in the old variable
idx = tuple(slice(0, osi) for osi in old_variable.shape.as_list())
assign_op = new_variable[idx].assign(old_variable)
sess.run(assign_op)
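The slice construction can be checked without TensorFlow. This sketch uses nested lists in place of tensors (a small 2x2 "old" matrix and a 3x2 "new" one stand in for the [13113, 32] and [13173, 32] embeddings) to show that tuple(slice(0, osi) for osi in old_shape) selects exactly the top-left block, while the extra rows keep their fresh initialization:

```python
def embed_old_into_new(old, new):
    """Copy a smaller 2-D matrix into the top-left corner of a larger one.

    Mirrors `idx = tuple(slice(0, osi) for osi in old_shape)` followed by an
    assignment to new[idx], using nested lists instead of tensors.
    """
    old_shape = (len(old), len(old[0]))
    idx = tuple(slice(0, osi) for osi in old_shape)
    row_slice, col_slice = idx
    result = [row[:] for row in new]  # copy so the caller's matrix is untouched
    for i in range(row_slice.start, row_slice.stop):
        result[i][col_slice] = old[i][col_slice]
    return result


old_matrix = [[1.0, 2.0], [3.0, 4.0]]            # "trained" values
new_matrix = [[0.0, 0.0] for _ in range(3)]      # freshly initialized, one extra row
result = embed_old_into_new(old_matrix, new_matrix)
print(result)
```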

How to get the output of a maxpool layer in a pre-trained model in TensorFlow?

I have a model that I trained. I wish to extract from the model the output of an intermediate maxpool layer.
I tried the following
saver = tf.train.import_meta_graph(BASE_DIR + LOG_DIR + '/model.ckpt.meta')
saver.restore(sess,tf.train.latest_checkpoint(BASE_DIR + LOG_DIR))
sess.run("maxpool/maxpool",feed_dict=feed_dict)
here, feed_dict contains the placeholders and their contents for this run in a dictionary.
I keep getting the following error
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_1_1' with dtype float and shape...
What could be causing this? I created all of the placeholders and put them in the feed dictionary.
I ran into a similar issue and it was frustrating. What got me around it was filling out the name field for every variable and operation that I wanted to call later. You may also need to add your maxpool/maxpool op to a collection with tf.add_to_collection('name_for_maxpool_op', maxpool_op_handle). You can then restore the ops and named tensors with:
# Restore from metagraph.
saver = tf.train.import_meta_graph(...)
sess = tf.Session()
saver.restore(sess, ...)
graph = sess.graph
# Restore your ops and tensors.
maxpool_op = tf.get_collection('name_for_maxpool_op')[0] # returns a list, you want the first element
a_tensor = graph.get_tensor_by_name('tensor_name:0') # need the :0 added to your name
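Then build your feed_dict using the restored tensors (for example, placeholders retrieved with get_tensor_by_name), not placeholders from a freshly built copy of the graph. More information can be found here. Also, as you mentioned in your comment, you need to pass the op itself to sess.run, not its name: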
sess.run(maxpool_op, feed_dict=feed_dict)
You can access your tensors and ops from a restored metagraph even if you did not name them (to avoid retraining the model just to get new tensor names, for instance), but it can be a bit of a pain: automatically assigned names are not always transparent. You can list the names of all variables in your graph with:
print([v.name for v in tf.global_variables()])
Since a max-pool output is not a variable, listing all operations is usually more useful:
print([op.name for op in tf.get_default_graph().get_operations()])
You can hopefully find the name you are looking for there and then retrieve that tensor with graph.get_tensor_by_name as described above.
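A plain-Python sketch of that lookup: filter the operation names for a fragment, then turn the op name into a tensor name by appending the output index, which is where the ':0' comes from. The op-name list is a hypothetical stand-in for [op.name for op in graph.get_operations()].

```python
def find_names(all_names, fragment):
    """Return the names that contain `fragment` (case-sensitive)."""
    return [name for name in all_names if fragment in name]


def tensor_name(op_name, output_index=0):
    """Tensor names have the form '<op name>:<output index>', hence the ':0' suffix."""
    return f"{op_name}:{output_index}"


# Hypothetical op names, as a real graph might report them.
op_names = ["Placeholder", "Placeholder_1", "conv1/Relu", "maxpool/maxpool", "fc1/MatMul"]
matches = find_names(op_names, "maxpool")
print([tensor_name(n) for n in matches])  # the string to pass to get_tensor_by_name
```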

TF LSTM: Save State from training session for prediction session later

I am trying to save the latest LSTM state from training so it can be reused later during the prediction stage. The problem is that in the TF LSTM model, the state is passed from one training iteration to the next via a combination of a placeholder and a NumPy array, neither of which seems to be included in the graph by default when the session is saved.
To work around this, I am creating a dedicated TF variable to hold the latest version of the state so as to add it to the Session graph, like so:
# latest State from last training iteration:
_, y, ostate, smm = sess.run([train_step, Y, H, summaries], feed_dict=feed_dict)
# now add to TF variable:
savedState = tf.Variable(ostate, dtype=tf.float32, name='savedState')
tf.variables_initializer([savedState]).run()
save_path = saver.save(sess, pathModel + '/my_model.ckpt')
This seems to add the savedState variable to the saved session graph well, and is easily recoverable later with the rest of the Session.
The problem, though, is that the only way I have managed to actually use that variable in the restored session is to initialize all variables AFTER restoring it (which seems to reset all trained variables, including the weights/biases/etc.!). If I initialize variables first and THEN restore the session (which preserves the trained variables fine), I get an error that I'm trying to access an uninitialized variable.
I know there is a way to initialize a specific individual variable (which I use when saving it originally), but the problem is that when we restore variables, we refer to them by name as strings rather than passing the variable itself.
# This produces an error: 'trying to use an uninitialized variable'
gInit = tf.global_variables_initializer().run()
new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
new_saver.restore(sess, pathModel + 'my_model.ckpt')
fullState = sess.run('savedState:0')
What is the right way to get this done? As a workaround, I am currently saving the State to CSV just as a numpy array and then recover it the same way. It works OK, but clearly not the cleanest solution given that every other aspect of saving/restoring the TF session works perfectly.
Any suggestions appreciated!
EDIT:
Here's the code that works well, as described in the accepted answer below:
# make sure to define the State variable before the Saver variable:
savedState = tf.get_variable('savedState', shape=[BATCHSIZE, CELL_SIZE * LAYERS])
saver = tf.train.Saver(max_to_keep=1)
# last training iteration:
_, y, ostate, smm = sess.run([train_step, Y, H, summaries], feed_dict=feed_dict)
# now save the State and the whole model:
assignOp = tf.assign(savedState, ostate)
sess.run(assignOp)
save_path = saver.save(sess, pathModel + '/my_model.ckpt')
# later on, in some other program, recover the model and the State:
# make sure to initialize all variables BEFORE recovering the model!
gInit = tf.global_variables_initializer().run()
local_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
local_saver.restore(sess, pathModel + 'my_model.ckpt')
# recover the state from training and take its last row
fullState = sess.run('savedState:0')
h = fullState[-1]
h = np.reshape(h, [1, -1])
I haven't tested yet whether this approach unintentionally initializes any other variables in the saved Session, but don't see why it should, since we only run the specific one.
The issue is that creating a new tf.Variable after the Saver was constructed means that the Saver has no knowledge of the new variable. It still gets saved in the metagraph, but not saved in the checkpoint:
import tensorflow as tf
with tf.Graph().as_default():
    var_a = tf.get_variable("a", shape=[])
    saver = tf.train.Saver()
    var_b = tf.get_variable("b", shape=[])
    print(saver._var_list)  # [<tf.Variable 'a:0' shape=() dtype=float32_ref>]
    initializer = tf.global_variables_initializer()
    with tf.Session() as session:
        session.run([initializer])
        saver.save(session, "/tmp/model", global_step=0)
with tf.Graph().as_default():
    new_saver = tf.train.import_meta_graph("/tmp/model-0.meta")
    print(saver._var_list)  # [<tf.Variable 'a:0' shape=() dtype=float32_ref>]
    with tf.Session() as session:
        new_saver.restore(session, "/tmp/model-0")  # Only var_a gets restored!
I've annotated the quick reproduction of your issue above with the variables that the Saver knows about.
Now, the solution is relatively easy. I would suggest creating the Variable before the Saver, then using tf.assign to update its value (make sure you run the op returned by tf.assign). The assigned value will be saved in checkpoints and restored just like other variables.
This could be handled better by the Saver as a special case when None is passed to its var_list constructor argument (i.e. it could pick up new variables automatically). Feel free to open a feature request on GitHub for this.
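A toy model of the behaviour described above, in plain Python. SnapshotSaver is a hypothetical stand-in, not tf.train.Saver's implementation: it records its variable list when it is constructed, so anything registered afterwards is silently left out of the saved values, while a variable created before the saver and updated later (the tf.assign fix) is saved.

```python
class SnapshotSaver:
    """Toy stand-in for tf.train.Saver: its variable list is fixed at construction."""

    def __init__(self, registry):
        self._var_list = list(registry)  # names that exist *now*

    def save(self, registry):
        """Return the 'checkpoint': current values of the snapshotted names only."""
        return {name: registry[name] for name in self._var_list}


# Variable created AFTER the saver (the original problem).
registry = {"a": 1.0}
late_saver = SnapshotSaver(registry)
registry["b"] = 2.0
checkpoint_late = late_saver.save(registry)

# Variable created BEFORE the saver and updated later (like tf.assign).
registry = {"a": 1.0, "b": 0.0}
early_saver = SnapshotSaver(registry)
registry["b"] = 2.0
checkpoint_early = early_saver.save(registry)

print(sorted(checkpoint_late))   # ['a']
print(sorted(checkpoint_early))  # ['a', 'b']
```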

What's the difference between tf.placeholder and tf.Variable?

I'm a newbie to TensorFlow, and I'm confused about the difference between tf.placeholder and tf.Variable. As I understand it, tf.placeholder is used for input data and tf.Variable is used to store state. That is all I know.
Could someone explain to me more in detail about their differences? In particular, when to use tf.Variable and when to use tf.placeholder?
In short, you use tf.Variable for trainable variables such as weights (W) and biases (B) for your model.
weights = tf.Variable(
tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))), name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
tf.placeholder is used to feed actual training examples.
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
This is how you feed the training examples during the training:
for step in range(FLAGS.max_steps):
    feed_dict = {
        images_placeholder: images_feed,
        labels_placeholder: labels_feed,
    }
    _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
Your tf.Variable objects will be trained (modified) as a result of this training.
See more at https://www.tensorflow.org/versions/r0.7/tutorials/mnist/tf/index.html. (Examples are taken from the web page.)
The difference is that with tf.Variable you have to provide an initial value when you declare it. With tf.placeholder you don't have to provide an initial value; you specify it at run time with the feed_dict argument of Session.run.
Since TensorFlow computations are expressed as graphs, it's better to interpret the two in terms of graphs.
Take for example the simple linear regression
WX+B=Y
where W and B stand for the weights and bias and X for the observations' inputs and Y for the observations' outputs.
Obviously X and Y are of the same nature (manifest variables), which differs from that of W and B (latent variables). X and Y are values of the samples (observations) and hence need a place to be filled, while W and B are the weights and bias: Variables in the graph (their earlier values affect later ones) that should be trained using different X and Y pairs. We place different samples into the Placeholders to train the Variables.
We only need to save or restore the Variables (at checkpoints) to save or rebuild the graph with the code.
Placeholders are mostly holders for the different datasets (for example, training data or test data). Variables, however, are trained during the training process for a specific task, i.e., to predict the outcome for an input or to map inputs to the desired labels. They remain the same until you retrain or fine-tune the model using different or the same samples fed into the Placeholders, usually through feed_dict. For instance:
session.run(a_graph, feed_dict={a_placeholder_name: sample_values})
Placeholders are also used to pass in parameters that configure the model.
If you change the placeholders of a model (add, delete, change the shape, etc.) in the middle of training, you can still reload the checkpoint without any other modifications. But if the variables of a saved model are changed, you must adjust the checkpoint accordingly to reload it and continue training (all variables defined in the graph should be available in the checkpoint).
To sum up, if the values come from the samples (observations you already have), you can safely use a placeholder to hold them, while if you need a parameter to be trained, use a Variable (simply put, use Variables for the values you want TF to learn automatically).
In some interesting models, like a style transfer model, the input pixels are optimized while the usual model variables are kept fixed; in that case the input (usually initialized randomly) should be a Variable, as implemented in that link.
For more information, please refer to this simple and illustrative doc.
TL;DR
Variables
For parameters to learn
Values can be derived from training
Initial values are required (often random)
Placeholders
Allocated storage for data (such as for image pixel data during a feed)
Initial values are not required (but can be set, see tf.placeholder_with_default)
The most obvious difference between the tf.Variable and the tf.placeholder is that
you use variables to hold and update parameters. Variables are
in-memory buffers containing tensors. They must be explicitly
initialized and can be saved to disk during and after training. You
can later restore saved values to exercise or analyze the model.
Initialization of the variables is done with sess.run(tf.global_variables_initializer()). Also while creating a variable, you need to pass a Tensor as its initial value to the Variable() constructor and when you create a variable you always know its shape.
On the other hand, you can't update the placeholder. They also should not be initialized, but because they are a promise to have a tensor, you need to feed the value into them sess.run(<op>, {a: <some_val>}). And at last, in comparison to a variable, placeholder might not know the shape. You can either provide parts of the dimensions or provide nothing at all.
There are other differences:
the values inside the variable can be updated during optimizations
variables can be shared, and can be non-trainable
the values inside the variable can be stored after training
when the variable is created, 3 ops are added to a graph (variable op, initializer op, ops for the initial value)
placeholder is a function, Variable is a class (hence the uppercase name)
when you use TF in a distributed environment, variables are stored in a special place (parameter server) and are shared between the workers.
Interestingly, placeholders are not the only tensors that can be fed: you can also feed a value to a Variable, and even to a constant.
Adding to the other answers, this is also explained very well in this MNIST tutorial on the TensorFlow website:
We describe these interacting operations by manipulating symbolic
variables. Let's create one:
x = tf.placeholder(tf.float32, [None, 784])
x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to
run a computation. We want to be able to input any number of MNIST
images, each flattened into a 784-dimensional vector. We represent
this as a 2-D tensor of floating-point numbers, with a shape [None,
784]. (Here None means that a dimension can be of any length.)
We also need the weights and biases for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even
better way to handle it: Variable. A Variable is a modifiable tensor
that lives in TensorFlow's graph of interacting operations. It can be
used and even modified by the computation. For machine learning
applications, one generally has the model parameters be Variables.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
We create these Variables by giving tf.Variable the initial value of
the Variable: in this case, we initialize both W and b as tensors full
of zeros. Since we are going to learn W and b, it doesn't matter very
much what they initially are.
TensorFlow uses three types of containers to store data and execute the process:
Constants: hold fixed data.
Variables: hold values that change during training, driven by functions such as the cost function.
Placeholders: where training/testing data is passed into the graph.
Example snippet:
import numpy as np
import tensorflow as tf
### Model parameters ###
W = tf.Variable([.3], dtype=tf.float32)  # dtype as a keyword: the 2nd positional arg is `trainable`
b = tf.Variable([-.3], dtype=tf.float32)
### Model input and output ###
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
### loss ###
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
### optimizer ###
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
### training data ###
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
### training loop ###
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # initialize W and b to their (deliberately wrong) starting values
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})
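A framework-free sketch of what that snippet computes, with the gradients of the sum-of-squares loss written out by hand. It mirrors the plain gradient-descent update rule but is not TensorFlow's code: W and b play the role of Variables (state carried across steps), while x_train and y_train are what gets fed to the placeholders on every step.

```python
# Training data, fed on every step (the "placeholder" values).
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [0.0, -1.0, -2.0, -3.0]

# Parameters with initial values (the "Variables").
W, b = 0.3, -0.3
learning_rate = 0.01


def train_step(W, b, xs, ys, lr):
    """One gradient-descent step on loss = sum((W*x + b - y)**2)."""
    residuals = [W * x + b - y for x, y in zip(xs, ys)]
    grad_W = sum(2 * r * x for r, x in zip(residuals, xs))
    grad_b = sum(2 * r for r in residuals)
    return W - lr * grad_W, b - lr * grad_b  # both updated from the same gradients


for _ in range(1000):
    W, b = train_step(W, b, x_train, y_train, learning_rate)

loss = sum((W * x + b - y) ** 2 for x, y in zip(x_train, y_train))
print(W, b, loss)  # W approaches -1, b approaches 1, loss approaches 0
```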
As the name says, a placeholder is a promise to provide a value later.
Variables are simply the training parameters (W (matrix), b (bias)), like the normal variables you use in day-to-day programming, which the trainer updates/modifies on each run/step.
A placeholder, by contrast, doesn't require an initial value: when you create x and y, TF doesn't allocate any memory for them. Instead, when you later feed the placeholders in sess.run() using feed_dict, TensorFlow allocates appropriately sized memory for them (x and y). This unconstrained-ness allows us to feed data of any size and shape.
In a nutshell:
Variable - a parameter you want the trainer (e.g. GradientDescentOptimizer) to update after each step.
Placeholder demo -
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b # + provides a shortcut for tf.add(a, b)
Execution:
print(sess.run(adder_node, {a: 3, b:4.5}))
print(sess.run(adder_node, {a: [1,3], b: [2, 4]}))
resulting in the output
7.5
[ 3. 7.]
In the first case, 3 and 4.5 are passed to a and b respectively and then to adder_node, outputting 7.5. In the second case, lists are fed and added element-wise: first 1 and 2, then 3 and 4, giving [3. 7.].
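A plain-Python analogue of adder_node (a hypothetical helper, not TensorFlow's broadcasting rules): the same "node" accepts scalars or equal-length lists, because the values are only supplied when it is called, just as a placeholder's value is only supplied through feed_dict.

```python
def add(a, b):
    """Element-wise a + b for two scalars or two equal-length lists."""
    if isinstance(a, list):
        return [x + y for x, y in zip(a, b)]
    return a + b


print(add(3, 4.5))          # 7.5
print(add([1, 3], [2, 4]))  # [3, 7]
```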
Relevant reads:
tf.placeholder doc.
tf.Variable doc.
Variable VS placeholder.
Variables
A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program. Variables are manipulated via the tf.Variable class. Internally, a tf.Variable stores a persistent tensor. Specific operations allow you to read and modify the values of this tensor. These modifications are visible across multiple tf.Sessions, so multiple workers can see the same values for a tf.Variable. Variables must be initialized before use.
Example:
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2
This creates a computation graph. The variables (x and y) can be initialized and the function (f) evaluated in a tensorflow session as follows:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
print(result)
42
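The printed value follows directly from the initial values x = 3 and y = 4:

```latex
f(x, y) = x^2 y + y + 2, \qquad f(3, 4) = 3^2 \cdot 4 + 4 + 2 = 36 + 4 + 2 = 42
```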
Placeholders
A placeholder is a node (like a variable) whose value is supplied later. These nodes simply output the value fed to them at run time. A placeholder node is created with the tf.placeholder() function, to which you can provide arguments such as the data type and/or the shape. Placeholders are used extensively to represent the training dataset in a machine learning model, since the data fed in keeps changing.
Example:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
Note: 'None' for a dimension means 'any size'.
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
print(B_val_1)
[[6. 7. 8.]]
print(B_val_2)
[[9. 10. 11.]
[12. 13. 14.]]
References:
https://www.tensorflow.org/guide/variables
https://www.tensorflow.org/api_docs/python/tf/placeholder
O'Reilly: Hands-On Machine Learning with Scikit-Learn & Tensorflow
Think of a Variable in TensorFlow as a normal variable like the ones we use in programming languages: we initialize it, and we can modify it later. A placeholder, by contrast, doesn't require an initial value; it simply reserves a place for data to be supplied later, and we use feed_dict to feed data into it. By default, a placeholder has an unconstrained shape, which allows you to feed tensors of different shapes in a session. You can constrain the shape by passing the optional shape argument, as I have done below.
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32, (3, 4))
y = x + 2
sess = tf.Session()
print(sess.run(y)) # will cause an error: x has not been fed
s = np.random.rand(3,4)
print(sess.run(y, feed_dict={x:s}))
When doing a machine learning task, most of the time we don't know the number of rows, but (let's assume) we do know the number of features, or columns. In that case, we can use None.
x = tf.placeholder(tf.float32, shape=(None,4))
Now, at run time we can feed any matrix with 4 columns and any number of rows.
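The rule that None means "any size" can be stated as a small compatibility check. This is a hypothetical helper written to illustrate the idea, not TensorFlow's own shape-checking code:

```python
def is_compatible(declared_shape, value_shape):
    """Return True if a value of value_shape could be fed for declared_shape.

    declared_shape: None (fully unknown) or a tuple whose entries are ints or None.
    """
    if declared_shape is None:
        return True  # unconstrained placeholder: any shape is accepted
    if len(declared_shape) != len(value_shape):
        return False  # rank must match
    return all(d is None or d == v for d, v in zip(declared_shape, value_shape))


print(is_compatible((None, 4), (10, 4)))  # True: any number of rows
print(is_compatible((None, 4), (10, 3)))  # False: wrong number of columns
print(is_compatible((3, 4), (3, 4)))      # True: fully specified shape
```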
Also, placeholders are used for input data (they are a kind of variable we use to feed our model), whereas Variables are parameters, such as weights, that we train over time.
Placeholder :
A placeholder is simply a variable that we will assign data to at a later date. It allows us to create our operations and build our computation graph, without needing the data. In TensorFlow terminology, we then feed data into the graph through these placeholders.
Initial values are not required, but placeholders can have default values via tf.placeholder_with_default.
We have to provide the value at run time, like:
a = tf.placeholder(tf.int16) # declare a placeholder (no value yet)
b = tf.placeholder(tf.int16) # declare a placeholder (no value yet)
add = a + b
and use it in a session like:
sess.run(add, feed_dict={a: 2, b: 3}) # these values are assigned at run time
Variable :
A TensorFlow variable is the best way to represent shared,
persistent state manipulated by your program.
Variables are manipulated via the tf.Variable class. A tf.Variable
represents a tensor whose value can be changed by running ops on it.
Example : tf.Variable("Welcome to tensorflow!!!")
TensorFlow 2.0 compatible answer: placeholders (tf.placeholder) are not available by default in TensorFlow 2.x (>= 2.0), since the default execution mode is eager execution.
However, they can still be used in graph mode, i.e. after disabling eager execution with tf.compat.v1.disable_eager_execution().
Equivalent command for TF Placeholder in version 2.x is tf.compat.v1.placeholder.
Equivalent Command for TF Variable in version 2.x is tf.Variable and if you want to migrate the code from 1.x to 2.x, the equivalent command is
tf.compat.v2.Variable.
Please refer to this TensorFlow page for more information about TensorFlow 2.0.
Please refer to the Migration Guide for more information about migrating from 1.x to 2.x.
Think of a computation graph. In such a graph, we need input nodes to pass our data into the graph, and those nodes should be defined as placeholders in TensorFlow.
Don't think of it as a general Python program. You could write a Python program and do everything the other answers describe using only Variables, but in a TensorFlow computation graph, the nodes that feed your data into the graph should be defined as placeholders.
For TF V1:
Constant is with initial value and it won't change in the computation;
Variable is with initial value and it can change in the computation; (so good for parameters)
Placeholder is without initial value and it won't change in the computation. (so good for inputs like prediction instances)
For TF V2, same but they try to hide Placeholder (graph mode is not preferred).
In TensorFlow, a variable is just another tensor (like tf.constant or tf.placeholder). It just so happens that variables can be modified by the computation. tf.placeholder is used for inputs that will be provided externally to the computation at run-time (e.g. training data). tf.Variable is used for inputs that are part of the computation and are going to be modified by the computation (e.g. weights of a neural network).

Update only part of the word embedding matrix in Tensorflow

Assuming that I want to update a pre-trained word-embedding matrix during training, is there a way to update only a subset of the word embedding matrix?
I have looked into the Tensorflow API page and found this:
# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)
# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)
However how do I apply that to the word-embedding matrix. Suppose I do:
word_emb = tf.Variable(0.2 * tf.random_uniform([syn0.shape[0],s['es']], minval=-1.0, maxval=1.0, dtype=tf.float32),name='word_emb',trainable=False)
gather_emb = tf.gather(word_emb,indices) #assuming that I pass some indices as placeholder through feed_dict
opt = tf.train.AdamOptimizer(1e-4)
grad = opt.compute_gradients(loss,gather_emb)
How do I then use opt.apply_gradients and tf.scatter_update to update the original embedding matrix? (Also, TensorFlow throws an error if the second argument of compute_gradients is not a tf.Variable.)
TL;DR: With the default implementation of opt.minimize(loss), TensorFlow generates a sparse update for word_emb that modifies only the rows of word_emb that participated in the forward pass (provided word_emb is trainable; with trainable=False, as in the question, minimize() does not touch it).
The gradient of the tf.gather(word_emb, indices) op with respect to word_emb is a tf.IndexedSlices object (see the implementation for more details). This object represents a sparse tensor that is zero everywhere except for the rows selected by indices. A call to opt.minimize(loss) dispatches this gradient to the optimizer's _apply_sparse() method; for GradientDescentOptimizer this becomes a tf.scatter_sub(word_emb, ...)* that updates only the rows of word_emb selected by indices. (AdamOptimizer also takes the sparse path, but because its moment estimates decay on every step, rows updated in earlier steps can still move; tf.contrib.opt.LazyAdamOptimizer avoids this.)
If, on the other hand, you want to modify the tf.IndexedSlices returned by opt.compute_gradients(loss, [word_emb]), you can perform arbitrary TensorFlow operations on its indices and values properties and create a new tf.IndexedSlices that can be passed to opt.apply_gradients([(..., word_emb)]). For example, you could cap the gradients using MyCapper() (as in the example) with the following calls:
(grad, var), = opt.compute_gradients(loss, [word_emb])  # a list of (gradient, variable) pairs
train_op = opt.apply_gradients(
    [(tf.IndexedSlices(MyCapper(grad.values), grad.indices, grad.dense_shape), var)])
Similarly, you could change the set of indices that will be modified by creating a new tf.IndexedSlices with a different indices.
* In general, if you want to update only part of a variable in TensorFlow, you can use the tf.scatter_update(), tf.scatter_add(), or tf.scatter_sub() operators, which respectively set, add to (+=) or subtract from (-=) the value previously stored in a variable.
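A plain-Python sketch of those three semantics on a list of rows, followed by a sparse gradient-descent step in the spirit of the IndexedSlices path: the gradient is given only as (indices, values), and scatter_sub subtracts learning_rate * values from just those rows. The helper names mirror the TensorFlow ops but are hypothetical re-implementations, not the library code; like tf.scatter_add, repeated indices accumulate here.

```python
def _combine(rows, indices, updates, op):
    """Apply op(old, new) element-wise to the selected rows, in place."""
    for i, u in zip(indices, updates):
        rows[i] = [op(r, x) for r, x in zip(rows[i], u)]
    return rows


def scatter_update(rows, indices, updates):
    return _combine(rows, indices, updates, lambda old, new: new)        # set


def scatter_add(rows, indices, updates):
    return _combine(rows, indices, updates, lambda old, new: old + new)  # +=


def scatter_sub(rows, indices, updates):
    return _combine(rows, indices, updates, lambda old, new: old - new)  # -=


embedding = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

# Sparse gradient: only rows 1 and 3 were gathered in the forward pass.
grad_indices = [1, 3]
grad_values = [[0.5, 0.5], [1.0, -1.0]]
learning_rate = 0.1

scatter_sub(embedding, grad_indices,
            [[learning_rate * g for g in row] for row in grad_values])
print(embedding)  # rows 0 and 2 are untouched
```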
Since you just want to select the elements to be updated (and not to change the gradients), you can do as follows.
Let indices_to_update be a boolean tensor that indicates the indices you wish to update, and let entry_stop_gradients be the function defined in the link. Then:
gather_emb = entry_stop_gradients(gather_emb, indices_to_update)
(Source)
I was also struggling with this problem. In my case, I needed to train a model with word2vec embeddings, but not all of the tokens existed in the embedding matrix, so I randomly initialized the tokens that were missing. Of course, tokens whose embeddings were already trained shouldn't be updated, so I came up with this solution:
class PartialEmbeddingsUpdate(tf.keras.layers.Layer):
    def __init__(self, len_vocab,
                 weights,
                 indices_to_update):
        super(PartialEmbeddingsUpdate, self).__init__()
        self.embeddings = tf.Variable(weights, name='embedding', dtype=tf.float32)
        self.bool_mask = tf.equal(tf.expand_dims(tf.range(0,len_vocab),1), tf.expand_dims(indices_to_update,0))
        self.bool_mask = tf.reduce_any(self.bool_mask,1)
        self.bool_mask_not = tf.logical_not(self.bool_mask)
        self.bool_mask_not = tf.expand_dims(tf.cast(self.bool_mask_not, dtype=self.embeddings.dtype),1)
        self.bool_mask = tf.expand_dims(tf.cast(self.bool_mask, dtype=self.embeddings.dtype),1)
    def call(self, input):
        input = tf.cast(input, dtype=tf.int32)
        embeddings = tf.stop_gradient(self.bool_mask_not * self.embeddings) + self.bool_mask * self.embeddings
        return tf.gather(embeddings,input)
Here len_vocab is your vocabulary length, weights is the weight matrix (some rows of which shouldn't be updated), and indices_to_update holds the indices of the tokens that should be updated. I then used this layer instead of tf.keras.layers.Embedding. I hope it helps anyone who runs into the same problem.
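Why the layer works: the forward value of stop_gradient(mask_not * E) + mask * E is just E, but only the mask * E term passes gradients, so rows outside indices_to_update receive zero gradient. A pure-Python sketch of that arithmetic (there is no autodiff here; the derivative is written out by hand, and the helper names are hypothetical):

```python
def build_mask(len_vocab, indices_to_update):
    """1.0 for rows that should be trained, 0.0 for frozen rows (cf. self.bool_mask)."""
    wanted = set(indices_to_update)
    return [1.0 if i in wanted else 0.0 for i in range(len_vocab)]


def forward_value(embeddings, mask):
    """Value of stop_gradient((1 - mask) * E) + mask * E, which equals E."""
    return [[(1.0 - m) * e + m * e for e in row] for row, m in zip(embeddings, mask)]


def gradient_to_embeddings(upstream_grad, mask):
    """Only the mask * E term is differentiable, so dLoss/dE = mask * upstream."""
    return [[m * g for g in row] for row, m in zip(upstream_grad, mask)]


embeddings = [[0.5, -0.5], [0.1, 0.2], [0.3, 0.3], [0.9, 0.8]]
mask = build_mask(4, [0, 2])  # rows 0 and 2 are the randomly initialized tokens
forward = forward_value(embeddings, mask)
grad_to_embeddings = gradient_to_embeddings([[1.0, 1.0]] * 4, mask)
print(grad_to_embeddings)  # frozen rows 1 and 3 get zero gradient
```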