How to dynamically initialize Variables in TensorFlow?

I want to run an optimization procedure in TensorFlow for a batch of examples, and I already have rough estimates of the variables to optimize. I want to initialize the variables with these estimates instead of random numbers or zeros.
How can I do this? Note that the initialization values are sample-dependent. My plan was to feed the initial values to a placeholder and then initialize the variable from that placeholder, but that doesn't work.

Define the operation update_estimates = tf.assign(variable, estimated_value), where estimated_value is a tf.placeholder that will receive your guess as a NumPy array.
You then simply run
sess.run(update_estimates, feed_dict={estimated_value: numpy_array})
tf.get_variable() can be very useful, but for beginners I would advise against it.
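A minimal sketch of the assign-through-a-placeholder approach described above. It assumes the TF 1.x graph API, imported through tensorflow.compat.v1 so that it can also run on a TF 2.x install; the 2x3 shape and the contents of numpy_array are illustrative only.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x

# Build everything in a private graph so the sketch does not touch the default graph.
graph = tf.Graph()
with graph.as_default():
    # The variable starts from a throwaway value; the real start values arrive later.
    variable = tf.Variable(tf.zeros([2, 3]), name="variable")
    estimated_value = tf.placeholder(tf.float32, shape=[2, 3], name="estimated_value")
    update_estimates = tf.assign(variable, estimated_value)
    init_op = tf.global_variables_initializer()

# Hypothetical per-sample estimate; in practice this comes from your own pipeline.
numpy_array = np.arange(6, dtype=np.float32).reshape(2, 3)

with tf.Session(graph=graph) as sess:
    sess.run(init_op)
    # Overwrite the variable with the sample-dependent estimate.
    sess.run(update_estimates, feed_dict={estimated_value: numpy_array})
    current_value = sess.run(variable)
```

Because update_estimates is built once, the same op can be re-run with a different numpy_array for every new batch before the optimizer steps begin.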

I believe that this could be a good start for your problem:
import numpy as np
import tensorflow as tf
# This should be your raw estimation for the variables.
# Here I am using random numbers as an example.
estimated_raw = np.random.uniform(-1, 1, [2, 3])
# This trainable variable will be initialized with estimated_raw
var = tf.get_variable('var', initializer=estimated_raw)
# Testing if everything is ok
with tf.Session() as sess:
    var.initializer.run()
    print(var.eval())
In this way you have initialized a variable with your estimate. The optimizer will take it from there.

Related

Defining an assignment function as a variable in TensorFlow?

I am training a neural network by SGD (batch size = 1). The inputs are randomly generated, and the labels are calculated from the inputs. In other words, the data does not have to be realistic, but the relationships between inputs and labels are specific. I will train my NN for only 1 epoch, but with many batches.
I have the following code:
training_input = tf.Variable(tf.zeros(...))
assign_training_input_with_random_values = training_input.assign(tf.random_normal(...))
# Create a session, initialize a bunch of variables, construct a neural network...
for batch in range(batch_number):
    sess.run(assign_training_input_with_random_values)
    # Train my neural network...
However, I noticed that if I write the above code differently, it becomes much slower:
# Run the assignment operation directly without first storing it in a Python variable
for batch in range(batch_number):
    sess.run(training_input.assign(tf.random_normal(...)))
    # Train my neural network...
The first snippet being significantly faster makes me worry that TensorFlow only randomizes once, when I define assign_training_input_with_random_values, and that the same training example is fed to the NN in every batch afterwards. In that case the NN will probably not generalize well. The second snippet would then be slow because it randomizes on every batch. Is this actually the case, or is there another reason?
First, the explanation for your observations
Computational difference between the 1st and 2nd solutions
It makes sense that your first solution is faster than the second. In the first you define the assign operation once and then execute it on every iteration. In the second you create a new op on every iteration, so the computational graph grows over time, which slows your program down.
Observation about the 1st solution
(After @Y.Z.'s finding) The first solution does evaluate to a different random array every time you run it. Therefore, the first solution is also valid.
Another way to implement this
An alternative way to implement your solution is to use a tf.placeholder to feed in new values on every iteration, as follows.
import tensorflow as tf
import numpy as np
training_input = tf.Variable(tf.zeros(shape=[3, 2]))
tf_random = tf.placeholder(shape=[3, 2], dtype=tf.float32)
assign_training_input_with_random_values = training_input.assign(tf_random)
# Create a session, initialize a bunch of variables, construct a neural network...
epoch = 0
with tf.Session() as sess:
    while epoch < 10:
        epoch += 1
        sess.run(assign_training_input_with_random_values, feed_dict={tf_random: np.random.normal(size=(3, 2))})
Comparing solution 1 with my solution
It turns out that neither your first solution nor my solution grows the graph. If you run the line
print([n.name for n in tf.get_default_graph().as_graph_def().node])
for your first solution and for my solution (be careful to call tf.reset_default_graph() at the beginning), you'll see that the number of nodes remains constant regardless of the number of iterations. In both cases the assign op is created once, before the loop, so each sess.run call only executes nodes that already exist; the random op in your first solution is simply re-evaluated on every run.
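The difference between the two ways of writing the loop can be checked by counting graph operations. The sketch below is an illustration under assumptions: it uses the TF 1.x API via tensorflow.compat.v1, a 3x2 shape, and two helper functions (ops_when_defined_once, ops_when_defined_in_loop) that are made up for this example.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x


def ops_when_defined_once(iterations):
    """Create the assign op once, run it repeatedly; return (op count, values seen)."""
    graph = tf.Graph()
    with graph.as_default():
        training_input = tf.Variable(tf.zeros([3, 2]))
        assign_op = training_input.assign(tf.random_normal([3, 2]))
        init_op = tf.global_variables_initializer()
    samples = []
    with tf.Session(graph=graph) as sess:
        sess.run(init_op)
        for _ in range(iterations):
            sess.run(assign_op)  # re-evaluates the random op on every call
            samples.append(sess.run(training_input))
    return len(graph.get_operations()), samples


def ops_when_defined_in_loop(iterations):
    """Create a fresh assign op per iteration, as in the slow snippet; return op count."""
    graph = tf.Graph()
    with graph.as_default():
        training_input = tf.Variable(tf.zeros([3, 2]))
        for _ in range(iterations):
            training_input.assign(tf.random_normal([3, 2]))  # adds new nodes each time
    return len(graph.get_operations())


once_1, _ = ops_when_defined_once(1)
once_5, samples = ops_when_defined_once(5)
loop_1 = ops_when_defined_in_loop(1)
loop_5 = ops_when_defined_in_loop(5)
print("defined once:", once_1, once_5)
print("defined in loop:", loop_1, loop_5)
```

The op count stays flat when the op is defined once and climbs with the iteration count when it is defined inside the loop, while the values read back from the variable still differ from run to run.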

TensorFlow: Initial value without shape

I tried to implement the following code.
import tensorflow as tf
a = tf.placeholder(tf.int32)
b = tf.placeholder(tf.int32)
def initw(a, b):
    return tf.Variable(tf.sign(tf.random_uniform(shape=[a, b], minval=-1.0, maxval=1.0)))
bla = initw(a, b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([bla], feed_dict={a: 2, b: 2}))
But I keep getting an error which states:
ValueError: initial_value must have a shape specified: Tensor("Sign:0",shape=(?, ?), dtype=float32)
Can someone tell me what I am doing wrong here? I really don't see what causes the error.
EDIT:
I want to use initw(a,b) to initialize the weights of a network. I want to be able to do something like:
weights = {
    "h1": tf.get_variable("h1", initializer=initw(a, b).initialized_value())
}
Where a and b are the height and width of a matrix.
In my eyes the error message is actually quite precise, but I understand your confusion. You probably do not yet have a clear picture of how TensorFlow works under the hood. You might want to start reading here.
The shapes in the computational graph must be known before runtime. Typically only one axis of a placeholder is left unspecified when the graph is built; at runtime it is then treated as the batch dimension. A variable, by contrast, needs an initial value whose shape is fully specified, which is exactly what the error message says.
In your case you are trying to use placeholders to specify the dimensions of a variable, which is impossible because the graph cannot be built this way.
I don't know what you are trying to do with this, but I would guess there is a way to achieve what you need. You can actually use the length of the batch dimension dynamically to draw a uniform vector of that size.
Edit: After you updated the question, I feel my suspicion was right. There is no need for a and b to be placeholders; just make them Python variables, like this:
import tensorflow as tf
# Matrix shape must be known in advance, but can of course still be specified
# in some settings file or at the beginning of the Python script
A = 2
B = 2
W = tf.Variable(tf.sign(tf.random_uniform(shape=(A, B), minval=-1.0,
                                          maxval=1.0)))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W))
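The earlier remark about using the batch dimension dynamically can be sketched as follows. This is only an illustration under assumptions: it uses the TF 1.x API via tensorflow.compat.v1, a made-up feature width of 4, and a plain tensor rather than a variable, since a variable would still need a fully specified shape.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x

graph = tf.Graph()
with graph.as_default():
    # Only the batch axis is unknown when the graph is built.
    x = tf.placeholder(tf.float32, shape=[None, 4])
    # tf.shape reads the actual size at runtime, unlike the static x.shape.
    batch_size = tf.shape(x)[0]
    # A random +/-1 tensor whose first dimension follows the fed batch.
    random_signs = tf.sign(
        tf.random_uniform(shape=tf.stack([batch_size, 4]), minval=-1.0, maxval=1.0))

with tf.Session(graph=graph) as sess:
    out_small = sess.run(random_signs, feed_dict={x: np.zeros((2, 4), dtype=np.float32)})
    out_large = sess.run(random_signs, feed_dict={x: np.zeros((5, 4), dtype=np.float32)})

print(out_small.shape, out_large.shape)
```

This covers values that are redrawn on every run; for trainable weights the shape still has to be fixed in Python, as in the code above.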

Does tensorflow create a new numpy array each time it calls compute_gradients()?

A typical training loop in TensorFlow may be as follows:
cg = opt.compute_gradients(loss)
grads = [None] * len(cg)
for i, gv in enumerate(cg):
    grads[i] = gv[0]
# ... do some process to grads ...
apply_gradients = opt.apply_gradients(cg)
while (...):
    gradients = sess.run(grads)
    feed = dict()
    for i, grad_var in enumerate(cg):
        feed[grad_var[0]] = gradients[i]
    sess.run(apply_gradients, feed_dict=feed)
Each time sess.run(grads) is called, a new NumPy array gradients (with newly allocated memory) is generated. I want to use a fixed NumPy array for all training iterations. How can I do that?
The tf.train.Optimizer.compute_gradients() method should not create any new NumPy arrays: instead it builds a graph of TensorFlow operations for computing the gradients of the loss with respect to some or all of the variables in your model. The return value is not a NumPy array; it is a list of pairs, each consisting of a gradient tf.Tensor and the corresponding tf.Variable to which that gradient should be applied.
Nevertheless, it is usually wasteful of memory to call opt.compute_gradients() inside a loop. It's hard to say whether this will work exactly without seeing more of your code, but you should be able to move the call to opt.compute_gradients() before the loop, since it does not seem to depend on anything computed inside the loop. This will avoid building a new segment of TensorFlow graph in each loop iteration, and should reduce the memory cost.
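As a rough illustration of building the gradient ops once and keeping the processing inside the graph, the sketch below clips the gradients with a TensorFlow op instead of round-tripping them through NumPy. It assumes the TF 1.x API via tensorflow.compat.v1; the toy loss, the learning rate and the clipping range are invented for the example.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x

graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(3.0)
    loss = tf.square(x)  # toy loss, d(loss)/dx = 2x
    opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

    # Built once, outside the loop: a list of (gradient tensor, variable) pairs.
    grads_and_vars = opt.compute_gradients(loss)
    # Process the gradients with graph ops, so no NumPy array is handed back to Python.
    clipped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars]
    train_op = opt.apply_gradients(clipped)
    init_op = tf.global_variables_initializer()

ops_before = len(graph.get_operations())
with tf.Session(graph=graph) as sess:
    sess.run(init_op)
    for _ in range(5):
        sess.run(train_op)  # only executes existing nodes
    final_x = sess.run(x)
ops_after = len(graph.get_operations())

print(final_x, ops_before, ops_after)
```

If the processing really has to happen in NumPy, fetching and feeding as in the question still works, but the fetched arrays are allocated by sess.run on every call.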

Tensorflow RNN weight matrices initialization

I'm using bidirectional_rnn with GRUCell but this is a general question regarding the RNN in Tensorflow.
I couldn't find how the weight matrices (input to hidden, hidden to hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?
EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I don't currently know how to do that without saving all the states and restoring the entire model.
Thanks.
How to initialize weight matrices for RNN?
I believe people typically use random normal initialization for the weight matrices of an RNN. Check out the example in the TensorFlow GitHub repo. The notebook is a bit long, but it contains a simple LSTM model that uses tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (I have also tried tf.ones for the biases, and that seems to work too). I believe the standard deviation is a hyperparameter you can tune yourself. Weight initialization can matter for gradient flow. That said, as far as I know the LSTM itself is designed to mitigate the vanishing-gradient problem (and gradient clipping helps with exploding gradients), so perhaps you don't need to be especially careful with the standard deviation in an LSTM. I've read papers recommending Xavier initialization (TF API doc for Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it for RNNs, but you could try it to see if it helps.
Now to follow up on @Allen's answer and the follow-up question you left in the comments.
How to control initialization with variable scope?
I'll use the simple LSTM model in the TensorFlow GitHub Python notebook that I linked to as an example.
Specifically, if I wanted to refactor the LSTM part of the code in the picture above using variable scopes, I might write something like the following:
import tensorflow as tf

def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''Initialize LSTMcell weights and biases, then set the variables to reuse mode.'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                # Pass the initializer by keyword: the third positional
                # argument of tf.get_variable is dtype, not initializer.
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                # This line can probably be omitted, because setting the 'LSTMcell'
                # scope to reuse below turns on reuse mode for all of its child scopes.
                gate_scope.reuse_variables()
        scope.reuse_variables()

def get_scope_variables(scope_name, variable_names):
    '''A helper function to fetch variables by scope name and variable name.'''
    vars = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            var = tf.get_variable(var_name)
            vars[var_name] = var
    return vars

def LSTMcell(i, o, state):
    '''Perform the LSTMcell computation.'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            vars = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state
The refactored code would be used something like this:
initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)
The refactored code may look less straightforward, but using variable scopes ensures encapsulation and allows flexible control over variables (in my opinion, at least).
How can you pre-train some LSTMs and use their weights in a subsequent model without saving all the states and restoring the entire model?
Assuming you have a pre-trained model frozen and loaded, and you want to use its frozen 'wx', 'wt' and 'bi', you can simply find their parent scope names and variable names, then fetch the variables using a structure similar to the get_scope_variables function:
with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)
Here is a link to understanding variable scope and sharing variables. I hope this is helpful.
The RNN models will create their variables with get_variable, and you can control the initialization by wrapping the code which creates those variables with a variable_scope and passing a default initializer to it. Unless the RNN specifies one explicitly (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.
You should also be able to share model weights by declaring the second model and passing reuse=True to its variable_scope. As long as the namespaces match up, the new model will get the same variables as the first model.
A simple way to initialize all kernel weights with a particular initializer is to pass the initializer to tf.variable_scope(). For example:
with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
    basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
    outputs, state = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
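The two mechanisms described in these answers, a scope-level default initializer and sharing through reuse=True, can be seen in isolation with plain tf.get_variable calls. The sketch below is an assumption-laden illustration: it uses the TF 1.x API via tensorflow.compat.v1, a made-up scope name "model", and a constant initializer chosen only so that the result is easy to check.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x

graph = tf.Graph()
with graph.as_default():
    # Any get_variable call inside this scope that does not name its own
    # initializer falls back to the scope's default initializer.
    with tf.variable_scope("model", initializer=tf.constant_initializer(0.5)):
        w = tf.get_variable("w", shape=[2, 3])

    # Re-entering the same scope with reuse=True fetches the existing variable
    # instead of creating a new one.
    with tf.variable_scope("model", reuse=True):
        w_shared = tf.get_variable("w")

    init_op = tf.global_variables_initializer()

with tf.Session(graph=graph) as sess:
    sess.run(init_op)
    w_value = sess.run(w_shared)

print(w.name, w_shared.name)
print(w_value)
```

RNN cells that create their weights through get_variable pick up the scope's initializer in the same way, which is what the tf.variable_scope('rnn', ...) example relies on.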

How can I feed a numpy array to a prefetch and buffer pipeline of TensorFlow

I tried to follow the CIFAR-10 example. However, I want to replace the file reading with a NumPy array. There are a few benefits to doing that:
Simpler code (I want to remove the binary file parsing)
Simpler graph and visualization, which is easier to explain to other audiences
Small performance improvement (from avoiding I/O and parsing)?
What would be a simple way to do it?
You need to get a handle on the tensor reshaped_image by either:
giving it a name, or
finding its default name, with TensorBoard for instance.
reshaped_image = tf.cast(read_input.uint8image, tf.float32, name="float_image")
Then you can feed your NumPy array using a feed_dict like:
# Tensor names carry an output index: the first output of the op "float_image" is "float_image:0".
reshaped_image = tf.get_default_graph().get_tensor_by_name("float_image:0")
sess.run(loss, feed_dict={reshaped_image: your_numpy})
The same goes for labels.
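A self-contained sketch of feeding an intermediate tensor by name is given below. It is not the CIFAR-10 pipeline itself: a small constant stands in for the file-reading part, a sum stands in for the loss, and the TF 1.x API is assumed via tensorflow.compat.v1.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: needed when running on TF 2.x

graph = tf.Graph()
with graph.as_default():
    # Stand-in for the image produced by the file reader (hypothetical data).
    file_image = tf.constant([[1, 2], [3, 4]], dtype=tf.uint8)
    tf.cast(file_image, tf.float32, name="float_image")

    # Look the tensor up by name; ":0" selects the first output of the op.
    float_image = graph.get_tensor_by_name("float_image:0")
    # Stand-in for everything downstream of the image (e.g. the loss).
    total = tf.reduce_sum(float_image)

your_numpy = np.ones((2, 2), dtype=np.float32)

with tf.Session(graph=graph) as sess:
    from_pipeline = sess.run(total)  # uses the upstream "reader"
    # Feeding the intermediate tensor overrides whatever the upstream ops produce.
    from_numpy = sess.run(total, feed_dict={float_image: your_numpy})

print(from_pipeline, from_numpy)
```

When a tensor is fed this way, the ops upstream of it are not needed for that run, which is what lets a NumPy array take the place of the reader output.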