I have a similar problem to TensorFlow: varscope.reuse_variables().
I am doing cross-validation on a dataset.
At each fold I call a function, e.g. myFunctionInFile1(), with new data (due to limited space, I am omitting the data-assignment details). This function lives in a different Python file (file1), so I import it into my main Python file (file2). It builds a complete CNN and trains and tests a model on the given training and testing data with freshly initialized parameters.
From the main file (file2), on the first fold myFunctionInFile1 is called, the CNN model is trained and tested, and the results are returned to file2. However, on the second call with new data, the following code:
def myFunctionInFile1():
    # Nodes for the input variables
    x = tf.placeholder("float", shape=[None, D], name='Input_data')
    y_ = tf.placeholder(tf.int64, shape=[None], name='Ground_truth')
    keep_prob = tf.placeholder("float")
    bn_train = tf.placeholder(tf.bool)  # Boolean value to guide batchnorm

    def bias_variable(shape, name):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name=name)

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    with tf.name_scope("Reshaping_data") as scope:
        x_image = tf.reshape(x, [-1, D, 1, 1])

    initializer = tf.contrib.layers.xavier_initializer()

    """Build the graph"""
    # ewma is the decay for which we update the moving average of the
    # mean and variance in the batch-norm layers
    with tf.name_scope("Conv1") as scope:
        # reuse = tf.AUTO_REUSE
        W_conv1 = tf.get_variable("Conv_Layer_1", shape=[5, 1, 1, num_filt_1],
                                  initializer=initializer)
        b_conv1 = bias_variable([num_filt_1], 'bias_for_Conv_Layer_1')
        a_conv1 = conv2d(x_image, W_conv1) + b_conv1

    with tf.name_scope('Batch_norm_conv1') as scope:
        a_conv1 = tf.contrib.layers.batch_norm(a_conv1, is_training=bn_train,
                                               updates_collections=None)
gives me the following error:
ValueError: Variable BatchNorm_2/beta does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?
What is the problem? In C/C++/Java, the local variables of a function are destroyed when it returns, so each new call should create a fresh set of parameters. Why does TensorFlow raise this error instead, and how can I fix it?
TensorFlow layers like batch_norm are implemented using tf.get_variable. tf.get_variable has a reuse argument (which it can also inherit from an enclosing variable_scope), defaulting to False; when called with reuse=False it always creates variables. You can call it with reuse=True, which means it will reuse existing variables or fail if they do not exist.
In your case batch_norm is effectively being called with reuse=True before its variables exist, so it cannot create them. Try setting reuse=False in your variable scope or, as the error message suggests, use tf.AUTO_REUSE.
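A minimal sketch of the tf.AUTO_REUSE route (assuming D and num_filt_1 are defined in file1 as in your code; the scope name is made up, and the bias is switched to tf.get_variable so that it participates in reuse):

def myFunctionInFile1():
    # Everything that creates variables sits inside one scope with AUTO_REUSE,
    # so the first call creates the variables and later calls reuse them.
    with tf.variable_scope("cnn_model", reuse=tf.AUTO_REUSE):
        x = tf.placeholder("float", shape=[None, D], name='Input_data')
        bn_train = tf.placeholder(tf.bool)
        x_image = tf.reshape(x, [-1, D, 1, 1])

        W_conv1 = tf.get_variable("Conv_Layer_1", shape=[5, 1, 1, num_filt_1],
                                  initializer=tf.contrib.layers.xavier_initializer())
        b_conv1 = tf.get_variable("bias_for_Conv_Layer_1", shape=[num_filt_1],
                                  initializer=tf.constant_initializer(0.1))
        a_conv1 = tf.nn.conv2d(x_image, W_conv1,
                               strides=[1, 1, 1, 1], padding='SAME') + b_conv1

        # batch_norm inherits the enclosing scope's reuse setting, so its
        # beta/moving-average variables are created once and reused afterwards.
        a_conv1 = tf.contrib.layers.batch_norm(a_conv1, is_training=bn_train,
                                               updates_collections=None)
        # ... rest of the network ...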
Related
I am trying to implement a noisy linear layer in TensorFlow, inheriting from tf.keras.layers.Layer. Everything works fine except for reusing variables. This seems to stem from some issue with the scoping: whenever I use the add_weight function from the superclass and a weight with the same name already exists, it seems to ignore the reuse flag set on the scope and creates a new variable instead. Interestingly, it does not append a 1 to the variable name as it usually does in similar cases, but rather appends the 1 to the scope name.
import tensorflow as tf

class NoisyDense(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        self.output_dim = output_dim
        super(NoisyDense, self).__init__()

    def build(self, input_shape):
        self.input_dim = input_shape.as_list()[1]
        self.noisy_kernel = self.add_weight(name='noisy_kernel',
                                            shape=(self.input_dim, self.output_dim))

def noisydense(inputs, units):
    layer = NoisyDense(units)
    return layer.apply(inputs)

inputs = tf.placeholder(tf.float32, shape=(1, 10), name="inputs")

scope = "scope"
with tf.variable_scope(scope):
    inputs3 = noisydense(inputs, 1)
    my_variable = tf.get_variable("my_variable", [1, 2, 3], trainable=True)

with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
    inputs2 = noisydense(inputs, 1)
    my_variable = tf.get_variable("my_variable", [1, 2, 3], trainable=True)

tvars = tf.trainable_variables()
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    tvars_vals = sess.run(tvars)
    for var, val in zip(tvars, tvars_vals):
        print(var.name, val)
This results in the variables
scope/noisy_dense/noisy_kernel:0
scope_1/noisy_dense/noisy_kernel:0
scope/my_variable:0
being printed. I would like it to reuse the noisy kernel instead of creating a second one, as it is done for my_variable.
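One workaround (a sketch, not from the original post) is to get the sharing from the Keras side instead of from variable_scope: build the layer object once and call it on both inputs, so both calls use the same noisy_kernel.

layer = NoisyDense(1)     # build the layer object once
inputs2 = layer(inputs)   # first call creates noisy_kernel
inputs3 = layer(inputs)   # second call reuses the same noisy_kernel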
I am using TensorFlow as a part of a larger system where I want to apply the gradient updates in batches. Ideally I'd like to do something along the lines of (in pseudo-code):
grads_and_vars = tf.gradients(loss, [vars])

list_of_losses = [2, 1, 3, ...]

for loss_vals in list_of_losses:
    tf.apply_gradients(grads_and_vars, feed_dict={loss: loss_vals})
My loss function depends on earlier predictions from my neural network and takes a long time to compute, hence my need for this.
When you call tf.gradients, the grad_ys argument lets you specify custom values for the upstream backprop. If you don't specify them, you end up with a node that assumes the upstream backprop is a tensor of 1's (a Fill node). So you could either call tf.gradients with a placeholder that lets you specify custom upstream values, or just feed the Fill node directly.
I.e.:
tf.reset_default_graph()
a = tf.constant(2.)
b = tf.square(a)
grads = tf.gradients(b, [a])

sess = tf.Session()
sess.run(grads, feed_dict={"gradients/Fill:0": 0})
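For completeness, the placeholder route could look roughly like this (a sketch; the placeholder name is made up):

tf.reset_default_graph()
a = tf.constant(2.)
b = tf.square(a)
# Feed the upstream gradient value at run time instead of baking in 1's.
upstream = tf.placeholder(tf.float32, shape=[], name="upstream_grad")
grads = tf.gradients(b, [a], grad_ys=upstream)

sess = tf.Session()
print(sess.run(grads, feed_dict={upstream: 0.5}))  # 0.5 * db/da = 0.5 * 4 = [2.0]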
(Posted on behalf of the OP.)
Thanks for your suggestions Yaroslav! Below is the code I put together based on your suggestions. I think this solves my problem:
tf.reset_default_graph()
with tf.Session() as sess:
    X = tf.placeholder("float", name="X")
    W = tf.Variable(1.0, name="weight")
    b = tf.Variable(0.5, name="bias")
    pred = tf.sigmoid(tf.add(tf.multiply(X, W), b))

    opt = tf.train.AdagradOptimizer(1.0)
    gvs = tf.gradients(pred, [W, b], grad_ys=0.5)
    train_step = opt.apply_gradients(zip(gvs, [W, b]))

    tf.global_variables_initializer().run()

    for i in range(50):
        val, _ = sess.run([pred, train_step], feed_dict={X: 2})
        print(val)
Currently I have the following code:
init_state = tf.Variable(tf.zeros([batch_partition_length, state_size])) # -> [16, 1024].
final_state = tf.Variable(tf.zeros([batch_partition_length, state_size]))
And inside my inference method, which is responsible for producing the output, I have the following:
def inference(frames):
    # Note that I write final_state as a global variable to avoid the shadowing issue,
    # since it is referenced at the dynamic_rnn line.
    global final_state
    # .... Here we have some conv layers and so on...

    # Now the RNN cell
    with tf.variable_scope('local1') as scope:
        # Move everything into depth so we can perform a single matrix multiply.
        shape_d = pool3.get_shape()
        shape = shape_d[1] * shape_d[2] * shape_d[3]
        # tf_shape = tf.stack(shape)
        tf_shape = 1024

        print("shape:", shape, shape_d[1], shape_d[2], shape_d[3])

        # So note that tf_shape = 1024, which means that 1024 features are fed into the network, and
        # the batch size = 1024. Therefore, the aim is to divide the batch_size into num_steps so that
        reshape = tf.reshape(pool3, [-1, tf_shape])

        # Now we need to reshape/divide the batch_size into num_steps so that we would be feeding a sequence
        rnn_inputs = tf.reshape(reshape, [batch_partition_length, step_size, tf_shape])
        print('RNN inputs shape: ', rnn_inputs.get_shape())  # -> (16, 64, 1024).

        cell = tf.contrib.rnn.BasicRNNCell(state_size)
        # note that rnn_outputs are the outputs but not multiplied by W.
        rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)

    # linear Wx + b
    with tf.variable_scope('softmax_linear') as scope:
        weight_softmax = \
            tf.Variable(
                tf.truncated_normal([state_size, n_classes], stddev=1 / state_size,
                                    dtype=tf.float32, name='weight_softmax'))
        bias_softmax = tf.constant(0.0, tf.float32, [n_classes], name='bias_softmax')
        softmax_linear = tf.reshape(
            tf.matmul(tf.reshape(rnn_outputs, [-1, state_size]), weight_softmax) + bias_softmax,
            [batch_size, n_classes])
        print('Output shape:', softmax_linear.get_shape())

    return softmax_linear
# Here we define the loss, accuracy and the optimizer.

# now run the graph:
with tf.Session() as sess:
    _, accuracy_train, loss_train, summary = \
        sess.run([optimizer, accuracy, cost_scalar, merged],
                 feed_dict={x: image_batch,
                            y_valence: valences,
                            confidence_holder: confidences})
    ....
Problem: How can I assign init_state the value stored in final_state? That is, how do I update one Variable from the value of another?
I have used the following:
tf.assign(init_state, final_state.eval())
inside the session, after running the sess.run command. However, this throws an error:
You must feed a value for placeholder tensor 'inputs' with dtype float
where the placeholder "inputs" is declared as follows:
x = tf.placeholder(tf.float32, [None, 112, 112, 3], name='inputs')
And the feeding is done after reading the images from the tfRecords through the following command:
example = tf.train.Example()
example.ParseFromString(string_record)

height = int(example.features.feature['height']
             .int64_list
             .value[0])
width = int(example.features.feature['width']
            .int64_list
            .value[0])
img_string = (example.features.feature['image_raw']
              .bytes_list
              .value[0])

img_1d = np.fromstring(img_string, dtype=np.uint8)
reconstructed_img = img_1d.reshape((height, width, -1))  # This is added to the image_batch list, which is fed into the placeholder.
And if I try the following:
img_1d = np.fromstring(img_string, dtype=np.float32)
This will produce the following error:
ValueError: cannot reshape array of size 9408 into shape (112,112,newaxis)
Any help is much appreciated!!
So here are the mistakes I made. After some revision I figured out the following:
I shouldn't create final_state as a tf.Variable. Since tf.nn.dynamic_rnn returns its state as a tensor (evaluated to an ndarray at run time), I should not instantiate final_state at the beginning, and I should not use global final_state inside the function definition.
In order to assign final_state to the initial state, I used:
tf.assign(init_state, final_state)
And things work out.
Note: in TensorFlow, running an operation returns the data as a numpy array in Python and as a tensorflow::Tensor in C and C++.
Have a look at https://www.tensorflow.org/versions/r0.10/get_started/basic_usage for more information.
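To make the fix concrete, here is a minimal sketch of the pattern described above (sizes are taken from the comments in the question; the variable and op names are illustrative):

import tensorflow as tf

state_size, batch_partition_length, step_size, tf_shape = 1024, 16, 64, 1024

# init_state stays a Variable; final_state is just the tensor dynamic_rnn returns.
init_state = tf.Variable(tf.zeros([batch_partition_length, state_size]),
                         trainable=False)
rnn_inputs = tf.placeholder(tf.float32,
                            [batch_partition_length, step_size, tf_shape])
cell = tf.contrib.rnn.BasicRNNCell(state_size)
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs,
                                             initial_state=init_state)

# Running this op after a training step copies final_state into init_state,
# so the next sess.run starts from where the previous one ended.
carry_state = tf.assign(init_state, final_state)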
I want to initialize the weights variable to include the batch-size dimension, which will differ between the training and prediction stages. I tried using a placeholder for that, but it doesn't seem to work:
batchsize = tf.placeholder(tf.int32, name='batchsize', shape=[])
...
output, state = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=inState)
weights = tf.Variable(tf.truncated_normal([batchsize, CELL_SIZE, 1], 0.0, 1.0), name='weights')
bias = tf.Variable(tf.zeros(1), name='bias')
preds = tf.add(tf.matmul(output, weights), bias, name='preds')
loss = tf.reduce_mean(tf.squared_difference(preds, Y_))
train_step = tf.train.AdamOptimizer(LR).minimize(loss)
I can get it to work by specifying batchsize as a constant for the weights variable dimension, rather than a placeholder, but then I get an error when I try to restore the session for the prediction stage, because there the batch size is 1. If I specify the placeholder, I get the error:
ValueError: initial_value must have a shape specified: Tensor("truncated_normal:0", shape=(?, 32, 1), dtype=float32)
Even though I do pass the value for the batchsize placeholder into the feed_dict when running this part of the graph.
If I specify the option validate_shape=False while creating the weights variable, that stage of the graph works, but later I get this error in AdamOptimizer:
ValueError: as_list() is not defined on an unknown TensorShape.
How can I get this to work? My ultimate goal is to reduce the Cell-Size dimension of the dynamic_rnn output down to 1 to predict the output at each time-step of the RNN.
Make the variable at its full (maximum) size, then gather the specific slice of it corresponding to the actual batch size (using tf.gather):
self.model_X = tf.placeholder(dtype=tf.float32, shape=[None, 100], name='X')
real_batch_size = tf.cast(tf.shape(self.model_X)[0],tf.int32)
self.y_dk = tf.get_variable(name="y_dk",initializer=tf.truncated_normal(shape=[self.num_doc, self.num_topic], mean=0, stddev=tf.truediv(1.0,self.lambda_y)), dtype=tf.float32)
batch_y_dk = tf.reshape(tf.gather(self.y_dk, self.model_batch_data_idx), [real_batch_size, self.num_topic])
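A self-contained sketch of the same trick, with made-up names and sizes: allocate the per-example weights at the maximum batch size, then gather the rows for the batch that is actually fed.

import tensorflow as tf

MAX_BATCH, CELL_SIZE = 128, 32
X = tf.placeholder(tf.float32, shape=[None, CELL_SIZE], name='X')
real_batch_size = tf.shape(X)[0]

full_weights = tf.get_variable(
    'full_weights', shape=[MAX_BATCH, CELL_SIZE, 1],
    initializer=tf.truncated_normal_initializer(0.0, 1.0))
# Rows 0 .. real_batch_size-1 of the full variable, shaped [batch, CELL_SIZE, 1].
batch_weights = tf.gather(full_weights, tf.range(real_batch_size))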
I read from here that it is recommended to always use tf.get_variable(...) although this seems a bit troublesome when I'm trying to implement a network.
For example:
def create_weights(shape, name='weights',
                   initializer=tf.random_normal_initializer(0, 0.1)):
    weights = tf.get_variable(name, shape, initializer=initializer)
    print("weights created named: {}".format(weights.name))
    return weights

def LeNet(in_units, keep_prob):
    # define the network
    with tf.variable_scope("conv1"):
        conv1 = conv(in_units, create_weights([5, 5, 3, 32]), create_bias([32]))
        pool1 = maxpool(conv1)
    with tf.variable_scope("conv2"):
        conv2 = conv(pool1, create_weights([5, 5, 32, 64]), create_bias([64]))
        pool2 = maxpool(conv2)

    # reshape the network to feed it into the fully connected layers
    with tf.variable_scope("flatten"):
        flatten = tf.reshape(pool2, [-1, 1600])
        flatten = dropout(flatten, keep_prob)
    with tf.variable_scope("fc1"):
        fc1 = fc(flatten, create_weights([1600, 120]), biases=create_bias([120]))
        fc1 = dropout(fc1, keep_prob)
    with tf.variable_scope("fc2"):
        fc2 = fc(fc1, create_weights([120, 84]), biases=create_bias([84]))
    with tf.variable_scope("logits"):
        logits = fc(fc2, create_weights([84, 43]), biases=create_bias([43]))

    return logits
I have to use with tf.variable_scope(...) every single time I call create_weights, and furthermore, if I wanted to change the conv1 weights to [7, 7, 3, 32] instead of [5, 5, 3, 32], I would have to restart the kernel because the variable already exists. On the other hand, if I used tf.Variable(...) I wouldn't have any of these problems.
Am I using tf.variable_scope(...) incorrectly?
It seems that you cannot change what already exists in a variable scope, so only by restarting the kernel can you change a variable you defined before (in fact you create a new one, because the previous one has been deleted).
...
That is only my guess; I would appreciate it if someone could give a detailed answer.
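(Not part of the guess above, only a sketch: in TF1 the variable store is tied to the default graph, so resetting the graph lets you redefine a variable with a new shape without restarting the kernel.)

import tensorflow as tf

tf.reset_default_graph()   # drops the old graph together with its variable store
with tf.variable_scope("conv1"):
    # The same name can now be created again, with a different shape.
    w = tf.get_variable("weights", [7, 7, 3, 32],
                        initializer=tf.random_normal_initializer(0, 0.1))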