Understanding Variable scope example in Tensorflow - tensorflow

I was looking at the mechanics section for Tensorflow, specifically on shared variables. In the section "The problem", they are dealing with a convolutional neural net, and provide the following code (which runs an image through the model):
# First call creates one set of variables.
result1 = my_image_filter(image1)
# Another set is created in the second call.
result2 = my_image_filter(image2)
If the model was implemented in such a way, would it then be impossible to learn/update the parameters because there's a new set of parameters for each image in my training set?
Edit:
I've also tried "the problem" approach on a simple linear regression example, and there do not appear to be any issues with this method of implementation. Training seems to work as well as can be shown by the last line of the code. So I'm wondering if there is a subtle discrepancy in the tensorflow documentation and what I'm doing. :
import tensorflow as tf
import numpy as np
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33 # create a y value which is approximately linear but with some random noise
X = tf.placeholder("float") # create symbolic variables
Y = tf.placeholder("float")
def model(X):
with tf.variable_scope("param"):
w = tf.Variable(0.0, name="weights") # create a shared variable (like theano.shared) for the weight matrix
return tf.mul(X, w) # lr is just X*w so this model line is pretty simple
y_model = model(X)
cost = (tf.pow(Y-y_model, 2)) # use sqr error for cost function
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost) # construct an optimizer to minimize cost and fit line to my data
sess = tf.Session()
init = tf.initialize_all_variables() # you need to initialize variables (in this case just variable W)
sess.run(init)
with tf.variable_scope("train"):
for i in range(100):
for (x, y) in zip(trX, trY):
sess.run(train_op, feed_dict={X: x, Y: y})
print sess.run(y_model, feed_dict={X: np.array([1,2,3])})

One has to create the variable set only once per whole training (and testing) set. The goal of variable scopes is to allow for modularization of subsets of parameters, such as those belonging to layers (e.g. when architecture of a layer is repeated, the same names can be used within each layer scope).
In your example you create parameters only in the model function. You can print out your variable names to see that it is assigned to the specified scope:
from __future__ import print_function
X = tf.placeholder("float") # create symbolic variables
Y = tf.placeholder("float")
print("X:", X.name)
print("Y:", Y.name)
def model(X):
with tf.variable_scope("param"):
w = tf.Variable(0.0, name="weights") # create a shared variable (like theano.shared) for the weight matrix
print("w:", w.name)
return tf.mul(X, w)
The call to sess.run(train_op, feed_dict={X: x, Y: y}) only evaluates the value of train_op given the provided values of X and Y. No new variables (incl. parameters) are created there; therefore, it has no effect. You can make sure the variable names stay the same by again printing them out:
with tf.variable_scope("train"):
print("X:", X.name)
print("Y:", Y.name)
for i in range(100):
for (x, y) in zip(trX, trY):
sess.run(train_op, feed_dict={X: x, Y: y})
You will see that variable names stay the same, as they are already initialized.
If you'd like to retrieve a variable using its scope, you need to use get_variable within a tf.variable_scope enclosure:
with tf.variable_scope("param"):
w = tf.get_variable("weights", [1])
print("w:", w.name)

Related

Tensorflow: When should I use or not use `feed_dict`?

I am kind of confused why are we using feed_dict? According to my friend, you commonly use feed_dict when you use placeholder, and this is probably something bad for production.
I have seen code like this, in which feed_dict is not involved:
for j in range(n_batches):
X_batch, Y_batch = mnist.train.next_batch(batch_size)
_, loss_batch = sess.run([optimizer, loss], {X: X_batch, Y:Y_batch})
I have also seen code like this, in which feed_dict is involved:
for i in range(100):
for x, y in data:
# Session execute optimizer and fetch values of loss
_, l = sess.run([optimizer, loss], feed_dict={X: x, Y:y})
total_loss += l
I understand feed_dict is that you are feeding in data and try X as the key as if in the dictionary. But here I don't see any difference. So, what exactly is the difference and why do we need feed_dict?
In a tensorflow model you can define a placeholder such as x = tf.placeholder(tf.float32), then you will use x in your model.
For example, I define a simple set of operations as:
x = tf.placeholder(tf.float32)
y = x * 42
Now when I ask tensorflow to compute y, it's clear that y depends on x.
with tf.Session() as sess:
sess.run(y)
This will produce an error because I did not give it a value for x. In this case, because x is a placeholder, if it gets used in a computation you must pass it in via feed_dict. If you don't it's an error.
Let's fix that:
with tf.Session() as sess:
sess.run(y, feed_dict={x: 2})
The result this time will be 84. Great. Now let's look at a trivial case where feed_dict is not needed:
x = tf.constant(2)
y = x * 42
Now there are no placeholders (x is a constant) and so nothing needs to be fed to the model. This works now:
with tf.Session() as sess:
sess.run(y)

How to use conditions and temporary variables in tensorflow?

I want to create a temporary variable in TF and then substract it from my input variable if it is a traning phase. This is simplified code that I use. Please, could you give me a piece of advice how to make it work?
Please, keep in mind that I don't want to create a variable if it is not traning phase.
import tensorflow as tf
def some_transformation(x):
x0 = tf.get_variable('x0', initializer=tf.random_uniform([1],
maxval=0.3, dtype=tf.float32), dtype=tf.float32)
return tf.subtract(x, x0)
x = tf.placeholder("float", [])
is_traning = tf.placeholder(tf.int32, None)
x_transformed = tf.cond(is_traning > 0, lambda: some_transformation(x), lambda: x)
#x_transformed = some_transformation(x)
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
out = sess.run(x_transformed, feed_dict={x: 10, is_traning: 1})
print(out)
Please do post your error messages along with your code, after running your code I see you're getting this error:
ValueError: Initializer for variable x0/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.
That appears to be because you are trying to call tf.get_variable from inside some_transformation and it's telling you to change the property initializer=tf.random_uniform(...) to initializer=lambda: tf.random_uniform(...).
You might also opt to define x0 outside of the transformation and pass it in as:
x_transformed = tf.cond(is_traning > 0, lambda: some_transformation(x, x0), lambda: x)
If that's valid in your use case.

tensorflow modify variables in py_func (and its grad func)

In tensorflow, we can define our own op and its gradient by:
https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
However, can we modify any variable in the computational graph in these python functions. For example in the "_MySquareGrad" function?
I assume we can get the variable by:
var = tf.get_variable('var')
and then do something to change its value and then assign it back?
e.g.
tmp = var*10
var.assign(tmp)
Thanks!
Also when we do var*10, do we have to convert it to numpy?
Background: I'm familiar with automatic differentiation, but new to Tensorflow and Python. So please point out any syntactic problem and let me know if my intention is clear.
You can modify the variables in the computational graph in these python functions. Your example code with tmp = var*10 will work and does not convert anything to numpy.
In fact you should try to avoid converting to numpy as much as possible since it will slow down the computation.
edit:
You can include your code to the gradient computation graph of the _MySquareGrad function doing this:
def _MySquareGrad(op, grad):
#first get a Variable that was created using tf.get_variable()
with tf.variable_scope("", reuse=True):
var = tf.get_variable('var')
#now create the assign graph:
tmp = var*10.
assign_op = var.assign(tmp)
#now make the assign operation part of the grad calculation graph:
with tf.control_dependencies([assign_op]):
x = tf.identity(op.inputs[0])
return grad * 20 * x
Here is a working example:
import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np
# Define custom py_func which takes also a grad op as argument:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
# Need to generate a unique name to avoid duplicates:
rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": rnd_name}):
return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
# Def custom square function using np.square instead of tf.square:
def mysquare(x, name=None):
with ops.name_scope(name, "Mysquare", [x]) as name:
sqr_x = py_func(np.square,
[x],
[tf.float32],
name=name,
grad=_MySquareGrad) # <-- here's the call to the gradient
return sqr_x[0]
### Actual gradient:
##def _MySquareGrad(op, grad):
##x = op.inputs[0]
##return grad * 20 * x # add a "small" error just to see the difference:
def _MySquareGrad(op, grad):
#first get a Variable that was created using tf.get_variable()
with tf.variable_scope("", reuse=True):
var = tf.get_variable('var')
#now create the assign graph:
tmp = var*10.
assign_op = var.assign(tmp)
#now make the assign operation part of the grad calculation graph:
with tf.control_dependencies([assign_op]):
x = tf.identity(op.inputs[0])
return grad * 20 * x
with tf.Session() as sess:
x = tf.constant([1., 2.])
var = tf.get_variable(name="var", shape=[], initializer=tf.constant_initializer(0.2))
y = mysquare(x)
tf.global_variables_initializer().run()
print(x.eval(), y.eval(), tf.gradients(y, x)[0].eval())
print("Now var is 10 times larger:", var.eval())

Simple custom gradient with gradient_override_map

I want to use a function that creates weights for a normal dense layer, it basically behaves like an initialization function, only that it "initializes" before every new forward pass.
The flow for my augmented linear layer looks like this:
input = (x, W)
W_new = g(x,W)
output = tf.matmul(x,W_new)
However, g(x,W) is not differentiable, as it involves some sampling. Luckily it also doesn't have any parameters I want to learn so I just try to do the forward and backward pass, as if I would have never replaced W.
Now I need to tell the automatic differentiation to not backpropagate through g(). I do this with:
W_new = tf.stop_gradient(g(x,W))
Unfortunately this does not work, as it complains about non-matching shapes.
What does work is the following:
input = (x, W)
W_new = W + tf.stop_gradient(g(x,W) - W)
output = tf.matmul(x,W_new)
as suggested here: https://stackoverflow.com/a/36480182
Now the forward pass seems to be OK, but I don't know how to override the gradient for the backward pass. I know, that I have to use: gradient_override_map for this, but could not transfer applications I have seen to my particular usecase (I am still quite new to TF).
However, I am not sure how to do this and if there isn't an easier way. I assume something similar has to be done in the first forward pass in a given model, where all weights are initialized while we don't have to backpropagate through the init functions as well.
Any help would be very much appreciated!
Hey #jhj I too faced the same problem fortunately I found this gist. Hope this helps :)
Sample working -
import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np
Define custom py_func which takes also a grad op as argument:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
# Need to generate a unique name to avoid duplicates:
rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
tf.RegisterGradient(rnd_name)(grad) # see _MySquareGrad for grad example
g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": rnd_name, "PyFuncStateless": rnd_name}):
return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
Def custom square function using np.square instead of tf.square:
def mysquare(x, name=None):
with ops.name_scope(name, "Mysquare", [x]) as name:
sqr_x = py_func(np.square,
[x],
[tf.float32],
name=name,
grad=_MySquareGrad) # <-- here's the call to the gradient
return sqr_x[0]
Actual gradient:
def _MySquareGrad(op, grad):
x = op.inputs[0]
return grad * 20 * x # add a "small" error just to see the difference:
with tf.Session() as sess:
x = tf.constant([1., 2.])
y = mysquare(x)
tf.global_variables_initializer().run()
print(x.eval(), y.eval(), tf.gradients(y, x)[0].eval())

Tensorflow while_loop for training

In my problem I need run GD with 1 example from data on each training step. It's known problem that session.run() has overhead and therefore it is too long to train model.
In attempt to avoid overhead I tried to use while_loop and train model on all data with one run() call. But it approach don't work and train_op don't execute even ones. Below simple example of what I'm doing:
data = [k*1. for k in range(10)]
tf.reset_default_graph()
i = tf.Variable(0, name='loop_i')
q_x = tf.FIFOQueue(100000, tf.float32)
q_y = tf.FIFOQueue(100000, tf.float32)
x = q_x.dequeue()
y = q_y.dequeue()
w = tf.Variable(0.)
b = tf.Variable(0.)
loss = (tf.add(tf.mul(x, w), b) - y)**2
gs = tf.Variable(0)
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(loss, global_step=gs)
s = tf.Session()
s.run(tf.initialize_all_variables())
def cond(i):
return i < 10
def body(i):
return tf.tuple([tf.add(i, 1)], control_inputs=[train_op])
loop = tf.while_loop(cond, body, [i])
for _ in range(1):
s.run(q_x.enqueue_many((data, )))
s.run(q_y.enqueue_many((data, )))
s.run(loop)
s.close()
What I'm doing wrong? Or there is another solution of this problem with too expensive overhead?
Thanks!
The reason the model does not appear to train is because the input reading, gradient calculation, and the minimize() call are all defined outside (and hence, in dataflow terms, before) the body of the tf.while_loop(). This means that all of these parts of the model run only once, before the loop executes, and the loop itself has no effect.
A slight refactoring—to move the dequeue() operations, gradient calculation, and minimize() call inside the loop—fixes the problem and allows your program to train:
optimizer = tf.train.GradientDescentOptimizer(0.05)
def cond(i):
return i < 10
def body(i):
# Dequeue a new example each iteration.
x = q_x.dequeue()
y = q_y.dequeue()
# Compute the loss and gradient update based on the current example.
loss = (tf.add(tf.mul(x, w), b) - y)**2
train_op = optimizer.minimize(loss, global_step=gs)
# Ensure that the update is applied before continuing.
return tf.tuple([tf.add(i, 1)], control_inputs=[train_op])
loop = tf.while_loop(cond, body, [i])
UPDATE: Here's a complete program the executes the while loop, based on the code in your question:
import tensorflow as tf
# Define a single queue with two components to store the input data.
q_data = tf.FIFOQueue(100000, [tf.float32, tf.float32])
# We will use these placeholders to enqueue input data.
placeholder_x = tf.placeholder(tf.float32, shape=[None])
placeholder_y = tf.placeholder(tf.float32, shape=[None])
enqueue_data_op = q_data.enqueue_many([placeholder_x, placeholder_y])
gs = tf.Variable(0)
w = tf.Variable(0.)
b = tf.Variable(0.)
optimizer = tf.train.GradientDescentOptimizer(0.05)
# Construct the while loop.
def cond(i):
return i < 10
def body(i):
# Dequeue a single new example each iteration.
x, y = q_data.dequeue()
# Compute the loss and gradient update based on the current example.
loss = (tf.add(tf.multiply(x, w), b) - y) ** 2
train_op = optimizer.minimize(loss, global_step=gs)
# Ensure that the update is applied before continuing.
with tf.control_dependencies([train_op]):
return i + 1
loop = tf.while_loop(cond, body, [tf.constant(0)])
data = [k * 1. for k in range(10)]
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for _ in range(1):
# NOTE: Constructing the enqueue op ahead of time avoids adding
# (potentially many) copies of `data` to the graph.
sess.run(enqueue_data_op,
feed_dict={placeholder_x: data, placeholder_y: data})
print (sess.run([gs, w, b])) # Prints before-loop values.
sess.run(loop)
print (sess.run([gs, w, b])) # Prints after-loop values.