TensorFlow: When should I use or not use `feed_dict`?

I am confused about why we use feed_dict. According to my friend, you commonly use feed_dict when you use a placeholder, and this is probably bad for production.
I have seen code like this, in which feed_dict is not involved:
for j in range(n_batches):
    X_batch, Y_batch = mnist.train.next_batch(batch_size)
    _, loss_batch = sess.run([optimizer, loss], {X: X_batch, Y: Y_batch})
I have also seen code like this, in which feed_dict is involved:
for i in range(100):
    for x, y in data:
        # Session execute optimizer and fetch values of loss
        _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
        total_loss += l
I understand that feed_dict feeds in data, with X as the key, as in a dictionary. But I don't see any difference between the two snippets. So what exactly is the difference, and why do we need feed_dict?

In a TensorFlow model you can define a placeholder such as x = tf.placeholder(tf.float32), and then use x in your model.
For example, I define a simple set of operations as:
x = tf.placeholder(tf.float32)
y = x * 42
Now when I ask TensorFlow to compute y, it's clear that y depends on x.
with tf.Session() as sess:
    sess.run(y)
This will produce an error because I did not give it a value for x. Because x is a placeholder, if it gets used in a computation you must pass a value for it via feed_dict. If you don't, it's an error.
Let's fix that:
with tf.Session() as sess:
    sess.run(y, feed_dict={x: 2})
The result this time will be 84. Great. Now let's look at a trivial case where feed_dict is not needed:
x = tf.constant(2)
y = x * 42
Now there are no placeholders (x is a constant) and so nothing needs to be fed to the model. This works now:
with tf.Session() as sess:
    sess.run(y)
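As for the two snippets in the question: both of them feed data. The signature of Session.run is run(fetches, feed_dict=None, options=None, run_metadata=None), so a dictionary passed as the second positional argument is bound to feed_dict. The sketch below uses a hypothetical stand-in function named run (it is not TensorFlow code and only mirrors the first two parameters) to show that the two call styles are equivalent in plain Python:

```python
# Hypothetical stand-in for Session.run; it only mirrors the first two
# parameters (fetches, feed_dict) and does not execute any graph.
def run(fetches, feed_dict=None):
    # Return whatever was bound to feed_dict so both call styles can be compared.
    return feed_dict


batch = {"X": [1.0, 2.0], "Y": [0.0, 1.0]}

positional = run(["optimizer", "loss"], batch)          # second positional argument
keyword = run(["optimizer", "loss"], feed_dict=batch)   # explicit keyword

print(positional == keyword)  # True
```

So writing feed_dict= explicitly is a readability choice; it does not change what gets fed.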

Related

How does Tensorflow's reduce_sum work in a loop?

I cannot understand the working of reduce_sum when the optimizer is run in a loop.
I have 30 samples in my train_x and train_y lists. I run my optimizer in a loop, feeding one sample from each list per iteration. My cost function computes the sum of the squared differences between predicted and actual values for all samples using TensorFlow's reduce_sum method. According to the graph, the optimizer depends on the cost function, so the cost will be computed for every x and y. I need to know whether reduce_sum will wait for all 30 samples or take one sample (x, y) at a time. Here n_samples is 30. I also need to know whether the weights and bias will be updated for each epoch or for each x and y.
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
W = tf.Variable(np.random.randn(), name='weights')
B = tf.Variable(np.random.randn(), name='bias')
pred = X * W + B
cost = tf.reduce_sum((pred - Y) ** 2) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as sesh:
    sesh.run(init)
    for epoch in range(epochs):
        for x, y in zip(train_x, train_y):
            sesh.run(optimizer, feed_dict={X: x, Y: y})
        if not epoch % 20:
            c = sesh.run(cost, feed_dict={X: train_x, Y: train_y})
            w = sesh.run(W)
            b = sesh.run(B)
            print(f'epoch: {epoch:04d} c={c:.4f} w={w:.4f} b={b:.4f}')
"I need to know whether the reduce_sum will wait for all the 30 samples or take one sample (x, y) at a time."
tf.reduce_sum is an operation and as such it does not have any implicit mutable state. The result of tf.reduce_sum is fully defined by the model parameters (W and B) and the placeholder values explicitly provided in the feed_dict argument to the sess.run(cost, feed_dict={...}) call.
If you would like to aggregate the value of a metric across all batches, check out tf.metrics:
y_pred = tf.placeholder(tf.float32)
y_true = tf.placeholder(tf.float32)
mse, update_op = tf.metrics.mean_squared_error(y_true, y_pred)
init = tf.local_variables_initializer() # MSE state is local!
sess = tf.Session()
sess.run(init)
# Update the metric and compute the value after the update.
sess.run(update_op, feed_dict={y_pred: [0.0], y_true: [42.0]}) # => 1764.0
# Get current value.
sess.run(mse) # => 1764.0
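To make the contrast with a stateless op concrete, here is a plain-Python sketch of a running mean squared error. The class StreamingMSE and its method names are made up for illustration; it roughly mirrors the idea of keeping a running total and a count between updates, and is not how tf.metrics is implemented.

```python
class StreamingMSE:
    """Toy running mean squared error; an illustration, not tf.metrics itself."""

    def __init__(self):
        self.total = 0.0  # sum of squared errors seen so far
        self.count = 0    # number of values seen so far

    def update(self, y_true, y_pred):
        # Accumulate state, then report the metric over everything seen so far.
        for t, p in zip(y_true, y_pred):
            self.total += (t - p) ** 2
            self.count += 1
        return self.result()

    def result(self):
        return self.total / self.count if self.count else 0.0


metric = StreamingMSE()
first = metric.update([42.0], [0.0])   # 1764.0, same value as the example above
second = metric.update([0.0], [0.0])   # (1764 + 0) / 2 = 882.0
print(first, second, metric.result())
```

Unlike reduce_sum, the second result depends on the earlier call, because the accumulators persist between updates.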
"I also need to know whether the weights and bias will be updated for each epoch or for each x and y."
Each sess.run(optimizer, ...) call will compute the gradients of the trainable variables and apply these gradients to the variable values, so in your loop W and B are updated once per (x, y) pair, not once per epoch. See GradientDescentOptimizer.minimize.
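The loop from the question can be mimicked in plain Python to count the updates. This is only a sketch: the data, the learning rate, and the initial values of w and b are made up, and the gradients are written out by hand for cost = (x * w + b - y) ** 2 / (2 * n_samples) on the single fed sample.

```python
# Plain-Python stand-in for the training loop in the question.
# train_x, train_y, learning_rate and the initial w, b are made-up values.
train_x = [1.0, 2.0, 3.0]
train_y = [2.0, 4.0, 6.0]
n_samples = len(train_x)
learning_rate = 0.1
w, b = 0.0, 0.0
updates = 0

for epoch in range(2):
    for x, y in zip(train_x, train_y):
        # Only this one sample is "fed", so the sum inside the cost has one term.
        error = x * w + b - y
        grad_w = error * x / n_samples   # d(cost)/dw
        grad_b = error / n_samples       # d(cost)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        updates += 1  # one update per (x, y), not per epoch

print(updates)  # 6 = 2 epochs * 3 samples
```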

Set value of loss function when calculating/applying gradients

I am using TensorFlow as a part of a larger system where I want to apply the gradient updates in batches. Ideally I'd like to do something along the lines of (in pseudo-code):
grads_and_vars = tf.gradients(loss, [vars])
list_of_losses = [2, 1, 3, ...]
for loss_vals in list_of_losses:
    tf.apply_gradients(grads_and_vars, feed_dict={loss: loss_vals})
My loss function depends on earlier predictions from my neural network and it takes a long time to compute thus my need for this.
When you call tf.gradients, the argument grad_ys lets you specify custom values for the upstream part of the backprop graph. If you don't specify them, you end up with a node that assumes the upstream gradient is a tensor of 1's (a Fill node). So you could either call tf.gradients with a placeholder that lets you specify custom upstream values, or just feed the Fill node.
For example:
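import tensorflow as tf

tf.reset_default_graph()
a = tf.constant(2.)
b = tf.square(a)
grads = tf.gradients(b, [a])
sess = tf.Session()
# Override the default upstream gradient of ones by feeding the Fill node
sess.run(grads, feed_dict={"gradients/Fill:0": 0})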
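By the chain rule, the upstream value simply multiplies the local derivative. For b = a ** 2 at a = 2 the local derivative is 2 * a = 4, so feeding 0 as the upstream value gives a gradient of 0. The helper grad_of_square below is a made-up name and a hand-derived sketch in plain Python, not a TensorFlow call:

```python
def grad_of_square(a, upstream=1.0):
    """Gradient of b = a**2 with respect to a, scaled by an upstream gradient."""
    return upstream * 2.0 * a


print(grad_of_square(2.0))        # 4.0, default upstream of ones
print(grad_of_square(2.0, 0.0))   # 0.0, like feeding 0 to the Fill node
print(grad_of_square(2.0, 0.5))   # 2.0, like passing grad_ys=0.5
```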
(Posted on behalf of the OP.)
Thanks for your suggestions Yaroslav! Below is the code I put together based on your suggestions. I think this solves my problem:
tf.reset_default_graph()
with tf.Session() as sess:
    X = tf.placeholder("float", name="X")
    W = tf.Variable(1.0, name="weight")
    b = tf.Variable(0.5, name="bias")
    pred = tf.sigmoid(tf.add(tf.multiply(X, W), b))
    opt = tf.train.AdagradOptimizer(1.0)
    gvs = tf.gradients(pred, [W, b], grad_ys=0.5)
    train_step = opt.apply_gradients(zip(gvs, [W, b]))
    tf.global_variables_initializer().run()
    for i in range(50):
        val, _ = sess.run([pred, train_step], feed_dict={X: 2})
        print(val)

Restoring a model trained with variable input length in TensorFlow results in InvalidArgumentError

I am rather new to TensorFlow and am currently experimenting with models of varying complexity. I have a problem with the save and restore functionality of the package. As far as I understood the tutorials, I should be able to restore a trained graph and run it with some new input at some later point. However, I get the following error when I try to do just that:
InvalidArgumentError (see above for traceback): Shape [-1,10] has negative dimensions
  [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[?,10], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
My understanding of the message is that the restored graph does not like one dimension to be left arbitrary, which in turn is necessary for practical cases where I don't know beforehand how large my input will be. A code snippet as a minimal example, producing the error above, can be found below. I know how to restore each tensor individually but this gets impractical pretty quickly when the models grow in complexity. I am thankful for any help I get and apologize in case my question is stupid.
import numpy as np
import tensorflow as tf
def generate_random_input():
    alist = []
    for _ in range(10):
        alist.append(np.random.uniform(-1, 1, 100))
    return np.array(alist).T
def generate_random_target():
    return np.random.uniform(-1, 1, 100)
x = tf.placeholder('float', [None, 10])
y = tf.placeholder('float')
# the model
w1 = tf.get_variable('w1', [10, 1], dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer(seed=1))
b1 = tf.get_variable('b1', [1], dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer(seed=1))
result = tf.add(tf.matmul(x, w1), b1, name='result')
loss = tf.reduce_mean(tf.losses.mean_squared_error(predictions=result, labels=y))
optimizer = tf.train.AdamOptimizer(0.03).minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([optimizer, loss], feed_dict={x: generate_random_input(), y: generate_random_target()})
    saver.save(sess, 'file_name')
# now load the model in another session:
sess2 = tf.Session()
saver = tf.train.import_meta_graph('file_name.meta')
saver.restore(sess2, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
pred = graph.get_operation_by_name('result')
test_result = sess2.run(pred, feed_dict={x: generate_random_input()})
In the last line, you don't feed the label placeholder with data via feed_dict. So the placeholder's unknown dimension is still -1 rather than the batch size. That's the cause.
I'm having the exact same problem as you. I'm importing and testing a bunch of different CNNs with different layer sizes and testing on various datasets. You can stick your model creation in a function like so and recreate it in your other code:
def create_model():
    x = tf.placeholder('float', [None, 10])
    y = tf.placeholder('float')
    w1 = tf.get_variable('w1', [10, 1], dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable('b1', [1], dtype=tf.float32, initializer=tf.contrib.layers.xavier_initializer(seed=1))
    result = tf.add(tf.matmul(x, w1), b1, name='result')
    return x, y, result
x, y, result = create_model()
loss = tf.reduce_mean(tf.losses.mean_squared_error(predictions=result, labels=y))
optimizer = tf.train.AdamOptimizer(0.03).minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([optimizer, loss], feed_dict={x: generate_random_input(), y: generate_random_target()})
    saver.save(sess, 'file_name')
# now load the model in another session:
# Start from an empty graph so that get_variable does not complain that
# 'w1' and 'b1' already exist (not needed when this runs in a separate script)
tf.reset_default_graph()
x, y, result = create_model()
saver = tf.train.Saver()
# loss = ... if you want loss
sess2 = tf.Session()
# Now just restore the weights and run
saver.restore(sess2, 'file_name')
test_result = sess2.run(result, feed_dict={x: generate_random_input()})
This is a bit tedious if I want to import many complex architectures with different dimensions. For our situation, I don't know if there's any other way to restore an entire model than to recreate that architecture first in your second session.

Tensorflow model always produces mean

I am having trouble with fitting a very simple model in tensorflow. If I have a column of input data which is constant, my output always converges to produce the same value for all rows, which is the mean of my output data, y_, even when there is another column in x_ which has enough information to reproduce y_ exactly. Here is a small example.
import tensorflow as tf
def weight_variable(shape):
    """Initialize the weights with random weights"""
    initial = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float64)
    return tf.Variable(initial)
#Initialize my data
x = tf.constant([[1.0,1.0],[1.0,2.0],[1.0,3.0]], dtype=tf.float64)
y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64)
w = weight_variable((2,1))
y = tf.matmul(x,w)
error = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.AdamOptimizer(1e-5).minimize(error)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #Train the model and output every 1000 iterations
    for i in range(1000000):
        sess.run(train_step)
        err = sess.run(error)
        if i % 1000 == 0:
            print("\nerr:", err)
            print("x: ", sess.run(x))
            print("w: ", sess.run(w))
            print("y_: ", sess.run(y_))
            print("y: ", sess.run(y))
This example always converges to w=[2,0], and y = [2,2,2]. This is a smooth function with a minimum at w=[0,1] and y = [1,2,3], where the error function is zero. Why does it not converge to this? I have also tried using gradient descent and I have tried varying the training rate.
Your target y_ = tf.constant([1.0,2.0,3.0], dtype=tf.float64) has shape (3,), which broadcasts as (1, 3). The output of tf.matmul(x, w) has shape (3, 1). Thus y_ - y has shape (3, 3) according to NumPy broadcasting rules, so you are not optimizing the function you thought you were optimizing. Change your y_ to the following and give it a shot:
y_ = tf.constant([[1.0],[2.0],[3.0]], dtype=tf.float64)
This should converge pretty quickly to your expected answer, even with a large learning rate.
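The broadcasting also explains why the model settles on the mean. The sketch below emulates both losses in plain Python (the two helper functions are made-up names, and no TensorFlow or NumPy is involved). With the (3, 3) difference, every prediction is compared against every target, and that all-pairs loss is lower when each prediction equals the mean of the targets than when the predictions match the targets exactly.

```python
def mean_sq_broadcast(target, pred):
    """Mean of (target[j] - pred[i])**2 over ALL pairs (i, j): what a
    (3,) target minus a (3, 1) prediction computes after broadcasting."""
    diffs = [(t - p) ** 2 for p in pred for t in target]
    return sum(diffs) / len(diffs)


def mean_sq_elementwise(target, pred):
    """Mean of (target[i] - pred[i])**2: the loss that was intended."""
    diffs = [(t - p) ** 2 for t, p in zip(target, pred)]
    return sum(diffs) / len(diffs)


target = [1.0, 2.0, 3.0]
exact = [1.0, 2.0, 3.0]   # y for w = [0, 1]
mean = [2.0, 2.0, 2.0]    # y for w = [2, 0]

print(mean_sq_broadcast(target, exact))    # 12/9, about 1.33
print(mean_sq_broadcast(target, mean))     # 6/9, about 0.67
print(mean_sq_elementwise(target, exact))  # 0.0
```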

Understanding Variable scope example in Tensorflow

I was looking at the mechanics section for Tensorflow, specifically on shared variables. In the section "The problem", they are dealing with a convolutional neural net, and provide the following code (which runs an image through the model):
# First call creates one set of variables.
result1 = my_image_filter(image1)
# Another set is created in the second call.
result2 = my_image_filter(image2)
If the model was implemented in such a way, would it then be impossible to learn/update the parameters because there's a new set of parameters for each image in my training set?
Edit:
I've also tried "the problem" approach on a simple linear regression example, and there do not appear to be any issues with this method of implementation. Training seems to work as well, as shown by the last line of the code. So I'm wondering if there is a subtle discrepancy between the TensorFlow documentation and what I'm doing:
import tensorflow as tf
import numpy as np
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33 # create a y value which is approximately linear but with some random noise
X = tf.placeholder("float") # create symbolic variables
Y = tf.placeholder("float")
def model(X):
    with tf.variable_scope("param"):
        w = tf.Variable(0.0, name="weights") # create a shared variable (like theano.shared) for the weight matrix
    return tf.multiply(X, w) # lr is just X*w so this model line is pretty simple
y_model = model(X)
cost = (tf.pow(Y-y_model, 2)) # use sqr error for cost function
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost) # construct an optimizer to minimize cost and fit line to my data
sess = tf.Session()
init = tf.global_variables_initializer() # you need to initialize variables (in this case just variable W)
sess.run(init)
with tf.variable_scope("train"):
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
print(sess.run(y_model, feed_dict={X: np.array([1,2,3])}))
One has to create the variable set only once per whole training (and testing) set. The goal of variable scopes is to allow for modularization of subsets of parameters, such as those belonging to layers (e.g. when architecture of a layer is repeated, the same names can be used within each layer scope).
In your example you create parameters only in the model function. You can print out your variable names to see that it is assigned to the specified scope:
from __future__ import print_function
X = tf.placeholder("float") # create symbolic variables
Y = tf.placeholder("float")
print("X:", X.name)
print("Y:", Y.name)
def model(X):
    with tf.variable_scope("param"):
        w = tf.Variable(0.0, name="weights") # create a shared variable (like theano.shared) for the weight matrix
    print("w:", w.name)
    return tf.multiply(X, w)
The call to sess.run(train_op, feed_dict={X: x, Y: y}) only evaluates train_op given the provided values of X and Y. No new variables (including parameters) are created there, so the enclosing "train" scope has no effect. You can make sure the variable names stay the same by printing them out again:
with tf.variable_scope("train"):
    print("X:", X.name)
    print("Y:", Y.name)
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
You will see that variable names stay the same, as they are already initialized.
If you'd like to retrieve a variable using its scope, you need to use get_variable within a tf.variable_scope enclosure:
with tf.variable_scope("param"):
    w = tf.get_variable("weights", [1])
    print("w:", w.name)