Confused on how tensorflow feed_dict works - tensorflow

I recently started using tensorflow and I'm really confused about how feed_dict works.
Looking at the MNIST example from the tensorflow website, x is a symbolic placeholder that will be filled with a new batch of images every training iteration, so 'None' here could also be 'batch_size':
x = tf.placeholder(tf.float32, shape=[None, 784])
When looking at the convolutional part of this tutorial, there's a command to reshape x from its flattened 1x784 shape back to a 2D 28x28 image shape:
x_image = tf.reshape(x, [-1,28,28,1])
During the training loop, x is fed in via the command:
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
My question is: when we feed values into x, does tensorflow automatically vectorize every op involving x? For example, when we define the op
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
will this automatically work across the entire batch?
If x is an ndarray with each row being a flattened image, does tensorflow automatically know, because we specified shape 'None' in the x placeholder, to treat each row as an individual training sample and to vectorize all subsequent ops?

The shape argument is used for static shape inference (i.e., tensor.get_shape) and is optional. TensorFlow doesn't vectorize anything automatically, but for binary element-wise ops it uses broadcasting, which looks a bit like that. In your example, tf.nn.conv2d is an operation that treats the first dimension as the batch dimension, i.e. each row of x as a separate example, so it works with batches but not with individual examples. Also, batch[0] is a batch of inputs and batch[1] is a batch of labels.
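As a minimal sketch (assuming the TF 1.x graph-mode API the tutorial uses), the same graph built from a placeholder with a None batch dimension accepts any batch size, because conv2d simply treats the first dimension as the batch dimension:
import numpy as np
import tensorflow as tf  # TF 1.x graph mode, as in the tutorial

x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])            # -1 keeps the batch size flexible
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.zeros([32]))
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # feed batches of different sizes through the same ops
    print(sess.run(h_conv1, {x: np.zeros((50, 784), np.float32)}).shape)  # (50, 28, 28, 32)
    print(sess.run(h_conv1, {x: np.zeros((1, 784), np.float32)}).shape)   # (1, 28, 28, 32)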

Related

Tensorflow - training Adam

I'm trying to build my first simple neural network with tensorflow; below you can see my code. My code can calculate the loss, but when I try to add the train_step I get the error message InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [2,2], In[1]: [1024,1], which says that the dimensions of the matrices aren't compatible, but I don't understand the dimensions. In my opinion they must be [1] and [1]...
input=[[1,2,3,4,5],[6,7,8,9,10]]
labels=[1,1]
x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32)
hidden = tf.layers.dense(inputs=x, units=1024, activation=tf.nn.relu)
output = tf.layers.dense(inputs=hidden, units=1)
loss = tf.losses.softmax_cross_entropy(y, output)
train_step = tf.train.AdamOptimizer(1).minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(1):
        result = sess.run(train_step, feed_dict={x: input, y: labels})
        print(result)
The reason is that your inputs and labels are inconsistent. For your inputs, you have 2 input vectors, each of dimension (1, 5). In your output layer, you have one output unit. But for your labels you effectively have a single example of dimension (1, 2).
There are two possible fixes, depending on what you wanted to do. If you meant to have two training examples (which is what it looks like you're doing):
input=[[1,2,3,4,5],[6,7,8,9,10]]
labels=[[1],[1]]
and keep the rest the same. This way, you have 2 input vectors, and 2 label examples.
The second possible interpretation is that you are feeding in 2 input vectors, both with the label [1, 1]. In that case, keep everything the same but change the output layer to:
output = tf.layers.dense(inputs=hidden, units=2)
I'm pretty sure the first fix is what you're looking for. Also, keep in mind that the network only updates when you actually run the training op with sess.run(train_step, ...) inside the loop; without that step nothing is ever trained.
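As a hedged sketch of the first fix (the smaller learning rate and the comment about the loss are my assumptions, not part of the original question), the shape error goes away:
import tensorflow as tf

inputs = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
labels = [[1], [1]]                        # two examples, each with a one-element label

x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32, [None, 1])

hidden = tf.layers.dense(inputs=x, units=1024, activation=tf.nn.relu)
output = tf.layers.dense(inputs=hidden, units=1)

# Note: softmax cross-entropy over a single logit is degenerate (the loss is always 0);
# with one output unit, tf.losses.sigmoid_cross_entropy or mean_squared_error is usually
# a better choice, but the shapes above are what fixes the reported error.
loss = tf.losses.softmax_cross_entropy(y, output)
train_step = tf.train.AdamOptimizer(0.01).minimize(loss)  # a learning rate of 1 is usually far too large

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10):
        _, loss_value = sess.run([train_step, loss], feed_dict={x: inputs, y: labels})
        print(loss_value)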

Tensorflow: calculate gradient for tf.multiply

I'm building a neural network that has the following two layers
pseudo_inputs = tf.Variable(a_numpy_ndarray)
weights = tf.Variable(tf.truncated_normal(...))
I then want to multiply them element-wise using tf.multiply (which, unlike tf.matmul, multiplies corresponding entries, i.e. c_ij = a_ij * b_ij):
input = tf.multiply(pseudo_inputs, weights)
My goal is to learn weights. So I run
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
But it doesn't work. The network doesn't change at all.
Looking at tensorboard, I could see that 'input' has no gradient, so I'm assuming that's the problem. Any ideas how to solve this?
From reading the tensorflow docs it seems like I might have to write a gradient op for tf.multiply, but I find it hard to believe no one has needed to do this before.
I think pseudo_inputs should be set as a placeholder in the first line.
And in this line:
train_step = tf.train.AdamOptimizer(learn_rate).minimize(loss, var_list=[weights])
Since weights is to be trained in the graph by minimizing loss, it does not need to be passed as a parameter here:
train = tf.train.AdamOptimizer(learn_rate).minimize(loss)
Then you should first run the training op using the samples (and corresponding targets y_train) you have:
for x_train, y_train in samples:
    sess.run(train, {pseudo_inputs: x_train, y: y_train})
And after that you can get weights by:
W_c, loss_c = sess.run([weights, loss], {pseudo_inputs: x_train, y: y_train})
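A self-contained sketch of that suggestion; the shapes, the target placeholder y, and the mean-squared-error loss are my assumptions, since the original post doesn't show them:
import numpy as np
import tensorflow as tf  # TF 1.x graph mode assumed

a_numpy_ndarray = np.random.rand(32, 100).astype(np.float32)   # hypothetical fixed inputs

pseudo_inputs = tf.placeholder(tf.float32, shape=[32, 100])
weights = tf.Variable(tf.truncated_normal([32, 100], stddev=0.1))
y = tf.placeholder(tf.float32, shape=[32, 100])                 # hypothetical targets

inputs = tf.multiply(pseudo_inputs, weights)    # element-wise product, c_ij = a_ij * b_ij
loss = tf.losses.mean_squared_error(y, inputs)  # stand-in loss

train = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    targets = np.random.rand(32, 100).astype(np.float32)        # hypothetical labels
    for _ in range(100):
        sess.run(train, {pseudo_inputs: a_numpy_ndarray, y: targets})
    W_c, loss_c = sess.run([weights, loss], {pseudo_inputs: a_numpy_ndarray, y: targets})
    print(loss_c)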

What is the right way to compute correct predictions in Tensorflow?

I'm feeding a simple ConvNet in Tensorflow using a tfrecords file containing grayscale images as inputs and integer class labels.
my loss is defined as loss = tf.nn.sparse_softmax_cross_entropy_with_logits(y_conv, label_batch)
where y_conv=tf.matmul(h_fc1_drop,W_fc2) + b_fc2
and label_batch is a tensor of shape [batch_size].
I'm trying to compute the accuracy by using
correct_prediction = tf.equal(tf.argmax(label_batch,1),tf.argmax(y_conv, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
This correct_prediction statement is giving an error:
InvalidArgumentError (see above for traceback): Minimum tensor rank: 2 but got: 1
I'm a bit confused as to how exactly one computes correct predictions in TF.
You probably want to use 0 for the dimension argument to tf.argmax since label_batch and y_conv are vectors. Using dimension=1 implies a tensor rank of at least 2. See the documentation for the dimension parameter of tf.argmax.
I hope that helps!
For your y_conv you're doing everything right: it is a matrix of shape (batch_size, n_classes) where, for each sample and each class, you have a score for how likely that class is the one the image belongs to. So to get the actual predicted class you need to call argmax.
However, your labels are integers and have shape just (batch_size,): because the class of an image is known, there's no reason to supply n_classes probabilities; a single integer can hold the actual class just as well. So you don't need to call argmax on it to convert probabilities to a class, since it already holds the class. To fix it, just do
correct_prediction = tf.equal(label_batch, tf.argmax(y_conv, 1))
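One caveat (an assumption on my part, not something from the original answer): tf.argmax returns int64 by default, so if label_batch comes out of the tfrecords pipeline as int32 you may also need a cast before the comparison:
# cast the labels so both sides of tf.equal have the same dtype
correct_prediction = tf.equal(tf.cast(label_batch, tf.int64), tf.argmax(y_conv, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))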

Tensorflow: optimize over input with gradient descent

I have a TensorFlow model (a convolutional neural network) which I successfully trained using gradient descent (GD) on some input data.
Now, in a second step, I would like to provide an input image as initialization and then optimize over this input image with fixed network parameters using GD. The loss function will be a different one, but that is a detail.
So, my main question is how to tell the gradient descent algorithm to
stop optimizing the network parameters
optimize over the input image instead
The first can probably be done with this:
Holding variables constant during optimizer
Do you guys have ideas about the second point?
I guess I can recode the gradient descent algorithm myself using the TF gradient function, but my gut feeling tells me that there should be an easier way, which also allows me to benefit from more complex GD variants (Adam etc.).
No need for your own SGD implementation. TensorFlow provides all the necessary functions:
import tensorflow as tf
import numpy as np

# some input
data_pldhr = tf.placeholder(tf.float32)
img_op = tf.get_variable('input_image', [1, 4, 4, 1], dtype=tf.float32, trainable=True)
img_assign = img_op.assign(data_pldhr)

# your starting image
start_value = (np.ones((4, 4), dtype=np.float32) + np.eye(4))[None, :, :, None]

# override variable_getter
def nontrainable_getter(getter, *args, **kwargs):
    kwargs['trainable'] = False
    return getter(*args, **kwargs)

# all variables in this scope are not trainable
with tf.variable_scope('myscope', custom_getter=nontrainable_getter):
    x = tf.layers.dense(img_op, 10)
    y = tf.layers.dense(x, 10)

# the usual stuff
cost_op = tf.losses.mean_squared_error(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(cost_op)

# fire up the training process
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(img_assign, {data_pldhr: start_value})
    print(sess.run(img_op))
    for i in range(10):
        _, c = sess.run([train_op, cost_op])
        print(c)
    print(sess.run(img_op))
represent an image as tf.Variable with trainable=True
initialise this variable with the starting image (initial guess)
recreate the NN graph using TF variables with trainable=False and copy the weights from the trained NN graph using tf.assign
calculate the loss function
plug the loss into any TF optimiser algorithm you want
Another alternative is to use ScipyOptimizerInterface, which lets you use scipy's minimizers. It supports constrained minimization.
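A minimal sketch of that route (a hedged example assuming TF 1.x, where ScipyOptimizerInterface lives in tf.contrib.opt, reusing cost_op, img_op, img_assign, data_pldhr and start_value from the snippet above):
# Optimise only the input image with L-BFGS-B via scipy, with optional pixel bounds.
optimizer = tf.contrib.opt.ScipyOptimizerInterface(
    cost_op,
    var_list=[img_op],                    # only the input image is optimised
    var_to_bounds={img_op: (0.0, 1.0)},   # optional box constraints on pixel values
    method='L-BFGS-B',
    options={'maxiter': 100})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(img_assign, {data_pldhr: start_value})
    optimizer.minimize(sess)              # runs scipy.optimize.minimize under the hood
    result_image = sess.run(img_op)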
I'm looking for a solution to the same problem, but my model is not an easy one: I have an LSTM network with cells created with MultiRNNCell, and I don't think it is possible to get the weights and clone the network. Is there any workaround so that I can compute the gradient with respect to the input?

Why do we need to worry about the batch dimension when specifying a model in Tensorflow?

It seems a bit cumbersome to take the batch dimension into account for every layer in a neural network. Why don't we have some functionality in Tensorflow that can just set the batch size for an entire model?
In tensorflow you do not have to take the batch size into account.
The MNIST tutorial explains how tensorflow handles batches of any size.
Quoting the tutorial:
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
The input images x will consist of a 2d tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size.
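As a small illustration (a sketch, assuming the TF 1.x API the tutorial uses), the same graph built from these placeholders accepts any batch size at session time:
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
logits = tf.layers.dense(x, 10)           # the layer never needs to know the batch size
loss = tf.losses.softmax_cross_entropy(y_, logits)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch_size in (1, 32, 128):       # any batch size works with the same graph
        xs = np.zeros((batch_size, 784), np.float32)
        ys = np.zeros((batch_size, 10), np.float32)
        ys[:, 0] = 1.0                    # dummy one-hot labels
        print(batch_size, sess.run(loss, {x: xs, y_: ys}))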