Tensorflow: what exactly does tf.gradients() return

Quick question as I'm kind of confused here.
Let's say we have a simple graph:
a = tf.Variable(tf.truncated_normal(shape=[200, 1], mean=0., stddev=.5))
b = tf.Variable(tf.truncated_normal(shape=[200, 100], mean=0., stddev=.5))
add = a+b
add
<tf.Tensor 'add:0' shape=(200, 100) dtype=float32> #shape is because of broadcasting
So I've got a node that takes in two tensors and produces one tensor as output. Now let's run tf.gradients on it:
tf.gradients(add, [a, b])
[<tf.Tensor 'gradients/add_grad/Reshape:0' shape=(200, 1) dtype=float32>,
<tf.Tensor 'gradients/add_grad/Reshape_1:0' shape=(200, 100) dtype=float32>]
So we get gradients exactly in the shape of the input tensors. But... why?
It's not as if there's a single scalar metric with respect to which we can take the partial derivative. Shouldn't the gradients map every single value of the input tensors to every single value of the output tensor, effectively giving a 200x1x200x100 gradient tensor for input a?
This is just a simple example where every element of the output tensor depends only on one value from tensor b and one row from tensor a. However, if we did something more complicated, like running a Gaussian blur on a tensor, then the gradients would surely have to be bigger than just the input tensor.
What am I getting wrong here?

By default, tf.gradients takes the gradient of the scalar you get by summing all elements of all tensors passed to tf.gradients as outputs.
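In other words, the call above is equivalent to summing the output first and then differentiating that scalar. A minimal sketch reusing a, b and add from the question:
g_implicit = tf.gradients(add, [a, b])                 # the sum over `add` is implicit
g_explicit = tf.gradients(tf.reduce_sum(add), [a, b])  # the same sum, made explicit
For d(sum(add))/da, each element a[i, 0] contributes to 100 output elements, so its gradient entry is simply 100; that is why the result collapses to the input shapes instead of a 200x1x200x100 Jacobian.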

Related

How to use a metric with three inputs (GAP metric) in Keras while training?

This is the GAP metric code from Kaggle:
import numpy as np
import pandas as pd

def GAP(pred, conf, true):
    # Rank predictions by confidence, then average precision@k over correct rows.
    x = pd.DataFrame({'pred': pred, 'conf': conf, 'true': true})
    x.sort_values('conf', ascending=False, inplace=True, na_position='last')
    x['correct'] = (x.true == x.pred).astype(int)
    x['prec_k'] = x.correct.cumsum() / (np.arange(len(x)) + 1)
    x['term'] = x.prec_k * x.correct
    gap = x.term.sum() / x.true.count()
    return gap
I want to use it while training, but it takes a conf argument: a vector of probability or confidence scores for the predictions. Keras metrics, however, must take only two arguments. Is there any way to use it like this:
model.compile(loss='my_loss', metrics=[GAP])
Yes, there is a way to do this with a small tweak. Note that frameworks like Keras support loss functions and metrics of the form fun(true, pred); the function definition must be in exactly this form.
The second limitation is that the shapes of true and pred must be the same.
Tweaking the first limitation: concatenate the two output tensors into one. Suppose you have x output classes; then the shapes of conf and pred will each be (None, x). You can combine these two tensors into one, producing final_output with shape (None, 2, x).
Doing this is only the first step. It won't work unless we tweak the second limitation.
Now let us tweak the second limitation. This limitation can be restated as: "the dimensions of both these tensors must be the same." Note that I am trying to relax the constraint from exact shapes to numbers of dimensions. This can be done by having dynamic shapes; e.g. shape(true) = (None, 1, x) and shape(pred) = (None, None, x) will not throw errors, as None can take any value at runtime. In short, add a layer at the end of the model that combines the outputs, and make sure that layer has a dynamic output shape.
But in your case, true will also have shape (None, x). You can just expand dimensions of this tensor at axis=1 to get (None, 1, x) and then the newly generated true can be provided as input to the model.
Note that since you are combining two tensors, final_output will always have shape (None, 2, x), which isn't equal to (None, 1, x). But because we have configured the last layer to return a dynamic shape, i.e. (None, None, x), this is not a problem at compile time. And Keras never checks for shape mismatches at runtime, unless an operation on the tensors itself raises such an error.
Now that final_output is shape-compatible with true, you just need to slice final_output to recover the original two tensors, pred and conf, inside your custom loss function and metrics.
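For instance, a minimal sketch of the slicing step inside a custom metric might look like this (the wrapper name gap_metric and the placeholder return value are hypothetical; the real GAP computation would go where the comment indicates):
import tensorflow as tf

def gap_metric(true, final_output):
    # final_output: (None, 2, x) -- pred and conf stacked along axis 1.
    # true:         (None, 1, x) -- labels with the extra axis from expand_dims.
    pred = final_output[:, 0, :]   # recover the prediction scores
    conf = final_output[:, 1, :]   # recover the confidence scores
    labels = true[:, 0, :]         # drop the extra axis on the labels
    # ...compute the GAP value from pred, conf and labels with tf ops...
    return tf.reduce_mean(conf)    # placeholder so the sketch runs end to end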
The above was purely the logic. To see an example implementation, check out the layers and loss function here.

Difference between Tensorflow Operation and Tensor?

I am confused about the difference between Tensorflow Operation and Tensor objects. More specifically, what are the relationships between them and what are the design philosophies behind them.
x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
output = tf.nn.softmax(y, name="output")
output
<tf.Tensor 'output_7:0' shape=(2, 2) dtype=float32>
output2 = tf.get_default_graph().get_operation_by_name("output")
output2
<tf.Operation 'output' type=Softmax>
If I want to pass output2 to sess.run([output2]), I will get None. Is there a way to convert output2 to output?
I am a PyTorch user; what would be the analogue of Operation and Tensor in PyTorch?
I've not used PyTorch, but you can think of them as the method and the variable of a Layer class: the Operation is like a method, and the Tensor is like a variable that stores the data. So when you run sess.run([output2]), you are trying to access the value of the method rather than the variable.
To access the tensor given the operation's name, you can use:
output2 = tf.get_default_graph().get_tensor_by_name("output:0")
The :0 suffix selects the first output tensor of the operation; an operation that produces multiple outputs exposes them as :0, :1, :2 and so on.
Edit: another thing to note is that in TensorFlow, sess.run([output]) fetches the value of output and doesn't feed it to the graph. Values are fed to the graph via a feed_dict (a feed dictionary).
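As an aside, the tensor can also be recovered directly from the Operation object, since every tf.Operation exposes its output tensors via the outputs attribute. A minimal sketch (assuming sess is a session in which the variables have been initialized):
output_tensor = output2.outputs[0]  # the first (and here only) output, i.e. 'output:0'
print(sess.run(output_tensor))      # returns the softmax values instead of None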

tf.unstack with dynamic shape

I'm trying to unstack a Tensor because I need a sequence as input for the RNN. I am using variable sequence lengths which prevents me from correctly using tf.unstack.
def MapToSequences(x):
    # x.get_shape().as_list() == [64, 1, None, 512]
    x = tf.squeeze(x)
    # static shape is now [None, None, None]; at runtime this would be [64, seqlen, 512]
    x = tf.transpose(x, perm=[1, 0, 2])
    # [seqlen, 64, 512]
    # Here I'd like to unstack with seqlen as num
    x = tf.unstack(x)  # Cannot infer num from shape (?, ?, ?)
    return x
I tried using tf.shape(x) to infer the seqlen and use it as num, but I get Expected int for argument 'num' not <tf.Tensor 'strided_slice:0' shape=() dtype=int32>
This may be answered elsewhere, but here is an answer anyway: you cannot use tf.unstack along a non-inferrable dimension.
This is because of how TensorFlow is designed, with computation graphs defining transformations of Tensors. Each operation adds a node, and each Tensor is an edge between nodes. When you tf.unstack a Tensor, you generate multiple new Tensors (edges). If the number of new tensors created by a tf.unstack operation were undefined, the computation graph would have an undefined number of edges, which is not allowed.
Operations that don't add a variable number of new edges to the graph (that is, most operations) are allowed to have input Tensors with partially unknown dimensions.
To get around this, there are two choices, both useful for batched operations, i.e. when you are trying to tf.unstack a Tensor with dimensions (batch_size, ...) and batch_size is not statically inferrable.
Choice 1
I would use the batch_shape argument to keras.topology.Input.
The weight Tensors produced will always be interchangeable with those of another model generated with a different batch_size.
Unless you need a computation graph with that non-inferrable dimension, there is no reason not to take this route.
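A minimal sketch of this route, using the shapes from the question (Input is imported from keras.layers here; older Keras versions expose it via keras.engine.topology):
from keras.layers import Input

# batch_shape pins the batch dimension to a concrete value (64 here),
# making it inferrable for every downstream op, including tf.unstack.
x = Input(batch_shape=(64, None, 512))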
Choice 2
A second option, in the case when you know a maximal batch_size, is to use tf.dynamic_partition.
max_batch_size = 64  # assumed known upper bound on the batch size
tensor = tf.placeholder(tf.float32, shape=(None, 10))
# One partition index per row actually present in the batch.
partitions = tf.range(tf.shape(tensor)[0])
num_partitions = max_batch_size
partitioned = tf.dynamic_partition(tensor, partitions, num_partitions,
                                   name='dynamic_unstack')
When you actually feed a batch, this will produce unstacked Tensors for the first batch_size indices, and empty Tensors for the rest.
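A minimal usage sketch, assuming max_batch_size = 64 as above:
import numpy as np

with tf.Session() as sess:
    batch = np.random.randn(3, 10).astype(np.float32)  # actual batch_size = 3
    parts = sess.run(partitioned, feed_dict={tensor: batch})
    # parts[0..2] each hold one row; parts[3..63] are empty (0, 10) arrays.
    print([p.shape for p in parts[:5]])  # [(1, 10), (1, 10), (1, 10), (0, 10), (0, 10)]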

shape of a sparse tensor without invoking run()

A sparse tensor's shape attribute returns a Tensor object, which seems to be of no use for extracting the actual shape of the sparse tensor without resorting to the run function.
To clarify what I mean, first consider a sparse tensor:
a = tf.SparseTensor(indices=[[0, 0, 0], [1, 2, 1]], values=[1.0+2j, 2.0], shape=[3, 4, 2])
a.shape returns:
tf.Tensor 'SparseTensor_1/shape:0' shape=(3,) dtype=int64
This is of little use.
Now, consider a dense tensor:
a = tf.constant(np.random.normal(0.0, 1.0, (4, 4)).astype(dtype=np.complex128))
a.get_shape() returns:
TensorShape([Dimension(4), Dimension(4)])
I can use this output and cast it into a list or tuple of integers without ever invoking run(). However, I cannot do the same for a sparse tensor, unless I first convert it to a dense tensor (which is not yet implemented for complex sparse tensors) and then call the get_shape() method on it. But this is redundant, defeats the purpose of using a sparse tensor in the first place, and also leads to errors down the road if the input sparse tensor is complex.
Is there a way to obtain the shape of a sparse tensor without invoking run() or converting it to a dense tensor first?
tf.SparseTensor is implemented as a triple of dense Tensors under the hood. The shape of a SparseTensor is just a Tensor; if you want to know its value, your best bet is to evaluate it using session.run:
print(sess.run(a.shape))
In general, Tensorflow does not promise to compute an exact shape even for dense tensors at graph construction time; shapes are best effort and may not even have a fixed value. So even for a dense Tensor you may have to evaluate the Tensor using run to get a precise shape.
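Putting that together, a minimal runnable sketch (note that later TensorFlow versions name the constructor argument and the attribute dense_shape rather than shape):
import tensorflow as tf

a = tf.SparseTensor(indices=[[0, 0, 0], [1, 2, 1]],
                    values=[1.0 + 2j, 2.0 + 0j],
                    dense_shape=[3, 4, 2])

with tf.Session() as sess:
    print(sess.run(a.dense_shape))  # -> [3 4 2], a concrete int64 array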

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have made some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar to how it is done in the TensorFlow tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not: the output gets a None dimension! The row I'm referring to is row 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function will figure out the output shape automagically! But then why do I get None? Nothing in the config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
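A minimal sketch of the third case, showing that the statically inferred shape stays partial (tf.mul was later renamed tf.multiply):
x = tf.placeholder(tf.float32, shape=[1, 10])
y = tf.placeholder(tf.float32, shape=[None, 10])
z = tf.mul(x, y)      # use tf.multiply() on TF 1.0 and later
print(z.get_shape())  # (?, 10): broadcasting cannot be decided statically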
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
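For example, in the questioner's code one could assert the known shape right after the reshape; a hedged sketch using the names from the question:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
# We know the true row count even though shape inference does not:
output.set_shape([batch_size * num_steps, size])  # e.g. [400, 650] when batch_size=1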