Tensorflow: access shape of placeholder after NN layer in code

So, here is what I want to do:
Right now, I have padding = 'SAME' for all of my neural net layers. I would like to make my code more generic, so I can build my nets with arbitrary paddings, and I don't want to have to calculate how big the output tensors of the layers of my net are. I would like to just access the dimension at initialization/run time, the way the tf.nn functions apparently do internally, so I can initialize my weight and bias tensors with the correct dimensions.
So,
How do I access the "shape" function/object of the output tensor of a convolution?

There are two kinds of shapes: tensor.get_shape(), which gives the static shape computed by the Python wrappers during graph construction (whenever possible), and tf.shape(tensor), which is an op that can be executed at runtime to get the shape of the tensor (always possible). Both of these work for convolutions.
a = tf.Variable(tf.ones((1, 3, 3, 1)))
b = tf.Variable(tf.ones((3, 3, 1, 1)))
c = tf.nn.conv2d(a, b, [1, 1, 1, 1], padding="VALID")
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(c.get_shape())          # static shape, a TensorShape
print(sess.run(tf.shape(c)))  # dynamic shape, evaluated at runtime
This gives
(1, 1, 1, 1)
[1 1 1 1]
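For the original goal of sizing the next layer's parameters, the static shape is usually what you want. A minimal sketch of that idea (my variable names, continuing the example above):
out_channels = int(c.get_shape()[3])           # known at graph-construction time
bias = tf.Variable(tf.zeros([out_channels]))   # bias sized from the conv output's shape
h = tf.nn.bias_add(c, bias)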


Tabular data: Implementing a custom tensor layer without resorting to iteration

I have an idea for a tensor operation that would not be difficult to implement via iteration, with batch size one. However I would like to parallelize it as much as possible.
I have two tensors with shape (n, 5) called X and Y. X is actually supposed to represent 5 one-dimensional tensors with shape (n, 1): (x_1, ..., x_5). Ditto for Y.
I would like to compute a tensor with shape (n, 25) where each column represents the output of the tensor operation f(x_i, y_j), where f is fixed for all 1 <= i, j <= 5. The operation f has output shape (n, 1), just like x_i and y_j.
I feel it is important to clarify that f is essentially a fully-connected layer from the concatenated [...x_i, ...y_j] tensor with shape (1, 10) to an output layer with shape (1, 5).
Again, it is easy to see how to do this manually with iteration and slicing. However this is probably very slow. Performing this operation in batches, where the tensors X, Y now have shape (n, 5, batch_size) is also desirable, particularly for mini-batch gradient descent.
It is difficult to really articulate here why I desire to create this network; I feel it is suited for my domain of 'itemized tabular data' and cuts down significantly on the number of weights per operation, compared to a fully connected network.
Is this possible using tensorflow? Certainly not using just keras.
Below is an example in numpy, per AloneTogether's request:
import numpy as np

features = 16
batch_size = 256
X_batch = np.random.random((features, 5, batch_size))
Y_batch = np.random.random((features, 5, batch_size))
# one tensor operation to reduce weights in this custom 'layer'
f = np.random.random((features, 2 * features))
for b in range(batch_size):
    X = X_batch[:, :, b]
    Y = Y_batch[:, :, b]
    for i in range(5):
        x_i = X[:, i:i+1]
        for j in range(5):
            y_j = Y[:, j:j+1]
            x_i_y_j = np.concatenate([x_i, y_j], axis=0)
            # f(x_i, y_j), implemented by a fully-connected layer
            f_i_j = np.matmul(f, x_i_y_j)
All the operations you need (concatenation and matrix multiplication) can be batched.
The difficult part is that you want to concatenate the features of all items in X with the features of all items in Y (all combinations).
My recommended solution is to expand the dimensions of X to [batch, features, 5, 1] and expand the dimensions of Y to [batch, features, 1, 5].
Then tf.repeat() both tensors so their shapes become [batch, features, 5, 5].
Now you can concatenate X and Y. You will have a tensor of shape [batch, 2*features, 5, 5]. Observe that this way all combinations are built.
The next step is matrix multiplication. tf.matmul() can also do batch matrix multiplication, but I use tf.einsum() here because I want more control over which dimensions are treated as batch dimensions.
Full code:
import tensorflow as tf
import numpy as np

batch_size = 3
features = 6
items = 5

x = np.random.uniform(size=[batch_size, features, items])
y = np.random.uniform(size=[batch_size, features, items])
f = np.random.uniform(size=[2 * features, features])

# build all (i, j) combinations by repeating along new axes
x_reps = tf.repeat(x[:, :, :, tf.newaxis], items, axis=3)  # [batch, features, 5, 5]
y_reps = tf.repeat(y[:, :, tf.newaxis, :], items, axis=2)  # [batch, features, 5, 5]
xy_conc = tf.concat([x_reps, y_reps], axis=1)              # [batch, 2*features, 5, 5]

# contract the feature axis against f; b, i, j act as batch dimensions
f_i_j = tf.einsum("bfij,fg->bgij", xy_conc, f)             # [batch, features, 5, 5]
f_i_j = tf.reshape(f_i_j, [batch_size, features, items * items])
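As a hedged sanity check (my addition, assuming TF 2.x eager execution), one (i, j) pair can be compared against the loop version from the question: f_i_j[b, :, i*items + j] should equal concat(x[b, :, i], y[b, :, j]) @ f.
b_, i_, j_ = 0, 1, 2
ref = np.concatenate([x[b_, :, i_], y[b_, :, j_]]) @ f
print(np.allclose(f_i_j[b_, :, i_ * items + j_].numpy(), ref))  # True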

How does tf.space_to_depth() work in tensorflow?

I am a pytorch user. I have a pretrained model in tensorflow and I would like to transfer it to pytorch. In one part of the model architecture, I mean in the tensorflow-defined model, there is a function tf.space_to_depth which transforms an input of size (None, 38, 38, 64) into (None, 19, 19, 256). The doc of this function is https://www.tensorflow.org/api_docs/python/tf/space_to_depth, but I could not understand what it actually does. Could you please provide some numpy code to illustrate it for me?
Actually I would like to make an exactly similar layer in pytorch.
Running some code in tensorflow reveals another surprise:
Here is the code:
import numpy as np
import tensorflow as tf

norm = tf.random_normal([1, 2, 2, 1], mean=0, stddev=1)
trans = tf.space_to_depth(norm, 2)

with tf.Session() as s:
    norm = s.run(norm)
    trans = s.run(trans)

print("Norm")
print(norm.shape)
for index, value in np.ndenumerate(norm):
    print(value)

print("Trans")
print(trans.shape)
for index, value in np.ndenumerate(trans):
    print(value)
And here is the output:
Norm
(1, 2, 2, 1)
0.695261
0.455764
1.04699
-0.237587
Trans
(1, 1, 1, 4)
1.01139
0.898777
0.210135
2.36742
As you can see above, in addition to the reshaping, the tensor values have changed!
This tf.space_to_depth divides your input into blocks and concatenates them.
In your example the input is 38x38x64 (and I guess the block_size is 2). So the function divides your input into 2x2 (block_size x block_size) blocks and concatenates them, which gives your 19x19x256 output.
You just need to divide each of your channels (input) into block_size*block_size patches (each patch has a size of width/block_size x height/block_size) and concatenate all of these patches. Should be pretty straightforward with numpy; see the sketch below.
Hope it helps.
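For example, a minimal NumPy sketch (my code, assuming the NHWC layout that tf.space_to_depth uses):
import numpy as np

def space_to_depth_np(x, block_size):
    # x: (batch, height, width, channels)
    b, h, w, c = x.shape
    # split H and W into (blocks, within-block) axes
    x = x.reshape(b, h // block_size, block_size, w // block_size, block_size, c)
    # move the two within-block axes next to the channel axis
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # flatten (block_size, block_size, channels) into the new depth
    return x.reshape(b, h // block_size, w // block_size, c * block_size ** 2)
For a (1, 38, 38, 64) input and block_size 2 this returns shape (1, 19, 19, 256).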
Conclusion: tf.space_to_depth() only outputs a copy of the input tensor where values from the height and width dimensions are moved to the depth dimension.
The values changed in your first run because each call to s.run() re-executes the tf.random_normal op, so norm and trans were evaluated from two different random draws. If you modify your code a little bit, like this
norm = tf.random_normal([1, 2, 2, 1], mean=0, stddev=1)
with tf.Session() as s:
    norm = s.run(norm)
trans = tf.space_to_depth(norm, 2)
with tf.Session() as s:
    trans = s.run(trans)
Then you will have the following results:
Norm
(1, 2, 2, 1)
-0.130227
2.04587
-0.077691
-0.112031
Trans
(1, 1, 1, 4)
-0.130227
2.04587
-0.077691
-0.112031
Hope this can help you.
A good reference for PyTorch is the implementation of the PixelShuffle module here. This shows the implementation of something equivalent to Tensorflow's depth_to_space. Based on that we can implement pixel_shuffle with a scaling factor less than 1 which would be like space_to_depth. E.g., downscale_factor=0.5 is like space_to_depth with block_size=2.
def pixel_shuffle_down(input, downscale_factor):
    batch_size, channels, in_height, in_width = input.size()
    # cast to int so the sizes are valid view() arguments
    out_channels = int(channels / (downscale_factor ** 2))
    block_size = int(1 / downscale_factor)
    out_height = int(in_height * downscale_factor)
    out_width = int(in_width * downscale_factor)
    input_view = input.contiguous().view(
        batch_size, channels, out_height, block_size, out_width, block_size)
    shuffle_out = input_view.permute(0, 1, 3, 5, 2, 4).contiguous()
    return shuffle_out.view(batch_size, out_channels, out_height, out_width)
Note: I haven't verified this implementation yet and I'm not sure if it's exactly the inverse of pixel_shuffle, but this is the basic idea. I've also opened an issue on the PyTorch Github about this here. In NumPy the equivalent code would use reshape and transpose instead of view and permute, respectively.
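For instance, a hedged NumPy mirror of the function above (my code, same NCHW layout, same unverified caveat):
import numpy as np

def pixel_shuffle_down_np(x, block_size):
    b, c, h, w = x.shape
    # reshape plays the role of view: split H and W into (out, block) pairs
    x = x.reshape(b, c, h // block_size, block_size, w // block_size, block_size)
    # transpose plays the role of permute(0, 1, 3, 5, 2, 4)
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * block_size ** 2, h // block_size, w // block_size)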
Using split and stack functions along with permute in Pytorch gives us the same result as space_to_depth in tensorflow. Here is the code in Pytorch.
Assume that input is in BHWC format.
Based on block_size and the input shape, we can calculate the output shape.
First, it splits the input on the "width" dimension, or dimension #2, by block_size. The result of this operation is an array of length d_width. It's just like you cut a cake (by block_size) into d_width pieces.
Then for each piece, you reshape it so it has the correct output height and output depth (channel). Finally, we stack those pieces together and perform a permutation.
Hope it helps.
import torch

def space_to_depth(input, block_size):
    block_size_sq = block_size * block_size
    (batch_size, s_height, s_width, s_depth) = input.size()
    d_depth = s_depth * block_size_sq
    d_width = int(s_width / block_size)
    d_height = int(s_height / block_size)
    t_1 = input.split(block_size, 2)
    stack = [t_t.contiguous().view(batch_size, d_height, d_depth) for t_t in t_1]
    output = torch.stack(stack, 1)
    output = output.permute(0, 2, 1, 3)
    return output
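A quick hedged usage check (my toy input; BHWC as stated above):
x = torch.arange(16.0).reshape(1, 2, 2, 4)   # (batch, H, W, C)
print(space_to_depth(x, 2).shape)            # torch.Size([1, 1, 1, 16])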

Slicing a tensor by an index tensor in Tensorflow

I have the two following tensors (note that they are both Tensorflow tensors, which means they are still virtually symbolic at the time I construct the following slicing op, before I launch a tf.Session()):
params: has shape (64, 784, 256)
indices: has shape (64, 784)
and I want to construct an op that returns the following tensor:
output: has shape (64, 784), where
output[i,j] = params_tensor[i,j, indices[i,j] ]
What is the most efficient way in Tensorflow to do so?
ps: I tried with tf.gather but couldn't make use of it to perform the operation I described above.
Many thanks.
-Bests
You can get exactly what you want using tf.gather_nd. The final expression is:
tf.gather_nd(params,
             tf.stack([tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1),
                               [1, tf.shape(indices)[1]]),
                       tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1),
                                            [1, tf.shape(indices)[0]])),
                       indices], 2))
This expression has the following explanation:
tf.gather_nd does what you expected and uses the indices to gather the output from the params
tf.stack combines three separate tensors, the last of which is the indices. The first two tensors specify the ordering of the first two dimensions (axis 0 and axis 1 of params/indices)
For the example provided, this ordering is simply 0, 1, 2, ..., 63 for axis 0, and 0, 1, 2, ... 783 for axis 1. These sequences are obtained with tf.range(tf.shape(indices)[0]) and tf.range(tf.shape(indices)[1]), respectively.
For the example provided, indices has shape (64, 784). The other two tensors from the last point above need to have this same shape in order to be combined with tf.stack
First, an additional dimension/axis is added to each of the two sequences using tf.expand_dims.
The use of tf.tile and tf.transpose can be shown by example: Assume the first two axes of params and indices have shape (5,3). We want the first tensor to be:
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
We want the second tensor to be:
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
These two tensors almost function like specifying the coordinates in a grid for the associated indices.
The final part of tf.stack combines the three tensors on a new third axis, so that the result has the same 3 axes as params.
Keep in mind if you have more or less axes than in the question, you need to modify the number of coordinate-specifying tensors in tf.stack accordingly.
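A minimal toy run of that expression (my code and variable names, TF 1.x style), with the two coordinate tensors pulled out for readability:
import tensorflow as tf
import numpy as np

params = tf.constant(np.arange(24).reshape(2, 3, 4), dtype=tf.int32)
indices = tf.constant([[0, 1, 2], [3, 0, 1]])

rows = tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1),
               [1, tf.shape(indices)[1]])                 # [[0 0 0] [1 1 1]]
cols = tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1),
                            [1, tf.shape(indices)[0]]))   # [[0 1 2] [0 1 2]]
output = tf.gather_nd(params, tf.stack([rows, cols, indices], 2))

with tf.Session() as sess:
    print(sess.run(output))  # [[ 0  5 10] [15 16 21]]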
What you want is like a custom reduction function. If what you keep in indices is something like the index of the maximum value, then I would suggest using tf.reduce_max instead:
max_params = tf.reduce_max(params_tensor, reduction_indices=[2])
Otherwise, here is one way to get what you want (Tensor objects are not assignable so we create a 2d list of tensors and pack it using tf.pack):
import tensorflow as tf
import numpy as np

with tf.Graph().as_default():
    params_tensor = tf.pack(np.random.randint(1, 256, [5, 5, 10]).astype(np.int32))
    indices = tf.pack(np.random.randint(1, 10, [5, 5]).astype(np.int32))
    output = [[None for j in range(int(params_tensor.get_shape()[1]))]
              for i in range(int(params_tensor.get_shape()[0]))]
    for i in range(int(params_tensor.get_shape()[0])):
        for j in range(int(params_tensor.get_shape()[1])):
            output[i][j] = params_tensor[i, j, indices[i, j]]
    output = tf.pack(output)
    with tf.Session() as sess:
        params_tensor, indices, output = sess.run([params_tensor, indices, output])
        print(params_tensor)
        print(indices)
        print(output)
I know I'm late, but I recently had to do something similar, and was able to do it using Ragged Tensors:
output = tf.gather(params, tf.RaggedTensor.from_tensor(indices), batch_dims=-1, axis=-1)
Hope it helps

Why inconsistent shapes numpy vs cntk?

I am just starting to learn cntk. However, I have a basic question that is holding me back from progressing. I have the following test that passes:
import numpy as np
from cntk import input_variable, plus

def test_simple(self):
    x_input = np.asarray([[1, 2, 2]], dtype=np.int64)
    assert (1, 3) == x_input.shape
    y_input = np.asarray([[5, 3, 3]], dtype=np.int64)
    assert (1, 3) == y_input.shape
    x = input_variable(x_input.shape[1])
    assert (3,) == x.shape
    y = input_variable(y_input.shape[1])
    assert (3,) == y.shape
    x_plus_y = plus(x, y)
    assert (3,) == x_plus_y.shape
    res = x_plus_y.eval({x: x_input, y: y_input})
    assert 6 == res[0, 0, 0]
    assert 5 == res[0, 0, 1]
    assert 5 == res[0, 0, 2]
I understand that the shape of the output is (1, 1, 3) as the first and second axis are the batch and default dynamic axis respectively.
However, why do I need to set the shape of the input variables as (3,) instead of (1, 3). Using (1, 3) fails.
Why is there an inconsistency between the shape of the input node in the graph and the numpy data used as input to that node?
Thank you,
Paddy
This is explained a little bit in the description of "arguments" for Function.forward. Another description is here. The reason for your confusion is probably that CNTK does some "helpful" conversions.
If you specify your input as (1,3), then you need to provide a list of (1,3) arrays in case of a minibatch without a sequence axis, or a list of (x,1,3) arrays in case of a minibatch with a sequence axis (where x is potentially different for each sequence in the minibatch). Similarly, if you specify an input as (3,), then you need to either provide a list of (3,) vectors or a list of (x,3) arrays.
The confusion probably arises from the case when a list is not provided. In that case CNTK iterates over the leading axis of the provided tensor and creates a list out of those elements, e.g. a (5,1,3) tensor becomes a batch of 5 elements, each having a shape of (1,3).
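A hedged sketch of the two cases (my code, CNTK 2.x Python API):
import numpy as np
from cntk import input_variable, plus

x = input_variable(3)   # static shape (3,)
y = input_variable(3)
z = plus(x, y)

# either a list of (3,) vectors, one per minibatch element...
print(z.eval({x: [np.zeros(3, np.float32)], y: [np.ones(3, np.float32)]}))
# ...or one (batch, 3) tensor, which CNTK splits along the leading axis
batch = np.ones((2, 3), dtype=np.float32)
print(z.eval({x: batch, y: batch}))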

No broadcasting for tf.matmul in TensorFlow

I have a problem with which I've been struggling. It is related to tf.matmul() and its absence of broadcasting.
I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.
I need to encode my input data as a 4D tensor:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
The first dimension is the size of a batch, the second the number of entries in the batch.
You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.
Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
These are the steps of my computation:
compute a function of each vector of 100 float values (e.g., linear function)
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.matmul(X, W)
problem: no broadcasting for tf.matmul() and no success using tf.batch_matmul()
expected shape of Y: (5, 10, 4, 50)
applying average pooling for each entry of the batch (over the objects of each entry):
Y_avg = tf.reduce_mean(Y, 2)
expected shape of Y_avg: (5, 10, 50)
I expected that tf.matmul() would have supported broadcasting. Then I found tf.batch_matmul(), but it still looks like it doesn't apply to my case (e.g., W needs to have at least 3 dimensions, and it is not clear why).
BTW, above I used a simple linear function (the weights of which are stored in W). But in my model I have a deep network instead. So, the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected that tf.matmul() would have had a broadcasting behavior (if so, maybe tf.batch_matmul() wouldn't even be necessary).
Look forward to learning from you!
Alessio
You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of one single "instance" of computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4=200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes that a little tricky, but you can use tf.shape to determine the actual shapes during run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])
Y_ = tf.matmul(X_, W)
X_shape = tf.gather(tf.shape(X), [0,1,2]) # Extract the first three dimensions
target_shape = tf.concat(0, [X_shape, [50]])
Y = tf.reshape(Y_, target_shape)
Y_avg = tf.reduce_mean(Y, 2)
As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction of axes pairs between two tensors, in line with Numpy's tensordot(). For your case:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]]) # gives shape=[5, 10, 4, 50]
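The pooling step from the question then applies unchanged to the tensordot output:
Y_avg = tf.reduce_mean(Y, 2)  # shape (5, 10, 50)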