I am a PyTorch user. I have a pretrained model in TensorFlow and I would like to port it to PyTorch. In one part of the TensorFlow model there is a call to tf.space_to_depth, which transforms an input of shape (None, 38, 38, 64) into (None, 19, 19, 256). The documentation for this function is at https://www.tensorflow.org/api_docs/python/tf/space_to_depth, but I could not understand what the function actually does. Could you please provide some NumPy code to illustrate it for me?
Actually, I would like to build an exactly equivalent layer in PyTorch.
An experiment in TensorFlow reveals another puzzle.
Here is the code:
import numpy as np
import tensorflow as tf
norm = tf.random_normal([1, 2, 2, 1], mean=0, stddev=1)
trans = tf.space_to_depth(norm,2)
with tf.Session() as s:
    norm = s.run(norm)
    trans = s.run(trans)
print("Norm")
print(norm.shape)
for index,value in np.ndenumerate(norm):
    print(value)
print("Trans")
print(trans.shape)
for index,value in np.ndenumerate(trans):
    print(value)
And here is the output:
Norm
(1, 2, 2, 1)
0.695261
0.455764
1.04699
-0.237587
Trans
(1, 1, 1, 4)
1.01139
0.898777
0.210135
2.36742
As you can see above, in addition to the data being reshaped, the tensor values have changed!
tf.space_to_depth divides your input into blocks and concatenates them along the depth (channel) dimension.
In your example the input is 38x38x64 (and I guess the block_size is 2). So the function cuts the height and width into non-overlapping 2x2 (block_size x block_size) blocks and moves the 4 pixels of each block, with all of their channels, into the depth dimension, which gives your 19x19x256 output (64 * 4 = 256).
Put differently, output position (i, j) holds the input pixels from rows i*block_size to i*block_size + block_size - 1 and columns j*block_size to j*block_size + block_size - 1, concatenated along the channel axis. Should be pretty straightforward with numpy.
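Since the question asks for NumPy code, here is a minimal sketch of the block rearrangement, assuming NHWC layout and a height and width divisible by block_size. The name space_to_depth_np is made up for this illustration. The depth ordering (row offset, column offset, channel) follows the TensorFlow documentation, but the sketch has not been compared against an actual TensorFlow run here.

```python
import numpy as np

def space_to_depth_np(x, block_size):
    """Hypothetical NumPy stand-in for tf.space_to_depth on an NHWC array."""
    b, h, w, c = x.shape
    assert h % block_size == 0 and w % block_size == 0
    out_h, out_w = h // block_size, w // block_size
    # Split height and width into (block index, offset inside the block).
    x = x.reshape(b, out_h, block_size, out_w, block_size, c)
    # Bring the two in-block offsets next to the channel axis.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Merge (row offset, column offset, channel) into the depth axis.
    return x.reshape(b, out_h, out_w, block_size * block_size * c)

x = np.arange(38 * 38 * 64, dtype=np.float32).reshape(1, 38, 38, 64)
print(space_to_depth_np(x, 2).shape)  # (1, 19, 19, 256)

small = np.array([1.0, 2.0, 3.0, 4.0]).reshape(1, 2, 2, 1)
print(space_to_depth_np(small, 2).ravel())  # [1. 2. 3. 4.]
```

The values are only moved, never modified, so the flattened 2x2 example comes out in its original order.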
Hope it helps.
Conclusion: tf.space_to_depth() only outputs a copy of the input tensor where values from the height and width dimensions are moved to the depth dimension.
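The values differ in your output only because norm and trans were evaluated by separate s.run calls, and each call draws a fresh sample from tf.random_normal. If you modify your code a little bit so that the sampled array is reused, like this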
norm = tf.random_normal([1, 2, 2, 1], mean=0, stddev=1)
with tf.Session() as s:
    norm = s.run(norm)
trans = tf.space_to_depth(norm,2)
with tf.Session() as s:
    trans = s.run(trans)
Then you will have the following results:
Norm
(1, 2, 2, 1)
-0.130227
2.04587
-0.077691
-0.112031
Trans
(1, 1, 1, 4)
-0.130227
2.04587
-0.077691
-0.112031
Hope this can help you.
A good reference for PyTorch is the implementation of the PixelShuffle module, which implements something equivalent to TensorFlow's depth_to_space. Based on that, we can implement pixel_shuffle with a scaling factor less than 1, which would be like space_to_depth. E.g., downscale_factor=0.5 is like space_to_depth with block_size=2.
def pixel_shuffle_down(input, downscale_factor):
    batch_size, channels, in_height, in_width = input.size()
    # view() needs integer sizes, so convert the fractional factor first
    block_size = int(round(1 / downscale_factor))
    out_channels = channels * block_size ** 2
    out_height = in_height // block_size
    out_width = in_width // block_size
    input_view = input.contiguous().view(
        batch_size, channels, out_height, block_size, out_width, block_size)
    # move both in-block offsets next to the channel axis
    shuffle_out = input_view.permute(0, 1, 3, 5, 2, 4).contiguous()
    return shuffle_out.view(batch_size, out_channels, out_height, out_width)
Note: I haven't verified this implementation yet and I'm not sure if it's exactly the inverse of pixel_shuffle, but this is the basic idea. I've also opened an issue on the PyTorch GitHub about this. In NumPy the equivalent code would use reshape and transpose instead of view and permute, respectively.
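As a rough illustration of the reshape/transpose variant mentioned in the note, here is a NumPy sketch for an NCHW array. The function name is hypothetical and takes an integer block_size rather than a fractional factor. Be aware that the depth ordering here is (channel, row offset, column offset), whereas the TensorFlow documentation describes (row offset, column offset, channel) for NHWC input, so weights ported from TensorFlow may need an extra channel permutation; that part is an assumption to verify against your model. Newer PyTorch releases also provide torch.nn.PixelUnshuffle, which may make a hand-written version unnecessary.

```python
import numpy as np

def pixel_shuffle_down_np(x, block_size):
    """NumPy sketch of the view/permute steps for an NCHW array (hypothetical helper)."""
    b, c, h, w = x.shape
    out_h, out_w = h // block_size, w // block_size
    # Split height and width into (block index, offset inside the block).
    x = x.reshape(b, c, out_h, block_size, out_w, block_size)
    # Reorder to (batch, channel, row offset, column offset, out_h, out_w).
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * block_size ** 2, out_h, out_w)

x = np.arange(16).reshape(1, 1, 4, 4)
y = pixel_shuffle_down_np(x, 2)
print(y.shape)  # (1, 4, 2, 2)
# Output channel 0 collects the top-left pixel of every 2x2 block.
print(np.array_equal(y[0, 0], x[0, 0, 0::2, 0::2]))  # True
```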
Using the split and stack functions along with permute in PyTorch gives the same result as space_to_depth in TensorFlow. Here is the code in PyTorch.
Assume that input is in BHWC format.
Based on block_size and the input shape, we can calculate the output shape.
First, it splits the input along the "width" dimension (dimension #2) into chunks of size block_size. The result of this operation is a sequence of length d_width. It's just like cutting a cake into d_width pieces, each block_size wide.
Then each piece is reshaped so that it has the correct output height and output depth (channel). Finally, we stack those pieces together and perform a permutation.
Hope it helps.
import torch

def space_to_depth(input, block_size):
    block_size_sq = block_size * block_size
    (batch_size, s_height, s_width, s_depth) = input.size()
    d_depth = s_depth * block_size_sq
    d_width = int(s_width / block_size)
    d_height = int(s_height / block_size)
    # split along the width axis into d_width pieces of width block_size
    t_1 = input.split(block_size, 2)
    stack = [t_t.contiguous().view(batch_size, d_height, d_depth) for t_t in t_1]
    # (batch, d_width, d_height, d_depth) -> (batch, d_height, d_width, d_depth)
    output = torch.stack(stack, 1)
    output = output.permute(0, 2, 1, 3)
    return output
maybe this one works:
sudo apt install nvidia-cuda-toolkit
it worked for me.
Related
I would like to whiten each image in a batch. The code I have to do so is this:
def whiten(self, x):
    shape = x.shape
    x = K.batch_flatten(x)
    mn = K.mean(x, 0)
    std = K.std(x, 0) + K.epsilon()
    r = (x - mn) / std
    r = K.reshape(x, (-1,shape[1],shape[2],shape[3]))
    return r
where x is (?, 320,320,1). I am not keen on the reshape function with a -1 arg. Is there a cleaner way to do this?
Let's see what the -1 does. From the Tensorflow documentation (Because the documentation from Keras is scarce compared to the one from Tensorflow):
If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant.
So what this means:
import tensorflow as tf
from keras import backend as K
X = tf.constant([1,2,3,4,5])
K.reshape(X, [-1, 5])
# Add one more dimension, the number of columns should be 5, and keep the number of elements to be constant
# [[1 2 3 4 5]]
X = tf.constant([1,2,3,4,5,6])
K.reshape(X, [-1, 3])
# Add one more dimension, the number of columns should be 3
# For the number of elements to be constant the number of rows should be 2
# [[1 2 3]
# [4 5 6]]
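NumPy follows the same -1 convention, so the behaviour can be checked quickly without building a graph. A small sketch (the array sizes are arbitrary):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
# -1 is replaced by whatever size keeps the element count at 6.
print(np.reshape(x, (-1, 3)).shape)  # (2, 3)
print(np.reshape(x, (3, -1)).shape)  # (3, 2)

# A flattened batch: 5 samples of 4*4*3 values each.
batch = np.ones((5, 4 * 4 * 3))
# -1 recovers the batch size, the other dimensions are given explicitly.
print(batch.reshape(-1, 4, 4, 3).shape)  # (5, 4, 4, 3)
```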
I think it is simple enough. So what happens in your code:
# Let's assume we have 5 images, 320x320 with 3 channels
X = tf.ones((5, 320, 320, 3))
shape = X.shape
# Let's flat the tensor so we can perform the rest of the computation
flatten = K.batch_flatten(X)
# What this did is: Turn a nD tensor into a 2D tensor with same 0th dimension. (Taken from the documentation directly, let's see that below)
flatten.shape
# (5, 307200)
# So all the other elements were squeezed in 1 dimension while keeping the batch_size the same
# ...The rest of the stuff in your code is executed here...
# So we did all we wanted and now we want to revert the tensor in the shape it had previously
r = K.reshape(flatten, (-1, shape[1],shape[2],shape[3]))
r.shape
# (5, 320, 320, 3)
Besides, I can't think of a cleaner way to do what you want to do. If you ask me, your code is already clear enough.
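If the -1 in the reshape is the only concern, one possible alternative is to reduce over the non-batch axes directly and keep the reduced dimensions, so no flattening or reshaping is needed. The sketch below uses NumPy and a made-up function name. It assumes that "whiten each image" means normalizing every image by its own mean and standard deviation, which differs from the code in the question (that code reduces over axis 0). The Keras backend functions K.mean and K.std also accept axis and keepdims arguments, but this sketch has not been run against Keras.

```python
import numpy as np

def whiten_np(x, eps=1e-7):
    """Per-image standardization of an NHWC batch without flattening (illustrative)."""
    axes = (1, 2, 3)
    # keepdims=True leaves shapes of (batch, 1, 1, 1) that broadcast against x.
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True) + eps
    return (x - mean) / std

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(5, 32, 32, 1))
out = whiten_np(batch)
print(out.shape)  # (5, 32, 32, 1)
```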
I'm trying to write a layer to merge 2 tensors with such a formula
The shapes of x[0] and x[1] are both (?, 1, 500).
M is a 500*500 Matrix.
I want the output to be (?, 500, 500), which I believe is theoretically feasible. For every pair of inputs of shape (1, 1, 500) and (1, 1, 500), the layer should output (1, 500, 500). As the batch_size is variable (dynamic), the output must be (?, 500, 500).
However, I know little about axes, and I have tried all the combinations of axes without getting anything that makes sense.
I tried numpy.tensordot and keras.backend.batch_dot (TensorFlow). If the batch_size is fixed, taking a = (100,1,500) for example, batch_dot(a,M,(2,0)) can output (100,1,500).
I'm new to Keras. Sorry for such a stupid question, but I have spent 2 days trying to figure it out and it drove me crazy :(
def call(self,x):
    input1 = x[0]
    input2 = x[1]
    #self.M is defined in build function
    output = K.batch_dot(...)
    return output
Update:
Sorry for being late. I tried Daniel's answer with TensorFlow as the Keras backend and it still raises a ValueError for unequal dimensions.
I tried the same code with Theano as the backend and it works.
>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])
I don't know how to print a tensor's shape in Theano. It's definitely harder than TensorFlow for me... However, it works.
To find out why, I compared the TensorFlow and Theano versions of the backend code. The differences are as follows.
In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2]
In tensorflow_backend:
return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)
In theano_backend:
return T.batched_tensordot(x, y, axes=axes)
(Assuming the subsequent changes to out._keras_shape do not influence out's value.)
Your multiplications should select which axes they use in the batch_dot function.
Axis 0 - the batch dimension, it's your ?
Axis 1 - the dimension you say has length 1
Axis 2 - the last dimension, of size 500
You won't change the batch dimension, so you will always use batch_dot with axes=[1,2]
But for that to work, you must adjust M to be (?, 500, 500).
To do that, define M not as (500,500) but as (1,500,500), and repeat it along the first axis for the batch size:
import keras.backend as K
#Being M with shape (1,500,500), we repeat it.
BatchM = K.repeat_elements(x=M,rep=batch_size,axis=0)
#Not sure if repeating is really necessary, leaving M as (1,500,500) gives the same output shape at the end, but I haven't checked actual numbers for correctness, I believe it's totally ok.
#Now we can use batch dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)
#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1],(0,2,1))
#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])
I prefer using TensorFlow, so I tried to figure it out with TensorFlow over the past few days.
The first approach is very similar to Daniel's solution.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
M must be fed values with a matching shape.
sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1., 4., 6.]]], dtype=float32)
Another, simpler way is to use tf.einsum.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# returns a float32 tensor of shape (?, 1, 3)
Let's feed some values.
sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1., 4., 6.]]], dtype=float32)
Now M is a 2D tensor and there is no need to give M a batch dimension.
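The same contraction can be reproduced in NumPy, which uses the same subscript notation as tf.einsum. This is only a sketch with the small 3x3 values fed above, not the 500x500 case:

```python
import numpy as np

x = np.array([[[1.0, 2.0, 3.0]]])   # shape (1, 1, 3): batch, row, features
M = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])     # shape (3, 3), no batch dimension

# Contract the last axis of x (k) with the first axis of M (k).
out = np.einsum('ijk,kl->ijl', x, M)
print(out.shape)  # (1, 1, 3)
print(out)        # [[[1. 4. 6.]]]

# Broadcasting matmul gives the same result for this case.
assert np.allclose(out, x @ M)
```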
What's more, it now seems that such a problem can be solved in TensorFlow with tf.einsum. Does that mean Keras ought to invoke tf.einsum in some situations? At least I could not find anywhere that Keras calls tf.einsum. In my opinion, Keras behaves weirdly when batch_dot is applied to a 3D tensor and a 2D tensor. In Daniel's answer, he pads M to (1,500,500), but in K.batch_dot() M will be adjusted to (500,500,1) automatically. I found that tf adjusts it using broadcasting rules, and I'm not sure Keras does the same.
Here's my problem. I have a tensor X and I want to set all negative values to zero. In numpy, I would do the following np.maximum(0, X). Is there any way to achieve the same effect in tensorflow? I tried tf.maximum(tf.fill(X.get_shape(), 0.0), X), but this throws ValueError: Cannot convert a partially known TensorShape to a Tensor: (?,).
PS. X is a 1-D tensor of shape (?,).
As it happens, your problem is exactly the same as computing the rectifier activation function, and TensorFlow has a built-in operator, tf.nn.relu(), that does exactly what you need:
X_with_negatives_set_to_zero = tf.nn.relu(X)
You can use the tf.clip_by_value function as follows:
t = tf.clip_by_value(t, min_val, max_val)
It clips tensor t to the range [min_val, max_val]. Set min_val to 0 to replace all negative values with 0. See the clip_by_value documentation for more details.
A simple solution is to use the cast function (see the Keras documentation), as suggested by #ldavid:
X = tf.cast(X > 0, X.dtype) * X
Moreover, this can be adapted to any threshold level with:
X = tf.cast(X > threshold, X.dtype) * X
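For comparison, the three approaches (elementwise maximum / ReLU, clipping, and mask-and-multiply) can be mirrored in NumPy to confirm they agree. This is a sketch on made-up sample values, not TensorFlow code:

```python
import numpy as np

X = np.array([1.5, -2.0, 0.0, 3.0, -0.5])

relu_like = np.maximum(0, X)             # what tf.nn.relu computes
clipped = np.clip(X, 0, None)            # lower bound only, no upper bound
masked = (X > 0).astype(X.dtype) * X     # cast-and-multiply variant

print(np.array_equal(relu_like, clipped))  # True
print(np.array_equal(relu_like, masked))   # True

# The mask variant generalizes to an arbitrary threshold.
threshold = 2.0
print((X > threshold).astype(X.dtype) * X)  # only 3.0 survives
```

One design note: the clip variant needs an upper bound in TensorFlow, while the mask variant zeroes values without touching the rest.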
One possible solution could be this (although it's not the best):
class TensorClass(object):
    def __init__(self, tensor_values):
        self.test_tensor = tf.Variable(tensor_values, name="test_tensor")

test_session = tf.Session()
with test_session.as_default():
    tc = TensorClass([1, -1, 2, -2, 3])
    test_session.run(tf.initialize_all_variables())
    test_tensor_value = test_session.run(tc.test_tensor)
    print(test_tensor_value) # Will print [1, -1, 2, -2, 3]
    # zero out the negatives in plain Python, outside the graph
    new_test_tensor_value = [element * int(element > 0) for element in test_tensor_value]
    test_tensor_value_assign_op = tf.assign(tc.test_tensor, new_test_tensor_value)
    test_session.run(test_tensor_value_assign_op)
    test_tensor_value = test_session.run(tc.test_tensor)
    print(test_tensor_value) # Will print [1 0 2 0 3]
While this does what you need, it's not done in tensorflow. We are pulling out a tensorflow variable, changing it, and putting it back again.
For performance critical things, don't use this because it's not very efficient.
I have a problem with which I've been struggling. It is related to tf.matmul() and its absence of broadcasting.
I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.
I need to encode my input data as a 4D tensor:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
The first dimension is the size of a batch, the second the number of entries in the batch.
You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.
Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
These are the steps of my computation:
1. Compute a function of each vector of 100 float values (e.g., a linear function):
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.matmul(X, W)
Problem: no broadcasting for tf.matmul() and no success using tf.batch_matmul().
Expected shape of Y: (5, 10, 4, 50)
2. Apply average pooling to each entry of the batch (over the objects of each entry):
Y_avg = tf.reduce_mean(Y, 2)
Expected shape of Y_avg: (5, 10, 50)
I expected tf.matmul() to support broadcasting. Then I found tf.batch_matmul(), but it still doesn't seem to apply to my case (e.g., W needs to have at least 3 dimensions, and it is not clear why).
BTW, above I used a simple linear function (the weights of which are stored in W). But in my model I have a deep network instead. So, the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected tf.matmul() to broadcast (in which case tf.batch_matmul() might not even be necessary).
Look forward to learning from you!
Alessio
You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of one single "instance" of computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4=200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes that a little tricky, but you can use tf.shape to determine the actual shapes at run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])
Y_ = tf.matmul(X_, W)
X_shape = tf.gather(tf.shape(X), [0,1,2]) # Extract the first three dimensions
target_shape = tf.concat([X_shape, [50]], 0) # TF >= 1.0 argument order is (values, axis); older releases used tf.concat(0, values)
Y = tf.reshape(Y_, target_shape)
Y_avg = tf.reduce_mean(Y, 2)
As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction of axes pairs between two tensors, in line with Numpy's tensordot(). For your case:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]]) # gives shape=[5, 10, 4, 50]
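The reshape approach and the tensordot contraction can be compared in NumPy, whose tensordot has the same axes semantics. The sketch below uses random data with the fixed shapes from the question and also applies the averaging step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10, 4, 100))
W = rng.normal(size=(100, 50))

# Approach 1: flatten the leading axes, multiply, restore the leading axes.
Y_reshape = (X.reshape(-1, 100) @ W).reshape(X.shape[:3] + (50,))

# Approach 2: contract axis 3 of X with axis 0 of W directly.
Y_tensordot = np.tensordot(X, W, axes=[[3], [0]])

# Average pooling over the objects of each entry (axis 2).
Y_avg = Y_tensordot.mean(axis=2)

print(np.allclose(Y_reshape, Y_tensordot))  # True
print(Y_tensordot.shape, Y_avg.shape)       # (5, 10, 4, 50) (5, 10, 50)
```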
So, here is what I want to do:
Right now, I have padding = 'SAME' for all of my neural net layers. I would like to make my code more generic, so I can build my nets with arbitrary paddings, and I don't want to have to calculate how big the output tensors of the layers of my net are. I would like to just access the dimension at initialization/run time, the way the tf.nn functions apparently do internally, so I can initialize my weight and bias tensors in the correct dimension...
So,
How do I access the "shape" function/object of the output placeholder of a convolution?
There are two kinds of shapes: tensor.get_shape(), which gives the static shape computed by the Python wrappers during graph construction (whenever possible), and tf.shape(tensor), which is an op that can be executed at runtime to get the shape of the tensor (always possible). Both of these work for convolutions.
import tensorflow as tf
a = tf.Variable(tf.ones((1, 3, 3, 1)))
b = tf.Variable(tf.ones((3, 3, 1, 1)))
c = tf.nn.conv2d(a, b, [1, 1, 1, 1], padding="VALID")
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(c.get_shape())          # static shape, known at graph construction
print(sess.run(tf.shape(c)))  # dynamic shape, evaluated at runtime
This gives
(1, 1, 1, 1)
[1 1 1 1]
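For reference, the (1, 1, 1, 1) result is consistent with the output-size rules given in the TensorFlow documentation for VALID and SAME padding. A small pure-Python sketch of those rules for one spatial dimension follows; the function name is made up for illustration and it is not part of TensorFlow:

```python
import math

def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension (illustrative helper)."""
    if padding == "VALID":
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == "SAME":
        # Input is padded so that the size depends on the stride only.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

print(conv_output_size(3, 3, 1, "VALID"))  # 1
print(conv_output_size(3, 3, 1, "SAME"))   # 3
```

In practice c.get_shape() already gives these numbers, so a helper like this is mostly useful for sanity checks.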