What is the best way to multiply tensors in TensorFlow?

Suppose that I have tensors x[i,j,k] and y[p,q] in a graph. What is the correct way to specify the tensor z[i,j,k,p,q] = x[i,j,k]y[p,q]? This is the coordinate representation of the tensor product of x and y. I can get the job done using a combination of tf.expand_dims, tf.multiply and tf.tile, but I feel like there should be a better way...

I think you can get away without the tile operation using broadcasting.
x_reshaped = tf.reshape(x, (i, j, k, 1, 1))
y_reshaped = tf.reshape(y, (1, 1, 1, p, q))
z = x_reshaped * y_reshaped
When a dimension has size 1 and the corresponding dimension of the other tensor is larger, the size-1 dimension is broadcast (copied) automatically along that dimension before the product is carried out. Tile is often unnecessary; I don't think I have ever used tile in TensorFlow. Here I used reshape rather than expand_dims, but the result is the same either way.
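To check the broadcasting argument on concrete shapes, here is a minimal sketch. It uses NumPy rather than TensorFlow (an assumption made so it runs without a graph or session); the broadcasting rules are the same, and the sizes 2, 3, 4, 5, 6 are arbitrary. In TensorFlow, tf.einsum with the same subscript string or tf.tensordot(x, y, axes=0) should give the same tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3, 4))   # x[i, j, k]
y = rng.random((5, 6))      # y[p, q]

# Append two size-1 axes to x: shape (2, 3, 4, 1, 1).
# y is treated as shape (1, 1, 1, 5, 6) by broadcasting, so no tile is needed.
z = x[:, :, :, None, None] * y

# Same tensor product written with explicit index notation.
z_einsum = np.einsum("ijk,pq->ijkpq", x, y)

print(z.shape)
print(np.allclose(z, z_einsum))
```

The einsum form states the index pattern z[i,j,k,p,q] = x[i,j,k]y[p,q] directly, which some readers may find easier to audit than the reshape.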

Related

How to do 2D Convolution only at a specific location?

This question has been asked multiple times but still I could not get what I was looking for. Imagine
data=np.random.rand(N,N) #shape N x N
kernel=np.random.rand(3,3) #shape M x M
I know convolution typically means placing the kernel all over the data. But in my case N and M are on the order of 10000. So I wish to get the value of the convolution at a specific location in the data, say at (10,37), without doing unnecessary calculations at all other locations. So the output will be just a number. The main goal is to reduce the computation and memory expenses. Is there any built-in function that does this with minimal adjustments?
Indeed, the convolution at a particular position is just the sum over the entries of the pointwise product of the corresponding submatrix of data and the flipped kernel. Here is a reproducible example.
Code
N = 1000
M = 3
np.random.seed(777)
data = np.random.rand(N,N) #shape N x N
kernel= np.random.rand(M,M) #shape M x M
# Pointwise convolution = pointwise product
data[10:10+M,37:37+M]*kernel[::-1, ::-1]
>array([[0.70980514, 0.37426475, 0.02392947],
[0.24387766, 0.1985901 , 0.01103323],
[0.06321042, 0.57352696, 0.25606805]])
with output
conv = np.sum(data[10:10+M,37:37+M]*kernel[::-1, ::-1])
conv
>2.45430578
The kernel is flipped by the definition of convolution, as explained here and as kindly pointed out by Warren Weckesser. Thanks!
The key is to make sense of the index you provided. I assumed it refers to the upper left corner of the sub-matrix in data. However, it can refer to the midpoint as well when M is odd.
Concept
A different example with N=7 and M=3 illustrates the idea and is presented here for the kernel
kernel = np.array([[3,0,-1], [2,0,1], [4,4,3]])
which, when flipped, yields
kernel[::-1, ::-1]
> array([[ 3, 4, 4],
[ 1, 0, 2],
[-1, 0, 3]])
EDIT 1:
Please note that the lecturer in this video does not explicitly mention that flipping the kernel is required before the pointwise multiplication to adhere to the mathematically proper definition of convolution.
EDIT 2:
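For large M and a target index close to the boundary of data, the slice of data is smaller than the kernel, so a ValueError: operands could not be broadcast together with shapes ... might be thrown. Padding the matrix data with zeros prevents this (although it increases the memory requirement). Note that the padding shifts every index by the pad width, so the original position (r, c) becomes (r+M, c+M). I.e.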
data = np.pad(data, pad_width=M, mode='constant')
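As an alternative to padding, the window and the flipped kernel can both be clipped at the boundary. The following is a minimal sketch, not part of the original answer: the helper name conv_at is hypothetical, (row, col) is assumed to be the upper-left corner of the window and to lie inside data, and entries outside data are treated as zeros.

```python
import numpy as np

def conv_at(data, kernel, row, col):
    """Convolution value for the window whose upper-left corner is (row, col).

    Hypothetical helper. Assumes 0 <= row < data.shape[0] and
    0 <= col < data.shape[1]. Parts of the window that fall outside
    `data` count as zeros, so no padded copy of `data` is created.
    """
    m_rows, m_cols = kernel.shape
    flipped = kernel[::-1, ::-1]                        # flip, per the definition of convolution
    window = data[row:row + m_rows, col:col + m_cols]   # silently clipped at the bottom/right edge
    h, w = window.shape
    # Drop the kernel entries that would have multiplied the missing zeros.
    return float(np.sum(window * flipped[:h, :w]))

rng = np.random.default_rng(0)
data = rng.random((50, 60))
kernel = rng.random((3, 3))

value = conv_at(data, kernel, 10, 37)        # interior position
edge_value = conv_at(data, kernel, 49, 59)   # last row and column: only one entry overlaps
print(value, edge_value)
```

The cost is one M x M product regardless of N, and no copy of data is made.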

How to convert this numpy one-liner into Tensorflow backend code?

I have multiple depthmaps which show a car from different angles. I need to calculate how well they match together in my loss function, so I have to reproject them into a different view. The depthmaps live in a cube that is relative to the length of the vehicle. The images have the shape (256,256). I already wrote the code to convert them to a pointcloud with backend functions (256*256,3). I can reproject this pointcloud to the side view with numpy like this:
reProj = np.zeros((256, 256), np.float32)
reProj[pointCloud[:, 1], pointCloud[:, 2]] = pointCloud[:, 0]
How can I convert this into keras backend code? I suspect there should be a gather somewhere in there, but I just cannot get it working.
Example (images not included): a source depth image and its reprojection.
Thanks for your help!
Edit: Minimal working example with data: https://easyupload.io/rwutwa
You can do this with tf.matmul(). The first input will be your pointcloud; from the dimensions, I am assuming you are storing a 3D vector x,y,z for every pixel. The second input will be the 3D rotation matrix corresponding to the projection you need. Keep in mind this works for any angle you want; you just need to define the 3x3 matrix.
If I understand your data correctly, you need to rotate 90 degrees about the x axis, so the matrix would be
1 0 0
0 0 -1
0 1 0
Read more on rotation matrices here: https://en.wikipedia.org/wiki/Rotation_matrix
Just go to the section on three dimensions and see what you need.
So I finally figured it out; I was just thinking about it wrong. It is not a gather operation, it is a scatter. This works perfectly now!
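To make the effect of that matrix concrete, here is a small sketch in NumPy (an assumption; the answer itself refers to tf.matmul). The two sample points are made up for illustration. Points are stored as rows, so the matrix is applied as points @ R.T; in TensorFlow the analogous call would be tf.matmul(points, R, transpose_b=True).

```python
import numpy as np

# Rotation by 90 degrees about the x axis, as given in the answer above.
rot_x_90 = np.array([[1, 0, 0],
                     [0, 0, -1],
                     [0, 1, 0]], dtype=np.float32)

# Made-up sample points, one (x, y, z) triple per row.
points = np.array([[1, 2, 3],
                   [4, 5, 6]], dtype=np.float32)

# Row-vector convention: each row p becomes R @ p.
rotated = points @ rot_x_90.T
print(rotated)   # each (x, y, z) is mapped to (x, -z, y)
```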
indices = K.stack([p[:, 1], p[:, 2]], -1)
indices = K.reshape(indices, (256, 256, 2))
indices = K.clip(indices, 0, 256 - 1)
updates = K.reshape(p[:,0], (256,256))
reProj = tf.tensor_scatter_nd_max(tf.zeros((256, 256), tf.int32), indices, updates)
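One difference between the NumPy one-liner and the scatter above is what happens when several points land on the same pixel. A minimal NumPy sketch of the scatter-max behaviour, using np.maximum.at as a stand-in for tf.tensor_scatter_nd_max (an assumption for illustration; the tiny 4x4 grid and the three points are made up):

```python
import numpy as np

size = 4
# Made-up point cloud, columns: value, row, col. The first two points share a pixel.
points = np.array([[9, 0, 1],
                   [5, 0, 1],
                   [3, 2, 2]], dtype=np.int32)

rows = np.clip(points[:, 1], 0, size - 1)
cols = np.clip(points[:, 2], 0, size - 1)

# Plain fancy-index assignment keeps only one of the colliding values
# (in practice the last one written), which need not be the largest.
last_write = np.zeros((size, size), np.int32)
last_write[rows, cols] = points[:, 0]

# Unbuffered maximum: every colliding value is compared, the largest survives.
max_write = np.zeros((size, size), np.int32)
np.maximum.at(max_write, (rows, cols), points[:, 0])

print(last_write[0, 1], max_write[0, 1])
```

Whether the maximum or the minimum depth should win depends on how depth is encoded in the cube, which the question does not specify.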

How does the gradient of the sum trick work to get maxpooling positions in keras?

The keras examples directory contains a lightweight version of a stacked what-where autoencoder (SWWAE) which they train on MNIST data. (https://github.com/fchollet/keras/blob/master/examples/mnist_swwae.py)
In the original SWWAE paper, the authors compute the what and where using soft functions. However, in the keras implementation, they use a trick to get these locations. I would like to understand this trick.
Here is the code of the trick.
def getwhere(x):
    ''' Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.'''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool)  # How exactly does this line work?
Here y_prepool is an M x N matrix and y_postpool is an M/2 x N/2 matrix (let's assume canonical pooling with a window of 2 pixels).
I have verified that the output of getwhere() is a bed of nails matrix where the nails indicate the position of the max (the local argmax if you will).
Can someone construct a small example demonstrating how getwhere works using this "Trick?"
Let's focus on the simplest example, without really talking about convolutions. Say we have a vector
x = [1 4 2]
which we max-pool over (with a single, big window), we get
mx = 4
mathematically speaking, it is:
mx = x[argmax(x)]
now, the "trick" to recover one hot mask used by pooling is to do
magic = d mx / dx
argmax itself has no gradient; however, the gradient is "passed" to the element of the vector at the location of the maximum, so:
d mx / dx = [d x[2]/d x[1], d x[2]/d x[2], d x[2]/d x[3]] = [0 1 0]
As you can see, the gradients for all non-maximum elements are zero (due to argmax), and "1" appears at the position of the maximum because dx/dx = 1.
Now for "proper" maxpool you have many pooling regions, connected to many input locations, thus taking analogous gradient of sum of pooled values, will recover all the indices.
Note, however, that this trick will not work if you have heavily overlapping kernels: you might end up with values bigger than "1". Basically, if a pixel is the maximum in K pooling windows, then it will have value K, not 1. For example:
    [ 1, 2, 3]
x = [13, 3, 1]
    [ 4, 2, 9]
if we max pool with 2x2 window we get
mx = [13, 3]
     [13, 9]
and the gradient trick gives you
        [0, 0, 1]
magic = [2, 0, 0]
        [0, 0, 1]
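The mask in this example can be reproduced without any autodiff by counting, for every input position, how many pooling windows select it as their maximum. This is a plain NumPy sketch of what the gradient of the sum amounts to, not the Keras implementation; the helper name where_mask is hypothetical, and ties are broken by taking the first maximum in each window, which may differ from what a given backend does.

```python
import numpy as np

def where_mask(x, pool=2, stride=1):
    """Count how often each entry of x is the max of a pool x pool window.

    Hypothetical helper: mimics d(sum of max-pooled values) / dx.
    Ties inside a window go to the first maximum (np.argmax behaviour).
    """
    mask = np.zeros(x.shape, dtype=np.int64)
    rows, cols = x.shape
    for r in range(0, rows - pool + 1, stride):
        for c in range(0, cols - pool + 1, stride):
            window = x[r:r + pool, c:c + pool]
            dr, dc = np.unravel_index(np.argmax(window), window.shape)
            mask[r + dr, c + dc] += 1   # this window's gradient of 1 lands here
    return mask

# Overlapping 2x2 windows (stride 1) on the 3x3 example above.
x = np.array([[1, 2, 3],
              [13, 3, 1],
              [4, 2, 9]])
overlapping = where_mask(x, pool=2, stride=1)

# Non-overlapping 2x2 windows (stride 2) on a made-up 4x4 input: entries are only 0 or 1.
x4 = np.array([[1, 5, 2, 0],
               [3, 4, 8, 7],
               [9, 6, 1, 2],
               [0, 2, 3, 4]])
disjoint = where_mask(x4, pool=2, stride=2)

print(overlapping)
print(disjoint)
```

With stride equal to the pool size every input belongs to exactly one window, which is why the result is a clean one-hot "bed of nails" in that case.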

Tensorflow: What exact formula is applied in `tf.nn.sparse_softmax_cross_entropy_with_logits`?

I tried to manually recompute the outputs of this function so I created a minimal example:
logits = tf.pack(np.array([[[[0,1,2]]]],dtype=np.float32)) # img of shape (1, 1, 1, 3)
labels = tf.pack(np.array([[[1]]],dtype=np.int32)) # gt of shape (1, 1, 1)
softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits,labels)
softmaxCrossEntropie.eval() # --> output is [1.41]
Now according to my own calculation I only get [1.23]
When calculating manually, I simply apply softmax, sigma(x_j) = exp(x_j) / sum_k exp(x_k),
and cross-entropy, H(p, q) = -sum_x p(x) log q(x),
where q(x) = sigma(x_j) or (1-sigma(x_j)), depending on whether j is the correct ground truth class or not, and p(x) = labels, which are then one-hot encoded.
I'm not sure where the difference might originate from. I cannot really imagine that some epsilon causes such a big difference. Does someone know where I can look up which exact formula is used by TensorFlow?
Is the source code of that exact part available?
I could only find nn_ops.py, but it only uses another function called gen_nn_ops._sparse_softmax_cross_entropy_with_logits which I couldn't find on github...
Well, usually p(x) in the cross-entropy equation is the true distribution, while q(x) is the distribution obtained from softmax. So, if p(x) is one-hot (and it is, otherwise sparse cross-entropy could not be applied), the cross-entropy is just the negative log of the probability of the true category.
In your example, softmax(logits) is a vector with values [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059 which is exactly what you got as output.
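The arithmetic above can be reproduced in a few lines. This is a NumPy sketch of the formula described in the answer (negative log of the softmax probability of the true class), not TensorFlow's actual kernel; the function name is made up for illustration.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]) for one example (illustrative helper)."""
    shifted = logits - np.max(logits)                        # shift for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))    # log-softmax
    return -log_probs[label]

logits = np.array([0.0, 1.0, 2.0])
probs = np.exp(logits) / np.sum(np.exp(logits))   # plain softmax, for comparison
loss = sparse_softmax_cross_entropy(logits, 1)

print(probs)   # approximately [0.0900, 0.2447, 0.6652]
print(loss)    # approximately 1.4076
```

Only the entry of the true class contributes; the (1 - sigma) terms from the question belong to a per-class sigmoid (binary) cross-entropy, which is a different loss.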

Elegant Way to Select one Element per Row in Tensorflow

Given...
a Matrix A of shape [m, n]
a tensor I of shape [m]
I want to get a list J of elements from A where
J[i] = A[i, I[i]].
That is, I holds the index of the element to select from each row in A.
Context: I already have the argmax(A, 1) and now I also want the max.
I know that I can just use reduce_max.
And after trying around for a bit I also came up with this:
J = tf.gather_nd(A,
                 tf.transpose(tf.pack([tf.to_int64(tf.range(A.get_shape()[0])), I])))
Where the to_int64 is needed because range only produces int32 and argmax only produces int64.
Neither of the two strikes me as particularly elegant.
One has runtime overhead (probably about a factor of n) and the other has an unknown amount of cognitive overhead. Am I missing something here?
The gather() function provides a way to do it:
r = tf.random.uniform([4,5],0, 9, dtype=tf.int32)
i = tf.random.uniform([4], 0, 4, dtype=tf.int32)
tf.gather(r, i, axis=1, batch_dims=1)
This is a rather late answer, but could doing
mask = tf.one_hot(I, depth=n, dtype=tf.bool, on_value=True, off_value=False)
elements = tf.boolean_mask(A, mask)
accomplish what you're looking for?
edit: I should point out that this is NOT a good idea if A is already a very large tensor, as this ends up making a dense matrix.
The link provided by @yaroslav-bulatov mentions this solution:
def get_elements(data, indices):
    flat_indices = tf.range(0, tf.shape(indices)[0]) * data.shape[1] + indices
    return tf.gather(tf.reshape(data, [-1]), flat_indices)
Your solution is not currently differentiable (because gradients for tf.gather_nd are not currently supported).
Hopefully, data[:, indices] will be introduced soon.
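For reference, the selection J[i] = A[i, I[i]] and the flat-index trick can be checked side by side on a small matrix. This sketch uses NumPy rather than TensorFlow (an assumption, so that it runs standalone), and the values of A and I are made up.

```python
import numpy as np

A = np.array([[10, 11, 12],
              [20, 21, 22],
              [30, 31, 32],
              [40, 41, 42]])
I = np.array([2, 0, 1, 2])          # one column index per row

rows = np.arange(A.shape[0])

# Pair each row number with its column index: the gather_nd idea.
by_pairs = A[rows, I]

# Flat-index trick: element (i, j) of a row-major [m, n] matrix sits at i * n + j.
flat = rows * A.shape[1] + I
by_flat = A.reshape(-1)[flat]

# Batched gather along axis 1: the idea behind tf.gather(..., axis=1, batch_dims=1).
by_take = np.take_along_axis(A, I[:, None], axis=1)[:, 0]

print(by_pairs, by_flat, by_take)   # all three print [12 20 31 42]
```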