How does tensorflow batch_matmul work? - numpy

Tensorflow has a function called batch_matmul which multiplies higher dimensional tensors. But I'm having a hard time understanding how it works, perhaps partially because I'm having a hard time visualizing it.
What I want to do is multiply a matrix by each slice of a 3D tensor, but I don't quite understand what the shape of tensor a is. Is z the innermost dimension? Which of the following is correct?
I would most prefer the first to be correct -- it's most intuitive to me and easy to see in the .eval() output. But I suspect the second is correct.
Tensorflow says that batch_matmul performs:
out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
What does that mean? What does that mean in the context of my example? What is being multiplied with with what? And why aren't I getting a 3D tensor the way I expected?

You can imagine it as doing a matmul over each training example in the batch.
For example, if you have two tensors with the following dimensions:
a.shape = [100, 2, 5]
b.shape = [100, 5, 2]
and you do a batch tf.matmul(a, b), your output will have the shape [100, 2, 2].
100 is your batch size, the other two dimensions are the dimensions of your data.

First of all tf.batch_matmul() was removed and no longer available. Now you suppose to use tf.matmul():
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
So let's assume you have the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of the shape (batch_size, n, k). Here is what is going on here. Assume you have batch_size of matrices nxm and batch_size of matrices mxk. Now for each pair of them you calculate nxm X mxk which gives you an nxk matrix. You will have batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)

You can now do it using tf.einsum, starting from Tensorflow 0.11.0rc0.
For example,
M1 = tf.Variable(tf.random_normal([2,3,4]))
M2 = tf.Variable(tf.random_normal([5,4]))
N = tf.einsum('ijk,lk->ijl',M1,M2)
It multiplies the matrix M2 with every frame (3 frames) in every batch (2 batches) in M1.
The output is:
[array([[[ 0.80474716, -1.38590837, -0.3379252 , -1.24965811],
[ 2.57852983, 0.05492432, 0.23039417, -0.74263287],
[-2.42627382, 1.70774114, 1.19503212, 0.43006262]],
[[-1.04652011, -0.32753903, -1.26430523, 0.8810069 ],
[-0.48935518, 0.12831448, -1.30816901, -0.01271309],
[ 2.33260512, -1.22395933, -0.92082584, 0.48991606]]], dtype=float32),
array([[ 1.71076882, 0.79229093, -0.58058828, -0.23246667],
[ 0.20446332, 1.30742455, -0.07969904, 0.9247328 ],
[-0.32047141, 0.66072595, -1.12330854, 0.80426538],
[-0.02781649, -0.29672042, 2.17819595, -0.73862702],
[-0.99663496, 1.3840003 , -1.39621222, 0.77119476]], dtype=float32),
array([[[ 0.76539308, 2.77609682, -1.79906654, 0.57580602, -3.21205115],
[ 4.49365759, -0.10607499, -1.64613271, 0.96234947, -3.38823152],
[-3.59156275, 2.03910899, 0.90939498, 1.84612727, 3.44476724]],
[[-1.52062428, 0.27325237, 2.24773455, -3.27834225, 3.03435063],
[ 0.02695178, 0.16020992, 1.70085776, -2.8645196 , 2.48197317],
[ 3.44154787, -0.59687197, -0.12784094, -2.06931567, -2.35522676]]], dtype=float32)]
I have verified, the arithmetic is correct.

tf.tensordot should solve this problem. It supports batch operations, e.g., if you want to contract a 2D tensor with a 3D tensor, with the latter having a batch dimension.
If a is shape [n,m] b is shape [?,m,l], then
y = tf.tensordot(b, a, axes=[1, 1]) will produce a tensor of shape [?,n,l]
https://www.tensorflow.org/api_docs/python/tf/tensordot

It is simply like splitting on the first dimension respectively, multiply and concat them back. If you want to do 3D by 2D, you can reshape, multiply, and reshape it back. I.e. [100, 2, 5] -> [200, 5] -> [200, 2] -> [100, 2, 2]

The answer to this particular answer is using tf.scan function.
If a = [5,3,2] #dimension of 5 batch, with 3X2 mat in each batch
and b = [2,3] # a constant matrix to be multiplied with each sample
then let def fn(a,x):
return tf.matmul(x,b)
initializer = tf.Variable(tf.random_number(3,3))
h = tf.scan(fn,outputs,initializer)
this h will store all the outputs.

Related

Tabular data: Implementing a custom tensor layer without resorting to iteration

I have an idea for a tensor operation that would not be difficult to implement via iteration, with batch size one. However I would like to parallelize it as much as possible.
I have two tensors with shape (n, 5) called X and Y. X is actually supposed to represent 5 one-dimensional tensors with shape (n, 1): (x_1, ..., x_n). Ditto for Y.
I would like to compute a tensor with shape (n, 25) where each column represents the output of the tensor operation f(x_i, y_j), where f is fixed for all 1 <= i, j <= 5. The operation f has output shape (n, 1), just like x_i and y_i.
I feel it is important to clarify that f is essentially a fully-connected layer from the concatenated [...x_i, ...y_i] tensor with shape (1, 10), to an output layer with shape (1,5).
Again, it is easy to see how to do this manually with iteration and slicing. However this is probably very slow. Performing this operation in batches, where the tensors X, Y now have shape (n, 5, batch_size) is also desirable, particularly for mini-batch gradient descent.
It is difficult to really articulate here why I desire to create this network; I feel it is suited for my domain of 'itemized tabular data' and cuts down significantly on the number of weights per operation, compared to a fully connected network.
Is this possible using tensorflow? Certainly not using just keras.
Below is an example in numpy per AloneTogether's request
import numpy as np
features = 16
batch_size = 256
X_batch = np.random.random((features, 5, batch_size))
Y_batch = np.random.random((features, 5, batch_size))
# one tensor operation to reduce weights in this custom 'layer'
f = np.random.random((features, 2 * features))
for b in range(batch_size):
X = X_batch[:, :, b]
Y = Y_batch[:, :, b]
for i in range(5):
x_i = X[:, i:i+1]
for j in range(5):
y_j = Y[:, j:j+1]
x_i_y_j = np.concatenate([x_i, y_j], axis=0)
# f(x_i, y_j)
# implemented by a fully-connected layer
f_i_j = np.matmul(f, x_i_y_j)
All operations you need (concatenation and matrix multiplication) can be batched.
Difficult part here is, that you want to concatenate features of all items in X with features of all items in Y (all combinations).
My recommended solution is to expand the dimensions of X to [batch, features, 5, 1], expand dimensions of Y to [batch, features, 1, 5]
Than tf.repeat() both tensors so their shapes become [batch, features, 5, 5].
Now you can concatenate X and Y. You will have a tensor of shape [batch, 2*features, 5, 5]. Observe that this way all combinations are built.
Next step is matrix multiplication. tf.matmul() can also do batch matrix multiplication, but I use here tf.einsum() because I want more control over which dimensions are considered as batch.
Full code:
import tensorflow as tf
import numpy as np
batch_size=3
features=6
items=5
x = np.random.uniform(size=[batch_size,features,items])
y = np.random.uniform(size=[batch_size,features,items])
f = np.random.uniform(size=[2*features,features])
x_reps= tf.repeat(x[:,:,:,tf.newaxis], items, axis=3)
y_reps= tf.repeat(y[:,:,tf.newaxis,:], items, axis=2)
xy_conc = tf.concat([x_reps,y_reps], axis=1)
f_i_j = tf.einsum("bfij, fg->bgij", xy_conc,f)
f_i_j = tf.reshape(f_i_j , [batch_size,features,items*items])

How can one utilize the indices provided by torch.topk()?

Suppose I have a pytorch tensor x of shape [N, N_g, 2]. It can be viewed as N * N_g 2d vectors. Specifically, x[i, j, :] is the 2d vector of the jth group in the ith batch.
Now I am trying to get the coordinates of vectors of top 5 length in each group. So I tried the following:
(i) First I used x_len = (x**2).sum(dim=2).sqrt() to compute their lengths, resulting in x_len.shape==[N, N_g].
(ii) Then I used tk = x_len.topk(5) to get the top 5 lengths in each group.
(iii) The desired output would be a tensor x_top5 of shape [N, 5, 2]. Naturally I thought of using tk.indices to index x so as to obtain x_top5. But I failed as it seems such indexing is not supported.
How can I do this?
A minimal example:
x = torch.randn(10,10,2) # N=10 is the batchsize, N_g=10 is the group size
x_len = (x**2).sum(dim=2).sqrt()
tk = x_len.topk(5)
x_top5 = x[tk.indices]
print(x_top5.shape)
# torch.Size([10, 5, 10, 2])
However, this gives x_top5 as a tensor of shape [10, 5, 10, 2], instead of [10, 5, 2] as desired.

batch_dot with variable batch size in Keras

I'm trying to writting a layer to merge 2 tensors with such a formula
The shapes of x[0] and x[1] are both (?, 1, 500).
M is a 500*500 Matrix.
I want the output to be (?, 500, 500) which is theoretically feasible in my opinion. The layer will output (1,500,500) for every pair of inputs, as (1, 1, 500) and (1, 1, 500). As the batch_size is variable, or dynamic, the output must be (?, 500, 500).
However, I know little about axes and I have tried all the combinations of axes but it doesn't make sense.
I try with numpy.tensordot and keras.backend.batch_dot(TensorFlow). If the batch_size is fixed, taking a =
(100,1,500) for example, batch_dot(a,M,(2,0)), the output can be (100,1,500).
Newbie for Keras, sorry for such a stupid question but I have spent 2 days to figure out and it drove me crazy :(
def call(self,x):
input1 = x[0]
input2 = x[1]
#self.M is defined in build function
output = K.batch_dot(...)
return output
Update:
Sorry for being late. I try Daniel's answer with TensorFlow as Keras's backend and it still raises a ValueError for unequal dimensions.
I try the same code with Theano as backend and now it works.
>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])
I don't know how to print tensors' shape in theano. It's definitely harder than tensorflow for me... However it works.
For that I scan 2 versions of codes for Tensorflow and Theano. Following are differences.
In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2]
In tensorflow_backend:
return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)
In theano_backend:
return T.batched_tensordot(x, y, axes=axes)
(If following changes of out._keras_shape don't make influence on out's value.)
Your multiplications should select which axes it uses in the batch dot function.
Axis 0 - the batch dimension, it's your ?
Axis 1 - the dimension you say has length 1
Axis 2 - the last dimension, of size 500
You won't change the batch dimension, so you will use batch_dot always with axes=[1,2]
But for that to work, you must ajust M to be (?, 500, 500).
For that define M not as (500,500), but as (1,500,500) instead, and repeat it in the first axis for the batch size:
import keras.backend as K
#Being M with shape (1,500,500), we repeat it.
BatchM = K.repeat_elements(x=M,rep=batch_size,axis=0)
#Not sure if repeating is really necessary, leaving M as (1,500,500) gives the same output shape at the end, but I haven't checked actual numbers for correctness, I believe it's totally ok.
#Now we can use batch dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)
#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1],(0,2,1))
#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])
I prefer using TensorFlow so I tried to figure it out with TensorFlow in past few days.
The first one is much similar to Daniel's solution.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
It needs to feed values to M with fit shapes.
sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1., 4., 6.]]], dtype=float32)
Another way is simple with tf.einsum.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,lm->ikl', x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
Let's feed some values.
sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1., 4., 6.]]], dtype=float32)
Now M is a 2D tensor and no need to feed batch_size to M.
What's more, now it seems such a question can be solved in TensorFlow with tf.einsum. Does it mean it's a duty for Keras to invoke tf.einsum in some situations? At least I find no where Keras calls tf.einsum. And in my opinion, when batch_dot 3D tensor and 2D tensor Keras behaves weirdly. In Daniel's answer, he pads M to (1,500,500) but in K.batch_dot() M will be adjusted to (500,500,1) automatically. I find tf will adjust it with Broadcasting rules and I'm not sure Keras does the same.

Sample from a tensor in Tensorflow along an axis

I have a matrix L of shape (2,5,2). The values along the last axis form a probability distribution. I want to sample another matrix S of shape (2, 5) where each entry is one of the following integers: 0, 1.
For example,
L = [[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]]
One of the samples could be,
S = [[1, 1, 1, 0, 1],
[1, 1, 1, 0, 1]]
The distributions are binomial in the above example. However, in general, the last dimension of L can be any positive integer, so the distributions can be multinomial.
The samples need to be generated efficiently within Tensorflow computation graph. I know how to do this using numpy using the functions apply_along_axis and numpy.random.multinomial.
You can use tf.multinomial() here.
You will first need to reshape your input tensor to shape [-1, N] (where N is the last dimension of L):
# L has shape [2, 5, 2]
L = tf.constant([[[0.1, 0.9],[0.2, 0.8],[0.3, 0.7],[0.5, 0.5],[0.6, 0.4]],
[[0.5, 0.5],[0.9, 0.1],[0.7, 0.3],[0.9, 0.1],[0.1, 0.9]]])
dims = L.get_shape().as_list()
N = dims[-1] # here N = 2
logits = tf.reshape(L, [-1, N]) # shape [10, 2]
Now we can apply the function tf.multinomial() to logits:
samples = tf.multinomial(logits, 1)
# We reshape to match the initial shape minus the last dimension
res = tf.reshape(samples, dims[:-1])
Be cautious when using tf.multinomial(). The inputs to the function should be logits and not probability distributions.
However, in your example, the last axis is a probability distribution.

No broadcasting for tf.matmul in TensorFlow

I have a problem with which I've been struggling. It is related to tf.matmul() and its absence of broadcasting.
I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.
I need to encode my input data as a 4D tensor:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
The first dimension is the size of a batch, the second the number of entries in the batch.
You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.
Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
These are the steps of my computation:
compute a function of each vector of 100 float values (e.g., linear function)
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.matmul(X, W)
problem: no broadcasting for tf.matmul() and no success using tf.batch_matmul()
expected shape of Y: (5, 10, 4, 50)
applying average pooling for each entry of the batch (over the objects of each entry):
Y_avg = tf.reduce_mean(Y, 2)
expected shape of Y_avg: (5, 10, 50)
I expected that tf.matmul() would have supported broadcasting. Then I found tf.batch_matmul(), but still it looks like doesn't apply to my case (e.g., W needs to have 3 dimensions at least, not clear why).
BTW, above I used a simple linear function (the weights of which are stored in W). But in my model I have a deep network instead. So, the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected that tf.matmul() would have had a broadcasting behavior (if so, maybe tf.batch_matmul() wouldn't even be necessary).
Look forward to learning from you!
Alessio
You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of one single "instance" of computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4=200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes that little tricky, but you can use tf.shape to determine the actual shapes during run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:
X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])
Y_ = tf.matmul(X_, W)
X_shape = tf.gather(tf.shape(X), [0,1,2]) # Extract the first three dimensions
target_shape = tf.concat(0, [X_shape, [50]])
Y = tf.reshape(Y_, target_shape)
Y_avg = tf.reduce_mean(Y, 2)
As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction of axes pairs between two tensors, in line with Numpy's tensordot(). For your case:
X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]]) # gives shape=[5, 10, 4, 50]