TensorFlow: Contracting a dimension of two tensors via dot product - numpy

I have two tensors, a of rank 4 and b of rank 1. I'd like to produce aprime, of rank 3, by "contracting" the last axis of a away, by replacing it with its dot product against b. In numpy, this is as easy as np.tensordot(a, b, 1). However, I can't figure out a way to do this in Tensorflow.
How can I replace the last axis of a tensor with a value equal to that axis's dot product against another tensor (of course, of the same shape)?
UPDATE:
I see in Wikipedia that this is called the "Tensor Inner Product" https://en.wikipedia.org/wiki/Dot_product#Tensors aka tensor contraction. It seems like a common operation, so I'm surprised that there's no explicit support for it in TensorFlow.
I believe that this may be possible via tf.einsum; however, I have not been able to find a generalized way to do this that works for tensors of any rank (this is probably because I do not understand einsum and have been reduced to trial and error)

Aren't you just using tensor in the sense of a multidimensional array? Or in some disciplines a tensor is 3d (vector 1d, matrix 2d, etc). I haven't used tensorflow, but I don't think it has much to do with tensors in the linear algebra sense. They talk about data flow graphs. I'm not sure where the tensor part of the name comes from.
I assume you are talking about an expression like:
In [293]: A=np.tensordot(np.ones((5,4,3,2)),np.arange(2),1)
resulting in a (5,4,3) shape array. The einsum equivalent is
In [294]: B=np.einsum('ijkl,l->ijk',np.ones((5,4,3,2)),np.arange(2))
np.einsum implements Einstein notation, as discussed here: https://en.wikipedia.org/wiki/Einstein_notation. I got this link from https://en.wikipedia.org/wiki/Tensor_contraction
You seem to be talking about straightforward numpy operations, not something special in tensorflow.
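To check that the two expressions above really agree, here is a small NumPy sketch. The array contents are made up for illustration; only the shapes follow the example in the answer.

```python
import numpy as np

# Illustrative data (not from the original post): rank-4 `a`, rank-1 `b`.
a = np.arange(5 * 4 * 3 * 2, dtype=float).reshape(5, 4, 3, 2)
b = np.array([1.0, 10.0])

# Contract the last axis of `a` against `b` in two equivalent ways.
via_tensordot = np.tensordot(a, b, 1)
via_einsum = np.einsum('ijkl,l->ijk', a, b)

print(via_tensordot.shape)                     # (5, 4, 3)
print(np.allclose(via_tensordot, via_einsum))  # True
```

Each output element is the dot product of one length-2 slice of a with b, so the last axis disappears.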

I would first add 3 dimensions of size 1 to b so that it can be broadcast along the 4th dimension of a.
b = tf.reshape(b, (1, 1, 1, -1))
Then you can multiply b and a and it will broadcast b along all of the other dimensions.
a_prime = a * b
Finally, reduce the sum along the 4th dimension to get rid of that dimension and replace it with the dot product.
a_prime = tf.reduce_sum(a_prime, [3])
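The same three steps can be sketched in NumPy, whose broadcasting rules match TensorFlow's for this case. This is only a mirror of the approach above with made-up data, compared against np.tensordot as a sanity check.

```python
import numpy as np

# Illustrative data (assumed): rank-4 `a`, rank-1 `b` matching a's last axis.
a = np.arange(5 * 4 * 3 * 2, dtype=float).reshape(5, 4, 3, 2)
b = np.array([1.0, 10.0])

# Step 1: give b three leading axes of size 1 -> shape (1, 1, 1, 2).
b4 = b.reshape(1, 1, 1, -1)

# Step 2: elementwise multiply; b4 is broadcast across the first three axes.
product = a * b4

# Step 3: sum over the last axis (index 3) to finish the dot product.
a_prime = product.sum(axis=3)

print(a_prime.shape)                                  # (5, 4, 3)
print(np.allclose(a_prime, np.tensordot(a, b, 1)))    # True

# Rank-agnostic variant: broadcasting aligns trailing axes, so no reshape is needed.
a_prime_any_rank = (a * b).sum(axis=-1)
```

The last line works for a of any rank, because a rank-1 b is always matched against the trailing axis.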

This seems like it would work (for the first tensor being of any rank):
tf.einsum('...i,i->...', x, y)
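The ellipsis subscript can be tried out in NumPy first, since np.einsum accepts the same '...i,i->...' string. The helper name contract_last_axis below is made up for this sketch.

```python
import numpy as np

def contract_last_axis(x, y):
    """Replace the last axis of x with its dot product against the 1-D array y."""
    return np.einsum('...i,i->...', x, y)

y = np.array([1.0, 2.0, 3.0])

# Try first arguments of several ranks; each must end in an axis of length 3.
for shape in [(3,), (4, 3), (2, 4, 3), (5, 2, 4, 3)]:
    x = np.ones(shape)
    out = contract_last_axis(x, y)
    print(shape, '->', out.shape)   # e.g. (4, 3) -> (4,)
```

With x filled with ones, every output element equals 1 + 2 + 3 = 6, whatever the rank of x.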

Related

TFP Linear Regression yhat=model(x_tst) - doesn't work for other data

I cannot see the difference between what I am doing and the working Google TFP example, whose structure I am following. What am I doing wrong/should I be doing differently?
[Setup: Win 10 Home 64-bit 20H2, Python 3.7, TF2.4.1, TFP 0.12.2, running in Jupyter Lab]
I have been building a model step by step following the example of TFP Probabilistic Layers Regression. The Case 1 code runs fine, but my parallel model doesn't and I cannot see the difference that might cause this
yhat = model(x_tst)
to fail with message Input 0 of layer sequential_14 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (2019,) (which is the correct 1D size of x_tst)
For comparison: Google's load_dataset function for the TFP example returns y, x, x_tst, which are all np.ndarray of size 150, whereas I read data from a csv file with pandas.read_csv, split it into train_ and test_datasets and then take 1 col of data as independent variable 'g' and dependent variable 'redz' from the training dataset.
I know x, y, etc. need to be np.ndarray, but one does not create ndarray directly, so I have...
x = np.array(train_dataset['g'])
y = np.array(train_dataset['redz'])
x_tst = np.array(test_dataset['g'])
where x, y, x_tst are all 1-dimensional - just like the TFP example.
The model itself runs
model = tf.keras.Sequential([
tf.keras.layers.Dense(1),
tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1, verbose=False);
(and when plotted gives the expected output for the google data - I don't get this far):
But, per the example when I try to "profit" by doing yhat = model(x_tst) I get the dimensions error given above.
What's wrong?
(If I try model.predict I think I hit a known bug/gap in TFP; then it fails the assert)
Update - Explicit Reshape Resolves Issue
The hint from Frightera led to further investigation: x_tst had shape (2019,)
Reshaping by x_tst = x_tst.reshape(2019, 1) resolved the issue. Is TF inconsistent in its requirements or is there some good reason that the explicit final dimension 1 was required? Who knows. At least predictions can be made now.
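For reference, a minimal sketch of the reshape without hard-coding the length. The data here is a stand-in for the 'g' column (assumed values); the error message above asks for min_ndim=2, which the column shape satisfies.

```python
import numpy as np

# Stand-in for np.array(test_dataset['g']): a 1-D array of 2019 values (assumed data).
x_tst = np.linspace(0.0, 1.0, 2019)
print(x_tst.shape)   # (2019,)

# Two equivalent ways to add a trailing axis of size 1.
x_col = x_tst.reshape(-1, 1)     # -1 lets NumPy infer the 2019
x_col2 = x_tst[:, np.newaxis]

print(x_col.shape)   # (2019, 1)
```

Using -1 avoids having to edit the reshape whenever the size of the test split changes.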
In this question Difference between numpy.array shape (R, 1) and (R,), the OP asked for the difference between (R,) and (R,1) but the answers given did not address this specific point.
Similarly in this question Difference between these array shapes in numpy
I believe the answer lies in the numpy glossary, where it says of (n,) that
A parenthesized number followed by a comma denotes a tuple with one
element. The trailing comma distinguishes a one-element tuple from a
parenthesized n.
Which, naturally, echoes the Python statements concerning tuples here
Thus an array of shape (R,) is a tuple describing an array as being 1D of a certain extent R, where the comma is appended to distinguish the tuple (R,) from the non-tuple (R).
However, for a 1D array, there is no sense of row or column ordering; (R,1) is R rows by 1 column, but (1, R) would be 1 row of R columns, and though it shouldn't matter to a 1D iterator either it does or the iterator doesn't correctly recognise ( ,) and thinks it is 2D. (i.e. I don't know the technical details of that part, but these seem to be the only options that account for the behaviour.)
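A short sketch of the distinctions discussed above, using a made-up extent R = 4: the trailing comma is what makes a tuple, and the three shapes differ in their number of dimensions.

```python
import numpy as np

R = 4

# Parentheses alone do not make a tuple; the trailing comma does.
print(type((R)))    # <class 'int'>
print(type((R,)))   # <class 'tuple'>

flat = np.zeros((R,))    # 1-D: no notion of row or column
col = np.zeros((R, 1))   # 2-D: R rows by 1 column
row = np.zeros((1, R))   # 2-D: 1 row of R columns

print(flat.ndim, col.ndim, row.ndim)   # 1 2 2
```

All three hold the same R values, but only the last two are 2-D.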
This issue is unrelated to the indeterminacy of size that occurs in tensor definition in Tensorflow. In the context of Tensorflow, Tensors (arrays) may have indeterminate shapes, so that more data may be added along a certain axis as processing occurs, e.g. in batches, in which case the initial Tensor shape includes a leading None to indicate where array expansion is expected to occur. (See e.g. tensor's shape here)

Tensorflow: Add small number before division for numerical stability

In order to prevent divisions by zero in TensorFlow, I want to add a tiny number to my divisor. A quick search did not yield any results. In particular, I am interested in using the scientific notation, e.g.
a = b/(c+1e-05)
How can this be achieved?
Assuming a, b and c are tensors, the formula you have written will work as expected. 1e-5 will be broadcast and added to the tensor c. TensorFlow automatically converts the Python float 1e-5 to a constant tensor, as if you had written tf.constant(1e-5).
Tensorflow however has some limitations with non-scalar broadcasts. Take a look at my other answer.
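The scalar broadcast can be illustrated with NumPy arrays standing in for the tensors (an assumption for this sketch; the values are made up). The first element of c is zero on purpose.

```python
import numpy as np

# Stand-ins for the tensors b and c; c contains a zero.
b = np.array([1.0, 2.0, 3.0], dtype=np.float32)
c = np.array([0.0, 2.0, 4.0], dtype=np.float32)
eps = 1e-05   # scientific notation is an ordinary Python float

# The scalar eps is broadcast to every element of c before the division.
a = b / (c + eps)

print(a.shape)                 # (3,)
print(np.isfinite(a).all())    # True
```

Where c is zero the result is large but finite, instead of inf.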

dimension of a tensor created by tf.zeros(n)

I'm confused by the dimension of a tensor created with tf.zeros(n). For instance, tf.zeros(6).eval().shape returns (6,). What dimension is this? Is this a matrix of 6 rows and an arbitrary number of columns? Or is this a matrix of 6 columns with an arbitrary number of rows?
weights = tf.random_uniform([3, 6], minval=-1, maxval=1, seed=1) - this is a 3x6 matrix
b = tf.zeros(6).eval() - I'm not sure what dimension this is.
Why am I able to add the two like weights+b? If I understand correctly, in order for the two to be added, b needs to be of dimension 3x1.
Why am I able to add the two like weights+b?
Operator + is the same as using tf.add() (<obj>.__add__() calls the tf.add() or tf.math.add()) and if you read the documentation it says:
NOTE: math.add supports broadcasting. AddN does not. More about broadcasting here
Now I'm quoting from numpy broadcasting rules (which are the same for tensorflow):
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when they are equal, or one of them is 1.
So you're able to add two tensors with different shapes because they have the same trailing dimensions. If you change the shape of your weights tensor to, let's say, [3, 5], you will get an InvalidArgumentError exception because the trailing dimensions differ.
(6,) is Python syntax for a tuple with 6 as a single element. Hence the shape here is a one-dimensional vector of length 6.
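The rule can be tried directly in NumPy, which the answer above says follows the same broadcasting rules. This is a sketch with placeholder values; note that NumPy raises ValueError where TensorFlow reports InvalidArgumentError.

```python
import numpy as np

weights = np.ones((3, 6))   # stand-in for the 3x6 weights tensor
b = np.zeros(6)             # shape (6,): a 1-D vector of length 6

# Trailing dimensions match (6 and 6), so b is added to every row.
print((weights + b).shape)   # (3, 6)

# With trailing dimensions 5 and 6 the shapes are incompatible.
failed = False
try:
    np.ones((3, 5)) + b
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```

So b does not need to be 3x1; it is treated as a row that is repeated for each of the 3 rows of weights.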

Tensorflow/Keras find two most similar filters

I have a tensorflow/keras CNN. It has layers and some are Conv2D. In a given layer I want to efficiently find the two filters in the Conv2D that are most similar.
The layer.weights is a list, filter_count long, of tensors of shape (height, width, depth).
I want to compare by the difference or maybe the sqrt(diff^2) between each element in (height,width,depth) then sum so the difference is a single float value.
If T1 is thelayer.weights[idx1] and T2 is thelayer.weights[idx2]
then the comparison is tf.sqrt(tf.reduce_sum(tf.squared_difference(T1, T2)))
I want to compare every filter to every other filter and take the 3 lowest differences. (The first one will always be zero, where T1 and T2 are the same tensor, i.e. self.)
Obviously I can do nested loops but that is not functional and nifty.
Is there some built in tensorflow or keras function to do this fast and possibly in the GPU?
It's not quite clear from your description, but I assume the shape of weights is [filter_count, height, width, depth]. If filter_count is along a different axis, the arguments to "reduce_sum" will have to be modified accordingly.
You can use broadcasting to parallelize this process.
differences = tf.sqrt(
    tf.reduce_sum(
        tf.squared_difference(
            tf.expand_dims(thelayer.weights, 0),
            tf.expand_dims(thelayer.weights, 1),
        ),
        (-1, -2, -3)
    )
)
This will result in a tensor of shape [filter_count, filter_count] where element differences[i, j] measure differences between filter weights i and j.
You can then filter to find the desired elements.
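One possible way to do that final filtering, sketched in NumPy rather than TensorFlow. The filter bank is random made-up data in the assumed [filter_count, height, width, depth] layout, with a near-duplicate planted so the expected answer is known.

```python
import numpy as np

# Hypothetical filter bank: 8 filters of shape (3, 3, 4).
rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 3, 3, 4))
filters[5] = filters[2] + 1e-3   # plant a near-duplicate of filter 2

# Broadcast (1, 8, ...) against (8, 1, ...) to get all pairwise differences.
diff = filters[None, ...] - filters[:, None, ...]          # (8, 8, 3, 3, 4)
distances = np.sqrt((diff ** 2).sum(axis=(-1, -2, -3)))    # (8, 8)

# Ignore the zero self-distances on the diagonal, then take the smallest entry.
np.fill_diagonal(distances, np.inf)
i, j = np.unravel_index(np.argmin(distances), distances.shape)

print(sorted((int(i), int(j))))   # [2, 5]
```

The matrix is symmetric, so each pair appears twice; sorting the flattened upper triangle would give the k smallest pairs instead of just one.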

tf.nn.embedding_lookup - row or column?

This is a very simple question. I'm learning tensorflow and converting my numpy-written code using Tensorflow.
I have a word embedding matrix U of shape [embedding_size, vocab_size], so each column is the embedding vector of one word.
I converted U into TF like below:
U = tf.Variable(tf.truncated_normal([embedding_size, vocab_size], -0.1, 0.1))
So far, so good.
Now I need to look up each word's embedding for training. I assume it would be
tf.nn.embedding_lookup(U, word_index)
My question is this: because each embedding is a column vector, in numpy I need to look it up as U[:,x[t]].
How does TF figure out it needs to return the row OR column by word_index?
What's the default? Row or column?
If it's a row vector, then do I need to transpose my embedding matrix?
https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup
doesn't mention this. If anyone could point me to the right resource, I'd appreciate it.
If params is a single tensor, the tf.nn.embedding_lookup(params, ids) operation treats ids as the indices of rows in params. If params is a list of tensors or a partitioned variable, then ids still correspond to rows in those tensors, but the partition_strategy (either "div" or "mod") determines how the ids map to a particular row.
As Aaron suggests, it will probably be easiest to define your embedding U as having shape [vocab_size, embedding_size], so that you can use tf.nn.embedding_lookup() and related functions.
Alternatively, you can use the axis argument to tf.gather() to select columns from U:
embedding = tf.gather(U, word_index, axis=1)
U should be vocab_size x embedding_size, the transpose of what you have now.
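The row versus column distinction can be illustrated with NumPy analogues: integer-array indexing U[ids] picks rows, as the answer describes for tf.nn.embedding_lookup, and np.take(..., axis=1) picks columns like tf.gather(..., axis=1). The sizes and ids below are made up.

```python
import numpy as np

vocab_size, embedding_size = 5, 3

# Row layout: one embedding per row, shape [vocab_size, embedding_size].
U_rows = np.arange(vocab_size * embedding_size, dtype=float).reshape(
    vocab_size, embedding_size)
# Column layout from the question: shape [embedding_size, vocab_size].
U_cols = U_rows.T

word_index = np.array([4, 0, 2])

# Row lookup: result has shape (len(word_index), embedding_size).
from_rows = U_rows[word_index]
# Column lookup: result has shape (embedding_size, len(word_index)).
from_cols = np.take(U_cols, word_index, axis=1)

print(np.array_equal(from_rows, from_cols.T))   # True
```

Both layouts return the same vectors; the column layout just hands them back transposed.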