TensorFlow/Keras: find the two most similar filters

I have a tensorflow/keras CNN. It has layers and some are Conv2D. In a given layer I want to efficiently find the two filters in the Conv2D that are most similar.
The layer.weights is a list of filter_count tensors, each of shape (height, width, depth).
I want to compare filters by the difference (or maybe sqrt(diff^2)) between each element in (height, width, depth), then sum, so that the difference is a single float value.
If T1 is thelayer.weights[idx1] and T2 is thelayer.weights[idx2]
then the comparison is tf.sqrt(tf.reduce_sum(tf.squared_difference(T1, T2)))
I want to compare every filter to every other filter and take the 3 lowest differences. (The lowest will always be zero, where T1 and T2 are the same tensor, i.e. the self-comparison.)
Obviously I could write nested loops, but that is neither functional nor nifty.
Is there some built in tensorflow or keras function to do this fast and possibly in the GPU?

It's not quite clear from your description, but I assume the shape of weights is [filter_count, height, width, depth]. If filter_count is along a different axis, the arguments to reduce_sum will have to be modified accordingly.
You can use broadcasting to parallelize this process.
differences = tf.sqrt(
    tf.reduce_sum(
        tf.squared_difference(
            tf.expand_dims(thelayer.weights, 0),
            tf.expand_dims(thelayer.weights, 1),
        ),
        (-1, -2, -3),
    )
)
This will result in a tensor of shape [filter_count, filter_count] where element differences[i, j] measures the difference between filter weights i and j.
You can then filter to find the desired elements.
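For completeness, here is a sketch putting the pieces together in TF 2.x eager mode. Note that conv_layer is an assumed Conv2D layer, and that Keras actually stores its kernel as (height, width, depth, filter_count), so filter_count is moved to the front first:

import tensorflow as tf

kernel = conv_layer.weights[0]                 # assumed Conv2D layer; kernel is (h, w, depth, filter_count)
filters = tf.transpose(kernel, (3, 0, 1, 2))   # -> (filter_count, h, w, depth)

diffs = tf.sqrt(tf.reduce_sum(
    tf.math.squared_difference(tf.expand_dims(filters, 0),
                               tf.expand_dims(filters, 1)),
    axis=(-1, -2, -3)))                        # (filter_count, filter_count)

# The diagonal holds the all-zero self-comparisons; push it out of the way
# before taking the smallest off-diagonal difference.
n = tf.shape(diffs)[0]
masked = diffs + tf.eye(n) * (tf.reduce_max(diffs) + 1.0)
flat = tf.argmin(tf.reshape(masked, [-1]))
i, j = flat // tf.cast(n, tf.int64), flat % tf.cast(n, tf.int64)
print(int(i), int(j))                          # indices of the two most similar filters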

Related

Splitting up tensor

Let T be a tensor of shape [n,f], which represents a batch. Now I want to slice T into m tensors along axis=0. The value of m depends on the current batch. I have another tensor I of shape [m,2] which stores pairs of indices which indicate where the slices should occur.
I am not really sure how to "iterate" over the indices to apply tf.slice. Any ideas?
Can this somehow be achieved using tf.scan?
I suppose you are looking for the tf.split function.
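A minimal sketch of that idea, assuming the (start, end) pairs in I tile [0, n) without gaps, so the slice sizes are just end - start:

import tensorflow as tf

T = tf.reshape(tf.range(12, dtype=tf.float32), (6, 2))  # n=6, f=2
I = tf.constant([[0, 2], [2, 3], [3, 6]])               # m=3 (start, end) pairs

sizes = I[:, 1] - I[:, 0]           # [2, 1, 3]
parts = tf.split(T, sizes, axis=0)  # list of m tensors along axis 0
for p in parts:
    print(p.shape)                  # (2, 2), (1, 2), (3, 2)

For arbitrary, possibly overlapping pairs, a plain comprehension over the index rows ([T[s:e] for s, e in I.numpy()] in eager mode) does the job, since the slices can then have different lengths.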

In tensorflow, how to gather with indices matching params first dimensions?

If i have indices of shape (D_0,...,D_k) and params of shape (D_0,...,D_k,I,F) (with 0 ≤ indices[i_0,...,i_k] < I), what is the fastest/most elegant way to get the array output of shape (D_0,...,D_k,F) with
output[i_0,...,i_k,f]=params[i_0,...,i_k,indices[i_0,...,i_k],f]
If k=0, then we can use gather. So, in the past, I had a solution based on flattening. Is there a nicer solution now that tensorflow has matured?
Most of the time, when I want this type of gathering, indices is obtained by indices = tf.argmax(params[:,...,:,:,0]). For every (i_0,...,i_k), I have I vectors of size (F,) and I want to keep only those with the maximal value for one of the features. A solution which would only work for this special case (a kind of reduce_max using only one feature to decide how to reduce) would satisfy me.
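Since this question was asked, TensorFlow has grown a batch_dims argument on tf.gather that covers exactly this pattern. A sketch for k = 1 (the shapes here are assumptions):

import tensorflow as tf

params = tf.random.normal((2, 3, 4, 5))       # (D0, D1, I, F)
indices = tf.argmax(params[..., 0], axis=-1)  # (D0, D1), the argmax special case above

# batch_dims = k + 1 treats all leading dims as batch dims, so each scalar
# index picks one of the I vectors for its own (i_0, ..., i_k) position.
output = tf.gather(params, indices, batch_dims=2)
print(output.shape)                           # (2, 3, 5) = (D0, D1, F)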

tf.nn.embedding_lookup - row or column?

This is a very simple question. I'm learning TensorFlow and converting my numpy code to TensorFlow.
I have a word embedding matrix defined as U = [embedding_size, vocab_size], so each column is the embedding vector of a word.
I converted U into TF like below:
U = tf.Variable(tf.truncated_normal([embedding_size, vocab_size], -0.1, 0.1))
So far, so good.
Now I need to look up each word's embedding for training. I assume it would be
tf.nn.embedding_lookup(U, word_index)
My question: because my embeddings are column vectors, in numpy I need to look them up like U[:,x[t]].
How does TF know whether to return the row or the column for word_index?
What's the default? Row or column?
If it's a row vector, then do I need to transpose my embedding matrix?
https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup
doesn't mention this. If anyone could point me to the right resource, I'd appreciate it.
If params is a single tensor, the tf.nn.embedding_lookup(params, ids) operation treats ids as the indices of rows in params. If params is a list of tensors or a partitioned variable, then ids still correspond to rows in those tensors, but the partition_strategy (either "div" or "mod") determines how the ids map to a particular row.
As Aaron suggests, it will probably be easiest to define your embedding U as having shape [vocab_size, embedding_size], so that you can use tf.nn.embedding_lookup() and related functions.
Alternatively, you can use the axis argument to tf.gather() to select columns from U:
embedding = tf.gather(U, word_index, axis=1)
U should be vocab_size x embedding_size, the transpose of what you have now.
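A small sketch of both options side by side (toy sizes, and the variable names are mine):

import tensorflow as tf

vocab_size, embedding_size = 10, 4
word_index = tf.constant([2, 7, 7])

# Option 1 (recommended): store embeddings row-wise, [vocab_size, embedding_size].
U_rows = tf.Variable(tf.random.truncated_normal([vocab_size, embedding_size], stddev=0.1))
emb1 = tf.nn.embedding_lookup(U_rows, word_index)  # (3, 4): one row per id

# Option 2: keep the column layout [embedding_size, vocab_size]
# and gather columns along axis=1.
U_cols = tf.transpose(U_rows)
emb2 = tf.gather(U_cols, word_index, axis=1)       # (4, 3): one column per id

# The two agree up to a transpose.
tf.debugging.assert_near(emb1, tf.transpose(emb2))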

Setting up the input on an RNN in Keras

So I had a specific question with setting up the input in Keras.
I understand that the sequence length refers to the window length of the longest sequence that you are looking to model, with the rest being padded with 0's.
However, how do I set up something that is already in a time series array?
For example, right now I have an array that is 550k x 28. So there are 550k rows, each with 28 columns (27 features and 1 target). Do I have to manually split the array into (550k - sequence_length) different arrays and feed all of those to the network?
Assuming that I want the first layer to match the number of features per row, looking back over the past 50 rows, how do I size the input layer?
Is it simply input_size = (50, 27)? And again, do I have to manually split the dataset up, or would Keras do that for me automatically?
RNN inputs are like: (NumberOfSequences, TimeSteps, ElementsPerStep)
Each sequence is a row in your input array; the number of sequences is also called the "batch size", number of examples, samples, etc.
Time steps are the number of steps in each sequence.
Elements per step is how much info you have in each step of a sequence.
I'm assuming the 27 features are inputs and correspond to ElementsPerStep, while the 1 target is the expected output, with 1 output per step.
So I'm also assuming that your output is a sequence of 550k steps as well.
Shaping the array:
Since you have only one sequence in the array, and this sequence has 550k steps, you must reshape your array like this:
(1, 550000, 28)
#1 sequence
#550000 steps per sequence
#28 data elements per step
#PS: this sequence is too long; if it creates memory problems for you, it may be a good idea to use a stateful=True RNN, but I'm explaining the non-stateful method first.
Now you must split this array for inputs and targets:
X_train = thisArray[:, :, :27] #inputs
Y_train = thisArray[:, :, 27] #targets
Shaping the keras layers:
Keras layers will ignore the batch size (number of sequences) when you define them, so you will use input_shape=(550000,27).
Since your desired result is a sequence of the same length, we will use return_sequences=True. (Otherwise, you'd get only one result per sequence.)
LSTM(numberOfCells, input_shape=(550000,27), return_sequences=True)
This will output a shape of (BatchSize, 550000, numberOfCells)
You may use a single layer with 1 cell to achieve your output, or you could stack more layers, as long as the last one has 1 cell to match the shape of your output (if you're using only recurrent layers, of course); see the sketch below.
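For concreteness, a sketch of the stacked variant (the cell counts are my assumptions):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential([
    LSTM(32, input_shape=(550000, 27), return_sequences=True),
    LSTM(1, return_sequences=True),  # 1 cell so the output is (BatchSize, 550000, 1)
])
model.compile(loss='mse', optimizer='adam')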
stateful = True:
When you have sequences so long that your memory can't handle them well, you must define the layer with stateful=True.
In that case, you will have to divide X_train into smaller-length sequences*. The system will understand that every new batch is a sequel of the previous batches.
Then you will need to define batch_input_shape=(BatchSize,ReducedTimeSteps,Elements). In this case, the batch size should not be ignored like in the other case.
* Unfortunately I have no experience with stateful=True. I'm not sure about whether you must manually divide your array (less likely, I guess), or if the system automatically divides it internally (more likely).
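For what it's worth, a minimal stateful sketch (all sizes are my assumptions). You divide the long sequence yourself into consecutive chunks; Keras then carries the LSTM state across batches until you call model.reset_states():

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential([
    LSTM(32, batch_input_shape=(1, 1000, 27), stateful=True, return_sequences=True),
    LSTM(1, stateful=True, return_sequences=True),
])
model.compile(loss='mse', optimizer='adam')

# Feed the 550k steps as 550 consecutive chunks of 1000 steps each, e.g.:
# for chunk_x, chunk_y in zip(X_chunks, Y_chunks):
#     model.train_on_batch(chunk_x, chunk_y)
# model.reset_states()  # before starting the sequence over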
The sliding window case:
In this case, what I often see is people dividing the input data like this:
From the 550k steps, get smaller arrays with 50 steps:
X = []
for i in range(550000 - 49):
    X.append(originalX[i:i+50])  # then take care of the 28th element

# For the targets, it seems you just exclude the first 49 from the original:
Y = originalY[49:]
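Keras can also build these windows for you. A hedged sketch using TimeseriesGenerator (the variable names are mine; the array here is a small stand-in for the 550k x 28 one):

import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

data = np.random.rand(1000, 28)              # stand-in for the 550k x 28 array
features, targets = data[:, :27], data[:, 27]

gen = TimeseriesGenerator(features, targets, length=50, batch_size=32)
x_batch, y_batch = gen[0]
print(x_batch.shape, y_batch.shape)          # (32, 50, 27) (32,)

Each sample is 50 consecutive rows of features, paired with the target of the row that follows them, so the dataset never has to be copied into half a million overlapping arrays by hand.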

TensorFlow: Contracting a dimension of two tensors via dot product

I have two tensors, a of rank 4 and b of rank 1. I'd like to produce aprime, of rank 3, by "contracting" the last axis of a away, by replacing it with its dot product against b. In numpy, this is as easy as np.tensordot(a, b, 1). However, I can't figure out a way to do this in Tensorflow.
How can I replace the last axis of a tensor with a value equal to that axis's dot product against another tensor (of course, of the same shape)?
UPDATE:
I see on Wikipedia that this is called the "tensor inner product" (https://en.wikipedia.org/wiki/Dot_product#Tensors), aka tensor contraction. It seems like a common operation; I'm surprised that there's no explicit support for it in Tensorflow.
I believe that this may be possible via tf.einsum; however, I have not been able to find a generalized way to do this that works for tensors of any rank (this is probably because I do not understand einsum and have been reduced to trial and error)
Aren't you just using "tensor" in the sense of a multidimensional array? Or, in some disciplines, a tensor is 3d (vector 1d, matrix 2d, etc.). I haven't used tensorflow, but I don't think it has much to do with tensors in that linear algebra sense. They talk about data flow graphs. I'm not sure where the tensor part of the name comes from.
I assume you are talking about an expression like:
In [293]: A=np.tensordot(np.ones((5,4,3,2)),np.arange(2),1)
resulting in a (5,4,3) shape array. The einsum equivalent is
In [294]: B=np.einsum('ijkl,l->ijk',np.ones((5,4,3,2)),np.arange(2))
np.einsum implements Einstein notation, as discussed here: https://en.wikipedia.org/wiki/Einstein_notation. I got this link from https://en.wikipedia.org/wiki/Tensor_contraction
You seem to be talking about straightforward numpy operations, not something special in tensorflow.
I would first add 3 dimensions of size 1 to b so that it can be broadcast along the 4th dimension of a.
b = tf.reshape(b, (1, 1, 1, -1))
Then you can multiply b and a and it will broadcast b along all of the other dimensions.
a_prime = a * b
Finally, reduce the sum along the 4th dimension to get rid of that dimension and replace it with the dot product.
a_prime = tf.reduce_sum(a_prime, [3])
This seems like it would work (for the first tensor being of any rank):
tf.einsum('...i,i->...', x, y)
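A quick check that the two answers agree, on a rank-4 example (the shapes are arbitrary). Note that tf.tensordot also exists and matches numpy's np.tensordot here:

import tensorflow as tf

a = tf.random.normal((5, 4, 3, 2))
b = tf.random.normal((2,))

# Broadcast-and-reduce, as above:
a_prime_1 = tf.reduce_sum(a * tf.reshape(b, (1, 1, 1, -1)), axis=3)

# Rank-agnostic einsum, contracting the last axis of a with b:
a_prime_2 = tf.einsum('...i,i->...', a, b)

# All three give shape (5, 4, 3) and the same values.
tf.debugging.assert_near(a_prime_1, a_prime_2)
tf.debugging.assert_near(a_prime_2, tf.tensordot(a, b, 1))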