Input dimension reshape in TensorFlow convolutional network - tensorflow

In the expert MNIST tutorial on the TensorFlow website, there is something like this:
x_image = tf.reshape(x, [-1,28,28,1])
I know that the reshape works like
tf.reshape(input, [batch_size, width, height, channels])
Q1: Why is batch_size set to -1? What does the -1 mean?
And further down in the code there's one more thing I cannot understand:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Q2: What does the image_size * 64 mean?

Q1: Why is batch_size set to -1? What does the -1 mean?
-1 means "figure this part out for me". For example, if I run:
tf.reshape([1, 2, 3, 4, 5, 6, 7, 8], [-1, 2])
It creates two columns, and whatever number of rows it needs to get everything to fit:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
Q2: What does the image_size * 64 mean?
The 64 is the number of filters in that particular convolutional layer. Shapes of filters in conv layers follow the format [height, width, # of input channels (i.e. the number of filters in the previous layer), # of filters].
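For instance, the tutorial's second convolutional layer follows exactly that format; a short sketch using the weight_variable/bias_variable helpers the tutorial defines (names as in the tutorial):
# 5x5 filters, 32 input channels (the previous layer's filters), 64 output filters
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])  # one bias per filter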

When you pass -1 as a dimension in tf.reshape, that dimension is inferred from the total number of elements. From the docs:
If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.
The reference to 7 * 7 * 64 is because the convolution and max-pooling layers applied before this point have reduced the image to a shape of [7, 7, 64]: two rounds of 2x2 max pooling shrink the 28x28 image to 7x7, and the second convolutional layer has 64 filters. The input to the next fully connected layer needs to be one-dimensional, so in the next line of the example the tensor is reshaped from [7, 7, 64] to [7*7*64] so it can connect to the FC layer.
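In code, that flattening step looks roughly like this (following the tutorial's naming; h_pool2 is assumed to be the [batch, 7, 7, 64] output of the second pooling layer):
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
# flatten [batch, 7, 7, 64] -> [batch, 7*7*64] so it can feed the dense layer
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)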
For more info on how convolutions and max pooling work, the Wikipedia page on convolutional neural networks has some helpful graphics, e.g. of the overall network architecture and of pooling.

Related

Mapping timeseries sequence input shape to desired output shape using EinsumDense

Can anyone help me understand how to handle compressing/expanding the dimension of a tensor using EinsumDense?
I have a timeseries (not NLP) input tensor of the shape (batch, horizon, features) wherein the intended output is (1, H, F); H is an arbitrary horizon and F is an arbitrary feature size. I'm actually using EinsumDense as my Feed Forward Network in a transformer encoder module and as a final dense layer in the transformer's output. The FFN should map (1, horizon, features) to (1, H, features) and the final dense layer should map (1, H, features) to (1, H, F).
My current equation is shf,h->shf for the FFN, and shf,hfyz->syz for the dense layer; however, I'm getting a less-than-optimal result compared to my original setup, where there was no change in the horizon length and my equations were shf,h->shf and shf,hz->shz respectively.
My two cents,
First, an intuitive understanding of the transformer encoder: Given (batch, horizon, features), the attention mechanism tries to find a weighted linear combination of the projected features. The resulting weights are learned via attention scores obtained by operating between features, over each horizon. The FFN layer that comes next should be a linear combination of values within features.
Coming to EinsumDense, by way of example, we have two tensors:
a: Data (your input tensor to EinsumDense)
b: Weights (EinsumDense's internal weights tensor)
# create random data in a 3D tensor
a = tf.random.uniform(minval=1, maxval=3, shape=(1,2,3), dtype=tf.int32)
# [[[1, 2, 2],
#   [2, 2, 1]]]
shf,h->shf:
This just scales the feature vector of each horizon step.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,), dtype=tf.int32)
# [3, 2]
tf.einsum('shf,h->shf', a, b)
# [[[3, 6, 6],   # 1st row (h=0) is scaled by 3
#   [4, 4, 2]]]  # 2nd row (h=1) is scaled by 2
shf,hz->shz: This does a linear combination within features
b = tf.random.uniform(minval=2, maxval=4, shape=(2,6), dtype=tf.int32)
# [[3, 3, 3, 3, 3, 3],
#  [2, 2, 2, 3, 2, 3]]
tf.einsum('shf,hz->shz', a, b)
# [[[15, 15, 15, 15, 15, 15],
#   [10, 10, 10, 15, 10, 15]]]
# every value in the first output row is a linear combination of the first row's
# features [1, 2, 2] with the corresponding weights in b; the first value is sum([1, 2, 2] * 3) = 15
The above two resemble the transformer encoder architecture with a feature scaling layer, and the output structure (batch, H, F) is preserved.
shf,hfyz->syz: This does both a between-features and a within-features combination.
b = tf.random.uniform(minval=2, maxval=4, shape=(2,3,4,5), dtype=tf.int32)
tf.einsum('shf,hfyz->syz', a,b)
# each element output `(i,j)` is a dot product of a and b[:,:,i,j]
# first element is tf.reduce_sum(a*b[:,:,0,0])
Here, in the output (s, y, z), y doesn't correspond to the horizon and z doesn't correspond to the features; they are combinations of values across both.
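To tie this back to the layer itself, the same equation can be passed to EinsumDense directly; a minimal sketch (H=4 and F=5 are arbitrary illustrative sizes):
import tensorflow as tf

# the kernel takes the shape 'hfyz' = (2, 3, 4, 5) once the layer sees (1, 2, 3) input
ffn = tf.keras.layers.EinsumDense('shf,hfyz->syz', output_shape=(4, 5))
x = tf.random.uniform(shape=(1, 2, 3))  # (batch, horizon, features)
print(ffn(x).shape)                     # (1, 4, 5)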

Average pooling tensorflow layer with differently shaped input tensors

I have extracted the embeddings for a particular entity X from every sentence in my dataset. Where X is mentioned more than once within the same sentence, this yields an embedding for each mention: I'd like to put these through an average pooling layer to arrive at a single embedding for X in each sentence.
Simplified working example:
import tensorflow as tf
embeddings = tf.constant([[1, 1, 1],
                          [2, 2, 2],
                          [4, 4, 4],
                          [5, 5, 5]])
# Let's imagine rows [1, 1, 1] & [4, 4, 4]
# correspond to embeddings for X from the same sentence
# We can indicate sentence belonging through an sent_idxs variable:
sent_idxs = tf.constant([0, 1, 0, 2])
With the help of related stackoverflow questions (Torch - How to calculate average of tensors with the same indexes, Summing over specific indices PyTorch (similar to scatter_add)), I could average embeddings corresponding to the same sentence like this:
unique_idxs, _, counts = tf.unique_with_counts(sent_idxs) # counts = ([2, 1, 1])
result_holder = tf.zeros([unique_idxs.shape[0], embeddings.shape[1]], dtype= embeddings.dtype)
embeddings = tf.tensor_scatter_nd_add(result_holder, tf.expand_dims(sent_idxs, axis=1), embeddings)
embeddings /= counts[:, None]
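For reference, on this toy data the snippet above should yield one averaged embedding per sentence (expected values worked out by hand, not from the original post):
print(embeddings)
# [[2.5, 2.5, 2.5],   # mean of [1, 1, 1] and [4, 4, 4] (sentence 0)
#  [2. , 2. , 2. ],   # sentence 1
#  [5. , 5. , 5. ]]   # sentence 2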
However, I would prefer to re-shape my original embeddings to instead perform the averaging with AveragePooling2D or AveragePooling1D, and I'm really struggling with imagining the appropriate shape to enable this.

Separating custom keras metric inputs into two separate metrics and finding median error

I have a ResNet that I am using as a camera pose network. I have replaced the final classifier layer with a 1024-unit dense layer and then a 7-unit dense layer (first 3 for xyz, final 4 for the quaternion).
My problem is that I want to record the xyz error and the quaternion error as two separate errors or metrics (instead of just the mean absolute error over all 7). The inputs to the custom metric template custom_error(y_true, y_pred) are tensors. I don't know how to separate the inputs into two different xyz and q arrays. The function runs at compile time, when the tensors are empty and don't have any numpy components.
Ultimately I want to get the median xyz and q error using
median = tensorflow_probability.stats.percentile(input,q=50, interpolation='linear').
Any help would be really appreciated.
You could use tf.slice() to extract just the first three elements of your model output.
import tensorflow as tf
# enabling eager mode to demo the slice fn
tf.compat.v1.enable_eager_execution()
import numpy as np
# create an example array of size (2, 7),
# where 2 is just an arbitrary value chosen for the batch dimension
out = np.arange(0, 14).reshape(2, 7)
print(out)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13]])
# put it in a tf variable
out_tf = tf.Variable(out)
# now using the slice operator
xyz = tf.slice(out_tf, begin=[0, 0], size=[-1,3])
# lets see what it looked like
print(xyz)
# <tf.Tensor: id=11, shape=(2, 3), dtype=int64, numpy=
# array([[0, 1, 2],
#        [7, 8, 9]])>
You could wrap something like this into your custom metric to get what you need:
def xyz_median(y_true, y_pred):
    """Get the median of just the X, Y, Z coords.
    UNTESTED though :)
    """
    # slice to get just the xyz part of the prediction
    xyz = tf.slice(y_pred, begin=[0, 0], size=[-1, 3])
    median = tfp.stats.percentile(xyz, q=50, interpolation='linear')
    return median
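A matching metric for the quaternion part would slice the last four columns instead; an equally untested sketch (q_median is just an illustrative name):
def q_median(y_true, y_pred):
    """Median over the quaternion components (columns 3:7). Untested sketch."""
    quat = tf.slice(y_pred, begin=[0, 3], size=[-1, 4])
    # for a median *error*, y_true could be sliced the same way and the
    # absolute difference taken before calling percentile
    return tfp.stats.percentile(quat, q=50, interpolation='linear')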

Converting a 3D tensor of image blocks into 2D image on TensorFlow

I have an image of size 256 x 256 divided into non-overlapping blocks of size 32 x 32 and arranged as a 3D tensor of size [64, 32, 32]. Here, 64 is the number of 32 x 32 blocks in the image. The 64 blocks are arranged in such a way that the first 8 blocks form the first row, the next 8 the second row and so on.
I want to know if there is a way to construct the full image given the image blocks on TensorFlow without using loops. There is a related function tf.batch_to_space, however it does not exactly do what is required. Please help.
def reconstruct(x):
    x = tf.reshape(x, [8, 8, 32, 32])
    x = tf.transpose(x, [0, 2, 1, 3])  # x.shape is [8, 32, 8, 32]
    x = tf.reshape(x, [256, 256])      # works because tf tensors are row-major
    return x
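A quick way to sanity-check this is to split a known 256 x 256 image into blocks with the inverse sequence of ops and verify the round trip; a small self-check sketch (assumes TF2 eager execution and the reconstruct function above):
import tensorflow as tf

image = tf.reshape(tf.range(256 * 256), [256, 256])
# split the image into the [64, 32, 32] block layout described in the question
# (an 8 x 8 grid of 32 x 32 blocks, with the first 8 blocks forming the first row)
blocks = tf.reshape(image, [8, 32, 8, 32])
blocks = tf.transpose(blocks, [0, 2, 1, 3])
blocks = tf.reshape(blocks, [64, 32, 32])
print(tf.reduce_all(tf.equal(reconstruct(blocks), image)))  # True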

tensorflow: transform a (structured) dense matrix to sparse when the number of rows is unknown

My task is to transform a specially formed dense matrix tensor into a sparse one, e.g. input matrix M as follows (a dense positive integer sequence followed by 0s as padding in each row):
[[3 5 7 0]
 [2 2 0 0]
 [1 3 9 0]]
Additionally, the non-padding length of each row is given, e.g. by the tensor L = [3, 2, 3].
The desired output would be sparse tensor S.
SparseTensorValue(indices=array([[0, 0],[0, 1],[0, 2],[1, 0],[1, 1],[2, 0],[2, 1], [2, 2]]), values=array([3, 5, 7, 2, 2, 1, 3, 9], dtype=int32), shape=array([3, 4]))
This is useful in models where objects are described by variable-sized descriptors (S is then used in embedding_lookup_sparse to combine the embeddings of the descriptors).
I am able to do it when the number of M's rows is known (with a Python loop and ops like slice and concat). However, the number of rows here is determined by the mini-batch size and can change (say, in the testing phase). Is there a good way to implement this? I have tried some control_flow_ops but haven't succeeded.
Thanks!!
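One possible loop-free approach (a sketch, not from the original thread) is to build a boolean mask from L with tf.sequence_mask and gather the indices and values through it; using tf.shape keeps the row count dynamic:
import tensorflow as tf

def dense_to_sparse(M, L):
    # mask[i, j] is True for the first L[i] entries of row i
    mask = tf.sequence_mask(L, maxlen=tf.shape(M)[1])
    indices = tf.where(mask)           # [[0, 0], [0, 1], [0, 2], [1, 0], ...]
    values = tf.gather_nd(M, indices)  # [3, 5, 7, 2, 2, 1, 3, 9]
    return tf.SparseTensor(indices, values, tf.shape(M, out_type=tf.int64))

M = tf.constant([[3, 5, 7, 0], [2, 2, 0, 0], [1, 3, 9, 0]])
L = tf.constant([3, 2, 3])
S = dense_to_sparse(M, L)  # shape [3, 4], row count taken from M at run time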