tensorflow dataset shuffle then batch or batch then shuffle

I recently began learning TensorFlow. I am unsure whether there is a difference between
x = np.array([[1], [2], [3], [4], [5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.shuffle(buffer_size=4)
dataset = dataset.batch(4)
and
x = np.array([[1], [2], [3], [4], [5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.batch(4)
dataset = dataset.shuffle(buffer_size=4)
Also, I am not sure why I cannot use
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
as it gives the error
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
AttributeError: 'TensorSliceDataset' object has no attribute 'shuffle_batch'
Thank you!

TL;DR: Yes, there is a difference. Almost always, you will want to call Dataset.shuffle() before Dataset.batch(). There is no shuffle_batch() method on the tf.data.Dataset class, and you must call the two methods separately to shuffle and batch a dataset.
The transformations of a tf.data.Dataset are applied in the same sequence that they are called. Dataset.batch() combines consecutive elements of its input into a single, batched element in the output.
We can see the effect of the order of operations by considering the following two datasets:
tf.enable_eager_execution() # To simplify the example code.
# Batch before shuffle.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.batch(3)
dataset = dataset.shuffle(9)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([1 1 1], shape=(3,), dtype=int32)
# tf.Tensor([2 2 2], shape=(3,), dtype=int32)
# tf.Tensor([0 0 0], shape=(3,), dtype=int32)
# Shuffle before batch.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.shuffle(9)
dataset = dataset.batch(3)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([2 0 2], shape=(3,), dtype=int32)
# tf.Tensor([2 1 0], shape=(3,), dtype=int32)
# tf.Tensor([0 1 1], shape=(3,), dtype=int32)
In the first version (batch before shuffle), the elements of each batch are 3 consecutive elements from the input; whereas in the second version (shuffle before batch), they are randomly sampled from the input. Typically, when training by (some variant of) mini-batch stochastic gradient descent, the elements of each batch should be sampled as uniformly as possible from the total input. Otherwise, it is possible that the network will overfit to whatever structure was in the input data, and the resulting network will not achieve as high an accuracy.
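Putting this together, a typical training input pipeline shuffles first and batches second (a minimal sketch; the buffer and batch sizes are illustrative, not from the question):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(tf.range(100))
dataset = dataset.shuffle(buffer_size=100)  # shuffle individual elements
dataset = dataset.batch(32)                 # then group them into batches
dataset = dataset.repeat()                  # shuffle() reshuffles each epoch by default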

Fully agree with @mrry, but there is one case where you might want to batch before shuffling. Suppose you are processing text data that will be fed into an RNN. Each sentence is treated as one sequence, and one batch contains multiple sequences. Since sentence lengths vary, the sentences in a batch must be padded to a uniform length. An efficient way to do this is to group sentences of similar length together through batching, and then shuffle the batches; otherwise, we may end up with batches that are mostly <pad> tokens.
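As a rough illustration of that idea, here is a minimal sketch (the toy sentences and sizes are made up; padded_batch pads each batch only to its own longest sequence):

import tensorflow as tf

# Toy variable-length "sentences" (hypothetical data).
sentences = [[1, 2], [3, 4, 5, 6], [7], [8, 9, 10]]
dataset = tf.data.Dataset.from_generator(
    lambda: iter(sentences), output_types=tf.int32, output_shapes=[None])

# Pad each batch only to the longest sentence in that batch,
# then shuffle at the batch level so padding stays minimal.
dataset = dataset.padded_batch(2, padded_shapes=[None])
dataset = dataset.shuffle(buffer_size=10)
# To actually group similar lengths before batching, see
# tf.data.experimental.bucket_by_sequence_length.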

Related

Average pooling tensorflow layer with differently shaped input tensors

I have extracted the embeddings for a particular entity X from every sentence in my dataset. Where X is mentioned more than once within the same sentence, this yields an embedding for each mention: I'd like to put these through an average pooling layer to arrive at a single embedding for X in each sentence.
Simplified working example:
import tensorflow as tf
embeddings = tf.constant([[1, 1, 1],
                          [2, 2, 2],
                          [4, 4, 4],
                          [5, 5, 5]])
# Let's imagine rows [1, 1, 1] & [4, 4, 4]
# correspond to embeddings for X from the same sentence
# We can indicate sentence membership through a sent_idxs variable:
sent_idxs = tf.constant([0, 1, 0, 2])
With the help of related Stack Overflow questions (Torch - How to calculate average of tensors with the same indexes, Summing over specific indices PyTorch (similar to scatter_add)), I could average embeddings corresponding to the same sentence like this:
unique_idxs, _, counts = tf.unique_with_counts(sent_idxs)  # counts = [2, 1, 1]
result_holder = tf.zeros([unique_idxs.shape[0], embeddings.shape[1]], dtype=embeddings.dtype)
embeddings = tf.tensor_scatter_nd_add(result_holder, tf.expand_dims(sent_idxs, axis=1), embeddings)
embeddings /= counts[:, None]
However, I would prefer to reshape my original embeddings to instead perform the averaging with AveragePooling2D or AveragePooling1D, and I'm really struggling to imagine the appropriate shape to enable this.
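For reference, the same per-sentence averaging can also be written with tf.math.unsorted_segment_mean, which folds the scatter-add and the division into one call (a minimal sketch using float versions of the example tensors above):

import tensorflow as tf

embeddings = tf.constant([[1., 1., 1.],
                          [2., 2., 2.],
                          [4., 4., 4.],
                          [5., 5., 5.]])
sent_idxs = tf.constant([0, 1, 0, 2])

# Average all rows that share a sentence index.
num_sentences = tf.reduce_max(sent_idxs) + 1
averaged = tf.math.unsorted_segment_mean(embeddings, sent_idxs, num_sentences)
# [[2.5, 2.5, 2.5], [2., 2., 2.], [5., 5., 5.]]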

Separating custom keras metric inputs into two separate metrics and finding median error

I have a ResNet that I am using as a camera pose network. I have replaced the final classifier layer with a 1024-unit dense layer followed by a 7-unit dense layer (first 3 outputs for xyz, final 4 for the quaternion).
My problem is that I want to record the xyz error and the quaternion error as two separate metrics (instead of just the mean absolute error over all 7 outputs). The inputs to the custom metric template custom_error(y_true, y_pred) are tensors. I don't know how to separate the inputs into two different xyz and q arrays. The function runs at compile time, when the tensors are empty and don't have any numpy components.
Ultimately I want to get the median xyz and q error using
median = tensorflow_probability.stats.percentile(input, q=50, interpolation='linear').
Any help would be really appreciated.
You could use tf.slice() to extract just the first three elements of your model output.
import tensorflow as tf
# enabling eager mode to demo the slice fn
tf.compat.v1.enable_eager_execution()
import numpy as np
# just creating an array of shape (2, 7),
# where 2 is an arbitrary value chosen for the batch dimension
out = np.arange(0, 14).reshape(2, 7)
print(out)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13]])
# put it in a tf variable
out_tf = tf.Variable(out)
# now using the slice operator
xyz = tf.slice(out_tf, begin=[0, 0], size=[-1, 3])
# let's see what it looks like
print(xyz)
# <tf.Tensor: id=11, shape=(2, 3), dtype=int64, numpy=
# array([[0, 1, 2],
#        [7, 8, 9]])>
You could wrap something like this in your custom metric to get what you need:
import tensorflow_probability as tfp

def xyz_median(y_true, y_pred):
    """Get the median of just the X, Y, Z coords.
    UNTESTED though :)
    """
    # slice to get just the xyz from the predictions
    xyz = tf.slice(y_pred, begin=[0, 0], size=[-1, 3])
    median = tfp.stats.percentile(xyz, q=50, interpolation='linear')
    return median
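A matching quaternion metric could slice the remaining four columns the same way (a sketch under the same assumptions; model is a hypothetical compiled Keras model):

import tensorflow as tf
import tensorflow_probability as tfp

def q_median(y_true, y_pred):
    """Median of just the quaternion components (untested sketch)."""
    q = tf.slice(y_pred, begin=[0, 3], size=[-1, 4])
    return tfp.stats.percentile(q, q=50, interpolation='linear')

# Hypothetical usage:
# model.compile(optimizer='adam', loss='mae', metrics=[xyz_median, q_median])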

what is the use of reduce command in tensorflow?

tensorflow.reduce_sum(..) computes the sum of elements across dimensions of a tensor. That part is clear.
But one thing is not clear to me: what is the purpose of the word reduce in the function name?
Is it related to the map-reduce of parallel computation?
Say, does it distribute the required computation to different cores, collect the results from the cores, and finally deliver the sum of the collected results?
Because you compute the sum along a given dimension (and therefore reduce, i.e. remove, that dimension). And no, it has nothing to do with map-reduce.
Quoting the documentation string of the method:
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
Example from the API:
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.reduce_sum(x) # 6
tf.reduce_sum(x, 0) # [2, 2, 2]
tf.reduce_sum(x, 1) # [3, 3]
tf.reduce_sum(x, 1, keepdims=True) # [[3], [3]]
tf.reduce_sum(x, [0, 1]) # 6
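To make the "reduce" in the name concrete: each axis you reduce over disappears from the result's shape unless keepdims=True (a small shape-focused sketch):

import tensorflow as tf

x = tf.ones([2, 3, 4])
print(tf.reduce_sum(x, axis=1).shape)                 # (2, 4): axis 1 removed
print(tf.reduce_sum(x, axis=[0, 2]).shape)            # (3,): axes 0 and 2 removed
print(tf.reduce_sum(x, axis=1, keepdims=True).shape)  # (2, 1, 4): kept with length 1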

Slicing a tensor by an index tensor in Tensorflow

I have the two following tensors (note that they are both TensorFlow tensors, which means they are still virtually symbolic at the time I construct the following slicing op, before I launch a tf.Session()):
params: has shape (64, 784, 256)
indices: has shape (64, 784)
and I want to construct an op that returns the following tensor:
output: has shape (64, 784), where
output[i, j] = params[i, j, indices[i, j]]
What is the most efficient way in Tensorflow to do so?
PS: I tried tf.gather but couldn't make it perform the operation I described above.
Many thanks.
-Bests
You can get exactly what you want using tf.gather_nd. The final expression is:
tf.gather_nd(params, tf.stack([
    tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1), [1, tf.shape(indices)[1]]),
    tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1), [1, tf.shape(indices)[0]])),
    indices], 2))
This expression has the following explanation:
tf.gather_nd does what you expected and uses the indices to gather the output from the params
tf.stack combines three separate tensors, the last of which is the indices. The first two tensors specify the ordering of the first two dimensions (axis 0 and axis 1 of params/indices)
For the example provided, this ordering is simply 0, 1, 2, ..., 63 for axis 0, and 0, 1, 2, ... 783 for axis 1. These sequences are obtained with tf.range(tf.shape(indices)[0]) and tf.range(tf.shape(indices)[1]), respectively.
For the example provided, indices has shape (64, 784). The other two tensors from the last point above need to have this same shape in order to be combined with tf.stack.
First, an additional dimension/axis is added to each of the two sequences using tf.expand_dims.
The use of tf.tile and tf.transpose can be shown by example: assume the first two axes of params and indices have shape (5, 3). We want the first tensor to be:
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
We want the second tensor to be:
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
These two tensors almost function like specifying the coordinates in a grid for the associated indices.
The final part of tf.stack combines the three tensors on a new third axis, so that the result has the same 3 axes as params.
Keep in mind that if you have more or fewer axes than in the question, you need to modify the number of coordinate-specifying tensors in tf.stack accordingly. A small sanity check of the full expression follows.
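Here is that sanity check, shrunk to shapes (2, 3, 4) instead of (64, 784, 256) so the output can be verified by hand (a sketch; the input values are made up):

import tensorflow as tf

params = tf.reshape(tf.range(2 * 3 * 4), [2, 3, 4])
indices = tf.constant([[0, 1, 2], [3, 0, 1]])

rows = tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1),
               [1, tf.shape(indices)[1]])
cols = tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1),
                            [1, tf.shape(indices)[0]]))
output = tf.gather_nd(params, tf.stack([rows, cols, indices], 2))
print(output)  # [[ 0  5 10] [15 16 21]]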
What you want is like a custom reduction function. If indices holds something like the index of the maximum value along the last axis, then I would suggest using tf.reduce_max:
max_params = tf.reduce_max(params_tensor, axis=2)
Otherwise, here is one way to get what you want (Tensor objects are not assignable, so we create a 2d list of tensors and stack it using tf.stack):
import tensorflow as tf
import numpy as np

with tf.Graph().as_default():
    params_tensor = tf.constant(np.random.randint(1, 256, [5, 5, 10]).astype(np.int32))
    indices = tf.constant(np.random.randint(1, 10, [5, 5]).astype(np.int32))
    num_rows, num_cols = params_tensor.shape.as_list()[:2]
    # Build a 2d list of scalar tensors, then stack it into one tensor.
    output = [[None for j in range(num_cols)] for i in range(num_rows)]
    for i in range(num_rows):
        for j in range(num_cols):
            output[i][j] = params_tensor[i, j, indices[i, j]]
    output = tf.stack(output)
    with tf.Session() as sess:
        params_tensor, indices, output = sess.run([params_tensor, indices, output])
        print(params_tensor)
        print(indices)
        print(output)
I know I'm late, but I recently had to do something similar and was able to do it using Ragged Tensors:
output = tf.gather(params, tf.RaggedTensor.from_tensor(indices), batch_dims=-1, axis=-1)
Hope it helps
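For reference, newer TensorFlow versions can also express this directly through the batch_dims argument of tf.gather_nd (a minimal sketch with random placeholder data):

import tensorflow as tf

params = tf.random.uniform([64, 784, 256])
indices = tf.random.uniform([64, 784], maxval=256, dtype=tf.int32)

# Treat the first two axes as batch dimensions and index the last axis.
output = tf.gather_nd(params, indices[..., tf.newaxis], batch_dims=2)
print(output.shape)  # (64, 784)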

input dimension reshape in TensorFlow convolutional network

In the expert MNIST tutorial on the TensorFlow website, there is something like this:
x_image = tf.reshape(x, [-1,28,28,1])
I know that the reshape is like
tf.reshape(input, [batch_size, width, height, channel])
Q1: Why does batch_size equal -1? What does the -1 mean?
And when I go further down the code there's one more thing I cannot understand:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Q2: What does the image_size * 64 mean?
Q1: Why does batch_size equal -1? What does the -1 mean?
-1 means "figure this part out for me". For example, if I run:
reshape([1, 2, 3, 4, 5, 6, 7, 8], [-1, 2])
It creates two columns, and whatever number of rows it needs to get everything to fit:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
Q2: What does the image_size * 64 mean?
The 64 is the number of filters in that particular conv layer. Shapes of filters in conv layers follow the format [height, width, # of input channels (i.e., the number of filters in the previous layer), # of filters].
When you pass -1 as a dimension in tf.reshape, the size of that dimension is inferred rather than fixed. From the docs:
If one component of shape is the special value -1, the size of that dimension is computed so that the total size remains constant. In particular, a shape of [-1] flattens into 1-D. At most one component of shape can be -1.
The reference to 7 x 7 x 64 is because the convolutional layer being applied prior to this example has reduced the image to a shape of [7, 7, 64], and the input to the next fully connected layer needs to be a single dimension, so in the next line of the example, the tensor is reshaped from [7,7,64] to [7*7*64] so it can connect to the FC layer.
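A minimal sketch of that flattening step (h_pool2 stands in for the tutorial's pooling output; 32 is a hypothetical batch size):

import tensorflow as tf

# Suppose the pooling output has shape [batch, 7, 7, 64].
h_pool2 = tf.zeros([32, 7, 7, 64])

# Flatten each example so it can be fed to the fully connected layer.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
print(h_pool2_flat.shape)  # (32, 3136)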
For more info on how convolutions and max pooling work, the Wikipedia page on convolutional neural networks has some helpful graphics illustrating the network architecture and pooling.