I'm trying to use tf.losses.sigmoid_cross_entropy on an unbalanced dataset, but I'm a little confused about the weights parameter. Here is what the documentation says:
weights: Optional Tensor whose rank is either 0, or the same rank as
labels, and must be broadcastable to labels (i.e., all dimensions must
be either 1, or the same as the corresponding losses dimension).
I know in tf.losses.softmax_cross_entropy the parameter weights can be a rank 1 tensor with weight for each sample. Why must the weights in tf.losses.sigmoid_cross_entropy have the same rank as labels?
Can anybody answer me? Better with an example.
You want your loss to be weighted, so TensorFlow expects you to provide a weight for each of your labels. Consider the following example:
Labels: [0, 0, 0, 1, 0]
possible_weights1: [1]
possible_weights2: [1, 2, 1, 1, 1]
illegal_weights1: [1, 2]
illegal_weights2: [[1], [2]]
Here your labels have rank 1 (only one dimension), so TensorFlow expects either a weight for each element of the labels (as in possible_weights2) or a single weight of length 1 (as in possible_weights1, which is broadcast to [1, 1, 1, 1, 1]).
If you pass illegal_weights2, however, TensorFlow cannot tell how to map the two dimensions of the weights onto labels that have only one dimension. That is why the rank of the weights must match the rank of the labels (or be 0, a scalar).
illegal_weights1 is a case where the rank matches but the length is neither that of the labels nor 1 (which could be broadcast). A length of 2 cannot be broadcast to 5, so it is illegal.
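The rule quoted from the documentation can be written down as a small shape check. This is only a plain-Python sketch of the documented rule, not TensorFlow's own validation code; the helper name weights_shape_is_valid is hypothetical.

```python
def weights_shape_is_valid(weights_shape, labels_shape):
    """Return True if weights of this shape satisfy the documented rule."""
    # A scalar (rank 0) weight is always accepted.
    if len(weights_shape) == 0:
        return True
    # Otherwise the rank must match the rank of the labels ...
    if len(weights_shape) != len(labels_shape):
        return False
    # ... and every dimension must be 1 or equal to the matching labels dimension.
    return all(w == 1 or w == l for w, l in zip(weights_shape, labels_shape))


labels_shape = (5,)  # Labels: [0, 0, 0, 1, 0]
print(weights_shape_is_valid((1,), labels_shape))    # possible_weights1
print(weights_shape_is_valid((5,), labels_shape))    # possible_weights2
print(weights_shape_is_valid((2,), labels_shape))    # illegal_weights1
print(weights_shape_is_valid((2, 1), labels_shape))  # illegal_weights2
```

The first two calls report a valid shape and the last two an invalid one, matching the four cases in the example.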
I want to implement a neural network in Keras of this architecture: say if I have some inputs and they belong to some groups. Then the neural network is like this:
input -> some layers -> separate inputs by groups -> average inputs by groups -> output
In brief, I want to separate inputs by groups then take the average of inputs by groups.
For example, say I have an input tensor [1, 2, 3, 4, 5, 6] whose elements belong to two groups [0, 1, 1, 0, 0, 1]. Then I want the output tensor to be [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]. Here 3.333 is the average of group 0 [1, 4, 5] and 3.666 is the average of group 1 [2, 3, 6].
I am not sure if you can separate the inputs as you described above directly in Keras or Tensorflow. Here is what I could come up with:
Create a mask for each class, with 1 where the element at that index belongs to the class and 0 where it belongs to another class. So in your example, you would use [0,1,1,0,0,1] for one class and [1,0,0,1,1,0] for the other. (If you have more classes, you will have correspondingly more masks.)
Stack those vectors to get a 3-D tensor and do a 1D convolution with tf.nn.conv1d(). Think of the masks as the filters of a convolution operation that separates the classes. Be sure to reshape your tensors to match the operation's requirements.
After the convolution, you will have a 3-D tensor where each vector contains one class's elements. For your example you should get a tensor with two vectors, [0,2,3,0,0,6] and [1,0,0,4,5,0]. To get the average of each class, sum each vector with tf.reduce_sum() on the correct axis and divide by the number of elements in that class (the sum of its mask); a plain tf.reduce_mean() would also count the zeros and divide by 6.
Multiply the tensor of means, [[3.333], [3.666]], with the masks using tf.multiply() and add the vectors using tf.reduce_sum() on the correct axis. This should result in the vector you want.
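The arithmetic behind these steps can be checked with a plain-Python sketch. It uses element-wise products with the masks rather than a convolution, and it assumes every group has at least one member (otherwise the division fails). The variable names are illustrative only.

```python
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = [0, 1, 1, 0, 0, 1]
num_groups = 2

# One 0/1 mask per group: masks[k][i] is 1.0 when element i belongs to group k.
masks = [[1.0 if g == k else 0.0 for g in groups] for k in range(num_groups)]

# Per-group mean = (sum of the masked inputs) / (number of members in the group).
means = [
    sum(m * x for m, x in zip(mask, inputs)) / sum(mask)
    for mask in masks
]

# Scatter each group's mean back to the positions of its members.
output = [
    sum(means[k] * masks[k][i] for k in range(num_groups))
    for i in range(len(inputs))
]
print(output)
```

Group 0 averages to 10/3 and group 1 to 11/3, so every position receives the mean of the group it belongs to.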
I have figured out a method. It can be achieved by matrix manipulation. First turn the cluster vector into a categorical (one-hot) matrix. For example, if the batch size is 6, the categorical matrix (cluster) looks like:
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
Then we generate a cluster_mean matrix by dividing each column by the number of samples in that cluster:
1/3, 0
1/3, 0
0, 1/3
0, 1/3
1/3, 0
0, 1/3
If we have an input matrix of shape b*n (b is the batch size and n is the number of features, so each row is one sample), then we can get the average by cluster using
cluster * t(cluster_mean) * input
Transpose, average and dot product can be achieved using TensorFlow functions.
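The matrix product can be checked by hand with a small plain-Python sketch. It assumes the input has one row per sample (shape b x n, here with n = 1) and that no cluster is empty; matmul and transpose are hypothetical helpers standing in for the corresponding TensorFlow operations.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


groups = [0, 0, 1, 1, 0, 1]  # matches the categorical matrix shown above
num_groups = 2

# cluster: one-hot matrix of shape (b, num_groups).
cluster = [[1.0 if g == k else 0.0 for k in range(num_groups)] for g in groups]
# cluster_mean: each column of cluster divided by the size of that cluster.
counts = [sum(row[k] for row in cluster) for k in range(num_groups)]
cluster_mean = [[v / counts[k] for k, v in enumerate(row)] for row in cluster]

inputs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]  # shape (b, n) with n = 1

# (b x k) times (k x b) gives a (b x b) averaging matrix.
averaging = matmul(cluster, transpose(cluster_mean))
output = matmul(averaging, inputs)
print(output)
```

With this grouping, samples 1, 2 and 5 belong to the first cluster (mean 8/3) and samples 3, 4 and 6 to the second (mean 13/3).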
I recently began learning tensorflow.
I am unsure whether there is a difference between
x = np.array([[1],[2],[3],[4],[5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.shuffle(buffer_size=4)
dataset = dataset.batch(4)
and
x = np.array([[1],[2],[3],[4],[5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.batch(4)
dataset = dataset.shuffle(buffer_size=4)
Also, I am not sure why I cannot use
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
as it gives the error
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
AttributeError: 'TensorSliceDataset' object has no attribute 'shuffle_batch'
Thank you!
TL;DR: Yes, there is a difference. Almost always, you will want to call Dataset.shuffle() before Dataset.batch(). There is no shuffle_batch() method on the tf.data.Dataset class, and you must call the two methods separately to shuffle and batch a dataset.
The transformations of a tf.data.Dataset are applied in the same sequence that they are called. Dataset.batch() combines consecutive elements of its input into a single, batched element in the output.
We can see the effect of the order of operations by considering the following two datasets:
tf.enable_eager_execution() # To simplify the example code.
# Batch before shuffle.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.batch(3)
dataset = dataset.shuffle(9)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([1 1 1], shape=(3,), dtype=int32)
# tf.Tensor([2 2 2], shape=(3,), dtype=int32)
# tf.Tensor([0 0 0], shape=(3,), dtype=int32)
# Shuffle before batch.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.shuffle(9)
dataset = dataset.batch(3)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([2 0 2], shape=(3,), dtype=int32)
# tf.Tensor([2 1 0], shape=(3,), dtype=int32)
# tf.Tensor([0 1 1], shape=(3,), dtype=int32)
In the first version (batch before shuffle), the elements of each batch are 3 consecutive elements from the input; whereas in the second version (shuffle before batch), they are randomly sampled from the input. Typically, when training by (some variant of) mini-batch stochastic gradient descent, the elements of each batch should be sampled as uniformly as possible from the total input. Otherwise, it is possible that the network will overfit to whatever structure was in the input data, and the resulting network will not achieve as high an accuracy.
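The same effect can be reproduced without TensorFlow. The sketch below uses hypothetical batch and shuffle helpers on plain lists; the shuffle here is a full shuffle, which roughly corresponds to a shuffle buffer at least as large as the dataset (9 in the example above).

```python
import random


def batch(elements, batch_size):
    """Group consecutive elements into lists of at most batch_size items."""
    return [elements[i:i + batch_size] for i in range(0, len(elements), batch_size)]


def shuffle(elements, rng):
    """Return a shuffled copy, leaving the input list untouched."""
    shuffled = list(elements)
    rng.shuffle(shuffled)
    return shuffled


data = [0, 0, 0, 1, 1, 1, 2, 2, 2]
rng = random.Random(0)  # fixed seed so repeated runs give the same order

# Batch before shuffle: only the order of the batches changes.
batch_then_shuffle = shuffle(batch(data, 3), rng)
# Shuffle before batch: the contents of each batch are randomized.
shuffle_then_batch = batch(shuffle(data, rng), 3)

print(batch_then_shuffle)
print(shuffle_then_batch)
```

In the first result every batch still holds three identical values, whatever the seed; in the second, the values are free to mix across batches.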
I fully agree with @mrry, but there is one case where you might want to batch before shuffling. Suppose you're processing text data that will be fed into an RNN. Here each sentence is treated as one sequence, and one batch contains multiple sequences. Since sentence length varies, we need to pad the sentences in a batch to a uniform length. An efficient way to do this is to group sentences of similar length together through batching, and then shuffle the batches. Otherwise, we may end up with batches that consist largely of <pad> tokens.
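A rough plain-Python sketch of that idea: sort by length, cut into batches, pad each batch only to its own longest sentence, and then shuffle the batches. The helper name padded_batches and the toy sentences are made up for illustration; a real pipeline would use the corresponding tf.data operations.

```python
import random


def padded_batches(sentences, batch_size, pad="<pad>"):
    """Batch sentences of similar length and pad each batch to its own maximum."""
    ordered = sorted(sentences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        group = ordered[i:i + batch_size]
        width = max(len(s) for s in group)
        batches.append([s + [pad] * (width - len(s)) for s in group])
    return batches


sentences = [["a"], ["b", "c", "d", "e", "f"], ["g", "h"], ["i", "j", "k", "l"]]

batches = padded_batches(sentences, batch_size=2)
random.Random(0).shuffle(batches)  # shuffle whole batches, not single sentences

pad_count = sum(tok == "<pad>" for b in batches for s in b for tok in s)
print(batches)
print(pad_count)
```

For these four sentences, grouping by length needs 2 padding tokens in total, whereas batching them in their original order would need 6.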
tensorflow.reduce_sum(..) computes the sum of elements across dimensions of a tensor. That part is clear.
But one thing is not clear to me: what is the purpose of saying reduce in the function name?
Is it related to the map-reduce of parallel computation?
That is, does it distribute the required computation to different cores, collect the results from the cores, and finally deliver the sum of the collected results?
Because the sum is computed along a given dimension, which removes that dimension and therefore reduces the rank of the tensor. And no, it has nothing to do with map-reduce.
Quoting the documentation string of the method:
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
Example from the API:
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.reduce_sum(x) # 6
tf.reduce_sum(x, 0) # [2, 2, 2]
tf.reduce_sum(x, 1) # [3, 3]
tf.reduce_sum(x, 1, keepdims=True) # [[3], [3]]
tf.reduce_sum(x, [0, 1]) # 6
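To see where the rank reduction comes from, here is the same example on nested Python lists, using functools.reduce as the folding operation. This is only an illustration of the semantics, not of how TensorFlow implements the op.

```python
from functools import reduce
import operator

x = [[1, 1, 1], [1, 1, 1]]  # shape (2, 3), rank 2

# Reduce along axis 0: fold the rows together column by column -> shape (3,).
sum_axis0 = [reduce(operator.add, column) for column in zip(*x)]

# Reduce along axis 1: fold each row into one number -> shape (2,).
sum_axis1 = [reduce(operator.add, row) for row in x]

# keepdims=True keeps the reduced axis with length 1 -> shape (2, 1).
sum_axis1_keepdims = [[s] for s in sum_axis1]

# Reducing over both axes leaves a scalar (rank 0).
total = reduce(operator.add, sum_axis1)

print(sum_axis0, sum_axis1, sum_axis1_keepdims, total)
```

Each reduction folds one axis into a single value per remaining position, so the result has one dimension fewer unless the axis is kept with length 1.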
I have the following two tensors (note that both are TensorFlow tensors, which means they are still symbolic at the time I construct the slicing op below, before I launch a tf.Session()):
params: has shape (64,784, 256)
indices: has shape (64, 784)
and I want to construct an op that returns the following tensor:
output: has shape (64,784) where
output[i,j] = params_tensor[i,j, indices[i,j] ]
What is the most efficient way in Tensorflow to do so?
ps: I tried with tf.gather but couldn't make use of it to perform the operation I described above.
Many thanks.
-Bests
You can get exactly what you want using tf.gather_nd. The final expression is:
tf.gather_nd(params, tf.stack([tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[0]), 1), [1, tf.shape(indices)[1]]), tf.transpose(tf.tile(tf.expand_dims(tf.range(tf.shape(indices)[1]), 1), [1, tf.shape(indices)[0]])), indices], 2))
This expression has the following explanation:
tf.gather_nd does what you expected and uses the indices to gather the output from the params
tf.stack combines three separate tensors, the last of which is the indices. The first two tensors specify the ordering of the first two dimensions (axis 0 and axis 1 of params/indices)
For the example provided, this ordering is simply 0, 1, 2, ..., 63 for axis 0, and 0, 1, 2, ... 783 for axis 1. These sequences are obtained with tf.range(tf.shape(indices)[0]) and tf.range(tf.shape(indices)[1]), respectively.
For the example provided, indices has shape (64, 784). The other two tensors from the last point above need to have this same shape in order to be combined with tf.stack
First, an additional dimension/axis is added to each of the two sequences using tf.expand_dims.
The use of tf.tile and tf.transpose can be shown by example: assume the first two axes of params and indices have shape (5,3). We want the first tensor to be:
[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
We want the second tensor to be:
[[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
These two tensors act almost like the coordinates of a grid for the associated indices.
The final part of tf.stack combines the three tensors on a new third axis, so that each entry of the result is a full (axis 0, axis 1, axis 2) coordinate into params.
Keep in mind that if you have more or fewer axes than in the question, you need to adjust the number of coordinate-specifying tensors in tf.stack accordingly.
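The coordinate construction can be mimicked on nested Python lists. This sketch only illustrates what the stacked index tensor contains and what the gather returns; gather_last_axis is a hypothetical helper and the small params and indices values are made up.

```python
def gather_last_axis(params, indices):
    """Return output[i][j] = params[i][j][indices[i][j]] for nested lists."""
    rows, cols = len(indices), len(indices[0])
    # Coordinate triples (i, j, indices[i][j]), mirroring the stacked index tensor.
    coords = [[(i, j, indices[i][j]) for j in range(cols)] for i in range(rows)]
    return [[params[i][j][k] for (i, j, k) in row] for row in coords]


params = [
    [[10, 11, 12], [20, 21, 22]],
    [[30, 31, 32], [40, 41, 42]],
]  # shape (2, 2, 3)
indices = [[0, 2], [1, 1]]  # shape (2, 2)

output = gather_last_axis(params, indices)
print(output)
```

Here output[0][1] is params[0][1][2], that is 22, and the result has the same shape as indices.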
What you want is like a custom reduction function. If indices holds something like the index of the maximum value, then I would suggest using tf.reduce_max:
max_params = tf.reduce_max(params_tensor, reduction_indices=[2])
Otherwise, here is one way to get what you want (Tensor objects are not assignable so we create a 2d list of tensors and pack it using tf.pack):
import tensorflow as tf
import numpy as np

with tf.Graph().as_default():
    # Note: tf.pack was renamed to tf.stack in TensorFlow 1.0.
    params_tensor = tf.pack(np.random.randint(1, 256, [5, 5, 10]).astype(np.int32))
    indices = tf.pack(np.random.randint(1, 10, [5, 5]).astype(np.int32))
    # Collect the selected elements in a 2-D list, one entry per (i, j).
    output = [[None for j in range(params_tensor.get_shape()[1])] for i in range(params_tensor.get_shape()[0])]
    for i in range(params_tensor.get_shape()[0]):
        for j in range(params_tensor.get_shape()[1]):
            output[i][j] = params_tensor[i, j, indices[i, j]]
    output = tf.pack(output)
    with tf.Session() as sess:
        params_tensor, indices, output = sess.run([params_tensor, indices, output])
        print(params_tensor)
        print(indices)
        print(output)
I know I'm late, but I recently had to do something similar, and was able to do it using ragged tensors:
output = tf.gather(params, tf.RaggedTensor.from_tensor(indices), batch_dims=-1, axis=-1)
Hope it helps
So, here is what I want to do:
Right now, I have padding = 'SAME' for all of my neural net layers. I would like to make my code more generic, so I can build my nets with arbitrary paddings, and I don't want to have to calculate how big the output tensors of the layers of my net are. I would like to just access the dimension at initialization/run time, the way the tf.nn functions apparently do internally, so I can initialize my weight and bias tensors in the correct dimension...
So,
How do I access the "shape" function/object of the output placeholder of a convolution?
There are two kinds of shapes -- tensor.get_shape() which gives static shape computed by Python wrappers during Graph construction (whenever possible), and tf.shape(tensor) which is an op that can be executed during runtime to get shape of the tensor (always possible). Both of these work for convolutions.
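import tensorflow as tf

a = tf.Variable(tf.ones((1, 3, 3, 1)))  # input: batch, height, width, channels
b = tf.Variable(tf.ones((3, 3, 1, 1)))  # filter: height, width, in, out
c = tf.nn.conv2d(a, b, [1, 1, 1, 1], padding="VALID")
sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(c.get_shape())          # static shape, known at graph construction
print(sess.run(tf.shape(c)))  # dynamic shape, evaluated at run time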
This gives
(1, 1, 1, 1)
[1 1 1 1]
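For intuition about why the spatial dimensions come out as 1 here, the output size along one spatial dimension can be computed by hand. This is a sketch for the undilated case only, and conv_output_size is a hypothetical helper; querying the shape as shown above remains the more robust approach.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension of an undilated convolution."""
    if padding == "VALID":
        # Only positions where the filter fits entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == "SAME":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


print(conv_output_size(3, 3, 1, "VALID"))  # the 3x3 input and 3x3 filter above
print(conv_output_size(3, 3, 1, "SAME"))
```

A 3x3 filter fits into a 3x3 input at exactly one position with VALID padding, which gives the (1, 1, 1, 1) shape above; with SAME padding the spatial size would stay at 3.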