I want to implement a neural network in Keras with the following architecture. Say I have some inputs, and each input belongs to a group. The network then looks like this:
input -> some layers -> separate inputs by groups -> average inputs by groups -> output
In brief, I want to separate the inputs by group and then take the average within each group.
For example, if I have an input tensor [1, 2, 3, 4, 5, 6] whose elements belong to two groups [0, 1, 1, 0, 0, 1], then I want the output tensor to be [3.333, 3.666, 3.666, 3.333, 3.333, 3.666]. Here 3.333 is the average of group 0 [1, 4, 5] and 3.666 is the average of group 1 [2, 3, 6].
I am not sure if you can separate the inputs as you described above directly in Keras or TensorFlow. Here is what I could come up with:
Create a mask for each class, with 1 at every index whose element belongs to the class and 0 at every other index. So in your example, you would use [0,1,1,0,0,1] for one class and [1,0,0,1,1,0] for the other. (If you have more classes, you will correspondingly have more masks.)
Stack those masks into a single tensor of shape (num_classes, n) and multiply it element-wise with the input using tf.multiply(), relying on broadcasting. Think of the masks as filters that separate the classes. Be sure to reshape your tensors so that the shapes broadcast.
After the multiplication, you will have a tensor where each row contains one class's elements. For your example you should get two vectors, [0,2,3,0,0,6] and [1,0,0,4,5,0]. To get the average of each class, sum each vector with tf.reduce_sum() on the correct axis and divide by the number of elements in the class (the sum of the corresponding mask); tf.reduce_mean() would divide by the full length, zeros included.
Multiply the tensor of means, [[3.666], [3.333]] (in the same order as the masks), with the masks using tf.multiply() and add the resulting vectors using tf.reduce_sum() on the correct axis. This should give the vector you want.
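The mask-based steps above can be sketched in NumPy as follows. This is only an illustration under assumptions: the variable names are made up, the number of groups is taken as known in advance, and a TensorFlow version would swap in the corresponding tensor ops.

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6.])
groups = np.array([0, 1, 1, 0, 0, 1])
num_groups = 2  # assumed to be known in advance

# One mask per group, shape (num_groups, n):
# masks[g, i] is 1 if element i belongs to group g, else 0.
masks = (groups[None, :] == np.arange(num_groups)[:, None]).astype(values.dtype)

# The element-wise product keeps each group's elements and zeroes the rest.
separated = masks * values[None, :]

# Per-group mean: sum of the kept elements divided by the group size.
group_means = separated.sum(axis=1, keepdims=True) / masks.sum(axis=1, keepdims=True)

# Put each mean back at the positions of its group and collapse the group axis.
result = (masks * group_means).sum(axis=0)
print(result)
```

Note that the division uses the mask sums rather than the full vector length, so the zeros introduced by masking do not dilute the averages.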
I have figured out a method. It can be achieved by matrix manipulation. First turn the cluster vector into a categorical (one-hot) matrix. For example, if the batch size is 6, the categorical matrix (cluster) looks like:
1, 0
1, 0
0, 1
0, 1
1, 0
0, 1
then we generate a cluster_mean matrix:
1/3, 0
1/3, 0
0, 1/3
0, 1/3
1/3, 0
0, 1/3
If we have an input matrix of shape b*n (b is the batch size and n is the number of features), then we can get the average by cluster using the matrix product
cluster * t(cluster_mean) * input
The transpose, the averaging and the matrix products can be achieved with TensorFlow functions.
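A small NumPy sketch of this matrix formulation, assuming the input is laid out as (batch, features) so that the shapes in the product line up. The example input values are made up; the cluster assignment matches the categorical matrix shown above.

```python
import numpy as np

cluster_ids = np.array([0, 0, 1, 1, 0, 1])   # cluster of each of the 6 samples
num_clusters = 2

# Categorical (one-hot) matrix, shape (batch, clusters).
cluster = np.eye(num_clusters)[cluster_ids]

# Divide each column by its cluster size, giving the 1/3 entries.
cluster_mean = cluster / cluster.sum(axis=0, keepdims=True)

# Made-up input with 6 samples and 2 features, shape (batch, features).
inputs = np.arange(12, dtype=float).reshape(6, 2)

# (batch, clusters) @ (clusters, batch) @ (batch, features) -> (batch, features)
averaged = cluster @ cluster_mean.T @ inputs
print(averaged)
```

Each row of `averaged` holds the per-feature mean of the cluster that the corresponding sample belongs to.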
I'm using the MNIST handwritten numerals dataset to train a CNN.
After training the model, I use predict like this:
predictions = cnn_model.predict(test_images)
predictions[0]
and I get this output:
array([2.1273775e-06, 2.9292005e-05, 1.2424786e-06, 7.6307842e-05,
7.4305902e-08, 7.2301691e-07, 2.5368356e-08, 9.9952960e-01,
1.2401938e-06, 1.2787555e-06], dtype=float32)
In the output, there are 10 probabilities, one for each numeral from 0 to 9. But how do I know which probability refers to which numeral?
In this particular case, the probabilities are arranged sequentially for numerals 0 to 9. But why is that? I didn't define that anywhere.
I tried going over the documentation and example implementations found elsewhere on the internet, but no one seems to have addressed this particular behaviour.
Edit:
For context, I've defined my train/test data by:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = (np.expand_dims(train_images, axis=-1)/255.).astype(np.float32)
train_labels = (train_labels).astype(np.int64)
test_images = (np.expand_dims(test_images, axis=-1)/255.).astype(np.float32)
test_labels = (test_labels).astype(np.int64)
And my model consists of a few convolution and pooling layers, then a Flatten layer, then a Dense layer with 128 neurons and an output Dense layer with 10 neurons.
After that I simply fit my model and use predict like this:
cnn_model.fit(train_images, train_labels, batch_size=BATCH_SIZE, epochs=EPOCHS)
predictions = cnn_model.predict(test_images)
I don't see where I've instructed my code to output the first neuron as digit 0, the second neuron as digit 1, etc.
And if I wanted to change the sequence in which the resulting digits are output, where would I do that?
This is really confusing me.
Models work with numbers. Your classes/labels should be represented as numbers (e.g., 0, 1, ...., n). The prediction is always indexed to show probabilities for class 0 at index 0, class 1 at index 1. Now in the MNIST case, you are lucky the labels are integers 0 to 9. Suppose you had to classify images into three classes: cars, bicycles, trucks. You must represent those classes as numerical values. You can arrange it as you wish. If you choose this: {cars: 0, bicycles: 1, trucks: 2}, in other words, if you label your cars as 0, bicycles as 1, and trucks as 2, then your prediction would show probability for cars at index 0, bicycles at index 1 and trucks at index 2.
You could have also decided to choose this setting: {cars: 2, bicycles: 0, trucks: 1}, then your prediction would show probability for cars at index 2, bicycles at index 0 and trucks at index 1, and so on.
The point is, you have to represent your classes (however many you have) as integers from 0 to n, where n is num_classes - 1. The probabilities in the prediction are indexed accordingly. You don't have to tell the model.
Hope this is now clear.
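To make the indexing concrete, here is a minimal sketch of turning a prediction vector back into a class name. The mapping and the prediction values are made up for illustration; in practice the vector would come from model.predict().

```python
import numpy as np

# Hypothetical mapping, chosen when the labels were prepared.
class_to_index = {"cars": 0, "bicycles": 1, "trucks": 2}
index_to_class = {index: name for name, index in class_to_index.items()}

# Made-up probabilities for one image, one entry per class index.
prediction = np.array([0.05, 0.15, 0.80])

# The position of the largest probability is the predicted class index.
predicted_index = int(np.argmax(prediction))
predicted_class = index_to_class[predicted_index]
print(predicted_index, predicted_class)
```

The model never sees the names; the dictionary is bookkeeping kept on the user's side.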
It depends on how you prepare your labels during training. With MNIST classification, usually, there are two different ways:
One-hot Labels: There are 10 labels in the MNIST data, therefore for each example (image), you create a label array (vector) of length 10 where all the elements are zero except the one at the index corresponding to the digit that your input image shows. For example, if your input image shows the digit 8, your label contains zeros everywhere except at index 8 (i.e. [0,0,0,0,0,0,0,0,1,0]). If your image shows the digit 2, your label would be [0,0,1,0,0,0,0,0,0,0] and so on.
Sparse Labels: you just label each image directly by what digit it is showing, for example if your image is showing the digit 8, your label is a single number with value 8.
In both cases, you could choose the labels however you want, in the MNIST classification it is just intuitive to use the labels 0-9 to show digits 0-9.
Thus, in the prediction, the probability at index 0 is for digit 0, index 1 for digit 1, and so on.
You could choose to prepare your labels differently. For example you could decide to show your labels as follows:
label for digit 0: 9
label for digit 1: 8
label for digit 2: 7
label for digit 3: 6
label for digit 4: 5
label for digit 5: 4
label for digit 6: 3
label for digit 7: 2
label for digit 8: 1
label for digit 9: 0
You could train your model the same way but in this case, the probabilities in the prediction would be inverted. Probability at index 0 would be for digit 9, index 1 for digit 8, and so on.
In short, you have to define your labels using integer indices, but it is up to you to decide and remember what index you chose to refer to which label/class.
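The two label formats and the inverted labelling can be sketched with NumPy as below. The digits and the prediction vector are made-up stand-ins, not output from a trained model.

```python
import numpy as np

digits = np.array([7, 2, 1, 0, 4])  # made-up true digits for five images

# Sparse labels: the label is simply the digit.
sparse_labels = digits

# One-hot labels: row i has a single 1 at index sparse_labels[i].
one_hot_labels = np.eye(10)[sparse_labels]

# Alternative labelling from the list above: digit d gets label 9 - d.
inverted_labels = 9 - digits

# With the inverted labelling, output index k stands for digit 9 - k,
# so the argmax has to be mapped back by the same rule.
fake_prediction = np.zeros(10)
fake_prediction[2] = 1.0            # pretend the model is sure about index 2
decoded_digit = 9 - int(np.argmax(fake_prediction))
print(decoded_digit)
```

Whatever mapping is used to build the labels has to be applied in reverse when reading the predictions.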
Given multiple 2D flow maps, i.e. vector fields, how would one find the statistical correlation between pairs of these?
The problem:
One should not (?) reshape 2 flow maps of shape (x,y,2), flow1 and flow2, to 1D vectors and run
np.corrcoef(flow1.reshape(1,-1),flow2.reshape(1,-1))
since the x,y entries are connected.
[Figures: plots of flow1 and flow2, for visualization purposes only]
I am thinking about comparing magnitudes and direction.
How would one ideally compare those (cosine distance, ...)?
How would one compare covariance between vector fields?
Edit:
I am aware that np.corrcoef(flow1.reshape(2,-1), flow2.reshape(2,-1)) would return a 4,4 correlation coefficient matrix but find it unintuitive to interpret.
For some measures of similarity it may indeed be desirable to take the spatial structure of the domain into account. But a coefficient of correlation does not do that: it is invariant under any permutations of the domain. For example, the correlation between (0, 1, 2, 3, 4) and (1, 2, 4, 8, 16) is the same as between (1, 4, 2, 0, 3) and (2, 16, 4, 1, 8) where both arrays were reshuffled in the same way.
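The permutation invariance can be checked numerically with the two pairs of arrays from the example, using np.corrcoef:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([1, 2, 4, 8, 16])

# The same reshuffling applied to both arrays.
a_shuffled = np.array([1, 4, 2, 0, 3])
b_shuffled = np.array([2, 16, 4, 1, 8])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r_original = np.corrcoef(a, b)[0, 1]
r_shuffled = np.corrcoef(a_shuffled, b_shuffled)[0, 1]
print(r_original, r_shuffled)
```

Both values agree up to floating-point error, because the coefficient only depends on which entries are paired, not on where the pairs sit in the array.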
So, the coefficient of correlation would be obtained by:
Centering both arrays, i.e., subtracting their mean. Say, we get FC1 and FC2.
Taking the inner product of FC1 and FC2: this is just the sum of the products of matching entries.
Dividing by the square roots of the inner products FC1*FC1 and FC2*FC2.
Example:
import numpy as np

flow1 = np.random.uniform(size=(10, 10, 2))  # the 3rd dimension is for the components
flow2 = flow1 + np.random.uniform(size=(10, 10, 2))
# Center each component by subtracting its mean over the spatial axes.
flow1_centered = flow1 - np.mean(flow1, axis=(0, 1))
flow2_centered = flow2 - np.mean(flow2, axis=(0, 1))
# Inner product, normalized by the norms of both centered fields.
inner_product = np.sum(flow1_centered*flow2_centered)
r = inner_product/np.sqrt(np.sum(flow1_centered**2) * np.sum(flow2_centered**2))
Here the flows have some positive correlation because flow1 is included in flow2. Specifically, the coefficient is a number around 1/sqrt(2), subject to random noise.
If this is not what you want, then you don't want the coefficient of correlation but some other measure of similarity.
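As one possible "other measure", here is a hedged sketch that compares direction and magnitude separately: a per-pixel cosine similarity for direction, and an ordinary correlation of the magnitudes. This is an illustrative choice rather than a recommended standard; the random flows, the 0.5 noise scale and the eps guard are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
flow1 = rng.uniform(-1, 1, size=(10, 10, 2))
flow2 = flow1 + 0.5 * rng.uniform(-1, 1, size=(10, 10, 2))

eps = 1e-12  # guards against division by zero for zero-length vectors

# Vector magnitude at every pixel, shape (10, 10).
mag1 = np.linalg.norm(flow1, axis=-1)
mag2 = np.linalg.norm(flow2, axis=-1)

# Cosine of the angle between the two vectors at every pixel, in [-1, 1].
cos_sim = np.sum(flow1 * flow2, axis=-1) / (mag1 * mag2 + eps)

# Scalar summaries: average directional agreement and magnitude correlation.
mean_cos = cos_sim.mean()
mag_corr = np.corrcoef(mag1.ravel(), mag2.ravel())[0, 1]
print(mean_cos, mag_corr)
```

The cos_sim array keeps the spatial layout, so it can also be plotted as a map to see where the two fields disagree.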
I recently began learning tensorflow.
I am unsure whether there is a difference between
x = np.array([[1],[2],[3],[4],[5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.shuffle(buffer_size=4)
dataset = dataset.batch(4)
and
x = np.array([[1],[2],[3],[4],[5]])
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset = dataset.batch(4)
dataset = dataset.shuffle(buffer_size=4)
Also, I am not sure why I cannot use
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
as it gives the error
dataset = dataset.shuffle_batch(buffer_size=2,batch_size=BATCH_SIZE)
AttributeError: 'TensorSliceDataset' object has no attribute 'shuffle_batch'
Thank you!
TL;DR: Yes, there is a difference. Almost always, you will want to call Dataset.shuffle() before Dataset.batch(). There is no shuffle_batch() method on the tf.data.Dataset class, and you must call the two methods separately to shuffle and batch a dataset.
The transformations of a tf.data.Dataset are applied in the same sequence that they are called. Dataset.batch() combines consecutive elements of its input into a single, batched element in the output.
We can see the effect of the order of operations by considering the following two datasets:
import tensorflow as tf

tf.enable_eager_execution()  # To simplify the example code (TF 1.x; eager execution is the default in TF 2.x).

# Batch before shuffle.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.batch(3)
dataset = dataset.shuffle(9)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([1 1 1], shape=(3,), dtype=int32)
# tf.Tensor([2 2 2], shape=(3,), dtype=int32)
# tf.Tensor([0 0 0], shape=(3,), dtype=int32)

# Shuffle before batch.
dataset = tf.data.Dataset.from_tensor_slices([0, 0, 0, 1, 1, 1, 2, 2, 2])
dataset = dataset.shuffle(9)
dataset = dataset.batch(3)
for elem in dataset:
    print(elem)
# Prints:
# tf.Tensor([2 0 2], shape=(3,), dtype=int32)
# tf.Tensor([2 1 0], shape=(3,), dtype=int32)
# tf.Tensor([0 1 1], shape=(3,), dtype=int32)
In the first version (batch before shuffle), the elements of each batch are 3 consecutive elements from the input; whereas in the second version (shuffle before batch), they are randomly sampled from the input. Typically, when training by (some variant of) mini-batch stochastic gradient descent, the elements of each batch should be sampled as uniformly as possible from the total input. Otherwise, it is possible that the network will overfit to whatever structure was in the input data, and the resulting network will not achieve as high an accuracy.
I fully agree with @mrry, but there is one case where you might want to batch before shuffling. Suppose you are processing text data that will be fed into an RNN. Here each sentence is treated as one sequence, and one batch contains multiple sequences. Since sentence length is variable, we need to pad the sentences in a batch to a uniform length. An efficient way to do this is to group sentences of similar length together through batching, and then shuffle the batches. Otherwise, we may end up with batches that are full of the <pad> token.
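The idea of grouping by length first and shuffling afterwards can be sketched in plain Python. This is a simplified illustration with made-up sentences and a sort-based grouping; it is not the tf.data pipeline itself.

```python
import random

PAD = "<pad>"
batch_size = 2

# Made-up tokenized sentences of varying length.
sentences = [["a"], ["b", "c", "d"], ["e", "f"],
             ["g"], ["h", "i", "j", "k"], ["l", "m"]]

# Sort by length so that neighbouring sentences have similar lengths,
# then cut the sorted list into consecutive batches.
sorted_sentences = sorted(sentences, key=len)
batches = [sorted_sentences[i:i + batch_size]
           for i in range(0, len(sorted_sentences), batch_size)]

def pad_batch(batch):
    """Pad every sentence in the batch to the longest length in that batch."""
    max_len = max(len(sentence) for sentence in batch)
    return [sentence + [PAD] * (max_len - len(sentence)) for sentence in batch]

padded_batches = [pad_batch(batch) for batch in batches]

# Shuffle whole batches, so the training order is still randomized.
random.shuffle(padded_batches)

total_pads = sum(token == PAD
                 for batch in padded_batches
                 for sentence in batch
                 for token in sentence)
print(total_pads)
```

With this data only one padding token is needed in total, whereas batching the sentences in their original order would need five. The trade-off is that batch contents are no longer sampled uniformly, only the order of the batches is.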
I'm trying to use tf.losses.sigmoid_cross_entropy on an unbalanced dataset. However, I'm a little confused about the weights parameter. Here is its description in the documentation:
weights: Optional Tensor whose rank is either 0, or the same rank as
labels, and must be broadcastable to labels (i.e., all dimensions must
be either 1, or the same as the corresponding losses dimension).
I know in tf.losses.softmax_cross_entropy the parameter weights can be a rank 1 tensor with weight for each sample. Why must the weights in tf.losses.sigmoid_cross_entropy have the same rank as labels?
Can anybody explain this? An example would be best.
You want your loss to be weighted, so TensorFlow expects you to provide a weight for each of your labels. Consider the following example:
Labels: [0, 0, 0, 1, 0]
possible_weights1: [1]
possible_weights2: [1, 2, 1, 1, 1]
illegal_weights1: [1, 2]
illegal_weights2: [[1], [2]]
Here your labels have rank 1 (only one dimension), so TensorFlow expects that you will either provide a weight for each element of the labels (as in possible_weights2) or a single weight of length 1 (as in possible_weights1, which is broadcast to [1, 1, 1, 1, 1]).
But if you pass illegal_weights2 as your weights, TensorFlow does not know how to handle the two dimensions of the weights, since the labels have only one dimension. So the rank should always be the same.
illegal_weights1 is a case where the rank is the same, but the weights have neither the same length as the labels nor length 1 (which can be broadcast); they have length 2, which cannot be broadcast, and are hence illegal.
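The rule quoted from the documentation can be written down as a small shape check. The helper below is hypothetical (it is not a TensorFlow function) and only encodes the documented condition: rank 0, or the same rank as the labels with every dimension equal to 1 or to the matching labels dimension.

```python
import numpy as np

def weights_are_valid(weights, labels):
    """Hypothetical helper mirroring the documented rule for `weights`."""
    w_shape = np.shape(weights)
    l_shape = np.shape(labels)
    if len(w_shape) == 0:              # rank 0: a scalar weight always works
        return True
    if len(w_shape) != len(l_shape):   # otherwise the ranks must match
        return False
    # Every dimension must be 1 or equal to the labels' dimension.
    return all(w == 1 or w == l for w, l in zip(w_shape, l_shape))

labels = [0, 0, 0, 1, 0]
print(weights_are_valid([1], labels))              # possible_weights1
print(weights_are_valid([1, 2, 1, 1, 1], labels))  # possible_weights2
print(weights_are_valid([1, 2], labels))           # illegal_weights1
print(weights_are_valid([[1], [2]], labels))       # illegal_weights2
```

The first two calls return True and the last two return False, matching the four cases in the example.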
My task is to transform a specially formed dense matrix tensor into a sparse one. For example, take the input matrix M below (in each row, a dense sequence of positive integers followed by 0 as padding):
[[3 5 7 0]
[2 2 0 0]
[1 3 9 0]]
Additionally, the non-padding length of each row is given, e.g. by the tensor L =
[3, 2, 3].
The desired output would be sparse tensor S.
SparseTensorValue(indices=array([[0, 0],[0, 1],[0, 2],[1, 0],[1, 1],[2, 0],[2, 1], [2, 2]]), values=array([3, 5, 7, 2, 2, 1, 3, 9], dtype=int32), shape=array([3, 4]))
This is useful in models where objects are described by variable-sized descriptors (S is then used in embedding_lookup_sparse to connect the embeddings of the descriptors).
I am able to do it when the number of rows of M is known (with a Python loop and ops like slice and concat). However, the number of rows of M here is determined by the mini-batch size and could change (say, in the testing phase). Is there a good way to implement that? I have tried some control_flow_ops but haven't succeeded.
Thanks!!
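One loop-free way to think about the conversion is to build a boolean mask from the lengths and read the indices and values off that mask. The NumPy sketch below illustrates the idea and does not depend on the number of rows. It is an assumption that the same steps carry over to TensorFlow with ops such as tf.sequence_mask, tf.where and tf.gather_nd; that translation is not shown or tested here.

```python
import numpy as np

M = np.array([[3, 5, 7, 0],
              [2, 2, 0, 0],
              [1, 3, 9, 0]], dtype=np.int32)
L = np.array([3, 2, 3])

# mask[i, j] is True when column j lies inside the non-padding part of row i.
mask = np.arange(M.shape[1])[None, :] < L[:, None]

# Coordinates of the True entries, in row-major order, shape (num_values, 2).
indices = np.argwhere(mask)

# The values at those coordinates, in the same order.
values = M[mask]

dense_shape = np.array(M.shape)
print(indices.tolist(), values.tolist(), dense_shape.tolist())
```

The three arrays correspond to the indices, values and shape fields of the desired sparse tensor S.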