How does BatchNormalization in Keras work? - tensorflow

I want to know how BatchNormalization works in Keras, so I wrote this code:
import numpy as np
from tensorflow import keras

X_input = keras.Input((2,))
X = keras.layers.BatchNormalization(axis=1)(X_input)
model1 = keras.Model(inputs=X_input, outputs=X)
The input is a batch of two-dimensional vectors, normalized along axis=1. Then I print the output:
a = np.arange(4).reshape((2,2))
print('a=')
print(a)
print('output=')
print(model1.predict(a,batch_size=2))
and the output is:
a=
array([[0, 1],
[2, 3]])
output=
array([[ 0. , 0.99950039],
[ 1.99900079, 2.9985013 ]], dtype=float32)
I cannot figure out these results. As far as I know, the mean of the batch should be ([0,1] + [2,3])/2 = [1,2], and the variance is 1/2*(([0,1] - [1,2])^2 + ([2,3] - [1,2])^2) = [1,1]. Finally, normalizing with (x - mean)/sqrt(var), the results should be [-1, -1] and [1, 1]. Where am I wrong?

BatchNormalization subtracts a mean, divides by the square root of a variance (plus epsilon), then applies a scale gamma and an offset beta. If the mean and variance used were actually those of your batch, the result would be centered around zero with variance 1.
But at inference time, which is what predict() uses, they are not. The Keras BatchNormalization layer keeps four weights: moving_mean and moving_variance (non-trainable, updated as running averages during training) and beta and gamma (trainable). They are initialized as beta=0, gamma=1, moving_mean=0 and moving_variance=1. Since you haven't run any training steps, they still have these initial values, so BatchNorm leaves your values essentially unchanged.
So why don't you get exactly your input values? Because there is another parameter, epsilon (a small number), which is added to the variance. Therefore, all values are divided by sqrt(1 + epsilon) and end up slightly below their input values.
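To make the two modes concrete, here is a NumPy sketch of the formula y = gamma * (x - mean) / sqrt(var + epsilon) + beta. It assumes epsilon=1e-3, which I believe is the Keras default and which matches the 0.9995 factor in the question's output; the helper name batchnorm is hypothetical, not a Keras function.

```python
import numpy as np

def batchnorm(x, mean, var, gamma=1.0, beta=0.0, epsilon=1e-3):
    """y = gamma * (x - mean) / sqrt(var + epsilon) + beta, applied per feature."""
    return gamma * (x - mean) / np.sqrt(var + epsilon) + beta

a = np.arange(4, dtype=np.float32).reshape((2, 2))

# Inference mode (what predict() uses): freshly initialized moving statistics are
# moving_mean=0 and moving_variance=1, so x is only scaled by 1/sqrt(1 + epsilon).
inference_out = batchnorm(a, mean=np.zeros(2), var=np.ones(2))

# Training mode: the statistics of the current batch are used instead
# (biased variance, as in the question's hand calculation).
training_out = batchnorm(a, mean=a.mean(axis=0), var=a.var(axis=0))

print(inference_out)  # close to the question's output
print(training_out)   # close to the expected [-1, -1] and [1, 1]
```

In the training-mode line, epsilon is the only reason the result is about 0.9995 instead of exactly 1.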

Related

How can I assign certain samples as negative samples when using sampled_softmax_loss in TensorFlow?

The signature of sampled_softmax_loss is:
tf.nn.sampled_softmax_loss(
weights,
biases,
labels,
inputs,
num_sampled,
num_classes,
num_true=1,
sampled_values=None,
...
)
I've noticed that the sampled_values argument determines which negative samples are taken, and that it is the value returned by a *_candidate_sampler function such as tf.random.fixed_unigram_candidate_sampler.
In tf.random.fixed_unigram_candidate_sampler, we can set the probability of each sample being chosen as a negative sample.
How can I deliberately assign a certain sample as a negative sample?
For instance, in a recommender system I'd like to add some hard negative samples to the model, so I want the hard negatives to be chosen for sure, not by probability as in the _candidate_sampler functions.
You need to understand that the candidate sampler functions are only helpers that produce the value for sampled_values, so your question is really about how to create a negative sampler of your own.
You don't need to create a separate negative sampler when you set unique=True, which draws candidates without replacement. What the loss consumes is the tuple (sampled_candidates, true_expected_count, sampled_expected_count). Hard negatives are candidates that contrast strongly with the true class; you can favor them by shaping the sampling distribution.
Related documentation: Random Uniform Candidates Sampler; Candidate Sampling; Sampled SoftMax.
A simple example (the weights and biases can vary; the function calls stay the same):
import tensorflow as tf

# Toy tensors for the loss
weights = tf.zeros([4, 1])
biases = tf.zeros([4])
labels = tf.ones([4, 1])
inputs = tf.zeros([4, 1])
num_sampled = 1
num_classes = 1

# Arguments for the candidate sampler
true_classes = tf.ones([4, 4], dtype=tf.int64)
num_true = 4
unique = True
range_max = 1
sampler = tf.random.uniform_candidate_sampler(
true_classes,
num_true,
num_sampled,
unique,
range_max,
seed=None,
name=None
)
loss_fn = tf.nn.sampled_softmax_loss(
weights,
biases,
labels,
inputs,
num_sampled,
num_classes,
num_true=1,
sampled_values=sampler,
remove_accidental_hits=True,
seed=None,
name='sampled_softmax_loss'
)
print( loss_fn )
Output (the example was run three times):
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)
tf.Tensor([6.437752 6.437752 6.437752 6.437752], shape=(4,), dtype=float32)
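Since sampled_values is just a tuple, one option for fixed hard negatives (an assumption on my part, not tested against TensorFlow here) is to build it by hand as (tf.constant(hard_negative_ids, dtype=tf.int64), true_expected_count, sampled_expected_count) instead of calling a sampler. The NumPy sketch below mirrors what a sampled softmax then computes with num_true=1: logits for the true class and for the chosen negatives, corrected by log expected counts, with accidental hits masked out. The function name and the ids are hypothetical, and expected counts of 1.0 are an assumption for negatives that are always included.

```python
import numpy as np

def sampled_softmax_loss_np(weights, biases, labels, inputs, sampled_ids,
                            true_expected, sampled_expected,
                            remove_accidental_hits=True):
    # Logits of each example's true class, shape [batch]
    true_logits = np.sum(inputs * weights[labels], axis=1) + biases[labels]
    # Logits of the chosen negatives, shape [batch, num_sampled]
    sampled_logits = inputs @ weights[sampled_ids].T + biases[sampled_ids]
    # Correct for how likely each candidate was to be sampled
    true_logits = true_logits - np.log(true_expected)
    sampled_logits = sampled_logits - np.log(sampled_expected)
    if remove_accidental_hits:
        # A negative equal to the example's own label must not count against it
        hits = labels[:, None] == sampled_ids[None, :]
        sampled_logits = np.where(hits, -np.inf, sampled_logits)
    # Softmax cross-entropy with the true class in column 0
    logits = np.concatenate([true_logits[:, None], sampled_logits], axis=1)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(shifted).sum(axis=1))
    return log_norm - shifted[:, 0]

num_classes, dim = 5, 3
weights = np.zeros((num_classes, dim))
biases = np.zeros(num_classes)
labels = np.array([0, 1])
inputs = np.ones((2, dim))
hard_negatives = np.array([3, 4])  # hypothetical ids we always want as negatives

loss = sampled_softmax_loss_np(weights, biases, labels, inputs, hard_negatives,
                               true_expected=np.ones(2),
                               sampled_expected=np.ones(2))
print(loss)  # all logits are 0: one true class against two negatives gives log(3)
```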

How does tf.nn.moments calculate variance?

Look at the test example:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]])
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
    m, v = sess.run([mean, variance])
    print(m, v)
The output is:
[3 4]
[2 2]
I want to calculate the variance along axis 0. The first column is [1,3,5], and mean = (1+3+5)/3 = 3, which is correct. The variance should be [(1-3)^2+(3-3)^2+(5-3)^2]/3 = 2.6667, but the output is 2. Can anyone tell me how tf.nn.moments calculates the variance?
Also, looking at the API docs, what does the shift parameter do?
The problem is that x is an integer tensor, and TensorFlow, instead of forcing a conversion, performs the computation as well as it can without changing the type (so the outputs are also integers). You can pass float numbers when constructing x, or specify the dtype parameter of tf.constant:
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Then you get the expected result:
import tensorflow as tf
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
mean, variance = tf.nn.moments(x, [0])
with tf.Session() as sess:
    m, v = sess.run([mean, variance])
    print(m, v)
>>> [ 3. 4.] [ 2.66666675 2.66666675]
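The truncation can be reproduced in NumPy by keeping every step in integer arithmetic. This only illustrates the effect; it is not TensorFlow's exact internal computation.

```python
import numpy as np

x_int = np.array([[1, 2], [3, 4], [5, 6]])
n = x_int.shape[0]

# Integer arithmetic: both means use floor division, so 8 / 3 becomes 2
mean_int = x_int.sum(axis=0) // n                      # [3, 4]
var_int = ((x_int - mean_int) ** 2).sum(axis=0) // n  # 8 // 3 -> [2, 2]

# Floating-point arithmetic gives the expected population variance
x_float = x_int.astype(np.float32)
mean_float = x_float.mean(axis=0)                      # [3., 4.]
var_float = x_float.var(axis=0)                        # 8 / 3 -> about 2.6667

print(mean_int, var_int)
print(mean_float, var_float)
```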
About the shift parameter: it seems to allow you to specify a value to, well, "shift" the input. By shift they mean subtract, so if your input is [1., 2., 4.] and you give a shift of, say, 2.5, TensorFlow would first subtract that amount and compute the moments from [-1.5, -0.5, 1.5]. In general, it seems safe to just leave it as None, which will perform a shift by the mean of the input, but I suppose there may be cases where giving a predetermined shift value (e.g. if you know or have an approximate idea of the mean of the input) may yield better numerical stability.
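A sketch of why a shift is allowed at all: the variance does not change when a constant is subtracted, and the mean is recovered by adding the shift back. The helper shifted_moments is hypothetical and shows the idea, not TensorFlow's code.

```python
import numpy as np

def shifted_moments(x, shift):
    """Mean and variance computed from data shifted by a constant."""
    d = x - shift                        # subtract the shift first
    mean_d = d.mean()
    var = (d ** 2).mean() - mean_d ** 2  # E[d^2] - E[d]^2 is unaffected by the shift
    return mean_d + shift, var           # add the shift back to recover the mean

x = np.array([1.0, 2.0, 4.0])
print(x - 2.5)                  # [-1.5 -0.5  1.5]
print(shifted_moments(x, 2.5))  # same result as with no shift
print(shifted_moments(x, 0.0))
```

A shift close to the true mean keeps d small, which is where the numerical-stability benefit comes from: E[d^2] and E[d]^2 are then not two large, nearly equal numbers.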
# Replace the following line, which creates an integer tensor...
x = tf.constant([[1,2],[3,4],[5,6]])
# ...with a float tensor, so that TensorFlow doesn't truncate the decimals:
x = tf.constant([[1,2],[3,4],[5,6]], dtype=tf.float32)
Results: array([ 2.66666675, 2.66666675], dtype=float32)
Note: judging from the implementation, shift is not actually used.

How to deal with Imbalanced Dataset for Multi Label Classification

I was wondering how to penalize less represented classes more than other classes when dealing with a really imbalanced dataset (10 classes over about 20000 samples; here is the number of occurrences for each class: [10868 26 4797 26 8320 26 5278 9412 4485 16172]).
I read about the TensorFlow function weighted_cross_entropy_with_logits (https://www.tensorflow.org/api_docs/python/tf/nn/weighted_cross_entropy_with_logits), but I am not sure I can use it for a multi-label problem.
I found a post that sums up my problem perfectly (Neural Network for Imbalanced Multi-Class Multi-Label Classification) and that proposes an idea, but it had no answers, and I thought the idea might be good :)
Thank you for your ideas and answers!
First of all, my suggestion is that you can modify your cost function to work in a multi-label way. Here is code that shows how to use softmax cross-entropy in TensorFlow for a multi-label image task.
With that code, you can multiply each row of the loss calculation by a weight. Here is example code for a multi-label task (i.e., each image can have two labels):
import tensorflow as tf

# Split logits, labels and per-example weights into the 2 label groups
logits_split = tf.split(axis=1, num_or_size_splits=2, value=logits)
labels_split = tf.split(axis=1, num_or_size_splits=2, value=labels)
weights_split = tf.split(axis=1, num_or_size_splits=2, value=weights)
total = 0.0
for i in range(len(logits_split)):
    # Per-example cross-entropy for the i-th label group, shape [batch_size]
    temp = tf.nn.softmax_cross_entropy_with_logits(
        logits=logits_split[i], labels=labels_split[i])
    # Weight each example's loss, then average over the batch
    total += tf.reduce_mean(temp * tf.reshape(weights_split[i], [-1]))
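The same per-group weighting can be checked in plain NumPy. This sketch assumes, as the TensorFlow snippet does, that logits and labels hold 2 label groups side by side and that weights has one column per group; the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-row cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def multi_group_weighted_loss(logits, labels, weights, num_groups=2):
    """Sum over label groups of the weighted batch-mean softmax cross-entropy."""
    total = 0.0
    for lg, lb, w in zip(np.split(logits, num_groups, axis=1),
                         np.split(labels, num_groups, axis=1),
                         np.split(weights, num_groups, axis=1)):
        total += np.mean(softmax_cross_entropy(lg, lb) * w.reshape(-1))
    return total

# Batch of 2 images, 2 label groups with 3 classes each (6 logits per image)
logits = np.zeros((2, 6))
labels = np.array([[1, 0, 0, 0, 1, 0],
                   [0, 0, 1, 1, 0, 0]], dtype=float)
weights = np.ones((2, 2))  # one weight per image and per label group
print(multi_group_weighted_loss(logits, labels, weights))  # 2 * log(3)
```

Because the weights multiply per-example losses before the mean, a rare class can be up-weighted only in the rows where it appears.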
I think you can just use tf.nn.weighted_cross_entropy_with_logits for multiclass classification.
For example, for 4 classes, where the ratios to the class with the largest number of members are [0.8, 0.5, 0.6, 1], you would just give it a weight vector in the following way:
cross_entropy = tf.nn.weighted_cross_entropy_with_logits(
targets=ground_truth_input, logits=logits,
pos_weight = tf.constant([0.8,0.5,0.6,1]))
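A NumPy sketch of the documented formula targets * -log(sigmoid(x)) * pos_weight + (1 - targets) * -log(1 - sigmoid(x)), written in the naive (non-stabilized) form for readability. Because each class gets its own sigmoid, it works with multi-hot targets, and pos_weight broadcasts over the class axis; a value above 1 penalizes missed positives of that class more.

```python
import numpy as np

def weighted_sigmoid_ce(targets, logits, pos_weight):
    """Element-wise weighted sigmoid cross-entropy (naive form, moderate logits only)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return targets * -np.log(p) * pos_weight + (1 - targets) * -np.log(1 - p)

pos_weight = np.array([0.8, 0.5, 0.6, 1.0])     # one weight per class, as in the answer
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 0, 1]], dtype=float)  # multi-hot: several labels per sample
logits = np.zeros((2, 4))

loss = weighted_sigmoid_ce(targets, logits, pos_weight)
print(loss)  # positives cost pos_weight * log(2), negatives log(2)
```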
So I am not entirely sure that I understand your problem given what you have written. The post you link to writes about multi-label AND multi-class, but that doesn't really make sense given what is written there either. So I will approach this as a multi-class problem where for each sample, you have a single label.
In order to penalize the classes, I implemented a weight Tensor based on the labels in the current batch. For a 3-class problem, you could e.g. define the weights as the inverse frequency of the classes, such that if the proportions are [0.1, 0.7, 0.2] for classes 1, 2 and 3, respectively, the weights will be [10, 1.43, 5]. Defining a weight tensor based on the current batch then looks like this:
weight_per_class = tf.constant([10, 1.43, 5])  # shape (num_classes,)
onehot_labels = tf.one_hot(labels, depth=3)  # shape (batch_size, num_classes)
weights = tf.reduce_sum(
    tf.multiply(onehot_labels, weight_per_class), axis=1)  # shape (batch_size,)
reduction = tf.losses.Reduction.MEAN # this ensures that we get a weighted mean
loss = tf.losses.softmax_cross_entropy(
onehot_labels=onehot_labels, logits=logits, weights=weights, reduction=reduction)
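The weighting steps can be traced in NumPy. The batch of labels and the unweighted per-example losses below are made-up values, and dividing by the sum of the weights is, as I understand it, what tf.losses.Reduction.MEAN does.

```python
import numpy as np

proportions = np.array([0.1, 0.7, 0.2])   # class frequencies from the example above
weight_per_class = 1.0 / proportions      # inverse frequency: [10, 1.43, 5]

labels = np.array([0, 1, 2, 1])           # hypothetical batch of class ids
onehot = np.eye(3)[labels]                # shape (batch_size, num_classes)
weights = onehot @ weight_per_class       # shape (batch_size,): each example's class weight

per_example_ce = np.array([0.2, 0.5, 1.0, 0.1])  # hypothetical unweighted losses
weighted_mean = np.sum(weights * per_example_ce) / np.sum(weights)
print(weights)
print(weighted_mean)
```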
Using softmax ensures that the classification problem is not 3 independent classifications.

batch_dot with variable batch size in Keras

I'm trying to write a layer that merges 2 tensors with such a formula.
The shapes of x[0] and x[1] are both (?, 1, 500).
M is a 500*500 matrix.
I want the output to be (?, 500, 500), which I think is theoretically feasible: the layer should output (1, 500, 500) for every pair of inputs of shapes (1, 1, 500) and (1, 1, 500). Since the batch size is variable (dynamic), the output must be (?, 500, 500).
However, I know little about axes; I have tried every combination of axes, but none of them makes sense.
I tried numpy.tensordot and keras.backend.batch_dot (TensorFlow backend). If the batch size is fixed, e.g. a with shape (100, 1, 500), then batch_dot(a, M, (2, 0)) gives an output of shape (100, 1, 500).
I'm new to Keras; sorry for such a basic question, but I have spent 2 days trying to figure this out and it has driven me crazy :(
def call(self, x):
    input1 = x[0]
    input2 = x[1]
    # self.M is defined in the build function
    output = K.batch_dot(...)
    return output
Update:
Sorry for the late reply. I tried Daniel's answer with TensorFlow as the Keras backend, and it still raises a ValueError about unequal dimensions.
I tried the same code with Theano as the backend, and now it works.
>>> import numpy as np
>>> import keras.backend as K
Using Theano backend.
>>> from keras.layers import Input
>>> x1 = Input(shape=[1,500,])
>>> M = K.variable(np.ones([1,500,500]))
>>> firstMul = K.batch_dot(x1, M, axes=[1,2])
I don't know how to print a tensor's shape in Theano; it's definitely harder than TensorFlow for me... However, it works.
So I compared the TensorFlow and Theano versions of the code. Here are the differences.
In this case, x = (?, 1, 500), y = (1, 500, 500), axes = [1, 2]
In tensorflow_backend:
return tf.matmul(x, y, adjoint_a=True, adjoint_b=True)
In theano_backend:
return T.batched_tensordot(x, y, axes=axes)
(This assumes the subsequent changes to out._keras_shape don't affect out's value.)
Your multiplications should specify which axes they use in the batch_dot function:
Axis 0 - the batch dimension, it's your ?
Axis 1 - the dimension you say has length 1
Axis 2 - the last dimension, of size 500
You won't change the batch dimension, so you will always use batch_dot with axes=[1,2].
But for that to work, you must adjust M to be (?, 500, 500).
To do that, define M not as (500,500) but as (1,500,500), and repeat it along the first axis for the batch size:
import keras.backend as K
#Being M with shape (1,500,500), we repeat it.
BatchM = K.repeat_elements(x=M,rep=batch_size,axis=0)
#Not sure if repeating is really necessary, leaving M as (1,500,500) gives the same output shape at the end, but I haven't checked actual numbers for correctness, I believe it's totally ok.
#Now we can use batch dot properly:
firstMul = K.batch_dot(x[0], BatchM, axes=[1,2]) #will result in (?,500,500)
#we also need to transpose x[1]:
x1T = K.permute_dimensions(x[1],(0,2,1))
#and the second multiplication:
result = K.batch_dot(firstMul, x1T, axes=[1,2])
I prefer using TensorFlow, so I tried to figure it out with TensorFlow over the past few days.
The first approach is very similar to Daniel's solution.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(None,3,3))
tf.matmul(x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
You need to feed M values with a matching shape, including the batch dimension.
sess = tf.Session()
sess.run(tf.matmul(x,M), feed_dict = {x: [[[1,2,3]]], M: [[[1,2,3],[0,1,0],[0,0,1]]]})
# return : array([[[ 1., 4., 6.]]], dtype=float32)
Another, simpler way is to use tf.einsum.
x = tf.placeholder('float32',shape=(None,1,3))
M = tf.placeholder('float32',shape=(3,3))
tf.einsum('ijk,kl->ijl', x, M)
# return: <tf.Tensor 'MatMul_22:0' shape=(?, 1, 3) dtype=float32>
Let's feed some values.
sess.run(tf.einsum('ijk,kl->ijl', x, M), feed_dict = {x: [[[1,2,3]]], M: [[1,2,3],[0,1,0],[0,0,1]]})
# return: array([[[ 1., 4., 6.]]], dtype=float32)
Now M is a 2D tensor, and there is no need to feed a batch dimension for M.
What's more, it now seems that such a problem can be solved in TensorFlow with tf.einsum. Does that mean Keras should invoke tf.einsum in some situations? At least I can't find anywhere that Keras calls tf.einsum. And in my opinion, Keras behaves weirdly when using batch_dot on a 3D tensor and a 2D tensor. In Daniel's answer, he pads M to (1,500,500), but in K.batch_dot() M will be adjusted to (500,500,1) automatically. I found that TensorFlow adjusts it according to broadcasting rules, and I'm not sure Keras does the same.
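The einsum result fed above can be reproduced in NumPy with the same values. NumPy's matmul also broadcasts the 2D matrix across the batch axis, which is shown only for comparison.

```python
import numpy as np

x = np.array([[[1, 2, 3]]], dtype=np.float32)  # shape (batch=1, 1, 3)
M = np.array([[1, 2, 3],
              [0, 1, 0],
              [0, 0, 1]], dtype=np.float32)    # shape (3, 3), no batch dimension

via_einsum = np.einsum('ijk,kl->ijl', x, M)    # contract x's last axis with M's first
via_matmul = np.matmul(x, M)                   # M is broadcast across the batch
print(via_einsum)                              # [[[1. 4. 6.]]]
```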

How does tensorflow batch_matmul work?

Tensorflow has a function called batch_matmul which multiplies higher dimensional tensors. But I'm having a hard time understanding how it works, perhaps partially because I'm having a hard time visualizing it.
What I want to do is multiply a matrix by each slice of a 3D tensor, but I don't quite understand what the shape of tensor a is. Is z the innermost dimension? Which of the following is correct?
I would most prefer the first to be correct -- it's most intuitive to me and easy to see in the .eval() output. But I suspect the second is correct.
Tensorflow says that batch_matmul performs:
out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
What does that mean? What does that mean in the context of my example? What is being multiplied with what? And why aren't I getting a 3D tensor the way I expected?
You can imagine it as doing a matmul over each training example in the batch.
For example, if you have two tensors with the following dimensions:
a.shape = [100, 2, 5]
b.shape = [100, 5, 2]
and you do a batch tf.matmul(a, b), your output will have the shape [100, 2, 2].
100 is your batch size, the other two dimensions are the dimensions of your data.
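The "one matmul per example" picture can be checked with a small NumPy sketch; NumPy's matmul treats the leading axis as a batch axis in the same way, and the random data is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 2, 5))
b = rng.normal(size=(100, 5, 2))

batched = np.matmul(a, b)                               # shape (100, 2, 2)
looped = np.stack([a[i] @ b[i] for i in range(100)])    # one 2x5 @ 5x2 product per example
print(batched.shape)
```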
First of all, tf.batch_matmul() was removed and is no longer available. Now you are supposed to use tf.matmul(), whose documentation says:
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
So let's assume you have the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
Now you will receive a tensor of shape (batch_size, n, k). Here is what is going on: you have batch_size matrices of size n x m and batch_size matrices of size m x k. For each pair, you compute the (n x m) by (m x k) product, which gives you an n x k matrix, so you end up with batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)
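With more than one leading axis, all of them act as batch axes and only m is summed over. A NumPy sketch that spells the contraction out with einsum (sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n, m, k = 2, 3, 4, 5, 6
A = rng.normal(size=(a, b, n, m))
B = rng.normal(size=(a, b, m, k))

out = np.matmul(A, B)                           # shape (a, b, n, k)
# The same contraction written out: a and b are batch axes, m is summed
out_einsum = np.einsum('abnm,abmk->abnk', A, B)
print(out.shape)                                # (2, 3, 4, 6)
```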
You can now do it using tf.einsum, starting from Tensorflow 0.11.0rc0.
For example,
M1 = tf.Variable(tf.random_normal([2,3,4]))
M2 = tf.Variable(tf.random_normal([5,4]))
N = tf.einsum('ijk,lk->ijl',M1,M2)
It multiplies every frame (3 frames) in every batch (2 batches) of M1 by the transpose of M2.
The output is:
[array([[[ 0.80474716, -1.38590837, -0.3379252 , -1.24965811],
[ 2.57852983, 0.05492432, 0.23039417, -0.74263287],
[-2.42627382, 1.70774114, 1.19503212, 0.43006262]],
[[-1.04652011, -0.32753903, -1.26430523, 0.8810069 ],
[-0.48935518, 0.12831448, -1.30816901, -0.01271309],
[ 2.33260512, -1.22395933, -0.92082584, 0.48991606]]], dtype=float32),
array([[ 1.71076882, 0.79229093, -0.58058828, -0.23246667],
[ 0.20446332, 1.30742455, -0.07969904, 0.9247328 ],
[-0.32047141, 0.66072595, -1.12330854, 0.80426538],
[-0.02781649, -0.29672042, 2.17819595, -0.73862702],
[-0.99663496, 1.3840003 , -1.39621222, 0.77119476]], dtype=float32),
array([[[ 0.76539308, 2.77609682, -1.79906654, 0.57580602, -3.21205115],
[ 4.49365759, -0.10607499, -1.64613271, 0.96234947, -3.38823152],
[-3.59156275, 2.03910899, 0.90939498, 1.84612727, 3.44476724]],
[[-1.52062428, 0.27325237, 2.24773455, -3.27834225, 3.03435063],
[ 0.02695178, 0.16020992, 1.70085776, -2.8645196 , 2.48197317],
[ 3.44154787, -0.59687197, -0.12784094, -2.06931567, -2.35522676]]], dtype=float32)]
I have verified that the arithmetic is correct.
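The arithmetic can also be checked without TensorFlow: 'ijk,lk->ijl' sums over the shared k axis, which is the same as multiplying each frame by the transpose of M2. A NumPy sketch with random data of the same shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
M1 = rng.normal(size=(2, 3, 4))  # 2 batches, 3 frames, 4 features
M2 = rng.normal(size=(5, 4))     # 5 x 4 matrix shared by every frame

N = np.einsum('ijk,lk->ijl', M1, M2)  # sum over k for every (i, j, l)
print(N.shape)                        # (2, 3, 5)
```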
tf.tensordot should solve this problem. It supports batch operations, e.g., if you want to contract a 2D tensor with a 3D tensor, with the latter having a batch dimension.
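If a has shape [n, m] and b has shape [?, m, l], then
y = tf.tensordot(b, a, axes=[1, 1]) will produce a tensor of shape [?, l, n] (the free axes of b come first, then those of a), so transpose the last two axes if you need [?, n, l].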
https://www.tensorflow.org/api_docs/python/tf/tensordot
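A NumPy sketch of the resulting axis order; np.tensordot follows the same rule of placing the first argument's free axes before the second's. The sizes are arbitrary (n=4, m=3, l=5, batch of 2).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 3))     # [n, m]
b = rng.normal(size=(2, 3, 5))  # [batch, m, l]

y = np.tensordot(b, a, axes=([1], [1]))  # contract m: shape [batch, l, n]
y_nl = np.transpose(y, (0, 2, 1))        # reorder to [batch, n, l]
print(y.shape, y_nl.shape)               # (2, 5, 4) (2, 4, 5)
```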
It is simply like splitting both tensors along the first dimension, multiplying each pair, and concatenating the results back. If you want to multiply a 3D tensor by a 2D matrix, you can reshape, multiply, and reshape back, i.e. [100, 2, 5] -> [200, 5] -> (times a [5, 2] matrix) [200, 2] -> [100, 2, 2].
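The reshape trick for a 3D tensor times a 2D matrix, sketched in NumPy with random data; the result is compared against matmul, which broadcasts the matrix over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2, 5))
w = rng.normal(size=(5, 2))

flat = x.reshape(-1, 5)              # [100, 2, 5] -> [200, 5]
prod = flat @ w                      # [200, 5] @ [5, 2] -> [200, 2]
out = prod.reshape(100, 2, 2)        # [200, 2] -> [100, 2, 2]
print(out.shape)
```

Collapsing the batch and row axes is valid because each row of each matrix is multiplied by w independently.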
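The answer to this particular question is to use the tf.scan function.
If a has shape [5, 3, 2] (a batch of 5, with a 3x2 matrix in each) and b has shape [2, 3] (a constant matrix to be multiplied with each sample), both float32, then:
import tensorflow as tf

def fn(acc, x):
    # acc is the previous output (unused here); x is one [3, 2] slice of a
    return tf.matmul(x, b)

# The initializer only fixes the shape and dtype of each step's output ([3, 3])
initializer = tf.zeros([3, 3])
h = tf.scan(fn, a, initializer)
h will store all the outputs, with shape [5, 3, 3].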
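To see what tf.scan does with fn and the initializer, here is a minimal pure-Python/NumPy sketch of forward scan semantics (the scan helper is hypothetical, not TensorFlow's implementation): the previous output is threaded back in as acc, and all outputs are stacked along a new first axis.

```python
import numpy as np

def scan(fn, elems, initializer):
    """Minimal forward scan: outputs[i] = fn(outputs[i - 1], elems[i])."""
    acc = initializer
    outputs = []
    for x in elems:          # iterate over the first dimension of elems
        acc = fn(acc, x)     # the previous output is passed back in as acc
        outputs.append(acc)
    return np.stack(outputs)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3, 2))
b = rng.normal(size=(2, 3))

h = scan(lambda acc, x: x @ b, a, np.zeros((3, 3)))
print(h.shape)  # (5, 3, 3)
```

Since fn ignores acc here, the scan is equivalent to a plain batched product, which is why the reshape or broadcasting approaches also work for this case.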