TensorFlow tf.metrics.accuracy multi-label always zero

My label looks like this:
label = [0, 1, 0, 0, 1, 1, 0]
In other words, classes 1, 4, and 5 are present in the corresponding sample. I believe this is called a soft class.
I'm calculating my loss with:
logits = tf.layers.dense(encoding, 7, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels,
    logits=logits
)
loss = tf.reduce_mean(cross_entropy)
According to TensorBoard, the loss is decreasing over time, as expected. However, the accuracy is flat at zero:
eval_metric_ops = {
    'accuracy': tf.metrics.accuracy(labels=labels, predictions=logits),
}
tf.summary.scalar('accuracy', eval_metric_ops['accuracy'][1])
How do I calculate the accuracy of my model when using soft classes?

Did you solve this? I think the suggestion to use softmax_cross_entropy_with_logits is incorrect, because you have a multi-label problem in which each label is a binary class.
Partial solution:
labels = tf.constant([1, 1, 1, 0, 0, 0])       # example
predictions = tf.constant([0, 1, 0, 0, 1, 0])  # example
is_equal = tf.equal(labels, predictions)
accuracy = tf.reduce_mean(tf.cast(is_equal, tf.float32))
This gives a number, but it still needs to be converted into a tf metric.
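A minimal sketch of that conversion (TF 1.x), assuming labels and logits from the question: tf.metrics.accuracy compares class predictions element-wise, not raw logits, which is why feeding logits directly stays at zero. Thresholding the sigmoid output at 0.5 first should give a per-label accuracy metric:
# Hedged sketch: threshold sigmoid(logits) at 0.5 to get 0/1 predictions,
# then feed those (not the raw logits) into tf.metrics.accuracy.
predictions = tf.cast(tf.sigmoid(logits) > 0.5, tf.float32)
eval_metric_ops = {
    'accuracy': tf.metrics.accuracy(labels=labels, predictions=predictions),
}
tf.summary.scalar('accuracy', eval_metric_ops['accuracy'][1])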

Related

Determining input shape for tensorflow keras LSTM?

I'm having a bit of trouble with this. To start off, here is what my data is like:
test_data, test_labels, train_data, train_labels
train_data[0]
[1, 5, 5, 0, 0, 1, 1, 1, 25, 1, 1, 10, 0, 1, 1, 1, 0, 1, 39, 2, 0, 1, 1, 12, 3]
train_labels[0]
0
It's the exact same for test_data and test_labels (it's just a 50/50 split of the input data). Each array in test_data always has 25 elements. The label is either 0 for good or 1 for bad.
Now, I've tried lots of things so far and can't figure out how to reshape these arrays. I'm essentially trying to do this:
model.add(keras.layers.LSTM(256, input_shape=unknown, return_sequences=False, return_state=False, dropout=0.2))
model.add(keras.layers.Dense(256))
model.add(keras.layers.Dropout(0.3))
model.add(keras.layers.Dense(2, activation=tf.nn.softmax))
history = self.model.fit(self.train_data,
                         self.train_labels,
                         epochs=50,
                         batch_size=64,
                         verbose=1,
                         validation_split=0.2)
Another question, is 2 correct for the last dense layer, or should it be 1 in this case?
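A hedged sketch of one common approach (not from the original thread), reusing model and keras from the question: treat each of the 25 elements as one timestep with a single feature, so each sample becomes shape (25, 1) and input_shape=(25, 1):
# Hedged sketch: reshape (samples, 25) -> (samples, timesteps=25, features=1).
import numpy as np

train_data = np.asarray(train_data, dtype=np.float32).reshape(-1, 25, 1)
test_data = np.asarray(test_data, dtype=np.float32).reshape(-1, 25, 1)

model.add(keras.layers.LSTM(256, input_shape=(25, 1), dropout=0.2))
As for the final layer: Dense(2, activation=tf.nn.softmax) works with the integer 0/1 labels if the loss is sparse_categorical_crossentropy; the equivalent alternative is Dense(1, activation='sigmoid') with binary_crossentropy.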

Tensorflow confusion matrix using one-hot code

I have a multi-class classification task using an RNN, and here is my main RNN code:
def RNN(x, weights, biases):
    x = tf.unstack(x, input_size, 1)
    lstm_cell = rnn.BasicLSTMCell(num_unit, forget_bias=1.0, state_is_tuple=True)
    stacked_lstm = rnn.MultiRNNCell([lstm_cell] * lstm_size, state_is_tuple=True)
    outputs, states = tf.nn.static_rnn(stacked_lstm, x, dtype=tf.float32)
    return tf.matmul(outputs[-1], weights) + biases
logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(cost)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
I have to classify all inputs into 6 classes, and each class is encoded as a one-hot label as follows:
happy = [1, 0, 0, 0, 0, 0]
angry = [0, 1, 0, 0, 0, 0]
neutral = [0, 0, 1, 0, 0, 0]
excited = [0, 0, 0, 1, 0, 0]
embarrassed = [0, 0, 0, 0, 1, 0]
sad = [0, 0, 0, 0, 0, 1]
The problem is that I cannot print a confusion matrix using the tf.confusion_matrix() function.
Is there any way to print a confusion matrix using those labels?
If not, how can I convert the one-hot code to integer indices only when I need to print the confusion matrix?
You cannot generate a confusion matrix by passing one-hot vectors as the labels and predictions parameters. You will have to supply a 1-D tensor containing the integer labels directly.
To convert your one-hot vectors to plain labels, use the argmax function:
label = tf.argmax(one_hot_tensor, axis = 1)
After that you can print your confusion_matrix like this:
import tensorflow as tf
num_classes = 2
prediction_arr = tf.constant([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])
labels_arr = tf.constant([0, 1, 1, 1, 1, 1, 1, 1, 0, 0])
confusion_matrix = tf.confusion_matrix(labels_arr, prediction_arr, num_classes)
with tf.Session() as sess:
    print(confusion_matrix.eval())
Output:
[[0 3]
 [4 3]]
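Applied to the asker's six emotion classes, a minimal sketch using the graph variables from the question (Y and prediction):
# Hedged sketch: convert one-hot labels and softmax outputs to integer
# indices, then build the 6-class confusion matrix.
true_idx = tf.argmax(Y, axis=1)           # Y: one-hot labels from the question
pred_idx = tf.argmax(prediction, axis=1)  # prediction: softmax output
conf_mat = tf.confusion_matrix(true_idx, pred_idx, num_classes=6)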

Using batch norm when restoring the model?

I have a little problem using batch norm when restoring a model in TensorFlow.
Below is my batch norm function, which comes from here:
def _batch_normalization(self, input_tensor, is_training, batch_norm_epsilon, decay=0.999):
    """Batch normalization for dense nets.

    Args:
        input_tensor: `tensor`, the input tensor to be normalized.
        is_training: `bool`, if true, update the mean/variance using a moving average;
            otherwise use the stored mean/variance.
        batch_norm_epsilon: `float`, epsilon param for batch normalization.
        decay: `float`, decay for the moving-average update, default 0.999.

    Returns:
        The normalized tensor.
    """
    # batch normalization operates along the channels dimension.
    input_shape_channels = int(input_tensor.get_shape()[-1])

    # scale and beta are used in the formula: scale * (x - E(x)) / sqrt(var(x)) + beta
    scale = tf.Variable(tf.ones([input_shape_channels]))
    beta = tf.Variable(tf.zeros([input_shape_channels]))

    # global mean and var are the moving-averaged mean and variance.
    global_mean = tf.Variable(tf.zeros([input_shape_channels]), trainable=False)
    global_var = tf.Variable(tf.ones([input_shape_channels]), trainable=False)

    # if training, update the mean and var; otherwise use the trained mean/var directly.
    if is_training:
        # compute batch statistics over every axis except the channel axis.
        axis = list(range(len(input_tensor.get_shape()) - 1))
        batch_mean, batch_var = tf.nn.moments(input_tensor, axes=axis)
        # update the moving averages.
        train_mean = tf.assign(global_mean, global_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(global_var, global_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(input_tensor,
                                             batch_mean, batch_var, beta, scale, batch_norm_epsilon)
    else:
        return tf.nn.batch_normalization(input_tensor,
                                         global_mean, global_var, beta, scale, batch_norm_epsilon)
I train the model and save it using tf.train.Saver(). Below is the test code:
def inference(self, images_for_predict):
    """Load the pre-trained model and do inference.

    Args:
        images_for_predict: `tensor`, images to predict with the pre-trained model.

    Returns:
        The predicted labels.
    """
    tf.reset_default_graph()
    images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
    predictions = []
    correct = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # saver = tf.train.import_meta_graph('./models/dense_nets_model/dense_nets.ckpt.meta')
        # saver.restore(sess, tf.train.latest_checkpoint('./models/dense_nets_model/'))
        saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt')
        for i in range(100):
            pred, corr = sess.run([tf.argmax(prediction, 1), accuracy],
                                  feed_dict={
                                      images: [images_for_predict.images[i]],
                                      labels: [images_for_predict.labels[i]]})
            correct += corr
            predictions.append(pred[0])
    print("PREDICTIONS:", predictions)
    print("ACCURACY:", correct / 100)
But the prediction results are always very bad, like this:
('PREDICTIONS:', [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
('ACCURACY:', 0.080000000000000002)
Some notes: images_for_predict = mnist.test, and the self._build_graph method has two params: batch_size and is_training.
Can anyone help me?
After trying a lot of methods, I solved this problem. Below is what I did.
First, thanks to @gdelab, I used tf.layers.batch_normalization instead, so my batch norm function now looks like this:
def _batch_normalization(self, input_tensor, is_training):
    return tf.layers.batch_normalization(input_tensor, training=is_training)
The is_training param is a placeholder: is_training = tf.placeholder(tf.bool).
When building your graph, remember to add this code around your optimizer:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = tf.train.AdamOptimizer(self.learning_rate).minimize(cross_entropy)
This is because the update ops that tf.layers.batch_normalization adds for the moving mean and variance don't automatically become dependencies of the train operation, so if you don't do anything extra, they never get run.
So start training the net; after the training finishes, save the model with code like this:
saver = tf.train.Saver(var_list=tf.global_variables())
savepath = saver.save(sess, 'here_is_your_personal_model_path')
Note that the var_list=tf.global_variables() param makes sure TensorFlow saves all the params, including the global mean/var, which are set as not trainable.
When restoring and testing the model, do it like this:
# build the graph in the same way as for training:
images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
saver = tf.train.Saver()
saver.restore(sess, 'here_is_your_personal_model_path')
And now you can test your model. Hope this helps, thanks!
Looking at your implementation of batch norm: when you load your model, you need to keep the graph built with images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False) and load the weight values from the checkpoint, but NOT the meta graph. I think that saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt') also restores the meta graph now (sorry if I'm wrong), so you need to restore only the "data" part of it.
Otherwise, you're just using the graph from training, in which the mean and variance used in batch norm are the ones obtained from the batch. But when you're testing, the batch has size 1, so normalizing by the mean and variance of the batch always brings your data to 0, hence the constant output.
In any case, I'd suggest using tf.layers.batch_normalization instead, with an is_training placeholder that you'll need to feed to your network...
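To make that concrete, a minimal sketch of the suggested pattern (variable names like batch_images and test_images are illustrative, not from the thread):
# Hedged sketch: one is_training placeholder switches batch norm between
# batch statistics (training) and the stored moving averages (testing).
is_training = tf.placeholder(tf.bool)
# ... build the network using tf.layers.batch_normalization(x, training=is_training) ...
sess.run(train_step, feed_dict={images: batch_images, labels: batch_labels, is_training: True})
acc = sess.run(accuracy, feed_dict={images: test_images, labels: test_labels, is_training: False})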

Out of Memory error at model.compile

I have a relatively large multi-layer regression model that I want to train end-to-end. My training is a two-step procedure in which I first minimize the Euclidean loss and then minimize my custom loss. Effectively, this means the following pseudo-code:
model.compile(optimizer='Adam', loss='mse')
model.fit()
model.compile(optimizer='Adam', loss=my_metric)
model.fit()
I am able to run the first two statements without any problems, but I get an out-of-memory error when my code reaches the second model.compile statement. What should I do differently to avoid this problem?
Edited to include my_metric. Think of y_true and y_pred as 3-dim vectors. First I minimize the Euclidean distance between them to initialize the weights, and then I minimize a geodesic loss between them.
# compute geodesic viewpoint loss
def my_metric(y_true, y_pred):
    # compute angles
    angle_true = K.sqrt(K.sum(K.square(y_true), axis=1))
    angle_pred = K.sqrt(K.sum(K.square(y_pred), axis=1))
    # compute axes
    axis_true = K.l2_normalize(y_true, axis=1)
    axis_pred = K.l2_normalize(y_pred, axis=1)
    # convert axes to corresponding skew-symmetric matrices
    proj = tf.constant(np.asarray([[0, -1, 0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0, -1, 0, 0],
                                   [0, 0, 0, 0, 0, -1, 0, 1, 0]]), dtype=tf.float32)
    skew_true = K.dot(axis_true, proj)
    skew_pred = K.dot(axis_pred, proj)
    skew_true = K.map_fn(lambda x: K.reshape(x, [3, 3]), skew_true)
    skew_pred = K.map_fn(lambda x: K.reshape(x, [3, 3]), skew_pred)
    # compute rotation matrices and do a dot product
    R = tf.map_fn(my_R, (skew_true, skew_pred, angle_true, angle_pred), dtype=tf.float32)
    # compute the angle error
    theta = K.map_fn(get_theta, R)
    return K.mean(theta)

# function to compute R1^T R2 given the axis-angle representations (theta_1, v_1) and (theta_2, v_2)
# x is a tuple with x[0] = v_1, x[1] = v_2, x[2] = theta_1, x[3] = theta_2
# note that v_1 and v_2 are the skew-symmetric matrices corresponding to the 3-dim vectors in this function
def my_R(x):
    R1 = K.eye(3) + K.sin(x[2]) * x[0] + (1.0 - K.cos(x[2])) * K.dot(x[0], x[0])
    R2 = K.eye(3) + K.sin(x[3]) * x[1] + (1.0 - K.cos(x[3])) * K.dot(x[1], x[1])
    return K.dot(K.transpose(R1), R2)

# angle error via Rodrigues' formula: theta = arccos((tr(R) - 1) / 2)
def get_theta(x):
    return K.abs(tf.acos(K.clip(0.5 * (tf.reduce_sum(tf.diag_part(x)) - 1.0), -1.0 + 1e-7, 1.0 - 1e-7)))
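One possible workaround (a hedged sketch, not a confirmed fix from this thread): rebuild the model between the two phases instead of recompiling it in place, so the first graph can be freed. Here build_model is a hypothetical function that reconstructs the same architecture, and x_train/y_train stand for your training data:
# Hedged workaround sketch, not from the original thread.
from keras import backend as K

model.save_weights('phase1_weights.h5')  # keep the mse-trained weights
K.clear_session()                        # free the graph from the first phase
model = build_model()                    # hypothetical: rebuilds the same architecture
model.load_weights('phase1_weights.h5')
model.compile(optimizer='Adam', loss=my_metric)
model.fit(x_train, y_train)              # x_train / y_train: your training data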

TensorFlow: questions regarding tf.argmax() and tf.equal()

I am learning TensorFlow, building a multilayer_perceptron model. I am looking into some examples, like the one at: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/multilayer_perceptron.ipynb
I then have some questions about the code below:
def multilayer_perceptron(x, weights, biases):
    :
    :
pred = multilayer_perceptron(x, weights, biases)
:
:
with tf.Session() as sess:
    sess.run(init)
    :
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: X_test, y: y_test_onehot}))
I am wondering what tf.argmax(pred, 1) and tf.argmax(y, 1) mean and return (type and value) exactly. And is correct_prediction a variable rather than real values?
Finally, how do we get the y_test_prediction array (the prediction result when the input data is X_test) from the tf session? Thanks a lot!
tf.argmax(input, axis=None, name=None, dimension=None)
Returns the index with the largest value across the given axis of a tensor.
input is a Tensor, and axis describes which axis of the input Tensor to reduce across. For vectors, use axis = 0.
For your specific case, let's use two arrays to demonstrate this:
pred = np.array([[31, 23,  4, 24, 27, 34],
                 [18,  3, 25,  0,  6, 35],
                 [28, 14, 33, 22, 20,  8],
                 [13, 30, 21, 19,  7,  9],
                 [16,  1, 26, 32,  2, 29],
                 [17, 12,  5, 11, 10, 15]])
y = np.array([[31, 23,  4, 24, 27, 34],
              [18,  3, 25,  0,  6, 35],
              [28, 14, 33, 22, 20,  8],
              [13, 30, 21, 19,  7,  9],
              [16,  1, 26, 32,  2, 29],
              [17, 12,  5, 11, 10, 15]])
tf.argmax(pred, 1) is a tensor whose evaluation gives array([5, 5, 2, 1, 3, 0]).
tf.argmax(y, 1) is a tensor whose evaluation gives array([5, 5, 2, 1, 3, 0]).
tf.equal(x, y, name=None) takes two tensors (x and y) as inputs and returns the truth value of (x == y) element-wise.
Following our example, tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) returns a tensor whose evaluation gives array([True, True, True, True, True, True]).
correct_prediction is a tensor whose evaluation gives a 1-D array of booleans (which the cast then turns into 0's and 1's).
y_test_prediction can be obtained by evaluating tf.argmax(pred, 1) with the test data fed in.
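For example, a minimal sketch using the names from the question, inside the same session:
# Hedged sketch: evaluate the predicted class indices for X_test.
y_test_prediction = tf.argmax(pred, 1).eval({x: X_test})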
The documentation for tf.argmax and tf.equal can be accessed by following the links below.
tf.argmax() https://www.tensorflow.org/api_docs/python/math_ops/sequence_comparison_and_indexing#argmax
tf.equal() https://www.tensorflow.org/versions/master/api_docs/python/control_flow_ops/comparison_operators#equal
Reading the documentation:
tf.argmax
Returns the index with the largest value across axes of a tensor.
tf.equal
Returns the truth value of (x == y) element-wise.
tf.cast
Casts a tensor to a new type.
tf.reduce_mean
Computes the mean of elements across dimensions of a tensor.
Now you can easily explain what this does. Your y is one-hot encoded, so it has a single 1 and all other entries are zero. Your pred represents the predicted class probabilities. So argmax finds the position of the best prediction and the position of the correct value. After that, you check whether they are the same.
So now your correct_prediction is a vector of True/False values with size equal to the number of instances you want to predict. You convert it to floats and take the average.
Actually, this part is nicely explained in the TF tutorial, in the "Evaluate the Model" part.
tf.argmax(input, axis=None, name=None, dimension=None)
Returns the index with the largest value across the given axis of a tensor.
For this specific case, it receives pred as its input and 1 as the axis. The axis describes which axis of the input Tensor to reduce across. For vectors, use axis = 0.
Example: given the list [2.11, 1.0021, 3.99, 4.32], argmax returns 3, which is the index of the highest value (4.32).
correct_prediction is a tensor that will be evaluated later. It is not a regular Python variable; it contains the information needed to compute its value later.
For this specific case, it will be part of another tensor, accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")), and will be evaluated by calling accuracy.eval({x: X_test, y: y_test_onehot}).
For y_test_prediction itself, evaluate the tf.argmax(pred, 1) tensor; correct_prediction only records whether each prediction matched the label.
For those who do not have much time, here is a quick way to understand tf.argmax:
x = np.array([[1, 9, 3],
              [4, 5, 6]])
tf.argmax(x, axis=0)
Output:
[array([1, 0, 1], dtype=int64)]
tf.argmax(x, axis=1)
Output:
[array([1, 2], dtype=int64)]