How to compute the accuracy of a CNN in TensorFlow

I am new to TensorFlow. I am doing binary classification with my own dataset, but I do not know how to compute the accuracy. Can anyone please help me with this?
My classifier has 5 convolutional layers followed by 2 fully connected layers. The final FC layer has an output dimension of 2, for which I have used:
prob = tf.nn.softmax(classification_features, name="output")

Just calculate the percentage of correct predictions:
prediction = tf.math.argmax(prob, axis=1)
equality = tf.math.equal(prediction, correct_answer)
accuracy = tf.math.reduce_mean(tf.cast(equality, tf.float32))
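For context, here is a minimal end-to-end sketch of how these three ops fit together in TF 1.x; the placeholder names are hypothetical stand-ins for your network's logits and labels:
import tensorflow as tf  # TF 1.x

# Hypothetical placeholders; in practice classification_features is the output of your final FC layer.
classification_features = tf.placeholder(tf.float32, [None, 2], name="logits")
correct_answer = tf.placeholder(tf.int64, [None], name="labels")

prob = tf.nn.softmax(classification_features, name="output")
prediction = tf.math.argmax(prob, axis=1)                      # predicted class per example
equality = tf.math.equal(prediction, correct_answer)           # True where prediction matches label
accuracy = tf.math.reduce_mean(tf.cast(equality, tf.float32))  # fraction of correct predictions

with tf.Session() as sess:
    acc = sess.run(accuracy, feed_dict={
        classification_features: [[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]],
        correct_answer: [0, 1, 1],
    })
    print(acc)  # 2 of 3 predictions are correct -> ~0.6667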

UPDATE 2020-11-23: Keras in TensorFlow
Now you can just request it via the metrics parameter of model.compile.
This post is from 3.6 years ago, when TensorFlow was still at version 1. Now that tensorflow.org recommends the Keras API, you can request accuracy like so:
model.compile(loss='mse',optimizer='sgd',metrics=['accuracy'])
model.fit(x,y)
BOOM! You've got accuracy in your report when you run "model.fit".
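For a binary classifier like the one in the question (a 2-unit softmax head), a minimal sketch could look like this; the layer sizes and dummy data are illustrative, not taken from the question:
import numpy as np
import tensorflow as tf

# Tiny illustrative CNN ending in a 2-unit softmax, as in the question.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation='softmax'),
])

# Integer labels (0 or 1) pair with sparse_categorical_crossentropy.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

x = np.random.rand(8, 64, 64, 3).astype('float32')  # dummy images
y = np.random.randint(0, 2, size=(8,))               # dummy labels
model.fit(x, y, epochs=1)  # the accuracy metric is reported alongside the loss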
If you are using an older version of TensorFlow or writing it from scratch, @Androbin explains it well.

Related

Fine tuning embedding weights within my Tensorflow hub model for an unsupervised learning problem

Tensorflow Version: 1.15
I'm currently using the Universal Sentence Encoder embeddings for pairwise similarity. I'd like to fine-tune the Universal Sentence Encoder to improve embedding quality, and I've gotten to this point:
module = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2", trainable=True)
variables_names = [v.name for v in tf.trainable_variables()]
with tf.Session() as sess111:
    init = tf.global_variables_initializer()
    sess111.run(init)
    values = sess111.run(variables_names)
    # for k, v in zip(variables_names, values):
    #     print("Variable: ", k)
    print(values[0])
    module_embeds = module(sentences)
    values = sess111.run(variables_names)
    print(values[0])
My first thought was to pass sentences through the USE module, thinking it would update the trainable variables within a tf session, which wasn't the case. So at this point I have access to each of the trainable variables, but I'm not sure how to proceed. Reviewing this tensorflow hub issue, they mention the following strategy:
Define a loss, and add an optimizer for that loss, then running the optimizer will update the trained weights of the embed module.
I'm not entirely sure what the best way to do this would be for my use case. I've seen this notebook, which retrains a classifier, but I can't grasp how we end up extracting tuned weights that can be used to generate new embeddings.
Any help or guidance would be much appreciated.
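The strategy quoted above can be sketched roughly as follows; the pairwise-similarity loss, placeholder names, and learning rate are assumptions for illustration, not something from the original post:
import tensorflow as tf          # TF 1.15, graph mode
import tensorflow_hub as hub

module = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2",
                    trainable=True)

# Hypothetical inputs: sentence pairs plus a target similarity in [0, 1].
left = tf.placeholder(tf.string, shape=[None])
right = tf.placeholder(tf.string, shape=[None])
target_sim = tf.placeholder(tf.float32, shape=[None])

emb_left = tf.nn.l2_normalize(module(left), axis=1)
emb_right = tf.nn.l2_normalize(module(right), axis=1)
cosine_sim = tf.reduce_sum(emb_left * emb_right, axis=1)

loss = tf.losses.mean_squared_error(target_sim, cosine_sim)
train_op = tf.train.AdamOptimizer(1e-5).minimize(loss)  # small learning rate for fine-tuning

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # Each run of train_op updates the module's trainable weights.
    _, l = sess.run([train_op, loss],
                    feed_dict={left: ["a dog runs"],
                               right: ["a puppy is running"],
                               target_sim: [0.9]})
After training, calling module(...) in the same session produces embeddings from the updated weights.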

Predict batches using Tensorflow Data API and Keras Model

Suppose I have a dataset and a Keras model. The dataset has been divided into batches using batch() in the tf.data Dataset API. Now I am looking for an efficient and clean way to do batch predictions for all testing samples.
I have tried the following code and it works.
batch_size = 32
dataset = dataset.batch(batch_size)
predictions = keras_model.predict(dataset, steps=math.ceil(num_testing_samples / batch_size))
I wonder whether there is a more efficient and elegant approach to implement this?
TF >= 1.14.0
You can just set steps=None. From the official documentation of tf.keras.Model.predict():
If x is a tf.data dataset and steps is None, predict will run until the input dataset is exhausted.
Just make sure that your dataset object is not in repeat mode and you are good to go :).
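A minimal sketch of that, reusing the names from the question and assuming the dataset is finite (no .repeat()):
dataset = dataset.batch(32)                 # the batch size no longer needs to match a steps value
predictions = keras_model.predict(dataset)  # steps defaults to None: runs until the dataset is exhausted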
TF 1.12.0 & 1.13.0
The support for tf.data.Dataset with tf.keras is very poor in these versions. The tf.data.Dataset object is transformed into an iterator here, which then triggers an error here if you didn't set the steps argument. This is patched in 1.14.0.

Using GoogLeNet and AlexNet models is not giving accuracy on the Cats vs. Dogs dataset

I am starting to learn convolutional neural networks and have built the well-known MNIST and Fashion-MNIST models, obtaining good accuracy.
Then I moved on to another simple dataset, the Cats vs. Dogs dataset from Kaggle, but after applying all the concepts I learned from the Stanford and Andrew Ng lectures, I was only able to get 80% accuracy. So I decided to try GoogLeNet and AlexNet, but these models were not able to give me accuracy above 50% over 6 epochs.
I wanted to know whether GoogLeNet and AlexNet, being designed for a 1000-category output, won't work with a 2-category output.
While making my own model I obtained an accuracy of 80%. I expected the famous GoogLeNet model to give me higher accuracy, but that's not the case.
Below is the GoogleNet model that I am using:
data = []
labels = []
for i in range(0, 12499):
    img = cv2.imread("train/cat." + str(i) + ".jpg")
    res = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    data.append(res)
    labels.append(0)
    img2 = cv2.imread("train/dog." + str(i) + ".jpg")
    res2 = cv2.resize(img2, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    data.append(res2)
    labels.append(1)

train_data, test_data, train_labels, test_labels = train_test_split(data,
                                                                    labels,
                                                                    test_size=0.2,
                                                                    random_state=42)

model = tf.keras.Sequential()
model.add(layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(224, 224, 3)))
model.add(layers.Conv2D(64, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(128, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(128, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(256, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(256, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(layers.Conv2D(512, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=train_data, y=train_labels, batch_size=32, epochs=10,
          validation_data=(test_data, test_labels))
The expected accuracy of the above model should be more than 50%, but it is stuck between 50% and 51% after 6 epochs.
P.S. I changed the last Dense layer to 2 units instead of 1000, and I am using the Keras API for TensorFlow.
Any help would be appreciated.
I struggled a bit with this earlier as well. I haven't tried it on GoogLeNet yet, but I tried it on AlexNet. On AlexNet I managed to get relatively OK results (83%) for cats vs. dogs after following the paper closely. A few things you may want to do (a rough Keras sketch pulling these changes together follows after this list):
If you refer to the CS231n notes from Fei-Fei Li
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
on slide 10, you will notice that the input layer should be 227 by 227 instead. They also provide the mathematical justification for why it is so.
I then started to follow the other items closely from the original paper:
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
These included:
As in section 3.3 of the paper, adding a normalization layer after each of the first two max pooling layers. Keras has stopped supporting LRN, but I added batch normalization instead and it works. (I ran an experiment with and without batch normalization; the accuracy difference was 82% versus 62%.)
As in section 4.2 of the paper, I added two dropout layers (0.5) after the two fully connected layers.
As in section 5 of the paper, I changed my batch size to 128, with SGD momentum of 0.9 and weight decay of 0.0005.
As pointed out in one of the comments on your original question, my final layer was also a single unit with a sigmoid activation.
Training for 20 epochs gave me 83% accuracy. The original paper also included data augmentation, but I did not include it in my implementation.
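Pulling those changes together, here is a rough AlexNet-style Keras sketch. The convolution sizes follow the original paper, the batch normalization and dropout placement follow the points above, and the learning rate plus the use of an L2 kernel regularizer as a stand-in for weight decay are my own assumptions:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(5e-4)  # stands in for the paper's weight decay of 0.0005

model = keras.Sequential([
    layers.Conv2D(96, 11, strides=4, activation='relu',
                  kernel_regularizer=l2, input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.BatchNormalization(),              # replaces the paper's LRN
    layers.Conv2D(256, 5, padding='same', activation='relu', kernel_regularizer=l2),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.BatchNormalization(),              # replaces the paper's LRN
    layers.Conv2D(384, 3, padding='same', activation='relu', kernel_regularizer=l2),
    layers.Conv2D(384, 3, padding='same', activation='relu', kernel_regularizer=l2),
    layers.Conv2D(256, 3, padding='same', activation='relu', kernel_regularizer=l2),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation='relu', kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(4096, activation='relu', kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),    # single sigmoid output for cats vs. dogs
])

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss='binary_crossentropy',
              metrics=['accuracy'])
# model.fit(train_data, train_labels, batch_size=128, epochs=20,
#           validation_data=(test_data, test_labels))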
Keras also has an image classification example based on a modified Xception architecture, which I believe is one of the derivatives of the Inception architecture.
https://keras.io/examples/vision/image_classification_from_scratch/
I have tried it, and after running for 15 epochs the accuracy is about 90%.
Hope this helps.

BatchNormalization in Keras

How do I update the moving mean and moving variance in Keras BatchNormalization?
I found this in the TensorFlow documentation, but I don't know where to put train_op or how to use it with Keras models:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
None of the posts I found say what to do with train_op or whether it can be used with model.compile.
You do not need to manually update the moving mean and variance if you are using the BatchNormalization layer. Keras takes care of updating these parameters during training and of keeping them fixed during testing (through the model.predict and model.evaluate functions, just as with model.fit_generator and friends).
Keras also keeps track of the learning phase, so different code paths run during training and validation/testing.
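In other words, with the high-level API nothing extra is required; here is a minimal sketch (the layer sizes and dummy data are illustrative) showing that fit updates the moving statistics without any manual train_op:
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    tf.keras.layers.BatchNormalization(),   # moving mean/variance are updated automatically
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

x = np.random.rand(64, 16).astype('float32')
y = np.random.randint(0, 2, size=(64, 1))

bn = model.layers[1]
before = bn.get_weights()[2].copy()      # moving mean before training
model.fit(x, y, epochs=1, verbose=0)     # training updates the moving statistics
after = bn.get_weights()[2]
print(np.allclose(before, after))        # False: the moving mean has been updated
model.predict(x)                         # inference uses the stored statistics unchanged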
If you just need to update the weights of an existing model with some new values, you can do the following:
w = model.get_layer('batchnorm_layer_name').get_weights()
# Order: [gamma, beta, moving_mean, moving_variance]
for j in range(len(w[0])):
    gamma = w[0][j]
    beta = w[1][j]
    run_mean = w[2][j]
    run_var = w[3][j]
    w[2][j] = new_run_mean_value1
    w[3][j] = new_run_var_value2
model.get_layer('batchnorm_layer_name').set_weights(w)
There are two interpretations of the question: the first assumes that the goal is to use the high-level training API, and that question was answered by Matias Valdenegro.
The second, as discussed in the comments, is whether it is possible to use batch normalization with a standard TensorFlow optimizer, as discussed here in "Keras as a simplified interface to TensorFlow", in the section "Collecting trainable weights and state updates". As mentioned there, the update ops are accessible via layer.updates and not via tf.GraphKeys.UPDATE_OPS. In fact, if you have a Keras model in TensorFlow, you can optimize it with a standard TensorFlow optimizer and batch normalization like this:
update_ops = model.updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
and then use a TensorFlow session to run the train_op. To distinguish the training and evaluation modes of the batch normalization layer, you need to feed the learning-phase state of the Keras engine (see "Different behaviors during training and testing" on the same tutorial page linked above). This would work, for example, like this:
...
# train
lo, _ = tf_sess.run(fetches=[loss, train_op],
                    feed_dict={tf_batch_data: bd,
                               tf_batch_labels: bl,
                               tensorflow.keras.backend.learning_phase(): 1})
...
# eval
lo = tf_sess.run(fetches=[loss],
                 feed_dict={tf_batch_data: bd,
                            tf_batch_labels: bl,
                            tensorflow.keras.backend.learning_phase(): 0})
I tried this in TensorFlow 1.12 and it works with models containing batch normalization. Given my existing TensorFlow code, and in light of the approaching TensorFlow 2.0, I was tempted to use this approach myself. But since this approach is not mentioned in the TensorFlow documentation, I am not sure it will be supported in the long term, so I finally decided not to use it and to invest a little more effort in changing the code to use the high-level API.

Delayed echo of sin - cannot reproduce Tensorflow result in Keras

I am experimenting with LSTMs in Keras with little to no luck. At some point I decided to scale back to the most basic problems in order to finally achieve some positive result.
However, even with the simplest problems I find that Keras is unable to converge, while the implementation of the same problem in Tensorflow gives a stable result.
I am unwilling to just switch to Tensorflow without understanding why Keras keeps diverging on any problem I attempt.
My problem is a many-to-many sequence prediction of delayed sin echo, example below:
Blue line is a network input sequence, red dotted line is an expected output.
The experiment was inspired by this repo, and a workable Tensorflow solution was also created from it.
The relevant excerpts from my code are below, and the full version of my minimal reproducible example is available here.
Keras model:
model = Sequential()
model.add(LSTM(n_hidden,
               input_shape=(n_steps, n_input),
               return_sequences=True))
model.add(TimeDistributed(Dense(n_input, activation='linear')))
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(lr=learning_rate),
              metrics=[])
Tensorflow model:
x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_steps])
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_steps], seed=SEED))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_steps], seed=SEED))
}
lstm = rnn.LSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(lstm, inputs=x,
                                    dtype=tf.float32,
                                    time_major=False)
h = tf.transpose(outputs, [1, 0, 2])
pred = tf.nn.bias_add(tf.matmul(h[-1], weights['out']), biases['out'])
individual_losses = tf.reduce_sum(tf.squared_difference(pred, y),
                                  reduction_indices=1)
loss = tf.reduce_mean(individual_losses)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) \
    .minimize(loss)
I claim that the other parts of the code (data generation, training) are completely identical. But learning progress with Keras stalls early and yields unsatisfactory predictions. Graphs of log loss for both libraries and example predictions are attached below:
Logloss for Tensorflow-trained model:
Logloss for Keras-trained model:
It's not easy to read from the graph, but Tensorflow reaches target_loss=0.15 and stops early after about 10k batches, while Keras uses up all 13k batches and only reaches a loss of about 1.5. In a separate experiment where Keras was run for 100k batches, it went no further, stalling around 1.0.
Figures below contain: black line - model input signal, green dotted line - ground truth output, red line - acquired model output.
Predictions of Tensorflow-trained model:
Predictions of Keras-trained model:
Thank you for suggestions and insights, dear colleagues!
OK, I have managed to solve this. The Keras implementation now converges steadily to a sensible solution too:
The models were in fact not identical. You may inspect the Tensorflow model version from the question with extra care and verify for yourself that its actual Keras equivalent is listed below, and is not what is stated in the question:
model = Sequential()
model.add(LSTM(n_hidden,
               input_shape=(n_steps, n_input),
               return_sequences=False))
model.add(Dense(n_steps, input_shape=(n_hidden,), activation='linear'))
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(lr=learning_rate),
              metrics=[])
I will elaborate. The workable solution here uses the last output of size n_hidden produced by the LSTM as an intermediate activation, which is then fed to the Dense layer.
So, in a way, the actual prediction here is made by a regular perceptron.
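To make the shape difference concrete, a tiny sketch with illustrative values for n_steps, n_input, and n_hidden:
from tensorflow import keras

n_steps, n_input, n_hidden = 20, 1, 32   # illustrative values

model = keras.Sequential([
    keras.layers.LSTM(n_hidden, input_shape=(n_steps, n_input),
                      return_sequences=False),        # emits only the last hidden state: (batch, n_hidden)
    keras.layers.Dense(n_steps, activation='linear'),  # predicts the whole sequence from that state
])
model.summary()  # LSTM output (None, 32), Dense output (None, 20) -- matching h[-1] in the TF graph above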
One extra takeaway note: the source of the mistake in the original Keras solution is already evident from the inference examples attached to the question. We see there that the earlier timestamps fail utterly, while the later timestamps are near perfect. These earlier timestamps correspond to the states of the LSTM when it was just initialized on a new window and had no context yet.