Logging accuracy when using tf.estimator.Estimator - tensorflow

I'm following this tutorial - https://www.tensorflow.org/tutorials/estimators/cnn - to build a CNN using TensorFlow Estimators with the MNIST data.
I would like to visualize training accuracy and loss at each step, but I'm not sure how to do that with tf.train.LoggingTensorHook.

Related

Tensorflow model saving and reusing issue

I trained an ML model in TensorFlow, and at the time the test accuracy was high, so I saved the model. Some days later I loaded that pretrained model, and on the same test dataset the accuracy is now much lower.

How to calculate TF object detection API accuracy over custom dataset?

I am using the TF Object Detection API to detect objects in a custom dataset, but I have no idea how to calculate its accuracy.
How do I calculate the accuracy of the object detection model on a custom dataset, and how do I find the model's confidence scores on the test dataset?
I tried to use eval.py, but it was not helpful.
Are you talking about training accuracy, validation accuracy or test accuracy? As the names suggest there are 3 different values for accuracy:
Training accuracy: accuracy of the model on the training set
Validation accuracy: accuracy of the model on the validation set
Test accuracy: accuracy of the model on the test set
Training and validation accuracy are outputs of training; for test accuracy you need to run the model on the test set.
Did you retrain the model (from a checkpoint, fine-tuning, ...) or did you use the model as you got it? If you retrained the model, you should already have the training and validation accuracy, since those values are reported for each epoch.
If you haven't retrained the model, you can only check the test accuracy, provided that the test dataset is labelled.
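To make the distinction concrete, here is a minimal sketch in plain Python. The `accuracy` helper and the predictions and labels for each split are made-up illustrations, not output from any real model or from the Object Detection API; the point is only that the same metric is computed three times, once per split.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical (predictions, labels) pairs for each split; values are made up.
splits = {
    "training": ([1, 0, 1, 1], [1, 0, 1, 1]),
    "validation": ([1, 0, 0, 1], [1, 0, 1, 1]),
    "test": ([0, 0, 1, 1], [1, 0, 1, 0]),
}

# Same metric, three different datasets.
results = {name: accuracy(p, y) for name, (p, y) in splits.items()}
for name, value in results.items():
    print(f"{name} accuracy: {value:.2f}")
```

Note that every split needs labels: without ground-truth labels for the test set there is nothing to compare the predictions against.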
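This link helped me to run eval.py and get the mAP value for the training data.
You just need to run it like this (setting CUDA_VISIBLE_DEVICES to an empty string hides the GPUs, so the evaluation runs on the CPU):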
CUDA_VISIBLE_DEVICES="" python3 eval.py --logtostderr --pipeline_config_path=pre-trained-model/ssd_inception_v2_coco.config --checkpoint_dir=training/ --eval_dir=eval/

Training Inception V2 from scratch - diverging

As a learning exercise, I'm training the Inception (v2) model from scratch using the ImageNet dataset from the Kaggle competition. I've heard people say it took them a week or so of training on a GPU to converge this model on this same dataset. I'm currently training it on my MacBook Pro (single CPU), so I'm expecting it to take at least a month or so to converge.
Here's my implementation of the Inception model. Input is 224x224x3 images, with values in range [0, 1].
The learning rate is fixed at 0.01, and I'm using the stochastic gradient descent optimizer.
My question
After 48 hours of training, the training loss seems to indicate that it's learning from the training data, but the validation loss is beginning to get worse. Ordinarily, this would feel like the model is overfitting. Does it look like something might be wrong with my model or dataset, or is this perfectly expected, since I've only trained 5.8 epochs?
My training and validation loss and accuracy after 1.5 epochs.
Training and validation loss and accuracy after 5.8 epochs.
Some input images as seen by the model, as well as the output of one of the early convolution layers.

Drowsy dataset for training the neural network

I am trying to build a tensorflow classifier for which I need a drowsy dataset. Is there any available drowsy dataset which can be used for my training set?

How to run the example code of TensorFlow in distributed mode?

I'm new to TensorFlow and am trying to run it in distributed mode. I found the official document at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/distributed/index.md , but its example is missing the loss function.
Can anyone help complete it so that I can run the code?
It lacks not only the loss function but also the model to train, and therefore the loss to minimize.
This file is just a template that you have to complete in order to train your model in distributed mode.
So, when you find this comment in the template file:
# Build model...
it means that you have to define a model to train (e.g. a convolutional neural network, a simple perceptron...).
Something like the MNIST model that you can find in the tutorial: https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html
Your model ends with a loss function to minimize.
Following the MNIST example, the loss is:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
loss = cross_entropy
Once you have defined the model to train and the loss to minimize, you have filled in the missing parts of the template and can start training your model in distributed mode.
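As a sanity check on what the cross_entropy expression above computes, here is a small sketch in plain Python rather than TensorFlow. The function name and the example values are my own; it assumes y_ holds one-hot labels and y holds predicted probabilities, as in the MNIST tutorial. It sums -y_ * log(y) over the classes of each example and then averages over the examples.

```python
import math


def cross_entropy(y_true, y_pred):
    """Mean over examples of -sum_k y_true[k] * log(y_pred[k])."""
    per_example = [
        # Terms with a zero label contribute nothing, so they are skipped;
        # this also avoids evaluating log(0) for classes that are not the target.
        -sum(t * math.log(p) for t, p in zip(t_row, p_row) if t > 0)
        for t_row, p_row in zip(y_true, y_pred)
    ]
    return sum(per_example) / len(per_example)


# Made-up batch of two examples with three classes each.
y_ = [[1, 0, 0], [0, 1, 0]]                      # one-hot labels
y = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]       # predicted probabilities

loss = cross_entropy(y_, y)
print(loss)
```

In this example both examples assign probability 0.5 to the correct class, so each contributes -log(0.5) and the mean equals log(2).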