How to get train input data on training Tensorflow Object Detection API? - tensorflow

When I train a faster_rcnn_resnet101 model, the loss is printed to the terminal at each step.
I want to know which data is fed in at each step, because when the loss increases I can't tell why.
Does anyone know how to see the input data for each step?

You can't inspect the result of each individual step, but checkpoints are written to your object_detection/training directory and updated after a set number of steps.
You can run object detection with the currently trained model by exporting the latest checkpoint.
For example:
python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix training/model.ckpt-25000 \
--output_directory latest_dataset
Here model.ckpt-25000 is the checkpoint prefix, where 25000 is the number of steps trained so far.
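To pick the newest checkpoint prefix without reading the directory by hand, a small helper can parse the step number out of the checkpoint file names. This is only a minimal sketch, assuming the files follow the model.ckpt-<step> naming shown above; the function name latest_checkpoint_prefix and the example file list are made up for illustration.

```python
import re

# Matches TF1-style checkpoint files such as "model.ckpt-25000.index".
_CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))\.(?:index|meta|data-.*)$")


def latest_checkpoint_prefix(filenames):
    """Return (prefix, step) for the highest step found, or None if nothing matches."""
    best = None
    for name in filenames:
        match = _CKPT_RE.match(name)
        if match is None:
            continue  # not a checkpoint file (e.g. pipeline.config)
        prefix, step = match.group(1), int(match.group(2))
        if best is None or step > best[1]:
            best = (prefix, step)
    return best


# Hypothetical directory listing; in practice pass os.listdir("training").
files = [
    "checkpoint",
    "model.ckpt-20000.index",
    "model.ckpt-25000.index",
    "model.ckpt-25000.meta",
    "pipeline.config",
]
print(latest_checkpoint_prefix(files))
```

The returned prefix can then be joined with the training directory and passed as --trained_checkpoint_prefix.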

Related

How do I plot the validation loss in Tensorboard for object detection?

I am training an object detection model using Tensorflow's object detection API, specifically the model_main_tf2.py script. For some reason only the training accuracy is plotted, but not the validation. Can anyone help me in this regard? I would really appreciate it.
Here is the full command I'm using to start the training:
python3 model_main_tf2.py --model_dir /trained_model/ \
--sample_1_of_n_eval_examples 10 \
--pipeline_config_path pipeline.config \
--alsologtostderr
P.S. There seem to be some answers on Stackoverflow for model_main.py, but not for the tf2 version

tensorflow Evaluation SSD_Mobilenetv2 320x320 fpnlite

I am trying to evaluate a trained SSD_Mobilenetv2 320x320 fpnlite model in TensorFlow. I ran training and evaluation in parallel in two different Colab accounts, but after each checkpoint every evaluation metric comes back as -1. The loss also increases with each checkpoint that is evaluated.
Below is the command used to run the evaluation:
!python /model_main_tf2.py \
--pipeline_config_path /pipeline.config \
--model_dir /model_ \
--checkpoint_dir /model_

MobileNet: High Accuracy On Validation and Poor Prediction Results

I am training MobileNet_v1_1.0_224 using TensorFlow. I am using the python scripts present in the TensorFlow-Slim image classification model library for training. My dataset distribution with 4 classes is as follows:
normal_faces: 42070
oncall_faces: 13563 (People faces with mobile in the image when they're on call)
smoking_faces: 5949
yawning_faces: 1630
All images in the dataset are square images and larger than 224x224
I am using train_image_classifier.py to train the model with following arguments,
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_name=custom \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=mobilenet_v1 \
--batch_size=32 \
--max_number_of_steps=25000
After training the model, eval_image_classifier.py shows an accuracy greater than 95% on Validation set but when I exported the frozen graph and used it for predictions, it performs very poorly.
I have also tried this notebook but this also produced similar results.
Log: Training Log
Plots: Loss and Accuracy
What is the reason for this? How do I fix this issue?
I have seen similar issues on SO but nothing related to MobileNets specifically.
Did you use a validation set? If so what was the validation accuracy?
If you used a validation set a good way to check if you are doing predictions properly is to run model.predict on the validation set.
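When checking predictions on the validation set, it can help to look at accuracy per class rather than only the overall figure, since the classes in the question are far from balanced. Below is a minimal sketch in plain Python; the helper name per_class_accuracy and the sample labels are hypothetical, and the predicted labels would come from your own model's outputs.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's examples predicted correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Made-up labels for illustration only; these are not results from the question.
y_true = ["normal", "normal", "normal", "normal", "yawning", "yawning"]
y_pred = ["normal", "normal", "normal", "normal", "normal", "yawning"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)
print(per_class_accuracy(y_true, y_pred))
```

If the overall number is high but a small class scores poorly, the exported graph may be behaving as it did in evaluation and the problem is the class balance; if every class scores poorly, a mismatch in pre-processing between evaluation and prediction is a more likely suspect.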

TensorFlow lite: High loss in accuracy after converting model to tflite

I have been trying TFLite to increase detection speed on Android but strangely my .tflite model now almost only detects 1 category.
I have done testing on the .pb model that I got after retraining a mobilenet and the results are good but for some reason, when I convert it to .tflite the detection is way off...
For the retraining I used the retrain.py file from Tensorflow for poets 2
I am using the following commands to retrain, optimize for inference and convert the model to tflite:
python retrain.py \
--image_dir ~/tf_files/tw/ \
--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/1 \
--output_graph ~/new_training_dir/retrainedGraph.pb \
--saved_model_dir ~/new_training_dir/model/ \
--how_many_training_steps 500
sudo toco \
--input_file=retrainedGraph.pb \
--output_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TENSORFLOW_GRAPHDEF \
--input_shape=1,224,224,3 \
--input_array=Placeholder \
--output_array=final_result
sudo toco \
--input_file=optimized_retrainedGraph.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--output_file=retrainedGraph.tflite \
--inference_type=FLOAT \
--inference_input_type=FLOAT \
--input_arrays=Placeholder \
--output_array=final_result \
--input_shapes=1,224,224,3
Am I doing anything wrong here? Where could the loss in accuracy come from?
I faced the same issue while I was trying to convert a .pb model into .lite.
In fact, my accuracy would come down from 95 to 30!
It turned out my mistake was not in the conversion from .pb to .lite or in the command used for it. It was in how I loaded and pre-processed the image before passing it to the lite model and running inference with interpreter.invoke().
The code below shows the pre-processing I mean:
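import cv2
import numpy as np

# interpreter, input_tensor_index, output (a getter for the output tensor)
# and result (the list of labels) are assumed to be set up earlier.
test_image = cv2.imread(file_name)
# Resize to the input size used in training (299x299 for inception-v3).
test_image = cv2.resize(test_image, (299, 299), interpolation=cv2.INTER_AREA)
# Scale pixel values to [0, 1] and add a batch dimension.
test_image = np.expand_dims(test_image / 255, axis=0).astype(np.float32)
interpreter.set_tensor(input_tensor_index, test_image)
interpreter.invoke()
digit = np.argmax(output()[0])
prediction = result[digit]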
As you can see, two crucial pre-processing steps are applied to the image once it has been read with "imread()":
i) The image must be resized to the "input_height" and "input_width" of the input tensor used during training. In my case (inception-v3) both were 299. (Check the model's documentation for these values, or look for the variables in the file you used to train or retrain the model.)
ii) The next command in the code above is:
test_image = np.expand_dims((test_image)/255, axis=0).astype(np.float32)
I derived this from the general formula in the model code:
test_image = np.expand_dims((test_image-input_mean)/input_std, axis=0).astype(np.float32)
The documentation showed that for my architecture input_mean = 0 and input_std = 255.
Once I made these changes to my code, I got the expected accuracy (90%).
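The normalization formula above can be wrapped in a small function so the same input_mean and input_std are used everywhere. This is a minimal sketch with NumPy only; the function name preprocess is hypothetical, the defaults are the values quoted for my architecture, and a dummy array stands in for the image returned by cv2.

```python
import numpy as np


def preprocess(image, input_mean=0.0, input_std=255.0):
    """Normalize an HxWxC uint8 image and add a leading batch dimension."""
    scaled = (image.astype(np.float32) - input_mean) / input_std
    return np.expand_dims(scaled, axis=0)


# Dummy all-white 299x299 image standing in for the resized cv2 image.
dummy = np.full((299, 299, 3), 255, dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape, batch.dtype, batch.max())
```

Other architectures may expect different values (for example a mean and std that map pixels to [-1, 1]), so the defaults here should be checked against the model's own documentation.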
Hope this helps.
Please file an issue on GitHub https://github.com/tensorflow/tensorflow/issues and add the link here.
Also please add more details on what you are retraining the last layer for.

How to get rid of additional ops added in the graph while fine-tuning Tensorflow Inception_V3 model?

I am trying to convert a fine-tuned tensorflow inception_v3 model to uff format which can be run on NVIDIA's Jetson TX2. For conversion to uff, certain ops are supported, some are not. I am able to successfully freeze and convert to uff inception_v3 model with imagenet checkpoint provided by tensorflow. However if I fine-tune the model, additional ops like Floor, RandomUniform, etc are added in the new graph which are not yet supported. These layers remain even after freezing the model. This is happening in the fine-tuning for flowers sample provided on tensorflow site as well.
I want to understand why additional ops are added to the graph, when fine-tuning is only supposed to modify the final layer to match the required number of outputs.
If they are added during training, how can I get rid of them? What post-processing steps did the TensorFlow team follow before releasing the inception_v3 model for imagenet?
I can share the pbtxt files if needed. For now, model layers details are uploaded at https://github.com/shrutim90/TF_to_UFF_Issue. I am using Tensorflow 1.6 with GPU.
I am following the steps to freeze or fine-tune the model from: https://github.com/tensorflow/models/tree/master/research/slim#Pretrained. As described in the above link, to reproduce the issue, install TF-Slim image models library and follow these steps:
1. python export_inference_graph.py \
--alsologtostderr \
--model_name=inception_v3 \
--output_file=/tmp/inception_v3_inf_graph.pb
2. python freeze_graph.py \
--input_graph=/tmp/inception_v3_inf_graph.pb \
--input_checkpoint=/tmp/checkpoints/inception_v3.ckpt \
--input_binary=true --output_graph=/tmp/frozen_inception_v3.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
3. DATASET_DIR=/tmp/flowers
TRAIN_DIR=/tmp/flowers-models/inception_v3
CHECKPOINT_PATH=/tmp/my_checkpoints/inception_v3.ckpt
python train_image_classifier.py \
--train_dir=$TRAIN_DIR \
--dataset_dir=$DATASET_DIR \
--dataset_name=flowers \
--dataset_split_name=train \
--model_name=inception_v3 \
--checkpoint_path=${CHECKPOINT_PATH} \
--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
4. python freeze_graph.py \
--input_graph=/tmp/graph.pbtxt \
--input_checkpoint=/tmp/checkpoints/model.ckpt-2539 \
--input_binary=false --output_graph=/tmp/frozen_inception_v3_flowers.pb \
--output_node_names=InceptionV3/Predictions/Reshape_1
To check the layers, you can inspect the .pbtxt file or use NVIDIA's convert-to-uff utility.
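For a quick look at which op types a graph contains, the .pbtxt file can also be scanned with a few lines of Python. This is a rough sketch that assumes the usual GraphDef text layout with one op: "Name" line per node; the helper names, the sample fragment, and the supported-op set are all made up for illustration, and the real supported list depends on the converter version.

```python
import re
from collections import Counter


def count_ops(pbtxt_text):
    """Count op types from lines of the form: op: "Name" in a GraphDef text dump."""
    pattern = r'^\s*op:\s*"([^"]+)"'
    return Counter(re.findall(pattern, pbtxt_text, flags=re.MULTILINE))


def unsupported_ops(pbtxt_text, supported):
    """Return the sorted op types that are not in the supported set."""
    return sorted(set(count_ops(pbtxt_text)) - set(supported))


# Tiny made-up fragment in GraphDef text format.
sample = '''
node {
  name: "input"
  op: "Placeholder"
}
node {
  name: "dropout/random_uniform/RandomUniform"
  op: "RandomUniform"
}
node {
  name: "dropout/Floor"
  op: "Floor"
}
'''

# Hypothetical supported set, not the converter's actual list.
print(unsupported_ops(sample, supported={"Placeholder", "Conv2D", "Relu"}))
```

Running this on the graph before and after fine-tuning makes it easy to see exactly which op types were introduced.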
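Run the steps in this order: training script -> export_inference_graph -> freeze_graph. That is, freeze the exported inference graph with the fine-tuned checkpoint rather than the graph.pbtxt written during training. This gets rid of all the extra nodes, and the model can then be converted to uff easily.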
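The order can be written down as a small Python helper that builds the three commands. Treat this as a sketch under assumptions rather than a tested recipe: the flags are copied from the commands in the question, except --dataset_name=flowers on the export step, which I am assuming is needed so the exported graph has the flowers class count; the function name, output file names, and work_dir are hypothetical.

```python
def build_pipeline(train_dir, dataset_dir, checkpoint_path, step, work_dir="/tmp"):
    """Return the commands in order: train, export inference graph, freeze."""
    inf_graph = f"{work_dir}/inception_v3_flowers_inf_graph.pb"  # hypothetical name
    train = [
        "python", "train_image_classifier.py",
        f"--train_dir={train_dir}",
        f"--dataset_dir={dataset_dir}",
        "--dataset_name=flowers",
        "--dataset_split_name=train",
        "--model_name=inception_v3",
        f"--checkpoint_path={checkpoint_path}",
        "--checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits",
        "--trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits",
    ]
    export = [
        "python", "export_inference_graph.py",
        "--alsologtostderr",
        "--model_name=inception_v3",
        "--dataset_name=flowers",  # assumption: matches the fine-tuned class count
        f"--output_file={inf_graph}",
    ]
    freeze = [
        "python", "freeze_graph.py",
        f"--input_graph={inf_graph}",  # the exported inference graph, not graph.pbtxt
        f"--input_checkpoint={train_dir}/model.ckpt-{step}",
        "--input_binary=true",
        f"--output_graph={work_dir}/frozen_inception_v3_flowers.pb",
        "--output_node_names=InceptionV3/Predictions/Reshape_1",
    ]
    return [train, export, freeze]


commands = build_pipeline(
    train_dir="/tmp/flowers-models/inception_v3",
    dataset_dir="/tmp/flowers",
    checkpoint_path="/tmp/my_checkpoints/inception_v3.ckpt",
    step=2539,
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the directory that contains the scripts; the step number must match a checkpoint that training actually wrote.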