Let me preface this by saying I'm still quite new to using Torch / ML in general, but I recently set up a YOLOv7 model and trained it on a custom dataset of car license plate images.
I used 1600 training images and 200 validation images.
Training Command:
python train.py --workers 1 --device 0 --batch-size 8 --epochs 100 --img 640 640 --data data/custom_data.yaml --hyp data/hyp.scratch.custom.yaml --cfg cfg/training/yolov7-custom.yaml --name yolov7-custom --weights yolov7.pt
Prediction Command:
python detect.py --weights yolov7_custom.pt --conf 0.5 --img-size 640 --source 1.jpg --view-img --no-trace
Here are the result graphs from the training, in case they provide useful insight:
I've also just noticed a lot of zeros on the training images, but with accurate bounding boxes:
As I'm not the most clued up on ML, could anyone point me in the right direction as to how to resolve the issue of the detection just returning the same input image with no bounding box?
My model is the YOLOv4 Darknet model. I used the "-map" function to compute the metrics there; the -map function belongs to Darknet.
However, when converting my model to TensorFlow Lite, I want to recompute these metrics on the intermediate TensorFlow model. So my question is: how can I find values like F1 score, mean average precision, etc. on my yolov4-416 TensorFlow model?
It is hard to answer, as the repo you are using is unknown to us. You can calculate it with the mAP formula, but which IoU threshold to use is up to you. Often it is 0.5, so mAP@0.5 would be calculated as:
Set IoU threshold to 0.5
Calculate AP per class as TP/(TP+FP), where TP stands for True Positives and FP stands for False Positives (strictly speaking, AP is the area under the precision-recall curve at that IoU threshold; this ratio is a simplified approximation).
Then calculate mAP as the average of the per-class APs.
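For illustration, here is a minimal Python sketch of that simplified recipe (match each prediction to an unused ground truth of the same class at the chosen IoU threshold, take TP/(TP+FP) per class, then average). The (x1, y1, x2, y2) box format and the greedy matching are assumptions, and a full mAP implementation would integrate the precision-recall curve instead:

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection over union.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def simple_map(preds_by_class, gts_by_class, iou_thr=0.5):
    # preds_by_class / gts_by_class: {class_id: [box, ...]} over the eval set.
    aps = []
    for cls, preds in preds_by_class.items():
        remaining_gts = list(gts_by_class.get(cls, []))
        tp = fp = 0
        for p in preds:
            match = next((g for g in remaining_gts if iou(p, g) >= iou_thr), None)
            if match is not None:
                tp += 1
                remaining_gts.remove(match)  # each ground truth matches at most once
            else:
                fp += 1
        if tp + fp:
            aps.append(tp / float(tp + fp))
    return sum(aps) / len(aps) if aps else 0.0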
If you are using hunglc007's tensorflow-yolov4-tflite repo, then try this:
Convert to the framework you will use (in your project, it is tflite):
python save_model.py --weights ./path/to/your/weights --output ./checkpoints/yolov4-416 --input_size 416 --model yolov4 --framework tflite
python convert_tflite.py --weights ./checkpoints/yolov4-416 --output ./checkpoints/yolov4-416.tflite --quantize_mode float16
(for --quantize_mode you can use either float16 or int8)
Evaluate your TFLite model:
python evaluate.py --weights ./checkpoints/yolov4-416.tflite --framework tflite --input_size 416 --model yolov4 --annotations /path/to/your/annotations.txt
Calculate mAP:
cd mAP/extra
python remove_space.py
cd ..
python main.py --output results_yolov4_tflite
Then look in mAP/results_yolov4_tflite/results.txt for your mAP.
I am training MobileNet_v1_1.0_224 using TensorFlow. I am using the Python scripts in the TensorFlow-Slim image classification model library for training. My dataset distribution across 4 classes is as follows:
normal_faces: 42070
oncall_faces: 13563 (faces of people with a mobile phone in the image, i.e. when they're on a call)
smoking_faces: 5949
yawning_faces: 1630
All images in the dataset are square and larger than 224x224.
I am using train_image_classifier.py to train the model with the following arguments:
python train_image_classifier.py \
--train_dir=${TRAIN_DIR} \
--dataset_name=custom \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=mobilenet_v1 \
--batch_size=32 \
--max_number_of_steps=25000
After training the model, eval_image_classifier.py shows an accuracy greater than 95% on the validation set, but when I exported the frozen graph and used it for predictions, it performed very poorly.
I have also tried this notebook, but it produced similar results.
Log: Training Log
Plots: Loss and Accuracy
What is the reason for this? How do I fix this issue?
I have seen similar issues on SO but nothing related to MobileNets specifically.
Did you use a validation set? If so, what was the validation accuracy?
If you used a validation set, a good way to check whether you are doing predictions properly is to run model.predict on the validation set.
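As a quick sanity check (a minimal sketch, with the graph path and tensor names as assumptions that depend on how you exported the frozen graph), you can load the frozen graph and run a few validation images through it using the same preprocessing slim applies to MobileNet, i.e. 224x224 inputs scaled to [-1, 1]:

import numpy as np
import tensorflow as tf
from PIL import Image

GRAPH_PATH = "frozen_graph.pb"                         # assumed path
INPUT_TENSOR = "input:0"                               # assumed tensor names;
OUTPUT_TENSOR = "MobilenetV1/Predictions/Reshape_1:0"  # check your exported graph

def load_graph(path):
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")
    return graph

def preprocess(image_path):
    # Slim's MobileNet preprocessing: resize to 224x224 and scale to [-1, 1].
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    return (np.asarray(img, dtype=np.float32) / 127.5 - 1.0)[None, ...]

graph = load_graph(GRAPH_PATH)
with tf.compat.v1.Session(graph=graph) as sess:
    probs = sess.run(OUTPUT_TENSOR, {INPUT_TENSOR: preprocess("some_validation_image.jpg")})
    print("predicted class:", probs.argmax(axis=-1), "confidence:", probs.max())

If the per-image predictions here look much worse than what eval_image_classifier.py reports, one common culprit is a mismatch between training-time and inference-time preprocessing.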
I'm trying to fine-tune an EfficientDet model. Here is a recap of what I've done:
download coco dataset 2014
convert to tfrecord with a script from tensorflow
download efficientDet D0 from official model zoo
edit pipeline.config (batch_size: 1, sync_replicas: false, replicas_to_aggregate: 1, fine_tune_checkpoint_type: "detection", use_bfloat16: false) and adjust the paths.
clone github.com/tensorflow/models.git, docker-compose run object_detection.
inside the container:
python models/research/object_detection/model_main_tf2.py \
--pipeline_config_path=efficientdet_d0_coco17_tpu-32/pipeline.config \
--model_dir=foo/model/ \
--alsologtostderr
My problem is that, as seen in TensorBoard (i.e. after data preprocessing), the contrast of the images is maxed out (or sometimes not maxed, but still way too high), and the brightness is often too low:
I checked the content of the tfrecords with https://github.com/sulc/tfrecord-viewer; the colors are fine.
I tried on another machine with a different NVIDIA GPU model; same problem.
Any idea where the problem could come from? Thanks!
This seems to be a visualization issue, not a training issue. It can be solved by changing the normalization from (-1, 1) to (0, 1).
Follow these changes in the code:
https://github.com/tensorflow/models/pull/9019
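The idea is simply to rescale images from the (-1, 1) training range back into the (0, 1) range the visualizer expects before they are logged. A minimal sketch of that rescaling (an illustration of the idea, not the exact code from the PR):

import numpy as np

def to_display_range(image):
    # An image normalized to [-1, 1] renders with blown-out contrast when the
    # viewer expects [0, 1]; shift and scale it back before logging/plotting.
    return np.clip((image + 1.0) / 2.0, 0.0, 1.0)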
I am checking the option to run image segmentation using the pre-trained deeplab xception65_coco_voc_trainval model.
The frozen model size is ~161 MB; after I convert it to tflite the size is ~160 MB, and running this model on my PC's CPU takes ~25 seconds.
Is that "expected", or is there something I can do better?
The conversion to tflite is as follows:
tflite_convert \
--graph_def_file="deeplabv3_pascal_trainval/frozen_inference_graph.pb" \
--output_file="deeplab_xception_pascal.tflite" \
--output_format=TFLITE \
--input_shape=1,513,513,3 \
--input_arrays="sub_7" \
--output_arrays="ArgMax" \
--inference_type=FLOAT \
--allow_custom_ops
Thanks!
According to https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md, xception65_coco_voc_trainval with 3 eval scales takes about 223 seconds. The frozen graph has a single eval scale, so ~25 seconds sounds about right to me.
To speed up inference with TFLite I would suggest using the GPU delegate, but as you are running on a PC, you will need to find a smaller model. Maybe try one of the MobileNet-based models? The EdgeTPU models will run in TFLite without an EdgeTPU and should be quite fast, although these are trained on Cityscapes.
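If you want to time the converted model yourself, here is a minimal sketch using the TFLite Python interpreter (the model path and the (1, 513, 513, 3) float input follow the conversion command above; the input value range is an assumption):

import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="deeplab_xception_pascal.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy float input matching --input_shape=1,513,513,3 from the conversion.
dummy = np.random.uniform(-1.0, 1.0, size=inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], dummy)

start = time.time()
interpreter.invoke()
print("single inference took %.1f s" % (time.time() - start))
seg_map = interpreter.get_tensor(out["index"])  # ArgMax output: per-pixel class ids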
I used ssd_mobilenet_v1_coco from the detection model zoo in TensorFlow Object Detection. I am currently training the model by running:
python legacy/train.py --logtostderr --train_dir=trainingmobile/ --pipeline_config_path=trainingmobile/pipeline.config
I want to run an evaluation job by running eval.py to get other metrics like IoU and the PR curve, but I don't know how to do that. I am able to run the command:
python legacy/eval.py \
--logtostderr \
--checkpoint_dir=path/to/checkpoint \
--eval_dir=path/to/eval \
--pipeline_config_path=path/to/config
Then I ran the command:
tensorboard --logdir=path/to/eval
TensorBoard shows only the test image output. How can I get other metrics like IoU and the PR curve?
First of all, I'd highly recommend using the newer model_main.py script for combined training and evaluation. You can use it as shown below:
python object_detection/model_main.py \
--pipeline_config_path=path/to/config \
--model_dir=path/to/train_dir \
--num_train_steps=NUM_TRAIN_STEPS \
--num_eval_steps=NUM_EVAL_STEPS \
--alsologtostderr
It combines training and evaluation, and you can start TensorBoard with
tensorboard --logdir=path/to/train_dir
TensorBoard will not only display the training process, it will also show your progress on the validation set. The COCO metrics are used as the default!
As for your original problem: maybe you should change the eval settings in your config file to larger numbers:
eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}
If you use the model_main.py script, the number of evaluations is set by the flags.
Good to know: the INFO output of TensorFlow is disabled in the newer model_main.py script. You can enable it by adding
tf.logging.set_verbosity(tf.logging.INFO)
after the import section.
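For example (a sketch only; the exact import block in your copy of model_main.py may differ):

from absl import flags
import tensorflow as tf

from object_detection import model_lib

# Re-enable TensorFlow's INFO output, which model_main.py silences by default.
tf.logging.set_verbosity(tf.logging.INFO)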