YOLOv7 compare pre-trained and finetuned models - yolo

I have done some fine tuning on YOLOv7 (with only 4 classes which are present in 80 classes from coco dataset). So I wanted to compare the performance of the model without being finetuned and after the finetuning.
I could run the 'test.py' command for my fine-tuned mode, but for the pre-trained model I get this error (when running the following command:
python3 test.py --data "data/custom.yaml" --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --save-txt
):
Class Images Labels P R mAP#.5 mAP#.5:.95: 24%|███ | 4/17 [00:04<00:14, 1.15s/it] Traceback (most recent call last): File "/home/caa-stage/yolov7/test.py", line 314, in <module> test(opt.data, File "/home/yolov7/test.py", line 184, in test confusion_matrix.process_batch(predn, torch.cat((labels[:, 0:1], tbox), 1)) File "/home/yolov7/utils/metrics.py", line 148, in process_batch self.matrix[gc, detection_classes[m1[j]]] += 1 # correct IndexError: index 16 is out of bounds for axis 1 with size 5
Have sombedu tried to compare these 2 results? ( I think it is possible by running detect.py with finetuned and pretrained weights and than establish the metrics, but I couldn't find any code for this).
Thank you in advance!

Related

Can't save YOLOv4 model because of array shape mismatch

I am able to run transfer learning on YOLOv4 and my custom dataset with the following command (which runs successfully and can identify test images I present to the model):
!./darknet detector train /content/darknet/build/darknet/x64/data/obj.data /content/darknet/build/darknet/x64/cfg/yolov4_train.cfg /content/darknet/build/darknet/x64/yolov4.conv.137 -dont_show
I am using the save_model.py tool from this github site:
!git clone https://github.com/hunglc007/tensorflow-yolov4-tflite
When I enter the following command to save the model it fails:
!python3 save_model.py --weights /content/darknet/build/darknet/x64/backup/yolov4_train_final.weights --output ./checkpoints/yolov4-224 --input_size 224
The failure is a mismatch between the weights saved in training and the expected array shape in the core/utility module utils.py (line 63):
Traceback (most recent call last):
File "save_model.py", line 58, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "save_model.py", line 54, in main
save_tf()
File "save_model.py", line 49, in save_tf
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
File "/content/tensorflow-yolov4-tflite/core/utils.py", line 65, in load_weights
conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 4554552 into shape (1024,512,3,3)
I added a debug print, and it looks like the it's getting all the way to the last layer before choking. In other words, the previous layers all get through this line of code in utils.py with a match between the saved weights and the array shape. I think this is somehow related to the fact I'm using image sizes of 224,224,3 instead of 416,416,3, but I did specify that in the input_size. For completeness, here's the last couple of debug prints before the Traceback above:
layer (out_dim, in_dim, height, width) 107 512 1024 1 1
layer (out_dim, in_dim, height, width) 108 1024 512 3 3
If anyone has any ideas, that would be great!

Camera Digit Prediction stopped working after moving to python 3.7, anyone know why?

When I moved my code from an interpreter based python 3.9 and tensorflow to python 3.7 and tensorflow-directml (so I could use my AMD GPU). The training part worked fine when I copied over the code. But when running the model I get an error suddenly complaining about the sizes of the input arrays to my neural network. The error does not occur with the initial interpreter but does with the second one even though the code is identical.
(The shapes of the digit array are the same for both versions (1, 28, 28) - binary image)
def cam_predict_digits(cam):
dig = np.zeros((1, 28, 28))
dig[0, :, :] = np.array(cam)
digit = np.array(dig)
print("predict input shape: " + str(digit.shape))
# Make prediction
prediction = model.predict(digit)
print(prediction)
print(f'Detected is probably: {np.argmax(prediction)}')
Traceback (most recent call last):
File "C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet_GPU/Conv_NN_GPU_Model.py", line 123, in <module>
cam_predict_digits(Processed_Frame)
File "C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet_GPU/Conv_NN_GPU_Model.py", line 74, in cam_predict_digits
prediction = model.predict(digit)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 908, in predict
use_multiprocessing=use_multiprocessing)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 716, in predict
x, check_steps=True, steps_name='steps', steps=steps)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 563, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (1, 28, 28)
Process finished with exit code 1
Could anyone explain why this is happening and what I can do to fix it? Thanks

OSError: [Errno 95] Operation not supported: '/content/drive/Mask_RCNN' on Google Colab

Hello trying to use saved weights for a Mask RCNN model within colab and keep incurring the error message below. I have tried different ways of accessing the .h5 problem, which was an issue before, and now I have hit a brick wall. I have tried to train different parts of the model, nothing works. Nothing specific is available on google colab with these circumstances.
The following is the cell that throws the issue:
# Training dataset.
dataset_train = linkedinDataset()
dataset_train.load_dataset(dataset_dir, "train")
dataset_train.prepare()
# Validation dataset
dataset_val = linkedinDataset()
dataset_val.load_dataset(dataset_dir, "val")
dataset_val.prepare()
# *** This training schedule is an example. Update to your needs ***
#
#
#
print("Training network heads")
model.train(dataset_train,
dataset_val,
learning_rate=config.LEARNING_RATE,
epochs=5,
layers='heads')```
```Training network heads
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-19-174a93609e58> in <module>()
17 learning_rate=config.LEARNING_RATE,
18 epochs=5,
---> 19 layers='heads')
2 frames
/content/Mask_RCNN/mrcnn/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs,
layers, augmentation, custom_callbacks, no_augmentation_sources)
2334 # Create log_dir if it does not exist
2335 if not os.path.exists(self.log_dir):
-> 2336 os.makedirs(self.log_dir)
2337
2338 # Callbacks
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
208 if head and tail and not path.exists(head):
209 try:
--> 210 makedirs(head, mode, exist_ok)
211 except FileExistsError:
212 # Defeats race condition when another thread created the path
/usr/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system
OSError: [Errno 95] Operation not supported: '/content/drive/Mask_RCNN'```
You cannot use
'/content/drive/Mask_RCNN'
You should save to either
'/content/Mask_RCNN'
Or, if to use Google Drive,
'/content/drive/MyDrive/Mask_RCNN'

PyTorch Object Detection with GPU on Ubuntu 18.04 - RuntimeError: CUDA out of memory. Tried to allocate xx.xx MiB

I'm attempting to get this PyTorch person detection example:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
running locally with a GPU, either in a Jupyter Notebook or a regular python file. I get the error in the title either way.
I'm using Ubuntu 18.04. Here is a summary of the steps I've performed:
1) Stock Ubuntu 18.04 install on a Lenovo ThinkPad X1 Extreme Gen 2 with a GTX 1650 GPU.
2) Perform a standard CUDA 10.0 / cuDNN 7.4 install. I'd rather not restate all the steps as this post is going to be more than long enough already. This is a standard procedure, pretty much any link found via googling is what I followed.
3) Install torch and torchvision
pip3 install torch torchvision
4) From this link on the PyTorch site:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
I've both saved the linked notebook:
https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb
And Also tried the link at the bottom that has the regular Python file:
https://pytorch.org/tutorials/_static/tv-training-code.py
5) Before running either the notebook or the regular Python way, I did the following (found at the top of the above linked notebook):
Install the CoCo API into Python:
cd ~
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
open Makefile in gedit, change the two instances of "python" to "python3", then:
python3 setup.py build_ext --inplace
sudo python3 setup.py install
Get the necessary files the above linked files need to run:
cd ~
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.5.0
from ~/vision/references/detection, copy coco_eval.py, coco_utils.py, engine.py, transforms.py, and utils.py to whichever directory the above linked notebook or tv-training-code.py file are being ran from.
6) Download the Penn Fudan Pedestrian dataset from the link on the above page:
https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip
then unzip and put in the same directory as the notebook or tv-training-code.py
In case the above link ever breaks or just for easier reference, here is tv-training-code.py as I have downloaded it at this time:
# Sample code from the TorchVision 0.3 Object Detection Finetuning Tutorial
# http://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
import os
import numpy as np
import torch
from PIL import Image
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from engine import train_one_epoch, evaluate
import utils
import transforms as T
class PennFudanDataset(object):
def __init__(self, root, transforms):
self.root = root
self.transforms = transforms
# load all image files, sorting them to
# ensure that they are aligned
self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))
def __getitem__(self, idx):
# load images ad masks
img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
img = Image.open(img_path).convert("RGB")
# note that we haven't converted the mask to RGB,
# because each color corresponds to a different instance
# with 0 being background
mask = Image.open(mask_path)
mask = np.array(mask)
# instances are encoded as different colors
obj_ids = np.unique(mask)
# first id is the background, so remove it
obj_ids = obj_ids[1:]
# split the color-encoded mask into a set
# of binary masks
masks = mask == obj_ids[:, None, None]
# get bounding box coordinates for each mask
num_objs = len(obj_ids)
boxes = []
for i in range(num_objs):
pos = np.where(masks[i])
xmin = np.min(pos[1])
xmax = np.max(pos[1])
ymin = np.min(pos[0])
ymax = np.max(pos[0])
boxes.append([xmin, ymin, xmax, ymax])
boxes = torch.as_tensor(boxes, dtype=torch.float32)
# there is only one class
labels = torch.ones((num_objs,), dtype=torch.int64)
masks = torch.as_tensor(masks, dtype=torch.uint8)
image_id = torch.tensor([idx])
area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
# suppose all instances are not crowd
iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd
if self.transforms is not None:
img, target = self.transforms(img, target)
return img, target
def __len__(self):
return len(self.imgs)
def get_model_instance_segmentation(num_classes):
# load an instance segmentation model pre-trained pre-trained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# now get the number of input features for the mask classifier
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
# and replace the mask predictor with a new one
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
hidden_layer,
num_classes)
return model
def get_transform(train):
transforms = []
transforms.append(T.ToTensor())
if train:
transforms.append(T.RandomHorizontalFlip(0.5))
return T.Compose(transforms)
def main():
# train on the GPU or on the CPU, if a GPU is not available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# our dataset has two classes only - background and person
num_classes = 2
# use our dataset and defined transformations
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))
# split the dataset in train and test set
indices = torch.randperm(len(dataset)).tolist()
dataset = torch.utils.data.Subset(dataset, indices[:-50])
dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])
# define training and validation data loaders
data_loader = torch.utils.data.DataLoader(
dataset, batch_size=2, shuffle=True, num_workers=4,
collate_fn=utils.collate_fn)
data_loader_test = torch.utils.data.DataLoader(
dataset_test, batch_size=1, shuffle=False, num_workers=4,
collate_fn=utils.collate_fn)
# get the model using our helper function
model = get_model_instance_segmentation(num_classes)
# move model to the right device
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005,
momentum=0.9, weight_decay=0.0005)
# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
step_size=3,
gamma=0.1)
# let's train it for 10 epochs
num_epochs = 10
for epoch in range(num_epochs):
# train for one epoch, printing every 10 iterations
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
# update the learning rate
lr_scheduler.step()
# evaluate on the test dataset
evaluate(model, data_loader_test, device=device)
print("That's it!")
if __name__ == "__main__":
main()
Here is an exmaple run of tv-training-code.py
$ python3 tv-training-code.py
Epoch: [0] [ 0/60] eta: 0:01:17 lr: 0.000090 loss: 4.1717 (4.1717) loss_classifier: 0.8903 (0.8903) loss_box_reg: 0.1379 (0.1379) loss_mask: 3.0632 (3.0632) loss_objectness: 0.0700 (0.0700) loss_rpn_box_reg: 0.0104 (0.0104) time: 1.2864 data: 0.1173 max mem: 1865
Traceback (most recent call last):
File "tv-training-code.py", line 165, in <module>
main()
File "tv-training-code.py", line 156, in main
train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
File "/xxx/PennFudanExample/engine.py", line 46, in train_one_epoch
losses.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, *args)
File "/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py", line 38, in backward
output_size[0], output_size[1], bs, ch, h, w, sampling_ratio)
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fdfb6c9b813 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1ce68 (0x7fdfb6edce68 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1de6e (0x7fdfb6edde6e in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10_cuda.so)
frame #3: at::native::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>) + 0x279 (0x7fdf59472789 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so)
[many more frame lines omitted]
Clearly the line:
RuntimeError: CUDA out of memory. Tried to allocate 132.00 MiB (GPU 0; 3.81 GiB total capacity; 2.36 GiB already allocated; 132.69 MiB free; 310.59 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
is the critical error.
If I run an nvidia-smi before a run:
$ nvidia-smi
Tue Dec 24 14:32:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:01:00.0 On | N/A |
| N/A 47C P8 5W / N/A | 296MiB / 3903MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1190 G /usr/lib/xorg/Xorg 142MiB |
| 0 1830 G /usr/bin/gnome-shell 72MiB |
| 0 3711 G ...uest-channel-token=14371934934688572948 78MiB |
+-----------------------------------------------------------------------------+
It seems pretty clear there is plenty of GPU memory available (this GPU is 4GB).
Moreover, I'm confident my CUDA/cuDNN install and GPU hardware are good b/c I train and inference the TensorFlow object detection API on this computer frequently, and as long as I use the allow_growth option I never have GPU related errors.
From Googling on this error it seems to be relatively common. The most common solutions are:
1) Try a smaller batch size (not really applicable in this case since the training and testing batch sizes are 2 and 1 respectively, and I tried with 1 and 1 and still got the same error)
2) Update to the latest version of PyTorch (but I'm already at the latest version).
Some other suggestions involve reworking the training script. I'm very familiar with TensorFlow but I'm new to PyTorch so I'm not sure how to go about that. Also, most of the rework suggestions I can find for this error do not pertain to object detection and therefore I'm not able to relate them to this training script specifically.
Has anybody else gotten this script to run locally with an NVIDIA GPU? Do you suspect a OS/CUDA/PyTorch configuration concern, or is there someway the script can be reworked to prevent this error? Any assistance would be greatly appreciated.
Very strange, after changing both the training and testing batch size to 1, it now does not crash with a GPU error. Very strange since I'm certain I tried this before.
Perhaps it had something to do with changing the batch size to 1 for both training and testing, and then rebooting or somehow refreshing something else? I'm not really sure. Very odd.
Now the evaluate function call is crashing with the error:
object of type <class 'numpy.float64'> cannot be safely interpreted as an integer.
But it seems this is completely unrelated so I'll make a separate post for that.

Near empty frozen graph after using freeze_graph from Tensorflow

I am currently trying to strip the training operations from my GraphDef so that I can run it on Android. However, to do so, I need to first freeze the graph using Tensorflow's freeze_graph.py script.
However, I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte when attempting to run the bash script:
#!/bin/bash
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/Users/leslie/Downloads/trained_model.pb \
--input_checkpoint=/Users/leslie/Downloads/Y6_1478303913_Leslie \
--output_graph=/tmp/frozen_graph.pb --output_node_names=Y_GroundTruth
Could this be a problem in the way I created my graph and checkpoint? I created the input_graph via tf.train.write_graph(sess.graph_def, location, 'trained_model.pb', as_text=False) and the checkpoint is created via saver.save(sess, chkpointpath). Answers from StackOverflow say that the python script has non-ascii characters and that I should just simply strip them from the python script but I do not think that is such a great idea.
Full traceback:
Traceback (most recent call last):
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow /python/tools/freeze_graph.py", line 135, in <module>
tf.app.run()
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 132, in main
FLAGS.output_graph, FLAGS.clear_devices, FLAGS.initializer_nodes)
File "/Users/leslie/tensorflow-master/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 98, in freeze_graph
text_format.Merge(f.read().decode("utf-8"), input_graph_def)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 331: invalid start byte
I also generated my protobuf file with as_text = True and the error above did not show up. However, I only got the following output.
Converted 0 variables to const ops.
1 ops in the final graph.
Complete contents of "frozen_graph.pb"
6
Y_GroundTruth��Placeholder*�
�dtype��0�*�
�shape��:
Snippet of PB-file generation code:
#Start all code before training code
# Tensor placeholders and variables
...
# Network weights and biases
...
# Network layer definitions
...
# Definition of cost function
...
# Create optimizer
...
# Session operations
...
#END all code before training code
saver = tf.train.Saver()
with tf.Session() as sess:
saver.restore(sess, model_save_path)
sess.run(tf.initialize_all_variables())
tf.train.write_graph(sess.graph_def, outputlocation, 'trained_model.pb', as_text=False)