How to crop the detected object after training using YOLO?

I am using YOLO for model training and want to crop the detected objects.
The Darknet repository I am using is:
For detection, and for storing the output coordinates in a text file, I am using this:
!./darknet detector test data_for_colab/ data_for_colab/yolov3-tiny-obj.cfg yolov3-tiny-obj_10000.weights -dont_show -ext_output < TEST.txt > result.txt

This assumes the output text file (result.txt in the command above) contains the detection details shown in the sample image.
You can use Python's re module to find the lines that contain your "class_name".
Parsing the .txt file
import re
pattern = "class_name"
with open("result.txt") as f:
    lines = f.readlines()
for line in lines:
    if re.search(pattern, line):
        Cord_Raw = line
        Cord = Cord_Raw.split("(")[1].split(")")[0].split()
Now we will get the coordinates in a list.
Coordinate calculation
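x_min = int(Cord[1])  # value after "left_x:"
y_min = int(Cord[3])  # value after "top_y:"
x_max = x_min + int(Cord[5])  # left_x + width
y_max = y_min + int(Cord[7])  # top_y + height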
Cropping from the actual image
import cv2
img = cv2.imread("Image.jpg")
crop_img = img[y_min:y_max, x_min:x_max]
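The parsing and cropping steps above can also be sketched as two small functions. This is a minimal sketch, not the answer's original code: it assumes each detection line looks like `name: 98%  (left_x: 123  top_y: 45  width: 60  height: 30)`, which you should check against your own result.txt. The names `parse_detections` and `crop_box` are hypothetical helpers introduced here.

```python
import re

import numpy as np

# Assumed layout of one detection line written by -ext_output (verify against your file).
BOX_RE = re.compile(
    r"^(?P<label>.+?):\s*(?P<conf>\d+)%\s*"
    r"\(left_x:\s*(?P<x>-?\d+)\s+top_y:\s*(?P<y>-?\d+)\s+"
    r"width:\s*(?P<w>-?\d+)\s+height:\s*(?P<h>-?\d+)\)"
)


def parse_detections(text):
    """Return a list of (label, x, y, w, h) tuples found in the detector output."""
    boxes = []
    for line in text.splitlines():
        match = BOX_RE.match(line.strip())
        if match:
            boxes.append(
                (
                    match["label"],
                    int(match["x"]),
                    int(match["y"]),
                    int(match["w"]),
                    int(match["h"]),
                )
            )
    return boxes


def crop_box(img, x, y, w, h):
    """Crop a box from an image array, clamping it to the image bounds."""
    height, width = img.shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, width), min(y + h, height)
    return img[y0:y1, x0:x1]
```

Clamping matters because the reported left_x or top_y can be negative when a box touches the image border, and a negative index would silently slice from the wrong end of the array. The array returned by cv2.imread can be passed to `crop_box` directly.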


Spectral Python imshow displaying scrambled image

I am learning Spectral Python and using their own documentation and sample image files to display a multispectral image as RGB. However, for some reason, my image appears scrambled up. I have tested the image file by opening it in MultiSpec and it appears as it should, so I do not think the file is damaged. My code is as follows:
import spectral as s
import matplotlib as mpl
path = '/content/92AV3C.lan'
img = s.open_image(path)
#Load and display hyperspectral image
arr = img.load()
view = s.imshow(arr, (29, 19, 9))
#Load and display Ground truth image
gt = s.open_image('92AV3GT.GIS').read_band(0)
view = s.imshow(classes=gt)
Output is as follows:
I suggest that you try the following command instead of view = imshow(img, (RGB)). Spectral Python has the smarts, once you identify the image type (i.e., *.lan), to display the image in the correct format.

How to use FasterRCNN Openimages v4?

I can't seem to find any documentation on how to use this model.
I am trying to use it to print out the objects that appear in a video.
Any help would be greatly appreciated.
I am just starting out, so go easy on me.
As I understand it, your problem is printing out the names of the detected objects.
I don't know where you got your Faster R-CNN model trained on OpenImages v4, or how you implemented it, so I will show the approach using the model from TensorFlow Hub (Google Colab, AI Hub).
After some digging around and a LOT of trial and error, I came up with this:
import tensorflow as tf
import tensorflow_hub as hub
import time,imageio,sys,pickle
# sys.argv[1] is used for taking the video path from the terminal
video = sys.argv[1]
# passing the video file to ImageIO to be read later in the form of frames
video = imageio.get_reader(video)
dictionary = {}
# download and extract the model (faster_rcnn/openimages_v4/inception_resnet_v2 or
# openimages_v4/ssd/mobilenet_v2) in the same folder
module_handle = "*Path to the model folder*"
detector = hub.load(module_handle).signatures['default']
# looping over every frame in the video
for index, frames in enumerate(video):
    # converting the images (video frames) to tf.float32, which is the only acceptable input format
    image = tf.image.convert_image_dtype(frames, tf.float32)[tf.newaxis]
    # passing the converted image to the model
    detector_output = detector(image)
    class_names = detector_output["detection_class_entities"]
    scores = detector_output["detection_scores"]
    # in case there are multiple objects in the frame
    for i in range(len(scores)):
        if scores[i] > 0.3:
            # converting from bytes to string
            object = class_names[i].numpy().decode("ascii")
            # adding the objects that appear in the frames, and their frame numbers, to a dictionary
            if object not in dictionary:
                dictionary[object] = [index]
            else:
                # assumed completion: the source snippet stops after the first branch
                dictionary[object].append(index)
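The bookkeeping part of the loop above (which object appeared in which frame) can be sketched and tested separately from the detector. This is a minimal sketch under assumptions: `frames_by_object` is a hypothetical helper, and its input, one list of `(name_bytes, score)` pairs per frame, is a stand-in for the values read from `detection_class_entities` and `detection_scores`.

```python
from collections import defaultdict


def frames_by_object(per_frame_detections, threshold=0.3):
    """Map each object name to the list of frame indices in which it was detected.

    per_frame_detections: iterable with one list per frame, each holding
    (name_bytes, score) pairs (a hypothetical stand-in for the detector output).
    """
    seen = defaultdict(list)
    for index, detections in enumerate(per_frame_detections):
        # A set keeps a frame from being recorded twice when the same
        # object is detected several times in that frame.
        names = {
            name.decode("ascii")
            for name, score in detections
            if score > threshold
        }
        for name in sorted(names):
            seen[name].append(index)
    return dict(seen)
```

For example, `frames_by_object([[(b"Cat", 0.9)], [(b"Cat", 0.5), (b"Dog", 0.8)]])` yields a dictionary with "Cat" mapped to frames 0 and 1 and "Dog" mapped to frame 1.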

How to save an image that has been visualized/generated by a Keras model?

I am using a Detecto model to visualize an image. Basically, I pass an image to the model, and it draws a bounding box around the object and displays the visualized image.
from keras.preprocessing.image import load_img
from keras.preprocessing.image import save_img
from keras.preprocessing.image import img_to_array
from detecto import core, utils, visualize
image = utils.read_image('retina_model/4.jpg')
model = core.Model()
labels, boxes, scores = model.predict_top(image)
img=visualize.show_labeled_image(image, boxes,)
Now I am trying to convert this visualized image into a NumPy array. I am using the line below for the conversion:
img_array = img_to_array(img)
It gives this error:
Unsupported Image Shape
All I want is to display the visualized image, which is the output of this model, on my website. The plan is to convert the image into a NumPy array and then save it in code using the line below:
save_img('image1.jpg', img_array)
So I was planning to save this visualized image (the output of the model) to a file so that I can display it on my website. If there is some other way to achieve this, please let me know.
Detecto's documentation says the utils.read_image() is already returning a NumPy array.
But you are passing the return of visualize.show_labeled_image() to Keras' img_to_array(img)
Looking at the Detecto source code of visualize.show_labeled_image(), it has no return type, so it is returning None by default. So I think your problem is you are not passing a valid image to img_to_array(img), but None.
I don't think the call to img_to_array(img) is needed, because you already have the image as a NumPy array. But note that according to Detecto's documentation, utils.read_image() is "Equivalent to using OpenCV’s cv2.imread function and converting from BGR to RGB format" . Make sure that's what you want.
You can look at visualize.py in the official Detecto GitHub repository to see how show_labeled_image() works. It uses matplotlib to plot the image with its bounding boxes, so you can copy that code into your own file and modify it to save the plot with plt.savefig().
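A minimal sketch of that idea follows. It is not Detecto's own code: `save_labeled_image` is a hypothetical helper, and it assumes each box is given as `[xmin, ymin, xmax, ymax]` in pixel coordinates and that the image is an RGB array such as the one returned by utils.read_image(). If your boxes are tensors, convert them to plain numbers first.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so this also works on a server

import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np


def save_labeled_image(image, boxes, labels, out_path):
    """Draw boxes and labels on an image and write the figure to out_path."""
    fig, ax = plt.subplots(1)
    ax.imshow(image)
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        # Rectangle takes the top-left corner plus a width and a height.
        rect = patches.Rectangle(
            (xmin, ymin),
            xmax - xmin,
            ymax - ymin,
            linewidth=1,
            edgecolor="r",
            facecolor="none",
        )
        ax.add_patch(rect)
        ax.text(xmin, ymin, label, color="r")
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated calls do not leak memory
```

The saved file can then be served by the website directly, with no conversion through img_to_array or save_img.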

Tensorflow: Load unknown TFRecord dataset

I got a TFRecord data file filename = train-00000-of-00001 which contains images of unknown size and maybe other information as well. I know that I can use dataset = to open the dataset.
How can I extract the images from this file and save them as a NumPy array?
I also don't know whether any other information, such as labels or resolution, is saved in the TFRecord file. How can I get this information? How can I save it as a NumPy array?
I normally only use numpy-arrays and am not familiar with TFRecord data files.
1.) How can I extract the images from this file to save it as a numpy-array?
What you are looking for is this:
record_iterator = tf.python_io.tf_record_iterator(path=filename)
for string_record in record_iterator:
    example = tf.train.Example()
    example.ParseFromString(string_record)
    print(example)
    # Exit after 1 iteration as this is purely demonstrative.
    break
2.) How can I get this information?
Here is the official documentation. I strongly suggest that you read the documentation because it goes step by step in how to extract the values that you are looking for.
Essentially, you have to convert example to a dictionary. So if I wanted to find out what kind of information is in a tfrecord file, I would do something like this (in context with the code stated in the first question): dict(example.features.feature).keys()
3.) How can I save them as a numpy-array?
I would build upon the for loop mentioned above. So for every loop, it extracts the values that you are interested in and appends them to numpy arrays. If you want, you could create a pandas dataframe from those arrays and save it as a csv file.
You may have multiple tfrecord files. Note that the dataset call mentioned in the question returns a dataset that is used to train models.
So in the event of multiple tfrecord files, you would need a double for loop. The outer loop goes through each file. For that particular file, the inner loop goes through all of the tf.Examples.
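The accumulation step described above can be sketched without TensorFlow. This is a minimal sketch under assumptions: `collect_records` and `save_records` are hypothetical helpers, and the `(image, label)` pairs stand in for whatever values you pull out of each tf.train.Example inside the loop. Because the images have unknown and possibly different sizes, they are kept as a list rather than stacked into one array.

```python
import numpy as np


def collect_records(records):
    """Gather (image, label) pairs into a list of image arrays and a label array."""
    images, labels = [], []
    for image, label in records:
        images.append(np.asarray(image))
        labels.append(label)
    return images, np.asarray(labels)


def save_records(path, images, labels):
    """Save images of differing shapes plus their labels into a single .npz file."""
    named_images = {f"image_{i}": img for i, img in enumerate(images)}
    np.savez(path, labels=labels, **named_images)
```

Loading the file again with `np.load(path)` gives access to `labels` and to each `image_0`, `image_1`, and so on by name.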
Converting to np.array()
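import tensorflow as tf
from PIL import Image
import io
import numpy as np
# record_iterator is the iterator created in the answer to question 1
for string_record in record_iterator:
    example = tf.train.Example()
    example.ParseFromString(string_record)
    # Get the values in a dictionary
    example_bytes = dict(example.features.feature)['image_raw'].bytes_list.value[0]
    # Wrap the bytes in a file-like object, open it with PIL, and convert to an array
    image_array = np.array(Image.open(io.BytesIO(example_bytes)))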
Sources for the code above:
Base code
Converting bytes to PIL.JpegImagePlugin.JpegImageFile
Converting from PIL.JpegImagePlugin.JpegImageFile to np.array
Official Documentation for PIL
import tensorflow as tf
from PIL import Image
import io
import numpy as np
# Note: tf.python_io and tf.Session are TensorFlow 1.x APIs (found under tf.compat.v1 in 2.x)
# Load image (the download URL is missing from the source, so the second argument is left empty)
cat_in_snow = tf.keras.utils.get_file(path, '')
#------------------------------------------------------Convert to tfrecords
def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def image_example(image_string):
    feature = {
        'image_raw': _bytes_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
with tf.python_io.TFRecordWriter('images.tfrecords') as writer:
    image_string = open(cat_in_snow, 'rb').read()
    tf_example = image_example(image_string)
    writer.write(tf_example.SerializeToString())
#------------------------------------------------------Begin Operation
# Path to the tfrecord file (here, the one written above)
record_iterator = tf.python_io.tf_record_iterator('images.tfrecords')
for string_record in record_iterator:
    example = tf.train.Example()
    example.ParseFromString(string_record)
    # OPTION 1: convert bytes to arrays using PIL and IO
    example_bytes = dict(example.features.feature)['image_raw'].bytes_list.value[0]
    PIL_array = np.array(Image.open(io.BytesIO(example_bytes)))
    # OPTION 2: convert bytes to arrays using Tensorflow
    with tf.Session() as sess:
        TF_array = sess.run(tf.image.decode_jpeg(example_bytes, channels=3))
    break
#------------------------------------------------------Compare results
# Number of array elements that differ between the two decoders
(PIL_array.flatten() != TF_array.flatten()).sum()
PIL_array == TF_array
PIL_img = Image.fromarray(PIL_array, 'RGB')
PIL_img.save('PIL_IMAGE.jpg')
TF_img = Image.fromarray(TF_array, 'RGB')
TF_img.save('TF_IMAGE.jpg')
Remember that tfrecords is simply a way of storing information so that TensorFlow models can read it efficiently.
I use PIL and IO to convert the bytes to an image. IO takes the bytes and wraps them in a file-like object that PIL.Image can then read.
Yes, there is a pure TensorFlow way to do it: tf.image.decode_jpeg
Yes, there is a difference between the two approaches when you compare the two arrays.
Which one should you pick? TensorFlow is not the way to go if you are worried about accuracy, as stated on TensorFlow's GitHub: "The TensorFlow-chosen default for jpeg decoding is IFAST, sacrificing image quality for speed". Credit for this information belongs to this post.
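The PIL and io round trip described above can be tried on its own, without a TFRecord file. This is a minimal sketch: `bytes_to_array` and `array_to_bytes` are hypothetical helper names, and PNG is used for the encoding step only because it is lossless, which makes the round trip exact and easy to check. With JPEG bytes the decoded values depend on the decoder, which is the difference noted above.

```python
import io

import numpy as np
from PIL import Image


def bytes_to_array(image_bytes):
    """Decode encoded image bytes (JPEG, PNG, ...) into a NumPy array."""
    # io.BytesIO wraps the raw bytes in a file-like object that PIL can open.
    return np.array(Image.open(io.BytesIO(image_bytes)))


def array_to_bytes(array, fmt="PNG"):
    """Encode a uint8 image array into bytes in the given format."""
    buffer = io.BytesIO()
    Image.fromarray(array).save(buffer, format=fmt)
    return buffer.getvalue()
```

In the TFRecord loop, the value taken from `bytes_list.value[0]` plays the role of `image_bytes`.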

Training a model to be biased to a given background

I am re-training an Inception-ResNet-v2 model to recognize a sequence of 3-digit numbers (of a particular font type). The sequence is artificially generated with black lettering on a white background. My thinking is that training the model on only this specific background will help eliminate false detections (in this case, any other 3-digit sequence that is not on a white background), because the model will not predict high probabilities for sequences whose backgrounds are not white. Is this a valid assumption to make?
PS: I previously tried using Tesseract to extract text from an image. I used the EAST text detector for detection, which gave me the bounding boxes for the text. I followed that with OCR using pytesseract, but it always returned an empty string. Furthermore, when the digits were rotated, the EAST text detector failed to recognize the rotated sequence. I was therefore left with no option but to train a neural network model to perform text detection and extraction.
Code for pytesseract:
import cv2
import numpy as np
import pytesseract
from pytesseract import image_to_string
from PIL import Image
# inp_img is not defined in the original snippet; the path below is a hypothetical placeholder
inp_img = cv2.imread("input.jpg")
refPt=[(486,302),(540,308),(538,328),(484,323)] #the bbox returned by east
refpt = np.array(refPt,dtype=np.int32)
mask = np.zeros(inp_img.shape, dtype=np.uint8)
channel_count = inp_img.shape[2]
ignore_mask_color = (255,)*channel_count
# fillPoly expects a list of polygons, so the point array is wrapped in a list
mask = cv2.fillPoly(mask, [refpt.reshape((-1,1,2))], ignore_mask_color)
masked_image = cv2.bitwise_and(inp_img, mask)
print(image_to_string(Image.fromarray(masked_image),lang='eng'))